
Beginner’s Guide to Databricks File System (DBFS): What It Is and How to Use It

Suraj Jeswara
Nov 27, 2024


If you’re new to Databricks, one of the first things you’ll encounter is DBFS, or Databricks File System. It’s the foundation of how files are stored and accessed in Databricks. In this guide, we’ll break it down in simple terms, so you can understand its purpose and how to use it effectively.

Image courtesy: https://www.thetechplatform.com/post/introduction-to-dbfs

What Is DBFS?

Think of DBFS as a “virtual hard drive” that comes with your Databricks workspace. It helps you store and access data, scripts, results, and even files uploaded by users. It’s built on top of your cloud storage (like AWS S3, Azure Blob Storage, or Google Cloud Storage), but you don’t need to worry about the cloud-specific details — DBFS makes it all look simple.

Exploring DBFS Using dbutils.fs.ls("/")

When you run the command dbutils.fs.ls("/") in a Databricks notebook, it shows you the root of DBFS. This is like opening the "My Computer" or "This PC" folder on your personal computer. The output lists the top-level folders in your workspace; the key ones are described in the sections that follow.
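
To make the listing easier to read, here is a minimal sketch that turns the output of dbutils.fs.ls("/") into a sorted table. The helper name summarize_listing is hypothetical (it is not part of Databricks). It assumes each entry exposes name and size attributes, as the FileInfo objects returned by dbutils.fs.ls do, and that directory names end with "/". The last few lines only run inside a Databricks notebook, where dbutils is predefined.

```python
def summarize_listing(entries):
    """Turn FileInfo-like entries into (kind, name, size) tuples sorted by name.

    `entries` is expected to look like the result of dbutils.fs.ls(...):
    each item has `.name` and `.size`. This helper is illustrative only.
    """
    rows = []
    for entry in entries:
        # Assumption: dbutils.fs.ls reports directories with a trailing "/"
        kind = "dir" if entry.name.endswith("/") else "file"
        rows.append((kind, entry.name, entry.size))
    return sorted(rows, key=lambda row: row[1])


# `dbutils` exists only inside a Databricks notebook; skip quietly elsewhere.
if "dbutils" in globals():
    for kind, name, size in summarize_listing(dbutils.fs.ls("/")):
        print(f"{kind:4} {name:30} {size}")
```

Keeping the formatting logic in a plain function means you can try it on any folder by changing the path passed to dbutils.fs.ls.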

Key Folders in DBFS

1. /databricks-datasets: Ready-to-Use Data for Practice

This folder is perfect for beginners. It contains sample datasets provided by Databricks, so you can practice without having to upload your own data.

  • What’s Inside? Datasets like airline delays, e-commerce transactions, and more. These are read-only, so you can’t accidentally modify them.
  • How to Use It? You can browse or load data directly into your notebook:
    # List datasets
    dbutils.fs.ls("/databricks-datasets/")

    # Load a CSV file into a DataFrame
    df = spark.read.csv("/databricks-datasets/airlines/part-00000", header=True)
    df.show()
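
If you would rather not hard-code a file name, the browse and load steps can be combined. The sketch below is one possible approach, not the only one: part_files is a hypothetical helper that picks files whose names start with "part-" from a listing, and inferSchema=True is an optional setting added here so Spark guesses column types instead of reading everything as strings. The Databricks-specific lines run only where dbutils and spark are predefined.

```python
def part_files(entries, limit=1):
    """Return the paths of the first `limit` files named part-*, in name order.

    `entries` is expected to look like the result of dbutils.fs.ls(...):
    each item has `.name` and `.path`. This helper is illustrative only.
    """
    parts = sorted(
        (e for e in entries if e.name.startswith("part-")),
        key=lambda e: e.name,
    )
    return [e.path for e in parts[:limit]]


# `dbutils` and `spark` exist only inside a Databricks notebook.
if "dbutils" in globals() and "spark" in globals():
    paths = part_files(dbutils.fs.ls("/databricks-datasets/airlines/"))
    if paths:
        # header=True uses the first row as column names;
        # inferSchema=True asks Spark to detect column types (an extra pass over the data).
        df = spark.read.csv(paths, header=True, inferSchema=True)
        df.printSchema()
        df.show(5)
```

Starting with a single file keeps the first read quick; raise limit once you know the layout of the dataset.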

2. /databricks-results: Temporary SQL Query Results

This folder holds files that Databricks generates when you download the full results of a query. Think of it like a scratchpad for query outputs.
