Beginner’s Guide to Databricks File System (DBFS): What It Is and How to Use It
If you’re new to Databricks, one of the first things you’ll encounter is DBFS, or Databricks File System. It’s the foundation of how files are stored and accessed in Databricks. In this guide, we’ll break it down in simple terms, so you can understand its purpose and how to use it effectively.
What Is DBFS?
Think of DBFS as a “virtual hard drive” that comes with your Databricks workspace. You can use it to store and access data, scripts, results, and files that users upload. It’s built on top of your cloud storage (like AWS S3, Azure Blob Storage, or Google Cloud Storage), but you don’t need to worry about the cloud-specific details: DBFS exposes that storage through ordinary file and folder paths.
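To make the "store and access" idea concrete, here is a minimal sketch that writes a small text file and reads the start of it back. It assumes the object you pass in behaves like dbutils.fs in a Databricks notebook (its put and head methods); the helper name save_and_preview and the path in the usage comment are made up for illustration.

```python
def save_and_preview(fs, path, text):
    """Write `text` to `path`, then read the beginning of the file back.

    `fs` is assumed to behave like `dbutils.fs` in a Databricks notebook.
    `save_and_preview` is a hypothetical helper, not part of Databricks.
    """
    fs.put(path, text, True)  # third argument True = overwrite an existing file
    return fs.head(path)      # returns the first bytes of the file as a string


# In a Databricks notebook, `dbutils` is already defined, so a call could look like:
#   save_and_preview(dbutils.fs, "dbfs:/tmp/dbfs_demo/hello.txt", "Hello, DBFS!")
# The path above is only an example; pick any location you are allowed to write to.
```

Passing the file system object in as a parameter keeps the helper easy to try outside Databricks with a stand-in object. In a notebook you could just as well call dbutils.fs.put and dbutils.fs.head directly.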
Exploring DBFS Using dbutils.fs.ls("/")
When you run the command dbutils.fs.ls("/") in a Databricks notebook, it lists the contents of the DBFS root. This is like opening the "My Computer" or "This PC" folder on your personal computer. Here’s what you’ll see: