Hdfs.Contents
Accessing Data
Returns a table of folders and files found at a folder URL from a Hadoop file system (HDFS).
Syntax
```powerquery-m
Hdfs.Contents(url as text) as table
```
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| url | text | Yes | The URL of the folder on the Hadoop Distributed File System (e.g., "hdfs://namenode:8020/data/"). |
Return Value
table — A table containing one row for each folder and file at the specified HDFS folder URL, with properties and a link to content.
Remarks
Hdfs.Contents connects to a Hadoop Distributed File System (HDFS) cluster and returns a table with one row per folder and file found directly inside the specified folder URL. Unlike Hdfs.Files, which recursively lists every file in the folder and all of its subfolders, Hdfs.Contents returns both files and folders at the immediate level only, similar to Folder.Contents for local file systems.
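As a rough illustration of the difference, the query below calls both functions on the same folder and returns the number of rows each produces. The host `namenode`, port 50070, and the `/data/` path are placeholders borrowed from the examples on this page; substitute your own cluster's address.

```powerquery-m
let
    // Placeholder WebHDFS folder URL (assumption); replace with your cluster's address
    FolderUrl = "http://namenode:50070/webhdfs/v1/data/",
    // Immediate children only: one row per file and per subfolder directly under /data/
    TopLevel = Hdfs.Contents(FolderUrl),
    // Recursive: one row per file under /data/ and its subfolders, with no folder rows
    AllFiles = Hdfs.Files(FolderUrl),
    // Field names differ from the step names to avoid a self-reference inside the record
    Counts = [TopLevelRows = Table.RowCount(TopLevel), RecursiveFileRows = Table.RowCount(AllFiles)]
in
    Counts
```

On a folder that contains subfolders, the two counts will generally differ, because Hdfs.Contents includes folder rows but stops at one level, while Hdfs.Files descends into every subfolder.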
Columns returned: Each row in the result includes:
- Name (text) -- the file or folder name.
- Folder Path (text) -- the path of the containing folder.
- Date modified (datetime) -- the last modification timestamp.
- Extension (text) -- the file extension, including the leading dot. Empty for folders.
- Attributes (record) -- file system metadata attributes.
- Content (binary) -- the binary content of the file. null for folders.
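A minimal sketch for inspecting a listing without reading any file content: it keeps only the descriptive columns and sorts the newest entries first. The folder URL is a placeholder, and the column names are assumed to match the list above exactly (Table.SelectColumns raises an error if a named column is missing).

```powerquery-m
let
    // Placeholder folder URL (assumption)
    Source = Hdfs.Contents("http://namenode:50070/webhdfs/v1/data/"),
    // Drop Content and Attributes so only metadata columns remain
    Listing = Table.SelectColumns(Source, {"Name", "Folder Path", "Extension", "Date modified"}),
    // Most recently modified entries first
    Sorted = Table.Sort(Listing, {{"Date modified", Order.Descending}})
in
    Sorted
```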
Authentication: Supports Anonymous access and Basic authentication (username/password). Configure credentials in the Power Query data source settings dialog using the HDFS cluster URL as the credential scope. Kerberos authentication is not natively supported in the UI but may be configured through gateway settings.
URL format: Provide the URL of the folder to browse. This typically takes either the WebHDFS form http://namenode:50070/webhdfs/v1/path/ or the native HDFS form hdfs://namenode:8020/path/. The exact format depends on your Hadoop cluster configuration and on whether WebHDFS or HttpFS is enabled. Note that 50070 is the default NameNode HTTP port in Hadoop 2.x; Hadoop 3.x defaults to 9870.
Query folding: Not supported. Power Query retrieves the complete folder listing from HDFS, and all filtering, sorting, and column selection are then performed locally.
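Because nothing folds, a filter like the one in this sketch is evaluated by Power Query against the full listing that HDFS returns; it does not reduce what is fetched from the NameNode. The URL and the cutoff date are arbitrary placeholders, and the sketch assumes `Date modified` holds `datetime` values as described above.

```powerquery-m
let
    // Placeholder folder URL (assumption)
    Source = Hdfs.Contents("http://namenode:50070/webhdfs/v1/data/"),
    // Arbitrary example cutoff (assumption)
    Cutoff = #datetime(2024, 1, 1, 0, 0, 0),
    // Applied locally after the listing is retrieved; not translated into a WebHDFS request
    Recent = Table.SelectRows(Source, each [Date modified] >= Cutoff)
in
    Recent
```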
Platform availability: Available in Power BI Desktop only. Not supported in Power BI Service, Excel, Dataflows, or Fabric notebooks. For scheduled refresh in Power BI Service, an on-premises data gateway with network access to the HDFS cluster is required.
Content column: The Content column is lazy-loaded. Files are not downloaded from HDFS until the Content binary is accessed. Filter to the specific files you need before expanding content to minimize data transfer.
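Putting that advice into practice, the sketch below narrows the listing to CSV files using metadata alone, then parses and appends only those files. It assumes every matching file is comma-delimited UTF-8 text with a header row and that all files share the same column names; the `reports/` URL is a placeholder taken from Example 3.

```powerquery-m
let
    // Placeholder folder URL (assumption)
    Source = Hdfs.Contents("http://namenode:50070/webhdfs/v1/data/reports/"),
    // Filter on metadata first; lower-casing also matches ".CSV"
    CsvFiles = Table.SelectRows(Source, each Text.Lower([Extension]) = ".csv"),
    // Parse each remaining file's Content (assumes comma-delimited UTF-8 with headers)
    Parsed = Table.AddColumn(
        CsvFiles,
        "Data",
        each Table.PromoteHeaders(
            Csv.Document([Content], [Delimiter = ",", Encoding = TextEncoding.Utf8]),
            [PromoteAllScalars = true]
        )
    ),
    // Append all parsed tables into one (assumes matching column names)
    Combined = Table.Combine(Parsed[Data])
in
    Combined
```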
Examples
Example 1: List files and folders at an HDFS path
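```powerquery-m
Hdfs.Contents("http://namenode:50070/webhdfs/v1/data/")
```
Example 2: Filter a folder listing to CSV files
```powerquery-m
let
    Source = Hdfs.Contents("http://namenode:50070/webhdfs/v1/data/"),
    // Lower-case the extension so files ending in ".CSV" are matched too
    CsvFiles = Table.SelectRows(Source, each Text.Lower([Extension]) = ".csv")
in
    CsvFiles
```
Example 3: Read a specific CSV file from HDFS
```powerquery-m
let
    Source = Hdfs.Contents("http://namenode:50070/webhdfs/v1/data/reports/"),
    // Look up the row whose Name is exactly "sales.csv" and take its binary content
    File = Source{[Name = "sales.csv"]}[Content],
    // Parse the binary as comma-delimited UTF-8 text
    Parsed = Csv.Document(File, [Delimiter = ",", Encoding = TextEncoding.Utf8]),
    // Use the first row as column headers
    Promoted = Table.PromoteHeaders(Parsed, [PromoteAllScalars = true])
in
    Promoted
```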