Hdfs.Files


The M Code Behind the Power Query M function Hdfs.Files

What is Power Query?

Power Query is a data connectivity and transformation tool that lets users connect to various data sources, transform the data, and load it into Excel or Power BI. Its user-friendly interface makes it possible to perform complex data transformations without writing code. Power Query supports a wide range of data sources, including databases, Excel files, text files, and web services.

What is Hdfs.Files?

Hdfs.Files is a Power Query M function that allows users to connect to Hadoop Distributed File System (HDFS) and retrieve a list of files from a directory in HDFS. HDFS is a distributed file system that is commonly used in big data environments. The Hdfs.Files function makes it easy to connect to HDFS and retrieve files for further processing.

The M Code Behind Hdfs.Files

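Microsoft does not publish the source of the built-in Hdfs.Files function, whose documented signature is Hdfs.Files(url as text) as table: it takes a single parameter, the URL of an HDFS folder. Comparable behavior can be sketched in M, however, using the WebHDFS REST API. The illustrative function below takes two parameters: the base URL of the HDFS instance and the directory path to list. It uses the Web.Contents function to send a LISTSTATUS request to the HDFS instance. Because WebHDFS responds with JSON rather than XML, the response is parsed with the Json.Document function, and the resulting list of file records is converted into a table.

Here is the M code for this illustrative function:


let
    // Illustrative custom function, not the source of the built-in connector
    Source = (url as text, path as text) as table =>
        let
            // WebHDFS REST endpoint that lists the contents of a directory
            hdfsUrl = url & "/webhdfs/v1" & path & "?op=LISTSTATUS",
            response = Web.Contents(hdfsUrl),
            // The response is JSON of the form {"FileStatuses": {"FileStatus": [...]}}
            json = Json.Document(response),
            statuses = json[FileStatuses][FileStatus],
            // One row per file or directory, one column per JSON field
            fileTable = Table.FromRecords(statuses)
        in
            fileTable
in
    Source


Saved as a query under a name of your choice (ListHdfsFiles is used here purely as an example), the function can be called using the following syntax:


ListHdfsFiles("http://hdfs-instance:50070", "/path/to/directory")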
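
The parsing step can also be tried without a cluster. The following is a minimal sketch, assuming a response in the shape that the WebHDFS LISTSTATUS operation returns (a JSON object with a FileStatuses.FileStatus array). The sample entries and the step names are invented for illustration, and only three of the fields are included.

```powerquery
let
    // Hypothetical sample response; a real one comes from Web.Contents
    SampleJson =
        "{""FileStatuses"":{""FileStatus"":["
        & "{""pathSuffix"":""a.csv"",""type"":""FILE"",""length"":24930},"
        & "{""pathSuffix"":""logs"",""type"":""DIRECTORY"",""length"":0}"
        & "]}}",
    // Json.Document accepts text as well as binary input
    Parsed = Json.Document(SampleJson),
    // Drill into the nested records to reach the list of entries
    Statuses = Parsed[FileStatuses][FileStatus],
    // Each record becomes a row; each JSON field becomes a column
    Listing = Table.FromRecords(Statuses),
    // Give the columns friendlier names
    Renamed = Table.RenameColumns(
        Listing,
        {{"pathSuffix", "Name"}, {"type", "Type"}, {"length", "Size"}}
    )
in
    Renamed
```

The result is a two-row table with the columns Name, Type, and Size. Pasting the query into the Advanced Editor is enough to inspect each step.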


How to Use Hdfs.Files

Using Hdfs.Files is straightforward. Provide the URL of the HDFS folder, and the function returns a table with one row for each file found in that folder and its subfolders. Each row contains the properties of the file and a link to its content.

For each entry, a WebHDFS directory listing reports the following properties (the JSON field name is shown in parentheses):

- Name (pathSuffix): The name of the file or directory

- Type (type): The kind of entry, such as FILE or DIRECTORY

- Size (length): The size of the file in bytes

- Owner (owner): The user that owns the file

- Group (group): The group that owns the file

- Permission (permission): The permissions of the file, as an octal string such as 644

- ModificationTime (modificationTime): The time the file was last modified, in milliseconds since the Unix epoch

- AccessTime (accessTime): The time the file was last accessed, in milliseconds since the Unix epoch

Once the file list is retrieved, it can be further processed using Power Query transformations to filter, sort, and format the data.
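
As a rough illustration of such post-processing, the sketch below filters, formats, and sorts a listing. It assumes a table with Name, Type, Size, and ModificationTime columns, with timestamps in milliseconds since the Unix epoch. The sample rows are invented so that the query runs on its own; in practice the first step would be the retrieved file list.

```powerquery
let
    // Hypothetical sample rows standing in for a real HDFS listing
    Listing = #table(
        type table [Name = text, Type = text, Size = number, ModificationTime = number],
        {
            {"sales_2023.csv", "FILE", 1048576, 1700000000000},
            {"archive", "DIRECTORY", 0, 1690000000000},
            {"sales_2024.csv", "FILE", 2097152, 1710000000000}
        }
    ),
    // Filter: keep files only, dropping subdirectories
    FilesOnly = Table.SelectRows(Listing, each [Type] = "FILE"),
    // Format: convert milliseconds since the Unix epoch to a datetime value
    WithModified = Table.AddColumn(
        FilesOnly,
        "Modified",
        each #datetime(1970, 1, 1, 0, 0, 0) + #duration(0, 0, 0, [ModificationTime] / 1000),
        type datetime
    ),
    // Sort: most recently modified first
    Sorted = Table.Sort(WithModified, {{"Modified", Order.Descending}})
in
    Sorted
```

With the sample rows above, the directory is dropped and sales_2024.csv is listed before sales_2023.csv.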

The Hdfs.Files function gives Power Query a direct route to HDFS: it returns a list of files that can then be filtered, sorted, and loaded like any other table. Whether you are a data analyst or a developer, understanding how such a listing is retrieved and shaped in M can help you to better manage and process big data.
