Hdfs.Contents

The M Code Behind the Power Query M Function Hdfs.Contents

What is Hadoop Distributed File System (HDFS)?

Hadoop Distributed File System (HDFS) is the distributed file system of the Apache Hadoop ecosystem. It is designed to store and manage large amounts of data across clusters of commodity hardware. Files are split into blocks that are spread over multiple nodes, which lets big data applications process them in parallel.

What is Power Query M?

Power Query M is the functional, case-sensitive formula language used by Power Query, similar in style to F#. It is used to transform and shape data from various data sources before loading it into a destination. A query is typically written as a let expression: a sequence of named steps followed by the value to return.
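As a minimal sketch of what M looks like, the expression below chains three named steps and returns the last one. The step names (Numbers, Doubled, Total) are made up for illustration.

```powerquery
let
    // Each step is a named expression; later steps refer to earlier ones by name.
    Numbers = {1, 2, 3, 4},
    Doubled = List.Transform(Numbers, each _ * 2),
    // Identifiers are case-sensitive: "Total" and "total" would be distinct names.
    Total = List.Sum(Doubled)
in
    Total
```

The expression evaluates to 20.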

What is Hdfs.Contents?

Hdfs.Contents is a Power Query M function used to connect to HDFS and list the contents of a folder. The function takes a single text argument, the URL of the HDFS folder, and returns a table with one row for each file and folder found there. The table has the following columns:

– Name: The name of the file or folder.

– Folder Path: The fully qualified HDFS path to the folder containing the file.

– Extension: The file extension (if applicable).

– IsFolder: A Boolean value indicating whether the item is a folder or a file.

– Date Accessed: The date the item was last accessed.

– Date Modified: The date the item was last modified.

– Date Created: The date the item was created.
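To show how a table of this shape might be used, here is a small sketch that filters and sorts a listing. It runs against an in-memory stand-in table, not a real cluster: the sample rows and paths are invented, and the column names are taken from the list above, so they are an assumption about what an actual Hdfs.Contents result contains.

```powerquery
let
    // Hypothetical stand-in for the table described above; a real query would
    // obtain it from Hdfs.Contents with the URL of an HDFS folder.
    Listing = #table(
        {"Name", "Folder Path", "IsFolder", "Date Modified"},
        {
            {"2023", "/data/logs/", true, #datetime(2023, 1, 5, 8, 0, 0)},
            {"app.log", "/data/logs/", false, #datetime(2023, 3, 1, 12, 30, 0)},
            {"old.log", "/data/logs/", false, #datetime(2022, 11, 20, 9, 15, 0)}
        }
    ),
    // Keep files only, then order them newest first.
    FilesOnly = Table.SelectRows(Listing, each [IsFolder] = false),
    NewestFirst = Table.Sort(FilesOnly, {{"Date Modified", Order.Descending}}),
    Names = NewestFirst[Name]
in
    Names
```

With the sample rows, the result is the list {"app.log", "old.log"}.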

The M Code Behind Hdfs.Contents

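Hdfs.Contents is a built-in function, and Microsoft does not publish its internal source. The listing below should therefore be read as an illustrative sketch of how a function of this kind could be composed in M, with several functions working together to connect to HDFS and retrieve the files and folders in a specified directory. It is not the actual implementation. Here is the sketch, followed by a breakdown: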


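    let
        // Function that takes the HDFS directory to list.
        Source = (directoryPath as text) =>
            let
                // Host and port are left out in this listing.
                HadoopUri = "hdfs://:",
                HadoopUserName = "",
                HadoopPassword = "",
                HadoopDirectoryPath = directoryPath,
                // Hadoop.Data acts as a placeholder connector call here; it is
                // not listed in the published M function reference.
                Source = Hadoop.Data(
                    HadoopUri,
                    [
                        UserName = HadoopUserName,
                        Password = HadoopPassword,
                        FolderPath = HadoopDirectoryPath
                    ]
                ),
                Files = Table.SelectRows(Source, each [IsFolder] = false),
                Folders = Table.SelectRows(Source, each [IsFolder] = true),
                // Use the text after the last dot, so "a.tar.gz" yields "gz".
                FilesWithExtension = Table.AddColumn(Files, "Extension", each if [IsFolder] = false then Text.AfterDelimiter([Name], ".", {0, RelativePosition.FromEnd}) else null),
                FinalResult = Table.Combine({Folders, FilesWithExtension})
            in
                FinalResult
    in
        Source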


The M code above defines a function, bound to the name Source, that takes a single argument, directoryPath: the HDFS directory to retrieve files and folders from. The function starts by defining the HadoopUri, HadoopUserName, and HadoopPassword variables, which hold the connection details. The listing leaves the host and port out of HadoopUri and leaves the user name and password empty.

The function then calls Hadoop.Data with HadoopUri and an options record containing UserName, Password, and FolderPath. The FolderPath field is set to the directoryPath argument passed to the function.

The function then calls Table.SelectRows twice to split the source table in two. The Files variable holds the rows where IsFolder is false, and the Folders variable holds the rows where IsFolder is true.
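The split can be tried without an HDFS connection. The sketch below applies the same two filters to a small in-memory table; the sample rows are invented for illustration.

```powerquery
let
    // Hypothetical sample listing with only the columns the filters need.
    Listing = #table(
        {"Name", "IsFolder"},
        {{"raw", true}, {"events.csv", false}, {"archive", true}, {"notes.txt", false}}
    ),
    Files = Table.SelectRows(Listing, each [IsFolder] = false),
    Folders = Table.SelectRows(Listing, each [IsFolder] = true),
    Result = [
        FileNames = Files[Name],
        FolderNames = Folders[Name],
        // With no null IsFolder values, every row lands in exactly one table.
        CountsMatch = Table.RowCount(Files) + Table.RowCount(Folders) = Table.RowCount(Listing)
    ]
in
    Result
```

Note that a row whose IsFolder value is null would match neither filter, so it would be dropped from both tables.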

The function then uses Table.AddColumn to add a column named Extension to the Files table, calling Text.AfterDelimiter to extract the file extension from the Name column. Because Files already contains only rows where IsFolder is false, the else branch that returns null is never taken at this step.
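One detail of Text.AfterDelimiter is worth seeing in isolation: with two arguments it returns the text after the first occurrence of the delimiter, while an optional third argument can count occurrences from the end instead. The sketch below compares the two; GetExtension is a hypothetical helper name, and returning null for names without a dot is a design choice made here, not something the listing specifies.

```powerquery
let
    // Hypothetical helper: returns the text after the last dot, or null when
    // the name contains no dot at all.
    GetExtension = (name as text) as nullable text =>
        if Text.Contains(name, ".")
        then Text.AfterDelimiter(name, ".", {0, RelativePosition.FromEnd})
        else null,
    Examples = [
        FirstDot = Text.AfterDelimiter("archive.tar.gz", "."),   // "tar.gz"
        LastDot = GetExtension("archive.tar.gz"),                // "gz"
        Simple = GetExtension("report.csv"),                     // "csv"
        NoDot = GetExtension("README")                           // null
    ]
in
    Examples
```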

Finally, the function uses Table.Combine to append the FilesWithExtension table to the Folders table, producing a single table named FinalResult, which the function returns. The Folders table has no Extension column of its own, so folder rows hold null in that column after the combine.
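The way Table.Combine handles tables with different columns can be checked with two one-row tables. This is a sketch with invented sample rows; it relies on Table.Combine matching columns by name and filling missing ones with null.

```powerquery
let
    Folders = #table({"Name", "IsFolder"}, {{"raw", true}}),
    FilesWithExtension = #table(
        {"Name", "IsFolder", "Extension"},
        {{"events.csv", false, "csv"}}
    ),
    // Columns are matched by name; Folders has no Extension column,
    // so its rows receive null there.
    Combined = Table.Combine({Folders, FilesWithExtension}),
    // Row 0 is the folder row, because Folders comes first in the list.
    FolderExtension = Combined{0}[Extension]
in
    FolderExtension
```

The expression evaluates to null, the Extension value of the folder row.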

Hdfs.Contents lets Power Query users connect to HDFS and retrieve a table of the files and folders in a specified directory. Understanding how a function of this kind can be composed in M (connecting to the source, filtering rows, adding a column, and combining tables) helps Power Query users customize and optimize their data preparation and analysis workflows.

Power Query and M Training Courses by G Com Solutions (0800 998 9248)
