HdInsight.Files

The M Code Behind the Power Query M function HdInsight.Files

In this article, we’ll take a closer look at the M code behind the HdInsight.Files function and explore how it works.

Understanding Hadoop Distributed File System (HDFS)

Before we dive into the M code behind the HdInsight.Files function, let’s first discuss what Hadoop Distributed File System (HDFS) is and how it works.

HDFS is a distributed file system that is designed to store and manage large amounts of data across a network of machines. It is a core component of the Apache Hadoop framework, which is widely used for big data processing and analytics.

In HDFS, data is stored in blocks across a network of machines. Each block is replicated across multiple machines for fault tolerance, ensuring that data can be easily recovered in case of machine failure. HDFS also includes a NameNode, which tracks the location of each block of data and manages access to the file system.

How HdInsight.Files Works

Now that we have a basic understanding of HDFS, let’s explore how the HdInsight.Files function works.

The HdInsight.Files function is an M function in Power Query that lists the files kept in the Azure storage account behind an Azure HDInsight cluster. HDInsight is Microsoft's managed Hadoop service, and its clusters typically store their data in Azure Blob Storage, so the function reads a blob container rather than sending requests to an HDFS NameNode. To read from an HDFS endpoint directly, Power Query provides the separate Hdfs.Files and Hdfs.Contents functions.

When you use the HdInsight.Files function in Power Query, you’ll need to provide the following parameters:

- `account`: The name or URL of the Azure storage account.

- `containerName`: The name of the blob container to list.

- `options`: An optional record of additional settings. It can be omitted.

Once you’ve provided these parameters, the HdInsight.Files function returns a table in Power Query with one row for each blob file found in the container and its subfolders. Each row contains properties of the file, such as its name and modification time, together with a link to its content.
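A minimal sketch of such a call, assuming the documented signature `HdInsight.Files(account, containerName, optional options)`, might look like the query below. The account and container names are placeholders, and the column names are assumed to follow the usual Power Query file-listing shape (`Name`, `Extension`, `Date modified`, `Folder Path`). `MissingField.Ignore` keeps the step from failing if one of them is absent.

```powerquery
let
    // "myaccount" and "mycontainer" are placeholder names, not real resources
    Source = HdInsight.Files("myaccount", "mycontainer"),
    // Keep only descriptive columns; the column names are an assumption
    Metadata = Table.SelectColumns(
        Source,
        {"Name", "Extension", "Date modified", "Folder Path"},
        MissingField.Ignore
    )
in
    Metadata
```

Dropping the `Content` column at this stage keeps the preview light, because no file content has to be downloaded just to browse the listing.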

The M Code Behind HdInsight.Files

Now that we understand how the HdInsight.Files function works, let’s take a closer look at the M code behind it.

Here is an example of M code that uses the HdInsight.Files function:


let

// "myaccount" and "mycontainer" are placeholders for your own storage account and container

Source = HdInsight.Files("myaccount", "mycontainer"),

// Keep only the files whose path contains my/folder/

#"Filtered Rows" = Table.SelectRows(Source, each Text.Contains([Folder Path] & [Name], "my/folder/")),

// Keep the file name and its binary content

#"Selected Columns" = Table.SelectColumns(#"Filtered Rows", {"Name", "Content"}),

// Return a one-row table holding the first matching file

#"First File" = Table.FirstN(#"Selected Columns", 1)

in

#"First File"


This code calls the HdInsight.Files function, which is a built-in function in Power Query. In the example, it is given two arguments:

- `account`: The name or URL of the Azure storage account (here the placeholder `myaccount`).

- `containerName`: The name of the blob container (here the placeholder `mycontainer`).

The optional third argument, `options`, is omitted.

The code then uses the `Table.SelectRows` function to keep only the files under `my/folder/`. The test is applied to `[Folder Path]` and `[Name]` joined together, on the assumption that the folder part of a blob's path may be reported in either column. Next, `Table.SelectColumns` keeps just the `Name` and `Content` columns. Note that `Content` holds each file's binary data rather than a record, so it has to be parsed with a function such as `Csv.Document` or `Json.Document` instead of being expanded with `Table.ExpandRecordColumn`.

Finally, `Table.FirstN` returns a one-row table containing the name and content of the first matching file as the final output.
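To turn a file's content into rows and columns, the binary value in the `Content` column has to be parsed. The query below is a sketch under stated assumptions: the container holds at least one comma-delimited, UTF-8 encoded CSV file with a header row, the `Extension` column holds values such as `.csv`, and `myaccount` and `mycontainer` are placeholder names.

```powerquery
let
    // Placeholder account and container names
    Source = HdInsight.Files("myaccount", "mycontainer"),
    // Keep only CSV files; assumes [Extension] holds values such as ".csv"
    CsvFiles = Table.SelectRows(Source, each Text.Lower([Extension]) = ".csv"),
    // Binary content of the first match (raises an error if no CSV file is found)
    FirstContent = CsvFiles{0}[Content],
    // Parse the binary as comma-delimited UTF-8 text (code page 65001)
    Parsed = Csv.Document(FirstContent, [Delimiter = ",", Encoding = 65001]),
    // Use the first row as column headers
    WithHeaders = Table.PromoteHeaders(Parsed, [PromoteAllScalars = true])
in
    WithHeaders
```

For other formats, the same pattern applies with a different parser, for example `Json.Document` for JSON files.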

The HdInsight.Files function in Power Query is a useful tool for listing and retrieving the files held in the Azure storage behind an HDInsight cluster. By understanding the M code around the function, you can gain a deeper understanding of how it works and how to use it effectively in your own data analysis and transformation workflows.
