Table.Distinct

The M Code Behind the Power Query M function Table.Distinct

Table.Distinct is a Power Query M function that removes duplicate rows from a table. In this article, we will explore how the function behaves, how that behavior can be expressed in M code, and how it can be used to clean and manipulate your data.

Understanding the Table.Distinct Function

The Table.Distinct function is used to remove duplicate rows from a table. It takes a table as an input and returns a new table with only the unique rows.
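As a minimal illustration of that behavior, the query below builds a small three-row table in which the first and third rows are identical, then passes it to Table.Distinct with no second argument. The column names `a` and `b` and the step names `Source` and `Result` are placeholders chosen for this sketch.

```powerquery
let
    // Three rows; the first and third are identical in every column
    Source = Table.FromRecords({
        [a = "A", b = "a"],
        [a = "B", b = "b"],
        [a = "A", b = "a"]
    }),
    // With no second argument, all columns are compared
    Result = Table.Distinct(Source)
in
    Result
```

The result has two rows, [a = "A", b = "a"] and [a = "B", b = "b"].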

Let’s take a look at the syntax of the Table.Distinct function:


Table.Distinct(table as table, optional equationCriteria as any) as table


The function takes two arguments:

– `table`: This is the input table that you want to remove duplicates from.

– `equationCriteria`: This is an optional argument that controls how rows are compared. By default, Table.Distinct considers all columns in the input table when determining duplicates. To compare only a subset of columns, pass a column name or a list of column names, such as {"CustomerID", "OrderID"}. To change how the values in a column are compared, pair the column name with a comparer, such as {"Name", Comparer.OrdinalIgnoreCase}.
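The sketch below shows the forms of the second argument that are commonly used to restrict or adjust the comparison: a single column name, a list of column names, and a column name paired with a comparer. The sample table and the step names (`ByCity`, `ByNameAndCity`, `ByNameIgnoreCase`) are made up for illustration.

```powerquery
let
    Source = Table.FromRecords({
        [Name = "Ann", City = "Leeds"],
        [Name = "ann", City = "York"],
        [Name = "Bob", City = "York"]
    }),
    // A single column name: rows are duplicates when City matches
    ByCity = Table.Distinct(Source, "City"),
    // A list of column names: rows are duplicates when both columns match
    ByNameAndCity = Table.Distinct(Source, {"Name", "City"}),
    // A column name paired with a comparer: "Ann" and "ann" are treated as equal
    ByNameIgnoreCase = Table.Distinct(Source, {"Name", Comparer.OrdinalIgnoreCase})
in
    // Return all three results in one record so they can be inspected side by side
    [ByCity = ByCity, ByNameAndCity = ByNameAndCity, ByNameIgnoreCase = ByNameIgnoreCase]
```

Returning a record of tables is just a convenience for exploring the results in the query editor; in a real query you would normally return a single table.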

Now that we understand the syntax of the Table.Distinct function, let’s take a look at the M code behind it.

Understanding the M Code Behind Table.Distinct

Table.Distinct is built into the Power Query engine rather than written in M, so its actual implementation cannot be inspected. Its behavior is simple enough to approximate in M, though: buffer the input table with Table.Buffer, turn it into a list of records, and remove duplicates with List.Distinct.

Here is a simplified M sketch of that behavior. It handles only the cases where equationCriteria is null, a single column name, or a list of column names:


(table as table, optional equationCriteria as any) as table =>
let
    // Cache the input so the following steps work from one in-memory snapshot
    bufferTable = Table.Buffer(table),
    // Work out which columns take part in the comparison
    columnNames =
        if equationCriteria = null then Table.ColumnNames(bufferTable)
        else if equationCriteria is text then {equationCriteria}
        else equationCriteria,
    // List.Distinct works on lists, so convert the table to a list of records
    rows = Table.ToRecords(bufferTable),
    // The key selector reduces each row to the compared columns
    distinctRows = List.Distinct(rows, (row) => Record.SelectFields(row, columnNames)),
    // Rebuild a table with the same column names and types as the input
    result = Table.FromRecords(distinctRows, Value.Type(bufferTable))
in
    result


The code starts by creating a copy of the input table using the Table.Buffer function. This gives the remaining steps a stable in-memory snapshot of the rows to work with, which can also avoid evaluating the source more than once.

Next, the code works out which columns to compare. If equationCriteria is null, columnNames is set to all the column names in bufferTable. If it is a single column name, the name is wrapped in a list. Otherwise, equationCriteria is assumed to already be a list of column names.

The code then converts the table to a list of records with Table.ToRecords, because List.Distinct works on lists rather than tables.

List.Distinct then removes the duplicates. The key selector passed as its second argument reduces each row to the compared columns using Record.SelectFields, so two rows count as duplicates when those columns hold equal values.

Finally, the code creates a new table from the distinct rows using the Table.FromRecords function, passing the type of the buffered table so that the column names and types are preserved.
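One way to check a hand-written approximation like this against the built-in function is to run both on the same small table and compare the rows they return. The query below is a sketch of that check; the sample data and the step names `BuiltIn`, `Manual`, and `Same` are assumptions made for the example, and the manual step compares all columns only.

```powerquery
let
    Source = Table.FromRecords({
        [Name = "Ann", City = "Leeds"],
        [Name = "Ann", City = "Leeds"],
        [Name = "Bob", City = "York"]
    }),
    // The built-in function, comparing all columns
    BuiltIn = Table.Distinct(Source),
    // A manual equivalent: de-duplicate the list of row records, then rebuild the table
    Manual = Table.FromRecords(
        List.Distinct(Table.ToRecords(Source)),
        Value.Type(Source)
    ),
    // Compare the two results row by row, as lists of records
    Same = Table.ToRecords(BuiltIn) = Table.ToRecords(Manual)
in
    Same
```

If the two approaches agree on both the rows and their order, `Same` evaluates to true. A check like this only covers the inputs you try, so it is worth repeating with data that resembles your own.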

Using Table.Distinct to Clean and Manipulate Data

Now that we understand the M code behind the Table.Distinct function, let’s take a look at how it can be used to clean and manipulate data.

Suppose we have a table of customer orders with columns for CustomerID, OrderID, and OrderDate. We want to remove any duplicate orders from the table based on the CustomerID and OrderID columns.

We can use the Table.Distinct function to do this as follows:


let
    ordersTable = Table.FromRecords({
        [CustomerID = 1, OrderID = 1001, OrderDate = #date(2022, 1, 1)],
        [CustomerID = 1, OrderID = 1002, OrderDate = #date(2022, 1, 2)],
        [CustomerID = 2, OrderID = 1003, OrderDate = #date(2022, 1, 3)],
        [CustomerID = 2, OrderID = 1003, OrderDate = #date(2022, 1, 4)],
        [CustomerID = 3, OrderID = 1004, OrderDate = #date(2022, 1, 5)],
        [CustomerID = 4, OrderID = 1005, OrderDate = #date(2022, 1, 6)],
        [CustomerID = 4, OrderID = 1005, OrderDate = #date(2022, 1, 7)]
    }),
    // Rows are duplicates when both CustomerID and OrderID match; OrderDate is ignored
    distinctOrdersTable = Table.Distinct(ordersTable, {"CustomerID", "OrderID"})
in
    distinctOrdersTable


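The code creates a table of customer orders with seven rows. Two of them (for orders 1003 and 1005) repeat the CustomerID and OrderID of an earlier row, although with a different OrderDate. The code then uses the Table.Distinct function to remove these duplicates based on the CustomerID and OrderID columns only.

The resulting table contains five rows, one for each unique order. Because the duplicate rows differ in OrderDate, which of the two dates survives depends on which row is kept. In practice this is typically the first occurrence, but it is worth checking against your own data source before relying on it.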
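A quick way to confirm how many rows were dropped is to compare row counts before and after the call. The following sketch uses a shortened, made-up version of the orders table; `removedCount` is a name introduced here for illustration.

```powerquery
let
    ordersTable = Table.FromRecords({
        [CustomerID = 1, OrderID = 1001, OrderDate = #date(2022, 1, 1)],
        [CustomerID = 2, OrderID = 1003, OrderDate = #date(2022, 1, 3)],
        [CustomerID = 2, OrderID = 1003, OrderDate = #date(2022, 1, 4)]
    }),
    distinctOrders = Table.Distinct(ordersTable, {"CustomerID", "OrderID"}),
    // Number of rows dropped as duplicates; 1 for this sample
    removedCount = Table.RowCount(ordersTable) - Table.RowCount(distinctOrders)
in
    removedCount
```

A count that is larger than expected is often a sign that the comparison columns are too narrow for the data.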

In conclusion, the Table.Distinct function is a useful tool in Power Query for removing duplicate rows from your data. Understanding how it compares rows, and how the equationCriteria argument changes that comparison, helps you use it effectively when cleaning and manipulating your data.

Power Query and M Training Courses by G Com Solutions (0800 998 9248)
