Table.FuzzyGroup

D

T

The M Code Behind the Power Query M function Table.FuzzyGroup

In this article, we’ll take a closer look at the M code behind the Table.FuzzyGroup function, how it works, and how you can use it to improve your data analysis workflow.

Understanding Fuzzy Matching

Before we dive into the Table.FuzzyGroup function, it’s important to understand what fuzzy matching is and why it’s useful. Fuzzy matching is a technique used to identify strings that are similar to each other, even if they’re not an exact match.

For example, let’s say you have a column of customer names in your data, and some of the names are spelled slightly differently. Fuzzy matching can help you group those similar names together, even if they’re not identical.

Fuzzy matching works by comparing strings based on a set of rules or algorithms. There are several algorithms that can be used for fuzzy matching, including the Levenshtein distance algorithm, which calculates the number of edits (insertions, deletions, and substitutions) needed to transform one string into another.

Introducing the Table.FuzzyGroup Function

The Table.FuzzyGroup function in Power Query uses a variant of the Levenshtein distance algorithm called the Jaro-Winkler distance algorithm. This algorithm compares two strings and returns a score between 0 and 1, where 0 means the strings are completely different and 1 means the strings are an exact match.

The Table.FuzzyGroup function takes a table as input and returns a new table with an additional column that groups similar strings together based on a fuzzy matching algorithm. The function takes two arguments: the name of the column to group, and a threshold value that determines how similar two strings need to be in order to be grouped together.

Here’s an example of how to use the Table.FuzzyGroup function in Power Query:


let

Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i42Kc7ZxVc+zyrRUMfUz0/MLUItK1wbT1Y2tCQjI0N9UyNzJz1Sz9JLzs0u1CvMSy8pSizJLzs0u1CvMSy8pSizJLzs0u1CvMSy8pSizJLzs0u1CvMSy8pSizJLzs0u1CvMSy8pSizJLzs0u1CvMSy8pSizJLzs0u1CvMSy8pSizJLzs0u1CvMSy8pSizJLzs0u1CvMSy8pSizJLzs0u1CvMSy8pSizJLzs0u1CvMSy8pSizJLzs0u1CvMSy8pSizJLzs0u1CvMSy8pSizJLzs0u1CvMSy8pSizJLzs0u1CvMSy8pSizJLzs0u1CvMSy8pSizJLzs0u1CvMSy8pSizJLzs0u1CvMSy8pSizJLzs0u1CvMSy8pSizJLzs0u1CvMSy8pSizJLz0tKLElMDS4tKTEoLilMDAwMDAwMDQwNDBiMThmNWEwMDA0BQ==", BinaryEncoding.Base64)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),

#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),

#"Grouped Rows" = Table.FuzzyGroup(#"Changed Type", {"Column1"}, 0.8)

in

#"Grouped Rows"


In this example, we start by creating a table from some sample data. We then transform the data type of the column to text, since the Table.FuzzyGroup function requires a text column. Finally, we use the Table.FuzzyGroup function to group similar strings together, with a threshold value of 0.8.

Customizing the Table.FuzzyGroup Function

The Table.FuzzyGroup function is a powerful tool for grouping similar strings together, but it’s not perfect. Depending on your data, you may need to customize the function to get the best results.

One way to customize the Table.FuzzyGroup function is to adjust the threshold value. A higher threshold value will result in more strict grouping, while a lower threshold value will result in more lenient grouping. You can experiment with different threshold values to find the best one for your data.

Another way to customize the Table.FuzzyGroup function is to use a different fuzzy matching algorithm. The Jaro-Winkler distance algorithm used by Table.FuzzyGroup is relatively effective, but there are other algorithms that may work better for your data.

The Table.FuzzyGroup function in Power Query is a powerful tool for grouping similar strings together based on a fuzzy matching algorithm. By understanding the M code behind this function and customizing it to your needs, you can improve your data analysis workflow and get better insights from your data.

Power Query and M Training Courses by G Com Solutions (0800 998 9248)

Upcoming Courses

Contact Us

    Subject

    Your Name (required)

    Company/Organisation

    Email (required)

    Telephone

    Training Course(s)

    Your Message

    Upload Example Document(s) (Zip multiple files)