Table.FuzzyJoin

The M Code Behind the Power Query M function Table.FuzzyJoin

What is Fuzzy Matching?

Fuzzy matching is a technique used to compare two strings of text and determine the likelihood that they represent the same thing. It is often used in data cleansing and data integration to join data from different sources with varying degrees of accuracy. Fuzzy matching algorithms consider factors such as spelling, phonetics, and synonyms to determine the similarity between two strings.
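To make the idea of a similarity score concrete, here is a minimal sketch in M that scores two strings by the share of word tokens they have in common (a plain Jaccard similarity). The helper names `Tokens` and `Jaccard` are made up for illustration, and this simple measure is only an assumption-laden stand-in, not the exact scoring that Power Query applies internally.

```powerquery
let
    // Hypothetical helper: split text into distinct lowercase word tokens.
    Tokens = (t as text) as list =>
        List.Distinct(Text.Split(Text.Lower(t), " ")),

    // Hypothetical helper: shared tokens divided by all distinct tokens.
    Jaccard = (a as text, b as text) as number =>
        let
            ta = Tokens(a),
            tb = Tokens(b)
        in
            List.Count(List.Intersect({ta, tb})) / List.Count(List.Union({ta, tb})),

    // "dove" is shared; "dove" and "soap" are the distinct tokens overall.
    Score = Jaccard("Dove Soap", "Dove")  // 1 / 2 = 0.5
in
    Score
```

A score of 1 means the token sets are identical and 0 means they share nothing; a fuzzy join keeps the pairs whose score clears a chosen threshold.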

How does Table.FuzzyJoin Work?

Table.FuzzyJoin is an M function in Power Query that joins two tables by fuzzy matching. It returns a new table that combines rows from the two input tables whose key-column values are similar, rather than strictly equal.

Here is the syntax for the Table.FuzzyJoin function:


    Table.FuzzyJoin(
        table1 as table,
        key1 as any,
        table2 as table,
        key2 as any,
        optional joinKind as nullable number,
        optional joinOptions as nullable record
    ) as table


The function takes the following arguments:

– `table1` – The first table to join

– `key1` – The column or columns in `table1` to match

– `table2` – The second table to join

– `key2` – The column or columns in `table2` to match

– `joinKind` – The type of join to perform (optional; an inner join is performed by default)

– `joinOptions` – A record of options that control how the key values are compared (optional)

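Matching Algorithm and Join Kinds

Table.FuzzyJoin does not take an algorithm argument, so you cannot switch between measures such as Levenshtein distance, Jaro-Winkler distance, or Soundex. Microsoft's documentation describes Power Query's fuzzy matching as based on Jaccard similarity, which yields a similarity score between 0 and 1 for each pair of key values. The fifth argument, `joinKind`, controls which rows are returned instead, and the documented values include:

– `JoinKind.Inner` – Only rows that have a match in both tables (the default)

– `JoinKind.LeftOuter` – All rows from `table1`, with matching rows from `table2` where they exist

– `JoinKind.RightOuter` – All rows from `table2`, with matching rows from `table1` where they exist

– `JoinKind.FullOuter` – All rows from both tables

– `JoinKind.LeftAnti` – Rows from `table1` that have no match in `table2`

– `JoinKind.RightAnti` – Rows from `table2` that have no match in `table1`

How strict the matching is depends on the options record, not on a choice of algorithm.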

Options

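The `joinOptions` argument is a record that controls how the key columns are compared. The most commonly used field is `Threshold`, a number between 0.00 and 1.00 that sets the similarity score required for two values to match; a threshold of 1.00 allows only exact matches, and the default value is 0.80. Other documented fields include `IgnoreCase` and `IgnoreSpace` (both true by default), `NumberOfMatches` (the maximum number of matching rows returned for each input row), `SimilarityColumnName` (the name of an extra column that shows the similarity score), `TransformationTable` (a table with `From` and `To` columns that maps values to one another before matching), `Culture`, and `ConcurrentRequests`.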
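As a rough illustration of passing an options record, the sketch below joins two small tables with a stricter threshold and asks for the similarity score as an extra column. The table names, column names, and sample rows are hypothetical; the field names `Threshold`, `IgnoreCase`, and `SimilarityColumnName` are the ones listed in Microsoft's documentation for Table.FuzzyJoin.

```powerquery
let
    // Hypothetical sample data; column names are distinct across the two tables.
    Orders = #table(
        type table [OrderProduct = text, Quantity = number],
        {{"Pepsodent", 20}, {"Pepsodnt", 5}}
    ),
    Prices = #table(
        type table [CatalogProduct = text, Price = number],
        {{"pepsodent", 2.00}}
    ),
    Joined = Table.FuzzyJoin(
        Orders, {"OrderProduct"},
        Prices, {"CatalogProduct"},
        JoinKind.LeftOuter,                      // keep every order, matched or not
        [
            Threshold = 0.85,                    // stricter than the 0.80 default
            IgnoreCase = true,                   // compare without regard to casing
            SimilarityColumnName = "Similarity"  // add the score as an extra column
        ]
    )
in
    Joined
```

Inspecting the `Similarity` column on real data is a practical way to pick a threshold: sort by the score and look for the point where the matches stop being correct.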

Example

Let’s look at an example of how to use the Table.FuzzyJoin function. Suppose we have two tables, `Table1` and `Table2`, that we want to join based on the `ProductName` column.


Table1:

| ProductName       | Quantity |
|-------------------|----------|
| Parachute Coconut | 10       |
| Dove Soap         | 15       |
| Pepsodent         | 20       |

Table2:

| ProductName | Price |
|-------------|-------|
| Parachute   | $5.00 |
| Dove        | $3.50 |
| Pepsodent   | $2.00 |


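We can use the Table.FuzzyJoin function to join the two tables on their product names, using an inner join and a similarity threshold of 0.8. Because the result is a flat table, like that of Table.Join, the two inputs should not share column names, so the key column of `Table2` is renamed to `ProductName2` first.


    Table.FuzzyJoin(
        Table1,
        {"ProductName"},
        Table.RenameColumns(Table2, {{"ProductName", "ProductName2"}}),
        {"ProductName2"},
        JoinKind.Inner,
        [Threshold = 0.8]
    )


If every pair of names scores at or above the threshold, this returns the following table:


| ProductName       | Quantity | ProductName2 | Price |
|-------------------|----------|--------------|-------|
| Parachute Coconut | 10       | Parachute    | $5.00 |
| Dove Soap         | 15       | Dove         | $3.50 |
| Pepsodent         | 20       | Pepsodent    | $2.00 |


Here Table.FuzzyJoin matches the rows in `Table1` and `Table2` on their product names even though the strings are not identical. Partial names such as "Dove Soap" and "Dove" may score below 0.8, in which case an inner join drops those rows; lowering `Threshold` lets them match.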
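When names differ by a whole word, a mapping table can be more dependable than lowering the threshold. The sketch below rebuilds the example as a self-contained query and supplies a `TransformationTable`, which the documentation says should contain `From` and `To` columns. The `Mappings` table and the `CatalogName` column name are assumptions made for this illustration.

```powerquery
let
    Table1 = #table(
        type table [ProductName = text, Quantity = number],
        {{"Parachute Coconut", 10}, {"Dove Soap", 15}, {"Pepsodent", 20}}
    ),
    // The key column is named differently so the flat result has no name clash.
    Table2 = #table(
        type table [CatalogName = text, Price = number],
        {{"Parachute", 5.00}, {"Dove", 3.50}, {"Pepsodent", 2.00}}
    ),
    // Hypothetical mapping of long product names to their catalog names.
    Mappings = #table(
        type table [From = text, To = text],
        {{"Parachute Coconut", "Parachute"}, {"Dove Soap", "Dove"}}
    ),
    Joined = Table.FuzzyJoin(
        Table1, {"ProductName"},
        Table2, {"CatalogName"},
        JoinKind.LeftOuter,  // keep unmatched Table1 rows so gaps stay visible
        [TransformationTable = Mappings, SimilarityColumnName = "Similarity"]
    )
in
    Joined
```

Using a left outer join here is a deliberate choice: rows that still fail to match appear with null values on the `Table2` side, which makes missing mappings easy to spot.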

The Table.FuzzyJoin M function is a powerful tool for joining tables whose keys are similar but not identical. It lets you combine data from different sources even when names are spelled or formatted inconsistently. By understanding the M code behind the function, in particular the join kind and the options record, you can tune it to fit your data.
