Impala
Power BI is a business intelligence tool for analyzing and visualizing data from a wide range of sources. One of those sources is Impala, a massively parallel processing (MPP) SQL query engine for data stored in Apache Hadoop. In this article, we will connect Power BI to Impala through an ODBC data source, first with the Get Data dialog and then with Power Query M language code.
Setting Up the Impala ODBC Driver
Before we can connect to Impala from Power BI, we need to install the Impala ODBC driver, which can be downloaded from the Cloudera website. Install the version that matches Power BI Desktop, which is typically the 64-bit version.
Once the driver is installed, we need to configure it to connect to our Impala cluster by creating a new ODBC data source in the ODBC Data Source Administrator. The 64-bit administrator can be opened by searching for "ODBC Data Sources (64-bit)" in the Start menu, or from Administrative Tools in the Control Panel.
In the ODBC Data Source Administrator, select the System DSN tab and click Add. Choose the Impala ODBC driver from the list of drivers and click Finish. The driver then prompts for the connection details of the Impala cluster, including the data source name (this article uses Impala DSN), host name, port number, and authentication method. After entering these details, test the connection to confirm that it works.
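As an alternative to a DSN, Odbc.DataSource also accepts the connection string as a record of ODBC key/value pairs. The sketch below is a minimal illustration under several assumptions: that the driver is registered as "Cloudera ODBC Driver for Impala" (check the Drivers tab of the ODBC Data Source Administrator for the exact name), that the driver accepts Host and Port keys (confirm them in the driver's installation guide), and that impala-host.example.com is a hypothetical host name to replace with your own. Port 21050 is Impala's default port for ODBC and JDBC clients.

```powerquery
let
    // DSN-less connection: ODBC key/value pairs passed as a record.
    // Driver name, host and port are assumptions; use your cluster's values.
    Source = Odbc.DataSource(
        [
            Driver = "Cloudera ODBC Driver for Impala",
            Host = "impala-host.example.com",
            Port = "21050"
        ],
        [HierarchicalNavigation = true]
    )
in
    Source
```

A DSN keeps connection details out of the report, while a DSN-less string avoids configuring each machine separately; which one fits depends on how the report is shared.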
Creating a Power BI Report
Once we have set up the Impala ODBC driver and tested the connection, we can start creating a Power BI report that connects to the Impala data source.
To do this, we open Power BI Desktop and select Get Data. We can then choose ODBC from the list of data sources (it appears under the Other category) and click Connect.
In the From ODBC dialog, we select the Impala data source that we created earlier from the Data source name (DSN) list and click OK. Power BI then prompts for credentials, such as a user name and password, which it uses to authenticate against the Impala cluster.
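Once we have authenticated, the Navigator window lists the tables available through the data source. We can select the tables we need and then choose Transform Data to open the Power Query Editor, where we can remove unneeded columns and filter the data down to the rows we are interested in.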
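The column and row selection described above corresponds to two M functions, Table.SelectColumns and Table.SelectRows. In this sketch, Sales is a hypothetical inline table standing in for a table loaded from Impala, so the query runs even without a cluster; its column names and values are made up for illustration.

```powerquery
let
    // Hypothetical stand-in for a table retrieved from Impala; in a real
    // query this step would be the navigation step the Navigator generates
    Sales = #table(
        type table [region = text, product = text, amount = number],
        {
            {"EMEA", "Widget", 120},
            {"APAC", "Gadget", 75},
            {"EMEA", "Gadget", 40}
        }
    ),
    // Keep only the columns the report needs
    SelectedColumns = Table.SelectColumns(Sales, {"region", "amount"}),
    // Keep only the rows of interest
    FilteredRows = Table.SelectRows(SelectedColumns, each [region] = "EMEA")
in
    FilteredRows
```

When the source is an ODBC table rather than an inline one, Power Query can often fold steps like these into the SQL it sends to Impala, so the filtering happens on the cluster; whether folding occurs depends on the driver and the steps used.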
Using Power Query M Language Code
Power Query is the data transformation and cleansing engine built into Power BI. It lets us filter, merge, and transform data from a variety of sources, and every query it builds is stored as M code that we can view and edit directly.
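As an illustration of merging and transforming in M, the sketch below joins two hypothetical inline tables, Orders and Customers, which stand in for two Impala tables. It uses Table.NestedJoin for a left outer join, Table.ExpandTableColumn to pull in the joined column, and Table.TransformColumns to reshape a value.

```powerquery
let
    // Hypothetical inline tables standing in for two Impala tables
    Orders = #table(
        type table [order_id = number, customer_id = number, amount = number],
        {{1, 10, 250}, {2, 11, 80}, {3, 10, 40}}
    ),
    Customers = #table(
        type table [customer_id = number, name = text],
        {{10, "contoso"}, {11, "fabrikam"}}
    ),
    // Merge: left outer join Orders to Customers on customer_id
    Merged = Table.NestedJoin(Orders, {"customer_id"}, Customers, {"customer_id"}, "Customer", JoinKind.LeftOuter),
    // Expand the nested Customer column to bring in the name field
    Expanded = Table.ExpandTableColumn(Merged, "Customer", {"name"}),
    // Transform: convert customer names to upper case
    Transformed = Table.TransformColumns(Expanded, {{"name", Text.Upper, type text}})
in
    Transformed
```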
To connect to Impala with M code, we open the Power Query Editor, create a new query (for example, Home > New Source > Blank Query), and choose Advanced Editor on the Home tab. This opens a window where we can enter the code that connects to the Impala data source.
The code to connect to the Impala data source using Power Query M language is as follows:
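let
    // Connect through the DSN created in the ODBC Data Source Administrator
    Source = Odbc.DataSource("dsn=Impala DSN", [HierarchicalNavigation=true]),
    // Navigate to a table: fill in your schema and table, or copy the step the Navigator generates (keys depend on driver and navigation mode)
    Impala = Source{[Schema="",Item=""]}[Data]
in
    Impala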
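This code uses the Odbc.DataSource function to connect to Impala through the ODBC data source name (DSN) that we created earlier. The HierarchicalNavigation option groups the tables by schema in the navigation table that Odbc.DataSource returns. The Impala step then selects a single table from that navigation table, and the expression after the in keyword returns it as the query result, ready to be filtered and transformed in further steps.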
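Instead of navigating to a table, a query can also send its own SQL to Impala through the same DSN with Odbc.Query. In this minimal sketch, the sales table and its region and amount columns are hypothetical names used only for illustration.

```powerquery
let
    // Run a native SQL statement through the "Impala DSN" data source;
    // "sales", "region" and "amount" are hypothetical names
    Result = Odbc.Query(
        "dsn=Impala DSN",
        "SELECT region, SUM(amount) AS total_amount FROM sales GROUP BY region"
    )
in
    Result
```

This pushes the aggregation to Impala and returns only the summarized rows. Power BI may ask for approval the first time a query runs a native database statement.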
Conclusion
In this article, we connected Power BI to Impala through an ODBC data source. We installed the Impala ODBC driver and configured a DSN for our Impala cluster, connected to that DSN from Power BI Desktop with the Get Data dialog, and wrote Power Query M code that uses Odbc.DataSource to retrieve Impala tables for a report. With these steps in place, Impala's query engine can serve as the data source for Power BI reports.