Web.BrowserContents

The M Code Behind the Power Query M function Web.BrowserContents

What is the Web.BrowserContents function?

The Web.BrowserContents function is a built-in M function in Power Query that allows users to retrieve the HTML contents of a webpage. It is one of several functions available in Power Query for web scraping, which is the process of extracting data from websites. The function takes a URL as input and returns the HTML contents of the webpage as a text string.

How does the Web.BrowserContents function work?

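The Web.BrowserContents function loads the specified URL in an embedded browser control and returns the HTML of the resulting page, that is, the page "as viewed by a web browser", in the words of Microsoft's documentation. Because the page is actually loaded in a browser rather than simply downloaded, the HTML it returns can include content that scripts add to the page, which is the main difference from Web.Contents, which returns the raw response. The function also accepts an optional options record with two documented fields: ApiKeyName, which gives the name (not the value) of the URL parameter that carries an API key, with the key itself supplied through the stored credential; and WaitFor, which specifies a condition to wait for before the HTML is captured, given as a record with a Timeout field, a Selector field, or both.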
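One way to see what the function does is to compare its result with the raw response for the same address. The query below is a minimal sketch rather than a recommended pattern: it fetches the page once with Web.Contents and once with Web.BrowserContents and reports the length of each result. The step names (RawHtml, RenderedHtml, Result) are made up for illustration.

```powerquery
let
    Url = "https://en.wikipedia.org/wiki/Power_Query",
    // Raw HTTP response body, decoded to text; nothing is rendered
    RawHtml = Text.FromBinary(Web.Contents(Url)),
    // HTML of the page after it has been loaded in the embedded browser
    RenderedHtml = Web.BrowserContents(Url),
    // Return both lengths in a record so they can be compared side by side
    Result = [
        RawLength = Text.Length(RawHtml),
        RenderedLength = Text.Length(RenderedHtml)
    ]
in
    Result
```

On a largely static page the two lengths may be close; on a page that builds its content with script, the rendered HTML is usually the one that contains the data.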

Using the Web.BrowserContents function

To use the Web.BrowserContents function in Power Query, create a new blank query and, in the Advanced Editor, pass the URL of the webpage you want to retrieve data from to the function. The following example shows how to use the function to retrieve the title and paragraph content of a Wikipedia page:


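let
    // Load the page in the embedded browser and return its HTML as text
    Source = Web.BrowserContents("https://en.wikipedia.org/wiki/Power_Query"),
    // Each inner list is {column name, CSS selector};
    // the optional third argument of Html.Table is an options record, so it is omitted here
    #"Converted to Table" = Html.Table(Source, {{"Title", "title"}, {"Content", "p"}})
in
    #"Converted to Table"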


In this example, we first create a new query and use the Web.BrowserContents function to retrieve the HTML contents of the Wikipedia page for Power Query. We then use the Html.Table function to parse the contents into a table format, with columns for the page title and content.
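Html.Table also accepts an options record as its third argument, and its RowSelector field is the usual way to get one row per repeated element instead of a single summary row. The sketch below uses a small inline HTML string (made up for illustration, so the query does not depend on a live page) and a function as the third element of a column definition to read an attribute instead of the element text.

```powerquery
let
    // Inline sample HTML, invented for this example
    SampleHtml = "<ul><li><a href=""/a"">First</a></li><li><a href=""/b"">Second</a></li></ul>",
    Links = Html.Table(
        SampleHtml,
        {
            // Column name, CSS selector
            {"Text", "a"},
            // Column name, CSS selector, function applied to the matched element
            {"Link", "a", each [Attributes][href]}
        },
        // One table row per list item
        [RowSelector = "li"]
    )
in
    Links
```

The same pattern should apply to the output of Web.BrowserContents: replace SampleHtml with the Source step and choose selectors that match the structure of the target page.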

Limitations of the Web.BrowserContents function

While the Web.BrowserContents function is a useful tool for web scraping, it has a number of limitations. One of the main limitations is that it has to load each page in a browser engine, which is considerably more work than the plain HTTP request made by Web.Contents. This can result in slow performance, particularly when a query calls the function for many URLs.

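Another limitation concerns dynamic content. Many websites use JavaScript extensively to generate content after the initial page load, and that content may not yet be present when the HTML is captured. The optional WaitFor setting helps by delaying the capture until a given CSS selector exists on the page or a given amount of time has passed, but the function may still not retrieve all the data on a script-heavy page.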
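Where the data of interest is added to the page by script, the documented WaitFor option can be used to delay the capture. The following is a sketch under stated assumptions: the URL and the table.results selector are hypothetical, and the 15-second timeout is an arbitrary choice. According to Microsoft's documentation, an error is raised if the timeout elapses before the selector appears, so the call is wrapped in try.

```powerquery
let
    // Hypothetical URL and CSS selector, used only for illustration
    Url = "https://example.com/dashboard",
    // Wait until the selector exists on the page, for at most 15 seconds
    Attempt = try Web.BrowserContents(
        Url,
        [WaitFor = [Selector = "table.results", Timeout = #duration(0, 0, 0, 15)]]
    ),
    // Fall back to null instead of failing the whole query
    Html = if Attempt[HasError] then null else Attempt[Value]
in
    Html
```

Returning null keeps a refresh from failing outright; whether that is preferable to surfacing the error depends on how the downstream steps treat missing data.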

The Web.BrowserContents function is a powerful tool for web scraping in Power Query. It allows users to retrieve the HTML contents of a webpage as a browser would see it and, together with Html.Table, parse it into a table format. While the function has some limitations, such as the cost of loading each page in a browser and the need to wait for dynamically generated content, it is still a valuable tool for extracting data from the web. By understanding the M code behind the function, users can take full advantage of its capabilities and create more effective data analysis and transformation workflows in Power Query.

Power Query and M Training Courses by G Com Solutions (0800 998 9248)
