Web Scraping With Golang




Go is a statically typed programming language that is expressive, concise, clean, and efficient. Web scraping is a software technique for extracting information from websites: it transforms unstructured data on the web, typically HTML, into structured data that can be stored and analyzed in a local database or spreadsheet.


In this article we're going to look at how to mock HTTP connections while running the tests for your Go application.

We don't want our tests to call a real API or crawl an actual site. Instead, we want to verify that the application behaves correctly within parameters we define ourselves.

There's a great module called httpmock that helps with mocking HTTP responses in tests.

HTTP mocks for web scraping

Let's say we have a component in our application that will do some web scraping, so we might use something like goquery.


In the example below we'll use a simple function that visits a website and extracts the content of the <title> tag.

filename: scrape.go
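A minimal sketch of such a function, assuming only the standard library. It swaps goquery for a regular expression, which is good enough for a single <title> tag but is no substitute for a real HTML parser. The names GetTitle and ExtractTitle are illustrative, not taken from the original listing.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"regexp"
	"strings"
)

// titleRe captures the contents of the first <title> element.
// Assumption: a regexp stands in for goquery to keep this sketch stdlib-only.
var titleRe = regexp.MustCompile(`(?is)<title[^>]*>(.*?)</title>`)

// ExtractTitle returns the trimmed text of the <title> tag, or "" if none is found.
func ExtractTitle(html string) string {
	m := titleRe.FindStringSubmatch(html)
	if m == nil {
		return ""
	}
	return strings.TrimSpace(m[1])
}

// GetTitle fetches url with the default HTTP client and returns the page title.
func GetTitle(url string) (string, error) {
	res, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer res.Body.Close()

	if res.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", res.Status)
	}

	body, err := io.ReadAll(res.Body)
	if err != nil {
		return "", err
	}
	return ExtractTitle(string(body)), nil
}

func main() {
	// Parsing only; no network access is needed for this call.
	fmt.Println(ExtractTitle("<html><head><title> Example </title></head></html>"))
}
```

Because GetTitle goes through http.Get, it uses the default client and transport, which is what makes it possible to intercept its requests in tests.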

Now if we are to write a unit test for that, we can do that as follows:

filename: scrape_test.go

In the test we run the function and compare the title we expect with the title that was scraped by the function.

The problem with this test is that whenever we run go test, it will actually go to my website and read the title. This means three things:

  1. Our tests will be slower and more error prone than they could be
  2. I can never change my website title without changing the tests for this project
  3. Most important: We introduced a dependency outside our control for our program that doesn't have any relation to it

To fix this we commonly use mocks: a way of faking HTTP responses so that the exchange of information happens entirely on the computer where the tests run, without relying on an external web server or API backend being available.

HTTP mocks for API requests

In Go we can use httpmock to intercept HTTP requests and pin the responses in our tests. This way we can verify that our program works correctly without actually sending requests over the network.

To install httpmock we can add a go.mod file:

and then run go mod download.
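A go.mod for this could look roughly like the sketch below. The module path example.com/scraper, the Go version, and the pinned httpmock version are assumptions for illustration; running go get github.com/jarcoal/httpmock inside an existing module will add whichever version currently resolves.

```
module example.com/scraper

go 1.21

// Version shown is an assumption; use the one `go get` resolves for you.
require github.com/jarcoal/httpmock v1.3.0
```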

Rewriting our scrape_test.go would look like this:

after which we can run go test and it should produce the following output:

Let's go over the most important changes to the file:

  • myMockPage :=... sets up our example response, a piece of plain text that our function will parse as HTML to look for the title
  • httpmock.Activate() activates the mocking; before this call, no requests are intercepted
  • httpmock.RegisterResponder() defines the method and the URL (for example GET or POST, plus an address) for which we fake an HTTP response
  • httpmock.NewStringResponder() takes a status code and a string to respond with instead of what actually lives at that URL
  • httpmock.DeactivateAndReset() turns mocking off again and clears the registered responders; it is typically deferred so that it runs when the test finishes
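To make the mechanism behind these steps concrete, here is a stdlib-only sketch of the same idea: swap http.DefaultTransport for a RoundTripper that answers from a table keyed by method and URL, then restore it afterwards. This is not httpmock's implementation; mockTransport, activate, and fetch are hypothetical names, and the comments only point out which step each part loosely corresponds to.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// mockTransport (hypothetical) answers requests from an in-memory table
// instead of touching the network.
type mockTransport struct {
	responders map[string]string // key: "METHOD URL", value: response body
}

// RoundTrip implements http.RoundTripper.
func (m *mockTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	key := req.Method + " " + req.URL.String()
	body, ok := m.responders[key]
	if !ok {
		return nil, fmt.Errorf("no responder registered for %q", key)
	}
	return &http.Response{
		StatusCode: http.StatusOK,
		Status:     "200 OK",
		Header:     make(http.Header),
		Body:       io.NopCloser(strings.NewReader(body)),
		Request:    req,
	}, nil
}

// activate installs the mock (loosely: Activate plus RegisterResponder) and
// returns a function that restores the real transport (loosely: DeactivateAndReset).
func activate(responders map[string]string) (deactivate func()) {
	original := http.DefaultTransport
	http.DefaultTransport = &mockTransport{responders: responders}
	return func() { http.DefaultTransport = original }
}

// fetch performs a GET with the default client and returns the body as a string.
func fetch(url string) (string, error) {
	res, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer res.Body.Close()
	body, err := io.ReadAll(res.Body)
	return string(body), err
}

func main() {
	myMockPage := `<html><head><title>Mock Title</title></head><body></body></html>`
	deactivate := activate(map[string]string{"GET https://example.com/page": myMockPage})
	defer deactivate()

	body, err := fetch("https://example.com/page")
	fmt.Println(body, err)
}
```

The swap works because http.Get uses the default client, which falls back to http.DefaultTransport at request time. Code that builds its own http.Client with a custom Transport would bypass a global swap like this one.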

If you instead want to mock an API response you can use something like this:


That's it! Our client consuming the string should take care of the JSON parsing.
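As a rough illustration of a client doing the JSON parsing on a mocked string, the sketch below uses only the standard library: a client-level RoundTripper returns a fixed JSON body, and the consuming function decodes it. The user type, the URL, and the helper names are hypothetical, and this stands in for a mocking library rather than showing httpmock itself.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// roundTripFunc lets a plain function act as an http.RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// user is a hypothetical payload shape for this example.
type user struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// newMockClient returns a client that answers every request with status and body.
func newMockClient(status int, body string) *http.Client {
	return &http.Client{
		Transport: roundTripFunc(func(r *http.Request) (*http.Response, error) {
			return &http.Response{
				StatusCode: status,
				Header:     http.Header{"Content-Type": []string{"application/json"}},
				Body:       io.NopCloser(strings.NewReader(body)),
				Request:    r,
			}, nil
		}),
	}
}

// fetchUser GETs url with the given client and decodes the JSON response.
func fetchUser(client *http.Client, url string) (user, error) {
	var u user
	res, err := client.Get(url)
	if err != nil {
		return u, err
	}
	defer res.Body.Close()
	if res.StatusCode != http.StatusOK {
		return u, fmt.Errorf("unexpected status %d", res.StatusCode)
	}
	err = json.NewDecoder(res.Body).Decode(&u)
	return u, err
}

func main() {
	client := newMockClient(http.StatusOK, `{"id": 1, "name": "gopher"}`)
	u, err := fetchUser(client, "https://api.example.com/users/1")
	fmt.Println(u, err)
}
```

Passing the client in as a parameter is a design choice: the mock stays local to the code under test instead of replacing a process-wide default.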

If you're familiar with mocking HTTP connections in Node.js, you may have heard of the nock library, which is pretty popular in JavaScript projects; httpmock fills a similar role for Go.

Hope you enjoyed this little post about mocking in Go. Let me know what you're building in the comments!


Thank you for reading! If you have any comments, additions, or questions, please leave them in the form below. You can also tweet them at me.


If you want to read more like this, follow me on Feedly or another RSS reader.




