The Legality of Web Scraping

Web scraping is the automated collection of data from websites by a program or bot. It’s an extremely efficient way of gathering large amounts of data from one or more online sources. Typically, the scraper first downloads (fetches) the target pages and then extracts the data it needs from them. But is all of this legal?

It turns out that the legality of web scraping is not a simple topic, and there are conflicting views on it. In this article, we’ll go over one of the most prominent cases where this issue was tackled.

Furthermore, we’ll also showcase the positive and negative aspects of web scraping. This way you’ll be able to take a more informed stance on the question of its legality. Also, you may be surprised to find out how the legality of web scraping could be relevant to you (especially if you’re a website owner)!
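
To make the fetch-then-extract process described above more concrete, here is a minimal sketch in Python that uses only the standard library. The names `fetch`, `LinkExtractor`, `extract_links`, and the `example-scraper/0.1` user agent are made up for illustration, and the sample HTML is invented. Real scrapers vary widely, so treat this as one possible shape rather than how any particular tool works.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch(url: str, timeout: float = 10.0) -> str:
    """Step 1 (fetch): download the raw HTML of a page."""
    # The User-Agent value is a placeholder chosen for this example.
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class LinkExtractor(HTMLParser):
    """Step 2 (extract): collect the href of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html: str) -> list[str]:
    """Run the extractor over an HTML string and return the links found."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Extraction demonstrated on an invented snippet, so no network access is needed.
SAMPLE_HTML = '<p><a href="/pricing">Pricing</a> and <a href="/about">About</a></p>'
print(extract_links(SAMPLE_HTML))
```

Against a live page, the two steps would be chained as `extract_links(fetch(url))`. The legal questions discussed in this article arise at both steps: whether the page may be fetched at all, and what is done with the data once it has been extracted.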

The Legality of Web Scraping – For & Against

Let’s go over the arguments on both sides of the story. They are closely tied to the benefits and risks of web scraping: those who benefit from it mostly argue that it should be legal, while those exposed to its risks tend to argue the opposite.

In Favor of Legal Web Scraping

Public Availability of Data

Most of the data gathered through web scraping is publicly available: any human visitor can open the page and read it. The argument is that if data is public, anyone (or anything) may collect it, and whether it is collected by hand or by a program is of secondary importance.

Saving Resources

Web scraping automates the whole data gathering process. This means fewer resources go into manual work and more are left for other important tasks. This fact alone does not make a strong case for web scraping being legal. However, combined with the other arguments, it makes outlawing web scraping difficult to advocate for.

Efficiency

No single human can come close to the speed at which professional web scrapers gather data. Even large teams of people struggle to match the number of tasks a single web scraper performs. A good web scraper is essentially an extremely valuable employee in your company. And an inexpensive one, at that.

If on top of that you add the analytical capabilities of certain web scrapers, it’s very easy to see the appeal of such a tool.

Valuable Market Data in the Palm of Your Hand

Any type of business (small, medium, or large) needs to keep an eye on the competition. Web scrapers make this process easy and accessible. Without a web scraper, you would probably set aside some of your own time to check your competitors’ prices manually. There is no need to reject the benefits of new technology, though. Instead of spending time tracking your competitors, you can use that time to work on other parts of your business.

Case Against Web Scraping Being Legal

Data Theft

Some people argue that when information they put on the internet is collected by a non-human entity, it amounts to data theft. When the information was posted publicly, this argument is hard to sustain. Still, it is a reminder to be wary of what information you put on the web (and where you put it). Knowing how well protected the websites you visit are can be very useful.

Collecting Protected Information

When a piece of information is hidden behind a login form or a similar obstacle, it’s generally safe to assume that whoever put it there does not want to share it publicly. Many people have sensitive information stored in at least one of their online accounts. This is where the case against the legality of web scraping is strongest.

Identity Theft

Related to the previous point, if a web scraper gathers enough information about a person, there is a risk of that information being misused. One extreme example is identity theft. This is why it’s important to know how well protected against web scraping bots the websites you visit are. Also, if you’re a website owner, it’s very important to invest in a good bot protection solution.

The Legality of Web Scraping – LinkedIn vs. HiQ

In 2017, LinkedIn, the largest professional social network, found itself at the center of a web scraping dispute. More precisely, the dispute was about the data of its users: HiQ Labs, a data analytics company, had been scraping information from public LinkedIn profiles.

LinkedIn first sent HiQ Labs a cease-and-desist letter demanding that the scraping stop. Rather than comply, HiQ Labs took the matter to court and asked it to rule that the scraping was lawful.

Among other laws, LinkedIn invoked two main legal acts: the Digital Millennium Copyright Act (DMCA) and the Computer Fraud and Abuse Act (CFAA).

However, in 2019 the US Court of Appeals for the Ninth Circuit found that scraping publicly accessible data likely does not violate the CFAA, and it reaffirmed that position in 2022. This suggests that scraping public data is not, by itself, a federal computer crime in the US. It does not make every kind of web scraping legal, though: a website’s terms of use, copyright, and privacy law can still apply. What’s done afterward with the gathered data is also a matter for a different discussion.

Be Aware of Websites’ Bot Protection

Regardless of the answer to the question of the legality of web scraping, it’s crucial to know how well protected against bots your website is.

Also, even if you’re not a website owner, it’s important to know how well protected the websites you visit are.

A bot protection testing tool can help with this. Getting a clear score will either give you peace of mind or let you know that there’s something that needs to be fixed.