Web scraping is a tech industry term for automated methods of gathering data. Simply put, web scraping is a form of data scraping that collects information from websites, applications, and web servers. Say you’re a software designer or a marketing analyst and you want to know more about what kind of data passes through Twitter on a daily basis. Dozens of software tools, known as scraping engines, can automatically crawl Twitter’s website, gather the data they find, and organize it into a structure that is easily transferred to a spreadsheet or a database.
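To make the "gather and organize" steps concrete, here is a minimal sketch using only Python's standard library. It assumes the page has already been downloaded (a real scraper would fetch it over HTTP first) and works on a small, hypothetical snippet of markup; the `post`, `author`, and `text` class names are invented for illustration and do not reflect any real site's structure. It pulls out the matching fields and writes them as CSV, a format any spreadsheet can open.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup standing in for a downloaded page; not any real site's HTML.
SAMPLE_HTML = """
<div class="post"><span class="author">alice</span><p class="text">Hello world</p></div>
<div class="post"><span class="author">bob</span><p class="text">Scraping 101</p></div>
"""


class PostParser(HTMLParser):
    """Collects (author, text) pairs from elements with the assumed class names."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished (author, text) records
        self._field = None   # field currently being read, if any
        self._current = {}   # partially assembled record

    def handle_starttag(self, tag, attrs):
        # Start capturing text when an element carries one of the assumed classes.
        cls = dict(attrs).get("class")
        if cls in ("author", "text"):
            self._field = cls

    def handle_data(self, data):
        # Accumulate text only while inside a field of interest.
        if self._field is not None:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        # Stop capturing at the field's closing tag; emit a row once both fields exist.
        if self._field is not None and tag in ("span", "p"):
            self._field = None
            if "author" in self._current and "text" in self._current:
                self.rows.append((self._current["author"], self._current["text"]))
                self._current = {}


def scrape_to_csv(html: str) -> str:
    """Parse the HTML and return the extracted records as CSV text."""
    parser = PostParser()
    parser.feed(html)
    parser.close()
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["author", "text"])
    writer.writerows(parser.rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(scrape_to_csv(SAMPLE_HTML))
```

Running the script prints a header row followed by one row per post. Real scraping engines add page fetching, link following, and rate limiting on top of this extract-and-structure core.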
Even though the information scrapers gather is technically publicly available, it’s generally not data that was intended for bulk dissemination by the website being scraped.
The legality of web scraping is debatable: whether it’s legal or illegal is, for now, something of a gray area. A host of potential legal problems can arise from web scraping, including criminal liability, regulatory infractions, intellectual property infringement, and more. In the largely unregulated “Wild West” of the Web, web scraping is just one more piece of tech that raises legal questions with no settled answers.
Most major websites and apps have provisions in their terms of service that explicitly prohibit web scraping or any other automated data gathering. Particularly if you’re planning to profit from the data you gather, you have to consider how the company behind that data is likely to respond. Your new idea for an app that parses Twitter or Facebook data might sound great in theory, but the risk of being sued by those companies may be just as real. While the outcome of such a lawsuit is by no means guaranteed (you might emerge victorious), few things are as devastating to a nascent company as a grueling, expensive legal battle against an industry giant.
The moral of this story is simple: seek out the counsel of an experienced legal adviser before you rely on web scraping to power your new app, website, or startup idea.