Ask the Analyst: Nisos Anti-scraping Expert Scott Tessier
While web scraping often violates online platforms’ terms of service, it can be challenging for even the most sophisticated security teams to combat. In addition to exposing users’ sensitive personal data, online platforms have suffered server overloads and service disruptions due to unauthorized web scraping of their properties.
In this blog, we interview Nisos Intelligence Analyst and anti-scraping expert Scott Tessier about the rise in web scraping. Scott is a seasoned intelligence analyst who helps leading online platforms uncover and unmask threats. His work has given him unique insight into the marketplace for legitimate and underground web scraping services.
1. Who are the players in the scraping ecosystem? And how do their motivations differ?
There’s a large freelancer or sole proprietor population. Anyone with an account can go on sites like Fiverr, Freelancer, or Upwork and post a job to hire somebody to scrape a website. They can very often hire somebody for under $100, sometimes even significantly cheaper, and there are thousands of people doing this on these freelancing sites.
Large cohorts of academics also scrape, for very different reasons: generally to answer some sort of social science question or to train an AI or ML model, which requires a huge data set. They'll go out and scrape massive data sets to help train their models.
Marketing and business intelligence ecosystems are another group. There are vast amounts of insights to be gained from harvesting data on social media platforms, which hold insights into people’s likes, dislikes, brand resonance, advertising campaigns, and all sorts of things that can be gleaned from that information.
Finally, there are cyber criminals. They scrape data to harvest and sell personally identifiable information. And they will buy scraped data to advance social engineering campaigns or other illicit activities within the underground economy.
2. How big do you think scraping as a service is? Who do these services generally serve, and who is paying for them?
We’ve found it’s large. There are tens of thousands of people involved in the ecosystem who are actually doing the scraping. There are a lot of different growth forecasts for the industry, and the specifics widely differ. It’s an industry that’s generally forecasted for double-digit compound growth annually for the foreseeable future. So it’s a problem that’s really only going to get bigger.
In terms of who’s buying it, it’s a really diverse set of actors. The biggest consumer of this data is the marketing, business intelligence, lead generation, and traffic generation industry, because scraping gives them the ability to harvest hundreds of thousands or millions of emails, which is obviously very valuable for them. But we found people from all different walks of life going out and searching for this data, from cyber criminals to even small businesses like dance studios: really anybody who’s looking for any sort of insights that can help them grow their business.
3. How are cybercriminals using scraped data today? And do they typically buy or try to steal the data themselves?
Very often, we’ll see scraped data sets with tens of millions or even hundreds of millions of lines of user data for sale. For these actors, the motivation is a bit different. People posting the data sets are doing it essentially exclusively for profit. They know there’s value in the data, so they scrape it and sell it. In terms of the people buying the scraped data, we ultimately don’t have ground truth on why a cybercriminal might be buying a scraped data set, but we can certainly infer that the data sets are being used for social engineering campaigns, other fraud, and illicit activities. The data is obviously very valuable for running those campaigns at a large scale, using a large number of people’s data at once.
4. What can online platforms do to combat scraping from their platform?
If you’re trying to deter cyber criminals, their sophistication varies widely, and there are a lot of people out there who I think are kind of easily deterred. If you have pretty robust anti-bot mechanisms and browser fingerprinting and things of that nature, it is going to be a significant deterrent for many people, particularly more casual and recreational scrapers. We’ve also seen a lot of companies try and go the legal route, and certainly, this can be successful. But there have been some changes in legal precedent in the last year that I think are making this route a little bit more difficult.
Companies can also work with threat intelligence companies who can provide insights into who is scraping their data, how they are scraping it, and why they are doing it. Intelligence providers can engage with the threat actors and obtain insights into their infrastructure and methods. This data can help companies better hone their defenses or conduct targeted outreach to web hosts or third-party websites that are hosting or enabling their activities to try to mitigate the threat further upstream.
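As a rough illustration of the kind of anti-bot mechanism mentioned in this answer, here is a minimal Python sketch of a sliding-window request counter that flags clients sending unusually many requests. It is not a method described by Scott or Nisos; the class name `SlidingWindowDetector`, the `client_id` key (which could be an IP address or a browser fingerprint hash), and the thresholds are all assumptions chosen for the example. Real deployments typically combine several signals rather than relying on request rate alone.

```python
from collections import defaultdict, deque


class SlidingWindowDetector:
    """Hypothetical helper: flags clients that exceed a request budget
    within a sliding time window. Thresholds are illustrative only."""

    def __init__(self, max_requests=100, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps each client identifier to the timestamps of its recent requests.
        self._hits = defaultdict(deque)

    def record(self, client_id, timestamp):
        """Record one request and return True if the client is over budget."""
        hits = self._hits[client_id]
        hits.append(timestamp)
        # Discard timestamps that have fallen out of the window.
        while hits and timestamp - hits[0] >= self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


# Example: allow at most 3 requests per 10 seconds per client.
detector = SlidingWindowDetector(max_requests=3, window_seconds=10.0)
flags = [detector.record("client-a", t) for t in (0.0, 1.0, 2.0, 3.0)]
print(flags)
```

A flagged client would not necessarily be blocked outright; a platform might instead serve a challenge, slow responses, or log the client for later review.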
5. What’s next? How will AI potentially evolve scraping and the attacks that follow?
What AI is doing is allowing anybody without any real degree of technical sophistication or coding experience to go into their AI platform of choice and come up with pretty sophisticated scraping code. In a lot of instances the code is very good, and many models are good at troubleshooting code as well. So it’s probably not changing the degree of the most sophisticated threats, but what it’s really doing is broadening the pool of people and lowering the barriers to entry into the market.
About Nisos®
Nisos is the Managed Intelligence Company. We are a trusted digital investigations partner, specializing in unmasking threats to protect people, organizations, and their digital ecosystems in the commercial and public sectors. Our open source intelligence services help security, intelligence, legal, and trust and safety teams make critical decisions, impose real world consequences, and increase adversary costs. For more information, visit: https://www.nisos.com.