
I didn't use this company, but there are some legitimate purposes for scraping.

For example, at a startup a few years ago, one of the many technical things we needed to do was monitor marketplaces for suspected counterfeit and contract-violating gray market goods for ~100 brands. And we couldn't just ask for data feeds, because, well, the marketplaces make money off of all those sales. And the off-the-shelf third-party data solutions were useless, crap quality, worse than your average vibe-coding. So I made a bespoke crawler that gently and accurately tracked the data we needed, including global geofencing. So gently that I never got a whiff of disapproval or countermeasures (like throttling, 403s, or data poisoning). We were putting insignificant load on the marketplaces, for the purpose of helping to make the market better for both consumers and legitimate businesses. It was like a single "secret shopper" unobtrusively walking around some parts of a store. (And I also made an iOS app that did something different for actual secret shoppers in physical stores, for legitimate supply chain traceability for customers' brands.) Personally, I love the marketplaces, and hate the counterfeits, and this was my version of PG's advice that startups should be a little bit naughty.

Two of the problems with the current AI scrapers, which are destroying servers and inviting backlash:

1. The gold rush situation brings out many of the crappiest people in the world. And also many who aren't crappy might behave in a crappy manner. (The latter, maybe because they're just emulating what they see, or extrapolating from the ethical temperature of prior industry norms, like surveillance capitalism in everything.)

2. Many of these scrapers are shockingly bad at what they do, and grossly inefficient. Almost like they're just pounding the same unchanging resources to DoS the servers for competitors. Or to drive sites to a protection racket company that's set up so they can also monitor cleartext. Or (Occam's Razor) just plain bad at what they do, and the people who pay for the salaries and computer resources either don't know or don't care.
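To make "pounding the same unchanging resources" concrete, here is a minimal sketch of the two cheapest courtesies a crawler can offer: a per-host delay and HTTP conditional requests (`If-None-Match` with a stored `ETag`), so an unchanged page costs the server a 304 instead of a full response. This is not the crawler I described above, just an illustration. The `PoliteFetcher` class, the `transport` callable, and the 5 second default delay are all made up for the example, and the transport is assumed to return headers with `ETag` spelled exactly that way.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple
from urllib.parse import urlsplit

# Hypothetical transport interface for this sketch:
# takes (url, request_headers), returns (status, response_headers, body).
Response = Tuple[int, Dict[str, str], bytes]
Transport = Callable[[str, Dict[str, str]], Response]


@dataclass
class PoliteFetcher:
    transport: Transport
    min_delay: float = 5.0  # assumed gap in seconds between hits to one host
    clock: Callable[[], float] = time.monotonic
    sleep: Callable[[float], None] = time.sleep
    _last_hit: Dict[str, float] = field(default_factory=dict)
    _cache: Dict[str, Tuple[str, bytes]] = field(default_factory=dict)

    def fetch(self, url: str) -> bytes:
        host = urlsplit(url).netloc

        # Wait out whatever remains of the per-host delay.
        last = self._last_hit.get(host)
        if last is not None:
            wait = self.min_delay - (self.clock() - last)
            if wait > 0:
                self.sleep(wait)

        # If we already hold a copy, ask the server only whether it changed.
        headers: Dict[str, str] = {}
        cached = self._cache.get(url)
        if cached:
            headers["If-None-Match"] = cached[0]

        status, resp_headers, body = self.transport(url, headers)
        self._last_hit[host] = self.clock()

        # 304 Not Modified: reuse the stored body, nothing was re-downloaded.
        if status == 304 and cached:
            return cached[1]

        # Remember the validator so the next visit can be conditional.
        etag = resp_headers.get("ETag")
        if status == 200 and etag:
            self._cache[url] = (etag, body)
        return body
```

In real use the `transport` would wrap whatever HTTP client you already have. A serious crawler would also honor `robots.txt`, `Retry-After`, and `Last-Modified`, which this sketch leaves out.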
