manton

Evolving thoughts on web scraping. First, I figured anything goes. Later, I was hesitant to depend on any web site structure that would break, and I wouldn’t work around attempts to stop scraping. Now, I’m back to thinking if you don’t want people to see something, don’t put it on the public web.

numericcitizen

@manton There has to be a middle ground here: sharing something without feeling like you have an army of people tracking your every move... remember this ad by Apple?

manton

@numericcitizen For sure. I didn't explicitly say it but that post was inspired by my frustration with Goodreads. 🙂 There are different but overlapping issues for personal data, too.

stupendousman

@manton That I want to be out in my pretty Armanis doesn't mean I want a photographer taking a photo of me without my consent and selling it for commercial use. Sure, if I am in public they can take a photo, but either a) the photo features just me and identifies me, in which case I had better consent to it, or b) I am part of a large enough crowd that I am indistinguishable from the other faces except for the suit. "If you don't want people to see something, don't put it on the public web" is akin to saying women should not wear skimpy shorts in public because it's asking for trouble from men, isn't it? I am sure you didn't mean it that way, but it came across like that.

manton

@stupendousman I really didn't mean it that way at all, but I can see how my post was too vague to be meaningful. I was only thinking of web sites that try to "protect" their data despite it being totally public.

pratik

@manton Does it also cover people who want to keep it from the prying eyes of search engines? @stupendousman

renevanbelzen

@manton The problem with secrets is that once shared, they’re no longer secrets. Luckily it is cryptographically possible to confirm the knowledge of a secret without sharing its contents. Often that’s enough in practice, e.g. to confirm one’s identity without any risk of being doxed. But then governments start to complain, using scare tactics, since they inevitably want to spy on their citizens.

jarrod

@manton Around and around it goes. There are several debates like this that have raged on in my head over the years. Not sure if I’ll ever be able to settle on where to land.

hjalm

@manton This is so fraught. I see all the points of view. Even if I stay inside my own yard, I could be on someone’s security camera feed. If I stay inside my home, I’m still interacting with commercial entities who have records of all kinds of transactions. The privacy we once had is truly changed.

cambridgeport90

@jarrod @manton I can see this sort of thing for sites dealing with medical, financial, and similar matters; neither of those belongs on search engines. I know I don't want my financial or medical information available for people without proper authorization to look up and do gods-know-what with. But when it comes to information that you publish yourself on your personal site, without search engines or directories to catalogue these sites... you're just writing into thin air, and potentially no one will ever learn what you have to share. Considering you choose what you publish on your site, I follow the guideline of not publishing things that I wouldn't want others to see.

jsonbecker

@manton People have conflated internet technologies with the web. There are plenty of ways to use the internet to communicate non-publicly. And if you choose instead to be on the web, you should, within reason[1], expect it to be public by definition.

  1. Rate limiting, for example, to control costs. 

manton

@cambridgeport90 Totally. Medical and financial institutions have a responsibility to not leak data out to the open web. That kind of private data should always be locked down behind authorization and as secure as possible.

manton

@pratik @stupendousman Tools should respect robots.txt whenever possible. Micro.blog checks that when archiving copies of bookmarked web pages. To be clear, I wasn't thinking about personal data at all but instead generic data about things online.
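
For anyone curious what a robots.txt check can look like in a tool, here is a minimal sketch using Python's standard `urllib.robotparser`. It is not Micro.blog's actual code; the function name `allowed_to_fetch`, the `ExampleArchiver` user agent, and the example rules and URLs are all hypothetical and only there for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed_to_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url.

    robots_txt is the already-downloaded text of the site's robots.txt file
    (an assumption made here so the sketch needs no network access).
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines from the robots.txt file.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: every crawler is asked to stay out of /private/.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in (
        "https://example.com/blog/post.html",
        "https://example.com/private/notes.html",
    ):
        verdict = "ok to fetch" if allowed_to_fetch(EXAMPLE_ROBOTS, "ExampleArchiver", url) else "skip"
        print(url, "->", verdict)
```

A real tool would download `/robots.txt` from the site first (the same class offers `set_url()` and `read()` for that) and only then decide whether to fetch or archive the page.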

stupendousman

@manton Yep, thanks for clearing that up. I am with you 100% on that one. @pratik

pratik

@manton @stupendousman Glad to hear that. Now we just need to trust the search engines and AI companies to honor the ‘robots.txt’ on our sites.

fgtech

@pratik @manton @stupendousman I’ve said this before, but I think we are overdue for an internet privacy law. Amongst its provisions should be legal consequences for not respecting robots.txt.

hawaiiboy

@fgtech @pratik @manton Interesting thread. I’ve accepted that, due to being oblivious and naive to the dangers for many years, things I put on the web/internet in the early years aren’t going away. In the past decade, I’ve pulled back and been far more thoughtful about what goes up and where.

stupendousman

@fgtech I don't think any such law will actually work on the Internet anymore. The dark web is supposed to be 3x the size of the "observable" Internet, and any such laws will only make it larger. I have robots.txt and all ad choices turned off, but I am 100% sure Google/Facebook know more about me than anyone else lol. @pratik @manton

pratik

@stupendousman What’s the Twitter of the dark web (the Upside Down)? Is it better or worse than our Twitter?

stupendousman

@pratik I've got no idea and I have no desire to find out lol. All I know (and this is unverified) is that people who use Tor are automatically flagged by ISPs as well, because it's misused so much.

pratik

@stupendousman Same. I have never ventured down there. Frankly, I don't even know how to and don't want to find out.

fgtech

@stupendousman I’m not really talking about any mysterious dark web voodoo. There are plenty of privacy violations happening out in the open right now. We need some codified standards, and they should be enforceable.

fgtech

@stupendousman Also, why should we throw up our hands and say it’s impossible to enforce so why try? Letting problems fester just leads to more problems. Law enforcement has shown it can get just as creative as the dark side when needed.
