manton

Belatedly realizing that Reddit’s robots.txt change means Micro.blog’s bookmark feature can no longer archive a copy of Reddit pages, because we check robots.txt. This is the kind of trickle-down effect that happens when a site withdraws from the open web: it hurts other services and incentivizes breaking conventions.
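A minimal sketch of the kind of robots.txt check being described, using Python's standard `urllib.robotparser`. The function name, the user-agent string, and the Reddit-style rules below are illustrative assumptions, not Micro.blog's actual implementation:

```python
from urllib import robotparser

def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())  # parse rules without a network fetch
    return rp.can_fetch(user_agent, url)

# A blanket-disallow robots.txt, in the spirit of Reddit's change.
reddit_style = "User-agent: *\nDisallow: /"
print(allowed_by_robots(reddit_style, "MicroblogBot", "https://www.reddit.com/r/example/"))
# prints False, so a service honoring robots.txt would skip the archive
```

A real bookmarking service would fetch `https://<host>/robots.txt` first (e.g. with `RobotFileParser.read()`) rather than being handed the text, but the allow/deny decision is the same.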

yury_mol@mastodon.social

@manton Well, do you even have to follow robots.txt from a legal perspective?

manton

@yury_mol No, but it seemed like the right thing to do. Now I'm less sure.

pratik

@manton How different is Reddit’s behavior from us bloggers using robots.txt?

manton

@pratik Mostly because of their size, and because they are making the decision regardless of what the authors of the content may want.

lostinhaste

@manton I still find the entire situation bizarre… seems like they want to become a walled garden, but wouldn't that also reduce their traffic and (in theory) their profitability? Cutting off their nose to spite their face, or something like that. 🤷🏼

manton

@lostinhaste I guess they are betting on active users still being engaged, in the same way that Facebook is like a walled garden but very profitable. It does seem short-sighted, though, and generally bad for the web.

billseitz@toolsforthought.social

@manton @yury_mol I think you could justify honoring it everywhere else but Reddit.

prealpinux

@manton I totally agree 💯

pratik

@manton But that’s exactly the side effect of people giving their writing to a for-profit company. Wikipedia would never do it. We shouldn’t expect corporations to act in our interest. They’ll often act in their own interest even to the detriment of their users.

dwineman@xoxo.zone

@manton You aren’t crawling pages arbitrarily though, are you? I think it’s different when the archiving is requested directly by a user. @marcoarment said recently that he makes a similar choice for Overcast.

kemayo@hachyderm.io

@manton That doesn’t sound like a robot, so I’d say you don’t really need to pay attention to it. `robots.txt` is very explicitly supposed to be consumed by automated crawlers. robotstxt.org/faq/what.html

manton

@dwineman @marcoarment Oh yeah, that was a good episode. What I'm doing is similar to Instapaper.

dwineman@xoxo.zone

@manton @marcoarment Yeah, or Pinboard or any number of similar services. I’d be hesitant to automatically follow and archive links from a bookmarked page (other than redirects), but I don’t think what you’re doing is a problem.

renevanbelzen

@manton Wasn’t robots.txt meant to optimize search results, and nothing more? My guess is that Reddit doesn’t want Google to eat Reddit’s lunch, since Reddit content has been prevailing in search results for a long time now.
