manton

Belatedly realizing that Reddit’s robots.txt change means that Micro.blog’s bookmark feature now can’t archive a copy of pages, because we check robots.txt. This is the kind of trickle-down effect you get when a site withdraws from the open web: it hurts other services and incentivizes breaking conventions.
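
For readers wondering what "checking robots.txt" looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. It is not Micro.blog's actual code: the `may_archive` helper, the `ExampleArchiver` user agent, and the two sample robots.txt bodies are all assumptions made up for illustration, and `BLOCK_ALL` is only a stand-in for a site that disallows everything, not a copy of Reddit's real file.

```python
from urllib.robotparser import RobotFileParser


def may_archive(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    Hypothetical helper: a real service would first download robots.txt
    from the target site; here the text is passed in directly so the
    example runs without network access.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative robots.txt bodies (assumptions, not any site's real file).
BLOCK_ALL = "User-agent: *\nDisallow: /\n"   # every path is off limits
ALLOW_ALL = "User-agent: *\nDisallow:\n"     # empty Disallow permits everything

if __name__ == "__main__":
    page = "https://example.com/r/test/comments/123"
    print(may_archive(BLOCK_ALL, "ExampleArchiver", page))
    print(may_archive(ALLOW_ALL, "ExampleArchiver", page))
```

A service that honors the result would skip the archive step whenever `may_archive` returns `False`, which is how a blanket `Disallow: /` ends up blocking even a copy that a user asked for directly.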

yury_mol@mastodon.social

@manton Well, do you even have to follow robots.txt from a legal perspective?

manton

@yury_mol No, but it seemed like the right thing to do. Now I'm less sure.

pratik

@manton How different is Reddit’s behavior from us bloggers using robots.txt?

manton

@pratik Mostly because of their size, and because they are making the decision regardless of what the authors of the content may want.

lostinhaste

@manton I still find the entire situation bizarre... It seems like they want to become a walled garden, but wouldn't that also reduce their traffic and, in theory, their profitability? Cutting off their nose to spite their face, or something like that. 🤷🏼

manton

@lostinhaste I guess they are betting on active users still being engaged, in the same way that Facebook is like a walled garden but very profitable. It does seem short-sighted, though, and generally bad for the web.

billseitz@toolsforthought.social

@manton @yury_mol I think you could justify honoring it everywhere else but Reddit

prealpinux

@manton I totally agree 💯

pratik

@manton But that’s exactly the side effect of people giving their writing to a for-profit company. Wikipedia would never do it. We shouldn’t expect corporations to act in our interest. They’ll often act in their own interest even to the detriment of their users.

dwineman@xoxo.zone

@manton You aren’t crawling pages arbitrarily though, are you? I think it’s different when the archiving is requested directly by a user. @marcoarment said recently that he makes a similar choice for Overcast.

kemayo@hachyderm.io

@manton That doesn’t sound like a robot, so I’d say you don’t really need to pay attention to it. `robots.txt` is very explicitly supposed to be consumed by automated crawlers. robotstxt.org/faq/what.html

manton

@dwineman @marcoarment Oh yeah, that was a good episode. What I'm doing is similar to Instapaper.

dwineman@xoxo.zone

@manton @marcoarment Yeah, or Pinboard or any number of similar services. I’d be hesitant to automatically follow and archive links from a bookmarked page (other than redirects), but I don’t think what you’re doing is a problem.

renevanbelzen

@manton Wasn’t robots.txt meant to optimize search results, and nothing more? My guess is that Reddit doesn’t want Google to eat Reddit’s lunch, since Reddit content has been prevailing in search results for a long time now.
