Edent@mastodon.social

🆕 blog! “WebMentions, Privacy, and DDoS - Oh My!”

Mastodon - the distributed social network - has two interesting challenges when it comes to how users share links. I'd like to discuss those issues and suggest a possible way forward. When you click on a link on my website which takes you to another website, your browser sends a Referer. This says to …

👀 Read more: shkspr.mobi/blog/2022/11/webme

#mastodon #MastodonAPI #metadata #NaBloPoMo #ogp

Crappy line drawing explaining the above.

pre@boing.world

@Edent That's a good summary of the situation, and I certainly agree scrapers should be checking robots.txt. Big Centralized Social should be doing it too.

I don't think randomly checking the validity is likely to work well, but perhaps just user-reports would be fine.

Or perhaps a standard where a website can publish a public key with which they sign all their OpenGraph cards so their validity can be checked?

Though I guess you're still DDoSing by fetching the robots.txt and/or the public key. I suppose they can both be static, at least.
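
As a rough illustration of the robots.txt check discussed in this reply, here is a minimal sketch using Python's standard `urllib.robotparser`. The bot name `ExamplePreviewBot`, the sample rules, and the helper `may_fetch` are all hypothetical; this is not how Mastodon or any particular scraper does it. The robots.txt text is passed in as a string, so a real scraper would still need to fetch (and ideally cache) that file once per site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one preview bot from /private/,
# and allows everything for every other user agent.
EXAMPLE_ROBOTS = """\
User-agent: ExamplePreviewBot
Disallow: /private/

User-agent: *
Disallow:
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request happens here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/blog/post", "/private/post"):
        url = "https://example.com" + path
        print(path, may_fetch(EXAMPLE_ROBOTS, "ExamplePreviewBot", url))
```

Because robots.txt is a single static file per site, its result can be reused for every link to that site, which is the "static at least" point made above.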

KevinMarks@xoxo.zone

@Edent rather than OGP, oEmbed would make more sense here, as it is more flexible and not controlled by Meta (Mastodon checks for it first, and you can convert OGP into oEmbed easily enough).
There is a cache of providers to save a lookup at oembed.com/
This reminds me a bit of the Google favicon cache eg google.com/s2/favicons?domain=
and seems like a similar service to check first would be useful.
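
To make the "convert OGP into oEmbed" remark concrete, here is a small sketch that reads Open Graph `<meta>` tags from a page and builds an oEmbed-style response of type `link`. The function name `ogp_to_oembed` and the choice of which fields to map are assumptions for illustration. The oEmbed spec requires thumbnail dimensions whenever a thumbnail URL is given, so the sketch only emits a thumbnail when `og:image:width` and `og:image:height` are also present.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> values from HTML."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.properties.setdefault(prop, content)


def ogp_to_oembed(html: str) -> dict:
    """Build a minimal oEmbed 'link' response from a page's OGP tags."""
    parser = OpenGraphParser()
    parser.feed(html)
    og = parser.properties

    oembed = {"type": "link", "version": "1.0"}
    if "og:title" in og:
        oembed["title"] = og["og:title"]
    if "og:site_name" in og:
        oembed["provider_name"] = og["og:site_name"]

    image = og.get("og:image")
    width = og.get("og:image:width", "")
    height = og.get("og:image:height", "")
    # Only emit the thumbnail when its dimensions are known.
    if image and width.isdigit() and height.isdigit():
        oembed["thumbnail_url"] = image
        oembed["thumbnail_width"] = int(width)
        oembed["thumbnail_height"] = int(height)
    return oembed
```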

manton

@Edent Good post and thoughts. My 2 cents, I don't like including more data sent over ActivityPub because I feel like the current post data has already become quite bloated with various Mastodon fields, making it harder for implementers to know what is required.

davidgerard@circumstances.run

@Edent you may have trouble getting this past the project leader - here's the bug, open five years, last comment is a highly tech-enabled person annoyed at the obnoxious software github.com/mastodon/mastodon/i

owenblacker@mastodon.lol

@Edent I realised something meta and mildly interesting reading this. I usually read your blog posts by email (because what is memory?) and it's only from clicking through that I realised all the Masto embeds are missing from the email…. Not sure Twitter embeds were silently dropped like that, were they?

Edent@mastodon.social

@owenblacker Oh, interesting. The Masto ones are iFrames. The Twitter ones are blockquotes.
Might look into a way to fix that.
Good shout - and thanks for reading!

Edent@mastodon.social

@owenblacker
Looking more deeply, the oEmbed only has an iFrame representation.
Here's your message
mastodon.lol/api/oembed?url=ht
So that's what WordPress grabs.

bhawthorne@mastodon.lol

@Edent Unless you are running a web server on an old Palm Pilot connected to the net with paper cups and string, I have a hard time considering 1000 (or even 10,000) hits to be a DDoS attack. This seems to be trying to fix something that isn’t really broken.

Edent@mastodon.social

@bhawthorne
I've seen plenty of things knocked out by HackerNews - which is at most a couple of thousand hits an hour.

rbairwell@mastodon.org.uk

@KevinMarks Problem is it is "strongly encouraged" to use discovery to find the oEmbed endpoint (instead of just downloading the list of only 288 providers). Discovery means fetching the page anyway to parse its headers (HEAD and HTTP headers if you are lucky, GET and HTML head if not).
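
For readers unfamiliar with oEmbed discovery, the HTML half of it can be sketched as follows: look in the fetched page for a `<link rel="alternate" type="application/json+oembed" href="...">` element. The helper name `discover_oembed_endpoint` is hypothetical, and this sketch only handles the JSON variant in HTML; it does not parse an HTTP `Link` header, which is the cheaper "if you are lucky" path mentioned above.

```python
from html.parser import HTMLParser
from typing import Optional


class _OEmbedLinkFinder(HTMLParser):
    """Remembers the href of the first JSON oEmbed discovery link."""

    def __init__(self):
        super().__init__()
        self.endpoint = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.endpoint is not None:
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        link_type = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rels and link_type == "application/json+oembed" and href:
            self.endpoint = href


def discover_oembed_endpoint(html: str) -> Optional[str]:
    """Return the page's JSON oEmbed endpoint URL, or None if not advertised."""
    finder = _OEmbedLinkFinder()
    finder.feed(html)
    return finder.endpoint
```

Note that this still needs the page body, which is exactly the cost being pointed out: discovery does not avoid a request to the origin server.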

KevinMarks@xoxo.zone

@rbairwell True, but at some point the page is going to have to be fetched; the question is whether you can trust a cache or not. The site-to-oEmbed mapping presumably changes less often, so you'll be hitting that rather than the main page server, which is presumably less onerous to serve.
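
One way to picture a cached site-to-oEmbed mapping is a small in-memory table with an expiry time per entry. The class name `EndpointCache`, the TTL value, and the injectable `clock` parameter are all assumptions made for this sketch; it says nothing about how long such a mapping can actually be trusted, which is the open question in this reply.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class EndpointCache:
    """Maps a host name to its oEmbed endpoint, forgetting entries after a TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}

    def put(self, host: str, endpoint: str) -> None:
        """Store (or refresh) the endpoint for host."""
        self._entries[host] = (self._clock(), endpoint)

    def get(self, host: str) -> Optional[str]:
        """Return the cached endpoint, or None if missing or expired."""
        entry = self._entries.get(host)
        if entry is None:
            return None
        stored_at, endpoint = entry
        if self._clock() - stored_at >= self._ttl:
            # Expired: drop it so the caller re-runs discovery.
            del self._entries[host]
            return None
        return endpoint
```

A caller would try `get(host)` first and only fall back to fetching the page for discovery on a miss, then `put` the result.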

stsquad@mastodon.org.uk

@Edent
One of the reasons I migrated from WordPress is because it takes a considerable number of resources to serve what is essentially a static page. I would hope most static sites are more resilient.
@bhawthorne

Edent@mastodon.social

Further to the above blog post, whenever my bot posts to Mastodon, my site gets hit by multiple requests for its #oEmbed info.

There really needs to be a (decentralised!) cache for this sort of info.

Log file showing multiple ActivityPub servers hitting my site at the same time.

imrehg@fosstodon.org

@Edent naively wouldn't the cache invalidation part of the process be much harder in a decentralised situation?

On the origin server there is just much more flexibility to be both "right" with regards to the content and also to manage the performance (with local caching).

Just wondering about the performance vs. correctness vs. simplicity tradeoffs of this.
