sod

I teamed up with OpenAI's Whisper and had a lot of fun hacking on interactive transcripts this weekend. The neural net handled the transcription, so I, the human, could focus on programming and implementing the user interface.

See the demo below or live (in Swedish).
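
For readers wondering what an interactive transcript needs under the hood, here is a minimal sketch, not the code behind the demo. It assumes the transcript is a list of segments with `start`, `end` and `text` fields (times in seconds); the sample segments and the `segment_at` helper are hypothetical. Given the current playback time, it finds the segment to highlight.

```python
from bisect import bisect_right

# Hypothetical sample data: timestamped transcript segments, times in seconds.
segments = [
    {"start": 0.0, "end": 4.2, "text": "First sentence."},
    {"start": 4.2, "end": 9.8, "text": "Second sentence."},
    {"start": 10.0, "end": 12.5, "text": "Third sentence."},
]


def segment_at(segments, t):
    """Return the index of the segment playing at time t, or None in a gap."""
    starts = [s["start"] for s in segments]
    # Index of the last segment whose start time is <= t.
    i = bisect_right(starts, t) - 1
    if i < 0:
        return None  # t is before the first segment
    # t may fall in silence after segment i has ended.
    return i if t < segments[i]["end"] else None
```

Going the other way (clicking a sentence to jump to it) only needs the `start` value of the clicked segment, so the same data covers both directions.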

tkoola

@sod oh, that is very cool!

teisam

@sod And here I was, slightly happy that I had managed to create and save a text file with a transcript by following the instructions at the whisper.cpp project.

But very cool. Hope most podcasts do something like this soon for searchability.

manton

@sod That looks amazing.

sod

@teisam I hope that too! How does Whisper handle Norwegian? I've encountered a few oddities with Swedish transcriptions, but overall I'm very impressed.

teisam

@sod haven’t tested that yet, but will look into that in a day or two.

sod

@manton Imagine automatic transcripts and subtitles for microcasts and short videos hosted on Micro.blog. Making them more accessible and great for discoverability and search. That would be a pretty cool feature! 😊

manton

@sod I would love that. We did a handful of transcripts for Micro Monday, but it was too time-consuming to ever fit into the normal routine of producing shows.

holgerfrohloff

@sod That looks great. I tried it with German speech translated into English by Whisper and it worked flawlessly. Even stuttering and filler words were filtered out.

teisam

@sod Tested with the King's New Year speech, since it was already transcribed on the royal website. 1100ish words. I made text files with one word (or one word plus punctuation) per line and diffed them. 98 differences, 50% of which were punctuation mismatches, which is probably a bit difficult to get right from speech. 7-8 words had spellings that are totally wrong for Norwegian. A few mismatches in plurals. But it also showed that the royal transcription had spelling errors in a few spots. Just one totally erroneous word, where Whisper transcribed a word with the opposite meaning: urolig ("restless") vs. rolig ("calm"), and not the Swedish rolig ("funny").

But it also added quotation marks around two sentences where I think they are natural, and which the royal transcription didn't have.

All in all, very impressed based on this little test. I think I would subscribe to a podcast player that did this with all the podcast episodes I play.
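
For anyone who wants to repeat the exercise, the word-by-word comparison described above could be sketched roughly like this. It is an approximation using Python's `difflib` rather than text files and a command-line diff, and the two sample sentences and the helper names `word_diffs` and `punctuation_only` are made up for illustration.

```python
import difflib
import string


def word_diffs(reference, hypothesis):
    """Diff two texts word by word (punctuation stays attached to its word)."""
    ref, hyp = reference.split(), hypothesis.split()
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    return [
        (tag, ref[i1:i2], hyp[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def punctuation_only(ref_words, hyp_words):
    """True if the two word runs are equal once ASCII punctuation is stripped.

    Note: string.punctuation covers ASCII only, so marks such as « » are kept.
    """
    def clean(ws):
        stripped = (w.strip(string.punctuation) for w in ws)
        return [w for w in stripped if w]
    return clean(ref_words) == clean(hyp_words)


# Hypothetical sample sentences, not taken from the actual speech.
reference = "Det var en urolig tid for alle, sa han."
hypothesis = "Det var en rolig tid for alle sa han."

diffs = word_diffs(reference, hypothesis)
punct = [d for d in diffs if punctuation_only(d[1], d[2])]
```

Here `len(diffs)` gives the total number of differing spans and `len(punct)` the share that is punctuation only, which is the same split as the 98 differences and 50% reported above. Adjacent mismatched words are grouped into one span, so the counts will not match a line-based diff exactly.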

sod

@teisam Impressive! Thanks for the detailed report. I might try the same exercise with the Christmas speech by the Swedish King.

I know that the podcast player Snipd offers AI-generated transcripts, but only for podcasts in English.

sod

@holgerfrohloff Yeah, it really is remarkable!

teisam

@sod Barely tested it, and the description of creating snips made me think of @gr36's excellent post about Airr. But I also see that they have a searchable transcript for all?/some? podcasts, so you don't need to make snips. And it kinda creates its own chapters. In fact, very close to what I want.

But if someone made something like Overcast with searchable Whisper transcripts, and perhaps some (opt-in) logging of when I listened to each podcast, that would be perfect.
