This is Behind the Blog, where we share our behind-the-scenes thoughts about how a few of our top stories of the week came together. This week, we discuss LLMs and languages, and a big data breach.
JASON: As has been mentioned many times, I’ve been on vacation the last week, which has been amazing and very much needed. I’m popping in because I believe this is technically the last Behind the Blog before our one-year anniversary, and I wanted to say again that we are endlessly thankful for all of the support you’ve given us. I am also grateful to Joseph and Sam for keeping the lights on this week; I have barely looked at the internet in like nine days.
I do have a real Behind the Blog before I get into the sappy stuff. I wanted to talk briefly about how I did the reporting on my latest Facebook AI slop story, in which I found a lot of the people who are making bizarre AI content on Facebook. Specifically, I wanted to talk about how I parsed a bunch of instructional videos in Hindi, Urdu, and Vietnamese, all of which are languages I do not speak. I probably watched about 10 hours of videos in Hindi for this article, and quickly scanned through dozens of videos that were much longer than that in total.