
Project Analyzing Human Language Usage Shuts Down Because ‘Generative AI Has Polluted the Data’

Wordfreq shuts down because “I don’t think anyone has reliable information about post-2021 language usage by humans.”
Image: Anne Nygård

The creator of an open source project that scraped the internet to track the ever-changing popularity of different words in human language says they are sunsetting the project because generative AI spam has poisoned the internet to the point that the project no longer has any utility.

Wordfreq is a program that tracked the ever-changing ways people used more than 40 different languages by analyzing millions of sources across Wikipedia, movie and TV subtitles, news articles, books, websites, Twitter, and Reddit. The system could be used to analyze changing language habits as slang and popular culture changed and language evolved, and was a resource for academics who study such things. In a note on the project’s GitHub, creator Robyn Speer wrote that the project “will not be updated anymore.”
