We Tried to Replace 404 Media With AI

Trying to make AI do my job revealed a vast infrastructure of cheap tricks and middlemen that have been trying to game Google Search for more than a decade.
The front page of Prototype.Press, an AI-generated "autoblog."

Over the last week I have published dozens of news articles and blogs about technology without lifting a finger on a news website called Prototype.Press. The articles are fairly short, but written in perfect English, and as far as I can tell, accurate. They are also very nicely laid out and categorized into “tech,” “science,” “AI,” and other sections, making it easier for visitors to navigate the cornucopia of content I publish on the site every day. On Monday, June 17, I published 53 articles on everything from the “Top Internet Service Providers in South Dakota” to how “AI-generated images in Google Search Results have provided access to an alternate reality.”

If that latter story sounds familiar that’s because it is a blatant, uncredited rip-off of a story I published on 404 Media the same day. I was able to publish it alongside 52 other articles that day all by myself because I created an entirely autonomous, ChatGPT-powered technology news site that steals other people’s original reporting for just $365.63. 

It wasn’t hard to set up, and didn’t require one of the most advanced large language models in the world, but since this is the second technology news and investigations website I’m running these days, I outsourced its creation to a Fiverr freelancer in Turkey. I told him what I wanted, picked a layout, and two days later got a fully operational website.

What I learned from this experiment is that flooding the internet with an infinite amount of what could pass for journalism is cheap and even easier than I imagined, as long as I didn’t respect the craft, my audience, or myself. I also learned that while AI has made all of this much easier, faster, and better, the advent of generative AI did not invent this practice—it’s simply adding to a vast infrastructure of tools and services built by companies like WordPress, Fiverr, and Google designed to convert clicks to dollars at the expense of quality journalism and information, polluting the internet we all use and live in every day.

Luckily, after going through this process, I also learned that while doing this is profitable to some, the practice relies on a fundamental misunderstanding of what journalism is and what makes it good, which gives me more confidence than ever that a fully automated blog will never be able to replace 404 Media, or other investigative news outlets.

I wanted to try to replace myself and my 404 Media cofounders with AI after the Wall Street Journal and NewsGuard published stories about how easy and cheap it was to spin up “fake news” websites with the help of freelancers on Fiverr. In the WSJ’s case, Jack Brewster showed he was able to pay just $105 for a “pink-slime” partisan news site that covered Ohio politics and was critical of Democratic Senator Sherrod Brown and in favor of his Republican opponent Bernie Moreno. The content technically followed that instruction, but resulted in many nonsense articles that shoehorned supportive statements of Moreno into everything from sports news to obituaries. My aim was a little different: I wanted AI to do my job, as AI company executives have repeatedly said is inevitable. 

Our colleagues in journalism have often complimented 404 Media for its ability to publish a lot of investigative articles quickly, but there are still just four of us, so at most we publish between two and six articles a day. The process for every story is different, but generally publishing a story on our website involves reporting, writing, editing, and a far more ambiguous process of ideating, processing and synthesizing information, and receiving tips, which relies on decades of experience, perspective, and trust. For that reason, technically publishing a story on 404 Media takes us anywhere from less than an hour to a few months, but in a sense each story also requires an entire lifetime of talking to people, researching, reading, and staring at the internet until it feels like you’re internally bleeding but continuing to do it anyway.

By contrast, erecting Prototype.Press from scratch and publishing more than 50 stories a day there took me less than an hour. 

Here’s how I did it:

Searching Fiverr for the term “automated news” yields 471 freelancers who promise they can create an “automated news website” or “blog” with “Google News approval” and “100% insured revenue” via Google AdSense or other ad networks that can be plugged into the live site. These freelancers also promise optimized “content strategy,” RSS feeds, “auto sharing” and search engine optimization (SEO) via WordPress plug-ins like Rank Math and Yoast, which help users tweak their Google listing and make suggestions on how to improve SEO. 

These freelancers offer their services for as little as $10 and for as much as $300 per site. Searching Fiverr for “autoblog” yielded 606 freelancers who offered much the same services, but that search term also encompassed freelancers who create fully automated “Amazon affiliate marketing autopilot” websites. These sites, which are monetized by shilling Amazon affiliate links, sometimes look like news sites, but sometimes look like product review sites.

Just a few of the freelancers on Fiverr who offer autoblog services.

I decided to hire Mohamed Sawah, who NewsGuard had identified as creating four AI-generated news sites: WestObserver.com, NewYorkFolk.com, GlobeEcho.com, and TrendFool.com. When I said I was interested in his services, he quickly showed me a list of 36 websites he “made recently to see how it works:”

Some of these sites, like Lit Sports News, are just automatically copy/pasting articles from other news sources, including images, and reposting them. One story titled “Bulls active as NBA Draft nears,” for example, just copies an NBC Sports Chicago article word for word, even including ad copy at the top of the body of the article saying the story is “Presented by Nationwide Insurance agent Jeff Vukovich.”

Some of these sites appear to exist solely to promote Sawah's services and were not made for actual clients. They update constantly, but instead of real banner ads from a service like Google’s AdSense, they contain static images that look like ads for Rolex or, weirdly, Amazon the company generally and not a specific Amazon service or product. It's either impossible to click on these ad units or doing so leads to a 404 error page on the site.

Other sites Sawah made are a bit more complex. TheLiberal.News, for example, a “one-stop website for the latest news and updates about UK and the World” appears to be copying reporting from BBC and CNN, but it’s hard to say for certain because the text of the articles is not copy/pasted like it is with Lit Sports News. It publishes just as many articles, however, keeping up with the news of the day, and notably includes AI-generated images at the top of each of those articles, which feature IDF soldiers with deformed limbs and a thermometer with mangled numbers illustrating a heatwave in the UK.

Sawah has 778 five-star reviews on Fiverr, which seems too good to be true, but my experience as a client was very good as well. At least one genuine review is for a site called therealcrimediary.com.

A review from the user who commissioned therealcrimediary.com on Fiverr

therealcrimediary.com publishes dozens of articles about crime and lists the publication it lifts them from, like Fox 11 Los Angeles, as the author. Sometimes it seems to accidentally repost content in Russian. It also has an Instagram account with hundreds of posts with AI-generated images, a YouTube channel with over 100 AI-generated videos, as well as a TikTok, LinkedIn, Pinterest, and Facebook page, all pumping those platforms with AI-generated content. All these accounts and therealcrimediary.com are attempting to monetize this content with ads, merchandise like t-shirts and hoodies, and a Patreon. I can’t imagine therealcrimediary.com is making a lot of money that way, but maybe it’s a profitable form of passive income given that Sawah’s listed price on Fiverr was only $100. The administrator of the site did not respond to a request for comment.

Sawah told me the website he’d build for me would run on WordPress, that I’d have full control of it from a WordPress dashboard, and that it would work by pulling content from other websites and posting it to my own after I told him what topics I wanted the site to cover.

“We can use AI to rewrite/generate the content we pull from the other websites to make it more unique,” he told me over Fiverr’s chat function. He then told me to choose which layout I wanted from his list of sites or a list of generic themes. I picked one of the generic themes that was staged like a cryptocurrency news site.

While Sawah listed the price at $100, as our conversation continued I discovered my final cost would be more than triple that. I had to buy a domain name and pay for hosting. For the domain name, I excavated a list of names we considered for the site before we settled on 404 Media, and picked Prototype.Press (that’s not even the worst name on that list), and registered that domain via a host Sawah recommended, Hostinger. The cost of hosting the site including the domain name was $59.88 for a year.

📰
Do you know anything else about AI-generated websites? I would love to hear from you. Using a non-work device, you can message me securely on Signal at (609) 678-3204. Otherwise, send me an email at emanuel@404media.co.

Sawah said he also needed a ChatGPT API key, which we have for reporting reasons, and for me to buy and then provide him with a purchase code for a WordPress plugin called WordPress Automatic Plugin, which cost $42. 

This is the part of the process when it became apparent to me that I didn’t need what OpenAI calls a “frontier AI” model, any other modern generative AI tools, or even Sawah, to create an “autoblog.” According to Code Canyon, the marketplace where I bought the WordPress Automatic Plugin, it has been bought almost 40,000 times, and the company that makes it, ValvePress, has been selling it there since March of 2011, more than a decade before ChatGPT hit the market.

As its description on Code Canyon explains, “WordPress Automatic Plugin posts from almost any website to WordPress automatically.” Today, users can provide it with a ChatGPT API key so it will use it to reword and summarize articles repurposed for their automatic sites, but before ChatGPT came along, users could use (and can still use) an article “spinner” called The Best Spinner, or any number of other article spinner competitors.

As we explained when we started asking 404 Media readers for their emails in January, a spinner is essentially an automatic thesaurus tool that replaces words in a given text with synonyms, allowing people to instantly create thousands of slightly different versions of articles. It is not “AI,” but it accomplishes the same task of stealing someone else’s work and replicating it infinitely with slight variations. 
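To make the "automatic thesaurus" idea concrete, here is a minimal sketch of how a spinner could work. It is not the code of The Best Spinner or any real product; the synonym table, function names, and sample sentence are all hypothetical and chosen only for illustration.

```python
import random
import re

# Hypothetical, tiny synonym table. A real spinner would ship a far larger thesaurus.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "article": ["story", "piece"],
    "shows": ["reveals", "demonstrates"],
}


def spin(text: str, rng: random.Random) -> str:
    """Replace every word found in SYNONYMS with a randomly chosen synonym."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        options = SYNONYMS.get(word.lower())
        if not options:
            return word  # unknown words are left untouched
        choice = rng.choice(options)
        # Preserve a leading capital so sentence starts still look right.
        return choice.capitalize() if word[0].isupper() else choice

    return re.sub(r"[A-Za-z]+", swap, text)


def count_variants(text: str) -> int:
    """Count how many distinct versions this table can produce for the text."""
    total = 1
    for word in re.findall(r"[A-Za-z]+", text):
        total *= len(SYNONYMS.get(word.lower(), [word]))
    return total


sample = "This quick article shows a trend."
spun = spin(sample, random.Random(0))
variants = count_variants(sample)
```

Even this three-word table yields eight versions of one short sentence, because the choices multiply. With a full thesaurus and a full-length article, the count grows into the thousands and beyond, with no understanding of the text involved.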

Sawah wasn’t coding me a site from scratch or executing some complicated implementation of ChatGPT. He took a bunch of off-the-shelf tools created for this exact purpose and plugged them in for a small price. I would compare it to paying someone to hang your TV on the wall. Could I do it myself? Probably. Do I trust myself to do it well? Absolutely not.

Finally, Sawah asked me “what kind of tech news” I wanted on my website. This is a daily existential question at 404 Media, but caught off guard I said “hard news if that makes sense.” I also said I liked the stories on 404 Media, and that The Verge is also a good website. I asked if he could have the site use AI-generated images as well, but Sawah advised me against it, saying it would be more expensive, “not 100 % accurate,” and require another API key from getimg.ai. He recommended we just rip images from the sites we were stealing articles from and I agreed.

Then Sawah told me it was actually going to be $250, not $100, because the basic package does not include AI, which, after tax and service fees, is how we got to $365.63.
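The itemized prices mentioned so far can be tallied to see how much of the total is tax and service fees. This is only a back-of-the-envelope sketch: it assumes the $365.63 figure covers exactly the Fiverr package, the Hostinger plan, and the plugin license, and the variable names are made up for the example.

```python
from decimal import Decimal

# Prices quoted in the article.
fiverr_package = Decimal("250.00")      # Sawah's package with AI rewriting
hosting_and_domain = Decimal("59.88")   # one year at Hostinger, domain included
plugin_license = Decimal("42.00")       # WordPress Automatic Plugin on Code Canyon
reported_total = Decimal("365.63")

# Assumption: nothing else was bought, so the remainder is tax and service fees.
listed_subtotal = fiverr_package + hosting_and_domain + plugin_license
implied_tax_and_fees = reported_total - listed_subtotal

print(f"Listed prices:         ${listed_subtotal}")
print(f"Implied tax and fees:  ${implied_tax_and_fees}")
```

Under that assumption the listed prices sum to $351.88, which leaves $13.75 for tax and fees. The ChatGPT API key is left out because it was already on hand.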

Two days later Sawah was back with a fully automatic Prototype.Press, already loaded with articles. He even made a logo.

The front page of Prototype.Press

I did not see what instructions Sawah gave ChatGPT to generate the articles, but based on the copy and what the site looks like on the backend, the site is just feeding ChatGPT RSS feeds of articles from other sites and asking the AI to summarize them, which creates a few problems in the copy. 

The articles are well written and didn’t contain any factual errors I could find, but sometimes referenced “the author.”

For example, in the Prototype.Press article titled “AI-generated images in Google Search Results have provided access to an alternate reality,” which is an AI rewrite of my article AI Images in Google Search Results Have Opened a Portal to Hell, the article includes sentences like “The author stumbled upon this issue while researching a story about fan pages on Facebook being taken over by inappropriate images and scams,” and “However, the effectiveness of these policies in addressing AI-generated content remains unclear based on the author’s observations.” In both cases, the article is referring to me, the person who wrote the original article, but you wouldn’t know that from reading the cloned article, which doesn’t name or link back to me.

In another instance, ChatGPT introduced some both-sidesism where it didn’t originally exist. On June 18 I published an article titled “Users ‘Jailbreak’ AI Video Generator to Make Porn,” which explains how some users bypassed safeguards on an AI video generator called Dream Machine to create porn. It’s a straightforward article that explains what Dream Machine is, what is happening in the AI video generation space, how people bypassed the safeguards, and doesn’t include any editorializing aside from my pointing to the fact that adult content is driving a lot of the interest and progress in generative AI. 

The ChatGPT rewrite of my article, titled “AI Video Generator Hacked by Users to Produce Pornographic Content,” ends with the following paragraph:

“Overall, the issue with Dream Machine highlights the complex nature of AI technology and the potential risks associated with its misuse. While these tools offer exciting possibilities for content creation and artistic expression, they also present challenges in ensuring responsible and ethical use. Developers and users alike must work together to address these concerns and implement effective safeguards to prevent the unauthorized generation of explicit content.”

This is not an idea or sentiment that’s included in my original article, and is a kind of cookie cutter platitude that concludes many of the articles on Prototype.Press.
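The pipeline described above, an RSS feed handed to a model with a request to summarize each item, can be sketched roughly as follows. This is a guess at the general shape, not the WordPress Automatic Plugin's actual code: the prompt wording, the function names, and the sample feed are all assumptions, and the model call is replaced by a placeholder callable so the example runs without any API key.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Hypothetical one-item feed standing in for a real publisher's RSS feed.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example Source</title>
  <item>
    <title>Original headline</title>
    <link>https://example.com/original</link>
    <description>Text of the original article written by a reporter.</description>
  </item>
</channel></rss>"""


def read_items(feed_xml: str) -> List[Dict[str, str]]:
    """Extract title, link, and body text from each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "body": item.findtext("description", default=""),
        }
        for item in root.iter("item")
    ]


def build_prompt(item: Dict[str, str]) -> str:
    """Assumed prompt wording; the real instructions were never shown."""
    return "Summarize the following article:\n\n" + item["title"] + "\n\n" + item["body"]


def rewrite_feed(feed_xml: str, summarize: Callable[[str], str]) -> List[Dict[str, str]]:
    """Run every feed item through `summarize`, a stand-in for a language model call."""
    return [
        {"source": item["link"], "text": summarize(build_prompt(item))}
        for item in read_items(feed_xml)
    ]


# Placeholder "model" that just upper-cases its input, for demonstration only.
posts = rewrite_feed(SAMPLE_FEED, lambda prompt: prompt.upper())
```

A summarization request like this one would plausibly explain the stray references to "the author": the model is describing someone else's article rather than writing its own. Note that nothing in the loop does any reporting; it only restates text that already exists.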

I showed Prototype.Press to the rest of the 404 Media team, and we thought it was funny for approximately 30 seconds before we noticed that Prototype.Press wasn’t only lifting stories from our site, but also an in-depth investigation from Wired. On the site’s WordPress dashboard, I could see that it was automatically creating and publishing ChatGPT generated rewrites of articles from 404 Media, Wired, Forbes’s innovation section, CNET, and The Verge. OpenAI did not respond to a request for comment.

The list of websites Prototype.Press was scraping and rewriting with AI.

For many reasons, least of all that it seems rude to rip off our colleagues who work at these websites even if Prototype.Press was getting close to zero traffic, it seemed like a bad idea to have it live and accessible, so I put it in maintenance mode. This way the site continued to automatically create content but was not publicly viewable. 

The articles on Prototype.Press are not good if you actually sit down and read them, but they are arguably serviceable if you read them in the way that people consume most articles on the internet. Most readers take in the headline, the subheadline or dek (that little summary that appears under the headline), and maybe quickly scan the actual article, scrolling down to read the first line of a paragraph to determine whether to keep reading.

The article you are currently reading is at this point almost 2,500 words long, so if you are reading these words you are in the minority of people who are either deeply interested in the subject and care about the details and the order in which I’ve arranged them here, or you are one of the sickos who enjoys reading. If you read articles for one of those two reasons, Prototype.Press can’t help you. 

Then, of course, there’s the issue that all generative AI has. 404 Media publishes a lot of scoops and news, and the AI-generated Prototype.Press by definition is just remixing existing content. Prototype.Press wasn’t “trained” on 404 Media and isn’t scouring the open internet and synthesizing information in order to come up with article ideas it then reports out and writes. It is, much like the article spinner that came around years before generative AI, rearranging words in an article someone else already wrote.

“I think the uses for these types of website changed a lot in the past year after AI, [e]specially for news websites,” Sawah said after I told him I’m writing an article about autoblogs, his service, and its potential impact on the health of the internet. “It's not aggregators anymore, it's a tool to help the editor find new trending topics to write about, where he can copy AI created content to his main article, or even make some changes to the article and publish it.”

Sawah added that he always recommends his clients credit the website they’re sourcing their articles from, that most of his clients use the generated content as a tool to help them write, and that some sites use “fully” AI-generated content to write informative articles on certain keywords that don’t directly steal from other news sites.

“Created content is not that great, but I don’t think it’s bad for the internet,” he said.

Muhammad Atef, the developer of the WordPress Automatic Plugin, told me that his plugin can import content from a variety of sources (YouTube, Facebook, TikTok, etc.), and while it can also AI-generate content, some customers use it to automatically crosspost from Facebook to WordPress.

“We do not encourage others to steal content using our plugin,” Atef told me. “The user is responsible for where to grab the content from and if it is allowed by the source or not so the plugin is a tool and the final product depends on the user usage. We even provide instructions on how to block our plugin from scraping a specific site.”

Atef also said that he understands why someone would be upset if they found his plugin was copying their content, because people copy his work as well. 

“Stealing anything (not just content) is nasty and against morals and religion while it existed and exists and will exist. We even find our plugins copied and sold by others (cheaper),” Atef said, and compared the practice to the way some companies create counterfeit clothes or fragrances. “[T]hat should not stop the original creator from his business and is considered a part of the game.”

“Our AI policy and our Community Standards state that Fiverr Users (meaning, freelancers and the businesses that hire them) are expected to respect intellectual property (IP) rights, including copyright and trademark protections, even when content is freely available, and to maintain a record when collaborating or using external content. Deliveries created on Fiverr should not violate the IP of others,” a Fiverr spokesperson told me in an email. “Regarding offering autoblogs, we do not ban freelancers offering this service because there are many legitimate uses for these types of AI website-building services—situations where freelancers may not be pulling from copyrighted sources—but rather open sources (such as governmental ones), content which is open for redistribution (museum institutions and cultural heritage), and educational portals (such as Wikimedia).”

When asked if Fiverr has any concerns about how these autoblogs might negatively impact the internet broadly, the spokesperson said “Yes; we take issues like this very seriously. We have a team of marketplace integrity experts working around the clock who use a combination of algorithms, third-party tools, and manual checks to remove services that don’t adhere to our policies and help us provide a safe environment for our community.”

ChatGPT, Sawah, Atef, and the hundreds of other Fiverr freelancers who make autoblogs for as little as $10 can’t, at this point, replace what we do as journalists, but they exist because they can monetize filling the internet with garbage. We can’t and are not really competing with them because their business models don’t work well enough for us. We sell some merch and some ads, but there’s not enough money in either of those things to support our work and lives, which is why the majority of our income is from paying subscribers who want to support us. 

"It depends on the traffic, topics, and traffic location. So it varies a lot," Sawah said when I asked how much money he thinks his clients make from these sites. 

“Search engines are now much more mature thanks to algorithms, AI, and machine learning to know if this current copy of the content was copied from a specific source and which exact source,” Atef said. “This [...] was a concern for search engine developers and now should be considered a closed issue for more than 10 years now. Check this tweet for example back to 2014. So successful sites usually do not care if someone else copied their content as it will be rare that this malicious site outranks the original content.”

“We build our ranking systems to surface high-quality content that’s made for people, not content that’s made to rank well on search engines,” a Google spokesperson told me in an email. “Our spam-fighting systems and policies are specifically designed to tackle low-quality, unoriginal content that’s created at scale to manipulate search ranking.”

Google also pointed me to the March 2024 update to its ranking systems, which aims to reduce spammy, unoriginal content on Search. It also emphasized that the presence of AI-generated content on a page is not inherently a policy violation and that it evaluates content based on its value and adherence to all other existing publisher policies, not how that content was created.

AJ Kohn, who provides SEO services to companies like Pinterest and Genius with his company Blind Five Year Old, mostly agrees. He said these autoblogs, automated websites, or “splogs” (spam blogs) as they used to be called, are at least 15 years old, and that Google has gotten quite good at sniffing them out and not placing them high in search results, most of the time. However, while Google is good at spotting these sites, it’s not perfect, and people wouldn’t be making so many of these sites if they didn’t make money.

“What I will tell you is that a lot of these things work, but they work for a very short amount of time,” Kohn told me. “But here's the deal. It doesn't need to work for a long time for it to make enough money to be worthwhile [...] I don't think people need to be that creative right now to make these things work. As long as you're smart about the targeting of it, it's probably going to stay up for quite some time. I know I have seen sites that have gotten a ton of traffic very quickly and you know they’ll go away but not before they make a boatload of money.”

Using some sites that he runs on the side for comparison, Kohn estimated that some of these sites get $20 CPM (cost per thousand views), and that these sites could easily get 500,000 clicks before Google catches on, which would earn them around $10,000.
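Kohn's estimate follows directly from the definition of CPM. The short calculation below reproduces it, treating each of the 500,000 clicks as one paid view, which is a simplifying assumption; the function name is made up for the example.

```python
def ad_revenue(views: int, cpm_dollars: float) -> float:
    """Revenue in dollars for a number of views at a given CPM (price per 1,000 views)."""
    return views / 1000 * cpm_dollars


# Figures from Kohn's estimate: 500,000 views at a $20 CPM.
estimate = ad_revenue(500_000, 20.0)
print(f"${estimate:,.0f}")
```

Set against a setup cost of a few hundred dollars, that return is why a site does not need to survive in search results for long to be worthwhile.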

The people who facilitate the tools and services that make these autoblogs possible—Fiverr, WordPress, OpenAI, Google, Sawah, Atef—are making money whether these autoblogs make money or not, and they are not hard to find.

One of the first things you see when you buy a domain and hosting services from Hostinger is its WordPress AI tool which promises to “Say goodbye to writer’s block—our AI is always having a good writing day.”

The Hostinger dashboard prompting users to try generating content for their website with AI.

RankMath, the WordPress plugin that helps users with SEO, now also offers to generate articles for SEO. Jetpack, a WordPress plugin that’s made by WordPress’s parent company Automattic, also offers an article generating AI tool that promises to help users “write smarter, not harder.” Searching the WordPress plugin directory for “AI writer” brings up 86 results for AI content generators. 

What Sawah, Atef, and generative AI more broadly have been doing to the internet for the last couple of years isn’t new. They are just pumping out more content with less thought, faster, and better than ever. It seems really bad and it is because there’s so much of it now wherever we look, but they are only taking advantage of an internet that has been hollowed out by SEO, a system for discoverability on Google that has incentivized pumping out endless articles with no effort.