
Someone Made a Dataset of One Million Bluesky Posts for 'Machine Learning Research'

A Hugging Face employee made a huge dataset of Bluesky posts, and it’s already very popular. 
A machine learning librarian at Hugging Face just released a dataset of one million Bluesky posts, including when each was posted and who posted it, intended for machine learning research.

Daniel van Strien posted about the dataset on Bluesky on Tuesday:

First dataset for the new @huggingface.bsky.social @bsky.app community organisation: one-million-bluesky-posts 🦋 📊 1M public posts from Bluesky's firehose API 🔍 Includes text, metadata, and language predictions 🔬 Perfect to experiment with using ML for Bluesky 🤗 huggingface.co/datasets/blu...

Daniel van Strien (@danielvanstrien.bsky.social) 2024-11-26T13:50:34.824Z

“This dataset contains 1 million public posts collected from Bluesky Social's firehose API, intended for machine learning research and experimentation with social media data,” the dataset description says. “Each post contains text content, metadata, and information about media attachments and reply relationships.” 
