A machine learning librarian at Hugging Face just released a dataset composed of one million Bluesky posts, complete with when they were posted and who posted them, intended for machine learning research.
Daniel van Strien posted about the dataset on Bluesky on Tuesday:
“This dataset contains 1 million public posts collected from Bluesky Social's firehose API, intended for machine learning research and experimentation with social media data,” the dataset description says. “Each post contains text content, metadata, and information about media attachments and reply relationships.”
This post is for paid members only
Become a paid member for unlimited ad-free access to articles, bonus podcast content, and more.
Subscribe
Sign up for free access to this post
Free members get access to posts like this one along with an email round-up of our week's stories.
Subscribe
Already have an account? Sign in