Thousands of Internal AI Training Datasets, Tools Exposed to Anyone on the Internet

Oct 9, 2024 at 10:35 AM

“In addition to the ML models themselves, the exposed data can include training datasets, hyperparameters, and sometimes even raw data used to build models,” a security researcher said.

Image: 404 Media.

Thousands of machine learning tools, including some belonging to large tech companies, are exposed to the open internet, letting anyone interact with them and potentially exposing sensitive data, according to a security researcher who shared their findings with 404 Media.

The news shows that even while companies and researchers barrel ahead with artificial intelligence research, securing those tools sometimes boils down to the same old account security and authentication best practices that apply to other types of accounts.

“In addition to the ML models themselves, the exposed data can include training datasets, hyperparameters, and sometimes even raw data used to build models,” Charan Akiri, the security researcher and a lead security engineer at Reddit, said in a write-up of his research.