AI TRAINING DATA

Licensed AI training datasets for machine learning, LLM fine-tuning, NLP models, generative AI, and data-driven applications.

High-quality datasets for AI model training, LLM fine-tuning, NLP systems, RAG pipelines, computer vision, predictive analytics, and generative AI applications. Includes labeled datasets, benchmarking data, and privacy-safe training data designed to improve model accuracy and accelerate AI development.

Day By Day Recovery Resources

Pre AA Addiction Trajectory Dataset 100 JSONL Records for AI Training

This dataset provides 100 structured, scene-level behavioral records modeling the developmental traj...

Number of records

100

Size

354.0 KB

SimAIHub

Simaihub Expert Navigation Foundation Pack

Simaihub Expert Navigation Foundation Pack (V1) Product: Reinforcement-learning-ready expert naviga...

Number of records

1K

Size

10.6 GB

Krampfstadt Studio

Sport Cars 2 sound recordings Audio Dataset for ML AI training

Shift into high gear with Sport Cars 2, a premium collection featuring 10 iconic sports cars and per...

Number of records

2.2K

Size

26.7 GB

Krampfstadt Studio

Sport Cars sound recordings Audio Dataset for ML AI training

Ignite your projects with the aggressive roar and precision engineering of the Sport Cars library. T...

Number of records

2.4K

Size

23.7 GB

Krampfstadt Studio

Scooter motorcycles sound recordings Audio Dataset for ML AI training

A collection of nine different scooter motorcycle sound recordings, all made by professional recorde...

Number of records

715

Size

7.9 GB

Dino

DinoDS Lane 05: Conversation Mode.

About Dino Data Conversation Mode Preview is a focused assistant-training dataset built from Lan...

Number of records

100

Size

29.0 KB

Vivameda

Foundation Intelligence

Vivameda - Longitudinal Company Evolution Panel A 70-year longitudinal panel of company workforce ...

Number of records

48M

Size

5.0 GB

JWR

JWR Sample CSV

Over 2,000 PII-free, human-written, international, professional articles including reviews: Film/DVD...

Number of records

2.1K

Size

30.6 KB

JWR

JWR 9 sample xls

Over 2,000 PII-free, human-written articles, commentaries, reviews and famous quotes from around th...

Number of records

2.1K

Size

30.6 KB

Devin Media Corp.

Forbes Magazine Archive (1917–1924) — Cleaned & AI‑Ready

Train your model on the origin story of one of the most iconic business publications in American hi...

Number of records

800

Size

31.1 MB

Dira Reliability S.L.

Industrial Electric Motor Thermography Dataset

This data product consists of a structured dataset of real thermographic inspections of electric mot...

Number of records

5.8K

Size

991.0 MB

Afrilab AI Hub

Afrilab Hausa Dictionary Dataset v1.0

The Afrilab Hausa Dictionary Dataset v1.0 is a structured lexical resource containing curated Hausa...

Number of records

3.9K

Size

2.7 MB
