Wikipedia Articles
About
Access a wealth of information, including article titles, raw text, images, and structured references. Popular use cases include knowledge extraction, trend analysis, and content development.
The Wikipedia Articles dataset covers a vast collection of articles across a wide range of topics, from history and science to culture and current events. It provides structured data on articles, categories, and revision histories, enabling deep analysis of trends, knowledge gaps, and content evolution.
Tailored for researchers, data scientists, and content strategists, the dataset supports in-depth exploration of article evolution, topic popularity, and interlinking patterns. Whether you are studying public knowledge trends, performing sentiment analysis, or developing content strategies, it offers a rich resource for understanding how information is shared and consumed globally.
Dataset Features
- url: Direct URL to the original Wikipedia article.
- title: The title or name of the Wikipedia article.
- table_of_contents: A list or structure outlining the article's sections and hierarchy.
- raw_text: Unprocessed full text content of the article.
- cataloged_text: Cleaned and structured version of the article’s content, optimized for analysis.
- images: Links or data on images embedded in the article.
- see_also: Related articles linked under the article's "See also" section.
- references: Sources cited in the article for credibility.
- external_links: Links to external websites or resources mentioned in the article.
- categories: Tags or groupings classifying the article by topic or domain.
- timestamp: Last edit date or revision time of the article snapshot.
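As a quick sanity check, the schema above can be validated when loading a delivery. Below is a minimal sketch using Python's standard csv module; the inline sample row is made up for illustration and is not part of the product.

```python
import csv
import io

# The 11 documented columns, in order.
EXPECTED_COLUMNS = [
    "url", "title", "table_of_contents", "raw_text", "cataloged_text",
    "images", "see_also", "references", "external_links", "categories",
    "timestamp",
]

# A tiny in-memory sample standing in for a real CSV delivery.
sample = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "https://en.wikipedia.org/wiki/Data,Data,['History'],Data is...,"
    "Data is...,[],['Information'],['ref 1'],[],['Computing'],2024-01-01\n"
)

reader = csv.DictReader(sample)
rows = list(reader)

assert reader.fieldnames == EXPECTED_COLUMNS  # schema matches the listing
print(rows[0]["title"], rows[0]["timestamp"])  # Data 2024-01-01
```

For real files, replace the in-memory sample with `open("your_delivery.csv", newline="")`.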
Distribution
- Data Volume: 11 columns and 2.19M rows
- Format: CSV
Usage
This dataset supports a wide range of applications:
- Knowledge Extraction: Identify key entities, relationships, or events from Wikipedia content.
- Content Strategy & SEO: Discover trending topics and content gaps.
- Machine Learning: Train NLP models (e.g., summarisation, classification, QA systems).
- Historical Trend Analysis: Study how public interest in topics changes over time.
- Link Graph Modeling: Understand how information is interconnected.
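For instance, the link graph modeling use case can be sketched by turning the see_also column into an adjacency list. The rows below are made up, and the assumption that see_also is serialized as a Python-style list string is mine; adapt the parsing to the actual delivery format.

```python
import ast
from collections import defaultdict

# Hypothetical (title, see_also) pairs as they might appear in the CSV,
# assuming see_also is stored as a Python-style list string.
rows = [
    ("Data science", "['Machine learning', 'Statistics']"),
    ("Machine learning", "['Statistics']"),
    ("Statistics", "[]"),
]

# Build a directed adjacency list: article -> articles it links to.
graph = defaultdict(list)
for title, see_also in rows:
    for target in ast.literal_eval(see_also):
        graph[title].append(target)

# In-degree = how often an article is referenced: a crude popularity signal.
in_degree = defaultdict(int)
for targets in graph.values():
    for target in targets:
        in_degree[target] += 1

print(dict(in_degree))  # {'Machine learning': 1, 'Statistics': 2}
```

From here, the adjacency list can feed a proper graph library (e.g. networkx) for centrality or community analysis.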
Coverage
- Geographic Coverage: Global (multi-language Wikipedia versions also available)
- Time Range: Continuous updates; snapshots available from the early 2000s to the present.
License
CUSTOM
Please review the respective licenses below:
- Data Provider's License
Who Can Use It
- Data Scientists: For training or testing NLP and information retrieval systems.
- Researchers: For computational linguistics, social science, or digital humanities.
- Businesses: To enhance AI-powered content tools or customer insight platforms.
- Educators/Students: For building projects, conducting research, or studying knowledge systems.
Suggested Dataset Names
- Wikipedia Corpus+
- Wikipedia Stream Dataset
- Wikipedia Knowledge Bank
- Open Wikipedia Dataset
Pricing
Based on delivery frequency
Up to ~$0.0025 per record; minimum order $250
Approximately 283 new records are added each month.
Approximately 1.12M records are updated each month.
- Complete dataset: every delivery includes all records.
- Smart Updates: retrieve only the data you need.
- Monthly
New snapshot each month, 12 snapshots/year
Paid monthly
- Quarterly
New snapshot each quarter, 4 snapshots/year
Paid quarterly
- Bi-annual
New snapshot every 6 months, 2 snapshots/year
Paid twice a year
- One-time purchase
New snapshot one-time delivery
Paid once
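Taking the listed figures at face value (up to $0.0025 per record, $250 minimum order, 2.19M rows), a back-of-the-envelope cost sketch:

```python
# Figures from the listing; actual pricing may vary by negotiation and volume.
PRICE_PER_RECORD = 0.0025   # USD, upper bound per record
MIN_ORDER = 250.0           # USD, minimum order value
TOTAL_ROWS = 2_190_000      # listed dataset size

# Smallest order at the cap price, and the cost of the full dataset.
min_records = MIN_ORDER / PRICE_PER_RECORD
full_dataset_cost = TOTAL_ROWS * PRICE_PER_RECORD

print(f"{min_records:,.0f} records")   # 100,000 records
print(f"${full_dataset_cost:,.2f}")    # $5,475.00
```

So the $250 minimum buys roughly 100,000 records at the cap price, and a one-time purchase of all 2.19M rows would cost at most about $5,475 before any volume discount.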