Synthetic Gastric Cancer Prediction Dataset
Patient Health Records & Digital Health
£199.99
About
The Synthetic Gastric Cancer Prediction Dataset was generated for educational and research purposes to support analysis of the clinical, pathological, and demographic factors associated with gastric cancer. Its synthetic, anonymized records cover staging, histological characteristics, and invasion patterns, all of which influence disease progression and prognosis.
Dataset Features
- Patient: Unique identifier for each individual.
- Sex: Biological sex of the patient (Male/Female).
- Age: Age of the patient (in years).
- T staging: Tumor size and extent of invasion into nearby tissues (e.g., T1, T2, T3).
- N staging: Lymph node involvement status (e.g., N0, N1, N2).
- M staging: Metastasis presence (M0: No metastasis, M1: Distant metastasis).
- Comprehensive Staging: Combined TNM stage classification.
- Histological Type: Cellular classification of the tumor (e.g., adenocarcinoma).
- Lauren Classification: Intestinal (0), Mixed (1), or Diffuse (2) type.
- Lymphovascular Invasion: Presence of cancer cells in lymphatic or blood vessels (0 = Negative, 1 = Positive).
- Venous Invasion: Tumor cells detected in veins (0 = Negative, 1 = Positive).
- Perineural Invasion: Cancer spread along or around nerves (0 = Negative, 1 = Positive).
- Stroma Quantity: Tumor stroma characteristics (Medullary = 0, Intermediate = 1, Scirrhous = 2).
- Tumor Infiltration Pattern: Qualitative assessment of the tumor's infiltration behavior.
- HER-2: Human Epidermal Growth Factor Receptor 2 status (0 = Negative, 1 = 1+, 2 = 2+, 3 = 3+).
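Several of the features above are stored as integer codes. The sketch below shows one way to turn those codes back into readable labels. The column names are assumed to match the feature names in the list, and the example record is made up for illustration; check both against the actual file header before relying on them.

```python
# Code-to-label mappings copied from the feature list above.
LAUREN = {0: "Intestinal", 1: "Mixed", 2: "Diffuse"}
BINARY = {0: "Negative", 1: "Positive"}
STROMA = {0: "Medullary", 1: "Intermediate", 2: "Scirrhous"}
HER2 = {0: "Negative", 1: "1+", 2: "2+", 3: "3+"}

# ASSUMPTION: column names mirror the feature names in the listing.
DECODERS = {
    "Lauren Classification": LAUREN,
    "Lymphovascular Invasion": BINARY,
    "Venous Invasion": BINARY,
    "Perineural Invasion": BINARY,
    "Stroma Quantity": STROMA,
    "HER-2": HER2,
}


def decode_record(record):
    """Return a copy of `record` with coded fields replaced by labels."""
    decoded = dict(record)
    for column, mapping in DECODERS.items():
        if column in decoded:
            # int() also handles string values such as those from csv.DictReader.
            code = int(decoded[column])
            decoded[column] = mapping[code]  # raises KeyError on an unknown code
    return decoded


# Hypothetical record, not taken from the dataset.
example = {
    "Patient": "P0001",
    "Sex": "Female",
    "Age": 63,
    "Lauren Classification": 2,
    "Venous Invasion": "1",
    "Stroma Quantity": 0,
    "HER-2": 3,
}
decoded_example = decode_record(example)
```

Letting an unknown code raise a KeyError is a deliberate choice here: it surfaces unexpected values early instead of silently passing them through.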
Usage
This dataset can be used for the following applications:
- Cancer Research: Study the relationship between staging, histological subtypes, and tumor invasiveness in gastric cancer.
- Predictive Modeling: Train machine learning models to predict outcomes such as HER-2 status, metastatic potential, or overall stage.
- Clinical Insight: Explore how combinations of clinical markers relate to patient prognosis and treatment planning.
- Educational Purposes: Give students and researchers hands-on experience with realistic oncology data.
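As a minimal sketch of the kind of exploratory analysis listed above, the following computes the share of positive cases of a binary marker within each group, for example lymphovascular invasion by T stage. The column names and the toy rows are assumptions for illustration only and do not come from the dataset.

```python
from collections import defaultdict


def positive_rate_by_group(records, group_col, flag_col):
    """Share of rows with flag_col == 1 within each value of group_col."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        group = row[group_col]
        totals[group] += 1
        positives[group] += int(row[flag_col])  # flag is coded 0 or 1
    return {group: positives[group] / totals[group] for group in totals}


# Hypothetical rows; real rows would come from the dataset file.
toy_rows = [
    {"T staging": "T1", "Lymphovascular Invasion": 0},
    {"T staging": "T1", "Lymphovascular Invasion": 0},
    {"T staging": "T3", "Lymphovascular Invasion": 1},
    {"T staging": "T3", "Lymphovascular Invasion": 0},
]
rates = positive_rate_by_group(toy_rows, "T staging", "Lymphovascular Invasion")
```

The same function works for any of the 0/1 coded invasion markers, grouped by any categorical column.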
Coverage
The dataset contains 100,000 synthetic entries with realistic variation in clinical and pathological features. It is fully anonymized, adheres to data privacy standards, and supports exploratory and predictive analysis across oncology and medical informatics.
License
CC0 (Public Domain)
Who Can Use It
- Medical Researchers and Oncologists: To evaluate gastric cancer risk factors and invasion markers.
- Data Scientists: To develop and benchmark predictive algorithms for cancer staging and HER-2 expression.
- Healthcare Educators and Students: As a comprehensive tool for teaching cancer data analysis and medical modeling.
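When benchmarking predictive algorithms on a target such as HER-2 status, it can help to first establish a trivial reference point. The sketch below computes a majority-class baseline, that is, the accuracy obtained by always predicting the most frequent label. The label list is invented for illustration; any trained model should be compared against the baseline computed on the real target column.

```python
from collections import Counter


def majority_baseline(labels):
    """Return (most frequent label, accuracy of always predicting it)."""
    if not labels:
        raise ValueError("labels must not be empty")
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Hypothetical HER-2 codes (0 = Negative, 1 = 1+, 2 = 2+, 3 = 3+).
toy_her2 = [0, 0, 0, 1, 2, 0, 3, 0]
baseline_label, baseline_accuracy = majority_baseline(toy_her2)
```

A model that does not beat this accuracy has learned nothing beyond the class frequencies.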