Synthetic Oral Cancer Prediction Dataset
Patient Health Records & Digital Health
£199.99
About
The Synthetic Oral Cancer Prediction Dataset is designed for educational and research use in analyzing the factors associated with oral cancer risk, progression, and treatment outcomes. It contains anonymized, synthetic clinical, lifestyle, and demographic data for individuals both with and without an oral cancer diagnosis.
Dataset Features
- ID: Unique identifier for each participant.
- Country: Country of residence of the participant.
- Age: Age of the participant (in years).
- Gender: Gender of the participant (Male/Female).
- Tobacco Use: History of tobacco use (Yes/No).
- Alcohol Consumption: History of alcohol consumption (Yes/No).
- HPV Infection: Presence of human papillomavirus infection (Yes/No).
- Betel Quid Use: History of betel quid use (Yes/No).
- Chronic Sun Exposure: History of chronic sun exposure (Yes/No).
- Poor Oral Hygiene: Poor oral hygiene habits (Yes/No).
- Diet (Fruits & Vegetables Intake): Intake of fruits and vegetables (Yes/No).
- Family History of Cancer: Family history of cancer (Yes/No).
- Compromised Immune System: Whether the participant has a compromised immune system (Yes/No).
- Oral Lesions: Presence of oral lesions (Yes/No).
- Unexplained Bleeding: Presence of unexplained bleeding (Yes/No).
- Difficulty Swallowing: Difficulty in swallowing (Yes/No).
- White or Red Patches in Mouth: Presence of white or red patches in the mouth (Yes/No).
- Tumor Size (cm): Size of the tumor in centimeters.
- Cancer Stage: Stage of the oral cancer (1-4).
- Treatment Type: Type of treatment received (e.g., Surgery, Radiation, Chemotherapy).
- Survival Rate (5-Year, %): 5-year survival rate, as a percentage.
- Cost of Treatment (USD): Total cost of treatment in USD.
- Economic Burden (Lost Workdays per Year): Economic burden, measured as the number of workdays lost per year.
- Early Diagnosis: Whether the cancer was diagnosed early (Yes/No).
- Oral Cancer (Diagnosis): Diagnosis of oral cancer (Yes/No).
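As a minimal sketch of turning the fields above into model-ready values, the Python below reads the data with the standard library `csv` module and encodes each Yes/No field as 1/0. It assumes the data ships as a CSV file whose header row uses exactly the field names listed above; that layout and the helper names `encode_row` and `load_rows` are hypothetical, not part of the dataset's documentation. The Diet field is deliberately left as a string, because its description (a frequency) and its stated Yes/No encoding do not clearly agree.

```python
import csv
from typing import Dict, Iterable, List, Union

# Assumption: CSV headers match the documented field names exactly.
# Fields documented as Yes/No; encode_row maps them to 1/0.
YES_NO_COLUMNS = [
    "Tobacco Use",
    "Alcohol Consumption",
    "HPV Infection",
    "Betel Quid Use",
    "Chronic Sun Exposure",
    "Poor Oral Hygiene",
    "Family History of Cancer",
    "Compromised Immune System",
    "Oral Lesions",
    "Unexplained Bleeding",
    "Difficulty Swallowing",
    "White or Red Patches in Mouth",
    "Early Diagnosis",
    "Oral Cancer (Diagnosis)",
]

# Fields documented as numeric quantities; parsed as floats.
NUMERIC_COLUMNS = [
    "Age",
    "Tumor Size (cm)",
    "Survival Rate (5-Year, %)",
    "Cost of Treatment (USD)",
    "Economic Burden (Lost Workdays per Year)",
]

Encoded = Dict[str, Union[int, float, str]]


def encode_row(row: Dict[str, str]) -> Encoded:
    """Convert one raw CSV row (all strings) into typed values."""
    # Start from a copy so ID, Country, Gender, Treatment Type and the
    # Diet field are kept unchanged as strings.
    out: Encoded = dict(row)
    for col in YES_NO_COLUMNS:
        value = row[col].strip()
        if value not in ("Yes", "No"):
            raise ValueError(f"{col}: expected 'Yes' or 'No', got {value!r}")
        out[col] = 1 if value == "Yes" else 0
    for col in NUMERIC_COLUMNS:
        out[col] = float(row[col])
    # Cancer Stage is documented as an integer stage (1-4).
    out["Cancer Stage"] = int(row["Cancer Stage"])
    return out


def load_rows(lines: Iterable[str]) -> List[Encoded]:
    """Parse CSV text from an open file or any iterable of lines."""
    return [encode_row(row) for row in csv.DictReader(lines)]
```

With a local copy of the data, `load_rows(f)` can be called on a file opened with `open(path, newline="")`; raising `ValueError` on an unexpected value surfaces encoding mismatches early instead of silently treating them as "No".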
Usage
This dataset can be used for the following applications:
- Cancer Research: Investigate the relationship between lifestyle, clinical, and demographic factors and oral cancer risk and progression.
- Predictive Modeling: Build machine learning models to predict cancer diagnosis, survival rate, or treatment outcomes based on participant data.
- Healthcare and Public Health: Study the impact of lifestyle factors (e.g., tobacco, alcohol, diet) on the development and progression of oral cancer.
- Educational Purposes: Provide a dataset for students and researchers in oncology, medical data science, and public health fields to analyze cancer risk factors and treatment outcomes.
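For the research and public-health uses above, a common first step is to compare the diagnosis rate between participants with and without a given factor. The sketch below computes the relative risk: the share of "Yes" diagnoses among participants with the factor divided by the share among those without it. It works on raw rows as produced by `csv.DictReader` and assumes the Yes/No fields hold the literal strings "Yes" and "No"; the function name `relative_risk` is hypothetical. Because the records are synthetic, any ratio it reports reflects how the data were generated and should not be read as a real-world risk estimate.

```python
from typing import Dict, Iterable, List


def relative_risk(rows: Iterable[Dict[str, str]], factor: str,
                  outcome: str = "Oral Cancer (Diagnosis)") -> float:
    """Risk of outcome == "Yes" when factor == "Yes", divided by the
    risk of outcome == "Yes" when factor == "No"."""
    # factor level -> [number of cases, number of participants]
    counts: Dict[str, List[int]] = {"Yes": [0, 0], "No": [0, 0]}
    for row in rows:
        level = row[factor].strip()
        if level not in counts:
            continue  # skip rows whose factor value is not Yes/No
        counts[level][1] += 1
        if row[outcome].strip() == "Yes":
            counts[level][0] += 1
    cases_exposed, n_exposed = counts["Yes"]
    cases_unexposed, n_unexposed = counts["No"]
    if n_exposed == 0 or n_unexposed == 0 or cases_unexposed == 0:
        raise ValueError("relative risk is undefined for these counts")
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)
```

For example, passing `csv.DictReader(f)` for an open copy of the data and `"Tobacco Use"` as the factor returns a value above 1 if tobacco users in the file are diagnosed more often than non-users. Relative risk is used here instead of a fitted model because it needs no extra libraries and is easy to check by hand.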
Coverage
This synthetic dataset is fully anonymized and complies with data privacy standards. It includes a wide array of factors that support diverse research and analysis in the oncology and public health domains.
License
CC0 (Public Domain)
Who Can Use It
- Cancer Researchers: To explore correlations between lifestyle factors, clinical features, and treatment outcomes in oral cancer.
- Oncologists and Healthcare Providers: To analyze the effectiveness of different treatments and factors that affect prognosis and survival.
- Public Health Professionals: To study the broader societal and economic impacts of oral cancer and develop preventive measures.
- Data Scientists and Machine Learning Practitioners: To develop predictive models for diagnosing oral cancer and improving treatment planning.
- Educators and Students: As a resource for studying cancer risk analysis, healthcare data science, and public health analytics.