Synthetic Breast Cancer Data

The data was generated with the Synthetic Data Generator, which produces process-based breast cancer treatment data that follows the distribution observed in a real population of breast cancer patients. The collection comprises 18 data sets: nine for relational databases and nine for RDF-based knowledge graphs. For each data format, the data sets come in three sizes:

  • Small: 1,000 patients
  • Medium: 10,000 patients
  • Large: 100,000 patients

For each data format, there are three data sets of each size (3 sizes × 3 variants = 9 data sets per format). The variants differ in the mutation probability parameter of the data generator. The lower this value, the more closely the data follows the treatment guideline for breast cancer patients with an amplified HER2 gene.
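
The structure of the collection described above can be sketched as a simple enumeration. This is only an illustration of how the 18 data sets arise from 2 formats, 3 sizes, and 3 mutation probability settings. The format keys and the mutation labels (`level_1` to `level_3`) are hypothetical placeholders, since the actual parameter values and file names are not stated here.

```python
from itertools import product

# Formats and patient counts are taken from the description above.
# The key names themselves are illustrative, not the collection's file names.
FORMATS = ["relational", "rdf"]
SIZES = {"small": 1_000, "medium": 10_000, "large": 100_000}

# Hypothetical labels: the real mutation probability values are not given
# in this description, only that there are three settings per size.
MUTATION_LEVELS = ["level_1", "level_2", "level_3"]


def enumerate_data_sets():
    """Return one (format, size, patients, mutation_level) tuple per data set."""
    return [
        (fmt, size, SIZES[size], level)
        for fmt, size, level in product(FORMATS, SIZES, MUTATION_LEVELS)
    ]


data_sets = enumerate_data_sets()
print(len(data_sets))  # 2 formats x 3 sizes x 3 settings = 18
```

Filtering the resulting list by format yields the nine data sets for relational databases or the nine for RDF-based knowledge graphs.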

BibTeX: