CoderData is a cancer benchmark data package developed in Python and R. The package has two components: the backend build pipeline and the user-facing Python package. The build pipeline is a GitHub workflow that generates a collection of cancer datasets in a format that is easy for users and algorithms to ingest. The Python package lets users download the data, load it into Python, and reformat it as needed.
Assuming Python 3.9 or later (`python>=3.9`) is installed on the system, run the following command in a terminal to install the most recent release of the coderdata API:
```bash
$ pip install coderdata
```

A full list of available datasets can be retrieved with:

```bash
$ coderdata --list
```
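To confirm that the interpreter meets the Python 3.9 requirement and that the install succeeded, a small standard-library check like the one below can help. This is a sketch: `check_environment` is a hypothetical helper, not part of coderdata, and it reads the installed release from package metadata rather than from any coderdata API.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def check_environment(min_python=(3, 9), package="coderdata"):
    """Hypothetical helper: report whether this interpreter meets the minimum
    Python version and which release of `package` (if any) is installed."""
    python_ok = sys.version_info[:2] >= min_python
    try:
        # Reads the version recorded in the installed distribution's metadata
        installed = version(package)
    except PackageNotFoundError:
        installed = None  # the package is not installed in this environment
    return {"python_ok": python_ok, "installed_version": installed}


if __name__ == "__main__":
    print(check_environment())
```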
To download datasets, run the following command in your terminal, substituting `<DATASET>` with the desired dataset name (e.g. `beataml`). To download all datasets, use `--name all`.

```bash
$ coderdata download --name <DATASET>
```
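When downloading several datasets from a script or pipeline, the CLI command can be wrapped with `subprocess`. The sketch below only assembles and runs the `coderdata download --name <DATASET>` command documented above; the helper names are hypothetical, and dataset names other than `beataml` and `all` should be taken from `coderdata --list`.

```python
import shutil
import subprocess


def build_download_command(name):
    """Argument list for `coderdata download --name <name>` (use "all" for every dataset)."""
    return ["coderdata", "download", "--name", name]


def download_datasets(names, dry_run=False):
    """Run the download command once per dataset name.

    With dry_run=True, return the commands without executing them.
    """
    commands = [build_download_command(name) for name in names]
    if dry_run:
        return commands
    if shutil.which("coderdata") is None:
        raise FileNotFoundError("coderdata CLI not found on PATH; install it with pip first")
    for cmd in commands:
        # check=True raises CalledProcessError if the CLI exits with a non-zero status
        subprocess.run(cmd, check=True)
    return commands
```

For example, `download_datasets(["beataml"])` runs one download. Passing an argument list (rather than a shell string) avoids quoting problems with dataset names.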
To download a dataset, load it into Python, and access its tables, run the following commands:
```python
>>> import coderdata as cd
>>> cd.download(name='beataml')
>>> beataml = cd.load('beataml')
>>> beataml.experiments
         source  improve_sample_id  improve_drug_id    study  time  time_unit  dose_response_metric  dose_response_value
0       synapse               3907        SMI_11123  BeatAML    72        hrs               fit_auc               0.0564
1       synapse               3907        SMI_11211  BeatAML    72        hrs               fit_auc               0.9621
2       synapse               3907        SMI_12192  BeatAML    72        hrs               fit_auc               0.1691
3       synapse               3907        SMI_12254  BeatAML    72        hrs               fit_auc               0.4245
4       synapse               3907        SMI_12469  BeatAML    72        hrs               fit_auc               0.7397
...         ...                ...              ...      ...   ...        ...                   ...                  ...
233775  synapse               3626         SMI_7110  BeatAML    72        hrs                   dss               0.0000
233776  synapse               3626         SMI_7590  BeatAML    72        hrs                   dss               0.0000
233777  synapse               3626         SMI_8159  BeatAML    72        hrs                   dss               0.1946
233778  synapse               3626         SMI_8724  BeatAML    72        hrs                   dss               0.0000
233779  synapse               3626          SMI_987  BeatAML    72        hrs                   dss               0.7165

[233780 rows x 8 columns]
```
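As the output shows, `experiments` mixes several response metrics (`fit_auc` and `dss` here) in one long table, so a common first step is to keep a single metric and reshape it into a sample-by-drug layout. The sketch below does this on plain records, for example those returned by pandas' `DataFrame.to_dict("records")`; that `experiments` is a pandas DataFrame is an assumption based on its printed form. `response_matrix` is a hypothetical helper, and averaging repeated (sample, drug) measurements is a design choice of this sketch, not documented CoderData behavior.

```python
from collections import defaultdict


def response_matrix(records, metric="fit_auc"):
    """Hypothetical helper: map improve_sample_id -> {improve_drug_id: value}
    for one dose_response_metric, averaging repeated (sample, drug) pairs."""
    values = defaultdict(lambda: defaultdict(list))
    for row in records:
        if row["dose_response_metric"] != metric:
            continue  # skip rows reported with a different metric
        values[row["improve_sample_id"]][row["improve_drug_id"]].append(
            row["dose_response_value"]
        )
    # Collapse each list of measurements to its mean
    return {
        sample: {drug: sum(v) / len(v) for drug, v in drugs.items()}
        for sample, drugs in values.items()
    }


# Rows taken from the beataml.experiments output above
rows = [
    {"improve_sample_id": 3907, "improve_drug_id": "SMI_11123",
     "dose_response_metric": "fit_auc", "dose_response_value": 0.0564},
    {"improve_sample_id": 3907, "improve_drug_id": "SMI_11211",
     "dose_response_metric": "fit_auc", "dose_response_value": 0.9621},
    {"improve_sample_id": 3626, "improve_drug_id": "SMI_8159",
     "dose_response_metric": "dss", "dose_response_value": 0.1946},
]
print(response_matrix(rows))  # {3907: {'SMI_11123': 0.0564, 'SMI_11211': 0.9621}}
```

With pandas, `experiments[experiments["dose_response_metric"] == "fit_auc"].pivot_table(index="improve_sample_id", columns="improve_drug_id", values="dose_response_value", aggfunc="mean")` produces the equivalent wide table directly.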
For more in-depth instructions, see our Usage page.

The table below summarizes the datasets currently included in CoderData.
Dataset | Cancer Types | Samples | Drugs | Transcriptomics | Proteomics | Mutations | Copy Number |
---|---|---|---|---|---|---|---|
Broad Sanger | 106 | 2053 | 56082 | 1697 | 1008 | 1729 | 1790 |
CPTAC | 10 | 1139 | 0 | 1113 | 1086 | 833 | 1024 |
HCMI | 29 | 758 | 0 | 396 | 0 | 289 | 282 |
BeatAML | 1 | 1022 | 163 | 707 | 210 | 871 | 0 |
MPNST | 1 | 50 | 25 | 35 | 6 | 29 | 32 |
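The counts show that CPTAC and HCMI contain no drug data, so only Broad Sanger, BeatAML, and MPNST are usable for drug-response benchmarks. A minimal sketch of that kind of selection follows, with the counts copied from the table. The column keys and the `select_datasets` helper are hypothetical, and the names are the table's labels, not necessarily the identifiers that `coderdata download --name` expects.

```python
# Counts copied from the dataset table above.
COLUMNS = ("cancer_types", "samples", "drugs", "transcriptomics",
           "proteomics", "mutations", "copy_number")
DATASET_COUNTS = {
    name: dict(zip(COLUMNS, counts))
    for name, counts in {
        "Broad Sanger": (106, 2053, 56082, 1697, 1008, 1729, 1790),
        "CPTAC": (10, 1139, 0, 1113, 1086, 833, 1024),
        "HCMI": (29, 758, 0, 396, 0, 289, 282),
        "BeatAML": (1, 1022, 163, 707, 210, 871, 0),
        "MPNST": (1, 50, 25, 35, 6, 29, 32),
    }.items()
}


def select_datasets(required, counts=DATASET_COUNTS):
    """Return dataset names whose count is non-zero for every required column."""
    return [name for name, row in counts.items() if all(row[col] > 0 for col in required)]


print(select_datasets(["drugs", "transcriptomics"]))  # ['Broad Sanger', 'BeatAML', 'MPNST']
```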