CoderData is a cancer benchmark data package developed in Python and R. The package has two parts: the backend build process and the user-facing Python package. The build process is a GitHub workflow that generates five cancer datasets in a format that is easy for users and algorithms to ingest. The Python package lets users download the data, load it into Python, and reformat it as needed.
To install coderdata, run the following command in your terminal:
pip install coderdata
To download a dataset, run the following command in your terminal. Omit the --prefix argument to download all datasets.
coderdata download --prefix hcmi
To download a dataset, load it, and access its transcriptomics data in Python, run the following commands:
import coderdata as cd
cd.download_data_by_prefix('hcmi')
hcmi_data = cd.DatasetLoader('hcmi')
hcmi_data.transcriptomics
View our Usage page for full instructions.
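Once a dataset is loaded, it can help to check which data types are actually populated before building on them. The following is a minimal sketch, not part of the coderdata API: `summarize_loader` is a hypothetical helper, and every attribute name other than `transcriptomics` is an assumption made for illustration. A stand-in object is used so the sketch runs without downloading anything.

```python
from types import SimpleNamespace

# Only `transcriptomics` appears in the example above; the other
# attribute names are assumptions made for illustration.
MODALITIES = ("transcriptomics", "proteomics", "mutations", "copy_number")


def summarize_loader(loader, modalities=MODALITIES):
    """Return {modality: row count} for a loaded dataset object.

    A modality that is missing from the object maps to None.
    """
    summary = {}
    for name in modalities:
        table = getattr(loader, name, None)
        summary[name] = None if table is None else len(table)
    return summary


# Stand-in for a loaded dataset (hypothetical contents).
demo = SimpleNamespace(
    transcriptomics=[{"gene": "TP53"}, {"gene": "KRAS"}],
    mutations=[],
)
print(summarize_loader(demo))
```

With a real dataset, the same helper could be called as `summarize_loader(hcmi_data)`, provided the loaded object exposes its tables as attributes that support `len()`.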
Dataset | Cancer Types | Samples | Drugs | Transcriptomics | Proteomics | Mutations | Copy Number |
---|---|---|---|---|---|---|---|
Broad Sanger | 106 | 2053 | 56082 | 1697 | 1008 | 1729 | 1790 |
CPTAC | 10 | 1139 | 0 | 1113 | 1086 | 833 | 1024 |
HCMI | 29 | 758 | 0 | 396 | 0 | 289 | 282 |
BeatAML | 1 | 1022 | 163 | 707 | 210 | 871 | 0 |
MPNST | 1 | 50 | 25 | 35 | 6 | 29 | 32 |
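
The table above can also be used to choose a dataset by the data types it contains. The sketch below copies the counts from the table into a plain dictionary and filters on nonzero columns. The names `SUMMARY` and `datasets_with` are hypothetical and are not provided by coderdata.

```python
# Counts copied from the table above, one tuple per dataset.
COLUMNS = (
    "cancer_types", "samples", "drugs", "transcriptomics",
    "proteomics", "mutations", "copy_number",
)
ROWS = {
    "Broad Sanger": (106, 2053, 56082, 1697, 1008, 1729, 1790),
    "CPTAC": (10, 1139, 0, 1113, 1086, 833, 1024),
    "HCMI": (29, 758, 0, 396, 0, 289, 282),
    "BeatAML": (1, 1022, 163, 707, 210, 871, 0),
    "MPNST": (1, 50, 25, 35, 6, 29, 32),
}
SUMMARY = {name: dict(zip(COLUMNS, counts)) for name, counts in ROWS.items()}


def datasets_with(*required):
    """Names of datasets whose count is nonzero for every required column."""
    return [
        name for name, row in SUMMARY.items()
        if all(row[column] > 0 for column in required)
    ]


# Datasets that report both drug and proteomics counts.
print(datasets_with("drugs", "proteomics"))
```

For example, `datasets_with("copy_number")` excludes BeatAML, whose copy number count in the table is 0.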