CoderData Cancer Omics and Drug Experiment Response Data (`coderdata`) Python Package

Introduction

CoderData is a cancer benchmark data package developed in Python and R. It has two parts: a backend build process and a user-facing Python package. The build process is a GitHub workflow that generates four cancer datasets in a format that is easy for users and algorithms to ingest. The Python package lets users download the data, load it into Python, and reformat it as needed.

Installation and Usage

Bash / Command Line

To install coderdata, simply run the following command in your terminal:

pip install coderdata
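
To confirm that the install worked, one option is to ask Python for the installed version. This is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical and not part of coderdata.

```python
from importlib import metadata


def installed_version(package="coderdata"):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("coderdata is not installed in this environment")
else:
    print(f"coderdata {version} is installed")
```
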
Bash / Command Line

To download datasets, run the following command in your terminal. Omit the --prefix argument to download all datasets.

coderdata download --prefix hcmi
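
If the download needs to run from a script rather than by hand, the same command can be assembled in Python. The sketch below assumes only the `coderdata download` command and `--prefix` argument shown above; the function names `build_download_command` and `download` are hypothetical wrappers, not part of the package.

```python
import subprocess


def build_download_command(prefix=None):
    """Build the argument list for `coderdata download`.

    With no prefix, the --prefix argument is left out, which downloads all datasets.
    """
    command = ["coderdata", "download"]
    if prefix is not None:
        command += ["--prefix", prefix]
    return command


def download(prefix=None):
    """Run the download; raises CalledProcessError if the command exits non-zero."""
    subprocess.run(build_download_command(prefix), check=True)
```

Calling `download("hcmi")` runs the same command as the terminal example, and `download()` fetches every dataset.
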
Python

To download, load, and access datasets in Python, run the following commands.

import coderdata as cd

cd.download_data_by_prefix('hcmi')

hcmi_data = cd.DatasetLoader('hcmi')

hcmi_data.transcriptomics
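
Once a dataset is loaded, a quick check of which data types are present can be useful. The sketch below is written under assumptions: only the `transcriptomics` attribute is confirmed by the example above, the other attribute names are guesses based on the dataset table, and each attribute is assumed to be a table-like object with a `shape` (such as a pandas DataFrame). The helper `describe_loaded_data` is hypothetical, and a stand-in object replaces a real loader so the snippet runs without downloading anything.

```python
from types import SimpleNamespace

# Only `transcriptomics` appears in the example above; the other names are assumptions.
CANDIDATE_ATTRIBUTES = ("transcriptomics", "proteomics", "mutations", "copy_number")


def describe_loaded_data(loader, names=CANDIDATE_ATTRIBUTES):
    """Map each attribute name to its table shape, or None if it is missing."""
    summary = {}
    for name in names:
        table = getattr(loader, name, None)
        # Missing attributes and objects without a `shape` both report None.
        summary[name] = getattr(table, "shape", None)
    return summary


# Stand-in for a loaded dataset, used here instead of cd.DatasetLoader('hcmi').
stand_in = SimpleNamespace(transcriptomics=SimpleNamespace(shape=(3, 2)))
print(describe_loaded_data(stand_in))
```

With a real loader, `describe_loaded_data(hcmi_data)` would report a shape for each attribute that exists and `None` for the rest.
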

View our Usage page for full instructions.

Datasets

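| Dataset | Cancer Types | Samples | Drugs | Transcriptomics | Proteomics | Mutations | Copy Number |
|---|---|---|---|---|---|---|---|
| Broad Sanger | 106 | 2053 | 56082 | 1697 | 1008 | 1729 | 1790 |
| CPTAC | 10 | 1139 | 0 | 1113 | 1086 | 833 | 1024 |
| HCMI | 29 | 758 | 0 | 396 | 0 | 289 | 282 |
| BeatAML | 1 | 1022 | 163 | 707 | 210 | 871 | 0 |
| MPNST | 1 | 50 | 25 | 35 | 6 | 29 | 32 |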
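
The counts in the table above can also be queried programmatically, for example to find which datasets include a given data type. This is a sketch that hard-codes the table values; the names `COLUMNS`, `DATASETS`, `datasets_with`, and `total` are hypothetical and are not part of the coderdata API.

```python
# Column order matches the table above, excluding the dataset name.
COLUMNS = (
    "cancer_types", "samples", "drugs", "transcriptomics",
    "proteomics", "mutations", "copy_number",
)

# Counts copied from the table above.
DATASETS = {
    "Broad Sanger": (106, 2053, 56082, 1697, 1008, 1729, 1790),
    "CPTAC": (10, 1139, 0, 1113, 1086, 833, 1024),
    "HCMI": (29, 758, 0, 396, 0, 289, 282),
    "BeatAML": (1, 1022, 163, 707, 210, 871, 0),
    "MPNST": (1, 50, 25, 35, 6, 29, 32),
}


def datasets_with(column):
    """Return the names of datasets whose count for `column` is non-zero."""
    index = COLUMNS.index(column)
    return [name for name, counts in DATASETS.items() if counts[index] > 0]


def total(column):
    """Sum one column of the table across all datasets."""
    index = COLUMNS.index(column)
    return sum(counts[index] for counts in DATASETS.values())


print(datasets_with("drugs"))
print(total("samples"))
```
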

Data Overview

[Figures: Summary 1, Summary 2]