CoderData Cancer Omics and Drug Experiment Response Data (`coderdata`) Python Package

Introduction

CoderData is a cancer benchmark data package developed in Python and R. It has two components: the backend build and the user-facing Python package. The build is a GitHub workflow that generates four cancer datasets in a format that users and algorithms can easily ingest. The Python package lets users download the data, load it into Python, and reformat it as needed.
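As an illustration of the "reformat it as needed" step, the sketch below pivots long-format omics records into a dense sample-by-gene matrix. It is a minimal sketch that assumes each record is a hypothetical (sample_id, entrez_id, value) triple; it does not use the coderdata API, and the real CoderData file layout and column names may differ. The values in the example are toy numbers, not CoderData data.

```python
from collections import defaultdict


def long_to_wide(records):
    """Pivot hypothetical (sample_id, entrez_id, value) triples into sample -> {gene: value}."""
    wide = defaultdict(dict)
    for sample_id, entrez_id, value in records:
        wide[sample_id][entrez_id] = value
    return dict(wide)


def to_matrix(wide, fill=float("nan")):
    """Return sorted sample IDs, sorted gene IDs, and a row-major matrix.

    Genes missing for a sample are filled with `fill`.
    """
    samples = sorted(wide)
    genes = sorted({gene for row in wide.values() for gene in row})
    matrix = [[wide[s].get(g, fill) for g in genes] for s in samples]
    return samples, genes, matrix


# Toy records (illustrative only): sample 2 has no value for gene 672.
records = [(1, 7157, 5.2), (1, 672, 3.1), (2, 7157, 4.8)]
samples, genes, matrix = to_matrix(long_to_wide(records), fill=None)
```

Here `samples` is `[1, 2]`, `genes` is `[672, 7157]`, and `matrix` is `[[3.1, 5.2], [None, 4.8]]`. Keeping the long format as the stored representation and pivoting on demand avoids materializing sparse, mostly empty matrices.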

HCMI Summary

Human Cancer Models Initiative (HCMI) data were collected through the National Cancer Institute (NCI) Genomic Data Commons (GDC) Data Portal. The data span numerous cancer types and cover cell line, organoid, and tumor samples. They include transcriptomics, somatic mutation, and copy number datasets.

Dataset           Unique Entrez IDs   Unique Sample IDs
Transcriptomics   19298               479
Mutations         17702               368
Copy number       19316               350
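The counts above are numbers of distinct Entrez gene IDs and distinct sample IDs per dataset. The sketch below shows one way such a summary could be computed; it assumes each dataset is a hypothetical list of (sample_id, entrez_id) pairs and is not the script the authors used. The toy records are illustrative and do not reproduce the table.

```python
def summarize(datasets):
    """For each dataset name, count distinct Entrez IDs and distinct sample IDs.

    `datasets` maps a dataset name to a list of hypothetical
    (sample_id, entrez_id) pairs.
    """
    summary = {}
    for name, pairs in datasets.items():
        entrez_ids = {entrez for _sample, entrez in pairs}
        sample_ids = {sample for sample, _entrez in pairs}
        summary[name] = (len(entrez_ids), len(sample_ids))
    return summary


# Toy input (illustrative only, not HCMI data).
toy = {
    "Transcriptomics": [(1, 7157), (1, 672), (2, 7157)],
    "Mutations": [(1, 7157), (3, 7157)],
}
result = summarize(toy)
for name, (n_genes, n_samples) in result.items():
    print(f"{name:<18}{n_genes:<20}{n_samples}")
```

Counting with sets means repeated rows (for example, several mutations in the same gene for one sample) contribute only once to each total.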

Visualization

[Images: HCMI Figure; HCMI Circos plot]