CoderData Cancer Omics and Drug Experiment Response Data (`coderdata`) Python Package

Contribute to CoderData

CoderData is designed to be a customizable resource that you can alter and extend to suit your own needs.

Issues with current version

CoderData is a work in progress. If you have a specific request or find a bug, please file an issue on our GitHub page so that we can discuss the details and how to resolve it. If you would like to build a new feature to address the issue, you are welcome to fork the repository and open a pull request where it can be discussed in more detail. The CoderData team triages issues and pull requests as they are received.

Add your own data

CoderData is designed to be federated, so you can build your own dataset and access it locally. Below is an image of the current CoderData framework. Each dataset is processed by a single Docker image with a series of standard scripts.

[Figure: CoderData build]

Documentation and steps

To add your own data, you must add a Docker image with the following constraints:

  1. Be named Dockerfile.[yourdataset] and reside in the /build/docker directory
  2. Provide scripts named build_omics.sh, build_samples.sh, build_drugs.sh, and build_exp.sh
  3. Create tables that mirror the schema described by the LinkML YAML file.

The full process is documented on our GitHub site under ‘Adding a new dataset’.
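As a rough illustration, the first two constraints above can be checked before opening a pull request. The sketch below is not part of the `coderdata` package: the function name `missing_build_files` is hypothetical, and it assumes the four build scripts live in a `build/[yourdataset]` directory, which this page does not specify. Adjust the script location to match the 'Adding a new dataset' instructions.

```python
from pathlib import Path

# Script names required by constraint 2 above.
REQUIRED_SCRIPTS = (
    "build_omics.sh",
    "build_samples.sh",
    "build_drugs.sh",
    "build_exp.sh",
)


def missing_build_files(repo_root, dataset, script_dir=None):
    """Return the expected build files that do not exist yet.

    Hypothetical helper, not a CoderData API.
    """
    root = Path(repo_root)

    # Constraint 1: Dockerfile.[yourdataset] inside build/docker.
    expected = [root / "build" / "docker" / f"Dockerfile.{dataset}"]

    # ASSUMPTION: scripts sit in build/<dataset> unless script_dir is given.
    scripts = Path(script_dir) if script_dir else root / "build" / dataset
    expected.extend(scripts / name for name in REQUIRED_SCRIPTS)

    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    # "mydataset" is a placeholder dataset name.
    for path in missing_build_files(".", "mydataset"):
        print(f"missing: {path}")
```

This only confirms that the files exist. It does not check constraint 3, that the generated tables follow the LinkML schema.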

Considerations for building a dataset

Considerations to include are:

Lastly, check out the examples! We have numerous Dockerfiles in our Dockerfile directory and multiple datasets in our build directory.


Your contributions are essential to the growth and improvement of CoderData. We look forward to collaborating with you!