Decomprolute

The goal of this package is to run and evaluate tumor deconvolution algorithms on multi-omics data. We provide the ability to assess a suite of algorithms and cell signature matrices so that you can select your algorithm in a data-driven fashion. We also provide a modular framework that enables you to add your own algorithm or cell signature matrix; to do so, please see our GitHub site.

Prepare your system
Deconvolve CPTAC data
Deconvolve your own data
Evaluate metrics on new algorithm or signature matrix
Contribute

Prepare your system

To run the code you will need to install Docker and a CWL interpreter that supports CWL v1.2, such as cwltool. These tools enable the different modules to interoperate. Once both are installed, you can test your setup by running deconvolution on a single data type as shown below:

cwltool https://raw.githubusercontent.com/PNNL-CompBio/decomprolute/main/metrics/prot-deconv.cwl --cancer hnscc --protAlg mcpcounter --sampleType tumor --signature LM7c

This will run the MCP-counter algorithm on proteomics data from the CPTAC HNSCC (head and neck squamous cell carcinoma) cohort using our LM7c signature, and confirm that your system is able to run the more complex analyses. More specific use cases are described in the sections that follow.
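
If the test command fails immediately, it can help to confirm that both prerequisites are on your PATH. The following is a minimal sketch: `check_tool` is a helper name introduced here for illustration and is not part of Decomprolute. It only checks that the commands exist, not that the Docker daemon is running or that the installed cwltool supports CWL v1.2.

```shell
# Report whether a required command-line tool is on the PATH.
# check_tool is a hypothetical helper, not part of Decomprolute.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

missing=0
for tool in docker cwltool; do
  check_tool "$tool" || missing=1
done

if [ "$missing" -eq 0 ]; then
  echo "Both tools are installed."
else
  echo "Install the missing tools before continuing."
fi
```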

Deconvolve CPTAC data

Decomprolute can be used to estimate cell-type composition in a specific CPTAC dataset, as we have included numerous publicly available datasets and algorithms within the framework. Specifically, you can run the prot-deconv.cwl script with the following arguments:

cancer: one of the tumor types listed in the CPTAC Data section.
protAlg: one of the algorithms described in the Algorithms section.
sampleType: either tumor, normal, or all.
signature: one of the signature matrices described in the Cell type signatures section.

cwltool https://raw.githubusercontent.com/PNNL-CompBio/decomprolute/main/metrics/prot-deconv.cwl --cancer hnscc --protAlg cibersort --sampleType tumor --signature LM9
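
The arguments above can be combined in a loop to deconvolve several cohorts in one pass. The sketch below only prints the commands it would run, so you can inspect them first. `deconv_cmd` and `BASE_URL` are illustrative names, and the cancer types, algorithms, and signature are arbitrary picks from the tables in this document. It is an assumption that every algorithm accepts every signature matrix.

```shell
BASE_URL="https://raw.githubusercontent.com/PNNL-CompBio/decomprolute/main"

# Print (rather than execute) one prot-deconv.cwl invocation.
# deconv_cmd is a hypothetical helper, not part of Decomprolute.
# $1 = cancer, $2 = algorithm, $3 = sample type, $4 = signature
deconv_cmd() {
  echo "cwltool $BASE_URL/metrics/prot-deconv.cwl --cancer $1 --protAlg $2 --sampleType $3 --signature $4"
}

# Illustrative selection only; adjust to the datasets and algorithms you need.
for cancer in brca hnscc luad; do
  for alg in mcpcounter cibersort; do
    deconv_cmd "$cancer" "$alg" tumor LM7c
  done
done
```

Once the printed commands look right, saving the snippet as a script and piping its output to `sh` would execute them one after another.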

CPTAC Data

This package leverages data collected through the Clinical Proteomic Tumor Analysis Consortium (CPTAC) as the foundation of its benchmarking metrics. The consortium has collected data from hundreds of patient tumors, including proteomic and transcriptomic data from the same patients. Given the general confidence in transcriptomic-based tumor deconvolution, we can use these data to compare transcriptomic and proteomic tumor deconvolution in the same patient samples.

We collected these data via the CPTAC Python API to better match the mRNA data. The CWL tools and Docker images are in the protData and mRNAdata directories.

Below are the available tumor types:

Dataset name	Description	Data reuse status	Publication link
Brca	breast cancer	no restrictions	https://pubmed.ncbi.nlm.nih.gov/33212010/
Ccrcc	clear cell renal cell carcinoma (kidney)	no restrictions	https://pubmed.ncbi.nlm.nih.gov/31675502/
Colon	colorectal cancer	no restrictions	https://pubmed.ncbi.nlm.nih.gov/31031003/
Endometrial	endometrial carcinoma (uterine)	no restrictions	https://pubmed.ncbi.nlm.nih.gov/32059776/
Gbm	glioblastoma	no restrictions	https://pubmed.ncbi.nlm.nih.gov/33577785/
Hnscc	head and neck squamous cell carcinoma	no restrictions	https://pubmed.ncbi.nlm.nih.gov/33417831/
Lscc	lung squamous cell carcinoma	password access only	unpublished
Luad	lung adenocarcinoma	no restrictions	https://pubmed.ncbi.nlm.nih.gov/32649874/
Ovarian	high grade serous ovarian cancer	no restrictions	https://pubmed.ncbi.nlm.nih.gov/27372738/
Pdac	pancreatic ductal adenocarcinoma	password access only	unpublished

As such, the supported datasets have been updated to the following: ['brca', 'ccrcc', 'endometrial', 'colon', 'ovarian', 'hnscc', 'luad']

As more datasets are published we will update the list accordingly.

Algorithms

We have included numerous algorithms in this package. Docker files and requisite data are included in the existing code base.

Algorithm	Source
cibersort	Cibersort
epic	EPIC
xcell	xCell
mcpcounter	MCP Counter

Cell type signatures

There are numerous ways to define the individual cell types we are using to run the deconvolution algorithms. We will upload specific lists to compare in our workflow.

List Name	Description	Source
LM7c	Seven cell types (B, CD4 T, CD8 T, dendritic cells, granulocytes, monocytes, NK) collapsed from proteomic data	Rieckmann et al.
3’ PBMCs	Seven cell types (B, CD4 T, CD8 T (CD8 T + NK T), dendritic cells, megakaryocytes, monocytes, NK) from scRNA-seq data	Newman et al.
LM9	Ten cell types predicted by MCPCounter signature
LM22	The original matrix from cibersort	Newman et al.

Deconvolve your own data

If you have a specific dataset you'd like to deconvolve but are not sure which tool to use, you can use the tools in the metrics directory to determine and then run the best algorithm for your data.

To identify the signature matrix/algorithm combination that agrees best between your own mRNA and protein data, you can run the following workflow, replacing the files listed in the best-test.yml file with your own.

Run the algorithm/signature matrix that correlates best between mRNA and protein

To assess which algorithm/signature matrix provides the best agreement between mRNA and protein datasets, you will need to provide two matrices from your own data as input into the run-best-alg-by-cor workflow.

We recommend replacing the two files referenced in the YAML file used below with your own mRNA and protein matrices, so that the workflow compares their correlations to find the best algorithm for your data.

cwltool https://raw.githubusercontent.com/PNNL-CompBio/decomprolute/main/metrics/mrna-prot/run-best-alg-by-cor.cwl https://raw.githubusercontent.com/PNNL-CompBio/decomprolute/main/metrics/mrna-prot/best-test.yml
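
One way to substitute your own files is to download the test YAML file, edit it locally, and pass the local copy to cwltool. The sketch below assumes `curl` is available. `run`, `BASE_URL`, and the local file name `my-best-test.yml` are names chosen here for illustration. By default it only prints the commands; set `DRY_RUN=0` to execute them. Note that relative paths in a CWL job file are resolved relative to the job file itself, so file entries copied from the repository may need to be changed to absolute paths or URLs.

```shell
BASE_URL="https://raw.githubusercontent.com/PNNL-CompBio/decomprolute/main/metrics/mrna-prot"

# Echo each command by default; set DRY_RUN=0 to actually execute it.
# run is a hypothetical helper, not part of Decomprolute.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# 1. Fetch a local copy of the example job file.
run curl -fsSL -o my-best-test.yml "$BASE_URL/best-test.yml"

# 2. Edit my-best-test.yml so its two file entries point at your own
#    mRNA and protein matrices (done by hand, outside this script).

# 3. Run the workflow against the edited local job file.
run cwltool "$BASE_URL/run-best-alg-by-cor.cwl" my-best-test.yml
```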

Run the algorithm on simulated data

To assess which algorithm/signature matrix combination performs best on simulated data, you can use either mRNA or protein data as input into the run-best-alg-by-sim workflow. Below is an example using our test data.

cwltool https://raw.githubusercontent.com/PNNL-CompBio/decomprolute/main/metrics/data-sim/run-best-alg-by-sim.cwl --datFile https://raw.githubusercontent.com/PNNL-CompBio/decomprolute/main/toy_data/ov-all-prot-reduced.tsv --data-type prot

Evaluate metrics on new algorithm or signature matrix

In the manuscript we completed three separate tests of proteomic tumor deconvolution algorithms. To benchmark your own algorithm or signature matrix, follow the Contribution guide on the main GitHub page to add to our framework, then you can run the following metrics as described in our manuscript.

Performance on simulated data

We have simulated both mRNA and proteomics data from established experiments as described below. We try to evaluate mRNA data on mRNA-derived simulations, and proteomics data on proteomics-derived simulated data. The datasets themselves are stored in the simulatedData directory.

We have included two YAML files to use as test runs of each simulation.

cwltool https://raw.githubusercontent.com/PNNL-CompBio/decomprolute/main/metrics/data-sim/simul-data-comparison.cwl https://raw.githubusercontent.com/PNNL-CompBio/decomprolute/main/metrics/data-sim/rna-sim-test.yml ##evaluate rna-based deconvolution
cwltool https://raw.githubusercontent.com/PNNL-CompBio/decomprolute/main/metrics/data-sim/simul-data-comparison.cwl https://raw.githubusercontent.com/PNNL-CompBio/decomprolute/main/metrics/data-sim/prot-sim-test.yml ##evaluate protein based deconvolution

These will produce the necessary summary statistics and figures.

mRNA-Proteomics Comparison

We also wanted to measure how consistent an algorithm is between mRNA and proteomics data. This workflow iterates through all algorithms, datasets, and signature matrices and compares how similar each cell type prediction is across mRNA and proteomic samples.

cwltool https://raw.githubusercontent.com/PNNL-CompBio/decomprolute/main/metrics/mrna-prot/mrna-prot-comparison.cwl https://raw.githubusercontent.com/PNNL-CompBio/decomprolute/main/metrics/mrna-prot/alg-test.yml

This will run the evaluation in our test YAML file. To update the parameters, create your own YAML file. The algorithm currently has five parameters:

mrna-algorithms: List of algorithms to use to deconvolve mRNA data. Each entry is one of epic, xcell, cibersort, mcpcounter.
prot-algorithms: List of algorithms to use to deconvolve protein data. Each entry is one of epic, xcell, cibersort, mcpcounter.
cancerTypes: List of cancer types.
signatures: List of signature matrices, currently found in the signature matrix directory.
tissueTypes: List of tissue types. Each entry is one of tumor, normal, or all.
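
As a rough illustration of a custom YAML file, the sketch below writes one with the five parameters listed above and prints the command that would use it. The file name `my-alg-test.yml` and the chosen values are arbitrary. It is an assumption that each parameter is a plain YAML list of the same identifiers used on the command line, so check the repository's alg-test.yml for the exact layout before relying on this.

```shell
# Write a job file containing the five documented parameters.
# Assumption: each parameter is a plain YAML list of identifiers;
# verify against alg-test.yml in the repository.
cat > my-alg-test.yml <<'EOF'
mrna-algorithms:
  - mcpcounter
  - epic
prot-algorithms:
  - mcpcounter
  - epic
cancerTypes:
  - hnscc
  - brca
signatures:
  - LM7c
tissueTypes:
  - tumor
EOF

# Print the command that would run the comparison with the new job file.
echo "cwltool https://raw.githubusercontent.com/PNNL-CompBio/decomprolute/main/metrics/mrna-prot/mrna-prot-comparison.cwl my-alg-test.yml"
```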

Pan-Immune clustering annotation

Lastly, we can cross-reference known immune subtypes with the cell types predicted by the various deconvolution algorithms to ascertain how well the predicted cell types align with immune populations.

cwltool https://raw.githubusercontent.com/PNNL-CompBio/decomprolute/main/metrics/imm-subtypes/pan-can-immune-preds.cwl https://raw.githubusercontent.com/PNNL-CompBio/decomprolute/main/metrics/imm-subtypes/imm-args.yml
