Impute missing values based on spatial coordinates
impute_spe() imputes missing data in the primary assay using one of
several methods. The imputed data are returned as an additional assay
in the same SPE.
Usage
impute_spe(
spe,
assay_name = NULL,
method = NULL,
group_colname,
k = NULL,
protein_missingness = NULL
)
Arguments
- spe
SPE containing data to be imputed.
- assay_name
Name of the assay containing the data to be imputed.
- method
Method of imputation to be used. See details.
- group_colname
Column name in the metadata that specifies the group information to use for group_mean or group_knn. Example: ROI_abbreviation.
- k
K value to be used for k-nearest neighbor imputation
- protein_missingness
Proportion of samples allowed to have missing data for a protein in the given imputation method. Example: when method = "mean", a protein_missingness of 0.5 means that any protein missing data for more than 50% of samples across the entire spatial tissue covered by all samples is excluded from the imputation algorithm, and that protein's missing values are not imputed. When method = "group_mean", a protein_missingness of 0.5 means that a protein must have data for at least 50% of samples in the specified group to be used in the imputation algorithm and to be imputed.
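To make the threshold concrete, here is a minimal base-R sketch of how a missingness cutoff could gate global-mean imputation. It is not the package's implementation: the helper name impute_mean_with_cutoff, the toy matrix, and the rule "impute only when the fraction of missing samples does not exceed the cutoff" are assumptions made for illustration.

```r
# Hypothetical helper (not part of the package): impute per-protein means,
# skipping proteins whose fraction of missing samples exceeds `cutoff`.
impute_mean_with_cutoff <- function(x, cutoff = 0.5) {
  # x: numeric matrix with proteins in rows and samples in columns
  frac_missing <- rowMeans(is.na(x))
  eligible <- frac_missing <= cutoff
  for (i in which(eligible)) {
    miss <- is.na(x[i, ])
    if (any(miss)) {
      # Replace missing entries with the mean of the observed entries
      x[i, miss] <- mean(x[i, ], na.rm = TRUE)
    }
  }
  x
}

# Toy data: 2 proteins x 4 samples
x <- rbind(
  protA = c(1, NA, 3, 5),   # 25% missing: eligible, gets imputed
  protB = c(NA, NA, NA, 8)  # 75% missing: above cutoff, left as NA
)
colnames(x) <- paste0("s", 1:4)

imputed <- impute_mean_with_cutoff(x, cutoff = 0.5)
print(imputed)
```

In this toy case protA's missing value is filled with the mean of its three observed values, while protB is left untouched because too many of its samples are missing.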
Details
Method options and descriptions:
zero: replace missing values with 0
median: replace missing values with global median per protein
median_half: replace missing values with 1/2 global median per protein
mean: replace missing values with global mean per protein
group_mean: replace missing values with the per-protein mean within each group (e.g. ROI)
knn: imputation based on k-nearest neighbors, with proteins as neighbors, based on data from all samples across all groups. NOTE: There will still be NA values if the protein is not expressed in this group.
group_knn: imputation based on k-nearest neighbors, with proteins as neighbors, based on data from specified group (e.g. ROI, tissue). NOTE: There will still be NA values if the protein is not expressed in this group.
spatial_knn: imputation based on k-nearest neighbors in space
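As an illustration of the idea behind neighbor-in-space imputation, the following base-R sketch fills each missing value with the mean of the same protein in the k spatially closest samples that have data. This is a sketch under stated assumptions, not the package's algorithm: the helper name impute_spatial_knn, the use of samples as neighbors, Euclidean distance, and averaging are all choices made for this example.

```r
# Hypothetical helper (not the package's code): fill each missing value with
# the mean of that protein in the k spatially nearest samples that have data.
impute_spatial_knn <- function(x, coords, k = 2) {
  # x: proteins x samples matrix; coords: samples x 2 matrix of positions
  d <- as.matrix(dist(coords))  # pairwise Euclidean distances between samples
  out <- x
  for (i in seq_len(nrow(x))) {
    observed <- which(!is.na(x[i, ]))
    if (length(observed) == 0) next  # no observed values to borrow from
    n_use <- min(k, length(observed))
    for (j in which(is.na(x[i, ]))) {
      # Rank observed samples by distance to sample j, keep the closest n_use
      ranked <- observed[order(d[j, observed])]
      nearest <- ranked[seq_len(n_use)]
      out[i, j] <- mean(x[i, nearest])
    }
  }
  out
}

# Toy layout: four samples on a line; s4 is far from the others
coords <- cbind(x = c(0, 1, 2, 10), y = c(0, 0, 0, 0))
rownames(coords) <- paste0("s", 1:4)

vals <- rbind(protA = c(2, NA, 4, 100))
colnames(vals) <- rownames(coords)

filled <- impute_spatial_knn(vals, coords, k = 2)
print(filled)
```

With k = 2, the missing value at s2 is computed from its two nearest neighbors, s1 and s3, so the distant sample s4 does not influence the result.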
Examples
data(pancMeta)
data(protMeta)
data(smallPancData)
# We can put all samples into the same object (for statistical power)
pooledData <- dplyr::bind_cols(smallPancData)
pooled.panc.spe <- convert_to_spe(pooledData,
pancMeta,
protMeta,
feature_meta_colname = "pancProts",
sample_id = ""
)
#> Spatial object created without spatial coordinate
#> column names provided. Distance based analysis will not be enabled.
#> Note: Only mapping metadata for 2986 features out of 3000 data points
# we can try two imputation methods and compare the difference
res <- impute_spe(pooled.panc.spe, method = "mean")
res2 <- impute_spe(pooled.panc.spe, method = "group_mean",
group_colname = "Image")
mean(assay(res, "imputed") - assay(res2, "imputed"), na.rm = TRUE)
#> [1] 0.003118943