About

The goal of this vignette is to showcase integration between two different omes in the same spatial context. For this we use a dataset from sections of rat brain measured via mass spetrometry imaging (MSI) to identify lipidomic measurements alongside proteomics from regions of interest from Vandergrift et al.

We show how spammR enables: - download and loading of datasets - visualization of features within each dataset - identification of correlated regions in this dataset

First we load the required packages.

Load proteomics data

The proteomics data can be found on zenodo and downloaded directly. From there.

Format data

We require formatting the data as a matrix for loading.

path <- tempfile()
bfc <- BiocFileCache(path, ask = FALSE)

Now we download the data we need.

#brain data
finame <- paste0('https://zenodo.org/records/13345212/files/DESI+spatial',
                 '%20proteomics%20rat%20brain%20proteomics%20',
                 'result.xlsx?download=1')


pdat <- bfcadd(bfc, "proteomics", fpath=finame)
#read in data, convert to matrix 
dat <- readxl::read_xlsx(pdat)#'dat.xlsx') 

feature_data <- select(dat, c(Protein,`Protein ID`,`Entry Name`,Gene)) |>
  dplyr::distinct() |>
  tibble::column_to_rownames('Protein')

dat <- select(dat, c(Protein, starts_with('ROI'))) |>
  tibble::column_to_rownames('Protein')

The image itself we downloaded separately from metaspace (see below) and stored it in the package.

##read in image
img <- system.file("extdata","brain_img_0.png",package = 'spammR')

Now that the image is loaded we can visualize it (not done here to save space) to verify it is what we expect.

cowplot::ggdraw() + cowplot::draw_image(system.file("extdata",
                                                    "brain_img_0.png",
                                                    package = "spammR"))

#ggsave('brain_image.pdf')

As you can see, there are squares chopped out of the image. These were the samples that were measured via proteomics. The next step is to map these coordinates so we can compare them with the MSI data.

Loading image coordinates rom GEOJson

We used the program [QuPath][(https://qupath.readthedocs.io/en/stable/) to collect the regions of interest (ROI) coordinates on the image. This program allowed us to highlight the regions of interest and export them in GEOJson format, which can then be used as input into spammR.

We store the GEOJson file in the package for you to load with this example should you choose to load teo GEOJson using the geojsonR package.

#remove this once we can include it in spammR
 
  library(geojsonR)
  jsondat <- FROM_GeoJson(system.file('extdata','brain_roi.geojson',
                                      package = 'spammR'))
 
  ##herr er hry yhr
  coords <- do.call(rbind, lapply(jsondat$features,function(x){
     roi <- x$properties$name
     xv <- x$geometry$coordinates[,1]
     yv <- x$geometry$coordinates[,2]
     if (is.null(roi))
        roi = ""
     return(list(ID = roi, x_pixels = min(xv), y_pixels = min(yv),
                  spot_height = max(yv) - min(yv),
                 spot_width = max(xv) - min(xv)))
     })) |>
     as.data.frame() |>
     subset(ID != "") |>
     tidyr::separate(ID,into = c('ROI','Replicate'), 
                     sep = '_', 
                     remove = FALSE) |>
     tibble::remove_rownames() |>
     tibble::column_to_rownames('ID')
 
 ##now for each ROI we want an x, y, cell height and cell_width
 #y-coordinates are from top, so need to udpate
  coords$y_pixels = 263 - unlist(coords$y_pixels) - unlist(coords$spot_height)
  coords$x_pixels = unlist(coords$x_pixels)
  coords$spot_height = unlist(coords$spot_height)
  coords$spot_width = unlist(coords$spot_width)
  write.csv(as.data.frame(coords),file='../inst/extdata/bcoords.csv')

To avoid this we just load the csv directly:

coords <- read.csv(system.file('extdata','bcoords.csv',
                               package = 'spammR'))
rownames(coords) <- paste(coords$ROI, coords$Replicate,sep = '_')

Now that we have the coordinates we can use spammR to create the SpatialExperiment object and plot a protein.

Create SpatialExperiment Object

Here we load the proteomics and create the object.

   #create an SFE
  spe <- spammR::convert_to_spe(dat = dat, 
                      feature_meta = feature_data,
                      sample_meta = coords,
                      spatial_coords_colnames = c('x_pixels','y_pixels'),
                      assay_name = 'proteomics',
                      sample_id = 'rat_brain', 
                      image_files = img,
                      image_id = 'rat_brain')

  # myelin
  spammR::spatial_heatmap(spe, feature = 'P02688', 
                          feature_type = 'Protein ID',
                sample_id = 'rat_brain', 
                image_id = 'rat_brain')

The protein data was collected from smaller regions of interest within the larger image.

Load lipidomics data

Now that we have the image and protein data loaded, we can begin to ingest the lipid data from metaspace. This requires an additional command to pull the information directly.

Retrieve metaspace data

We created a wrapper to the Metaspace python package to enable download and formatting of metaspace measurements for spammR.

The result is a SpatialExperiment. This takes too long to run as a vignette so we show the code and then load the file separately.

##get lipid data from metaspace 

mspe <- spammR::retrieve_metaspace_data("2024-02-15_20h37m13s", 
                                fdr = 0.2, 
                                assay_name = 'lipids',
                                sample_id = 'rat_brain',
                                rotate = TRUE, 
                                drop_zeroes = TRUE)

Now we can downlod the object from figshare and load it.

mspe.url <- 'https://api.figshare.com/v2/file/download/60993349'
mcache <- bfcadd(bfc, "mspe", fpath = mspe.url)
## Error while performing HEAD request.
##    Proceeding without cache information.
mspe <- readRDS(mcache)

Add image data

Lastly we add the image file to the data downloaded by metaspace and visualize a lipid of interest.

## add in image to check 
mspe <- SpatialExperiment::addImg(mspe, img, scaleFactor = NA_real_,
                                  sample_id = 'rat_brain',
                                  image_id = 'rat_brain')

meta <- rowData(mspe) 
meta$Name <- vapply(meta$moleculeNames, function(x) {
 x[1]}, character(1))

rowData(mspe) <- meta

spatial_heatmap(mspe, feature = 'C41H78NO7P-H-',
                 assay = 'lipids',
                 plot_log = TRUE,
                   metric_display = 'Lipid expression',
                sample_id = 'rat_brain',
                image_id = 'rat_brain')#plot with an ion
## Warning: Removed 17673 rows containing missing values or values outside the scale range
## (`geom_label()`).

#ggsave('lipidExpr.pdf')

Here we can see the overlay of specific lipids.

Comparing Protein and Lipids

Now we can compare lipids in proteins in the same image.

Merging images to single coordinate system

Since the proteomics is measured in discrete ROI and the lipids are measured on a per-pixel basis, to carry out pure multiomic comparisons we need to map them to the same coordinate space.

To do this we use the spat_reduce function to reduce the lipidomics data to the same regions as the proteomics. Here is an example of the same ion we saw before.

Now we can start to analyze the relationship between lipids and proteins.

Correlation analysis

One of the innovative analyses we can do with the spatial data is to evaluate correlations within molecules across space. We include the spatial_network function that can compute correlations between groups of features across omic domains and show how to use it below.

Looking for lipids associated with known proteins

The genes and proteins of the brain have been well studied, and proteins such as glial marker Gfqp are known to play a significant role in the central nervous system and can be used as markers for disease. We can now search for lipids that are correlated with myelin spatially using our network tool.

#let's look at the correlation between the protein and the lipids
mbp_cor <- spatial_network(reduced, 
                      assay_names = c("proteomics","lipids"), 
                      query_feature = 'P47819', #protein
                      target_features = rowData(altExp(reduced))[,'Name'],#lipids
                      feature_names = c('Protein ID','Name'))

mbp_cor |>
  activate(edges) |> 
  arrange(desc(abs(corval)))
## # A tbl_graph: 22 nodes and 21 edges
## #
## # A rooted tree
## #
## # Edge Data: 21 × 3 (active)
##     from    to corval
##    <int> <int>  <dbl>
##  1     4    22 -0.728
##  2    14    22 -0.695
##  3     3    22 -0.691
##  4    18    22 -0.688
##  5     6    22 -0.68 
##  6    19    22 -0.679
##  7    15    22 -0.565
##  8     2    22  0.468
##  9    10    22 -0.468
## 10     1    22  0.456
## # ℹ 11 more rows
## #
## # Node Data: 22 × 2
##   name                                   class 
##   <chr>                                  <chr> 
## 1 PS(28:6(10Z,13Z,16Z,19Z,22Z,25Z)/12:0) lipids
## 2 PI(18:1(11Z)/20:3(8Z,11Z,14Z))         lipids
## 3 Arachidonate                           lipids
## # ℹ 19 more rows
mbp_cor |>
  ggraph(layout = 'stress')  +
  geom_edge_link(aes(colour = corval)) + 
  geom_node_point(aes(color = class)) +
  geom_node_label(aes(label = name, color = class)) +
  scale_color_manual(values=c(lipids = 'darkorange',proteomics = 'slateblue3'))

#ggsave('gfap_cor_graph.pdf')
nodes <- mbp_cor |> activate(nodes) |> as_tibble()

nodes[c(4,2),]
## # A tibble: 2 × 2
##   name                           class 
##   <chr>                          <chr> 
## 1 22:6(4Z,7Z,10Z,13Z,16Z,19Z)    lipids
## 2 PI(18:1(11Z)/20:3(8Z,11Z,14Z)) lipids

Now we can now double check the lipid measurements of these values. As a refresher here is myelin.

spatial_heatmap(reduced,
                feature = 'P47819', 
                assay_name = 'proteomics',
                feature_type = 'Protein ID',
                metric_display = 'Protein expression',
                plot_title = 'Gfap expression',
                sample_id = 'rat_brain', image_id = 'rat_brain')
## Warning: Removed 25 rows containing missing values or values outside the scale range
## (`geom_label()`).

#ggsave('gfap_brain.pdf')

And here we can see how the correlated lipids are expressed throughout.

## anti-correlated
a1 <- spatial_heatmap(reduced,
                feature = '22:6(4Z,7Z,10Z,13Z,16Z,19Z)', 
                assay_name = 'lipids',
                feature_type = 'Name',
                metric_display = 'Lipid expression',
                sample_id = 'rat_brain', image_id = 'rat_brain')


a2 <- spatial_heatmap(mspe,
                feature = '22:6(4Z,7Z,10Z,13Z,16Z,19Z)', 
                assay_name = 'lipids',
                feature_type = 'Name',
                metric_display = 'Lipid expression',
                sample_id = 'rat_brain', image_id = 'rat_brain')

cowplot::plot_grid(a1, a2, ncol=2)
## Warning: Removed 25 rows containing missing values or values outside the scale range
## (`geom_label()`).
## Warning: Removed 61192 rows containing missing values or values outside the scale range
## (`geom_label()`).

#ggsave('antiCorrelatedLipids.pdf', width = 10)

#correlated
c1 <- spatial_heatmap(reduced,
                feature = 'PI(18:1(11Z)/20:3(8Z,11Z,14Z))', 
                assay_name = 'lipids',
                feature_type = 'Name',
                metric_display = 'Lipid expression',
                sample_id = 'rat_brain', image_id = 'rat_brain')

c2 <- spatial_heatmap(mspe,
                feature = 'PI(18:1(11Z)/20:3(8Z,11Z,14Z))', 
                assay_name = 'lipids',
                feature_type = 'Name',
                metric_display = 'Lipid expression',
                sample_id = 'rat_brain', image_id = 'rat_brain')

cowplot::plot_grid(c1, c2, ncol = 2)
## Warning: Removed 25 rows containing missing values or values outside the scale range
## (`geom_label()`).
## Warning: Removed 68010 rows containing missing values or values outside the scale range
## (`geom_label()`).

#ggsave('correlatedLipids.pdf', width = 10)

This approach, if used on larger samples, can be used to identify lipid biomarkers based on known protein function.

Ascribing function to unknown lipids

We can also use the correlation analysis to assess the biological function of specific lipids, by seeking out which pathways are over-represented among proteins who are correlated with a lipid of interest.

First we look to see which of our lipids are most variable across the tissue.

lvs <- sort(apply(assay(mspe,'lipids'),1,var,na.rm = TRUE),decreasing = TRUE)[1:10]

lvn <- rowData(altExp(reduced))[names(lvs),'Name']

lvn[1:5]
## [1] "PS(28:6(10Z,13Z,16Z,19Z,22Z,25Z)/12:0)"
## [2] "Arachidonate"                          
## [3] "PI(18:1(11Z)/20:3(8Z,11Z,14Z))"        
## [4] "PI(14:0/20:1(11Z))"                    
## [5] "22:6(4Z,7Z,10Z,13Z,16Z,19Z)"

We’ll focus on phosphatidylserine “PS(28:6(10Z,13Z,16Z,19Z,22Z,25Z)/12:0)”, which is the most variable across the sample.

#number 1 seems to have an interesting pattern
spatial_heatmap(mspe,feature = lvn[1], assay_name = 'lipids', 
                feature_type = 'Name',
                image_id = 'rat_brain', sample_id = 'rat_brain', 
                metric_display = 'Lipid expression',
                plot_log = TRUE)
## Warning: Removed 68794 rows containing missing values or values outside the scale range
## (`geom_label()`).

spatial_heatmap(reduced,feature = lvn[1], assay_name = 'lipids', 
                feature_type = 'Name',
                image_id = 'rat_brain', sample_id = 'rat_brain', 
                metric_display = 'Lipid expression',
                plot_log = TRUE)
## Warning: Removed 25 rows containing missing values or values outside the scale range
## (`geom_label()`).

Now we can ask what proteins correlate with this lipid in the reduced object.

#let's look at the correlation between the lipid and proteins
fl <- c(lvn[1], rowData(reduced)[,'Gene'])


cg <- spatial_network(reduced, 
                      assay_names = c("proteomics","lipids"), 
                      query_feature = lvn[1], #lipid
                      target_features = rowData(reduced)[,'Gene'],
                      feature_names = c('Gene','Name'))
## Joining with `by = join_by(rowval)`
##we can then use trick above to get the neighborhood of our lipid
gt <- cg %>%
  activate(nodes) |>
  as_tibble()

neigh_weights <-  cg |>
  activate(edges) |>
  #now sure how to or from is assigned, might want 
  #to check both
  filter(to == which(gt$name == lvn[1])) 

#then we can get the edge weights and nodes 
edges <- neigh_weights |> as_tibble()

nodes <- neigh_weights |>
  activate(nodes) |>
  as_tibble()

##create a table with the protein correlationm
prot_cor <- data.frame(nodes[edges$to,],lipid_cor = edges$corval) |>
  dplyr::rename(Gene = 'name') |>
  subset(Gene %in% rowData(reduced)[,'Gene'])

neigh_weights |>
  activate(edges) |> filter(abs(corval) > 0.75) |>
  activate(nodes) |>
  mutate(degree = centrality_degree(mode = 'all')) |>
  filter(degree > 0) |>
  ggraph(layout = 'stress')  +
  geom_edge_link(aes(colour = corval)) + 
  geom_node_point(aes(color = class)) +
  geom_node_label(aes(label = name, color = class))

Now we have the correlation value - both negative and positive - that we can use as input to a functional enrichment test.

We can look at the pathway enriched by these correlation values using leapR via the enrichment_in_order method.

##add back to data set

prot_cor <- prot_cor |> 
  group_by(Gene) |> 
  summarize(lipid_cor = mean(lipid_cor))

rowData(reduced) <- rowData(reduced) |>
  as.data.frame() |>
  left_join(prot_cor) |>
  mutate(upperGene = toupper(Gene))
## Joining with `by = join_by(Gene)`
data("krbpaths")
enrich <- leapR::leapR(geneset = krbpaths, enrichment_method = 'enrichment_in_order',
                       eset = reduced, primary_column = 'lipid_cor', method = 'ks', 
                       id_column = 'upperGene', minsize = 5)

paths <- subset(enrich, BH_pvalue < 0.3)
#subset(enrich,ingroup_mean > 0.25) |> 
#  subset(pvalue <0.05)

print(paths)
##  [1] ingroup_n       ingroupnames    ingroup_mean    outgroup_n     
##  [5] outgroup_mean   zscore          oddsratio       pvalue         
##  [9] BH_pvalue       SignedBH_pvalue background_n    background_mean
## <0 rows> (or 0-length row.names)

We can now look at pathways that are the most statistically significant.

Plot proteins

We first check the expression of the proteins as well and see if they are all correlated in the same direction.

prots <- unlist(stringr::str_split(paths[1,'ingroupnames'], pattern = ', '))

pv <- assay(subset(reduced,upperGene %in% prots),'proteomics')
lv <- assay(subset(altExp(reduced),Name == lvn[1]),'lipids')
 
##proteins seem anti-correlated
mean(cor(t(pv),t(lv), method = 'spearman'))
## [1] 0.01484068

We can see that they are negative correlated.

p2 <- spatial_heatmap(reduced, assay_name = 'proteomics',
                feature = prots,
                feature_type = 'upperGene',
                sample_id = 'rat_brain',
                image_id = 'rat_brain', 
                plot_log = TRUE, 
                title_size = 10,
                plot_title = rownames(paths)[1],
                metric_display = 'Protein expression')


p2
## Warning: Removed 25 rows containing missing values or values outside the scale range
## (`geom_label()`).

The heatmap confirms that the lipid of interest is indeed associated with this glucose and suger transporter pathway.

Summary

This vignette explores how one can use the spammR package to analyze multiomics measurements captured in a spatial context.