Normalising Nanostring data

This is a quick R guide to NanoString nCounter technology and to pre-processing data profiled on this platform.

Description

The nCounter system from NanoString Technologies provides direct, reliable and highly sensitive multiplexed measurement of nucleic acids (DNA and RNA) based on a digital barcode technology. Assays use either a custom CodeSet of genes or an off-the-shelf preassembled panel, and can also be run on single cells (more details on the NanoString website).

Each mRNA Expression CodeSet contains probes designed against fourteen ERCC transcript sequences.

– Six of these sequences are used as positive hybridization controls and eight are designed as negative controls.

– The positive and negative controls are present in every CodeSet, independent of the sample. They help in normalising for technical (systematic) variability.

– In addition, a CodeSet can contain housekeeping genes, which are used to normalise for sample variability (biological normalisation), i.e. to correct for differences in sample input between assays. This relies on the assumption that the housekeeping genes are expressed at consistent levels across samples.
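To make the probe classes concrete, here is a minimal base-R sketch of a raw count table with three annotation columns (Code.Class, Name, Accession) followed by one column per sample, which is the layout NanoStringNorm expects when anno = NA. The probe names, sample names and counts are invented for illustration only.

```r
# Toy raw-count table: annotation columns first, then one column per sample.
# All probe names and counts below are invented, not real nCounter output.
set.seed(1)
code_class <- c(rep("Positive", 6), rep("Negative", 8),
                rep("Housekeeping", 2), rep("Endogenous", 3))

toy_counts <- data.frame(
  Code.Class = code_class,
  Name       = paste0("probe_", seq_along(code_class)),
  Accession  = NA_character_,
  sample_1   = rpois(length(code_class), lambda = 100),
  sample_2   = rpois(length(code_class), lambda = 120),
  stringsAsFactors = FALSE
)

# Number of probes in each class: six positive and eight negative controls,
# plus whatever housekeeping and endogenous genes the CodeSet includes
class_sizes <- table(toy_counts$Code.Class)
print(class_sizes)
```

Tabulating Code.Class on a real dataset is a quick sanity check that the controls were imported as expected.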

Note: Read the nCounter guide available at the following link for more
details: (https://www.nanostring.com/application/files/1214/8942/4642/MAN-C0011-03_nCounter_Gene_Expression_Data_Analysis_Guidelines.pdf)

Load the dataset

The data produced by the nCounter Digital Analyzer (NanoString) are exported as a Reporter Code Count (RCC) file, which is a comma-separated text (.csv) file that contains the counts for each gene in a sample. Each cartridge has 12 lanes, i.e. 12 samples can be profiled on one NanoString cartridge.

To process the data, one can apply the normalisation steps recommended by the company (using the NanoStringNorm R package). Alternatively, the data can be treated as regular digital counts (as in RNA-seq) and analysed using the edgeR TMM normalisation approach. However, in our experience the former works better than the latter, as it accounts for cross-hybridization-related biases by allowing the user to do background correction.

# Load the required R packages
library(NanoStringNorm)

You can read the RCC files in two different ways. The first is to use the Excel import function read.xls.RCC to read directly from the nCounter output files, if the facility provides them in .xls format. Make sure that you are using the worksheet with the raw counts and not one that has already been processed. An example dataset can be downloaded from GEO (GSE51488).

# read the raw counts from the RCC Excel spreadsheet output by the nCounter platform
df <- read.xls.RCC("GSE51488_GAMA_Nanostring_RAW_Spleen_1.xls", sheet = 1)

Alternatively, you can use the following to process single-sample markup RCC files (example: GSE95100) and merge the individual .RCC files together into one variable.

# read the raw counts from the individual RCC files in a directory (rcc.path is the path to the .RCC files)
df <- read.markup.RCC(rcc.path = ".", rcc.pattern = "*.RCC|*.rcc", exclude = NULL, include = NULL, nprobes = -1)

Pre-processing

First, remove systematic technical biases by scaling each sample using the geometric mean of the positive controls.

# use the geometric mean of the positive controls for technical normalisation
all_samples_gm <- NanoStringNorm(
  x = df,
  anno = NA,
  CodeCount = 'geo.mean',
  Background = 'none',
  SampleContent = 'none',
  round.values = FALSE,
  take.log = FALSE,
  return.matrix.of.endogenous.probes = FALSE
)
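To show what the geometric-mean step does arithmetically, here is a minimal base-R sketch of positive-control scaling. It is an illustration under stated assumptions, not NanoStringNorm's internal code: the probe names (POS_1, GENE_A, ...), the sample names and all counts are invented, and the helper geo_mean is hypothetical.

```r
# Toy counts: rows are probes, columns are samples (all values invented)
counts <- rbind(
  POS_1  = c(s1 = 1000, s2 = 2000, s3 = 500),
  POS_2  = c(400, 800, 200),
  POS_3  = c(100, 200, 50),
  GENE_A = c(50, 120, 20),
  GENE_B = c(300, 500, 160)
)
is_positive <- grepl("^POS_", rownames(counts))

# Hypothetical helper: geometric mean, assuming strictly positive counts
geo_mean <- function(x) exp(mean(log(x)))

# Geometric mean of the positive controls in each sample (lane)
pos_gm <- apply(counts[is_positive, , drop = FALSE], 2, geo_mean)

# Scaling factor per sample: average geometric mean across samples
# divided by that sample's own geometric mean
pos_factor <- mean(pos_gm) / pos_gm

# Multiply every count in a sample by that sample's factor
counts_pos_norm <- sweep(counts, 2, pos_factor, `*`)
print(round(pos_factor, 3))
```

In this toy example the positive controls differ between lanes only by a constant multiple, so after scaling each positive control has the same value in all three samples.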

Then, correct for cross-hybridization using background correction, and normalise for sample variability using the housekeeping genes.

# use background correction (mean + 2 SD) and the housekeeping genes for biological normalisation
normalised_df <- NanoStringNorm(
  x = all_samples_gm,
  anno = NA,
  CodeCount = 'none',
  Background = 'mean.2sd',
  SampleContent = 'housekeeping.geo.mean',
  round.values = FALSE,
  is.log = FALSE,
  take.log = TRUE,
  return.matrix.of.endogenous.probes = TRUE
)

This returns the normalised values on the log2 scale. If you want the data on the linear scale, set take.log = FALSE.
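The background and housekeeping steps can be sketched in base R as follows. This is a simplified illustration, not NanoStringNorm's exact implementation: the probe names and counts are invented, geo_mean is a hypothetical helper, and flooring at zero after subtraction and at one before the log are assumptions made by this sketch.

```r
# Toy counts after technical normalisation (all values invented)
counts <- rbind(
  NEG_1  = c(s1 = 4, s2 = 6),
  NEG_2  = c(6, 10),
  NEG_3  = c(8, 8),
  HK_1   = c(110, 212),
  HK_2   = c(410, 812),
  GENE_A = c(50, 212),
  GENE_B = c(3, 5)
)
is_negative <- grepl("^NEG_", rownames(counts))
is_hk       <- grepl("^HK_", rownames(counts))
is_endo     <- grepl("^GENE_", rownames(counts))

# Hypothetical helper: geometric mean, assuming strictly positive values
geo_mean <- function(x) exp(mean(log(x)))

# Background per sample: mean + 2 SD of the negative controls
background <- apply(counts[is_negative, , drop = FALSE], 2,
                    function(x) mean(x) + 2 * sd(x))

# Subtract the background; flooring at zero is an assumption of this sketch
corrected <- sweep(counts, 2, background, `-`)
corrected[corrected < 0] <- 0

# Housekeeping scaling factor per sample, built the same way as the
# positive-control factor but from the housekeeping genes
hk_gm     <- apply(corrected[is_hk, , drop = FALSE], 2, geo_mean)
hk_factor <- mean(hk_gm) / hk_gm
normalised <- sweep(corrected, 2, hk_factor, `*`)

# log2 of the endogenous probes; values below 1 are set to 1 first so the
# log is defined (again an assumption of this sketch)
log2_endo <- log2(pmax(normalised[is_endo, , drop = FALSE], 1))
print(log2_endo)
```

Here GENE_B falls below the background in both samples, so it ends up at zero on the log2 scale, which is how a probe indistinguishable from cross-hybridization noise would look after this kind of correction.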

# save the normalised data to a file
write.table(normalised_df, "Normalised_data_nanostring.csv", sep = ",", quote = FALSE, row.names = TRUE, col.names = TRUE)
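A file written this way has a header row with one fewer field than the data rows, because the row names (probe names) have no column name. The sketch below shows a round trip with a small invented matrix standing in for the normalised output; the object names and the temporary file are illustrative only.

```r
# Toy matrix standing in for the normalised output (values invented)
toy_norm <- matrix(c(5.9, 7.2, 6.1, 7.0), nrow = 2,
                   dimnames = list(c("GENE_A", "GENE_B"), c("s1", "s2")))

# Write it with the same write.table settings, to a temporary file
out_file <- tempfile(fileext = ".csv")
write.table(toy_norm, out_file, sep = ",", quote = FALSE,
            row.names = TRUE, col.names = TRUE)

# The header has one fewer field than the data rows, so read.csv
# uses the first column as row names automatically
reloaded <- as.matrix(read.csv(out_file, check.names = FALSE))
print(reloaded)
```

If the file is meant to be opened in a spreadsheet, write.table(..., col.names = NA) writes an empty header cell above the row names so that the columns line up.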

The information about the R packages can be found below.

# print the package versions used
sessionInfo()
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.12.5 (Sierra)

## locale:

## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

## attached base packages:

## [1] parallel  stats     graphics  grDevices utils     datasets
## [7] methods   base

## other attached packages:
## [1] NanoStringNorm_1.1.21 vsn_3.40.0            Biobase_2.32.0
## [4] BiocGenerics_0.18.0   gdata_2.17.0

## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.9           knitr_1.16       magrittr_1.5     
##  [4] zlibbioc_1.18.0       munsell_0.4.3    lattice_0.20-34  
##  [7] colorspace_1.3-2      stringr_1.1.0     plyr_1.8.4       
## [10] tools_3.3.1           grid_3.3.1        gtable_0.2.0     
## [13] affy_1.50.0           htmltools_0.3.5   gtools_3.5.0   
## [16] assertthat_0.1        yaml_2.1.14       lazyeval_0.2.0   
## [19] rprojroot_1.2         digest_0.6.12     preprocessCore_1.34.0
## [22] tibble_1.2            affyio_1.42.0     ggplot2_2.2.1    
## [25] evaluate_0.10         rmarkdown_1.5     limma_3.28.21    
## [28] stringi_1.1.2         BiocInstaller_1.22.3  scales_0.4.1
## [31] backports_1.0.4

 
