Next Generation Sequencing Data Quality Checks

Analysing a variety of Next Generation Sequencing (NGS) data sets from different projects over the past years, we have developed a general workflow to assess data quality. This is a guideline and can be applied at various steps of the analysis, starting with raw FASTQ file checks. FASTQ Quality Checks: Generally the simplest tool to... Continue Reading →

Advertisements

Pattern Recognition using PCA: Variables and their Geometric Relationships

Principal component analysis is a commonly used technique in multi-variate statistics and pattern recognition literature. In this post I try to merge ideas of Geometric and Algebraic interpretation of data as vectors in a vector space and its relationship with PCA. The 3 major sources used in this blog are: [1]┬áThomas D. Wickens (1995). The... Continue Reading →

Compare Transformations & Batch Effects in Omics Data

While analysing high dimensional data, e.g. from Omics (Genomics, Transcriptomics, Proteomics etc.) - we are essentially measuring multiple response variables (i.e. genes, proteins, metabolites etc.) in multiple samples, resulting in a $latex rXn$ matrix X with r variables and n samples. The data capture can lead to multiple batches or groups in the data -... Continue Reading →

Powered by WordPress.com.

Up ↑