Having analysed a variety of Next Generation Sequencing (NGS) data sets from different projects over the past few years, we have developed a general workflow for assessing data quality. It is a guideline that can be applied at various steps of the analysis, starting with checks on the raw FASTQ files. FASTQ Quality Checks: Generally the simplest tool to...

# Hierarchical Models: A Binomial Model with Shrinkage

The material in this post comes from various sources, some of which are: [1] Kruschke, J. K. (2014). Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan, Second Edition. http://doi.org/10.1016/B978-0-12-405888-0.09999-2 [2] Gelman, A., Carlin, J. B., Stern,...

# Pattern Recognition using PCA: Variables and their Geometric Relationships

Principal component analysis (PCA) is a commonly used technique in multivariate statistics and the pattern recognition literature. In this post I try to bring together the geometric and algebraic interpretations of data as vectors in a vector space, and relate both to PCA. The 3 major sources used in this post are: [1] Thomas D. Wickens (1995). The...
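As a minimal sketch of the geometric/algebraic link (not code from the post; the simulated data, variable names and the `centred_cosine` helper are illustrative assumptions), the snippet below checks that the correlation between two variables equals the cosine of the angle between their mean-centred vectors, and then computes PCA algebraically through the SVD of the centred data matrix.

```python
import numpy as np

def centred_cosine(x, y):
    """Cosine of the angle between two mean-centred variable vectors."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))

# Hypothetical data: n observations of 3 variables, two of them correlated
rng = np.random.default_rng(0)
n = 50
a = rng.normal(size=n)
b = 0.8 * a + 0.6 * rng.normal(size=n)
c = rng.normal(size=n)
data = np.column_stack([a, b, c])  # shape (n, 3): rows = observations

# Geometric view: correlation is the cosine between centred variable vectors
r_ab = np.corrcoef(a, b)[0, 1]
cos_ab = centred_cosine(a, b)
print(np.isclose(r_ab, cos_ab))  # True

# Algebraic view: PCA via the SVD of the centred data matrix
centred = data - data.mean(axis=0)
U, s, Vt = np.linalg.svd(centred, full_matrices=False)
explained = s**2 / np.sum(s**2)  # proportion of variance per component
scores = centred @ Vt.T          # principal component scores (equal to U * s)
print(explained)
```

The rows of `Vt` are the principal axes, and the component scores are mutually orthogonal vectors, which is the same geometric picture applied to the components instead of the original variables.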

# Plausible Reasoning for Scientific Problems: Belief Driven by Priors and Data

Plausible reasoning requires constructing rational arguments using syllogisms and analysing them with deductive and inductive logic. Using this method of reasoning, and expressing our belief in a scientific hypothesis numerically using probability theory, is one of my interests. I try to condense the material from the first 4 chapters of...
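To make "belief driven by priors and data" concrete, here is a small sketch, not taken from the post: Bayes' rule applied to a discrete set of hypotheses about a proportion. The three hypothesised values, the uniform prior and the data (7 successes in 10 trials) are all assumptions chosen for illustration.

```python
from math import comb

def posterior(priors, hypotheses, successes, trials):
    """Bayes' rule over a discrete set of hypotheses about a proportion theta."""
    # Binomial likelihood of the observed data under each hypothesis
    likelihoods = [
        comb(trials, successes) * t**successes * (1 - t) ** (trials - successes)
        for t in hypotheses
    ]
    unnormalised = [p * lik for p, lik in zip(priors, likelihoods)]
    evidence = sum(unnormalised)  # normalising constant P(data)
    return [u / evidence for u in unnormalised]

hypotheses = [0.25, 0.5, 0.75]  # hypothetical candidate values of theta
priors = [1 / 3, 1 / 3, 1 / 3]  # no initial preference among them
post = posterior(priors, hypotheses, successes=7, trials=10)
for h, p in zip(hypotheses, post):
    print(f"theta={h}: {p:.3f}")
```

Changing `priors` while keeping the data fixed shows how the prior and the likelihood jointly determine the final degree of belief.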

# Regression & Finite Mixture Models

I wrote a post a while back about Mixture Distributions and Model Comparisons. This post continues that theme and tries to combine multiple data-generating processes in a single model. The code for this post is available in the GitHub repository. There were many useful resources that helped me understand this model, and some...

# Hierarchical Linear Regression – 2-Level Random Effects Model

Regression is a popular approach to modelling, where a response variable is modelled as a function of certain predictors in order to understand the relationships between variables. In a previous post I used a linear model, with the Bread and Peace model as the example, and looked at various ways to solve the equation. In this post, I want to fit...

# Compare Transformations & Batch Effects in Omics Data

While analysing high-dimensional data, e.g. from Omics (Genomics, Transcriptomics, Proteomics, etc.), we are essentially measuring multiple response variables (e.g. genes, proteins, metabolites) in multiple samples, resulting in an $r \times n$ matrix $X$ with $r$ variables and $n$ samples. The data capture can lead to multiple batches or groups in the data -...
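A minimal sketch of the $r \times n$ layout (variables in rows, samples in columns) and of a crude batch check follows. It is not the post's method: the simulated data, the batch labels, the additive shift and the `centre_within_batch` correction are all hypothetical, chosen only to show what a batch effect looks like in this matrix orientation.

```python
import numpy as np

rng = np.random.default_rng(1)
r, n = 100, 12                       # r variables (e.g. genes) x n samples
batch = np.array([0] * 6 + [1] * 6)  # hypothetical batch label for each sample (column)
X = rng.normal(loc=5.0, scale=1.0, size=(r, n))
X[:, batch == 1] += 1.5              # simulated additive batch shift

def batch_means(X, batch):
    """Mean of every variable (row) within each batch: an r x n_batches matrix."""
    levels = np.unique(batch)
    return np.column_stack([X[:, batch == b].mean(axis=1) for b in levels])

def centre_within_batch(X, batch):
    """Crude illustrative correction: subtract each variable's mean within its batch."""
    Xc = X.copy()
    for b in np.unique(batch):
        cols = batch == b
        Xc[:, cols] -= X[:, cols].mean(axis=1, keepdims=True)
    return Xc

before = batch_means(X, batch)
after = batch_means(centre_within_batch(X, batch), batch)
print("mean batch difference before:", np.mean(before[:, 1] - before[:, 0]))
print("mean batch difference after: ", np.mean(after[:, 1] - after[:, 0]))
```

Centring within batch removes any genuine biological difference that is confounded with batch, so it is only safe when the groups of interest are balanced across batches.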

# Model Checking: Scoring and Comparing Models

This is another post in the series on model checking. Previously we looked at which aspects of the data and model are compatible, using posterior predictive checks. Once we have selected a model or a set of models for the data, we would like to score and compare them. One aspect of comparison using...

# Model Checking: Posterior Predictive Checks

Once a model has been fitted and its parameters estimated, we look at how well the model explains the data, and which aspects of the natural data-generating process the model does not capture. Most of the material covered in this post follows the examples from: [1] Gelman, A., Carlin, J. B., Stern, H. S.,...