Recently I came across a problem with a suspiciously large difference in the averages of two groups while analysing some Omics data. An article dealing with similar issues can be seen here. The data distribution is shown below in Figure 1 (FYI: the fold change was around 6 - which is very large for this kind... Continue Reading →

# High Dimensional Data & Hierarchical Regression

In a high-throughput experiment, one performs measurements on thousands of variables (e.g. genes or proteins) across two or more experimental conditions. In bioinformatics, we come across such data generated using technologies like microarrays, next-generation sequencing, mass spectrometry etc. Data from these technologies require their own pre-processing, normalisation and quality checks (see here and here... Continue Reading →

# Logistic “Aggression”: binary classification problems

Binary problems, where the outcome can be either True or False, are very common in data analysis, from an inference or classification point of view. A previous post on binomial modelling deals with a similar problem, but this time we frame the problem from a regression or generalized linear model (GLM) viewpoint. Previously we... Continue Reading →

# Hierarchical Models: A Binomial Model with Shrinkage

The material in this post comes from various sources, some of which can be found in [1] Kruschke, J. K. (2014). Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan, Second Edition. http://doi.org/10.1016/B978-0-12-405888-0.09999-2 [2] Gelman, A., Carlin, J. B., Stern,... Continue Reading →

# Pattern Recognition using PCA: Variables and their Geometric Relationships

Principal component analysis is a commonly used technique in the multivariate statistics and pattern recognition literature. In this post I try to merge the geometric and algebraic interpretations of data as vectors in a vector space and relate them to PCA. The 3 major sources used in this blog are: [1] Thomas D. Wickens (1995). The... Continue Reading →

# Plausible Reasoning for Scientific Problems: Belief Driven by Priors and Data

Plausible reasoning requires constructing rational arguments by use of syllogisms, and their analysis by deductive and inductive logic. Using this method of reasoning, and expressing our beliefs in a scientific hypothesis numerically using probability theory, is one of my interests. I try to condense the material from the first 4 chapters of... Continue Reading →

# Regression & Finite Mixture Models

I wrote a post a while back about Mixture Distributions and Model Comparisons. This post continues on that theme and tries to combine multiple data generating processes into a single model. The code for this post is available at the GitHub repository. There were many useful resources that helped me understand this model, and some... Continue Reading →

# Hierarchical Linear Regression – 2 Level Random Effects Model

Regression is a popular approach to modelling where a response variable is modelled as a function of certain predictors - to understand the relations between variables. I used a linear model in a previous post, using the bread and peace model - and various ways to solve the equation. In this post, I want to fit... Continue Reading →

# Model Checking: Scoring and Comparing Models

This is another post in the series of model checking posts. Previously we looked at which aspects of the data and model are compatible, using posterior predictive checks. Once we have selected a model or a set of models for the data, we would like to score and compare them. One aspect of comparison using... Continue Reading →