Diving into the world of computational biology without a traditional computer science background has been a bit like learning to swim in the deep end. I have found that one of the best ways to grasp a concept is to try to explain it. Over the past year, I have quickly entered the fun yet sometimes scary world of probabilistic programming, and I thought it would be a good time to take a step back and make sure I understand the basics. In this post, we will start off simple and slowly walk through the Bayesian approach to linear regression before diving into variational inference.

Linear Regression: A Primer

Linear regression is a statistical method enabling us to predict a dependent variable using one or more independent variables. It assumes a linear relationship between these variables, summarized by the formula:

$$ Y = \beta_0 + \beta_1X + \epsilon $$

Where:

- $Y$ is the dependent variable we want to predict,
- $X$ is the independent variable (the predictor),
- $\beta_0$ is the intercept (the value of $Y$ when $X = 0$),
- $\beta_1$ is the slope coefficient for $X$, and
- $\epsilon$ is the error term, capturing variation in $Y$ not explained by the linear relationship.

For multiple independent variables, the equation extends to:

$$ Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + ... + \beta_nX_n + \epsilon $$

Fitting the Model: The process of "fitting" a linear regression model involves finding the values of the β coefficients that best explain the observed data. This is typically done using the method of least squares, which finds the best-fitting line by minimizing the sum of the squares of the vertical distances (residuals) of the points from the line.
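
As a minimal sketch of what this looks like in practice (the data are synthetic and the variable names are purely illustrative), we can fit a line with NumPy's least-squares solver:

```python
import numpy as np

# Synthetic data: y = 2 + 3x + noise (illustrative true values)
rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=1.5, size=100)

# Design matrix with a column of ones for the intercept (beta_0)
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: minimize the sum of squared residuals
beta, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(f"Estimated intercept: {beta[0]:.3f}, slope: {beta[1]:.3f}")
```

With enough data and well-behaved noise, the estimated intercept and slope land close to the values used to generate the data.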

Below is a basic scatter plot of data points that follow a linear trend, with a linear regression line (the best-fit line) fitted through them.

[Figure: scatter plot of data points with the fitted linear regression line]

By quantifying the strength and direction of the relationship between predictors and a dependent variable, linear regression provides a powerful tool for prediction and for understanding how each predictor influences the outcome. However, the commonly used least squares approach assumes a linear relationship and is particularly sensitive to outliers, which can distort the slope and intercept of the best-fit line. Least absolute deviation (which minimizes absolute errors rather than squared errors) is a variant of linear regression that is less influenced by outliers.
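
To see the contrast, here is a rough sketch that fits both objectives on the same synthetic data after adding a single large outlier. It uses SciPy's general-purpose optimizer rather than a dedicated LAD routine, so treat it as an illustration rather than the standard way to fit such a model:

```python
import numpy as np
from scipy.optimize import minimize

# Same synthetic data as before, plus one large outlier
rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=1.5, size=100)
y[0] += 50.0  # a single extreme observation

X = np.column_stack([np.ones_like(x), x])

def sum_abs_residuals(beta):
    # Least absolute deviation objective: sum of |y - X @ beta|
    return np.abs(y - X @ beta).sum()

def sum_sq_residuals(beta):
    # Ordinary least squares objective, for comparison
    return ((y - X @ beta) ** 2).sum()

lad = minimize(sum_abs_residuals, x0=np.zeros(2), method="Nelder-Mead")
ols = minimize(sum_sq_residuals, x0=np.zeros(2), method="Nelder-Mead")
print("LAD (intercept, slope):", lad.x)
print("OLS (intercept, slope):", ols.x)
```

The least squares fit gets pulled noticeably toward the outlier, while the least absolute deviation fit stays much closer to the underlying trend.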

Transitioning to Bayesian Thinking: Prior vs. Posterior

Although classical linear regression can capture uncertainty through confidence intervals—either by assuming normal errors or using resampling methods like the bootstrap—these approaches remain rooted in frequentist concepts. Bayesian methods, including variational inference, instead model a full probability distribution over parameter values, offering a more comprehensive view of uncertainty. Rather than asking, “What is the single best estimate of our parameter?” or “What is the confidence interval around it?” Bayesian approaches pose the question, “What is the probability distribution of possible parameter values, given the data?” Grounded in Bayes’ theorem, this perspective updates our beliefs about parameters when new data is observed. Central to this framework are the prior and posterior distributions, which represent our knowledge about the parameters before and after seeing the data.
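
In symbols, Bayes’ theorem for the regression parameters $\theta$ (the coefficients $\beta_0, \beta_1, \dots$, plus any noise parameters) given observed data $\mathcal{D}$ reads:

$$ p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})} $$

Here $p(\theta)$ is the prior, $p(\mathcal{D} \mid \theta)$ is the likelihood, $p(\mathcal{D})$ is the evidence (marginal likelihood) that normalizes the expression, and $p(\theta \mid \mathcal{D})$ is the posterior we are after.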

Prior Distribution, p(θ)