Understanding Linear Regression in Machine Learning

by Maninder Pal Singh

Overview

Linear regression is a statistical technique for examining the relationship between continuous variables. In simple linear regression, you predict a dependent variable from a single independent variable. The aim is to find the best-fitting straight line that describes the relationship between the independent and dependent variables.

Linear Regression

The name combines two ideas: “linear” and “regression” (what “linear” means is addressed below). Sir Francis Galton coined the term “regression” to describe how the heights of children tend to move back toward the average relative to their parents’ heights: short parents typically have short children who are nevertheless taller than they are, while tall parents typically have tall children who are shorter than themselves. He called this “regression towards mediocrity.” This “regression effect” typically appears in regressions where the response and explanatory variables are repeated measurements made on the same unit (in this case, families).

The odd thing is that the effect is symmetric: we may swap “parents” and “children,” so, for instance, tall parents typically have children shorter than themselves, and tall children typically have parents shorter than themselves.

For example, the model Y = aX² + bX + c + ε, where ε ~ N(0, σ²), is still considered linear regression.

That is because linear regressions are linear in the parameters. (A function f is linear if f(au + bv) = a·f(u) + b·f(v), where u and v can be either numbers or vectors.)
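
To make the point concrete, here is a minimal sketch (plain NumPy, with invented data) that fits the quadratic-in-X model above by ordinary least squares; it works precisely because the model is linear in the parameters a, b, and c.

```python
# A minimal sketch: fitting Y = aX^2 + bX + c + ε by least squares.
# The model is quadratic in X but linear in the parameters a, b, c,
# so ordinary linear regression applies. Data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 100)
Y = 2.0 * X**2 - 1.5 * X + 4.0 + rng.normal(0, 1.0, size=X.shape)  # true a=2, b=-1.5, c=4

# Each column of the design matrix is a known function of X,
# so the problem stays linear in (a, b, c).
A = np.column_stack([X**2, X, np.ones_like(X)])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
print("estimated a, b, c:", coef)
```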

Types of Linear Regression

Simple Linear Regression

Simple linear regression is a statistical technique for modeling the relationship between one independent variable and one dependent variable. It is a fundamental method in machine learning and statistics, with applications across many domains, including:

  • Economics: Simple linear regression can be used to examine the relationship between variables like income and consumption, price and demand, or interest rates and investment.
  • Finance: In finance, simple linear regression can be used to examine the effects of economic indicators on financial markets, forecast stock values based on historical data, and study the relationship between interest rates and bond prices.
  • Marketing: The efficiency of advertising campaigns, the relationship between marketing expenses and sales, and the prediction of client behavior based on demographic information may all be studied using simple linear regression.
  • Healthcare: Simple linear regression can be used to examine the association between patient attributes and health outcomes, such as using biomarkers to forecast the course of a disease or assessing the effect of treatment on patient recovery.

Simple linear regression is defined by the linear function:

Y = β0 + β1X + ε

Here β0 and β1 are two unknown constants: β0 is the intercept, β1 is the regression slope, and ε (epsilon) is the error term. You can use simple linear regression to model the relationship between two variables, such as:

  • Precipitation and crop yield
  • Children’s age and height
  • Temperature and the expansion of mercury in a thermometer
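
To illustrate the formula above, here is a minimal sketch using scikit-learn; the “age vs. height” numbers are invented purely for the example.

```python
# A minimal sketch of simple linear regression with scikit-learn.
# Synthetic data standing in for a single-predictor problem such as
# children's age (X) vs. height (Y); the true β0 and β1 are chosen arbitrarily.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
age = rng.uniform(2, 15, size=200).reshape(-1, 1)            # X
height = 75 + 6 * age.ravel() + rng.normal(0, 5, size=200)   # Y = β0 + β1*X + ε

model = LinearRegression().fit(age, height)
print("intercept (β0):", model.intercept_)       # close to 75
print("slope (β1):", model.coef_[0])             # close to 6
print("predicted height at age 10:", model.predict([[10.0]])[0])
```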

Multiple Linear Regression

Multiple linear regression is the most common form of linear regression analysis. As a predictive technique, it is used to explain the relationship between one continuous dependent variable and two or more independent variables.

Multiple regression is an extension of simple linear regression. We use it when we want to predict the value of a variable based on the values of two or more other variables. The variable we are trying to predict is the dependent variable (often termed the outcome, target, or criterion variable).

Assumptions:

  • The residuals of a regression must have a normal distribution.
  • It is expected that the dependent and independent variables have a linear relationship.
  • The residuals are homoscedastic, meaning they have roughly constant variance across the range of fitted values.
  • The model assumes that there is no multicollinearity, meaning there is little to no correlation between the independent variables (a quick informal check is sketched after this list).
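
Here is a rough, NumPy-only sketch of informal checks for two of these assumptions (residual behaviour and multicollinearity), using invented data; it is not a substitute for proper diagnostics such as Q-Q plots or variance inflation factors.

```python
# Informal assumption checks on a synthetic multiple regression.
import numpy as np

rng = np.random.default_rng(1)
n = 300
X1 = rng.normal(size=n)
X2 = 0.9 * X1 + 0.1 * rng.normal(size=n)        # deliberately correlated with X1
Y = 3 + 2 * X1 - 1 * X2 + rng.normal(0, 1, n)

A = np.column_stack([np.ones(n), X1, X2])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
residuals = Y - A @ coef

# Residuals should be centred on zero with a roughly constant spread.
print("residual mean:", residuals.mean(), " residual std:", residuals.std())

# A high pairwise correlation between predictors hints at multicollinearity.
print("corr(X1, X2):", np.corrcoef(X1, X2)[0, 1])
```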

Extending the simple linear regression function to include more predictors gives:

Y = β0 + β1X1 + β2X2 + … + βnXn + ε

The number of β coefficients grows with the number of predictor variables.

Multiple linear regression models the effect of several variables on an outcome, for example:

  • The effect of temperature, rainfall, and fertilizer application on crop yield
  • The effect of exercise and nutrition on heart disease risk
  • The effect of inflation and wage growth on home loan rates
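
Building on the equation above, here is a minimal sketch of multiple linear regression with scikit-learn, using invented “temperature, rainfall, fertilizer vs. crop yield” data as a stand-in for the first example.

```python
# A minimal sketch of multiple linear regression with three predictors.
# The data and coefficients below are invented for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
n = 500
temperature = rng.uniform(10, 35, n)
rainfall = rng.uniform(200, 900, n)
fertilizer = rng.uniform(0, 120, n)
crop_yield = 1.5 * temperature + 0.02 * rainfall + 0.4 * fertilizer + rng.normal(0, 5, n)

X = np.column_stack([temperature, rainfall, fertilizer])   # X1, X2, X3
model = LinearRegression().fit(X, crop_yield)
print("intercept (β0):", model.intercept_)
print("coefficients (β1, β2, β3):", model.coef_)
```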

Logistic Regression

With regression, we may forecast an outcome given a set of inputs. For example, we may estimate a person’s height from their father’s and mother’s height. Because the outcome variable in this sort of regression is a continuous real number, it is referred to as linear regression.

However, what if our objective was to forecast something that wasn’t a continuous number? Suppose we wish to forecast whether or not it will rain tomorrow. Since it doesn’t make sense to treat our result as a continuous number in this situation—it will either rain or it won’t—using standard linear regression won’t work.

We employ logistic regression in this instance since our outcome variable is a category. In general, logistic regression works much like linear regression: it multiplies each input by a coefficient, sums the results, and adds a constant. The output of linear regression is extremely simple to interpret; when predicting height, the result is just the person’s estimated height.

However, the log of the odds ratio is what is obtained via logistic regression. The odds ratio in the case of forecasting whether or not it will rain tomorrow is calculated by dividing the probability of rain by the probability of not raining. Then, much like with linear regression, we take the log of this ratio to get a continuous real number as our result. Although this output is not as intuitively clear, we can apply the following transformation to some output y:

(exp(y)/(1 + exp(y)))

to obtain the event’s probability (in this case, we’re merely reversing the log and the odds ratio).
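
To make the transform concrete, here is a minimal sketch using scikit-learn’s LogisticRegression on invented “will it rain tomorrow?” data; it shows the model’s log-odds output being converted to a probability with exp(y)/(1 + exp(y)).

```python
# Logistic regression on synthetic rain data, plus the log-odds -> probability transform.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 1000
humidity = rng.uniform(20, 100, n)
pressure = rng.uniform(980, 1040, n)
# Invented rule: rain is more likely when humidity is high and pressure is low.
true_log_odds = 0.08 * (humidity - 60) - 0.05 * (pressure - 1010)
rain = rng.binomial(1, 1 / (1 + np.exp(-true_log_odds)))

# Centre the features for numerical stability before fitting.
X = np.column_stack([humidity - 60, pressure - 1010])
clf = LogisticRegression().fit(X, rain)

sample = np.array([[85.0 - 60, 1000.0 - 1010]])
y = clf.decision_function(sample)[0]        # the log of the odds
prob = np.exp(y) / (1 + np.exp(y))          # exp(y) / (1 + exp(y))
print("log-odds:", y)
print("probability of rain:", prob)
print("predict_proba agrees:", clf.predict_proba(sample)[0, 1])
```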

Here are a few instances:

  • The likelihood of winning or losing a game of sports
  • The likelihood of passing or not passing an exam
  • The likelihood that a picture represents a fruit or an animal

Ridge vs Lasso

Consider what collinearity means: your independent variables are not diverse enough to let you distinguish their separate effects.

Consider the following scenario: you are regressing the maximum educational attainment of individuals, and the independent variables are the mother’s and father’s educational attainment.

Couples frequently have comparable educational backgrounds, so it can be difficult to separate the father’s and mother’s educational influences. If that is the focus of your research, find participants whose parents have different educational backgrounds instead of searching for some magic fitting procedure to extract answers from your data that aren’t there. It is a data problem rather than a fitting problem.

If your study is not primarily concerned with differentiating between the father and mother effects, compare the univariate coefficients. If they are reasonably close, substitute one variable for the other, such as the average of the mother’s and father’s levels. Alternatively, you may wish to consider utilizing the mother’s educational level as well as the residual from the father’s educational level fitting the mother’s educational level.

It is true that in increasingly complex analyses, it is sometimes difficult to determine what additional data might be helpful or how to reorganize the independent variables to produce more trustworthy fits. These challenges can be approached in a variety of ways, and I believe that experimenting with them will usually teach you more than just putting everything into a black-box fitting method.

Ridge regression is a good option, though, if you’ve decided to trust an algorithm rather than try to solve the problem on your own. Generally, it will prevent collinearity from inflating your coefficients and leaving you with absurd results, such as a mother’s education level coefficient of +1,000 and a father’s education level coefficient of -998. However, I don’t particularly believe that it will produce a good distribution of weight among your independent variables; only that it will typically steer clear of blatantly absurd ones.
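
To illustrate the point, here is a minimal sketch contrasting plain least squares with ridge and lasso on deliberately collinear data (an invented “mother’s vs. father’s education” toy example); the exact numbers will vary, but the regularized fits avoid wildly inflated coefficients.

```python
# OLS vs. Ridge vs. Lasso on nearly collinear predictors (synthetic data).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(5)
n = 200
mother_edu = rng.normal(14, 2, n)
father_edu = mother_edu + rng.normal(0, 0.1, n)   # nearly identical to mother_edu
child_edu = 0.5 * mother_edu + 0.5 * father_edu + rng.normal(0, 1, n)

X = np.column_stack([mother_edu, father_edu])
print("OLS coefficients:  ", LinearRegression().fit(X, child_edu).coef_)  # unstable, can inflate
print("Ridge coefficients:", Ridge(alpha=1.0).fit(X, child_edu).coef_)    # shrunk, similar to each other
print("Lasso coefficients:", Lasso(alpha=0.1).fit(X, child_edu).coef_)    # may zero one predictor out
```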

Lasso and ridge regression are excellent general-purpose tools. However, neither one can take the place of comprehending your data or provide solutions for your data-related issues.

Limitations of Linear Regression

Here are a few limitations that are frequently overlooked when building linear regression models:

  1. Outliers and extrapolation: In linear regression, extreme points can distort the fit. For example, if the majority of your data falls between 20 and 50 on the x-axis but you have one or two points out at x = 200, they can greatly affect the results of your regression. Similarly, if you build your regression on the range x in (20, 50) and then use that model to predict a y-value at x = 200, that is a considerable extrapolation and may not be accurate.
  2. Overfitting: Rather than only modeling the relationship between the variables, it is easy to overfit your model to the point where the regression starts to describe the random error (noise) in the data. This typically occurs when there are too many parameters relative to the total number of samples.
  3. Linearity: Linear regression is meant to describe linear relationships between variables, so you will have a poor model if the relationship is nonlinear. You can sometimes compensate by applying a log, square root, or other transformation to some of the variables, as sketched after this list.
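
As a sketch of the third point, here is a small example with invented data where the relationship is exponential; modeling log(y) instead of y turns it into something a straight line describes well.

```python
# Handling a nonlinear relationship by transforming the target: fit log(y) instead of y.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(11)
x = rng.uniform(0, 5, 300).reshape(-1, 1)
y = np.exp(1.2 * x.ravel() + rng.normal(0, 0.2, 300))    # clearly nonlinear in x

raw = LinearRegression().fit(x, y)             # straight line on the raw data
logged = LinearRegression().fit(x, np.log(y))  # straight line on the log scale

print("R^2 on raw y:  ", r2_score(y, raw.predict(x)))
print("R^2 on log(y): ", r2_score(np.log(y), logged.predict(x)))
```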

Hyperparameter Tuning in Linear Regression

Hyperparameter tuning is a critical phase in the creation of a machine-learning model. Hyperparameters in machine learning are settings made before the start of the learning process. They exert substantial influence over the model’s performance and manage the learning process itself. Hyperparameters are specified by the data scientist or machine learning engineer, not by the data, unlike model parameters.

The goal of hyperparameter tuning is to find the set of hyperparameters that maximizes the machine learning model’s performance on a particular dataset. Effective hyperparameter tuning increases the model’s robustness, reduces overfitting, and improves predictive accuracy.

Generally, grid search, random search, Bayesian optimization, or more sophisticated approaches like genetic algorithms or reinforcement learning are used for hyperparameter tuning. These methods search a predefined hyperparameter space for the combination that yields the best model performance.
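
As a concrete illustration of grid search, here is a minimal sketch that tunes the regularization strength (alpha) of a ridge regression with scikit-learn’s GridSearchCV; the dataset is synthetic and the grid values are arbitrary choices, not recommendations.

```python
# Grid search with cross-validation over Ridge's alpha hyperparameter.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)

param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}   # illustrative grid
search = GridSearchCV(Ridge(), param_grid, cv=5, scoring="r2")
search.fit(X, y)

print("best alpha:", search.best_params_["alpha"])
print("best cross-validated R^2:", search.best_score_)
```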

In machine learning, hyperparameter tuning is generally used to optimize the model’s performance on unknown data, enhance its generalization skills, and fine-tune its behavior.
