Introduction
Regression analysis is a predictive modelling technique that examines the relationship between a dependent variable (the target) and one or more independent variables (the predictors). It is used for forecasting, time series modelling, and exploring cause-and-effect relationships between variables. For instance, the relationship between rash driving and the number of road accidents a driver causes is best studied through regression.
Regression analysis is a crucial technique for data analysis and modelling. Here, we fit a curve or line to the data points so that the distances between the data points and the curve or line are minimised.
Regression is often among the first algorithms new learners choose to study, but they tend to forget that there are multiple techniques for performing regression analysis. Regression techniques are helpful not only for linear regression tasks but also for classification tasks.
In this blog, we will look at 7 such regression techniques that one should know.
Table of contents:
- Linear regression
- Logistic regression
- Polynomial regression
- Stepwise regression
- Ridge regression
- Lasso regression
- ElasticNet regression
- Conclusion
- Linear regression
Linear regression is one of the most well-known modelling techniques, and it is typically one of the first topics people pick when learning predictive modelling. In this technique, the dependent variable is continuous, the independent variables can be continuous or discrete, and the regression line is linear.
Linear regression establishes the association between the dependent variable (Y) and one or more independent variables (X) using a best-fit straight line (also known as a regression line).
It is represented by the equation Y=a+b*X + e, where a denotes the intercept, b is the slope of the line, and e is the error term. Based on the provided predictor variable(s), this equation can be used to predict the value of the target variable.
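To make the equation concrete, here is a minimal Python sketch that estimates a and b by ordinary least squares for a single predictor. The helper name fit_simple_linear and the sample data are made up for illustration, and NumPy is assumed to be installed.

```python
import numpy as np

def fit_simple_linear(x, y):
    """Least-squares estimates of intercept a and slope b in Y = a + b*X + e."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of X and Y divided by the variance of X
    b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: forces the line through the point of means
    a = y_mean - b * x_mean
    return a, b

# Hypothetical data lying close to the line Y = 1 + 2*X
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

a, b = fit_simple_linear(x, y)   # a is about 1.09, b is about 1.97
prediction = a + b * 6           # predicted target for a new X = 6
print(a, b, prediction)
```

With more than one predictor the same idea applies, but the coefficients are usually obtained with a matrix solver such as np.linalg.lstsq rather than these scalar formulas.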
- Logistic regression
Logistic regression is used to find the probability of event=Success and event=Failure. We should use logistic regression when the dependent variable is binary (0/1, True/False, Yes/No) in nature. Here the predicted value of Y is a probability, so it ranges from 0 to 1. It is widely used in binary classification problems.
In regression analysis, logistic regression (or logit regression) estimates the parameters of a logistic model (the coefficients in the linear combination). Formally, in binary logistic regression, there is a single binary dependent variable, coded by an indicator variable, where the two values are labelled "0" and "1", while the independent variables can each be a binary variable (two classes, coded by an indicator variable) or a continuous variable (any real value).
During the modelling process, we should include all the significant variables from the dataset to avoid underfitting and overfitting. Logistic regression generally needs a large sample size and tends to give unreliable estimates on small datasets.
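The idea of estimating coefficients so that the output is a probability between 0 and 1 can be sketched with plain gradient descent on the log-loss. This is only an illustrative sketch: the function names, learning rate, iteration count, and the hours-studied data are all assumptions, and in practice a library implementation would normally be used instead.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def add_intercept(X):
    """Prepend a column of ones so the first weight acts as the intercept."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X])

def fit_logistic(X, y, learning_rate=0.1, n_iter=5000):
    """Fit binary logistic regression by gradient descent on the log-loss."""
    Xb = add_intercept(X)
    y = np.asarray(y, dtype=float)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)                  # current predicted probabilities
        gradient = Xb.T @ (p - y) / len(y)   # gradient of the mean log-loss
        w -= learning_rate * gradient
    return w

def predict_proba(X, w):
    """Predicted probability that Y = 1 for each row of X."""
    return sigmoid(add_intercept(X) @ w)

# Hypothetical data: hours studied versus pass (1) or fail (0)
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

w = fit_logistic(hours, passed)
probs = predict_proba([0.5, 4.0], w)
print(probs)  # low probability for 0.5 hours, high probability for 4.0 hours
```

To turn the probabilities into class labels, a threshold such as 0.5 is commonly applied.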
- Polynomial regression
A regression equation is a polynomial regression equation if the power of the independent variable is more than one. The equation of polynomial regression could be y=a + bx^n, where n >= 2.
In this regression technique, the best fit line is not a straight line. It is rather a curve that fits into the data points.
In polynomial regression, there might be a temptation to fit a higher-degree polynomial to get a lower error, but this can result in overfitting. Always plot the relationships to see the fit, and focus on making sure that the curve fits the nature of the problem.
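The temptation described above can be seen numerically. The sketch below fits polynomials of degree 1, 2, and 9 to made-up data generated from a quadratic curve, using NumPy's polyfit, and compares the training error of each fit. The data, the noise level, and the chosen degrees are assumptions for illustration only.

```python
import numpy as np

# Hypothetical data: a quadratic trend plus random noise
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 30)
y = 1.0 + 0.5 * x**2 + rng.normal(scale=0.3, size=x.size)

def training_mse(degree):
    """Mean squared error on the training data for a polynomial of the given degree."""
    coeffs = np.polyfit(x, y, deg=degree)   # coefficients, highest power first
    residuals = y - np.polyval(coeffs, x)
    return float(np.mean(residuals ** 2))

errors = {degree: training_mse(degree) for degree in (1, 2, 9)}
for degree, mse in errors.items():
    print(f"degree {degree}: training MSE = {mse:.4f}")
```

The training error never increases as the degree grows, even though the data here is truly quadratic, so a lower training error on its own does not show that a higher-degree curve is the better model. Checking the fit on held-out data or on a plot is a more reliable guide.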
- Stepwise regression
Stepwise regression is used when we deal with multiple independent variables. In this technique, the independent variables are selected by an automated procedure, with no human intervention.
This is accomplished by identifying significant variables using statistics such as R-squared, t-statistics, and the AIC metric. In essence, stepwise regression fits the regression model by adding or removing covariates one at a time in accordance with a predetermined criterion.
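One possible version of this procedure is forward selection with AIC as the criterion: start with no predictors and keep adding the one that lowers AIC the most until no addition helps. The sketch below is an assumption about how this could be coded with NumPy, not a standard library routine. The function names and the synthetic data are hypothetical, and the AIC is computed only up to an additive constant.

```python
import numpy as np

def aic(X, y, features):
    """AIC (up to a constant) of a least-squares fit on the chosen feature columns."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in features])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + 2 * design.shape[1]

def forward_stepwise(X, y):
    """Add one feature at a time while doing so lowers the AIC."""
    selected = []
    remaining = list(range(X.shape[1]))
    best_score = aic(X, y, selected)          # intercept-only model
    while remaining:
        score, candidate = min((aic(X, y, selected + [j]), j) for j in remaining)
        if score >= best_score:               # no candidate improves the criterion
            break
        best_score = score
        selected.append(candidate)
        remaining.remove(candidate)
    return selected

# Hypothetical data: only columns 0 and 2 actually influence y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)

selected = forward_stepwise(X, y)
print(selected)  # columns in the order they were added
```

Backward elimination works the same way in reverse, starting from all predictors and removing one at a time.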
- Ridge regression
Ridge regression is applied when the data exhibits multicollinearity (the independent variables are highly correlated). Under multicollinearity, the ordinary least squares (OLS) estimates are still unbiased, but their variances are so large that the estimated values can lie far from the true values. Ridge regression lowers the standard errors by adding a degree of bias to the regression estimates.
Ridge regression addresses the multicollinearity issue through the shrinkage parameter lambda.
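The equation of Ridge regression adds a penalty term to the least squares objective. The coefficients are chosen to minimise sum((Y - a - b*X)^2) + lambda * sum(b^2), where the second term penalises the squared size of the coefficients.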
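As a rough illustration, the ridge estimate has a closed form that can be written in a few lines of NumPy. The sketch below assumes the common convention of centring the data so that the intercept is not penalised. The function name fit_ridge, the value of lambda, and the two nearly identical predictors are made up for the example.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge estimate; the intercept is left unpenalised."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean           # centre so the intercept drops out
    penalty = lam * np.eye(X.shape[1])
    coef = np.linalg.solve(Xc.T @ Xc + penalty, Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return intercept, coef

# Hypothetical data with two almost identical (highly correlated) predictors
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.1, size=100)

_, ols_coef = fit_ridge(X, y, lam=0.0)     # lambda = 0 reduces to plain least squares
_, ridge_coef = fit_ridge(X, y, lam=1.0)
print("OLS coefficients:  ", ols_coef)
print("Ridge coefficients:", ridge_coef)
```

With lambda set to 0 the individual coefficients can swing widely because the two predictors are hard to tell apart, while a positive lambda pulls them towards smaller, more stable values.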
- Lasso regression
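Similar to Ridge Regression, Lasso (Least Absolute Shrinkage and Selection Operator) also penalizes the size of the regression coefficients, and it is capable of reducing variability and improving the accuracy of linear regression models. It differs from Ridge in that the penalty uses absolute values instead of squares, which can shrink some coefficients to exactly zero and so helps with feature selection.
The equation of Lasso regression also adds a penalty term to the least squares objective. The coefficients are chosen to minimise sum((Y - a - b*X)^2) + lambda * sum(|b|).
Lasso makes the same assumptions as least squares regression, except that normality is not assumed. It is a regularization technique that uses L1 regularization.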
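One common way to fit Lasso is coordinate descent with soft-thresholding, sketched below with NumPy. This is a simplified illustration, not a production solver. The function names, the scaling of the squared error by 1/(2n), the value of lambda, and the synthetic data are all assumptions.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold, returning 0 if it is within the threshold."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def fit_lasso(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2n)) * sum of squared errors + lam * sum(|b|)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)                   # centre so the intercept is not penalised
    yc = y - y.mean()
    col_scale = (Xc ** 2).sum(axis=0) / n
    coef = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back
            partial_residual = yc - Xc @ coef + Xc[:, j] * coef[j]
            rho = Xc[:, j] @ partial_residual / n
            coef[j] = soft_threshold(rho, lam) / col_scale[j]
    return coef

# Hypothetical data: only columns 0 and 3 influence y
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 4.0 * X[:, 0] - 3.0 * X[:, 3] + rng.normal(scale=0.1, size=100)

coef = fit_lasso(X, y, lam=0.5)
print(coef)  # the irrelevant columns end up with coefficients of zero
```

The surviving coefficients are smaller in magnitude than the true values of 4 and -3, which is the shrinkage bias that the L1 penalty introduces.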
- ElasticNet regression
ElasticNet is a hybrid of the Lasso and Ridge Regression techniques. It is trained with both L1 and L2 priors as regularizers. ElasticNet is useful when there are multiple features which are correlated. Lasso is likely to pick one of these at random, while ElasticNet is likely to pick both.
A practical advantage of trading off between Lasso and Ridge is that ElasticNet inherits some of Ridge's stability under rotation.
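The difference in how correlated features are handled can be sketched by extending coordinate descent with both penalties. In the illustration below, the parameter names lam and l1_ratio, the objective scaling, and the data are assumptions. Setting l1_ratio to 1 gives a pure Lasso penalty, and a duplicated column is used as an extreme case of correlated features.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold, returning 0 if it is within the threshold."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def fit_elastic_net(X, y, lam, l1_ratio, n_iter=500):
    """Coordinate descent for squared error / (2n)
    + lam * (l1_ratio * sum(|b|) + (1 - l1_ratio) / 2 * sum(b^2))."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)                   # centre so the intercept is not penalised
    yc = y - y.mean()
    col_scale = (Xc ** 2).sum(axis=0) / n
    coef = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            partial_residual = yc - Xc @ coef + Xc[:, j] * coef[j]
            rho = Xc[:, j] @ partial_residual / n
            # L1 part sets the threshold, L2 part enlarges the denominator
            coef[j] = soft_threshold(rho, lam * l1_ratio) / (
                col_scale[j] + lam * (1.0 - l1_ratio)
            )
    return coef

# Hypothetical data: the same feature appears twice
rng = np.random.default_rng(0)
x = rng.normal(size=100)
X = np.column_stack([x, x])
y = 2.0 * x + rng.normal(scale=0.1, size=100)

lasso_coef = fit_elastic_net(X, y, lam=0.5, l1_ratio=1.0)
enet_coef = fit_elastic_net(X, y, lam=0.5, l1_ratio=0.5)
print("Lasso:     ", lasso_coef)   # weight ends up on a single copy
print("ElasticNet:", enet_coef)    # weight is shared between both copies
```

In this sketch Lasso keeps whichever copy is visited first rather than a truly random one, but the outcome is the same in spirit: one of the correlated features is dropped, while ElasticNet keeps both.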
- Conclusion
In this article, we learned that regression is a predictive modelling technique used to predict continuous variables from a given set of features. It is widely used in the finance, retail, and marketing industries to predict stock prices, customer churn, revenue, sales volume, etc. We saw 7 different regression techniques that one can use in various scenarios to build and optimize models.