<aside>
<aside>
<aside>
What Is Linear Regression?
Linear regression is a statistical method that models the relationship between a dependent variable (Y) and one or more independent variables (X) by fitting a linear equation to observed data. It is widely used in data science and machine learning to predict outcomes for new observations.
The independent variable, also called the predictor or explanatory variable, is treated as an input: its value is not affected by changes in the other variables. The dependent variable, also called the response or outcome variable, changes as the independent variable changes, and it is the value the regression model predicts.
Thus, linear regression is a supervised learning algorithm that models a mathematical relationship between variables and predicts continuous (numeric) quantities such as sales, salary, age, or product price.
This analysis method applies whenever the data contain at least two variables, one to predict and one to predict from, as in stock market forecasting, portfolio management, and scientific analysis.
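A sloped straight line represents the linear regression model: it is the line of best fit through the data points, with the independent variable on the horizontal axis and the dependent variable on the vertical axis.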
Linear Regression
The simple linear regression equation is:
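$$ Y = \beta_0 + \beta_1 X + \epsilon $$
Here $Y$ is the dependent variable, $X$ is the independent variable, $\beta_0$ is the intercept (the value of $Y$ when $X = 0$), $\beta_1$ is the slope (the change in $Y$ per unit change in $X$), and $\epsilon$ is the error term, which captures the variation the line does not explain.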
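To see what each term of the equation contributes, the sketch below evaluates it directly. The function name `simulate_y`, the coefficient values, and the choice of Gaussian noise for $\epsilon$ are all assumptions made for illustration; the equation itself does not prescribe a noise distribution.

```python
import random

def simulate_y(x, beta0, beta1, noise_sd=0.0, rng=None):
    """Return beta0 + beta1 * x plus an error term.

    The error term is drawn from a Gaussian with standard deviation
    noise_sd (an assumption); with noise_sd == 0 it is exactly zero.
    """
    rng = rng or random.Random()
    epsilon = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
    return beta0 + beta1 * x + epsilon

# Hypothetical coefficients, chosen only for illustration.
noise_free = simulate_y(3, beta0=2.0, beta1=0.5)
print(noise_free)  # 3.5

# Same point with noise: the value scatters around 3.5.
noisy = simulate_y(3, beta0=2.0, beta1=0.5, noise_sd=1.0, rng=random.Random(0))
print(noisy)
```

With `noise_sd=0.0` every point lies exactly on the line; a positive `noise_sd` scatters the points around it, which is the situation regression is meant to handle.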
Example Problem
Given Data:
| X (Hours Studied) | Y (Exam Score) |
|---|---|
| 1 | 50 |
| 2 | 60 |
| 3 | 70 |
| 4 | 80 |
Step 1: Compute Necessary Sums
$$ n = 4 \\ {} \\ \sum X = 1 + 2 + 3 + 4 = 10 \\ {} \\ \sum Y = 50 + 60 + 70 + 80 = 260 \\ {} \\ \sum XY = (1 \times 50) + (2 \times 60) + (3 \times 70) + (4 \times 80) = 50 + 120 + 210 + 320 = 700 \\ {} \\ \sum X^2 = 1^2 + 2^2 + 3^2 + 4^2 \\ {} \\ = 1 + 4 + 9 + 16 = 30 $$
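The same sums can be checked in a few lines of Python. This is a minimal sketch using only built-ins; the variable names (`xs`, `ys`, `sum_xy`, and so on) are chosen here for illustration.

```python
# Data from the table: hours studied (X) and exam scores (Y).
xs = [1, 2, 3, 4]
ys = [50, 60, 70, 80]

n = len(xs)                                  # number of observations
sum_x = sum(xs)                              # sum of X
sum_y = sum(ys)                              # sum of Y
sum_xy = sum(x * y for x, y in zip(xs, ys))  # sum of X * Y
sum_x2 = sum(x * x for x in xs)              # sum of X squared

print(n, sum_x, sum_y, sum_xy, sum_x2)  # 4 10 260 700 30
```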
Step 2: Compute Slope ($\beta_1$)
$$ \beta_1 = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - (\sum X)^2} \\ {} \\ = \frac{4 \times 700 - 10 \times 260}{4 \times 30 - 10^2} \\ {} \\ = \frac{2800 - 2600}{120 - 100} \\ {} \\ = \frac{200}{20} = 10 $$
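The slope formula translates directly into code. The sketch below is one possible implementation; the function name `slope` is hypothetical, and the zero-denominator check is an added safeguard for the case where every X value is identical, which the worked example does not encounter.

```python
def slope(xs, ys):
    """Slope via (n*sum(XY) - sum(X)*sum(Y)) / (n*sum(X^2) - sum(X)^2)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        # All X values are the same, so no unique line exists.
        raise ValueError("slope is undefined when all X values are identical")
    return (n * sum_xy - sum_x * sum_y) / denominator

beta1 = slope([1, 2, 3, 4], [50, 60, 70, 80])
print(beta1)  # 10.0
```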
Step 3: Compute Intercept ($\beta_0$)
$$ \beta_0 = \frac{\sum Y - \beta_1 \sum X}{n} = \frac{260 - 10 \times 10}{4} = \frac{160}{4} = 40 $$
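Dividing the numerator term by term gives an equivalent form of the intercept in terms of the sample means, written here as $\bar{X}$ and $\bar{Y}$ (notation introduced only for this remark). It shows that the fitted line always passes through the point $(\bar{X}, \bar{Y})$:

$$ \beta_0 = \frac{\sum Y}{n} - \beta_1 \frac{\sum X}{n} = \bar{Y} - \beta_1 \bar{X} = 65 - 10 \times 2.5 = 40 $$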
Final Regression Equation:
$$ Y = 40 + 10X $$
Prediction: If $X = 5$, then $Y = 40 + 10 \times 5 = 90$.
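As a cross-check on the whole example, the sketch below fits the line and reproduces the prediction. It uses the deviation-from-mean form of the slope, which is algebraically equivalent to the sum-based formula in Step 2 but is not the form used above. The names `fit_line` and `predict` are hypothetical.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when all X values are identical")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x):
    """Evaluate the fitted line at x."""
    return intercept + slope * x

xs = [1, 2, 3, 4]
ys = [50, 60, 70, 80]

b0, b1 = fit_line(xs, ys)
print(b0, b1)              # 40.0 10.0
print(predict(b0, b1, 5))  # 90.0

# Residuals: observed Y minus fitted Y. All are zero for this data set
# because the four points lie exactly on a straight line.
residuals = [y - predict(b0, b1, x) for x, y in zip(xs, ys)]
print(residuals)           # [0.0, 0.0, 0.0, 0.0]
```

Real data rarely falls exactly on a line, so nonzero residuals are the norm; the zero residuals here reflect how the example data was constructed.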