Linear Regression Calculator

Calculate slope, intercept, R-squared, and correlation coefficient. Generate best-fit line equations for your data.

Quick Facts

  • Linear regression: y = mx + b (the best-fit line equation)
  • R-squared range: 0 to 1 (1 = perfect fit)
  • Correlation (r): -1 to +1 (direction and strength)
  • Minimum points: 2 data points (more is better)

Key Takeaways

  • Linear regression finds the best-fit straight line through your data points
  • R-squared tells you what percentage of variance is explained by the model
  • The slope shows how much Y changes for each unit increase in X
  • Correlation coefficient (r) ranges from -1 to +1, indicating direction and strength
  • More data points generally lead to more reliable regression results

What is Linear Regression?

Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. Simple linear regression uses a single independent variable to predict the value of a dependent variable. It finds the best-fitting straight line through a set of data points, minimizing the sum of squared differences between observed and predicted values.

This technique is fundamental to statistics, data science, and machine learning. Linear regression helps researchers understand relationships between variables, make predictions, and identify trends in data. It serves as the foundation for more complex regression models and is essential for anyone working with quantitative data.

The Linear Regression Equation

y = b0 + b1 * x

where:
  • y = predicted value of the dependent variable
  • x = independent variable
  • b0 = y-intercept (the value of y when x = 0)
  • b1 = slope (the change in y for each unit change in x)
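
To make the equation concrete, here is a minimal Python sketch that evaluates a fitted line for a few x values. The coefficient values are arbitrary placeholders for illustration, not the output of any particular dataset.

# Evaluate y = b0 + b1 * x for a few x values.
# b0 and b1 are placeholder values chosen for illustration only.
b0 = 2.0   # y-intercept
b1 = 0.5   # slope

def predict(x):
    """Predicted y for a given x on the fitted line."""
    return b0 + b1 * x

for x in [0, 1, 2, 3]:
    print(x, predict(x))   # prints 2.0, 2.5, 3.0, 3.5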

Understanding Regression Coefficients

Slope (b1)

The slope tells you how much the dependent variable changes for each one-unit increase in the independent variable. A positive slope indicates a positive relationship (as x increases, y increases), while a negative slope indicates an inverse relationship.

Slope Formula

b1 = (n * Sum(xy) - Sum(x) * Sum(y)) / (n * Sum(x^2) - (Sum(x))^2)

Y-Intercept (b0)

The y-intercept represents the predicted value of y when x equals zero. Depending on your data context, this may or may not have a meaningful interpretation. For example, if x represents years of experience and y represents salary, the y-intercept would represent the starting salary with zero experience.

b0 = mean(y) - b1 * mean(x)
Where mean(x) and mean(y) are the averages of x and y values respectively.
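
Combining the two formulas, here is a minimal pure-Python sketch that computes both coefficients from the raw sums. The function name fit_line is our own, not from any library.

def fit_line(xs, ys):
    """Ordinary least squares for simple linear regression.
    Returns (b0, b1): intercept and slope."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # b1 = (n * Sum(xy) - Sum(x) * Sum(y)) / (n * Sum(x^2) - (Sum(x))^2)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # b0 = mean(y) - b1 * mean(x)
    b0 = sum_y / n - b1 * (sum_x / n)
    return b0, b1

For example, fit_line([1, 2, 3], [2, 4, 6]) returns (0.0, 2.0), i.e. the line y = 2x.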

Coefficient of Determination (R-squared)

R-squared measures how well the regression line fits the data. It represents the proportion of variance in the dependent variable that is explained by the independent variable. R-squared values range from 0 to 1:

  • R-squared = 1: Perfect fit, the line explains 100% of the variance
  • R-squared = 0.9: Excellent fit, explains 90% of the variance
  • R-squared = 0.7: Good fit, explains 70% of the variance
  • R-squared = 0.5: Moderate fit, explains 50% of the variance
  • R-squared = 0: No linear relationship
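
As a sketch of how this is computed in practice (reusing the hypothetical fit_line helper above), R-squared compares the residual sum of squares against the total sum of squares:

def r_squared(xs, ys):
    """R^2 = 1 - SS_res / SS_tot for a simple linear fit."""
    b0, b1 = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                     # total variation
    return 1 - ss_res / ss_tot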

Correlation Coefficient (r)

The correlation coefficient measures the strength and direction of the linear relationship between two variables. Values range from -1 to +1:

r Value           Interpretation
 0.9 to  1.0      Very strong positive
 0.7 to  0.9      Strong positive
 0.5 to  0.7      Moderate positive
 0.3 to  0.5      Weak positive
-0.3 to  0.3      Little to no correlation
-0.5 to -0.3      Weak negative
-0.7 to -0.5      Moderate negative
-0.9 to -0.7      Strong negative
-1.0 to -0.9      Very strong negative

Note that for simple linear regression with one independent variable, R-squared equals the square of the correlation coefficient (R-squared = r^2).
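
Here is a sketch of Pearson's r built from the same raw sums as the slope formula; squaring its result should match the r_squared sketch above for any simple linear fit.

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    num = n * sum_xy - sum_x * sum_y
    den = ((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)) ** 0.5
    return num / den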

Assumptions of Linear Regression

Linearity

The relationship between x and y must be linear. Plot your data first to verify this assumption. If the relationship appears curved, consider polynomial regression or data transformation.

Independence

Observations must be independent of each other. This assumption is often violated in time-series data where consecutive observations may be correlated.

Homoscedasticity

The variance of residuals should be constant across all levels of x. If the spread of residuals increases or decreases with x, the assumption is violated (heteroscedasticity).

Normality

For statistical inference (hypothesis tests, confidence intervals), residuals should be normally distributed. This is less critical for prediction purposes with large samples.
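
One simple way to eyeball the linearity and homoscedasticity assumptions is a residual plot. The sketch below assumes matplotlib is installed and reuses the hypothetical fit_line helper from earlier; a patternless band of points around zero is a good sign, while a curve or funnel shape suggests a violated assumption.

import matplotlib.pyplot as plt

def plot_residuals(xs, ys):
    """Scatter residuals against x to check linearity and constant variance."""
    b0, b1 = fit_line(xs, ys)
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    plt.scatter(xs, residuals)
    plt.axhline(0, linestyle="--")   # zero-residual reference line
    plt.xlabel("x")
    plt.ylabel("residual")
    plt.show()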

Practical Example

Example: Study Hours vs. Exam Scores

X (Hours): 1, 2, 3, 4, 5, 6, 7, 8

Y (Score): 52, 58, 65, 71, 75, 82, 87, 91

Results:

Equation: y = 47.29 + 5.63x

R-squared = 0.995 (99.5% of variance explained)

r = 0.998 (very strong positive correlation)

Interpretation: Each additional hour of study is associated with an approximately 5.63-point increase in exam score.
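
These figures are straightforward to reproduce. Assuming NumPy is available, np.polyfit and np.corrcoef give the same results; this is a verification sketch, not part of the calculator itself.

import numpy as np

hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
scores = np.array([52, 58, 65, 71, 75, 82, 87, 91])

b1, b0 = np.polyfit(hours, scores, deg=1)   # degree-1 fit returns (slope, intercept)
r = np.corrcoef(hours, scores)[0, 1]        # Pearson correlation

print(f"y = {b0:.2f} + {b1:.2f}x")          # y = 47.29 + 5.63x
print(f"r = {r:.3f}, R^2 = {r ** 2:.3f}")   # r = 0.998, R^2 = 0.995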

Applications of Linear Regression

Business and Economics

  • Predicting sales based on advertising spend
  • Estimating demand based on price
  • Forecasting economic indicators

Science and Research

  • Analyzing experimental data
  • Establishing dose-response relationships
  • Calibrating measurement instruments

Healthcare

  • Predicting patient outcomes
  • Analyzing treatment effectiveness
  • Modeling disease progression

Limitations of Linear Regression

Correlation vs. Causation

A strong correlation does not imply causation. The regression relationship only describes association; establishing causality requires experimental design or additional evidence.

Extrapolation Risks

Predictions outside the range of observed data (extrapolation) may be unreliable. The linear relationship may not hold beyond the data range.

Outlier Sensitivity

Linear regression is sensitive to outliers, which can significantly influence the slope and intercept. Always examine your data for outliers and consider their impact.
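
To see this sensitivity concretely, here is a small demonstration using the hypothetical fit_line sketch from earlier: five perfectly linear points give a slope of 2, and a single outlier triples it.

# Perfectly linear data: y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
print(fit_line(xs, ys))                # (0.0, 2.0): intercept 0, slope 2

# The same data plus one outlier
print(fit_line(xs + [6], ys + [40]))   # approx (-9.33, 6.0): the slope triples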

Frequently Asked Questions

What is the difference between correlation and regression?

Correlation measures the strength of the linear relationship (r). Regression goes further by providing an equation to predict y from x. Correlation is symmetric (x and y can be swapped), while regression has distinct dependent and independent variables.

When should I use linear regression?

Use linear regression when you want to predict a continuous outcome variable from one or more predictor variables, and the relationship appears linear. Verify the assumptions before relying on results for inference.

How many data points do I need?

While you can calculate a regression with as few as 2 points, meaningful analysis requires more. A common rule of thumb is at least 10-20 observations per predictor variable for reliable estimates.

What does a low R-squared mean?

A low R-squared doesn't necessarily mean the regression is useless. It may indicate that other variables affect y, or that y has high inherent variability. Consider adding predictors (multiple regression) or accepting that prediction precision is limited.