Jan 21, 2025 6 min read

Regression in Machine Learning

Generally, machine learning can be used to analyze data in four different ways, two of which are predictive and two of which are descriptive:

  • Predictive: With these methods, supervised learning enables the machine to forecast outcomes based on established patterns. Predictive analysis includes:
  1. Classification: Assigning items to different labeled classes
  2. Regression: Identifying the connection between a dependent variable and one or more independent variables
  • Descriptive: With these methods, unsupervised learning enables the machine to detect patterns that reveal deeper insights into the data. Descriptive analysis includes:
  1. Clustering: Creating groups of like things
  2. Association: Identifying associations between things

Understanding Regression Analysis

To understand machine learning regression analysis, imagine those tube-shaped balloons you see at children's parties. You squeeze one end, and the other end expands. Release, and the balloon returns to normal. Squeeze both ends, the center expands. Release one end, and the expanded area moves to the opposite end. Each squeeze is an independent variable. Each bulge is a dependent variable; it differs depending on where you squeeze.

Now imagine a talented balloon sculptor twisting together five or six of these balloons to form a giraffe. The relationship between squeezing and expanding becomes more complex. If you squeeze the body, maybe the tail expands. If you squeeze the head, maybe two legs expand. Each change to an independent variable results in a change to one or more dependent variables. Sometimes that relationship is easy to predict, and other times it may be very difficult.

Business Applications of Regression Analysis

Regression analysis is commonly used in the financial industry to analyze risk. For example, I once worked for a credit card company that was looking for a way to predict which customers would struggle to make their monthly payments. They used a regression algorithm to identify relationships between different variables and discovered that many customers start to use their credit card to pay for essentials just before they have trouble paying their bills. A customer who typically used their card only for large purchases, such as a television or computer, would suddenly start using it to buy groceries and gas and pay their electric bill. The company also discovered that people who had a lot of purchases of less than five dollars were likely to struggle with their monthly payments.

The dependent variable was whether the person would have enough money to cover the monthly payment. The independent variables were the items the customer purchased and the purchase amounts. Based on the results of the analysis, the credit card company could then decide whether to suspend the customer's account, reduce the account's credit line, or maintain the account's current status in order to limit the company's exposure to risk.

Businesses often use regression analysis to identify which factors contribute most to sales. For example, a company may want to know how it can get the most bang for its buck in terms of advertising: should it spend more money on its website, on social media, on television advertising, on pay-per-click (PPC) advertisements, and so on? Regression analysis can rank these channels, from those that contribute most to those that contribute not at all. The company can then use the results of that analysis to predict how its various advertising investments will perform.
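To make the advertising example a little more concrete, here is a minimal sketch of a multiple linear regression in plain Python. Everything in it is an assumption made for illustration: the spend figures are invented, the sales numbers are constructed so that television contributes nothing, and the helper names (solve, fit_multiple_regression) are my own. A real analysis would use noisy data and a statistics library.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so each row operation is applied to both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_multiple_regression(rows, y):
    """Least-squares fit via the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # the leading 1.0 is the intercept column
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * target for row, target in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical monthly spend in $1,000s: (website, social media, television)
spend = [(1, 2, 5), (2, 1, 3), (3, 4, 1), (4, 3, 6), (5, 5, 2), (6, 2, 4)]
# Invented sales, built as 10 + 3*website + 1.5*social + 0*television
sales = [10 + 3 * w + 1.5 * s + 0 * t for w, s, t in spend]

intercept, website, social, television = fit_multiple_regression(spend, sales)
for name, coef in [("website", website), ("social", social), ("television", television)]:
    print(f"{name}: {coef:.2f} extra sales per extra $1,000 spent")
```

Because every channel is measured in the same unit here, the coefficients can be compared directly. With inputs in different units you would need to standardize them before ranking.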

Identifying the Dependent and Independent Variables

When performing regression analysis, the first step is to identify the dependent and independent variables:

  • The dependent variable is what you are trying to understand or predict; for example, whether a customer is about to miss one of their credit card payments.
  • Independent variables are factors that may have an impact on the dependent variable; for example, the customer's spending habits or purchase amounts prior to the date on which the next credit card payment is due.

An Important Reminder

Keep in mind that correlation does not prove causation. Just because regression analysis shows a correlation between an independent and a dependent variable, that does not mean that a change in the independent variable caused the change observed in the dependent variable, so avoid the temptation to assume it does.

Instead, perform additional research to confirm or rule out a causal link, or dig deeper to find out what's really going on. For example, regression analysis may show a correlation between the use of certain colors on a web page and the amount of time users spend on those pages, but other unidentified factors may be contributing, perhaps to a greater degree. A web designer would be wise to run one or more experiments before making any changes.
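Here is a small simulation of how a hidden factor can produce a correlation between two things that have no effect on each other. It is only a sketch under assumptions I made up: a hypothetical content_quality score drives both how much a page uses an accent color and how long visitors stay, and the variable names and noise levels are invented.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """What is left of ys after subtracting its best straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the simulation is repeatable
n = 2000

# Hypothetical hidden factor: how engaging each page's content is.
content_quality = [rng.gauss(0, 1) for _ in range(n)]
# Both measured variables depend on the hidden factor, not on each other.
accent_color_use = [q + rng.gauss(0, 0.5) for q in content_quality]
time_on_page = [q + rng.gauss(0, 0.5) for q in content_quality]

raw = pearson(accent_color_use, time_on_page)
adjusted = pearson(
    residuals(accent_color_use, content_quality),
    residuals(time_on_page, content_quality),
)
print(f"correlation before accounting for content quality: {raw:.2f}")
print(f"correlation after accounting for content quality:  {adjusted:.2f}")
```

In the simulation the strong raw correlation all but disappears once the hidden factor is accounted for. In real data you usually can't measure the hidden factor, which is why a controlled experiment is the safer route.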

Regression analysis is very useful for identifying relationships between a dependent variable and one or more independent variables, but treat those relationships as a starting point for gathering more data and developing deeper insight. Ask what the results mean and what else could be driving them before drawing any hard and fast conclusions.

Frequently Asked Questions

What is regression in machine learning?

Regression in machine learning is a supervised learning technique that is used to predict a continuous outcome (dependent variable) based on one or more input variables (independent variables).

It is used to understand the relation between input and output data points, and is commonly applied in forecasting, trend analysis, and other predictive modeling tasks.

What are the different types of regression in machine learning?

Common types of regression in machine learning include:

  • linear regression
  • logistic regression
  • polynomial regression
  • lasso regression
  • ridge regression

How is linear regression used in machine learning?

We use linear regression to find a simple equation, a straight line in the simplest case, that best fits the relationship between the thing we want to know (the dependent variable) and the things that might affect it (the independent variables). The fitted equation lets us make predictions for inputs we haven't seen yet, and its coefficients show how strongly each independent variable is associated with the dependent variable.

For example, we might use linear regression to predict how much a house will cost based on its size, location, and age. It's a powerful tool that helps us make sense of complex data and make informed decisions.
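As a rough illustration of the house price example, the sketch below fits a straight line with ordinary least squares in plain Python. To keep it short I assume a single independent variable (size) and leave out location and age. The five houses and their prices are invented, and fit_line is my own helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: size in square feet and sale price in dollars.
sizes = [850, 1200, 1500, 1800, 2400]
prices = [155_000, 198_000, 240_000, 275_000, 352_000]

slope, intercept = fit_line(sizes, prices)
predicted = intercept + slope * 2000  # estimate for a 2,000 sq ft house

print(f"each extra square foot adds about ${slope:,.2f}")
print(f"estimated price for 2,000 sq ft: ${predicted:,.0f}")
```

The slope is the "how much does size matter" part of the answer, and plugging a new size into the equation is the prediction part.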

What is the difference between linear regression and logistic regression?

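Linear regression predicts continuous outcomes, such as a price. Logistic regression predicts the probability of a categorical outcome, most often a binary one (e.g., yes/no, true/false), which is why it is typically used for classification despite its name.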
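To show the difference in practice, here is a minimal sketch of logistic regression fit by gradient descent in plain Python. It borrows the credit card scenario from earlier, but the data is invented, the single input (number of purchases under five dollars) is a simplification, and fit_logistic is a hypothetical helper rather than any library's API. The key idea is that the sigmoid function squeezes a straight-line score into a probability between 0 and 1, which a plain linear fit would not guarantee.

```python
import math


def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit p(y=1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent on log loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # predicted probability minus label
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Invented data: purchases under $5 in a month, and whether the
# customer later missed a payment (1) or not (0).
small_purchases = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
missed_payment = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(small_purchases, missed_payment)


def risk(count):
    """Estimated probability of a missed payment for a given purchase count."""
    return sigmoid(b0 + b1 * count)


for count in (1, 5, 8):
    print(f"{count} small purchases: estimated risk {risk(count):.2f}")
```

The output is a probability, so turning it into a yes/no decision still requires choosing a threshold, and that choice is a business decision rather than something the model makes for you.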

What are multiple linear regression and simple linear regression?

Simple linear regression involves modeling the relationship between one independent variable and one dependent variable.

Multiple linear regression, on the other hand, extends this concept by modeling the relationship between multiple independent variables and one dependent variable, allowing for a more comprehensive analysis of how multiple factors influence the outcome.

This is my weekly newsletter that I call The Deep End because I want to go deeper than results you’ll see from searches or LLMs. Each week I’ll go deep to explain a topic that’s relevant to people who work with technology. I’ll be posting about artificial intelligence, data science, and data ethics.

This newsletter is 100% human written 💪 (* aside from a quick run through grammar and spell check).
