Supervised and Unsupervised Learning Algorithms

Like people, machines can learn in both supervised and unsupervised ways, but human learning differs from machine learning. With humans, supervised learning consists of formal education. An instructor presents the material, students study it and are tested on it, and areas of weakness are addressed, ideally until students achieve mastery of the subject. Unsupervised learning is experiential. You venture out into the world and engage in daily activities, learning on your own and from your mistakes.

Machine learning differs in that it involves only a couple of forms of learning, and those are determined by what you want the machine to do:

Supervised learning if you want to use the machine for classification (assigning items to different labeled classes) or regression (identifying the relationship between an independent and a dependent variable).
Unsupervised learning if you want to use the machine for clustering (creating groups of like things) or association (identifying things that tend to occur together).

Supervised Machine Learning Model

With supervised learning techniques, a human trainer labels items in a data set, often referred to as the training data set. The machine has the advantage of knowing how the human trainer has classified the data. For example, suppose you want to train a machine to distinguish between spam email messages and not-spam email messages. You feed several examples of spam messages into the machine and tell the machine, "These are spam." Then, you feed several examples of not-spam messages into the machine and tell it, "These are not spam."

The machine identifies patterns in both message groups. There are certain patterns that are characteristic of spam and other patterns characteristic of not-spam. Now, when you feed a message into the machine that is not labeled spam or not-spam, the machine should be able to tell whether the message is or is not spam.
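The train-then-classify flow described above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular email provider works: it assumes a naive Bayes word-count approach, and the six training messages and the names train and classify are made up for the example.

```python
import math
from collections import Counter

# Hypothetical, hand-written training messages; a real filter needs far more.
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("limited offer win cash now", "spam"),
    ("meeting moved to monday morning", "not-spam"),
    ("please review the attached report", "not-spam"),
    ("lunch on monday with the team", "not-spam"),
]

def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {"spam": Counter(), "not-spam": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the higher naive Bayes log-score."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["not-spam"])
    total_messages = sum(label_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        # Start from the prior: the share of training messages with this label.
        score = math.log(label_counts[label] / total_messages)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

word_counts, label_counts = train(TRAINING)
print(classify("claim your free cash prize", word_counts, label_counts))     # spam
print(classify("report for the monday meeting", word_counts, label_counts))  # not-spam
```

The labels supplied by the human trainer are the only reason the machine knows which word patterns belong to which group; remove the labels and this approach has nothing to learn from.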

Unfortunately, machines make mistakes. A certain message may not have a clear pattern that characterizes it as either spam or not-spam, so the machine may send some messages that are not spam to the Spam folder and allow some spam messages to reach your Inbox. Your machine clearly needs more training.

Additional training occurs when you mark a message in the Spam folder as "not spam," or when you move a spam message from your Inbox to the Spam folder. This provides the machine with valuable feedback that enables it to fine-tune its model, increasing its accuracy.
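One way to picture this feedback loop is a model that adjusts itself only when a user corrects it. The sketch below assumes a simple perceptron-style update with one weight per word; the class name FeedbackFilter and its methods are hypothetical, and real spam filters are considerably more sophisticated.

```python
class FeedbackFilter:
    """Toy online classifier: one weight per word; a positive total means spam."""

    def __init__(self):
        self.weights = {}

    def _words(self, text):
        # Treat each distinct word as a present/absent feature.
        return set(text.lower().split())

    def predict(self, text):
        score = sum(self.weights.get(word, 0.0) for word in self._words(text))
        return "spam" if score > 0 else "not-spam"

    def correct(self, text, true_label):
        """Apply user feedback; weights change only if the prediction was wrong."""
        if self.predict(text) == true_label:
            return
        step = 1.0 if true_label == "spam" else -1.0
        for word in self._words(text):
            self.weights[word] = self.weights.get(word, 0.0) + step

spam_filter = FeedbackFilter()
message = "free prize inside"
print(spam_filter.predict(message))   # not-spam (no weights learned yet)
spam_filter.correct(message, "spam")  # the user moves the message to the Spam folder
print(spam_filter.predict(message))   # spam
```

Each correction nudges the weights of the words in that message, so similar messages are more likely to be handled correctly next time.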

Applications of Supervised Learning Models

Supervised learning tends to be more useful than unsupervised learning in the following applications:

Classification: Assigning items to different classes, such as spam and not-spam.
Regression: Identifying the connection between an independent and a dependent variable; for example, a customer's spending habits and their ability to pay the mortgage.

Classification and regression are both considered to be predictive because they can be used to forecast the output for a given input: a class label in the case of classification, or a numeric value in the case of regression. For example, if you use a regression algorithm to identify a relationship between family income and high school graduation rates, you can use that relationship to predict a student's likelihood of graduating by looking at the student's family income.
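To make the idea of "identifying a relationship and then predicting from it" concrete, here is a minimal sketch of simple linear regression using ordinary least squares. The x and y values are invented purely for illustration and carry no real-world meaning; the function names fit_line and predict are assumptions of this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented labeled examples: each x (independent variable) has a known y (dependent variable).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)

def predict(x):
    """Use the learned relationship to estimate y for an unseen x."""
    return slope * x + intercept

print(round(slope, 2), round(intercept, 2))  # 1.97 1.09
print(round(predict(6.0), 2))                # 12.91
```

The known y values play the same role as the spam and not-spam labels in classification: they are the answers the machine learns from.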

Unsupervised Machine Learning

With unsupervised learning, you feed the machine a data set and instruct it to group like items without providing it with labeled groups; the machine must determine the groups based on similarities and differences among the items in the data set.

For example, you might feed 1000 medical images into a machine and have it group the images based on patterns it detects in those images. The machine creates 10 groups and assigns images to each group. A doctor can then examine the different groups in an attempt to figure out why the machine grouped the images as it did. This examination is part of the unsupervised learning process. The benefit here is that the machine may identify patterns that doctors never thought to look for. These patterns may provide insights into diagnosis and treatment options.
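The grouping step can be sketched with k-means, one common clustering algorithm. This is a simplified illustration rather than a medical imaging pipeline: it assumes each item has already been reduced to a pair of numbers, the six points are invented, and the starting centroids are simply the first k points so that the result is repeatable (real implementations usually choose them at random).

```python
def kmeans(points, k, iterations=10):
    """Group points into k clusters by repeatedly assigning each to its nearest centroid."""
    # Assumption for repeatability: use the first k points as starting centroids.
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            # Squared distance from this point to every centroid.
            distances = [
                sum((a - b) ** 2 for a, b in zip(point, centroid))
                for centroid in centroids
            ]
            clusters[distances.index(min(distances))].append(point)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroids[index]
            for index, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Invented, unlabeled 2-D points: no one tells the machine which group is which.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 1.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
for cluster in clusters:
    print(cluster)
```

Note that the machine returns groups, not explanations. As with the doctor examining the image groups, a person still has to interpret what each cluster means.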

Unsupervised learning tends to be more useful than supervised learning in the following applications:

Clustering: Splitting the data set into groups based on similarities among its items, as in the medical images example provided above.
Identifying associations: Identifying two or more patterns that tend to occur together in a data set. For example, a retailer may use unsupervised learning to identify products that are often purchased together, in order to develop marketing strategies to increase the sales of those products.

Clustering and association are considered to be descriptive, as opposed to predictive, because they identify patterns that reveal insights into the data.
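The retail example of products purchased together can be sketched by counting how often each pair of items shares a basket. This is a bare-bones illustration of the "support" measure used in association rule learning, not a full algorithm such as Apriori; the baskets are invented and the function name frequent_pairs is hypothetical.

```python
from collections import Counter
from itertools import combinations

# Invented shopping baskets, one set of products per purchase.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
    {"milk", "bread"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs whose share of baskets (support) is at least min_support."""
    pair_counts = Counter()
    for basket in baskets:
        # Sorting gives every pair a consistent order, e.g. ("bread", "butter").
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    total = len(baskets)
    return {
        pair: count / total
        for pair, count in pair_counts.items()
        if count / total >= min_support
    }

print(frequent_pairs(baskets, min_support=0.5))  # {('bread', 'butter'): 0.6}
```

Here bread and butter appear together in three of the five baskets, so that pair is the only one to clear the 50 percent threshold. The output describes the data as it is; it does not predict what any individual customer will buy.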

AI and Machine Learning

When starting your own artificial intelligence project, carefully consider the available data and what you want to do with that data. If you already have well-defined categories that you want the machine to use to classify input, you probably want to stick with supervised learning. If you’re unsure how to group and categorize the data, or you want to look at the data in a new way, unsupervised learning is probably the better approach because it enables the computer to identify similarities and differences you may never have considered otherwise.

Frequently Asked Questions

What is the difference between supervised and unsupervised learning in machine learning?

The main difference between supervised and unsupervised learning is that supervised learning uses labeled data to train models, while unsupervised learning works with unlabeled data to find patterns and relationships.

Supervised learning involves mapping inputs to already known outputs, whereas unsupervised learning aims to discover the underlying structure in data.

How do supervised and unsupervised machine learning models differ in their approach?

Supervised learning models are trained on input data along with the corresponding output labels, which the model learns to predict for new data.

Unsupervised learning models, on the other hand, do not have preset labels and instead identify hidden patterns or group data points into clusters based on their features.

What types of supervised algorithms are used in machine learning?

Common supervised learning algorithms include decision trees, linear regression, logistic regression, and support vector machines. These algorithms are designed to learn from labeled datasets and make accurate predictions on new, unseen data.

What are some examples of unsupervised learning model algorithms?

Examples of unsupervised learning algorithms include k-means clustering, hierarchical clustering, and association rule learning.

What are the advantages and disadvantages of supervised learning algorithms?

Advantages of supervised learning include the ability to produce specific outputs based on labeled data and accuracy that can be measured directly against those labels. Disadvantages include the need for large amounts of labeled data, which can be time-consuming and costly to collect.

In what scenarios would you use supervised vs unsupervised learning?

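Supervised learning is typically used when you need to predict or classify data, such as in spam detection, image recognition, or medical diagnosis, where the labels are known and can provide meaningful outputs. Unsupervised learning is the better fit when you have no predefined labels or are unsure how to group the data, as in clustering similar items or identifying products that are often purchased together.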

What is a learning model in machine learning algorithms?

A learning model in machine learning is an algorithm or a mathematical framework that learns patterns from input data. This model is trained using specific techniques, such as supervised or unsupervised learning, to make predictions or identify patterns in new data.

This is my weekly newsletter that I call The Deep End because I want to go deeper than results you’ll see from searches or LLMs. Each week I’ll go deep to explain a topic that’s relevant to people who work with technology. I’ll be posting about artificial intelligence, data science, and data ethics.

This newsletter is 100% human written 💪 (* aside from a quick run through grammar and spell check).