Jan 16, 2025 5 min read

Cost Functions in Neural Networks


Machines often learn the same way humans do: they make mistakes and pay the price for doing so. For example, when you're first learning to drive, you merge onto the highway and drive 55 mph in a 65 mph zone. Other drivers honk at you, pass you on the left and right, give you dirty looks, and make rude gestures. You get the message and start driving the speed limit, much like a neural network adjusts its parameters based on its errors.

Cars are still passing you on the left and right, and their drivers appear annoyed. You start driving 75 mph to blend in with the traffic. You are rewarded by the excitement of driving faster and by reaching your destination more quickly. Soon you are so comfortable driving 75 mph that you start driving 80 mph.

One day, you hear a siren, and you see a state trooper’s car close behind you with a flashing red light. You get pulled over and issued a ticket for $200, so you slow it down and now routinely drive about 5 to 9 mph over the speed limit.

During this entire scenario, you learn through a process of trial and error by paying for your mistakes. You pay by being embarrassed for driving too slowly, by getting pulled over and issued a warning or a ticket, or by getting into or causing an accident. You also learn by being rewarded.

Cost Function in Machine Learning

With machine learning, your goal is to make your machine as accurate as possible. This is true whether the machine’s purpose is to make predictions, identify patterns in medical images, or drive a car. One way to improve accuracy in machine learning is to use the cost function. This is a mathematical operation that compares the network’s output (the predicted answer) to the targeted output (the correct answer) to determine the accuracy of the machine.

In other words, the cost function tells the network how wrong it was, so the network can make adjustments to be less wrong (and more right) in the future. As a result, the network pays for its mistakes and learns by trial and error. The cost is higher when the network is making bad or sloppy classifications or predictions — typically early in its training phase.
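As a rough sketch, one common choice of cost function, mean squared error, could look like this (the predictions and targets below are made-up numbers for illustration):

```python
# A minimal sketch of a cost function: mean squared error (MSE).
# It compares the network's predictions to the target values and
# returns a single number -- the higher the number, the more wrong
# the network was.

def mse_cost(predictions, targets):
    errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(errors) / len(errors)

# Early in training, predictions are sloppy, so the cost is high:
print(mse_cost([0.9, 0.1, 0.8], [0.0, 1.0, 0.0]))  # high cost
# After learning, predictions sit closer to the targets and cost drops:
print(mse_cost([0.1, 0.9, 0.1], [0.0, 1.0, 0.0]))  # low cost
```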

What Does the Machine Learn?

Machines learn different lessons depending on the model, and different models often employ different cost functions suited to their optimization goals.

In a simple linear regression model, the machine learns the relationship between an independent variable and a dependent variable; for example, the relationship between the size of a home and its cost. With linear regression, the relationship can be graphed as a straight line, as shown in the figure.

During the learning process, the machine can adjust the model in several ways. It can move the line up or down, left or right, or change the line's slope, so that it more accurately represents the relationship between home size and price. The resulting model is what the machine learns. It can then use this model to predict the asking price of a home when provided with the home's size.
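The process above can be sketched in a few lines of Python. The sizes and prices here are invented for illustration, and the straight line is fit directly with the classic least-squares formulas rather than by iterative adjustment:

```python
# Hypothetical data: home sizes (sq ft) and asking prices (dollars).
sizes  = [1000, 1500, 2000, 2500]
prices = [200_000, 275_000, 350_000, 425_000]

# Fit a straight line, price = slope * size + intercept, using the
# least-squares formulas. This line is the model the machine "learns".
n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
         / sum((x - mean_x) ** 2 for x in sizes))
intercept = mean_y - slope * mean_x

# The learned model can now predict an asking price from a size:
print(slope * 1800 + intercept)  # predicted price for an 1800 sq ft home
```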

Cost Function Limitation

The cost function has one major limitation: it only indicates how accurate the output was. It does not tell the machine what to adjust, by how much, or in which direction. For the machine to make the necessary adjustments, the cost function must be combined with another function that provides that guidance, such as gradient descent, which I'll cover in my next newsletter.

Frequently Asked Questions

What are some common types of cost functions used in machine learning?

Common types of cost functions used in machine learning include mean squared error (MSE), mean absolute error (MAE), cross-entropy, and hinge loss. These functions are tailored to specific types of problems, such as regression or classification problems.
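As a sketch, the four cost functions named above can each be written in a few lines of plain Python (these are simplified, unbatched versions for illustration):

```python
import math

def mse(preds, targets):
    # Mean squared error: average of squared differences (regression).
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def mae(preds, targets):
    # Mean absolute error: average of absolute differences (regression).
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)

def binary_cross_entropy(preds, targets):
    # preds are probabilities in (0, 1); targets are 0 or 1 (classification).
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for p, t in zip(preds, targets)) / len(preds)

def hinge(scores, targets):
    # targets are -1 or +1; scores are raw model outputs (classification).
    return sum(max(0.0, 1 - t * s) for s, t in zip(scores, targets)) / len(scores)
```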

How does mean squared error work in linear regression?

Mean squared error (MSE) in linear regression measures the average squared difference between the predicted values and the actual values. By minimizing the MSE, the algorithm optimizes weights and biases to improve model accuracy.
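With made-up numbers, the arithmetic works out like this:

```python
# Two hypothetical home-price predictions versus the actual prices.
predicted = [210_000, 340_000]
actual    = [200_000, 350_000]

# Each prediction is off by 10,000, so each squared error is
# 10,000^2 = 100,000,000; averaging over the 2 points gives the MSE.
squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
mse = sum(squared_errors) / len(squared_errors)
print(mse)  # 100000000.0
```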

What role does the activation function play in a neural network?

The activation function determines the output value of a neuron given an input value.

The activation function adds non-linearity to the neural network, which allows the neural network to learn and model complex data.
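Two of the most common activation functions, sigmoid and ReLU, can be sketched like this:

```python
import math

def sigmoid(x):
    # Squashes any input into the range (0, 1).
    return 1 / (1 + math.exp(-x))

def relu(x):
    # Passes positive inputs through unchanged; zeroes out negatives.
    return max(0.0, x)

# Both map a neuron's input value to its output value, and both are
# non-linear, which is what lets the network model complex data.
print(sigmoid(0.0))  # 0.5
print(relu(-2.0))    # 0.0
```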

Can you explain how gradient descent helps in minimizing the cost function?

Gradient descent is an optimization algorithm that helps minimize the cost function. It does this by iteratively adjusting weights and biases.

It calculates the gradient (partial derivatives) of the cost function with respect to the model parameters. It then adjusts them in the opposite direction to reduce the value of the cost function.
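The steps above can be sketched on the simplest possible model, a line through the origin with one weight, using made-up data where the true relationship is y = 2x:

```python
# Gradient descent on a one-parameter model y = w * x, minimizing MSE.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # true relationship: y = 2x

w = 0.0                # initial guess for the weight
learning_rate = 0.05

for step in range(100):
    # Partial derivative of MSE with respect to w:
    # d/dw (1/n) * sum((w*x - y)^2) = (2/n) * sum((w*x - y) * x)
    grad = 2 * sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    # Adjust w in the opposite direction of the gradient.
    w -= learning_rate * grad

print(round(w, 3))  # converges to 2.0, the true slope
```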

How does the learning rate affect the minimization of the cost function?

The learning rate controls how big the steps are during gradient descent.

A higher learning rate makes the steps bigger. This can help you reach the lowest point faster, but it can also make you overshoot the minimum value.

A lower learning rate makes the steps smaller. This makes the updates more precise, but it can take many more steps to reach the minimum value.
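You can see this trade-off on a toy function, here (w - 3)^2, whose minimum sits at w = 3 (the rates and step count are arbitrary choices for illustration):

```python
def descend(learning_rate, steps=20):
    # Minimize f(w) = (w - 3)^2, whose gradient is f'(w) = 2 * (w - 3).
    w = 0.0
    for _ in range(steps):
        w -= learning_rate * 2 * (w - 3)
    return w

# A small rate takes careful, slow steps toward the minimum at w = 3:
print(descend(0.01))   # still far from 3 after 20 steps
# A moderate rate gets there much faster:
print(descend(0.1))    # close to 3
# Too large a rate overshoots back and forth and diverges:
print(descend(1.1))    # nowhere near 3
```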

This is my weekly newsletter that I call The Deep End because I want to go deeper than results you’ll see from searches or LLMs. Each week I’ll go deep to explain a topic that’s relevant to people who work with technology. I’ll be posting about artificial intelligence, data science, and data ethics.

This newsletter is 100% human written 💪 (* aside from a quick run through grammar and spell check).
