Artificial Intelligence has generated a lot of excitement recently due to the advances made by large language models (LLMs). This success has motivated many people to enter the field and benefit from its growth. However, most texts do not address the fundamental building block of these neural networks: the artificial neuron. We believe that understanding it is the foundation for a solid grasp of artificial neural networks. In this tutorial, we describe how an artificial neuron works through logistic regression, which is equivalent to a single neuron with a sigmoid activation. Despite its simplicity, the artificial neuron is very useful for solving various classification problems, such as spam detection, diabetes prediction, and credit approval.
Classification of Machine Learning Systems
To better understand this type of technique, it helps to know one common way of classifying machine learning models. Machine Learning is a sub-field of Artificial Intelligence concerned with developing systems that learn and improve automatically from data. Machine learning models can be grouped into three categories: supervised, unsupervised, and reinforcement learning.
In supervised models, the system learns from labeled examples, that is, inputs paired with the correct answers. In unsupervised techniques, the system detects patterns by examining data without those patterns being presented beforehand. Finally, in the third class of models, reinforcement learning, the system learns from its actions and the feedback it receives in the form of rewards.
The Artificial Neuron, in the form of logistic regression, is a supervised learning technique. Supervised models can be further divided into classification systems and regression systems.
Logistic regression
In classification models, the system tries to identify which class is correct given an input. For example, based on a person’s financial data, the system attempts to determine whether it’s appropriate to lend money or deny the loan. Another example is when the system receives data about a specific animal and, based on that information, identifies whether it’s a mammal, reptile, bird, or fish.
In the case of regression, the system attempts to output a numeric value based on the received data. For instance, using financial data, the system might try to predict the inflation rate, a technique commonly employed in the financial market.
Despite its name, logistic regression is used for classification. Classification can be binary, where there are only two classes, such as yes or no, positive or negative. It can also be multiclass, for example, classifying whether a word is a verb, noun, adjective, adverb, and so forth.
To distinguish logistic regression from linear regression, we can observe the graphical difference using an example with two inputs or dimensions. Using only two inputs makes visualization easier. In the case of linear regression applied to a set of points in a plane, our objective is to establish a line that effectively captures the underlying trend of point distribution in the plane.
Once this line is adjusted, we can use it to predict one axis value based on the other. If it’s a three-dimensional space, we’ll try to fit a plane. If there are more dimensions, we’ll attempt to fit a hyperplane.
In the case of logistic regression, we want the output to be a decision, such as yes or no, or a class. A straight line does not help here, because its output is an unbounded number rather than a choice between two options. Consider a simple example in which we must decide whether to lend money to an individual based on their salary. With data from previous loans, where each point is either approved or denied, it is hard to fit a line that answers this question.
However, if we use an "S"-shaped curve instead of a straight line, the fit becomes much easier. If we feed a salary value into this curved function and the result falls in the upper part of the curve, the answer is yes; otherwise, it is no. To bend the line into such a curve, we need to introduce nonlinearity.
A widely used function for introducing this nonlinearity is the logistic function, hence the name logistic regression. Its general formula is:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

Note that it is a fraction whose numerator is 1 and whose denominator, 1 + e^(-z), is always greater than 1, because e^(-z) is positive for every z. This means the value of the function always lies strictly between 0 and 1. The base of the exponentiation in the denominator is a mathematical constant called Euler’s number, e, whose value is approximately 2.718.
The logistic function has interesting characteristics. Firstly, when plotted on a two-dimensional graph, it takes the form of an "S", which is why it is referred to as a sigmoid function. Secondly, the values returned by the function range between 0 and 1. This makes it very suitable for binary classification, where there are two classes, such as yes or no, positive or negative, or lend or deny a loan.
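To see these characteristics numerically, here is a minimal Python sketch that evaluates the logistic function at a few arbitrarily chosen points. It reuses the name `sigmoide` from the implementation later in this post. Large negative inputs give values near 0, large positive inputs give values near 1, and z = 0 gives exactly 0.5.

```python
from math import exp

def sigmoide(z):
    """Logistic function: 1 / (1 + e^(-z))."""
    return 1 / (1 + exp(-z))

# Walk along the curve: the output rises smoothly from near 0 to near 1,
# and it always stays strictly between 0 and 1.
for z in [-6, -2, 0, 2, 6]:
    print(f"sigmoide({z:>2}) = {sigmoide(z):.4f}")
```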
The third characteristic is that it is a smooth, differentiable function. This means that at any point on the curve, you can draw a tangent line and calculate its slope. This property is used during the model fitting stage, that is, during learning. If the model predicts a wrong value, we can calculate the slope at the predicted point and determine the direction in which we should adjust the model to reduce the error. We will see how this is done in future posts.
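As a sketch of what "the slope at a point" means, the snippet below uses the well-known closed form of the sigmoid’s derivative, σ'(z) = σ(z)(1 - σ(z)), and checks it against a central finite-difference estimate. The helper names `sigmoide_slope` and `numerical_slope`, and the step size `h`, are illustrative choices, not part of the original post.

```python
from math import exp

def sigmoide(z):
    return 1 / (1 + exp(-z))

def sigmoide_slope(z):
    # Closed-form derivative of the logistic function: s * (1 - s)
    s = sigmoide(z)
    return s * (1 - s)

def numerical_slope(f, z, h=1e-6):
    # Central finite difference: slope of the secant through z - h and z + h
    return (f(z + h) - f(z - h)) / (2 * h)

for z in [-3.0, 0.0, 1.7]:
    print(f"z={z:5.2f}  formula={sigmoide_slope(z):.6f}  "
          f"numeric={numerical_slope(sigmoide, z):.6f}")
```

The slope is largest at z = 0, where it equals 0.25, and shrinks toward 0 as the curve flattens at either end.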
Calculation Example
Let’s now explore how logistic regression works in detail. To start, we need a dataset designed for classification purposes. For example, consider a scenario where we collect data from various individuals to assess whether they qualify for a loan. This dataset might include features such as their salary and the amount of money they wish to borrow.
In reality, companies use a much larger set of information to make such decisions. But for our example, these two pieces of information will be sufficient. Each type of information is called a feature.
We will use a pre-labeled dataset that records whether each loan was approved. It is divided into two subsets: a training set, used to fit the model, and a test set, used to evaluate it. Once training is complete and the model reaches a predefined accuracy on the test set, we can consider the system ready to approve or deny new loan requests. Let’s now build our logistic regression model.
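A minimal sketch of such a split, assuming a random 70/30 split with a fixed seed (both arbitrary choices made here for illustration) and using the ten loan requests that appear in the implementation section. The function `split_dataset` is a hypothetical helper; libraries such as scikit-learn provide a ready-made `train_test_split` in `sklearn.model_selection`.

```python
import random

def split_dataset(X, Y, test_fraction=0.3, seed=42):
    # Shuffle the indices so the split does not simply follow the original order
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(X) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    Y_train = [Y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    Y_test = [Y[i] for i in test_idx]
    return X_train, Y_train, X_test, Y_test

# Wage and loan amount in thousands; 1 = approved, 0 = denied
X = [[3, 10], [1.5, 11.8], [5.5, 20.0], [3.5, 15.2], [3.1, 14.5],
     [7.6, 15.5], [1.5, 3.5], [6.9, 8.5], [8.6, 2.0], [7.66, 3.5]]
Y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

X_train, Y_train, X_test, Y_test = split_dataset(X, Y)
print(len(X_train), "training examples,", len(X_test), "test examples")
```

With only ten examples the split is tiny; real datasets are much larger.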
In the first part of the model, we take each input value, which in our example are the salary and the requested loan amount, multiply each by its own weight, sum the products, and add a value known as the bias. The resulting value is called z.
You may be wondering where the weights and the bias come from. Initially, they are set to random values, and they are adjusted during the model’s learning stage. They are called the model’s parameters: the model learns which values to assign to the weights and the bias so that it produces the correct output. The weights determine how much importance is given to each input feature, while the bias shifts the result by a fixed amount, independently of the inputs.
Let’s illustrate this calculation with an example. Suppose a person earns $3000 and wants a loan of $10,000. Assume both weight 1 and weight 2 are set to 0.01, and the bias is set to 1. Then z = 0.01 × 3000 + 0.01 × 10,000 + 1 = 131. This value alone does not tell us much; what we want to determine is whether we should grant the loan. To do this, we feed z into the sigmoid function in the second step of the model’s execution.
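The first step can be written directly in Python. This is a minimal sketch using the numbers from the example above; the variable names are illustrative.

```python
# Example from the text: wage = $3000, requested loan = $10,000
x = [3000, 10000]   # inputs (features)
w = [0.01, 0.01]    # weight 1 and weight 2
b = 1               # bias

# Weighted sum of the inputs plus the bias
z = w[0] * x[0] + w[1] * x[1] + b
print(z)
```

It prints 131.0, matching the hand calculation.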
In the second step, we compute the sigmoid of z, which places -z in the exponent of the denominator: σ(131) = 1 / (1 + e^(-131)). Since e^(-131) is vanishingly small (on the order of 10^(-57)), the final value is extremely close to 1.
Using a threshold of 0.5, any value equal to or greater than 0.5 can be considered a Yes, while values below 0.5 are considered a No. Our result is close to 1, so the model suggests that the loan should be granted. However, the data table indicates that the expected value is 0, meaning the loan should be denied. The model has made an error and needs to be adjusted. This correction is carried out during the learning stage, and we will explain how this step is performed a bit later. Now, let’s look at what has been done in a more graphical way.
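Continuing the same example, here is a minimal sketch of the second step together with the 0.5 decision threshold. The expected label 0 comes from the data table mentioned above; the variable names are illustrative.

```python
from math import exp

def sigmoide(z):
    return 1 / (1 + exp(-z))

z = 131.0                           # value computed in the first step
yhat = sigmoide(z)                  # e^(-131) is tiny, so yhat is practically 1
decision = 1 if yhat >= 0.5 else 0  # 0.5 threshold: 1 = grant, 0 = deny
expected = 0                        # the data table says this loan was denied
error = yhat - expected

print(f"yhat={yhat:.6f} decision={decision} expected={expected} error={error:.6f}")
```

In floating-point arithmetic, 1 + e^(-131) rounds to exactly 1.0, so yhat prints as 1.000000: the model answers "grant" while the expected answer is "deny".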
We can view the logistic regression calculation as a flow. The input attributes form a sequence of values x1, x2, up to xn. Each is multiplied by its respective weight w1, w2, up to wn, and the products are summed together with a bias value, producing the value z.
This value is then passed to the sigmoid function, denoted by the Greek letter sigma (σ). This function is also called the activation function, and, as we will see later, there are other possible activation functions besides the sigmoid. The value produced by the activation function is the output emitted by the neuron, denoted by the letter y with a circumflex, ŷ, read "y hat". We use this notation to indicate that it is an estimated or calculated value, differentiating it from the expected or real value y.
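The flow described above (inputs times weights, plus bias, then an activation function) can be sketched as one small function. Making `activation` a parameter reflects the point that the sigmoid is only one possible choice. The function name `neuron` is illustrative, and the example values are the first loan request with the initial parameters used later in this post.

```python
from math import exp

def sigmoide(z):
    return 1 / (1 + exp(-z))

def neuron(x, w, b, activation=sigmoide):
    """One artificial neuron: weighted sum of n inputs plus bias, then activation."""
    # Guard against mismatched lengths: each input needs exactly one weight
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return activation(z)

# Works for any number of inputs x1..xn with matching weights w1..wn
yhat = neuron([3, 10], [0.2, 0.1], 0.1)
print(round(yhat, 3))
```

Passing a different function as `activation` changes the neuron’s output without touching the weighted sum.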
This operation is loosely inspired by the way a biological neuron works. A neuron receives signals from other neurons through branching extensions called dendrites, at junctions called synapses. These signals are electrochemical, and the strength of each one depends on the strength of the corresponding connection, which plays the same role as the weights in logistic regression. If the combined stimuli exceed a certain threshold, the neuron fires, sending an electrochemical signal along its axon to other neurons. Because of this superficial resemblance, which does not capture the real complexity of a neuron, logistic regression can be seen as an artificial neuron. And, as we will see later, connecting many of these neurons forms an artificial neural network.
Before we continue, let’s briefly discuss the notation used in our formulas. When a variable represents a single value, we use a regular lowercase letter. When it represents a vector or matrix, we use a lowercase letter in bold.
Now that we understand the calculation performed by logistic regression, let’s describe it more formally. Let x be an input vector with values x1, x2, up to xn, let w be a weight vector with values w1, w2, up to wn, and let b be a bias value. The linear function z is then defined as the product of the transpose of vector w with vector x, plus the bias:

$$z = \mathbf{w}^T \mathbf{x} + b = \sum_{i=1}^{n} w_i x_i + b$$
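Transposing a vector turns a column vector into a row vector (and vice versa). If w is an n × 1 column vector, its transpose w^T is a 1 × n row vector, and multiplying it by the n × 1 column vector x yields a 1 × 1 result, that is, a single number. This notation becomes particularly helpful in matrix multiplication.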
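To make the row-times-column shape concrete, here is a minimal pure-Python sketch that stores vectors as n × 1 nested lists, transposes the weight vector, and multiplies. The helpers `transpose` and `matmul` and the numbers are illustrative; in practice a library such as NumPy would do this.

```python
def transpose(M):
    # Swap rows and columns: an n x 1 column vector becomes a 1 x n row vector
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    # Standard matrix product: (r x k) times (k x c) gives (r x c)
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

w = [[0.3], [0.1]]   # 2 x 1 column vector of weights (illustrative values)
x = [[2.0], [5.0]]   # 2 x 1 column vector of inputs (illustrative values)

w_T = transpose(w)        # 1 x 2 row vector
product = matmul(w_T, x)  # (1 x 2)(2 x 1) -> 1 x 1
print(w_T, product)
```

The result is a 1 × 1 matrix, that is, a single number: the value of z before the bias is added.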
The product of the two vectors here is the dot (inner) product: we multiply the elements pair-wise and then sum the results, obtaining a single scalar value. When we apply the logistic function, or sigma activation function, to z, we obtain σ(z), a value between 0 and 1.
Let’s consider another example, with both values in thousands: an input vector with a salary of 1 and a loan request of 4, a weight vector with values 0.2 and 0.1, and a bias of 0.1. In this case, z = 0.2 × 1 + 0.1 × 4 + 0.1 = 0.7, and applying the logistic function to this value gives σ(0.7) ≈ 0.67.
Example of calculation in logistic regression.
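The same worked example in Python, as a minimal sketch: the dot product is computed with `zip` and `sum`, and the logistic function is the same one defined in the implementation section.

```python
from math import exp

def sigmoide(z):
    return 1 / (1 + exp(-z))

x = [1, 4]        # salary and loan request, in thousands
w = [0.2, 0.1]    # weights
b = 0.1           # bias

# Dot product w^T x: multiply element-wise, then sum; finally add the bias
z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
yhat = sigmoide(z)
print(f"z = {z:.1f}, yhat = {yhat:.2f}")   # z = 0.7, yhat = 0.67
```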
Artificial Neuron Implementation
Now, let’s demonstrate how to implement this calculation in Python. First, we import the exp function from the math module, which, given a number, returns Euler’s number raised to that power. We use this function to create our sigmoid function.
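```python
from math import exp

def sigmoide(x):
    # Logistic (sigmoid) function: maps any real number into the interval (0, 1)
    return 1 / (1 + exp(-x))

# Inputs: X[j][0] = wage, X[j][1] = requested loan (both in thousands of dollars)
X = [[3, 10], [1.5, 11.8], [5.5, 20.0], [3.5, 15.2], [3.1, 14.5],
     [7.6, 15.5], [1.5, 3.5], [6.9, 8.5], [8.6, 2.0], [7.66, 3.5]]
# Expected outputs: 0 = deny the loan, 1 = grant the loan
Y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```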
The input is a matrix of 10 rows and 2 columns: the column at index 0 contains the salary and the column at index 1 contains the requested loan amount, both in thousands of dollars. Each of the 10 rows represents one loan request.
To train the network, we also need the expected outputs, represented as a 10-position vector. In this vector, a value of 0 indicates that the request should be rejected, while a value of 1 signifies that the request should be accepted. The process of training the neuron will be detailed in the next post.
Next, we need to set the initial values of the model parameters, namely the weights and the bias. Let’s arbitrarily choose 0.2 and 0.1 for the weights and 0.1 for the bias. The program then loops over the requests, calculating z and the prediction for each one. It also calculates the error for each request as the difference between the prediction and the expected value. For each request, the program prints the input values and the calculated results. See the code below.
Definition of parameters and calculation of outputs
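```python
m = len(X)       # number of loan requests
w = [0.2, 0.1]   # initial weights for wage and loan amount
b = 0.1          # initial bias

for j in range(m):
    # Linear part: weighted sum of the inputs plus the bias
    z = X[j][0] * w[0] + X[j][1] * w[1] + b
    yhat = sigmoide(z)
    # Error: difference between the prediction and the expected value
    erro = yhat - Y[j]
    print(" Wage:{0:8.2f} Loan:{1:9.2f} Expected value:{2}".format(
        X[j][0] * 1000, X[j][1] * 1000, Y[j]))
    print(" z:{0:2.3f} yhat:{1:2.3f} error:{2:2.3f}\n".format(z, yhat, erro))
```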
The program’s output reflects the values of the current weights. If the output has errors, the weights need to be adjusted; that is the purpose of the learning stage, which will be described in the next post.
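One quick way to quantify those errors, as a sketch that is not part of the original program, is to turn each prediction into a decision with the 0.5 threshold and count how many match Y. Because every input, both weights, and the bias are positive, every z is positive and every prediction exceeds 0.5, so with these initial parameters the neuron approves all ten requests and gets only the five approvals right.

```python
from math import exp

def sigmoide(z):
    return 1 / (1 + exp(-z))

# Same data and initial parameters as the program above
X = [[3, 10], [1.5, 11.8], [5.5, 20.0], [3.5, 15.2], [3.1, 14.5],
     [7.6, 15.5], [1.5, 3.5], [6.9, 8.5], [8.6, 2.0], [7.66, 3.5]]
Y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
w = [0.2, 0.1]
b = 0.1

# Predict a class for every request using the 0.5 threshold
predictions = []
for x in X:
    z = x[0] * w[0] + x[1] * w[1] + b
    predictions.append(1 if sigmoide(z) >= 0.5 else 0)

# Fraction of requests where the predicted class matches the expected one
correct = sum(1 for p, y in zip(predictions, Y) if p == y)
accuracy = correct / len(Y)
print(predictions)
print(f"accuracy = {accuracy:.0%}")
```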
We have reached the end of our post. If this post was useful to you, please consider leaving a comment.