*Logistic Regression* is a regression algorithm used to estimate the probability that an example belongs to a particular class (e.g. a benign or malignant tumor). The idea is simple: the algorithm computes an estimated probability, and if it is higher than 0.5, the model predicts that the example belongs to the positive class (y = 1); otherwise, it belongs to the negative class (y = 0). This is why Logistic Regression is a *binary classifier*. Let's see how we can estimate the probabilities.

## Probabilities Estimation

Given the weighted sum of the input features plus a bias term, the Logistic Regression model estimates probabilities by computing the output value of the *sigmoid function*. Mathematically, considering n-dimensional input vectors $\mathbf{x}$ (examples characterized by n features), we can define our hypothesis function:

$$h_{\boldsymbol{\theta}}(\mathbf{x}) = \sigma(\boldsymbol{\theta}^T \mathbf{x})$$

where $\boldsymbol{\theta}$ is the parameter vector (weights plus the bias term) and $\sigma$ represents the sigmoid function, which is defined as

$$\sigma(t) = \frac{1}{1 + e^{-t}}$$

The sigmoid function is also called the *logistic function*, and it is greater than 0.5 only when the parameter t is positive.

Once the hypothesis function has estimated the probability $\hat{p} = h_{\boldsymbol{\theta}}(\mathbf{x})$ that an example belongs to the positive class, the resulting prediction $\hat{y}$ is equal to 1 if $\hat{p}$ is greater than 0.5, and 0 otherwise.
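The estimation rule above can be sketched in a few lines of NumPy; `sigmoid` and the helper `predict` below are illustrative names, following the definitions in the text (weighted sum, sigmoid, 0.5 threshold):

```python
import numpy as np

def sigmoid(t):
    """Logistic function: maps any real t into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

def predict(theta, x):
    """Estimate the probability of the positive class for input x,
    then threshold at 0.5 to obtain the class label."""
    p = sigmoid(np.dot(theta, x))
    return p, int(p > 0.5)

# sigmoid(0) is exactly 0.5; a positive weighted sum gives p > 0.5
print(sigmoid(0.0))                                          # 0.5
print(predict(np.array([1.0, -2.0]), np.array([3.0, 1.0])))  # t = 1, so p > 0.5 and the label is 1
```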

## The Cost Function

How do we train the Logistic Regression model? We can do it by using an optimization algorithm, called *Gradient Descent*, which estimates the parameter vector that leads to the lowest cost. The cost over a single training example is $-\log(\hat{p})$ when y = 1 and $-\log(1 - \hat{p})$ when y = 0, so that confident wrong predictions are penalized heavily. The cost function over the whole training set of m examples is defined as the average cost over all training examples (the *log loss*):

$$J(\boldsymbol{\theta}) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(\hat{p}^{(i)}\right) + \left(1 - y^{(i)}\right)\log\left(1 - \hat{p}^{(i)}\right)\right]$$

This cost function has the property of being convex, so the Gradient Descent is guaranteed to find the global minimum.
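To make the cost concrete, here is a small sketch that computes the average log loss for a batch of labels and predicted probabilities; the `cost` function name and the clipping epsilon are illustrative choices (clipping avoids evaluating log(0)):

```python
import numpy as np

def cost(y_true, y_prob, eps=1e-15):
    """Average log loss (binary cross-entropy) over the training examples."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # guard against log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y = np.array([1, 0, 1])
p = np.array([0.9, 0.1, 0.8])
print(cost(y, p))  # low cost: all three predictions are confident and correct
```

Confident, correct predictions keep the cost near zero, while a confident wrong prediction (e.g. p close to 0 for a positive example) makes its term blow up, which is exactly what drives Gradient Descent toward better parameters.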

## Let’s code

You can easily practice with logistic regression using our sample code on GitHub or by developing it yourself. In this tutorial you're going to perform logistic regression on the Iris dataset. Let's start by importing the LogisticRegression class from Scikit-Learn, along with the other modules we need.

```python
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
import numpy as np
```

Now it’s time to load up the dataset into a variable, and retrieve the training data and target labels.

```python
# Loading the Iris dataset
dataset = datasets.load_iris()

# Printing info
print(dataset['feature_names'], dataset['target_names'])

X_training = dataset['data']
y_training = (dataset['target'] == 2).astype(int)  # 1 if Iris-Virginica, else 0
```

You’re ready to train your logistic regression model. It can be done with just a single line of code.

```python
logistic_regressor = LogisticRegression()
# Training the model
logistic_regressor.fit(X_training, y_training)
```
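If you want the estimated probabilities rather than just hard 0/1 labels, fitted Scikit-Learn classifiers also expose `predict_proba` and `score`. A self-contained sketch (it reloads the dataset so it runs on its own; `model`, `X`, and `y` are illustrative names):

```python
from sklearn.linear_model import LogisticRegression
from sklearn import datasets

dataset = datasets.load_iris()
X = dataset['data']
y = (dataset['target'] == 2).astype(int)  # 1 if Iris-Virginica, else 0

model = LogisticRegression(max_iter=1000).fit(X, y)

# predict_proba returns one row per example: [P(y=0), P(y=1)]
print(model.predict_proba(X[:1]))
# Mean accuracy on the training set
print(model.score(X, y))
```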

In order to evaluate the trained model, we build an input example by generating 4 random input features and then feed it to the classifier. If the example is classified as an Iris-Virginica, the predicted output has to be equal to 1 (positive class), since we're trying to answer the question "Is this example an Iris-Virginica?".

```python
# Testing the model - generating some random flower
X_testing = np.array([np.random.uniform(0.1, 6),
                      np.random.uniform(2.5, 4),
                      np.random.uniform(1., 6),
                      np.random.uniform(0.2, 3)]).reshape(1, -1)

y_predict = logistic_regressor.predict(X_testing)

# Is this example an Iris-Virginica? 1 yes (positive class), 0 no (negative class)
print("Testing example {}".format(X_testing))
if y_predict == 1:
    print("Example is Iris-Virginica: y={}".format(y_predict))
else:
    print("Example is NOT Iris-Virginica: y={}".format(y_predict))
```

The full code is available on GitHub.
