Understanding Confusion Matrix In Classification With An Example Of Cyber Attack

Jun 3, 2021

Let's use classification to predict whether a cyber-attack has happened, and look at the role the confusion matrix plays in evaluating those predictions.

Classification

Classification means predicting which of a fixed set of categories (the target) an observation belongs to.

Ex. Will it rain today or not?

In this example there are two possible outcomes: it will rain, or it will not rain.

Machine learning uses classification to predict such categorical values.

One common algorithm for classification is logistic regression.
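To make the rain example concrete, here is a minimal sketch of logistic regression written in plain Python. The humidity readings, the labels, and the function names are all hypothetical and chosen only for illustration; a real project would normally use a library implementation and real data.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(xs, ys, lr=0.5, epochs=5000):
    """Fit a weight w and a bias b for one feature with gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # predicted probability minus label
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict(x, w, b, threshold=0.5):
    """Return 1 (rain) if the predicted probability reaches the threshold."""
    return 1 if sigmoid(w * x + b) >= threshold else 0

# Hypothetical toy data: relative humidity (0 to 1) and whether it rained.
humidity = [0.20, 0.30, 0.35, 0.40, 0.70, 0.80, 0.85, 0.90]
rained = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic_regression(humidity, rained)
print(predict(0.25, w, b), predict(0.95, w, b))
```

The model outputs a probability, and the threshold turns that probability into one of the two classes.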

What does the term confusion matrix mean?

For a two-class problem, the confusion matrix is a 2 × 2 matrix used to evaluate the performance of a classification model.

To explain the concept of the confusion matrix, I will use the example of cybercrime.

What is cybercrime?

Cybercrime is a criminal attack carried out for information theft, financial gain, etc. Cybercrime is committed by hackers. The term hacker here refers to a person who performs malicious activities on computer systems and network devices.

Cybercrime is carried out for various reasons:

Stealing personal data
Identity theft
Stealing organizational data
Stealing bank card details
Hacking emails to gain information

etc…

The target of the machine learning classification model

In this scenario, the machine learning model informs the organization that a cyber-attack has happened on their system, so that they can take immediate action. The model was trained on a dataset of past cyber-attacks. The dataset records what changes occurred on a system after an attack happened.

Confusion Matrix

In this matrix, the labels on the left are the actual values and the labels across the top are the predicted values.

There are 165 values in total.
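The layout described above can be sketched as a small table in code. The four counts (50, 10, 5, 100) are the ones used in this article; the order of the rows and columns is an assumption, since only "actual on the left, predicted on top" is stated.

```python
# Counts from the article's example.
# Convention used in this article: negative = attack happened,
# positive = attack has not happened.
TN, FP, FN, TP = 50, 10, 5, 100

# Rows are actual values, columns are predicted values (assumed order).
matrix = [
    [TN, FP],  # actual negative (attack happened)
    [FN, TP],  # actual positive (no attack)
]

total = sum(sum(row) for row in matrix)

# Print the matrix with predicted values on top and actual values on the left.
print(f"{'':18}{'pred. negative':>16}{'pred. positive':>16}")
for label, row in zip(["actual negative", "actual positive"], matrix):
    print(f"{label:18}{row[0]:>16}{row[1]:>16}")
print("total:", total)
```

The four cells add up to the 165 values mentioned above.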

Each actual or predicted value is one of two classes: positive or negative.

→ Positive, in this article, means something good is happening.

The cyber-attack hasn't happened.

→ Negative means something bad is happening.

The cyber-attack has happened.

→ True means the machine predicted the right result.

→ False means the machine predicted the wrong result.

The diagram uses four abbreviations: TN, FP, FN, TP.

TN (True Negative):

The machine predicted that a cyber-attack happened, and this is right: an attack actually happened.

In the above diagram, 50 results are TN. The machine correctly predicted that 50 attacks happened. The organization will take action on these.

TP (True Positive):

The machine predicted that a cyber-attack hasn't happened, and this is right: no attack actually happened.

In the above diagram, 100 results are TP. It is good that no cyber-attack happened and that the machine also predicted the right result.

For the model to be accurate, this value should be as high as possible.

FP (False Positive):

The machine predicted that an attack hasn't happened, and this is wrong: a cyber-attack actually happened.

In the above diagram, 10 results are FP.

FP is also called a Type I error.

This is the most critical value, because a cyber-attack actually happened and the machine learning model didn't inform the organization. This causes huge losses to the organization, because they don't get the information in time and so take no immediate action after the attack.
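As a rough illustration of why this cell matters, the sketch below computes what share of the real attacks in the example went unreported. It only reuses the article's counts (50 TN and 10 FP); the variable names are hypothetical.

```python
# Under this article's convention, every real attack ends up in TN or FP.
TN, FP = 50, 10

actual_attacks = TN + FP            # attacks that really happened
missed_share = FP / actual_attacks  # attacks the model failed to report

print(f"{FP} of {actual_attacks} attacks missed ({missed_share:.1%})")
# prints: 10 of 60 attacks missed (16.7%)
```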

FN (False Negative):

The machine predicted that an attack happened, and this is wrong: no cyber-attack actually happened.

FN is also called a Type II error.

In the above diagram, 5 results are FN.
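The four definitions can be sketched as a small counting routine. It follows this article's convention (negative = attack happened, positive = no attack). The label strings and function names are hypothetical, and the two lists are built only to reproduce the example counts.

```python
from collections import Counter

ATTACK = "attack"        # negative class in this article
NO_ATTACK = "no attack"  # positive class in this article

def cell(actual, predicted):
    """Return 'TN', 'TP', 'FP' or 'FN' for one (actual, predicted) pair."""
    correct = "T" if actual == predicted else "F"     # was the prediction right?
    kind = "P" if predicted == NO_ATTACK else "N"     # which class was predicted?
    return correct + kind

def confusion_counts(actuals, predictions):
    """Count how many pairs fall into each of the four cells."""
    counts = Counter(cell(a, p) for a, p in zip(actuals, predictions))
    return {name: counts[name] for name in ("TN", "FP", "FN", "TP")}

# Rebuild the article's example: 50 TN, 10 FP, 5 FN, 100 TP.
actuals = [ATTACK] * 50 + [ATTACK] * 10 + [NO_ATTACK] * 5 + [NO_ATTACK] * 100
predictions = [ATTACK] * 50 + [NO_ATTACK] * 10 + [ATTACK] * 5 + [NO_ATTACK] * 100

counts = confusion_counts(actuals, predictions)
print(counts)  # {'TN': 50, 'FP': 10, 'FN': 5, 'TP': 100}
```

The first letter says whether the prediction was right, and the second letter says which class the machine predicted.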

To calculate the accuracy of the model:

(TP+TN)/total = (100+50)/165 ≈ 0.91

So the model is about 91% accurate at predicting whether a cyber-attack happened.
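The same calculation as a short sketch in code, using the article's four counts. The function name is hypothetical.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all predictions that were correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total

acc = accuracy(tp=100, tn=50, fp=10, fn=5)
print(f"{acc:.4f}")  # 0.9091
```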