
9. Machine Learning - Banknote Authentication

 Building Machine Learning Model to Classify Fraud Bank Note

Introduction

Machine Learning!!!🤪 One of the most talked-about technologies in today's world 😉. I'm sure you have heard a lot about it. Today we are going to build a machine learning model that authenticates a banknote, telling us whether it is genuine or counterfeit 💵.

Whenever you go to the bank to deposit some cash, the cashier places the banknotes in a machine that tells whether each note is genuine or counterfeit. The machine does this using classification techniques. Classification is a type of supervised machine learning, and there are many algorithms to choose from. As a beginner, it is hard to grasp the theoretical concept behind each algorithm. If that's true for you, there is nothing to panic about.🤪

We will implement the K nearest neighbor, support vector machine, perceptron learning, and Gaussian naive Bayes algorithms and explain the process of building a banknote authentication system. After reading this article, you will be able to understand how classification systems are built using machine learning algorithms.


Background/ Interest

This article is a part of the lab report of the “Artificial Intelligence” course at City University, Dhaka, Bangladesh, conducted by Nuruzzaman Faruqui. This is the best AI course in Bangladesh.

In this course, we learned AI from scratch, starting from basic Python and ending with natural language processing. We learned the theoretical concepts and essential mathematics properly in the “CSE 417: Artificial Intelligence” course, then applied that knowledge in the lab course “CSE 418: Artificial Intelligence Laboratory”.

We did a lot of lab sessions to master the course and gradually learned every necessary concept of artificial intelligence. Now we can build our own machine learning models and even neural networks to solve complex problems.


Problem Statement

Machine learning algorithms learn from data. Therefore, to identify whether a banknote is real or not, we need a dataset of real as well as fake banknotes along with their features.

Some sources of free datasets are Kaggle, the UCI Machine Learning Repository, etc. Real-world data is messy: a dataset may contain many missing values, and in that situation we have to clean it first. To avoid this kind of hassle, we are going to use a pre-cleaned dataset. You can download the dataset (a .csv file) from my GitHub repository. Here's the link (GitHub)

Figure: Banknotes.csv

The dataset contains a total of 1372 records of different banknotes. The four left columns hold the values we can use to predict whether a note is genuine or counterfeit; the fifth column holds that verdict, which is external data provided by a human, coded as 0 and 1. Machine learning algorithms require the features and labels to be separated from each other. The label is the output class or output category. In our dataset, variance, skew, curtosis, and entropy are features, whereas the class column contains the label.
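If you want a quick look at the file before training, here is a minimal sketch. It uses pandas, an extra dependency the article's script does not need, and assumes banknotes.csv sits in the working directory with the column names described above:

# Sketch: peek at the first records (pandas assumed to be installed)
import pandas as pd

df = pd.read_csv("banknotes.csv")
print(df.head())                   # first five banknote records
print(df["class"].value_counts())  # count of genuine (0) vs counterfeit (1)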


Now we can train our model on this dataset and see whether it can predict if new banknotes are genuine or not.


The Python code to train a machine learning model

# pip install scikit-learn
import csv
import random

from sklearn import svm
from sklearn.linear_model import Perceptron
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Loading algorithm in the model variable
model = svm.SVC()
# model = Perceptron()
# model = KNeighborsClassifier(n_neighbors=1)
# model = GaussianNB()

# Read data in from file
with open("banknotes.csv") as f:
    reader = csv.reader(f)
    next(reader)

    data = []
    for row in reader:
        data.append({
            "feature": [float(cell) for cell in row[:4]],
            "label": "Authentic" if row[4] == "0" else "Counterfeit"
        })

# Separate data into training and testing groups
holdout = int(0.40 * len(data))
random.shuffle(data)
testing = data[:holdout]
training = data[holdout:]

# Train model on the training set
X_training = [row["feature"] for row in training]
y_training = [row["label"] for row in training]
model.fit(X_training, y_training)

# Make predictions on the testing set
X_testing = [row["feature"] for row in testing]
y_testing = [row["label"] for row in testing]
predictions = model.predict(X_testing)

# Compute how well we performed
correct = 0
incorrect = 0
total = 0
for actual, predicted in zip(y_testing, predictions):
    total += 1
    if actual == predicted:
        correct += 1
    else:
        incorrect += 1

# Print results
print(f"Results for model {type(model).__name__}")
print(f"Correct: {correct}")
print(f"Incorrect: {incorrect}")
print(f"Accuracy: {100 * correct / total:.2f}%")

 




Output

We have trained the model with the support vector machine algorithm. To train with another algorithm, simply comment out the current model line and uncomment the one you want; the rest of the code stays the same.
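If you would rather compare all four algorithms in a single run instead of editing comments, here is a small sketch (not part of the original lab script) that reuses the X_training, y_training, X_testing, and y_testing lists built in the script above:

# Sketch: train and score every classifier in one pass
for candidate in [svm.SVC(), Perceptron(),
                  KNeighborsClassifier(n_neighbors=1), GaussianNB()]:
    candidate.fit(X_training, y_training)
    predictions = candidate.predict(X_testing)
    correct = sum(a == p for a, p in zip(y_testing, predictions))
    print(f"{type(candidate).__name__}: {100 * correct / len(y_testing):.2f}%")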

Result

Model        Correct   Incorrect   Accuracy
Perceptron   539       9           98.36%
SVM          545       3           99.45%
KNN          548       0           100.00%
GaussianNB   457       91          83.39%
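Keep in mind that the split is random, so the exact counts above will vary a little from run to run. If you want a reproducible run, you can seed the shuffle before splitting; a one-line sketch (the seed value 0 is an arbitrary choice):

# Sketch: fix the shuffle so every run produces the same split
random.seed(0)
random.shuffle(data)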


Explanation of the code

  1. Importing Required Modules 

Before loading our dataset and building the machine learning model, we need to import a few libraries. The following script imports the required Python modules:

import csv
import random

  2. Load the machine learning algorithms from the scikit-learn library

We used support vector machine, K nearest neighbor, perceptron learning, and Gaussian naive Bayes, four commonly used algorithms for machine learning classification problems.

  • Support Vector Machine

To train the support vector machine, we used the SVC class from the “sklearn.svm” module.

from sklearn import svm
model = svm.SVC()

  • Perceptron Learning

To train perceptron learning, we imported Perceptron from the “sklearn.linear_model” module.

from sklearn.linear_model import Perceptron
model = Perceptron()

  • Gaussian Naive Bayes

To train Gaussian naive Bayes, we imported GaussianNB from the “sklearn.naive_bayes” module.

from sklearn.naive_bayes import GaussianNB
model = GaussianNB()

  • K Nearest Neighbor

To train the K nearest neighbor classifier, we imported KNeighborsClassifier from the “sklearn.neighbors” module.

from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=1)
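Here n_neighbors=1 means each test note takes the label of its single closest training example. A quick sketch (not from the original lab; it reuses the X_training, y_training, X_testing, and y_testing lists from the full script) to compare a few values of k:

# Sketch: compare a few k values; score() returns the mean accuracy
for k in (1, 3, 5, 7):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_training, y_training)
    print(f"k={k}: {100 * knn.score(X_testing, y_testing):.2f}%")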


Script:

from sklearn import svm
from sklearn.linear_model import Perceptron
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Loading algorithm in model
model = svm.SVC()
# model = Perceptron()
# model = KNeighborsClassifier(n_neighbors=1)
# model = GaussianNB()


Note that after importing the algorithms, we can choose which model to use. The rest of the code will stay the same.

  3. Loading the Dataset

Once the libraries are imported, the next step is to load the dataset into our application. To do so, we opened the file with core Python file handling and used the “csv.reader()” function of the csv module, which reads a dataset in CSV format. The call to next(reader) skips the header row so that only data records are parsed.

# Read data from the file
with open("banknotes.csv") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row

  4. Divide the dataset into features and labels

In our dataset, variance, skew, curtosis, and entropy are the features, whereas the class column contains the label. The following script, together with the loading step above, divides each record into features and a label and stores them in the data = [] list.

# Read data from file
with open("banknotes.csv") as f:
    reader = csv.reader(f)
    next(reader)

    data = []
    for row in reader:
        data.append({
            "feature": [float(cell) for cell in row[:4]],
            "label": "Authentic" if row[4] == "0" else "Counterfeit"
        })


The for loop walks over every row of the dataset. In the "feature": [float(cell) for cell in row[:4]] line we take columns 0 to 3, which contain our feature set, converting each cell to a float. In "label": "Authentic" if row[4] == "0" else "Counterfeit", we take column four, which contains the label (class): if the label is 0 the note is authentic/real, and if the label is 1 the note is counterfeit/fake.
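To convince yourself the parsing worked, you can print one parsed record (a tiny sketch; the exact numbers depend on your copy of the file):

# Sketch: inspect the first parsed record
print(data[0])
# e.g. {'feature': [<four floats>], 'label': 'Authentic'}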

  5. Separate the dataset into training and testing groups

The training set is used to train the machine learning algorithms while the test set is used to evaluate the performance of the machine learning algorithms.

# Separate data into training and testing groups
holdout = int(0.40 * len(data))
random.shuffle(data)
testing = data[:holdout]
training = data[holdout:]


First, holdout = int(0.40 * len(data)) computes how many records make up 40% of the data list. Then random.shuffle(data), using the random.shuffle() function from the random module, shuffles the records so that the split is random rather than following the order of the file.

Then we store 40% of the data in the testing group and the remaining 60% in the training group.
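Equivalently, scikit-learn ships a helper that shuffles and splits in one call. A small sketch of the same 60/40 split (an alternative to the article's code; random_state=0 is an arbitrary seed for reproducibility):

# Sketch: the same 60/40 split using scikit-learn's helper
from sklearn.model_selection import train_test_split

X = [row["feature"] for row in data]
y = [row["label"] for row in data]
X_training, X_testing, y_training, y_testing = train_test_split(
    X, y, test_size=0.40, random_state=0)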

  6. Training the model on the training set

# Train model on the training set
X_training = [row["feature"] for row in training]
y_training = [row["label"] for row in training]
model.fit(X_training, y_training)


The training feature set is stored as X_training, while the training label set is stored as y_training; both are then passed to the “fit()” method.

  7. Testing the model on the testing set

After training the algorithm, we performed predictions on the test set. To make predictions, the “predict()” method is used. The records to be predicted are passed as a parameter to the “predict()” method, as shown below:

# Make predictions on the testing set
X_testing = [row["feature"] for row in testing]
y_testing = [row["label"] for row in testing]
predictions = model.predict(X_testing)

  8. Evaluating the model performance

We evaluated the performance of the model with a few lines of plain Python:

# Compute how well we performed
correct = 0
incorrect = 0
total = 0
for actual, predicted in zip(y_testing, predictions):
    total += 1
    if actual == predicted:
        correct += 1
    else:
        incorrect += 1
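The same number can also be obtained with scikit-learn's built-in metric, an equivalent shortcut to the counting loop above:

# Sketch: accuracy via sklearn.metrics instead of manual counting
from sklearn.metrics import accuracy_score

print(f"Accuracy: {100 * accuracy_score(y_testing, predictions):.2f}%")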

  9. Accuracy of the model

At the end, we printed the results of the model for better understanding:

# Print results
print(f"Results for model {type(model).__name__}")
print(f"Correct: {correct}")
print(f"Incorrect: {incorrect}")
print(f"Accuracy: {100 * correct / total:.2f}%")


Conclusion

Banknote authentication is an important task, and it is difficult to detect fake banknotes manually. Machine learning algorithms can help in this regard. In this article, we explained how we solved the problem of banknote authentication using machine learning techniques. We compared four different algorithms in terms of performance and concluded that the KNN and SVM algorithms are the best for banknote authentication, with accuracies of 100% and 99.45% respectively.


However, you can build a model of your own to classify similar datasets (e.g. cancer tumor cell classification, drug classification, sentiment analysis, etc.) by implementing the same or slightly modified code given above. This was a simple and easy implementation from the “CSE 418: Artificial Intelligence Lab” course at City University, Dhaka, Bangladesh, conducted by Nuruzzaman Faruqui Sir, which is the best AI course in Bangladesh.


You are free to copy the code from here and adapt it for your own project.

