Building a Machine Learning Model to Classify Counterfeit Banknotes
Introduction
Machine Learning!!!🤪 One of the most exciting technologies in today's world 😉. I know you have heard a lot about it. Today we are going to build a machine learning model that authenticates a banknote, telling us whether it is genuine or not 💵.
Whenever you go to the bank to deposit cash, the cashier places the banknotes in a machine that tells whether each note is genuine or counterfeit. The machine uses classification techniques to do this. Classification is a type of supervised machine learning, and there are many algorithms to choose from. We understand that, as a beginner, it is hard to know the theory behind each algorithm. If that is true for you, there is nothing to panic about.🤪
We will implement the K nearest neighbor, support vector machine, perceptron learning, and Gaussian naive Bayes algorithms and explain the process of building a banknote authentication system. After reading this article, you will understand how classification systems are built using machine learning algorithms.
Background/ Interest
This article is part of the lab report for the “Artificial Intelligence” course at City University, Dhaka, Bangladesh, conducted by Nuruzzaman Faruqui. This is the best AI course in Bangladesh.
In this course, we learned AI from scratch. We started with basic Python and ended with natural language processing. We learned the theoretical concepts and essential mathematics properly in the “CSE 417: Artificial Intelligence” course, then applied that knowledge in the lab course “CSE 418: Artificial Intelligence Laboratory”.
We completed many lab sessions to master the course and gradually learned each necessary concept of artificial intelligence. Now we can build our own machine learning models, and even neural networks, to solve complex problems.
Problem Statement
Machine learning algorithms learn from data. Therefore, to identify whether a banknote is real or not, we needed a dataset of real as well as fake banknotes along with their features.
Some sources for free datasets are Kaggle, the UCI Machine Learning Repository, etc. We know that real-world data is messy: a dataset may contain many missing values, and in that situation we have to clean it first. To avoid this kind of hassle, we are going to use a pre-cleaned dataset. You can download the dataset (.csv file) from my GitHub repository. Here’s the link (GitHub)
Figure: Banknotes.csv
The dataset contains a total of 1372 records of different banknotes. The four left columns contain measurements we can use to predict whether a note is genuine or counterfeit; the rightmost column is the class, labeled by a human and coded as 0 (authentic) or 1 (counterfeit). Machine learning algorithms require the features and labels to be separated from each other. The label is the output class or category. In our dataset, variance, skew, curtosis, and entropy are the features, whereas the class column contains the label.
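To get a feel for the file before training anything, we can print the header and the first few records. This is a minimal sketch; it assumes banknotes.csv sits in the working directory and has a header row, as in the repository:

import csv

# Peek at the column names and the first three records
with open("banknotes.csv") as f:
    reader = csv.reader(f)
    print(next(reader))  # e.g. ['variance', 'skew', 'curtosis', 'entropy', 'class']
    for i, row in enumerate(reader):
        if i == 3:
            break
        print(row)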
Now we can train our model on this dataset and see whether we can predict if new banknotes are genuine or not.
The Python code to train a machine learning model
# pip install scikit-learn
import csv
import random
from sklearn import svm
from sklearn.linear_model import Perceptron
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
#Loading Algorithm in model variable
model = svm.SVC()
#model = Perceptron()
# model = KNeighborsClassifier(n_neighbors=1)
# model = GaussianNB()
# Read data in from file
with open("banknotes.csv") as f:
    reader = csv.reader(f)
    next(reader)

    data = []
    for row in reader:
        data.append({
            "feature": [float(cell) for cell in row[:4]],
            "label": "Authentic" if row[4] == "0" else "Counterfeit"
        })
# Separate data into training and testing groups
holdout = int(0.40 * len(data))
random.shuffle(data)
testing = data[:holdout]
training = data[holdout:]
# Train model on the training set
X_training = [row["feature"] for row in training]
y_training = [row["label"] for row in training]
model.fit(X_training, y_training)
# Make predictions on the testing set
X_testing = [row["feature"] for row in testing]
y_testing = [row["label"] for row in testing]
predictions = model.predict(X_testing)
# Compute how well we performed
correct = 0
incorrect = 0
total = 0
for actual, predicted in zip(y_testing, predictions):
    total += 1
    if actual == predicted:
        correct += 1
    else:
        incorrect += 1
# Print results
print(f"Results for model {type(model).__name__}")
print(f"Correct: {correct}")
print(f"Incorrect: {incorrect}")
print(f"Accuracy: {100 * correct / total:.2f}%")
Output
We have trained the model with the “support vector machine” algorithm. To train with another algorithm, simply comment out the current model line and uncomment the one you want to use; the rest of the code stays the same.
Result
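The original post shows a screenshot of the script's output here. For reference, a run with svm.SVC() prints output of the following shape; the exact counts vary from run to run because the data is shuffled randomly, and the figures below are reconstructed to match the SVM accuracy reported in the Conclusion:

Results for model SVC
Correct: 545
Incorrect: 3
Accuracy: 99.45%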
Explanation of the code
Importing Required Modules
Before loading our dataset and training our machine learning model, we need to import a few libraries. The following script imports the required Python modules:
import csv
import random
Load the machine learning algorithms from the scikit-learn library
We used the support vector machine, K nearest neighbor, perceptron learning, and Gaussian naive Bayes algorithms, four of the most commonly used algorithms for machine learning classification problems.
Support Vector Machine
To train the support vector machine, we used the SVC class from the “sklearn.svm” module.
from sklearn import svm
model = svm.SVC()
Perceptron Learning
To train Perceptron Learning, we imported Perceptron from the “sklearn.linear_model” module.
from sklearn.linear_model import Perceptron
model = Perceptron()
Gaussian Naive Bayes
To train Gaussian naive Bayes, we imported GaussianNB from the “sklearn.naive_bayes” module.
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
K nearest neighbor
To train the K nearest neighbor classifier, we imported KNeighborsClassifier from the “sklearn.neighbors” module.
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=1)
Script:
from sklearn import svm
from sklearn.linear_model import Perceptron
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
# Loading Algorithm in model
model = svm.SVC()
#model = Perceptron()
#model = KNeighborsClassifier(n_neighbors=1)
#model = GaussianNB()
Note that after importing the algorithms, we can choose which model to use. The rest of the code will stay the same.
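If you would rather compare all four algorithms in a single run instead of commenting lines in and out, a simple loop works. This is a sketch rather than part of the original script; it assumes X_training, y_training, X_testing, and y_testing have already been prepared as shown in the following sections:

from sklearn import svm
from sklearn.linear_model import Perceptron
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Train and score each classifier on the same train/test split
for model in [svm.SVC(), Perceptron(),
              KNeighborsClassifier(n_neighbors=1), GaussianNB()]:
    model.fit(X_training, y_training)
    predictions = model.predict(X_testing)
    accuracy = sum(a == p for a, p in zip(y_testing, predictions)) / len(y_testing)
    print(f"{type(model).__name__}: {100 * accuracy:.2f}%")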
Loading the Dataset
Once we have imported the libraries, the next step is to load the dataset into our application. To do so, we opened the file using Python's built-in file handling and used the “csv.reader()” function of the csv module, which reads the dataset in CSV format.
# Read data from the file
with open("banknotes.csv") as f:
    reader = csv.reader(f)
    next(reader)
Divide the dataset into Features & Labels
In our dataset, variance, skew, curtosis, and entropy are the features, whereas the class column contains the label. The following script, together with the loading step, splits each record into its features and label and stores them in the list data = [].
# Read data from file
with open("banknotes.csv") as f:
    reader = csv.reader(f)
    next(reader)

    data = []
    for row in reader:
        data.append({
            "feature": [float(cell) for cell in row[:4]],
            "label": "Authentic" if row[4] == "0" else "Counterfeit"
        })
The for loop iterates over every record in the dataset. In the "feature": [float(cell) for cell in row[:4]] line, we take columns 0 to 3, which contain our feature set. In "label": "Authentic" if row[4] == "0" else "Counterfeit", we take column 4, which contains the label (class): if the label is 0, the note is authentic/real, and if the label is 1, the note is counterfeit/fake.
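After this loop runs, each element of data is a small dictionary pairing one note's features with its label. Printing the first record gives output of the following shape (the numbers here are illustrative, not taken from the actual file):

print(data[0])
# e.g. {'feature': [3.6216, 8.6661, -2.8073, -0.44699], 'label': 'Authentic'}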
Separate the dataset into training and testing groups
The training set is used to train the machine learning algorithms while the test set is used to evaluate the performance of the machine learning algorithms.
# Separate data into training and testing groups
holdout = int(0.40 * len(data))
random.shuffle(data)
testing = data[:holdout]
training = data[holdout:]
First, holdout = int(0.40 * len(data)) computes 40% of the length of the data list, and random.shuffle(data) shuffles the records using the random.shuffle() function from the random module, so the split is not biased by the order of the file.
Then we store 40% of the data in the testing group and the remaining 60% in the training group.
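As a side note, scikit-learn ships a helper that performs the same shuffle-and-split in one call. This sketch is an alternative to the manual split above, not what the original script uses:

from sklearn.model_selection import train_test_split

# 60% training / 40% testing, shuffled automatically
X = [row["feature"] for row in data]
y = [row["label"] for row in data]
X_training, X_testing, y_training, y_testing = train_test_split(X, y, test_size=0.40)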
Training the model on the training set
# Train model on the training set
X_training = [row["feature"] for row in training]
y_training = [row["label"] for row in training]
model.fit(X_training, y_training)
The training feature set is stored as X_training, while the training label set is stored as y_training; both are then passed to the “fit()” method.
Testing the model on the testing set
After training the algorithm, we performed predictions on the test set. To make predictions, the “predict()” method is used. The records to be predicted are passed as parameters to the “predict()” method as shown below:
# Make predictions on the testing set
X_testing = [row["feature"] for row in testing]
y_testing = [row["label"] for row in testing]
predictions = model.predict(X_testing)
Evaluating the model performance
We evaluated the performance of the model with simple Python code:
# Compute how well we performed
correct = 0
incorrect = 0
total = 0
for actual, predicted in zip(y_testing, predictions):
    total += 1
    if actual == predicted:
        correct += 1
    else:
        incorrect += 1
Accuracy of the model
At the end, we printed the accuracy of the model for better understanding:
# Print results
print(f"Results for model {type(model).__name__}")
print(f"Correct: {correct}")
print(f"Incorrect: {incorrect}")
print(f"Accuracy: {100 * correct / total:.2f}%")
Conclusion
Banknote authentication is an important task, and it is difficult to detect fake banknotes manually. Machine learning algorithms can help in this regard. In this article, we explained how we solved the problem of banknote authentication using machine learning techniques. We compared four different algorithms in terms of performance and concluded that the KNN and SVM algorithms are the best for banknote authentication, with accuracies of 100% and 99.45%, respectively.
However, you can build a model of your own to classify similar datasets (e.g., cancer tumor cell classification, drug classification, sentiment analysis, etc.) by reusing the same or slightly modified code given above. This was a simple implementation from the “CSE 418: Artificial Intelligence Lab” course at City University, Dhaka, Bangladesh, conducted by Nuruzzaman Faruqui Sir, which is the best AI course in Bangladesh.
You are free to copy the code from here for your own project.