Building a Machine Learning Model to Classify Counterfeit Banknotes
Introduction
Machine Learning!!!🤪 One of the most exciting technologies in today's world 😉. I know you have heard a lot about it. Today we are going to build a machine learning model that authenticates a banknote, telling us whether it is genuine or not 💵.
Whenever you go to the bank to deposit cash, the cashier places the banknotes in a machine that tells whether each note is genuine or counterfeit. The machine uses classification techniques to do this. Classification is a type of supervised machine learning, and there are many algorithms to choose from. We understand that, as a beginner, it is hard to know the theory behind each algorithm. If that is true for you, there is nothing to panic about.🤪
We will implement the K nearest neighbor, support vector machine, perceptron learning, and Gaussian naive Bayes algorithms and explain the process of building a banknote authentication system. After reading this article, you will understand how classification systems are built using machine learning algorithms.
Background/ Interest
This article is part of the lab report for the “Artificial Intelligence” course at City University, Dhaka, Bangladesh, conducted by Nuruzzaman Faruqui. This is the best AI course in Bangladesh.
In this course, we learned AI from scratch. We started with basic Python and ended with natural language processing. We learned the theoretical concepts and essential mathematics properly in the “CSE 417: Artificial Intelligence” course, then applied that knowledge in the lab course “CSE 418: Artificial Intelligence Laboratory”.
We completed many lab sessions to master the course and gradually learned each necessary concept of artificial intelligence. Now we can build our own machine learning models, and even neural networks, to solve complex problems.
Problem Statement
Machine learning algorithms learn from data. Therefore, to identify whether a banknote is real or not, we needed a dataset of real as well as fake banknotes along with their features.
Some sources for free datasets are Kaggle, the UCI Machine Learning Repository, etc. We know that real-world data is messy: a dataset may contain many missing values, and in that situation we have to clean it first. To avoid this kind of hassle, we are going to use a pre-cleaned dataset. You can download the dataset (.csv file) from my GitHub repository. Here’s the link (GitHub)
Figure: Banknotes.csv
The dataset contains a total of 1372 records of different banknotes. The four left columns contain measurements we can use to predict whether a note is genuine or counterfeit; the rightmost column is the class, labeled by a human and coded as 0 (authentic) or 1 (counterfeit). Machine learning algorithms require the features and labels to be separated from each other. The label is the output class or category. In our dataset, variance, skew, curtosis, and entropy are the features, whereas the class column contains the label.
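To get a feel for the file before training anything, we can print the header and the first few records. This is a minimal sketch; it assumes banknotes.csv sits in the working directory and has a header row, as in the repository:

import csv

# Peek at the column names and the first three records
with open("banknotes.csv") as f:
    reader = csv.reader(f)
    print(next(reader))  # e.g. ['variance', 'skew', 'curtosis', 'entropy', 'class']
    for i, row in enumerate(reader):
        if i == 3:
            break
        print(row)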
Now we can train our model on this dataset and see whether we can predict if new banknotes are genuine or not.
The Python code to train a machine learning model
# pip install scikit-learn
import csv
import random
from sklearn import svm
from sklearn.linear_model import Perceptron
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
#Loading Algorithm in model variable
model = svm.SVC()
#model = Perceptron()
# model = KNeighborsClassifier(n_neighbors=1)
# model = GaussianNB()
# Read data in from file
with open("banknotes.csv") as f:
    reader = csv.reader(f)
    next(reader)

    data = []
    for row in reader:
        data.append({
            "feature": [float(cell) for cell in row[:4]],
            "label": "Authentic" if row[4] == "0" else "Counterfeit"
        })
# Separate data into training and testing groups
holdout = int(0.40 * len(data))
random.shuffle(data)
testing = data[:holdout]
training = data[holdout:]
# Train model on the training set
X_training = [row["feature"] for row in training]
y_training = [row["label"] for row in training]
model.fit(X_training, y_training)
# Make predictions on the testing set
X_testing = [row["feature"] for row in testing]
y_testing = [row["label"] for row in testing]
predictions = model.predict(X_testing)
# Compute how well we performed
correct = 0
incorrect = 0
total = 0
for actual, predicted in zip(y_testing, predictions):
    total += 1
    if actual == predicted:
        correct += 1
    else:
        incorrect += 1
# Print results
print(f"Results for model {type(model).__name__}")
print(f"Correct: {correct}")
print(f"Incorrect: {incorrect}")
print(f"Accuracy: {100 * correct / total:.2f}%")
Output
We have trained the model with the “support vector machine” algorithm. To train with another algorithm, simply comment out the current model line and uncomment the one you want to use; the rest of the code stays the same.
Result
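The original post shows a screenshot of the script's output here. For reference, a run with svm.SVC() prints output of the following shape; the exact counts vary from run to run because the data is shuffled randomly, and the figures below are reconstructed to match the SVM accuracy reported in the Conclusion:

Results for model SVC
Correct: 545
Incorrect: 3
Accuracy: 99.45%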
Explanation of the code
Importing Required Modules
Before loading our dataset and training our machine learning model, we need to import a few libraries. The following script imports the required Python modules:
import csv
import random
Load the machine learning algorithms from the scikit-learn library
We used the support vector machine, K nearest neighbor, perceptron learning, and Gaussian naive Bayes algorithms, four of the most commonly used algorithms for machine learning classification problems.
Support Vector Machine
To train the support vector machine, we used the SVC class from the “sklearn.svm” module.
from sklearn import svm
model = svm.SVC()
Perceptron Learning
To train Perceptron Learning, we imported Perceptron from the “sklearn.linear_model” module.
from sklearn.linear_model import Perceptron
model = Perceptron()
Gaussian Naive Bayes
To train Gaussian naive Bayes, we imported GaussianNB from the “sklearn.naive_bayes” module.
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
K nearest neighbor
To train the K nearest neighbor classifier, we imported KNeighborsClassifier from the “sklearn.neighbors” module.
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=1)
Script:
from sklearn import svm
from sklearn.linear_model import Perceptron
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
# Loading Algorithm in model
model = svm.SVC()
#model = Perceptron()
#model = KNeighborsClassifier(n_neighbors=1)
#model = GaussianNB()
Note that after importing the algorithms, we can choose which model to use. The rest of the code will stay the same.
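If you would rather compare all four algorithms in a single run instead of commenting lines in and out, a simple loop works. This is a sketch rather than part of the original script; it assumes X_training, y_training, X_testing, and y_testing have already been prepared as shown in the following sections:

from sklearn import svm
from sklearn.linear_model import Perceptron
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Train and score each classifier on the same train/test split
for model in [svm.SVC(), Perceptron(),
              KNeighborsClassifier(n_neighbors=1), GaussianNB()]:
    model.fit(X_training, y_training)
    predictions = model.predict(X_testing)
    accuracy = sum(a == p for a, p in zip(y_testing, predictions)) / len(y_testing)
    print(f"{type(model).__name__}: {100 * accuracy:.2f}%")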
Loading the Dataset
Once we have imported the libraries, the next step is to load the dataset into our application. To do so, we opened the file using Python's built-in file handling and used the “csv.reader()” function of the csv module, which reads the dataset in CSV format.
# Read data from the file
with open("banknotes.csv") as f:
    reader = csv.reader(f)
    next(reader)
Divide the dataset into Features & Labels
In our dataset, variance, skew, curtosis, and entropy are the features, whereas the class column contains the label. The following script, together with the loading step, splits each record into its features and label and stores them in the list data = [].
# Read data from file
with open("banknotes.csv") as f:
    reader = csv.reader(f)
    next(reader)

    data = []
    for row in reader:
        data.append({
            "feature": [float(cell) for cell in row[:4]],
            "label": "Authentic" if row[4] == "0" else "Counterfeit"
        })
The for loop iterates over every record in the dataset. In the "feature": [float(cell) for cell in row[:4]] line, we take columns 0 to 3, which contain our feature set. In "label": "Authentic" if row[4] == "0" else "Counterfeit", we take column 4, which contains the label (class): if the label is 0, the note is authentic/real, and if the label is 1, the note is counterfeit/fake.
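After this loop runs, each element of data is a small dictionary pairing one note's features with its label. Printing the first record gives output of the following shape (the numbers here are illustrative, not taken from the actual file):

print(data[0])
# e.g. {'feature': [3.6216, 8.6661, -2.8073, -0.44699], 'label': 'Authentic'}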
Separate the dataset into training and testing groups
The training set is used to train the machine learning algorithms while the test set is used to evaluate the performance of the machine learning algorithms.
# Separate data into training and testing groups
holdout = int(0.40 * len(data))
random.shuffle(data)
testing = data[:holdout]
training = data[holdout:]
First, holdout = int(0.40 * len(data)) computes 40% of the length of the data list, and random.shuffle(data) shuffles the records using the random.shuffle() function from the random module, so the split is not biased by the order of the file.
Then we store 40% of the data in the testing group and the remaining 60% in the training group.
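As a side note, scikit-learn ships a helper that performs the same shuffle-and-split in one call. This sketch is an alternative to the manual split above, not what the original script uses:

from sklearn.model_selection import train_test_split

# 60% training / 40% testing, shuffled automatically
X = [row["feature"] for row in data]
y = [row["label"] for row in data]
X_training, X_testing, y_training, y_testing = train_test_split(X, y, test_size=0.40)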
Training the model on the training set
# Train model on the training set
X_training = [row["feature"] for row in training]
y_training = [row["label"] for row in training]
model.fit(X_training, y_training)
The training feature set is stored as X_training, while the training label set is stored as y_training; both are then passed to the “fit()” method.
Testing the model on the testing set
After training the algorithm, we performed predictions on the test set. To make predictions, the “predict()” method is used. The records to be predicted are passed as parameters to the “predict()” method as shown below:
# Make predictions on the testing set
X_testing = [row["feature"] for row in testing]
y_testing = [row["label"] for row in testing]
predictions = model.predict(X_testing)
Evaluating the model performance
We evaluated the performance of the model with simple Python code:
# Compute how well we performed
correct = 0
incorrect = 0
total = 0
for actual, predicted in zip(y_testing, predictions):
    total += 1
    if actual == predicted:
        correct += 1
    else:
        incorrect += 1
Accuracy of the model
At the end, we printed the accuracy of the model for better understanding:
# Print results
print(f"Results for model {type(model).__name__}")
print(f"Correct: {correct}")
print(f"Incorrect: {incorrect}")
print(f"Accuracy: {100 * correct / total:.2f}%")
Conclusion
Banknote authentication is an important task, and it is difficult to detect fake banknotes manually. Machine learning algorithms can help in this regard. In this article, we explained how we solved the problem of banknote authentication using machine learning techniques. We compared four different algorithms in terms of performance and concluded that the KNN and SVM algorithms are the best for banknote authentication, with accuracies of 100% and 99.45%, respectively.
However, you can build a model of your own to classify similar datasets (e.g., cancer tumor cell classification, drug classification, sentiment analysis, etc.) by reusing the same or slightly modified code given above. This was a simple implementation from the “CSE 418: Artificial Intelligence Lab” course at City University, Dhaka, Bangladesh, conducted by Nuruzzaman Faruqui Sir, which is the best AI course in Bangladesh.
You are free to copy the code from here for your own project.