
5. Uncertainty - Rejection Sampling


Rejection Sampling to Avoid Redundant Samples in an Uncertain Environment


Introduction

Sampling is one technique for approximate inference. In sampling, each variable is assigned a value drawn at random according to its probability distribution.
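For instance, here is a minimal sketch of drawing one value of a single discrete variable in proportion to its probabilities, using only the Python standard library (the distribution shown matches the Rain variable of the network defined later):

import random

# Illustrative distribution for the Rain variable
rain_distribution = {'none': 0.7, 'light': 0.2, 'heavy': 0.1}

# Draw one value of Rain according to its probabilities
value = random.choices(
    population=list(rain_distribution.keys()),
    weights=list(rain_distribution.values()),
    k=1
)[0]
print(value)  # e.g. 'none' (about 70% of the time)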


There are many problems in probability, and more broadly in machine learning, where we cannot calculate an analytical solution directly. For most probabilistic models of practical interest, exact inference is intractable, so we have to resort to some form of approximation.


The desired calculation is typically a sum over a discrete distribution or an integral over a continuous distribution, and it is intractable to compute exactly. The calculation may be intractable for many reasons: a large number of random variables, the stochastic nature of the domain, noise in the observations, a lack of observations, and more.


In problems of this kind, it is often possible to define or estimate the probability distributions for the random variables involved, either directly or indirectly via a computational simulation.


Instead of calculating the quantity directly, sampling can be used. Sampling provides a flexible way to approximate many sums and integrals at a reduced cost.



Background/ Interest


This report is part of the “Artificial Intelligence” course at City University, Dhaka, Bangladesh, conducted by Nuruzzaman Faruqui. This is the best AI course in Bangladesh.


In this course, we learned AI from scratch, starting from basic Python and ending with natural language processing. We learned the theoretical concepts and essential mathematics in the “CSE 417: Artificial Intelligence” course, then applied that knowledge in the lab course “CSE 418: Artificial Intelligence Laboratory”. We completed many lab sessions to master the material, gradually covering each necessary concept of Artificial Intelligence.

Now we can build our own machine learning models, and we can also build a neural network to solve a complex problem.



Problem Statement

Consider the Bayesian Network below. It is from our Lab Report 4: Uncertainty - The Bayesian Network & Inference.


From the Bayesian Network, we can generate a sample space…


From the sample space, can we calculate P(Train = on time)?

If we want to answer a question such as P(Train = on time), we can count the number of samples where the variable Train has the value on time and divide by the total number of samples. This gives us an approximate probability for P(Train = on time).
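As a sketch of that estimate (assuming the generate_sample() helper defined later in this report, which returns one joint sample as a dictionary):

# Hypothetical sketch: estimate P(Train = on time) from N samples,
# assuming generate_sample() as defined later in this report
N = 10000
count = sum(1 for _ in range(N) if generate_sample()["train"] == "ontime")
print(count / N)  # approximate P(Train = on time)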


We can also answer questions that involve conditional probability, such as P(Rain = light | Train = on time). 

In this case, we ignore all samples where the value of Train is not on time and then proceed as before. We count how many samples have the variable Rain = light among those samples that have Train = on time, and then divide by the total number of samples where Train = on time.
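A minimal sketch of that rejection step, again assuming the generate_sample() helper defined in the code below:

# Hypothetical sketch of rejection sampling for P(Rain = light | Train = on time)
N = 10000
accepted = [generate_sample() for _ in range(N)]
accepted = [s for s in accepted if s["train"] == "ontime"]
estimate = sum(1 for s in accepted if s["rain"] == "light") / len(accepted)
print(estimate)  # approximate P(Rain = light | Train = on time)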


The Python Code to Implement Rejection Sampling

First, we need to define our Bayesian Network. We will save it as model.py (you can choose any name).

'''
pomegranate is a Python package that implements fast, efficient, and extremely
flexible probabilistic models, ranging from probability distributions to
Bayesian networks to mixtures of hidden Markov models.
'''
from pomegranate import *
from pomegranate.base import Node

# Rain has no parent node, so it uses an unconditional DiscreteDistribution
rain = Node(DiscreteDistribution({
    'none': 0.7,
    'light': 0.2,
    'heavy': 0.1
}), name='rain')

# Maintenance node is conditional on rain
maintenance = Node(ConditionalProbabilityTable([
    ['none', 'yes', 0.4],
    ['none', 'no', 0.6],

    ['light', 'yes', 0.2],
    ['light', 'no', 0.8],

    ['heavy', 'yes', 0.1],
    ['heavy', 'no', 0.9],
], [rain.distribution]), name='maintenance')

# Train node is conditional on rain and maintenance
train = Node(ConditionalProbabilityTable([
    ['none', 'yes', 'ontime', 0.8],
    ['none', 'yes', 'delayed', 0.2],
    ['none', 'no', 'ontime', 0.9],
    ['none', 'no', 'delayed', 0.1],

    ['light', 'yes', 'ontime', 0.6],
    ['light', 'yes', 'delayed', 0.4],
    ['light', 'no', 'ontime', 0.7],
    ['light', 'no', 'delayed', 0.3],

    ['heavy', 'yes', 'ontime', 0.4],
    ['heavy', 'yes', 'delayed', 0.6],
    ['heavy', 'no', 'ontime', 0.5],
    ['heavy', 'no', 'delayed', 0.5],
], [rain.distribution, maintenance.distribution]), name='train')

# Appointment node is conditional on train
appointment = Node(ConditionalProbabilityTable([
    ['ontime', 'attend', 0.9],
    ['ontime', 'miss', 0.1],

    ['delayed', 'attend', 0.6],
    ['delayed', 'miss', 0.4],
], [train.distribution]), name='appointment')

# Create a Bayesian Network and add the states (nodes)
model = BayesianNetwork()
model.add_states(rain, maintenance, train, appointment)

# Add edges connecting the nodes
model.add_edge(rain, maintenance)
model.add_edge(rain, train)
model.add_edge(maintenance, train)
model.add_edge(train, appointment)

# Finalize the model
model.bake()
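Before moving on, here is a quick optional sanity check, a sketch assuming pomegranate's classic (v0.x) API, which the code above uses: the joint probability of one complete assignment should equal the product of the corresponding table entries.

# Joint probability of one full assignment, with values given in the order
# the states were added: rain, maintenance, train, appointment
probability = model.probability([["none", "no", "ontime", "attend"]])
print(probability)  # should equal 0.7 * 0.6 * 0.9 * 0.9 = 0.3402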

The Bayesian Network model is ready for further use. Now we will solve conditional probability problems, such as P(Appointment | Train = delayed).

import pomegranate

from collections import Counter

from model import model


def generate_sample():

    # Mapping of random variable name to sample generated
    sample = {}

    # Mapping of distribution to sample generated
    parents = {}

    # Loop over all states, assuming topological order
    for state in model.states:

        # If we have a non-root node, sample conditional on parents
        if isinstance(state.distribution, pomegranate.ConditionalProbabilityTable):
            sample[state.name] = state.distribution.sample(parent_values=parents)

        # Otherwise, just sample from the distribution alone
        else:
            sample[state.name] = state.distribution.sample()

        # Keep track of the sampled value in the parents mapping
        parents[state.distribution] = sample[state.name]

    # Return generated sample
    return sample
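Calling the helper once draws a full joint sample as a dictionary; for example (the exact values vary from run to run):

# One draw from the joint distribution; output differs each run
print(generate_sample())
# e.g. {'rain': 'none', 'maintenance': 'no', 'train': 'ontime', 'appointment': 'attend'}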


Now, to compute P(Appointment | Train = delayed), which is the probability distribution of the Appointment variable given that the train is delayed, we do the following:

# Rejection sampling:
# compute the distribution of Appointment given that the train is delayed
N = 10000
data = []

# Repeat sampling 10,000 times
for i in range(N):

    # Generate a sample based on the function that we defined earlier
    sample = generate_sample()

    # If, in this sample, the variable Train has the value delayed, save the
    # sample. Since we are interested in the probability distribution of
    # Appointment given that the train is delayed, we discard the samples
    # where the train was on time.
    if sample["train"] == "delayed":
        data.append(sample["appointment"])

# Count how many times each value of the variable appeared. We can later
# normalize by dividing the results by the total number of saved samples to
# get the approximate probabilities of the variable that add up to 1.
print(Counter(data))
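To turn the raw counts into probabilities that sum to 1, as the final comment suggests, a short follow-up sketch:

# Normalize the counts into an approximate conditional distribution
counts = Counter(data)
total = sum(counts.values())
for value, count in counts.items():
    print(value, count / total)  # approximate P(Appointment = value | Train = delayed)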



Result
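With N = 10,000, roughly 21% of the generated samples should survive rejection, since P(Train = delayed) works out to about 0.213 under this network, and of the retained samples approximately 60% should be 'attend' and 40% 'miss', matching the conditional probability table for Appointment given a delayed train.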



Conclusion

In this lab report, we learned how sampling can be used to perform approximate inference, and we applied rejection sampling to a concrete problem.


By implementing the code snippets given above, you can develop your own sampling program that rejects unnecessary samples in an uncertain situation, making it more convenient to build your AI agent. This is a simple implementation from the “CSE 418: Artificial Intelligence Lab” course at City University, Dhaka, Bangladesh, conducted by Nuruzzaman Faruqui Sir, which is the best AI course in Bangladesh.


You are free to copy the code from here for your own project.

