Rejection Sampling algorithm
Rejection sampling allows us to sample from a distribution for which we don't have the actual density. What we do need is a function that is proportional to the desired density. This is often the case in a Bayesian analysis -- we can readily obtain a function \(g(\underline{\theta})\) that is proportional to the posterior distribution of \(\underline{\theta}\), but finding the exact density is much harder.
Given such a proportional function \(g\), we first need to obtain an envelope density \(w\) such that \(w(\underline{\theta}) \geq g(\underline{\theta})\) for all \(\underline{\theta}\). In practice we may need to multiply \(g\) by a small constant so that a convenient envelope dominates it everywhere.
Suppose \(g(\theta) = \exp(-\theta^2/2) + 0.5\exp(-(\theta - 3)^2/2)\), which corresponds to a mixture of normals. It is possible to sample from this directly, but here we will use rejection sampling to demonstrate the technique.
Once we choose our envelope density, the algorithm (to obtain \(M\) samples) goes as follows:
- Generate a sample \(\theta^{(i)} \sim w(\theta)\).
- Generate a sample \(U_i \sim U(0,1)\) from the uniform distribution on \([0,1]\).
- If \(U_i < g(\theta^{(i)})/w(\theta^{(i)})\), accept \(\theta^{(i)}\). Otherwise, reject.
- Repeat steps 1-3 until \(M\) samples have been accepted.
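As a quick illustration of these three steps (separate from the example that follows, with a target chosen for this sketch only), here is a vectorized version that samples from an unnormalized standard normal, \(0.25\exp(-\theta^2/2)\), using a standard Cauchy envelope. The constant 0.25 keeps the target below the Cauchy density everywhere, so the ratio in step 3 never exceeds 1.

```python
import numpy as np
from scipy import stats

def target(theta):
    # unnormalized target: 0.25*exp(-theta^2/2); the 0.25 keeps it below the Cauchy pdf
    return 0.25*np.exp(-theta**2/2.0)

rng = np.random.default_rng(0)
envelope = stats.cauchy()

# Step 1: draw candidates from the envelope
candidates = envelope.rvs(size=50000, random_state=rng)
# Step 2: draw uniforms on [0, 1]
u = rng.uniform(size=candidates.shape)
# Step 3: keep candidates where u < target/envelope
accepted = candidates[u < target(candidates)/envelope.pdf(candidates)]

# the accepted draws behave like N(0, 1) samples
print(accepted.size, accepted.mean(), accepted.std())
```

Drawing candidates in one batch trades memory for speed; the loop version developed below instead draws one candidate at a time until enough are accepted.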
Let us visualize \(g\) and choose an envelope density \(w\).
import numpy as np
from scipy import stats
from matplotlib import pyplot as plt
def propDensity(x):
    return np.exp(-x**2/2.0) + 0.5*np.exp(-(x - 3.0)**2/2.0)
g = propDensity
w = stats.t(1, loc=1) #choose the envelope to be Student's t with 1 degree of freedom, centered at 1
dom = np.linspace(-3,6,100) #this captures most of the mass of g
p1, = plt.plot(dom, g(dom))
p2, = plt.plot(dom, w.pdf(dom), linestyle = 'dashed')
plt.legend([p1,p2],['g','w'])
Note that our \(g\) function is too large -- the envelope does not dominate it. To redress this, we simply multiply \(g\) by some small scalar. Preferably, the envelope should exceed the scaled \(g\) by only a small margin, so that we accept samples as frequently as possible.
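One way to pick the scalar (a grid-based heuristic, not a proof that the bound also holds between grid points or in the tails) is to compute the smallest value of \(w/g\) over the plotting grid; any scale below that minimum keeps the scaled \(g\) under \(w\) at every grid point.

```python
import numpy as np
from scipy import stats

def g(x):
    return np.exp(-x**2/2.0) + 0.5*np.exp(-(x - 3.0)**2/2.0)

w = stats.t(1, loc=1)
dom = np.linspace(-3, 6, 100)

# largest constant c such that c*g(x) <= w(x) at every grid point
c = np.min(w.pdf(dom)/g(dom))
print(c)  # just under 0.1, so 0.09 is a safe choice on this grid
```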
p1, = plt.plot(dom, .09*g(dom))
p2, = plt.plot(dom, w.pdf(dom), linestyle = 'dashed')
plt.legend([p1,p2],['g','w'])
def rejectionSampling(propDensity, envelope, nSamples):
    ''' This function performs the rejection sampling algorithm, given
    a proportional density from which to sample, a dominating
    envelope, and the desired number of samples. Returns the
    samples in a NumPy array. '''
    U = stats.uniform()
    samples = np.zeros(nSamples)
    i = 0
    iters = 0
    while i < nSamples:
        s = envelope.rvs()
        if U.rvs() < propDensity(s)/envelope.pdf(s):
            samples[i] = s
            i += 1
        iters += 1
    print("Number of iterations:", iters)
    return samples
def propDensityScaled(x):
    return .09*(np.exp(-x**2/2.0) + 0.5*np.exp(-(x - 3.0)**2/2.0))

g = propDensityScaled
We are now ready to sample from the desired distribution. Below, we obtain 5000 samples and then plot a kernel density estimate of the data. For comparison, the normalized target is the mixture \(\frac{2}{3}N(0,1) + \frac{1}{3}N(3,1)\), since the two components of \(g\) have masses proportional to \(\sqrt{2\pi}\) and \(0.5\sqrt{2\pi}\).
samples = rejectionSampling(g, w, 5000)
xmin = samples.min()
xmax = samples.max()
positions = np.linspace(xmin, xmax, 5000)
kernel = stats.gaussian_kde(samples)
Z = kernel(positions)
actual = (2.0/3)*(stats.norm().pdf(positions)) + (1.0/3)*(stats.norm(loc=3).pdf(positions))
p1, = plt.plot(positions, Z)
p2, = plt.plot(positions, actual, linestyle = "dotted")
plt.legend([p1,p2], ['KDE', 'actual'])
The two densities are very close, which is what we want. In general we would not be able to plot the actual density, but hopefully by now we have some confidence in the method.
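If we want more than a visual check, we can compare the kernel density estimate to the true density numerically. The sketch below repeats the sampling in compact form (with a fixed seed, chosen arbitrarily, so the particular draws are illustrative only) and reports the largest pointwise gap between the two curves.

```python
import numpy as np
from scipy import stats

def g(x):
    return .09*(np.exp(-x**2/2.0) + 0.5*np.exp(-(x - 3.0)**2/2.0))

np.random.seed(42)  # scipy's rvs uses NumPy's global state by default
w = stats.t(1, loc=1)
U = stats.uniform()

# draw candidates from the envelope until 5000 are accepted
samples = np.zeros(5000)
i = 0
while i < 5000:
    s = w.rvs()
    if U.rvs() < g(s)/w.pdf(s):
        samples[i] = s
        i += 1

positions = np.linspace(-3, 6, 500)
kde = stats.gaussian_kde(samples)
actual = (2.0/3)*stats.norm().pdf(positions) + (1.0/3)*stats.norm(loc=3).pdf(positions)
print(np.abs(kde(positions) - actual).max())  # typically a few hundredths
```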
Let's investigate the effect of a looser envelope. For now we will keep the same envelope as before, but scale the proportional density downward.
def propDensityScaled(x):
    return .02*(np.exp(-x**2/2.0) + 0.5*np.exp(-(x - 3.0)**2/2.0))

g = propDensityScaled
samples = rejectionSampling(g, w, 5000)
Note that we required more than 4 times as many iterations, since samples were rejected far more frequently: the acceptance probability is proportional to the scaling constant, which we shrank from 0.09 to 0.02 (a factor of 4.5). It is better to choose an envelope that dominates the proportional density as tightly as possible.
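The effect can be quantified: when the envelope \(w\) is a proper density and we accept with probability \(g/w\), the overall acceptance probability is \(\int g(\theta)\,d\theta\), so we expect about \(M/\int g\) iterations to collect \(M\) samples. Here is a sketch of that calculation for the two scaling constants used above.

```python
import numpy as np
from scipy import integrate

def g_unscaled(x):
    return np.exp(-x**2/2.0) + 0.5*np.exp(-(x - 3.0)**2/2.0)

# total mass of the unscaled g; analytically this is 1.5*sqrt(2*pi)
mass, _ = integrate.quad(g_unscaled, -np.inf, np.inf)

for scale in (0.09, 0.02):
    accept_prob = scale*mass            # acceptance probability = integral of the scaled g
    expected_iters = 5000/accept_prob   # expected iterations for 5000 accepted samples
    print(scale, accept_prob, expected_iters)
```

The ratio of the two expected iteration counts is simply 0.09/0.02 = 4.5, consistent with needing "more than 4 times" as many iterations.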
Now we will choose a different envelope density entirely, say a normal distribution centered at 1 with standard deviation 2.
def propDensityScaled(x):
    return .15*(np.exp(-x**2/2.0) + 0.5*np.exp(-(x - 3.0)**2/2.0))

g = propDensityScaled
w = stats.norm(loc=1, scale=2) #choose the envelope to be the normal density, centered at 1
dom = np.linspace(-3,6,100) #this captures most of the mass of g
p1, = plt.plot(dom, g(dom))
p2, = plt.plot(dom, w.pdf(dom), linestyle = 'dashed')
plt.legend([p1,p2],['g','w'])
samples = rejectionSampling(g, w, 5000)
xmin = samples.min()
xmax = samples.max()
positions = np.linspace(xmin, xmax, 5000)
kernel = stats.gaussian_kde(samples)
Z = kernel(positions)
actual = (2.0/3)*(stats.norm().pdf(positions)) + (1.0/3)*(stats.norm(loc=3).pdf(positions))
p1, = plt.plot(positions, Z)
p2, = plt.plot(positions, actual, linestyle = "dotted")
plt.legend([p1,p2], ['KDE', 'actual'])
We required fewer iterations because the scaled \(g\) sits closer to this envelope, so the acceptance probability is higher. Our kernel density estimate also appears to fit better than the one we obtained originally.