5.4 Exercises
Conceptual
Q1. Using basic statistical properties of the variance, as well as singlevariable calculus, derive that the value of α which minimizes Var(αX+(1−α)Y) is:
α=σ2Y−σXYσ2X+σ2Y−2σXY
Sol: As we know that Var(aX+bY)=a2Var(X)+b2Var(Y)+2abCov(X,Y), the above quantity (that needs to be minimized) can be transformed as:
Var(αX+(1−α)Y)=α2Var(X)+(1−α)2Var(Y)+2α(1−α)Cov(X,Y)
Differentiating with respect to α and equation it to 0, we get:
2αVar(X)−2(1−α)Var(Y)+2(1−2α)Cov(X,Y)=0
α[Var(X)+Var(Y)−2Cov(X,Y)]=Var(Y)−Cov(X,Y)
α=Var(Y)−Cov(X,Y)Var(X)+Var(Y)−2Cov(X,Y)=σ2Y−σXYσ2X+σ2Y−2σXY
Q2. We will now derive the probability that a given observation is part of a bootstrap sample. Suppose that we obtain a bootstrap sample from a set of n observations.
(a) What is the probability that the first bootstrap observation is not the jth observation from the original sample? Justify your answer.
Sol: As the probability of jth observation being selected as the fisrt bootstrap sample is 1n, the probability that the first bootstrap observation is not the jth observation is 1−1n.
(b) What is the probability that the second bootstrap observation is not the jth observation from the original sample?
Sol: Same as above, as we are doing sampling with replacement.
(c) Argue that the probability that the jth observation is not in the bootstrap sample is (1−1n)n.
Sol: As we are selecting n observations and the probablity that the jth observation is not selected as one of the individual samples is 1−1n, the overall probability of jth sample not being selected is (1−1n)n.
(d) When n = 5, what is the probability that the jth observation is in the bootstrap sample?
Sol: Probability is 1−(1−15)5=1−0.32768= 0.67232.
(e) When n = 100, what is the probability that the jth observation is in the bootstrap sample?
Sol: Probability is 1−(1−1100)100=1−0.366= 0.634.
(f) When n = 10, 000, what is the probability that the jth observation is in the bootstrap sample?
Sol: Probability is 1−(1−110000)10000=1−0.36786= 0.63214.
(g) Create a plot that displays, for each integer value of n from 1 to 100, 000, the probability that the jth observation is in the bootstrap sample. Comment on what you observe.
Sol: The plot is displayed below. It can be observed that for a value of n=30, the value of probability reaches around 0.632.
import numpy as np
import matplotlib.pyplot as plt
def compute_probability(n):
return 1 - (1 - 1/n)**n
n_array = np.arange(1,100001)
prob = {}
for n in n_array:
prob[n] = compute_probability(n)
lists = sorted(prob.items())
x, y = zip(*lists)
fig = plt.figure(figsize=(15,8))
ax = fig.add_subplot(111)
plt.plot(x, y, color='r')
ax.set_xlabel('n')
ax.set_ylabel('Probability')
ax.set_title('Probability vs n')
ax.set_xlim(10, 100000)
ax.set_ylim(0.63, 0.64)
plt.show()
