
ISLR Chapter 5: Resampling Methods (Part 3: Exercises - Conceptual)

Posted by Amit Rajan on Friday, May 18, 2018

5.4 Exercises

Conceptual

Q1. Using basic statistical properties of the variance, as well as single-variable calculus, derive that the value of $\alpha$ which minimizes $\mathrm{Var}(\alpha X + (1 - \alpha) Y)$ is:

$$\alpha = \frac{\sigma_Y^2 - \sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}}$$

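Sol: Since $\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)$, the quantity to be minimized can be expanded as:

$$\mathrm{Var}(\alpha X + (1 - \alpha) Y) = \alpha^2\,\mathrm{Var}(X) + (1 - \alpha)^2\,\mathrm{Var}(Y) + 2\alpha(1 - \alpha)\,\mathrm{Cov}(X, Y)$$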

Differentiating with respect to $\alpha$ and equating the result to 0, we get:

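$$2\alpha\,\mathrm{Var}(X) - 2(1 - \alpha)\,\mathrm{Var}(Y) + 2(1 - 2\alpha)\,\mathrm{Cov}(X, Y) = 0$$

$$\alpha\,[\mathrm{Var}(X) + \mathrm{Var}(Y) - 2\,\mathrm{Cov}(X, Y)] = \mathrm{Var}(Y) - \mathrm{Cov}(X, Y)$$

$$\alpha = \frac{\mathrm{Var}(Y) - \mathrm{Cov}(X, Y)}{\mathrm{Var}(X) + \mathrm{Var}(Y) - 2\,\mathrm{Cov}(X, Y)} = \frac{\sigma_Y^2 - \sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}}$$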
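As a sanity check on the derivation, the closed form can be compared with a brute-force search over a grid of α values. This is a minimal sketch: the helper names `optimal_alpha` and `portfolio_variance` and the numeric values for the variances and covariance are illustrative assumptions, not taken from the exercise.

```python
import numpy as np

def optimal_alpha(var_x, var_y, cov_xy):
    """Closed-form minimizer of Var(alpha*X + (1 - alpha)*Y)."""
    return (var_y - cov_xy) / (var_x + var_y - 2 * cov_xy)

def portfolio_variance(alpha, var_x, var_y, cov_xy):
    """Var(alpha*X + (1 - alpha)*Y), expanded with the variance identity."""
    return (alpha**2 * var_x
            + (1 - alpha)**2 * var_y
            + 2 * alpha * (1 - alpha) * cov_xy)

# Illustrative values (assumed, not from the text)
var_x, var_y, cov_xy = 1.0, 1.25, 0.5

alpha_closed = optimal_alpha(var_x, var_y, cov_xy)

# Brute-force search: evaluate the variance on a fine grid and take the argmin
grid = np.linspace(-1.0, 2.0, 300001)
alpha_grid = grid[np.argmin(portfolio_variance(grid, var_x, var_y, cov_xy))]

print(alpha_closed, alpha_grid)
```

The stationary point is a minimum because the second derivative, 2[Var(X) + Var(Y) - 2Cov(X, Y)] = 2Var(X - Y), is non-negative.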

Q2. We will now derive the probability that a given observation is part of a bootstrap sample. Suppose that we obtain a bootstrap sample from a set of n observations.

(a) What is the probability that the first bootstrap observation is not the jth observation from the original sample? Justify your answer.

Sol: Each bootstrap draw picks one of the n original observations uniformly at random, so the probability that the first bootstrap observation is the jth observation is $1/n$. Hence the probability that it is not the jth observation is $1 - 1/n$.

(b) What is the probability that the second bootstrap observation is not the jth observation from the original sample?

Sol: Again $1 - 1/n$. Because we sample with replacement, the second draw is also uniform over all n observations and does not depend on the first.

(c) Argue that the probability that the jth observation is not in the bootstrap sample is $(1 - 1/n)^n$.

Sol: A bootstrap sample consists of n draws, and each draw misses the jth observation with probability $1 - 1/n$. Since the draws are independent (sampling is with replacement), the probability that all n draws miss the jth observation is the product $(1 - 1/n)^n$.
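The argument can also be checked empirically. The following is a rough simulation sketch, not part of the original solution; the function name `simulate_exclusion`, the number of trials, and the seed are arbitrary assumptions.

```python
import numpy as np

def simulate_exclusion(n, j=0, n_trials=20000, seed=0):
    """Estimate P(observation j is absent from a bootstrap sample of size n)."""
    rng = np.random.default_rng(seed)
    # Each row is one bootstrap sample: n indices drawn with replacement from 0..n-1
    samples = rng.integers(0, n, size=(n_trials, n))
    # A trial counts as an exclusion when index j never appears in its row
    absent = ~(samples == j).any(axis=1)
    return absent.mean()

n = 5
estimate = simulate_exclusion(n)
exact = (1 - 1 / n) ** n
print(estimate, exact)
```

With enough trials, the simulated frequency should land close to the exact value, up to Monte Carlo noise.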

(d) When n = 5, what is the probability that the jth observation is in the bootstrap sample?

Sol: Probability is $1 - (1 - 1/5)^5 = 1 - 0.32768 = 0.67232$.

(e) When n = 100, what is the probability that the jth observation is in the bootstrap sample?

Sol: Probability is $1 - (1 - 1/100)^{100} = 1 - 0.366 = 0.634$.

(f) When n = 10,000, what is the probability that the jth observation is in the bootstrap sample?

Sol: Probability is $1 - (1 - 1/10000)^{10000} = 1 - 0.36786 = 0.63214$.
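The three values in (d) to (f) can be reproduced with a few lines of Python. This is only a quick arithmetic check; `inclusion_probability` is a helper name introduced here for illustration.

```python
def inclusion_probability(n):
    """P(the jth observation appears in a bootstrap sample of size n)."""
    return 1 - (1 - 1 / n) ** n

# Evaluate the formula for the sample sizes used in parts (d), (e) and (f)
for n in (5, 100, 10_000):
    print(f"n = {n:>6}: {inclusion_probability(n):.5f}")
```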

(g) Create a plot that displays, for each integer value of n from 1 to 100,000, the probability that the jth observation is in the bootstrap sample. Comment on what you observe.

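Sol: The plot is displayed below. The probability starts at 1 for n = 1, drops quickly, and then flattens out: by n = 30 it is already about 0.638, and for larger n it settles at roughly 0.632, which is the limiting value $1 - 1/e$.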

import numpy as np
import matplotlib.pyplot as plt

def compute_probability(n):
    # Probability that the jth observation appears in a bootstrap sample of size n
    return 1 - (1 - 1/n)**n

# Evaluate for every integer n from 1 to 100,000 in one vectorized call
n_array = np.arange(1, 100001)
prob = compute_probability(n_array)

fig, ax = plt.subplots(figsize=(15, 8))
ax.plot(n_array, prob, color='r')
ax.set_xlabel('n')
ax.set_ylabel('Probability')
ax.set_title('Probability vs n')
# Zoom in on the plateau; values for small n lie above this range and are cut off
ax.set_xlim(10, 100000)
ax.set_ylim(0.63, 0.64)

plt.show()
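
The plateau seen in the plot follows from a standard limit. As a short supporting derivation (not part of the original solution), using the definition of e:

```latex
\lim_{n \to \infty} \left(1 - \frac{1}{n}\right)^{n} = e^{-1}
\quad\Longrightarrow\quad
\lim_{n \to \infty} \left[\, 1 - \left(1 - \frac{1}{n}\right)^{n} \right]
= 1 - \frac{1}{e} \approx 0.632
```

So for large n, a bootstrap sample contains any given observation with probability of about 63.2%.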