Spring 2025 Quiz 4


This quiz was administered in person. It was closed-book; students were not allowed to use the DSC 10 Reference Sheet. Students had 20 minutes to work on the quiz.

This quiz covered Lectures 21-24 of the Spring 2025 offering of DSC 10.


Problem 1

You want to test the following hypotheses:

Null Hypothesis: Everyone who applies for an internship at Google has a 20% chance of receiving a job offer, independently of all other applicants.

Alternative Hypothesis: Everyone who applies for an internship at Google has a more than 20% chance of receiving a job offer, independently of all other applicants.

To test these hypotheses, you collected information from 50 applicants and found that 16 of them received a job offer.


Problem 1.1

Fill in the blanks in the code below to calculate the p-value for a hypothesis test where the test statistic is the number of applicants, out of 50, who receive a job offer.

offers_array = np.array([])

for i in np.arange(10000):
    num_offers = ___(a)___
    offers_array = ___(b)___

p_value = ___(c)___
p_value

Answer:
(a): np.random.multinomial(50, [0.2, 0.8])[0] or np.random.choice([0, 1], 50, p=[0.8, 0.2]).sum()
(b): np.append(offers_array, num_offers)
(c): np.count_nonzero(offers_array >= 16) / 10000 or np.mean(offers_array >= 16)
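
For reference, here is a minimal runnable sketch of the simulation with the blanks filled in using the first option listed for each answer. The import and the random seed are additions for reproducibility and are not part of the quiz code.

```python
import numpy as np

np.random.seed(10)  # assumption: seed added only so the sketch is reproducible

offers_array = np.array([])

for i in np.arange(10000):
    # Simulate 50 applicants under the null hypothesis (20% offer chance each).
    # Index 0 of the multinomial result counts the applicants who got an offer.
    num_offers = np.random.multinomial(50, [0.2, 0.8])[0]
    offers_array = np.append(offers_array, num_offers)

# Proportion of simulated statistics at least as extreme as the observed 16.
# The alternative says "more than 20%", so larger counts favor the alternative.
p_value = np.mean(offers_array >= 16)
print(p_value)
```

The comparison uses >= because the alternative hypothesis points toward larger numbers of offers, and the observed value of 16 itself counts as "at least as extreme".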


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 70%.


Problem 1.2

Suppose the p-value comes out to 0.03. What conclusion do we draw?

Answer: Option 2


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 62%.



Problem 1.3

Which of the following test statistics would have also been appropriate to test these hypotheses? Select all that apply.

Answer: Options 1 and 2


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 68%.



Problem 2

According to Indeed, a popular job website, the hourly pay for data science interns across the US has a mean of 24 and a standard deviation of 6. You take a random sample of 64 data science interns. In your sample, the hourly pay has a mean of 25 and a standard deviation of 4. Suppose you bootstrap your sample 10,000 times, calculate the mean hourly pay from each resample, and plot a histogram of these resampled means. Which of the following best describes this histogram?

Answer: Option 4
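
The scenario can be checked with a short simulation. Bootstrapping resamples from the sample, so the resampled means center on the sample mean (25) rather than the population mean (24), and their standard deviation is approximately the sample SD divided by the square root of the sample size, 4 / sqrt(64) = 0.5. The sketch below uses a hypothetical sample, since the actual data are not given: 64 made-up values rescaled to have mean 25 and SD 4. The seed and variable names are likewise assumptions.

```python
import numpy as np

np.random.seed(0)  # assumption: seed added only for reproducibility

# Hypothetical sample: 64 skewed values rescaled so that the sample mean
# is 25 and the sample SD (np.std) is 4, matching the problem statement.
raw = np.random.exponential(size=64)
sample = 25 + 4 * (raw - raw.mean()) / np.std(raw)

boot_means = np.array([])
for i in np.arange(10000):
    # Resample 64 values from the sample, with replacement.
    resample = np.random.choice(sample, 64, replace=True)
    boot_means = np.append(boot_means, resample.mean())

print(boot_means.mean())   # should land close to the sample mean, 25
print(np.std(boot_means))  # should land close to 4 / sqrt(64) = 0.5
```

Even though the hypothetical sample is skewed, the Central Limit Theorem suggests the histogram of boot_means should look roughly normal.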


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 50%.


Problem 3

You are interested in estimating the average wait time between an interview and an internship offer being made. You take a random sample of n internship offers and find that in this sample, the average wait time is d days and the standard deviation is 4 days.

You construct a 95% CLT-based confidence interval for the true average wait time, in days, which comes out to [10.4, 13.6]. Find n and d.

Answer:
n = 25
d = 12
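
One way to arrive at these values, sketched below: the interval is centered at the sample mean, and its endpoints sit 2 standard deviations of the sample mean's distribution on either side, so the width is 4 * SD / sqrt(n). The multiplier of 2 (rather than 1.96) is an assumption here; it is the approximation consistent with the stated answer n = 25. Variable names are illustrative.

```python
left, right = 10.4, 13.6
sample_sd = 4

# The midpoint of a CLT-based interval is the sample mean.
d = (left + right) / 2

# Width = 2 * (2 * SD / sqrt(n)) = 4 * SD / sqrt(n); solve for n.
width = right - left
n = round((4 * sample_sd / width) ** 2)  # round away floating-point error

print(d, n)
```

Checking the result: with n = 25, the SD of the sample mean is 4 / 5 = 0.8, and 12 ± 2 * 0.8 gives [10.4, 13.6].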


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 65%.

