Spring 2025 Quiz 4



This quiz was administered in person. It was closed-book; students were not allowed to use the DSC 10 Reference Sheet. Students had 20 minutes to work on the quiz.

This quiz covered Lectures 21-24 of the Spring 2025 offering of DSC 10.


Note (groupby / pandas 2.0): Pandas 2.0+ no longer silently drops columns that can’t be aggregated after a groupby, so code written for older pandas may behave differently or raise errors. In these practice materials we use .get() to select the column(s) we want after .groupby(...).mean() (or other aggregations) so that our solutions run on current pandas. On real exams you will not be penalized for omitting .get() when the old behavior would have produced the same answer.


Problem 1

Lecture 20

You want to test the following hypotheses:

Null Hypothesis: Everyone who applies for an internship at Google has a 20% chance of receiving a job offer, independently of all other applicants.

Alternative Hypothesis: Everyone who applies for an internship at Google has a more than 20% chance of receiving a job offer, independently of all other applicants.

To test these hypotheses, you collected information from 50 applicants and found that 16 of them received a job offer.


Problem 1.1

Fill in the blanks in the code below to calculate the p-value for a hypothesis test where the test statistic is the number of applicants, out of 50, who receive a job offer.

offers_array = np.array([])

for i in np.arange(10000):
    num_offers = ___(a)___
    offers_array = ___(b)___

p_value = ___(c)___
p_value

Answer:
(a): np.random.multinomial(50, [0.2, 0.8])[0] or np.random.choice([0, 1], 50, p=[0.8, 0.2]).sum()
(b): np.append(offers_array, num_offers)
(c): np.count_nonzero(offers_array >= 16) / 10000 or np.mean(offers_array >= 16)
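
For reference, here is a minimal runnable sketch of the simulation with the blanks filled in using the first option for each answer. The import and the seed are assumptions added only so the snippet runs on its own and gives a repeatable result; they are not part of the quiz code.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed, only for reproducibility

offers_array = np.array([])

for i in np.arange(10000):
    # Simulate 50 applicants under the null: each has a 20% chance of an offer.
    # Index 0 of the multinomial draw is the number of applicants with an offer.
    num_offers = np.random.multinomial(50, [0.2, 0.8])[0]
    offers_array = np.append(offers_array, num_offers)

# Proportion of simulated statistics at least as large as the observed 16.
# The alternative says "more than 20%", so larger values favor it.
p_value = np.count_nonzero(offers_array >= 16) / 10000
p_value
```

The comparison uses >= because the alternative hypothesis points toward larger offer counts, so "as extreme or more extreme" means 16 or more offers.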


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 70%.


Problem 1.2

Suppose the p-value comes out to 0.03. What conclusion do we draw?

Answer: Option 2


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 62%.



Problem 1.3

Which of the following test statistics would have also been appropriate to test these hypotheses? Select all that apply.

Answer: Options 1 and 2


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 68%.



Problem 2

Lecture 17

According to Indeed, a popular job website, the hourly pay for data science interns across the US has a mean of 24 and a standard deviation of 6. You take a random sample of 64 data science interns. In your sample, the hourly pay has a mean of 25 and a standard deviation of 4. Suppose you bootstrap your sample 10,000 times, calculate the mean hourly pay from each resample, and plot a histogram of these resampled means. Which of the following best describes this histogram?

Answer: Option 4
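
The reasoning behind this kind of question can be checked numerically. The sketch below is illustrative only: the sample is synthetic (randomly generated values rescaled to have mean 25 and SD 4), not real pay data, and the seed is an assumption for reproducibility. It suggests that the bootstrapped means are centered near the sample mean of 25, not the population mean of 24, with a spread close to 4 / sqrt(64) = 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: fixed seed for reproducibility

# Hypothetical sample of 64 hourly pay values, rescaled so that the
# sample mean is exactly 25 and the sample SD (np.std) is exactly 4.
raw = rng.normal(size=64)
sample = 25 + 4 * (raw - raw.mean()) / raw.std()

boot_means = np.array([])
for i in np.arange(10000):
    # Resample with replacement, using the same size as the original sample.
    resample = rng.choice(sample, 64, replace=True)
    boot_means = np.append(boot_means, resample.mean())

center = boot_means.mean()  # close to the sample mean, 25
spread = boot_means.std()   # close to sample SD / sqrt(n) = 4 / 8 = 0.5
```

Bootstrapping only sees the sample, so the population values 24 and 6 do not determine the center or spread of the resampled means.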


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 50%.


Problem 3

Lecture 17

You are interested in estimating the average wait time between an interview and an internship offer being made. You take a random sample of n internship offers and find that in this sample, the average wait time is d days and the standard deviation is 4 days.

You construct a 95% CLT-based confidence interval for the true average wait time, in days, which comes out to [10.4, 13.6]. Find n and d.

Answer:
n = 25
d = 12
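
One way to arrive at these values is sketched below. It assumes the interval has the form sample mean ± 2 * (sample SD / sqrt(n)), that is, the convention of using 2 standard deviations (rather than 1.96) for 95% confidence, which is consistent with the stated answer. The variable names are illustrative.

```python
sample_sd = 4
left, right = 10.4, 13.6

# The interval is centered at the sample mean, so d is the midpoint.
d = (left + right) / 2

# Half the interval's width equals 2 * sample_sd / sqrt(n).
half_width = (right - left) / 2

# Solve 2 * sample_sd / sqrt(n) = half_width for n.
n = round((2 * sample_sd / half_width) ** 2)

print(d, n)
```

Here the half-width is 1.6, so sqrt(n) = 8 / 1.6 = 5 and n = 25.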


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 65%.

