Winter 2025 Quiz 4



This quiz was administered in-person. Students were allowed a cheat sheet. Students had 20 minutes to work on the quiz.

This quiz covered Lectures 18-21 of the Winter 2025 offering of DSC 10.


Problem 1

You want to estimate the proportion of all flower fields in the world that use fertilizer. You will create a 68% confidence interval for this proportion, based on a sample of flower fields. You want your confidence interval to have a width of at most 0.01. Using the fact that the standard deviation of any dataset of 0s and 1s is no more than 0.5, calculate the minimum sample size required. Give your answer as an integer.

Answer: 10,000
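
The arithmetic behind this answer can be sketched as follows. It assumes the 68% interval extends one standard deviation of the sample mean on each side of the sample proportion, so that its width is 2 · SD / √n. The variable names are illustrative only.

```python
import math

max_sd = 0.5       # largest possible SD of a dataset of 0s and 1s
max_width = 0.01   # the interval may be at most this wide

# Assumed width of a 68% interval: 2 * (SD / sqrt(n)).
# Requiring 2 * max_sd / sqrt(n) <= max_width gives
# n >= (2 * max_sd / max_width) ** 2.
min_n = math.ceil((2 * max_sd / max_width) ** 2)
print(min_n)
```

Using the worst-case SD of 0.5 guarantees the width requirement is met whatever the true proportion turns out to be.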


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 53%.


Problem 2

A researcher collects data on many flowers, some of which have been treated with fertilizer and some of which have not (untreated). The researcher wants to determine whether there is a relationship between fertilizer use and the height of the flowers.


Problem 2.1

Choose the correct formulation of the null and alternative hypotheses.

Answer: B


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 70%.


Problem 2.2

True or False: We could use a permutation test to test these hypotheses.

Answer: True
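
A minimal sketch of such a permutation test is shown below. The heights and group labels are made-up data, and the test statistic (the absolute difference in group means) is an assumption, since the quiz does not specify one.

```python
import numpy as np

rng = np.random.default_rng(10)

# Hypothetical data: flower heights and whether each flower was treated.
heights = np.array([30.1, 28.4, 33.0, 31.2, 29.5, 27.8, 26.9, 28.8, 30.0, 27.1])
treated = np.array([True, True, True, True, True,
                    False, False, False, False, False])

def diff_in_means(heights, labels):
    """Absolute difference in mean height between the two groups."""
    return abs(heights[labels].mean() - heights[~labels].mean())

observed = diff_in_means(heights, treated)

# Under the null, the labels are unrelated to height, so shuffling them
# produces datasets that are just as plausible as the observed one.
simulated = np.array([
    diff_in_means(heights, rng.permutation(treated))
    for _ in range(2000)
])

# Proportion of shuffles at least as extreme as what was observed.
p_value = np.mean(simulated >= observed)
print(observed, p_value)
```

Shuffling works here because there are two groups and the null hypothesis says both come from the same distribution.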


Difficulty: ⭐️⭐️

The average score on this problem was 80%.



Problem 3

The DataFrame flowers contains information on a sample of flower fields. Each row corresponds to a different flower field. The "Fertilizer" column contains Boolean values corresponding to whether each flower field uses fertilizer.

You wonder if it could be the case that flower fields are fertilized at random, with each flower field having an 80% chance of being fertilized, independently of all others. You decide to do a hypothesis test to determine whether this could or could not be the case, testing at the p = 0.05 significance level. You will use as your test statistic the absolute difference between 0.8 and the proportion of fertilized flower fields.


Problem 3.1

Complete the implementation of the function one_stat, which calculates one simulated value of this test statistic, under the assumptions of the null hypothesis. Note that the optional argument p in np.random.choice specifies the probabilities with which each element is chosen (here, there is a 0.8 probability of selecting True).

def one_stat():
    sample_size = __(a)__
    random_choice = (np.random.choice([True, False], 
                     sample_size, p=[0.8, 0.2]))
    return __(b)__

(a): flowers.shape[0]
(b): abs(random_choice.sum() / sample_size - 0.8)
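
With the blanks filled in, the function can be run as in the sketch below. The flowers DataFrame is not shown in the quiz, so a hypothetical sample size of 200 stands in for flowers.shape[0].

```python
import numpy as np

sample_size = 200  # hypothetical stand-in for flowers.shape[0]

def one_stat():
    # Simulate one sample in which each field is independently
    # fertilized (True) with probability 0.8.
    random_choice = np.random.choice([True, False],
                                     sample_size, p=[0.8, 0.2])
    # Summing Booleans counts the Trues; dividing gives the proportion.
    return abs(random_choice.sum() / sample_size - 0.8)

print(one_stat())
```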


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 59%.



Problem 3.2

Complete the implementation of the function one_stat_differently, which also calculates one simulated value of this test statistic under the null.

def one_stat_differently():
    multi = np.random.multinomial(__(c)__)
    return __(d)__

(c): flowers.shape[0], [0.8, 0.2]
(d): abs(multi[0] / flowers.shape[0] - 0.8)
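
A runnable version of this alternative is sketched below, again with a hypothetical sample size of 200 in place of flowers.shape[0]. np.random.multinomial returns the count in each category, so multi[0] is the number of simulated fertilized fields.

```python
import numpy as np

sample_size = 200  # hypothetical stand-in for flowers.shape[0]

def one_stat_differently():
    # multi is an array of two counts: [fertilized, not fertilized].
    multi = np.random.multinomial(sample_size, [0.8, 0.2])
    return abs(multi[0] / sample_size - 0.8)

print(one_stat_differently())
```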


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 65%.


Problem 3.3

Fill in the blanks to calculate 10,000 simulated values of the test statistic and collect them in an array called many_stats. You can use the functions you’ve already written to help you.

many_stats = __(e)__
for i in np.arange(10000):
    many_stats = __(f)__

(e): np.array([])
(f): np.append(many_stats, one_stat())
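
The completed loop can be run end to end as sketched here. The sample size of 200 is a hypothetical stand-in for flowers.shape[0], and one_stat is repeated so that the block is self-contained.

```python
import numpy as np

sample_size = 200  # hypothetical stand-in for flowers.shape[0]

def one_stat():
    random_choice = np.random.choice([True, False],
                                     sample_size, p=[0.8, 0.2])
    return abs(random_choice.sum() / sample_size - 0.8)

# Start with an empty array and append one simulated statistic per iteration.
many_stats = np.array([])
for i in np.arange(10000):
    many_stats = np.append(many_stats, one_stat())

print(len(many_stats), many_stats.mean())
```

Calling one_stat_differently() inside the loop instead would serve the same purpose, since both functions simulate the same statistic under the null.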


Difficulty: ⭐️⭐️

The average score on this problem was 75%.


Problem 3.4

Suppose that the observed value of the test statistic is 0.04. What do we conclude?

Answer: Not enough information.
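
One way to see why the observed statistic alone may not settle the question: the conclusion rests on a p-value, which compares 0.04 with the simulated distribution, and that distribution depends on the number of rows in flowers, which the problem does not state. The sketch below uses two hypothetical sample sizes (50 and 2000) and a vectorized binomial draw in place of the quiz's one_stat.

```python
import numpy as np

rng = np.random.default_rng(0)
observed = 0.04

def p_value_for(sample_size, repetitions=10000):
    # Number of fertilized fields in each simulated sample under the null.
    counts = rng.binomial(sample_size, 0.8, size=repetitions)
    stats = np.abs(counts / sample_size - 0.8)
    # A small tolerance keeps exact ties from being lost to
    # floating-point rounding.
    return np.mean(stats >= observed - 1e-9)

p_small = p_value_for(50)     # hypothetical small sample
p_large = p_value_for(2000)   # hypothetical large sample
print(p_small, p_large)
```

With the small sample the p-value is well above 0.05, while with the large sample it is far below, so the same observed statistic can lead to either conclusion.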


Difficulty: ⭐️⭐️⭐️⭐️⭐️

The average score on this problem was 15%.



Problem 3.5

True or False: We could construct a confidence interval to test these hypotheses.

Answer: True
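
A minimal sketch of the confidence interval approach follows, assuming a bootstrapped 95% interval for the proportion of fertilized fields (matching the 0.05 significance level). The data are made up: a Boolean array with 150 fertilized fields out of 200 stands in for the "Fertilizer" column.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the "Fertilizer" column of flowers.
fertilizer = np.array([True] * 150 + [False] * 50)

# Resample with replacement and record the proportion fertilized each time.
boot_props = np.array([
    rng.choice(fertilizer, size=len(fertilizer), replace=True).mean()
    for _ in range(5000)
])

# The middle 95% of the bootstrapped proportions.
left = np.percentile(boot_props, 2.5)
right = np.percentile(boot_props, 97.5)

# Reject the null only if the hypothesized 0.8 falls outside the interval.
reject_null = not (left <= 0.8 <= right)
print(left, right, reject_null)
```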


Difficulty: ⭐️⭐️

The average score on this problem was 83%.


