Fall 2024 Quiz 4



This quiz was administered in-person. It was closed-book and closed-note; students were not allowed to use the DSC 10 Reference Sheet. Students had 20 minutes to work on the quiz.

This quiz covered Lectures 13, 15-18 of the Fall 2024 offering of DSC 10.


Problem 1

We plan to collect a sample of movies and use this sample to estimate the proportion of all movies with the genre "Musical", a population parameter.


Problem 1.1

If we want to create a 95% confidence interval that is at most 0.08 wide, which of the expressions below represents the smallest sample size we should collect?

Answer: \left(\dfrac{1}{0.04}\right)^2
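
One way to see where this comes from, assuming the CLT-based interval of the sample proportion plus or minus 2 standard deviations: the interval's width is 4 * sqrt(p(1 - p)) / sqrt(n), and sqrt(p(1 - p)) is never larger than 0.5. Requiring 4 * 0.5 / sqrt(n) <= 0.08 gives sqrt(n) >= 1 / 0.04. The sketch below checks this numerically; the variable names are chosen here for illustration and do not come from the quiz.

```python
import math

max_width = 0.08      # desired upper bound on the interval's width
sd_upper_bound = 0.5  # sqrt(p * (1 - p)) is largest when p = 0.5

# The width of a 95% CI is about 4 * sd / sqrt(n).
# Solving 4 * sd / sqrt(n) <= max_width for n gives the smallest sample size.
n_min = (4 * sd_upper_bound / max_width) ** 2

# Worst-case width (p = 0.5) of the interval at that sample size.
width_at_n_min = 4 * sd_upper_bound / math.sqrt(n_min)

print(n_min, width_at_n_min)
```

Both n_min and (1 / 0.04) ** 2 work out to 625.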


Difficulty: ⭐️⭐️⭐️⭐️

The average score on this problem was 41%.


Problem 1.2

Let W represent the maximum width of a 95% confidence interval obtained from a sample that is twice as big as the sample size you found in Problem 1.1. Which of the following is true?

Answer: 0.04 < W < 0.08
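
The reasoning, assuming the maximum width of the interval is 4 * 0.5 / sqrt(n): doubling n divides the maximum width by sqrt(2), not by 2, so W falls strictly between 0.04 and 0.08. A short sketch, with illustrative names:

```python
import math

n_original = (1 / 0.04) ** 2  # sample size from Problem 1.1
n_doubled = 2 * n_original

# Worst-case (p = 0.5) width of a 95% CI: 4 * 0.5 / sqrt(n).
W = 4 * 0.5 / math.sqrt(n_doubled)

print(round(W, 4))  # 0.08 / sqrt(2), roughly 0.0566
```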


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 56%.



Problem 2

Consider the following pair of hypotheses:

Let C_9 be the proportion of "Comedy" movies with a rating above 9 and let A_9 be the proportion of "Action" movies with a rating above 9. Which of the following are valid test statistics to test these hypotheses? Select all that apply.

Answer: C_9 - A_9 and A_9 - C_9


Difficulty: ⭐️⭐️

The average score on this problem was 82%.


Problem 3

The table below shows how many movies in each of four genres Jack and Eric have seen. What is the total variation distance (TVD) between Jack and Eric’s distribution of movies by genre?

        "Musical"  "Comedy"  "Action"  "Horror"  Total
Jack        2         14         2         2       20
Eric       55         15        15        15      100

Answer: 0.15
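
Because the two viewers have seen different numbers of movies, the counts must be converted to proportions before they are compared. The TVD is then half the sum of the absolute differences between the two distributions. The sketch below shows the general computation; the helper name and the counts are made up for illustration and are not the quiz's table.

```python
import numpy as np

def total_variation_distance(counts_a, counts_b):
    """TVD between two categorical distributions given as raw counts."""
    # Convert counts to proportions so each distribution sums to 1.
    dist_a = np.array(counts_a) / np.sum(counts_a)
    dist_b = np.array(counts_b) / np.sum(counts_b)
    # Half the sum of absolute differences between the proportions.
    return np.abs(dist_a - dist_b).sum() / 2

# Hypothetical counts across four categories (not the quiz's table).
viewer_a = [6, 2, 1, 1]      # total 10
viewer_b = [20, 10, 10, 10]  # total 50

tvd = total_variation_distance(viewer_a, viewer_b)
print(tvd)
```

Here the proportions are 0.6, 0.2, 0.1, 0.1 and 0.4, 0.2, 0.2, 0.2, so the absolute differences sum to 0.4 and the TVD is 0.2.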


Difficulty: ⭐️⭐️⭐️⭐️⭐️

The average score on this problem was 25%.


Problem 4

Suppose that in the movies DataFrame, the average rating of "Action" movies is 7.4 and the average rating of "Horror" movies is 7.2. Based on this data, we decide to test the following hypotheses:

We’ll use as our test statistic the mean rating of "Action" movies minus the mean rating of "Horror" movies.


Problem 4.1

Fill in the blanks so the code below generates 5000 simulated values of this test statistic and calculates the p-value of our test.

def one_stat(df):
    group_means = df.groupby("New").mean().get("Rating")
    return group_means.loc[__(a)__] - group_means.loc[__(b)__]

action_horror = movies[(movies.get("Genre") == "Action") | 
                       (movies.get("Genre") == "Horror")]
diffs = np.array([])
for i in np.arange(5000):
    new_df = action_horror.assign(New = __(c)__)
    diffs = np.append(diffs, __(d)__)

p_value = np.count_nonzero( __(e)__ ) / 5000

  • (a): "Action"
  • (b): "Horror"
  • (c): np.random.permutation(action_horror.get("Genre"))
  • (d): one_stat(new_df)
  • (e): diffs >= 0.2
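
The filled-in procedure can be tried end to end with a minimal sketch like the one below. It is not the quiz's code: it uses plain NumPy arrays instead of a DataFrame, the ratings are randomly generated stand-in data, and the seed and group sizes are arbitrary. It also assumes the alternative hypothesis is that "Action" movies are rated higher, which is what the comparison diffs >= 0.2 in blank (e) suggests.

```python
import numpy as np

rng = np.random.default_rng(10)  # arbitrary seed so the sketch is repeatable

# Made-up stand-in data: 30 "Action" and 30 "Horror" ratings.
genres = np.array(["Action"] * 30 + ["Horror"] * 30)
ratings = rng.normal(loc=7.3, scale=1.0, size=60)

def one_stat(labels, values):
    """Mean of values labeled "Action" minus mean of values labeled "Horror"."""
    return values[labels == "Action"].mean() - values[labels == "Horror"].mean()

# Observed test statistic (plays the role of 7.4 - 7.2 = 0.2 in the quiz).
observed = one_stat(genres, ratings)

repetitions = 5000
diffs = np.array([])
for i in np.arange(repetitions):
    # Shuffling the labels breaks any link between genre and rating,
    # which simulates the null hypothesis.
    shuffled = rng.permutation(genres)
    diffs = np.append(diffs, one_stat(shuffled, ratings))

# Proportion of simulated statistics at least as large as the observed one.
p_value = np.count_nonzero(diffs >= observed) / repetitions
print(p_value)
```

Shuffling the labels rather than the ratings is a free choice; either one breaks the pairing between genre and rating.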

Difficulty: ⭐️⭐️⭐️

The average score on this problem was 74%.



Problem 4.2

Suppose that p_value evaluates to 0.14. Using the standard p-value cutoff of 0.05, which of the two hypotheses is better supported by the data?

Answer: Null


Difficulty: ⭐️⭐️

The average score on this problem was 84%.



Problem 4.3

What kind of hypothesis test did we perform in this question?

Answer: Permutation test


Difficulty: ⭐️⭐️

The average score on this problem was 84%.


