This quiz was administered in-person. It was closed-book and
closed-note; students were not allowed to use the DSC
10 Reference Sheet. Students had 20 minutes to work on
the quiz.
This quiz covered Lectures 13, 15-18 of the Fall 2024 offering of
DSC 10.
We plan to collect a sample of movies and use this sample to estimate the proportion of all movies with the genre "Musical", a population parameter.
If we want to create a 95% confidence interval that is at most 0.08 wide, which of the expressions below represents the smallest sample size we should collect?
\left(\dfrac{1}{0.02}\right)^2
\left(\dfrac{1}{0.04}\right)^2
\left(\dfrac{1}{0.08}\right)^2
\left(\dfrac{1}{0.16}\right)^2
Answer: \left(\dfrac{1}{0.04}\right)^2
The average score on this problem was 41%.
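One way to see the answer: by the Central Limit Theorem, a 95% confidence interval for a proportion is roughly the sample proportion ± 2 standard deviations of the sample mean, so its width is about 4 · SD / √n. A column of 0s and 1s has an SD of at most 0.5, so the width is at most 2 / √n. Requiring 2 / √n ≤ 0.08 gives √n ≥ 25, i.e. n ≥ 625 = (1/0.04)^2. The sketch below checks this arithmetic. It assumes the ±2 SD approximation and the worst-case SD of 0.5, and the helper names max_ci_width and min_sample_size are our own.

```python
import math

def max_ci_width(n, sd=0.5):
    """Approximate maximum width of a 95% CI for a proportion.

    Assumes the interval is the sample proportion +/- 2 standard deviations
    of the sample mean, so the width is 4 * sd / sqrt(n). sd = 0.5 is the
    largest possible SD of a column of 0s and 1s.
    """
    return 4 * sd / math.sqrt(n)

def min_sample_size(max_width, sd=0.5):
    """Smallest integer n with 4 * sd / sqrt(n) <= max_width."""
    # Solve 4 * sd / sqrt(n) <= max_width for n, then round up.
    return math.ceil((4 * sd / max_width) ** 2)

n = min_sample_size(0.08)
print(n)                 # 625
print((1 / 0.04) ** 2)   # 625.0
print(max_ci_width(n))   # 0.08
```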
Let W represent the maximum width of a 95% confidence interval obtained from a sample that is twice as big as the sample size you found in part (a). Which of the following is true?
0 < W < 0.04
W = 0.04
0.04 < W < 0.08
W \geq 0.08
Answer: 0.04 < W < 0.08
The average score on this problem was 56%.
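Because the maximum width is about 2 / √n, doubling the sample size shrinks it by a factor of √2, not 2. With n = 2 · 625 = 1250, W ≈ 2 / √1250 ≈ 0.057, which lies strictly between 0.04 and 0.08. The sketch below redefines the same hypothetical helper (under the same ±2 SD, worst-case SD 0.5 assumptions) so that it runs on its own.

```python
import math

def max_ci_width(n, sd=0.5):
    # Approximate maximum width of a 95% CI for a proportion:
    # 4 * sd / sqrt(n), with sd = 0.5 as the worst case.
    return 4 * sd / math.sqrt(n)

n = 625                   # sample size from part (a)
w = max_ci_width(2 * n)   # maximum width with twice as many observations
print(round(w, 4))        # 0.0566
print(0.04 < w < 0.08)    # True
```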
Consider the following pair of hypotheses:
Null: The proportion of "Comedy" movies with an average rating above 9 equals the proportion of "Action" movies with a rating above 9.
Alternative: The proportion of "Comedy" movies with an average rating above 9 is greater than the proportion of "Action" movies with a rating above 9.
Let C_9 be the proportion of "Comedy" movies with a rating above 9 and let A_9 be the proportion of "Action" movies with a rating above 9. Which of the following are valid test statistics to test these hypotheses?
Select all that apply.
C_9 - 0.5
A_9 - 0.5
A_9 \cdot C_9
A_9 + C_9
C_9 - A_9
A_9 - C_9
|A_9 - C_9|
|C_9 - A_9|
Answer: C_9 - A_9 and A_9 - C_9
The average score on this problem was 82%.
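A valid statistic here must compare the two groups and keep track of direction, because the alternative is one-sided (C_9 greater than A_9). Large values of C_9 - A_9, or equivalently small values of A_9 - C_9, point toward the alternative. By contrast, C_9 - 0.5 and A_9 - 0.5 each ignore one group, A_9 \cdot C_9 and A_9 + C_9 do not measure a difference, and the absolute values cannot tell which proportion is bigger. The sketch below illustrates that last point with hypothetical proportions; the function names are our own.

```python
def stat_diff(c9, a9):
    # Large positive values point toward the alternative (C_9 > A_9).
    return c9 - a9

def stat_abs(c9, a9):
    # Only measures how far apart the proportions are, not which is bigger.
    return abs(c9 - a9)

# Hypothetical samples: one favors the alternative, one goes the other way.
favors_alt = (0.30, 0.10)   # C_9 = 0.30, A_9 = 0.10
opposite = (0.10, 0.30)     # C_9 = 0.10, A_9 = 0.30

print(round(stat_diff(*favors_alt), 2), round(stat_diff(*opposite), 2))  # 0.2 -0.2
print(round(stat_abs(*favors_alt), 2), round(stat_abs(*opposite), 2))    # 0.2 0.2
```

The absolute-value statistic gives the same value for both samples, so it cannot separate evidence for the alternative from evidence against it.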
The table below shows how many movies in each of four genres Jack and Eric have seen. What is the total variation distance (TVD) between Jack and Eric’s distribution of movies by genre?
| | "Musical" | "Comedy" | "Action" | "Horror" | Total |
|---|---|---|---|---|---|
| Jack | 2 | 14 | 2 | 2 | 20 |
| Eric | 55 | 15 | 15 | 15 | 100 |
Answer: 0.15
The average score on this problem was 25%.
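To compute a TVD, first convert each row of counts into proportions (Jack has seen 20 movies and Eric 100, so raw counts are not comparable), then take half the sum of the absolute differences across categories. A minimal sketch; the function name tvd_from_counts is hypothetical, and the example counts are made up rather than taken from the table.

```python
def tvd_from_counts(counts_a, counts_b):
    """Total variation distance between two categorical distributions.

    Each argument is a list of counts over the same categories, in the same
    order. Counts are converted to proportions before comparing.
    """
    total_a = sum(counts_a)
    total_b = sum(counts_b)
    props_a = [c / total_a for c in counts_a]
    props_b = [c / total_b for c in counts_b]
    # TVD is half the sum of the absolute differences in proportions.
    return sum(abs(pa - pb) for pa, pb in zip(props_a, props_b)) / 2

# Hypothetical counts over two categories.
print(tvd_from_counts([1, 1], [1, 3]))  # 0.25
```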
Suppose that in movies, the average rating of "Action" movies is 7.4 and the average rating of "Horror" movies is 7.2. Based on this data, we decide to test the following hypotheses:
Null: The ratings of "Action" and "Horror" movies come from the same distribution.
Alternative: On average, "Action" movies have a higher rating than "Horror" movies.
We’ll use as our test statistic the mean rating of "Action" movies minus the mean rating of "Horror" movies.
Fill in the blanks so the code below generates 5000 simulated values of this test statistic and calculates the p-value of our test.
```python
def one_stat(df):
    group_means = df.groupby("New").mean().get("Rating")
    return group_means.loc[__(a)__] - group_means.loc[__(b)__]

action_horror = movies[(movies.get("Genre") == "Action") |
                       (movies.get("Genre") == "Horror")]

diffs = np.array([])
for i in np.arange(5000):
    new_df = action_horror.assign(New = __(c)__)
    diffs = np.append(diffs, __(d)__)

p_value = np.count_nonzero( __(e)__ ) / 5000
```
Answer:
(a): "Action"
(b): "Horror"
(c): np.random.permutation(action_horror.get("Genre"))
(d): one_stat(new_df)
(e): diffs >= 0.2
The average score on this problem was 74%.
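The same simulation can be sketched without babypandas. This is an illustration under our own assumptions, not the course's code: it stores ratings and genres in plain Python lists, shuffles the genre labels with random.Random.sample instead of np.random.permutation, and uses a small hypothetical dataset whose group means happen to be 7.4 and 7.2. As in blank (e), it counts simulated differences at least as large as the observed statistic (about 0.2).

```python
import random
import statistics

def one_stat(ratings, genres):
    """Mean "Action" rating minus mean "Horror" rating."""
    action = [r for r, g in zip(ratings, genres) if g == "Action"]
    horror = [r for r, g in zip(ratings, genres) if g == "Horror"]
    return statistics.mean(action) - statistics.mean(horror)

def permutation_p_value(ratings, genres, repetitions=5000, seed=None):
    """Return (observed statistic, simulated p-value)."""
    rng = random.Random(seed)
    observed = one_stat(ratings, genres)
    count = 0
    for _ in range(repetitions):
        # Under the null, genre labels are interchangeable, so shuffle
        # them while keeping each rating in place.
        shuffled = rng.sample(genres, len(genres))
        # Large differences favor the alternative ("Action" rated higher).
        if one_stat(ratings, shuffled) >= observed:
            count += 1
    return observed, count / repetitions

# Hypothetical ratings chosen so the group means are 7.4 and 7.2.
ratings = [7.0, 7.4, 7.8, 6.8, 7.2, 7.6]
genres = ["Action"] * 3 + ["Horror"] * 3

observed, p = permutation_p_value(ratings, genres, seed=42)
print(round(observed, 2))  # 0.2
print(p)
```

Comparing against the computed observed value, rather than a typed-in 0.2, avoids small floating-point discrepancies in the observed difference.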
Suppose that p_value
evaluates to 0.14. Using the standard p-value cutoff of
0.05, which of the two hypotheses is
better supported by the data?
Null
Alternative
Answer: Null
The average score on this problem was 84%.
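The decision rule compares the p-value to the cutoff: a p-value below the cutoff means the data favor the alternative; otherwise the data are consistent with the null. Since 0.14 is above 0.05, the null is better supported. A minimal sketch, with a hypothetical function name:

```python
def better_supported(p_value, cutoff=0.05):
    # Below the cutoff, the observed statistic would be unusual under the
    # null, so the data favor the alternative; otherwise, the null.
    return "Alternative" if p_value < cutoff else "Null"

print(better_supported(0.14))  # Null
```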
What kind of hypothesis test did we perform in this question?
Standard hypothesis test
Permutation test
Answer: Permutation test
The average score on this problem was 84%.