This quiz was administered in-person. It was closed-book and
closed-note; students were not allowed to use the DSC
10 Reference Sheet. Students had 20 minutes to work on
the quiz.
This quiz covered Lectures 13-17 of the Winter 2024 offering
of DSC 10.
Note (groupby / pandas 2.0): Pandas 2.0+ no longer
silently drops columns that can’t be aggregated after a
groupby, so code written for older pandas may behave
differently or raise errors. In these practice materials we use
.get() to select the column(s) we want after
.groupby(...).mean() (or other aggregations) so that our
solutions run on current pandas. On real exams you will not be penalized
for omitting .get() when the old behavior would have
produced the same answer.
Suppose we’ve imported the scipy module. To the nearest
0.5, what does the following expression evaluate to?
scipy.stats.norm.cdf(-2) * 100
Answer: 2.5
The average score on this problem was 57%.
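The answer follows from the 68-95-99.7 rule: about 95% of a normal distribution's area lies within 2 standard deviations of the mean, leaving about 2.5% in each tail. As a rough check, the value can also be computed directly. The explicit `from scipy import stats` import and the names `exact` and `nearest_half` are assumptions made for this sketch; the question itself only says that scipy has been imported.

```python
from scipy import stats

# Percentage of a standard normal distribution's area that lies below z = -2.
exact = stats.norm.cdf(-2) * 100

# Round to the nearest 0.5 by doubling, rounding to an integer, and halving.
nearest_half = round(exact * 2) / 2

print(exact)         # a little under 2.3
print(nearest_half)  # 2.5
```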
Select all the true statements below.
The average of the deviations from the mean is a meaningful measure of the spread of the data.
It is possible for the standard deviation of a dataset to equal zero.
It is possible for the standard deviation of a dataset to be negative.
Given the standard deviation of a dataset, we can determine its mean.
Given the standard deviation of a dataset, we can determine its variance.
Answer: Option 2 and Option 5
The average score on this problem was 80%.
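The sketch below illustrates the reasoning behind each option on a small made-up array (the values and variable names are illustrative only, not part of the quiz). It shows that deviations from the mean average to zero, that a constant dataset has a standard deviation of zero, that two datasets can share a standard deviation while having different means, and that the variance is the square of the standard deviation.

```python
import numpy as np

# Made-up values, chosen only for illustration.
data = np.array([2.0, 4.0, 4.0, 10.0])

# Option 1: deviations from the mean cancel out, so their average is 0
# and says nothing about spread.
deviations = data - data.mean()
avg_deviation = deviations.mean()

# Option 2: a dataset whose values are all equal has a standard deviation of 0.
constant_sd = np.std(np.array([7, 7, 7]))

# Option 3: the SD is the square root of an average of squared deviations,
# so it can never be negative.
sd = np.std(data)

# Option 4: shifting every value changes the mean but not the SD,
# so the SD alone cannot determine the mean.
shifted = data + 100

# Option 5: the variance is the square of the SD.
variance_from_sd = sd ** 2

print(avg_deviation, constant_sd, sd, np.std(shifted), variance_from_sd)
```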
The Oscars, or Academy Awards, are the highest awards in the film
industry, awarded each year to the best movies of that year. The
oscars DataFrame contains a row for each movie that has
ever been nominated for an Oscar. The "name" column
contains the name of the movie and the "rating" column
contains a rating of the movie on a 0 to 100 scale. This number
incorporates many factors, but we won’t worry about how it is
computed.

Fill in the blanks below to collect a simple random
sample of 400 movies from the oscars DataFrame,
then calculate 10,000 bootstrapped sample mean ratings.
my_sample = __(x)__
n_resamples = 10000
boot_means = np.array([])
for i in np.arange(n_resamples):
    resample = __(y)__
    mean = __(z)__
    boot_means = np.append(boot_means, mean)
Answer (x): oscars.sample(400)
The average score on this problem was 85%.
Answer (y): my_sample.sample(400, replace=True)
The average score on this problem was 87%.
Answer (z): resample.get("rating").mean()
The average score on this problem was 96%.
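Putting the three answers together gives a complete bootstrap loop. The sketch below is one way to run it end to end. It assumes a made-up stand-in for the oscars DataFrame (5,000 hypothetical movies with random ratings), uses plain pandas, and runs only 1,000 resamples to keep the runtime short. None of these choices come from the quiz.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for oscars: made-up names and random ratings from 0 to 100.
rng = np.random.default_rng(10)
oscars = pd.DataFrame({
    "name": ["Movie " + str(i) for i in range(5000)],
    "rating": rng.integers(0, 101, size=5000),
})

# (x) Simple random sample: 400 rows drawn without replacement (the default).
my_sample = oscars.sample(400)

n_resamples = 1000  # the quiz uses 10,000; reduced here for speed
boot_means = np.array([])
for i in np.arange(n_resamples):
    # (y) Resample from the sample, with replacement, at the same size.
    resample = my_sample.sample(400, replace=True)
    # (z) The statistic of interest: the mean rating of the resample.
    mean = resample.get("rating").mean()
    boot_means = np.append(boot_means, mean)

print(len(boot_means))
```

Note that the resample is drawn from my_sample rather than from oscars, since bootstrapping treats the original sample as a proxy for the population.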
In each blank, circle the word that correctly fills in the sentence.
A histogram of boot_means shows a(n) (probability / empirical) distribution of a (statistic / parameter).
Answer: empirical, statistic
The average score on this problem was 77%.
Suppose we use the array boot_means to calculate a 90%
confidence interval for the mean rating of Oscar-nominated movies.
Select all correct conclusions we can draw about this
interval.
There is a 90% chance that the true mean rating of all Oscar-nominated movies falls within this interval.
The sample mean rating is within 90% of the true mean rating of all Oscar-nominated movies.
If we looked at the ratings of many Oscar-nominated movies, about 90% of them would fall within this range.
None of the above.
Answer: None of the above.
The average score on this problem was 74%.
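The quiz does not show how the interval is computed from boot_means. A common approach, assumed here, is to take the middle 90% of the bootstrapped means with np.percentile. The sketch below uses made-up ratings and a NumPy-only resampling loop (both assumptions for illustration). It also checks what fraction of individual ratings fall inside the interval, which illustrates why the third option is incorrect: the interval estimates the mean and does not describe where individual ratings lie.

```python
import numpy as np

# Made-up sample of 400 ratings on a 0 to 100 scale (illustrative only).
rng = np.random.default_rng(17)
sample_ratings = rng.integers(0, 101, size=400)

# Bootstrap the sample mean (1,000 resamples to keep the runtime short).
boot_means = np.array([])
for i in np.arange(1000):
    resample = rng.choice(sample_ratings, size=400, replace=True)
    boot_means = np.append(boot_means, resample.mean())

# A 90% interval keeps the middle 90%: cut 5% from each tail.
left = np.percentile(boot_means, 5)
right = np.percentile(boot_means, 95)

# Fraction of individual ratings that land inside the interval.
inside = np.mean((sample_ratings >= left) & (sample_ratings <= right))

print(left, right, inside)
```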
Suppose both of the following expressions evaluate to
True.
my_sample.get("rating").mean() == 61.25
np.std(my_sample.get("rating")) == 15
What are the left and right endpoints of a 95% CLT-based confidence interval for the mean rating of Oscar-nominated movies?
Answer: left endpoint: 59.75, right endpoint: 62.75
The average score on this problem was 54%.
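The arithmetic behind this answer can be sketched as follows. By the Central Limit Theorem, the distribution of the sample mean is roughly normal with standard deviation (sample SD) / sqrt(sample size), and a 95% interval extends about 2 of those standard deviations on either side of the sample mean. The sketch assumes the sample size of 400 from the earlier part and uses 2 rather than 1.96, which matches the endpoints in the answer; the variable names are illustrative.

```python
import numpy as np

sample_mean = 61.25
sample_sd = 15
n = 400  # size of my_sample, from the earlier part of the problem

# SD of the distribution of the sample mean, using the sample SD
# as an estimate of the population SD.
sd_of_sample_mean = sample_sd / np.sqrt(n)  # 15 / 20 = 0.75

# About 95% of a normal distribution lies within 2 SDs of its center.
left = sample_mean - 2 * sd_of_sample_mean
right = sample_mean + 2 * sd_of_sample_mean

print(left, right)  # 59.75 62.75
```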