← return to practice.dsc10.com
This quiz was administered in-person. It was closed-book and
closed-note; students were not allowed to use the DSC
10 Reference Sheet. Students had 20 minutes to work on
the quiz.
This quiz covered Lectures 21-24 of the Winter 2024 offering
of DSC 10.
It can be hard to find a parking spot on UCSD’s campus! The
parking
DataFrame contains UCSD parking occupancy data for
two on-campus parking structures. The "Structure"
column
contains either "Gilman"
or "Hopkins"
. Each
row of parking
represents one day. The
"Occupancy"
column contains a float representing the
proportion of occupied spaces at noon on that day. We’ll use this data
to test the following hypotheses:
Null Hypothesis: At noon, Gilman and Hopkins are equally occupied. The observed differences in our samples are simply due to random chance.
Alternative Hypothesis: At noon, Hopkins is less occupied than Gilman. The observed differences in our samples cannot be explained by random chance alone.
As our test statistic, we will use the mean noontime
"Occupancy"
of Hopkins minus the mean noontime
"Occupancy"
of Gilman.
Suppose the Series s
is defined as below. Write an
expression involving s
that evaluates to the observed value
of the test statistic, and store the result in
observed
.
s = parking.groupby("Structure").mean().get("Occupancy")
observed = ______
What goes in the blank?
Answer:
s.loc["Hopkins"] - s.loc["Gilman"]
The average score on this problem was 61%.
In running the permutation test, we need to do a simulation that runs
many times, using a for
-loop. What should be the first
thing we do inside the for
-loop?
Initialize an empty array to store our results.
Define a variable for the number of repetitions.
Permute one of the columns of parking
.
Calculate the difference in group means.
Answer: Permute one of the columns of
parking
.
The average score on this problem was 79%.
Suppose we store 5000 simulated test statistics in the array
differences
. Choose the appropriate symbol to fill in the
calculation of the p-value below.
p_value = np.count_nonzero(differences ______ observed) / 5000
<
<=
==
!=
>
>=
Answer: <=
The average score on this problem was 65%.
Suppose the p-value of our test is 0.01 and we are testing the hypotheses at the 0.05 significance level. Which hypothesis is better supported by the data?
Null
Alternative
Answer: Alternative
The average score on this problem was 93%.
UCSD’s parking lots include A spaces for faculty, B spaces for staff, and S spaces for students.
The scatter plot on the left shows the relationship between the number of A spaces and the total number of parking spaces in each region of campus. Similarly, the scatter plot on the right shows the relationship between the number of B spaces and the total number of parking spaces in each region of campus. Note that these numbers represent a count of parking spaces existing on campus, and have nothing to do with occupancy.
Based on these scatter plots, which pair of variables has a larger correlation coefficient?
A Spaces and Total Spaces
B Spaces and Total Spaces
Answer: B Spaces and Total Spaces
The average score on this problem was 86%.
Which of the following variables would most likely be negatively associated with the total number of spaces in a campus region?
Total area of the campus region
Number of athletic fields in the campus region
Number of residents in the campus region
Number of electric vehicle charging stations in the campus region
Answer: Number of athletic fields in the campus region
The average score on this problem was 73%.
The number of S spaces and the total number of parking spaces in each campus region are linearly related with a correlation coefficient of 0.6. The number of S spaces in the Warren College region of campus is 5 standard deviations below average. What does the regression line predict for the number of total spaces in Warren College, measured in standard units?
Answer: -3
The average score on this problem was 64%.
True or False: Since number of parking spaces is always a positive integer, the regression line that predicts the total number of spaces from the number of S spaces must have a positive y-intercept.
True
False
Answer: False
The average score on this problem was 64%.