Winter 2024 Quiz 6



This quiz was administered in-person. It was closed-book and closed-note; students were not allowed to use the DSC 10 Reference Sheet. Students had 20 minutes to work on the quiz.

This quiz covered Lectures 21-24 of the Winter 2024 offering of DSC 10.


It can be hard to find a parking spot on UCSD’s campus! The parking DataFrame contains UCSD parking occupancy data for two on-campus parking structures. The "Structure" column contains either "Gilman" or "Hopkins". Each row of parking represents one day. The "Occupancy" column contains a float representing the proportion of occupied spaces at noon on that day. We’ll use this data to test the following hypotheses:

As our test statistic, we will use the mean noontime "Occupancy" of Hopkins minus the mean noontime "Occupancy" of Gilman.


Problem 1


Problem 1.1

Suppose the Series s is defined as below. Write an expression involving s that evaluates to the observed value of the test statistic, and store the result in observed.

    s = parking.groupby("Structure").mean().get("Occupancy")
    observed = ______

What goes in the blank?

Answer: s.loc["Hopkins"] - s.loc["Gilman"]
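To see why this works, here is a minimal sketch using plain pandas as an assumed stand-in for the course's babypandas. The toy parking DataFrame and its values are invented for illustration; they are not the real data. After grouping, s is indexed by structure name, so .loc looks up each structure's mean.

```python
import pandas as pd

# Hypothetical toy data; the real parking DataFrame is not shown in the quiz.
parking = pd.DataFrame({
    "Structure": ["Gilman", "Gilman", "Hopkins", "Hopkins"],
    "Occupancy": [0.90, 0.80, 0.70, 0.60],
})

# One entry per structure; the index holds "Gilman" and "Hopkins".
s = parking.groupby("Structure").mean().get("Occupancy")

# Mean Hopkins occupancy minus mean Gilman occupancy.
observed = s.loc["Hopkins"] - s.loc["Gilman"]
print(observed)
```

The order of subtraction matters: it has to match the test statistic as defined (Hopkins minus Gilman).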


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 61%.


Problem 1.2

In running the permutation test, we need to do a simulation that runs many times, using a for-loop. What should be the first thing we do inside the for-loop?

Answer: Permute one of the columns of parking.
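One way the loop could look is sketched below. It uses plain pandas and NumPy as assumed stand-ins for the course libraries, invented toy data, and hypothetical names (rng, shuffled, Shuffled); it is an illustration, not the quiz's reference solution.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real parking DataFrame.
parking = pd.DataFrame({
    "Structure": ["Gilman"] * 4 + ["Hopkins"] * 4,
    "Occupancy": [0.90, 0.85, 0.80, 0.95, 0.70, 0.65, 0.75, 0.60],
})

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
differences = np.array([])

for _ in range(1000):
    # First step: permute one column, which breaks any link
    # between structure and occupancy.
    shuffled = parking.assign(
        Shuffled=rng.permutation(parking.get("Occupancy").to_numpy())
    )
    # Then recompute the test statistic on the shuffled data and store it.
    means = shuffled.groupby("Structure").mean().get("Shuffled")
    stat = means.loc["Hopkins"] - means.loc["Gilman"]
    differences = np.append(differences, stat)
```

Shuffling either column gives a valid permutation; the sketch shuffles "Occupancy" and leaves "Structure" in place.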


Difficulty: ⭐️⭐️

The average score on this problem was 79%.


Problem 1.3

Suppose we store 5000 simulated test statistics in the array differences. Choose the appropriate symbol to fill in the calculation of the p-value below.

    p_value = np.count_nonzero(differences ______ observed) / 5000

Answer: <=
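The answer <= means that simulated statistics at or below the observed value count toward the p-value. The sketch below shows the mechanics on a small array; the ten values and the observed statistic are invented for illustration, and it divides by the array's length rather than a hard-coded 5000.

```python
import numpy as np

# Hypothetical simulated statistics and observed value, for illustration only.
differences = np.array(
    [-0.30, -0.10, 0.00, 0.05, 0.20, -0.25, 0.10, -0.05, 0.15, -0.20]
)
observed = -0.20

# Comparing an array to a number gives an array of booleans;
# np.count_nonzero counts how many of them are True.
p_value = np.count_nonzero(differences <= observed) / len(differences)
print(p_value)
```

Here three of the ten values (-0.30, -0.25, and -0.20) are at or below the observed statistic.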


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 65%.


Problem 1.4

Suppose the p-value of our test is 0.01 and we are testing the hypotheses at the 0.05 significance level. Which hypothesis is better supported by the data?

Answer: Alternative


Difficulty: ⭐️

The average score on this problem was 93%.



Problem 2

UCSD’s parking lots include A spaces for faculty, B spaces for staff, and S spaces for students.

The scatter plot on the left shows the relationship between the number of A spaces and the total number of parking spaces in each region of campus. Similarly, the scatter plot on the right shows the relationship between the number of B spaces and the total number of parking spaces in each region of campus. Note that these numbers represent a count of parking spaces existing on campus, and have nothing to do with occupancy.


Problem 2.1

Based on these scatter plots, which pair of variables has a larger correlation coefficient?

Answer: B Spaces and Total Spaces


Difficulty: ⭐️⭐️

The average score on this problem was 86%.


Problem 2.2

Which of the following variables would most likely be negatively associated with the total number of spaces in a campus region?

Answer: Number of athletic fields in the campus region


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 73%.


Problem 2.3

The number of S spaces and the total number of parking spaces in each campus region are linearly related with a correlation coefficient of 0.6. The number of S spaces in the Warren College region of campus is 5 standard deviations below average. What does the regression line predict for the number of total spaces in Warren College, measured in standard units?

Answer: -3
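When both variables are in standard units, the regression line has slope r and intercept 0, so the prediction is r times the standardized input. A minimal sketch of that arithmetic, with hypothetical variable names:

```python
r = 0.6      # correlation coefficient given in the problem
x_su = -5    # S spaces in Warren College, in standard units

# In standard units, the regression line is: predicted y = r * x.
predicted_y_su = r * x_su
print(predicted_y_su)
```

The prediction is closer to average than the input, which is the regression effect.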


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 64%.


Problem 2.4

True or False: Since the number of parking spaces is always a positive integer, the regression line that predicts the total number of spaces from the number of S spaces must have a positive y-intercept.

Answer: False
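A counterexample makes this concrete. The sketch below uses invented counts (not real campus data) in which every value is a positive integer, then computes the slope and intercept from r and the standard deviations. The helper standard_units is defined here for illustration.

```python
import numpy as np

# Hypothetical counts, chosen only to make the point; not real campus data.
s_spaces = np.array([10, 20, 30])
total_spaces = np.array([20, 60, 100])

def standard_units(arr):
    """Convert an array to standard units."""
    return (arr - arr.mean()) / np.std(arr)

# Correlation coefficient: mean of the product in standard units.
r = (standard_units(s_spaces) * standard_units(total_spaces)).mean()

# Regression line in original units.
slope = r * np.std(total_spaces) / np.std(s_spaces)
intercept = total_spaces.mean() - slope * s_spaces.mean()
print(slope, intercept)
```

These points lie exactly on the line total = 4 * S - 20, so the intercept is negative even though every data value is positive. The intercept is only the line's value at S = 0, which may lie outside the range of the data.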


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 64%.


