# Spring 2024 Quiz 3

This quiz was administered in person. It was closed-book and closed-note; students were not allowed to use the DSC 10 Reference Sheet. Students had 20 minutes to work on the quiz.

This quiz covered Lectures 13, 15, and 16 of the Spring 2024 offering of DSC 10.

## Problem 1

Which of the following statements are true in general? Select all that apply.

• Parameters are fixed, but statistics can change depending on the sample.

• Parameters and statistics can both fluctuate depending on the sample.

• For simple random samples, statistics give better estimates of parameters when the sample size is larger.

• The distribution of a statistic is the same regardless of the sample size.

• None of the above.

Options 1 and 3
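
One way to see why Options 1 and 3 hold is a small simulation. The sketch below uses an invented population of meal prices (not data from the quiz) and Python's standard library rather than babypandas, so it runs on its own. The function name spread_of_sample_means is hypothetical.

```python
import random
import statistics

random.seed(10)

# Hypothetical population of 10,000 meal prices; these values are invented
# for illustration and are not part of the quiz.
population = [random.uniform(10, 60) for _ in range(10_000)]

# The parameter: computed once from the whole population, so it never changes.
population_mean = statistics.mean(population)

def spread_of_sample_means(sample_size, repetitions=500):
    """Return the standard deviation of many sample means at one sample size."""
    sample_means = []
    for _ in range(repetitions):
        # random.sample draws without replacement: a simple random sample.
        sample = random.sample(population, sample_size)
        sample_means.append(statistics.mean(sample))
    return statistics.stdev(sample_means)

spread_small = spread_of_sample_means(25)
spread_large = spread_of_sample_means(400)

print("Population mean (fixed):", round(population_mean, 2))
print("Spread of sample means, n = 25: ", round(spread_small, 2))
print("Spread of sample means, n = 400:", round(spread_large, 2))
```

The sample mean differs from one sample to the next while the population mean stays put, and the spread of the sample means shrinks as the sample size grows. The second point also shows that the distribution of a statistic depends on the sample size.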

##### Difficulty: ⭐️

The average score on this problem was 90%.

## Problem 2

The DataFrame restaurants contains information about a sample of restaurants in San Diego County. We have each restaurant’s "name" (str), "rating" (int), average "meal_price" (float), and type of "cuisine" (str), such as "Thai" or "Italian".

### Problem 2.1

You are interested in estimating the average "meal_price" across all Italian restaurants in San Diego County using only the data in restaurants. Fill in the following code so that italian_means evaluates to an array of 1000 bootstrapped estimates for this parameter.

    def bootstrap_means(data, n_samples):
        means = np.array([])
        for i in range(n_samples):
            resample = data.sample(__(a)__, replace = __(b)__)
            means = np.append(means, __(c)__)
        return means

    italian_restaurants = __(d)__
    italian_means = bootstrap_means(italian_restaurants, __(e)__)

(a): data.shape[0]
(b): True
(c): resample.get("meal_price").mean()
(d): restaurants[restaurants.get("cuisine") == "Italian"]
(e): 1000
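
For reference, here is a self-contained sketch of the same resampling idea using only Python's standard library, so it runs without babypandas. The list italian_prices is made-up data, and bootstrap_means_plain is a hypothetical stand-in for the quiz's bootstrap_means.

```python
import random
import statistics

random.seed(10)

# Made-up meal prices standing in for the Italian restaurants in the sample.
italian_prices = [18.5, 22.0, 25.0, 27.5, 30.0, 31.0, 34.5, 36.0, 41.0, 45.0]

def bootstrap_means_plain(prices, n_samples):
    """Return n_samples bootstrapped means of prices."""
    means = []
    for _ in range(n_samples):
        # Resample with replacement, using the same size as the original
        # sample; this mirrors data.sample(data.shape[0], replace = True).
        resample = random.choices(prices, k=len(prices))
        means.append(statistics.mean(resample))
    return means

italian_means_plain = bootstrap_means_plain(italian_prices, 1000)
print(len(italian_means_plain))
```

Resampling with the original sample size matters because the variability of the mean depends on how many values go into it. Resampling with replacement matters because, without it, every resample would be the original sample in a different order.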

##### Difficulty: ⭐️⭐️⭐️

The average score on this problem was 73%.

### Problem 2.2

Next, fill in the blanks below so that italian_CI evaluates to an 88% bootstrapped confidence interval for the average "meal_price" across all Italian restaurants in San Diego County.

    lower_bound = np.percentile(italian_means, __(a)__)
    upper_bound = np.percentile(italian_means, __(b)__)
    italian_CI = [lower_bound, upper_bound]

(a): 6
(b): 94
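
These percentiles come from splitting the excluded 12% evenly between the two tails of the bootstrapped distribution. A minimal sketch of that arithmetic, where ci_percentiles is a hypothetical helper and not part of the quiz:

```python
def ci_percentiles(confidence_level):
    """Return the (lower, upper) percentiles for a centered interval.

    confidence_level is a percentage, such as 88 or 95.
    """
    tail = (100 - confidence_level) / 2  # percentage left out on each side
    return tail, 100 - tail

print(ci_percentiles(88))  # (6.0, 94.0)
```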

##### Difficulty: ⭐️⭐️

The average score on this problem was 83%.

### Problem 2.3

Suppose italian_CI evaluates to [25, 35]. Which of the following statements are correct? Select all that apply.

• If we randomly selected 1000 Italian restaurants from the population of Italian restaurants in San Diego County, about 880 of them will have an average "meal_price" between $25 and $35.

• There is an 88% chance that the average "meal_price" of Italian restaurants in San Diego County falls between $25 and $35.

• 88% of all Italian restaurants have an average "meal_price" between $25 and $35.

• None of the above.

Option 4: None of the above.
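
None of the three statements describes what the 88% refers to. The first and third are about individual restaurants, and the second treats the fixed population mean as if it were random. The confidence level describes the procedure instead: across many samples, roughly 88% of the intervals built this way would contain the population mean. The simulation below sketches that idea with an invented population and the standard library. The helper bootstrap_ci and all of the numbers are assumptions for illustration.

```python
import random
import statistics

random.seed(10)

# Invented population of meal prices; its mean is the fixed parameter.
population = [random.gauss(30, 8) for _ in range(5_000)]
population_mean = statistics.fmean(population)

def bootstrap_ci(sample, n_resamples=300):
    """Return an 88% percentile bootstrap interval for the mean of sample."""
    means = []
    for _ in range(n_resamples):
        resample = random.choices(sample, k=len(sample))
        means.append(statistics.fmean(resample))
    # quantiles(..., n=100) returns the 1st through 99th percentiles,
    # so index 5 is the 6th percentile and index 93 is the 94th.
    cuts = statistics.quantiles(means, n=100, method="inclusive")
    return cuts[5], cuts[93]

n_intervals = 200
covered = 0
for _ in range(n_intervals):
    # Each repetition draws a fresh sample and builds a new interval.
    sample = random.sample(population, 50)
    lower, upper = bootstrap_ci(sample)
    if lower <= population_mean <= upper:
        covered += 1

coverage = covered / n_intervals
print("Fraction of intervals containing the population mean:", coverage)
```

Any single interval, such as [25, 35], either contains the population mean or does not. The 88% is a statement about how often the interval-building process succeeds.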

##### Difficulty: ⭐️⭐️⭐️

The average score on this problem was 64%.

## Problem 3

Which of the following can be used to generate a simple random sample of "rating"s from 10 restaurants in restaurants? Select all that apply.

Option 1:

    sample = restaurants.take(np.arange(10)).get("rating")

Option 2:

    sample = restaurants.sample(10, replace = False).get("rating")

Option 3:

    sample = restaurants.sample(10, replace = True).get("rating")

Option 4:

    positions = np.random.choice(np.arange(0, restaurants.shape[0]),
                                 10, replace = False)
    sample = restaurants.take(positions).get("rating")

Option 5:

    positions = np.random.choice(np.arange(0, restaurants.shape[0]),
                                 10, replace = True)
    sample = restaurants.take(positions).get("rating")

• Option 1

• Option 2

• Option 3

• Option 4

• Option 5

Options 2 and 4
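
The difference between the options comes down to whether rows are chosen at random and whether the same row can be chosen twice. Below is a standard-library sketch of the three behaviors, using a hypothetical count of 50 rows in place of restaurants.shape[0].

```python
import random

random.seed(10)

n_rows = 50  # hypothetical stand-in for restaurants.shape[0]
positions = list(range(n_rows))

# Like Option 1: always the first 10 rows, so nothing is random.
first_ten = positions[:10]

# Like Options 2 and 4: 10 distinct positions chosen at random,
# that is, sampling without replacement.
without_replacement = random.sample(positions, 10)

# Like Options 3 and 5: positions chosen at random with replacement,
# so the same row may appear more than once.
with_replacement = random.choices(positions, k=10)

print(first_ten)
print(sorted(without_replacement))
print(sorted(with_replacement))
```

Only the middle approach matches the answer above: the rows are random and no row is repeated.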

##### Difficulty: ⭐️⭐️⭐️

The average score on this problem was 65%.