Spring 2026 Midterm Exam

This exam was administered in person. Students were allowed one double-sided page of handwritten notes. No calculators were allowed. Students had 50 minutes to take this exam.

Note (groupby / pandas 2.0): Pandas 2.0+ no longer silently drops columns that can’t be aggregated after a groupby, so code written for older pandas may behave differently or raise errors. In these practice materials we use .get() to select the column(s) we want after .groupby(...).mean() (or other aggregations) so that our solutions run on current pandas. On real exams you will not be penalized for omitting .get() when the old behavior would have produced the same answer.

A college mascot is a costumed character that represents the school at athletic competitions and other major campus events. For example, you might see UCSD’s mascot, King Triton, at basketball games and the upcoming Sun God Festival.

College students can apply to serve as the mascot for their school, which is quite competitive! Applicants usually need to fall within a certain height range to fit the costume. Throughout this exam, we will format heights in feet and inches, for example "5ft 7in" or "6ft", omitting inches when the height is a whole number of feet, as in the second example. Recall that there are 12 inches in one foot.

The DataFrame mascots has one row for every college or university in the US with a named mascot. The columns are:

The first few rows of mascots are shown below, though mascots has many more rows than pictured.

Throughout this exam, we will refer to mascots repeatedly. Assume that we have already run import babypandas as bpd and import numpy as np.

Problem 1

Recall that each value in the "height" column of mascots is a string with two heights separated by " - ". Heights are formatted like "5ft 6in" or "6ft".

In this problem, you will need to fill in the blanks in the provided code to add three new columns to mascots:

After the blanks have been correctly filled in, the first five rows of mascots should appear as follows.

Problem 1.1

To start, complete the implementation of the function inches which takes as input a single height and returns that height in inches, as an int. Example inputs and outputs are given below.

Answer (a): int(parts[0].strip("ft"))*12

Answer (b): output + int(parts[1].strip("in"))

After parts = h.split(" "), the first piece is always the feet part (for example "6ft" or "5ft").

Blank (a): Strip "ft" from parts[0] and convert to an integer, then multiply by 12 to get inches from feet. For "6ft", this gives 6 \times 12 = 72.
Blank (b): If there is a second piece (like "11in"), strip "in", convert to an integer, and add it to output. If there is only one piece (like "5ft"), we skip this step and return just the feet converted to inches.

For example, inches("6ft 11in") gives 72 + 11 = 83, and inches("5ft") gives 60.
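Putting the two blanks together, the function might look like the sketch below. The exam's code skeleton is not reproduced on this page, so the `if len(parts) > 1` check and the overall layout are assumptions; only the two filled-in expressions come from the answers above.

```python
def inches(h):
    # Split "6ft 11in" into ["6ft", "11in"], or "5ft" into ["5ft"].
    parts = h.split(" ")
    # Blank (a): feet converted to inches.
    output = int(parts[0].strip("ft")) * 12
    # Assumed condition: only add inches when a second piece exists.
    if len(parts) > 1:
        # Blank (b): add the leftover inches.
        output = output + int(parts[1].strip("in"))
    return output

tall = inches("6ft 11in")   # 72 + 11 = 83
short = inches("5ft")       # 60
```

Note that `.strip("ft")` removes any leading or trailing `f` and `t` characters, which is safe here because the remaining text is always digits.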

Difficulty: ⭐️⭐️⭐️

The average score on this problem was 57%.

Problem 1.2

Fill in the blanks below so that the code adds the three required columns to mascots. Feel free to use the inches function defined in Problem 1.1.

Answer (c): inches(s.split(" - ")[0])

Answer (d): inches(s.split(" - ")[1])

Answer (e): mascots.get("max") - mascots.get("min")

Each value in "height" looks like "5ft 6in - 6ft 2in". Splitting on " - " separates the smaller and larger costume heights.

Blank (c): The first piece is the minimum height; apply inches to convert it to inches.
Blank (d): The second piece is the maximum height; apply inches to convert it to inches.
Blank (e): After .assign creates "min" and "max" columns, subtract the two Series element-wise to get "range" for each row.

The .apply(min_height) and .apply(max_height) calls run these functions on every height string in the "height" column.
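As a rough end-to-end sketch, the filled-in blanks could be exercised as follows. This uses pandas as a stand-in for babypandas (the methods used here exist in both), and the two-row mascots table, the school names, and the `if` check inside `inches` are made up for illustration.

```python
import pandas as pd  # stand-in for babypandas in this sketch

def inches(h):
    parts = h.split(" ")
    output = int(parts[0].strip("ft")) * 12
    if len(parts) > 1:  # assumed structure of the exam's skeleton
        output = output + int(parts[1].strip("in"))
    return output

def min_height(s):
    return inches(s.split(" - ")[0])   # blank (c)

def max_height(s):
    return inches(s.split(" - ")[1])   # blank (d)

# Hypothetical data, not the real mascots table.
mascots = pd.DataFrame({
    "school": ["School A", "School B"],
    "height": ["5ft 6in - 6ft 2in", "5ft - 6ft"],
})

mascots = mascots.assign(
    min=mascots.get("height").apply(min_height),
    max=mascots.get("height").apply(max_height),
)
# Blank (e): element-wise difference of the two new columns.
mascots = mascots.assign(range=mascots.get("max") - mascots.get("min"))
```

For the first row, "5ft 6in" is 66 inches and "6ft 2in" is 74 inches, so the range is 8; for the second row the range is 72 - 60 = 12.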

Difficulty: ⭐️⭐️⭐️

The average score on this problem was 61%.

Problem 2

Which of the following expressions evaluate to the name of the school with the widest height range for its mascot costume (or any such school, in the case of a tie)? Select all that apply.

Answer: Options 2 and 3.

We want the name of the school with the largest "range" value.

Option 1: mascots.set_index("school").get("range").max() returns the largest value in the "range" column (an integer), not the school’s name.
Option 2: Sorting by "range" puts rows in ascending order of range, so the last row in the index (.index[-1]) is the school with the largest range.
Option 3: .groupby("range") orders rows by range in ascending order. After .max(), the last row of the resulting DataFrame corresponds to the largest range group, and .get("school").iloc[-1] returns that school’s name.
Option 4: .groupby("school") orders rows alphabetically by school name, so .index[-1] returns the alphabetically last school, which has no relation to the range.
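The four behaviors can be compared on a tiny table. The expressions below are reconstructed from the explanations above rather than copied from the exam, so treat their exact form as an assumption; the three-row data is hypothetical, and pandas stands in for babypandas.

```python
import pandas as pd  # stand-in for babypandas in this sketch

# Hypothetical data: School B has the widest range.
mascots = pd.DataFrame({
    "school": ["School A", "School B", "School C"],
    "range": [8, 12, 10],
})

# Option 1: the largest range itself, not a school name.
option_1 = mascots.set_index("school").get("range").max()

# Option 2: sort ascending by range, take the last index label.
option_2 = mascots.set_index("school").sort_values(by="range").index[-1]

# Option 3: groups come out in ascending order of range.
option_3 = mascots.groupby("range").max().get("school").iloc[-1]

# Option 4: groups come out in alphabetical order of school.
option_4 = mascots.groupby("school").max().index[-1]
```

Here `option_2` and `option_3` both give "School B", while `option_1` gives the number 12 and `option_4` gives "School C", the alphabetically last school.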

Difficulty: ⭐️⭐️⭐️

The average score on this problem was 71%.

Problem 3

Sofia defines the variables below. Note that hundred is a DataFrame containing the first 100 rows of mascots.

Problem 3.1

Answer: Series

is_animal is defined as hundred.get("type") == "animal". Comparing a Series to a string with == checks each row and returns a boolean Series with one True/False value per row. It is not a single boolean, string, array, or DataFrame.

Difficulty: ⭐️⭐️⭐️⭐️

The average score on this problem was 48%.

Problem 3.2

Sofia randomly selects one row from hundred. Which expression evaluates to the conditional probability that she selects a school with an "animal" mascot, given that she selects a school with "State" in its name?

Answer: Option 3 — (has_state & is_animal).sum() / has_state.sum()

We want P(\text{animal} \mid \text{State}), the probability of an animal mascot given that the school name contains "State".

By the definition of conditional probability, P(\text{animal} \mid \text{State}) = \frac{P(\text{animal and State})}{P(\text{State})}.

The numerator counts rows where both conditions are true: (has_state & is_animal).sum().
The denominator counts rows where has_state is true: has_state.sum().

The other options either divide by 100 (the full sample size, not the given condition) or use the wrong numerator or denominator.
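A small sketch shows why counts can be used in place of probabilities: dividing both the numerator and the denominator by the number of rows leaves the ratio unchanged. The five-row table and the definition of `has_state` via `.str.contains` are assumptions made for illustration, and pandas stands in for babypandas.

```python
import pandas as pd  # stand-in for babypandas in this sketch

# Hypothetical stand-in for hundred, with 5 rows instead of 100.
hundred = pd.DataFrame({
    "school": ["Alpha State", "Beta State", "Gamma College",
               "Delta State", "Epsilon University"],
    "type": ["animal", "human", "animal", "animal", "other"],
})

has_state = hundred.get("school").str.contains("State")  # assumed definition
is_animal = hundred.get("type") == "animal"

# Counts version, matching Option 3.
cond_prob = (has_state & is_animal).sum() / has_state.sum()

# Probability version: P(animal and State) / P(State).
n = hundred.shape[0]
p_both = (has_state & is_animal).sum() / n
p_state = has_state.sum() / n
cond_prob_from_definition = p_both / p_state
```

In this toy table, 3 schools contain "State" and 2 of those have animal mascots, so both versions give 2/3.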

Difficulty: ⭐️⭐️⭐️

The average score on this problem was 70%.

Problem 3.3

`"human"`	`"animal"`	`"other"`
40	50	10

Now, Sofia randomly selects two rows from hundred, independently and with replacement. What is the probability that Sofia selects at least one school with a "human" mascot? Give your answer as a mathematical expression, which you do not need to simplify.

Answer:

1 - \left(\frac{60}{100}\right)^2

In hundred, 40 out of 100 schools have a "human" mascot, so 60 out of 100 do not have a human mascot.

Sofia selects two rows with replacement, independently. It is easier to find the probability of the complement: neither pick is human.

On one pick, P(\text{not human}) = \frac{60}{100}.
With independent picks, P(\text{neither human}) = \left(\frac{60}{100}\right)^2.

So P(\text{at least one human}) = 1 - \left(\frac{60}{100}\right)^2.

You can simplify if you want: 1 - \left(\frac{3}{5}\right)^2 = 1 - \frac{9}{25} = \frac{16}{25} = 0.64.
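The complement calculation can be double-checked by listing every ordered pair of picks. This is a sketch using only the counts from the table (40 human, 50 animal, 10 other); the list `types` is a stand-in for the "type" column of hundred.

```python
from fractions import Fraction

# Stand-in for the "type" column of hundred, built from the given counts.
types = ["human"] * 40 + ["animal"] * 50 + ["other"] * 10

# Complement rule: 1 - P(neither pick is human).
by_formula = 1 - Fraction(60, 100) ** 2

# Brute force: with replacement there are 100 * 100 equally likely ordered pairs.
favorable = sum(1 for a in types for b in types if a == "human" or b == "human")
by_counting = Fraction(favorable, len(types) ** 2)
```

Both approaches give 16/25: there are 6400 favorable pairs out of 10000.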

Difficulty: ⭐️⭐️⭐️

The average score on this problem was 57%.

Problem 4

Problem 4.1

Answer: 4

np.arange(48, 85, 8) produces the bin edges [48, 56, 64, 72, 80]. With 5 edges, we get 5 - 1 = 4 bins, so Histogram A has 4 bars. Note that 85 is not included because np.arange stops before its endpoint, and the next value 88 would exceed it.

Difficulty: ⭐️⭐️⭐️

The average score on this problem was 63%.

Problem 4.2

What is the largest value that can be displayed in Histogram B? Give your answer as an integer.

Answer: 84

np.arange(48, 85, 4) produces the bin edges [48, 52, 56, 60, 64, 68, 72, 76, 80, 84]. The largest edge is 84, which is the right edge of the rightmost bin and therefore the largest value Histogram B can display.
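The bin edges for Histogram A and Histogram B can be checked directly with np.arange. This sketch only builds the edge arrays; the plotting call itself is not reproduced on this page.

```python
import numpy as np

edges_a = np.arange(48, 85, 8)   # 48, 56, 64, 72, 80
edges_b = np.arange(48, 85, 4)   # 48, 52, ..., 80, 84

# Each pair of consecutive edges defines one bin.
num_bins_a = len(edges_a) - 1

# The right edge of the last bin is the largest value that can be shown.
largest_b = edges_b.max()
```

Here `num_bins_a` is 4 and `largest_b` is 84, since 84 is less than the stopping point 85 while the next multiple, 88, is not.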

Difficulty: ⭐️⭐️⭐️

The average score on this problem was 71%.

Problem 4.3

Answer: Option 1

Option 1 (TRUE): Both are density histograms, so the total area under each equals 1, regardless of the bin width chosen.
Option 2 (FALSE): A bar’s height represents the density across its entire bin, not just the interval [70, 71]. In Histogram A, the bin [64, 72) contains [70, 71], and there could still be mascots in [64, 70) or [71, 72) contributing to that bar. The same logic applies to the bin [68, 72) in Histogram B.
Option 3 (FALSE): Bars that “cover any part of” [65, 70] extend beyond that interval (e.g., the bin [64, 72) in Histogram A also covers [64, 65) and [70, 72)). Adding the full areas of these bars overcounts.
Option 4 (FALSE): Changing bin widths changes the heights. A wide bin in Histogram A may merge two narrow bins from Histogram B, so the tallest bar heights are not guaranteed to match.
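Option 1 can be illustrated numerically: for a density histogram, the sum of (bar height × bin width) is 1 no matter which bins are used. The heights array below is made up, and np.histogram with density=True is used here as a stand-in for the plotted histograms.

```python
import numpy as np

# Hypothetical heights in inches, all inside the binned range.
heights = np.array([50, 55, 58, 61, 63, 66, 67, 70, 71, 75, 79])

edges_a = np.arange(48, 85, 8)
edges_b = np.arange(48, 85, 4)

# density=True scales bar heights so that the total area is 1.
density_a, _ = np.histogram(heights, bins=edges_a, density=True)
density_b, _ = np.histogram(heights, bins=edges_b, density=True)

# Area of each bar is height * width; np.diff gives the bin widths.
area_a = np.sum(density_a * np.diff(edges_a))
area_b = np.sum(density_b * np.diff(edges_b))
```

Both `area_a` and `area_b` come out to 1 (up to floating-point rounding), even though the individual bar heights differ between the two sets of bins.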

Difficulty: ⭐️⭐️

The average score on this problem was 78%.

Problem 5

Problem 5.1

Answer: Option 1

single groups by "range" alone, so its number of rows equals the number of distinct range values. double groups by the pair ("min", "max"), which is a finer grouping — two schools can have the same range but different (min, max) pairs (e.g., heights 60–66 and 62–68 both have a range of 6, but different min/max values). So double will always have at least as many rows as single, and typically more, meaning single.shape[0] < double.shape[0].
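A toy example makes the comparison concrete. The definitions of single and double are not reproduced on this page, so the `.count()` aggregation below is an assumption, as is the four-row data; pandas stands in for babypandas.

```python
import pandas as pd  # stand-in for babypandas in this sketch

# Hypothetical data: three schools share a range of 6 inches.
mascots = pd.DataFrame({
    "school": ["School A", "School B", "School C", "School D"],
    "min": [60, 62, 60, 60],
    "max": [66, 68, 66, 70],
})
mascots = mascots.assign(range=mascots.get("max") - mascots.get("min"))

# One row per distinct range: 6 and 10.
single = mascots.groupby("range").count()

# One row per distinct (min, max) pair: (60, 66), (60, 70), (62, 68).
double = mascots.groupby(["min", "max"]).count()
```

Here `single.shape[0]` is 2 and `double.shape[0]` is 3, because the pairs (60, 66) and (62, 68) fall into the same range group but into different (min, max) groups.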

Difficulty: ⭐️⭐️⭐️

The average score on this problem was 68%.

Problem 5.2

Answer: Options 2 and 3

Option 1 (FALSE): Two schools grouped together in single share the same "range". But they could have different (min, max) pairs (same difference, different actual heights), so they need not be in the same group in double.
Option 2 (TRUE): Two schools grouped together in double share the same (min, max) pair. Since range = max − min, they must also share the same range, so they are guaranteed to be in the same group in single.
Option 3 (TRUE): If two schools are in different groups in single, they have different ranges. Since range = max − min, they cannot share the same (min, max) pair, so they must also be in different groups in double.
Option 4 (FALSE): Two schools in different groups in double have different (min, max) pairs, but could still have the same range (e.g., 60–66 and 62–68 are different pairs but both have range 6). So they may end up in the same group in single.

Difficulty: ⭐️⭐️⭐️

The average score on this problem was 73%.

Problem 5.3

Answer: Option 2

Grouping by "height" groups by the original string (e.g., "5ft 6in - 6ft 2in"). Each unique height string corresponds to exactly one unique (min, max) pair, and vice versa — every unique (min, max) pair maps back to exactly one height string. So the two groupbys produce the same number of groups, and the two shapes are equal.

Difficulty: ⭐️⭐️⭐️

The average score on this problem was 68%.

Problem 6

Suppose Pranav randomly selects 4 different schools from among those included in mascots. Fill in the blanks below to implement a simulation that estimates the probability that at least two of the schools Pranav selected have a mascot of type "animal".

Problem 6.1

Answer: Option 4 — mascot_types, 4, replace=False

Pranav selects 4 different schools, so each simulation draw should be a sample of size 4 without replacement from the array of mascot types.

np.random.choice(mascot_types, 4, replace=False) draws 4 types from mascot_types without putting them back.
replace=True would allow the same school/type to be chosen more than once, which does not match “4 different schools.”
Using repetitions as the sample size would draw 10,000 values per trial, not 4.

Difficulty: ⭐️⭐️⭐️

The average score on this problem was 73%.

Problem 6.2

Answer: sample == "animal"

sample is an array of 4 mascot types from one simulated selection. The expression sample == "animal" compares each entry to "animal" and returns a boolean array. np.count_nonzero then counts how many of the 4 schools in this sample have an animal mascot.

Difficulty: ⭐️⭐️⭐️

The average score on this problem was 53%.

Problem 6.3

Answer: num_animals >= 2

The simulation counts how often the event “at least two of the four selected schools have an animal mascot” happens. After counting animals in the sample with num_animals, we check whether that count is 2 or more.

Difficulty: ⭐️⭐️⭐️

The average score on this problem was 66%.

Problem 6.4

Answer: Option 2 — count_event / repetitions

After 10,000 repetitions, count_event is the number of simulations where at least two animal mascots were selected. Dividing by repetitions gives the simulated proportion: \hat{p} = \frac{\text{number of successes}}{10{,}000}, which estimates the probability of interest.

count_event alone is a count, not a probability.
count_event / 4 would divide by the sample size within one trial, not the number of simulations.
np.mean(count_event) does not apply here because count_event is a single integer, not an array of outcomes.
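With all four blanks filled in, the simulation might look like the sketch below. The loop structure is an assumption, since the exam's skeleton is not reproduced on this page, and mascot_types is a made-up array of 100 types (50 animal, 40 human, 10 other) standing in for the real "type" column.

```python
import numpy as np

np.random.seed(0)  # only so that this sketch is reproducible

# Hypothetical stand-in for the array of mascot types.
mascot_types = np.array(["animal"] * 50 + ["human"] * 40 + ["other"] * 10)

repetitions = 10_000
count_event = 0

for i in np.arange(repetitions):
    # Problem 6.1: 4 different schools, so sample without replacement.
    sample = np.random.choice(mascot_types, 4, replace=False)
    # Problem 6.2: count the animal mascots in this sample.
    num_animals = np.count_nonzero(sample == "animal")
    # Problem 6.3: did the event "at least two animals" happen?
    if num_animals >= 2:
        count_event = count_event + 1

# Problem 6.4: proportion of repetitions in which the event happened.
estimate = count_event / repetitions
```

For this made-up array, the exact probability is about 0.69, so `estimate` should land close to that value; the real answer depends on the actual mix of types in mascots.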

Difficulty: ⭐️⭐️

The average score on this problem was 79%.

Problem 7

The bar chart below was produced from the data in mascots. It shows how many schools use each color.

Problem 7.1

If you knew the exact heights of each bar, how could you determine the total number of schools in mascots?

Answer: Option 3

Each school can have multiple colors, and any school with more than one color is counted in multiple bars. For example, a school with school colors blue and gold contributes 1 to both the blue bar and the gold bar. This means summing the bar heights overcounts schools, and we have no way to know how many schools were double- or triple-counted just from the bar chart, so we cannot recover the total number of schools.

Difficulty: ⭐️⭐️

The average score on this problem was 87%.

Problem 7.2

In the bar chart, the height of the blue bar is 2100 and the height of the gold bar is 1500. Use this information and the first five rows of mascots (provided with the data description) to determine the maximum possible number of schools in mascots whose school colors are blue and gold only. Give your answer as an integer.

Answer: 1498

A school whose colors are “blue and gold only” must appear in both the blue bar (height 2100) and the gold bar (height 1500), so the answer is at most \min(2100, 1500) = 1500.

To maximize the count, we want as many of the 1500 gold-using schools as possible to also use blue, and to use no other color. However, the first five rows of mascots show specific schools that use gold along with a color other than blue (or use gold without blue at all). Each such school is one of the 1500 gold schools but cannot be counted in our group.

From the first five rows, exactly 2 schools using gold are disqualified in this way, so the maximum possible number of “blue and gold only” schools is 1500 - 2 = 1498.

Difficulty: ⭐️⭐️⭐️⭐️

The average score on this problem was 44%.

Problem 7.3

Write one line of code that uses the mascots DataFrame to calculate the sum of the heights of all bars in the bar chart above.

Answer: mascots.get("colors").apply(len).sum()

The sum of the bar heights counts each school once for every color it has (a school with 3 colors contributes 1 to each of 3 bars, for a total of 3). So the total bar height equals the total number of (school, color) pairs across the dataset.

The "colors" column contains a list for each school, and the length of that list is the number of colors the school has. Applying len to each list and then summing gives exactly the total count of (school, color) pairs, which equals the sum of bar heights.
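The equality between the sum of list lengths and the sum of bar heights can be checked on a small example. The three-row table below is hypothetical, and pandas stands in for babypandas.

```python
import pandas as pd  # stand-in for babypandas in this sketch

# Hypothetical data: each school has a list of colors.
mascots = pd.DataFrame({
    "school": ["School A", "School B", "School C"],
    "colors": [["blue", "gold"], ["red"], ["blue", "white", "gold"]],
})

# The one-line answer: total number of (school, color) pairs.
total = mascots.get("colors").apply(len).sum()

# Cross-check: build the bar heights by hand and add them up.
bar_heights = {}
for colors in mascots.get("colors"):
    for color in colors:
        bar_heights[color] = bar_heights.get(color, 0) + 1
sum_of_bars = sum(bar_heights.values())
```

The bars here would have heights 2 (blue), 2 (gold), 1 (red), and 1 (white), which sum to 6, the same as 2 + 1 + 3 from the list lengths.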

Difficulty: ⭐️⭐️⭐️⭐️

The average score on this problem was 31%.

Problem 8

Austin and Ray are contestants on Season One of America’s Next Top Mascot, a new reality TV show. In each episode, contestants will compete for one of the mascot roles in the DataFrame below, called antm. The data includes the name of each mascot and the minimum and maximum heights, in inches, for the costume.

Problem 8.1

The America’s Next Top Mascot costume director wants to identify every pair of costumes where one costume’s maximum height exactly equals the other costume’s minimum height. She calls these “single-height costume pairs” because there is only a single height that can wear both costumes.

Write one line of code that uses .merge to calculate the number of single-height costume pairs in antm. Your answer should be a Python expression that evaluates to an integer.

Answer: antm.merge(antm, left_on="min", right_on="max").shape[0]

A “single-height costume pair” is a pair of costumes where one costume’s "max" equals the other costume’s "min". To find these, we merge antm with itself, matching the "min" column of the left copy to the "max" column of the right copy. Each row in the merged DataFrame corresponds to exactly one such pair, so .shape[0] gives the count.
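The self-merge can be tried on a small made-up table. The four costumes below are hypothetical and are not the real antm data, and pandas stands in for babypandas.

```python
import pandas as pd  # stand-in for babypandas in this sketch

# Hypothetical costumes with heights in inches.
antm = pd.DataFrame({
    "name": ["Costume A", "Costume B", "Costume C", "Costume D"],
    "min": [60, 66, 66, 70],
    "max": [66, 72, 70, 74],
})

# Match each left row's "min" against every right row's "max".
pairs = antm.merge(antm, left_on="min", right_on="max")
num_pairs = pairs.shape[0]
```

In this toy table there are 3 matches: B and C both start at 66, where A ends, and D starts at 70, where C ends. A's minimum of 60 matches no maximum, so it adds no rows.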

Difficulty: ⭐️⭐️⭐️

The average score on this problem was 67%.

Problem 8.2

What is the number of single-height costume pairs? In other words, what number does a correct answer to Problem 8.1 evaluate to? Give your answer as an integer.

Answer: 16

This is the value that the expression in Problem 8.1 evaluates to when run on antm. Going through the "min" and "max" columns of antm and counting how many (costume A, costume B) pairs satisfy A.min == B.max, we find 16 such pairs. Equivalently, for each "max" value in the table, count how many costumes have that value as their "min", then sum those counts across all rows; the total is 16.

Difficulty: ⭐️⭐️⭐️

The average score on this problem was 54%.

Problem 8.3

The last episode of the season will reveal the mascot roles to be featured next season. Next season’s mascots are stored in a DataFrame called antm2 which has the same column names as antm, but contains different data (i.e. a different set of mascots).

We don’t have all of antm2 available, but do know some information about it. First, we know that antm2.merge(antm2, on="min") gives a DataFrame with 61 rows.

Additionally, we have the table at right, which shows the number of mascots in antm2 with each minimum height requirement, except one value is missing.

How many mascots in antm2 have a minimum height requirement of 66 inches? Give your answer as an integer.

Answer: 7

When we merge a DataFrame with itself on "min", every row pairs up with every other row that shares the same "min" value (including itself). So if a particular "min" value appears n times in antm2, it contributes n \times n = n^2 rows to the merged DataFrame.

The total number of rows is therefore the sum of squares of the counts in the table:

n_1^2 + n_2^2 + \dots + x^2 = 61

where x is the missing count for "min" = 66. Plugging in the known counts from the table, the squares of the visible values sum to 12, leaving:

x^2 = 61 - 12 = 49 \implies x = 7

So 7 mascots in antm2 have a minimum height requirement of 66 inches.
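The sum-of-squares rule and the final arithmetic can be sketched in plain Python. The list `hypothetical_counts` is made up purely to check the rule by brute force; only the totals 61 and 12 come from the solution above.

```python
import math

def self_merge_rows(counts):
    # A value that appears n times contributes n * n rows to the self-merge.
    return sum(n * n for n in counts)

def brute_force_rows(counts):
    # Build a column with the given counts and count all matching pairs.
    values = []
    for value, n in enumerate(counts):
        values.extend([value] * n)
    return sum(1 for a in values for b in values if a == b)

# Hypothetical counts, used only to confirm the n^2 rule.
hypothetical_counts = [3, 1, 2]
rule_rows = self_merge_rows(hypothetical_counts)     # 9 + 1 + 4 = 14
check_rows = brute_force_rows(hypothetical_counts)   # also 14

# From the solution: 61 rows in total, visible squares sum to 12.
total_rows = 61
visible_squares = 12
missing_count = math.isqrt(total_rows - visible_squares)
```

Since 61 - 12 = 49 is a perfect square, `missing_count` is exactly 7.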

Difficulty: ⭐️⭐️⭐️⭐️

The average score on this problem was 32%.
