Spring 2026 Midterm Exam



Instructor(s): Janine Tiefenbruck

This exam was administered in-person. Students were allowed one page of double-sided handwritten notes. No calculators were allowed. Students had 50 minutes to take this exam.


Note (groupby / pandas 2.0): Pandas 2.0+ no longer silently drops columns that can’t be aggregated after a groupby, so code written for older pandas may behave differently or raise errors. In these practice materials we use .get() to select the column(s) we want after .groupby(...).mean() (or other aggregations) so that our solutions run on current pandas. On real exams you will not be penalized for omitting .get() when the old behavior would have produced the same answer.


A college mascot is a costumed character that represents the school at athletic competitions and other major campus events. For example, you might see UCSD’s mascot, King Triton, at basketball games and the upcoming Sun God Festival.

College students can apply to serve as the mascot for their school, which is quite competitive! Applicants usually need to fall within a certain height range to fit the costume. Throughout this exam, we will format heights in feet and inches, for example "5ft 7in" or "6ft", omitting inches when the height is a whole number of feet, as in the second example. Recall that there are 12 inches in one foot.

The DataFrame mascots has one row for every college or university in the US with a named mascot. The columns are:

The first few rows of mascots are shown below, though mascots has many more rows than pictured.

Throughout this exam, we will refer to mascots repeatedly. Assume that we have already run import babypandas as bpd and import numpy as np.


Problem 1

Recall that each value in the "height" column of mascots is a string with two heights separated by " - ". Heights are formatted like "5ft 6in" or "6ft".

In this problem, you will need to fill in the blanks in the provided code to add three new columns to mascots:

After the blanks have been correctly filled in, the first five rows of mascots should appear as follows.


Problem 1.1

To start, complete the implementation of the function inches which takes as input a single height and returns that height in inches, as an int. Example inputs and outputs are given below.

>>> inches("6ft 11in")
83 

>>> inches("5ft")
60

def inches(h):
    parts = h.split(" ")
    output = __(a)__
    if len(parts) == 2:
        output = __(b)__
    return output

(a): int(parts[0].strip("ft"))*12

(b): output + int(parts[1].strip("in"))
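For reference, here is the function with both blanks filled in, checked against the two examples given above. This is only a sketch of the completed code; the comments are ours.

```python
def inches(h):
    # "6ft 11in" splits into ["6ft", "11in"]; "5ft" splits into ["5ft"].
    parts = h.split(" ")
    # Blank (a): convert the feet portion to inches.
    output = int(parts[0].strip("ft")) * 12
    if len(parts) == 2:
        # Blank (b): add the leftover inches when they are present.
        output = output + int(parts[1].strip("in"))
    return output

print(inches("6ft 11in"))  # 83
print(inches("5ft"))       # 60
```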


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 57%.


Problem 1.2

Fill in the blanks below so that the code adds the three required columns to mascots. Feel free to use the inches function defined in part (a).

def min_height(s):
    return __(c)__

def max_height(s):
    return __(d)__

mascots = mascots.assign(min = mascots.get("height").apply(min_height))
mascots = mascots.assign(max = mascots.get("height").apply(max_height))
mascots = mascots.assign(range = __(e)__)

(c): inches(s.split(" - ")[0])

(d): inches(s.split(" - ")[1])

(e): mascots.get("max") - mascots.get("min")
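The sketch below runs the completed code on a tiny stand-in table. It assumes plain pandas rather than babypandas (the methods used here behave the same way in both), and the two height strings in toy are made up for illustration, not taken from mascots.

```python
import pandas as pd

def inches(h):
    parts = h.split(" ")
    output = int(parts[0].strip("ft")) * 12
    if len(parts) == 2:
        output = output + int(parts[1].strip("in"))
    return output

def min_height(s):
    # The lower bound is the text before " - ".
    return inches(s.split(" - ")[0])

def max_height(s):
    # The upper bound is the text after " - ".
    return inches(s.split(" - ")[1])

# Hypothetical rows standing in for mascots.
toy = pd.DataFrame({"height": ["5ft 6in - 6ft", "5ft - 5ft 10in"]})
toy = toy.assign(min=toy.get("height").apply(min_height))
toy = toy.assign(max=toy.get("height").apply(max_height))
toy = toy.assign(range=toy.get("max") - toy.get("min"))
print(toy)
```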


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 61%.



Problem 2

Which of the following expressions evaluate to the name of the school with the widest height range for its mascot costume (or any such school, in the case of a tie)? Select all that apply.

Hint: .groupby orders rows in ascending order of the values in the index.

Answer: Options 2 and 3.

We want the name of the school with the largest "range" value.

  • Option 1: mascots.set_index("school").get("range").max() returns the largest value in the "range" column (an integer), not the school’s name.
  • Option 2: Sorting by "range" puts rows in ascending order of range, so the last row in the index (.index[-1]) is the school with the largest range.
  • Option 3: .groupby("range") orders rows by range in ascending order. After .max(), the last row of the resulting DataFrame corresponds to the largest range group, and .get("school").iloc[-1] returns that school’s name.
  • Option 4: .groupby("school") orders rows alphabetically by school name, so .index[-1] returns the alphabetically last school, which has no relation to the range.
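The four approaches described above can be compared on a small example. The option text is not reproduced here, so the expressions for Options 2, 3, and 4 below are reconstructions based on the explanations, and the toy data is hypothetical (with no ties in "range"). The sketch uses plain pandas in place of babypandas.

```python
import pandas as pd

# Hypothetical data; "Mid State" has the widest range.
toy = pd.DataFrame({
    "school": ["Zeta University", "Mid State", "Alpha College"],
    "range": [6, 10, 4],
})

# Option 1 style: returns the largest range itself, a number.
largest_range = toy.set_index("school").get("range").max()

# Option 2 style: sort ascending by range, take the last index label.
by_sort = toy.set_index("school").sort_values(by="range").index[-1]

# Option 3 style: groupby sorts by range, so the last row is the largest range.
by_group = toy.groupby("range").max().get("school").iloc[-1]

# Option 4 style: groupby sorts by school name, unrelated to range.
alphabetical_last = toy.groupby("school").max().index[-1]

print(largest_range, by_sort, by_group, alphabetical_last)
```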

Difficulty: ⭐️⭐️⭐️

The average score on this problem was 71%.


Problem 3

Sofia defines the variables below. Note that hundred is a DataFrame containing the first 100 rows of mascots.

hundred = mascots.take(np.arange(100))
has_state = hundred.get("school").str.contains("State")
is_animal = hundred.get("type") == "animal"


Problem 3.1

What is the data type of is_animal?

Answer: Series


Difficulty: ⭐️⭐️⭐️⭐️

The average score on this problem was 48%.


Problem 3.2

Sofia randomly selects one row from hundred. Which expression evaluates to the conditional probability that she selects a school with an "animal" mascot, given that she selects a school with "State" in its name?

Answer: Option 3
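One way to compute a conditional probability like this is to count the rows satisfying both conditions and divide by the number of rows satisfying the given condition. The sketch below is not necessarily the expression in Option 3, since the option text is not reproduced here; the Boolean arrays are hypothetical stand-ins for has_state and is_animal.

```python
import numpy as np

# Hypothetical Boolean arrays, one entry per row of a 5-row table.
has_state = np.array([True, True, False, True, False])
is_animal = np.array([True, False, True, True, False])

# Rows with "State" in the name AND an "animal" mascot.
both = np.count_nonzero(has_state & is_animal)
# Rows with "State" in the name (the condition we are given).
given = np.count_nonzero(has_state)

cond_prob = both / given
print(cond_prob)
```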


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 70%.


Problem 3.3

The distribution of mascot "type" in hundred is shown in the table below.

"human" "animal" "other"
40 50 10

Now, Sofia randomly selects two rows from hundred, independently and with replacement. What is the probability that Sofia selects at least one school with a "human" mascot? Give your answer as a mathematical expression, which you do not need to simplify.

Answer:

1 - \left(\frac{60}{100}\right)^2 = 1 - \left(\frac{3}{5}\right)^2 = 1 - \frac{9}{25} = \frac{16}{25} = 0.64
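As a check on the arithmetic, the sketch below computes the same probability with exact fractions, then confirms it by listing all 100 × 100 equally likely ordered pairs of rows. The list types is built from the counts in the table above.

```python
from fractions import Fraction

# Complement rule: 1 - P(neither selection is "human").
p_not_human = Fraction(60, 100)
p_at_least_one = 1 - p_not_human ** 2

# Brute-force check over every ordered pair of rows (with replacement).
types = ["human"] * 40 + ["animal"] * 50 + ["other"] * 10
favorable = sum(1 for a in types for b in types if "human" in (a, b))
p_by_counting = Fraction(favorable, len(types) ** 2)

print(p_at_least_one, p_by_counting)  # 16/25 16/25
```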


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 57%.



Problem 4

Ella creates two histograms as follows.

# Histogram A: bins of width 8
mascots.plot(kind="hist", y="max", density=True, 
             bins=np.arange(48, 85, 8))
# Histogram B: bins of width 4
mascots.plot(kind="hist", y="max", density=True, 
             bins=np.arange(48, 85, 4))


Problem 4.1

How many bars are in Histogram A? Give your answer as an integer.

Answer: 4

np.arange(48, 85, 8) produces the bin edges [48, 56, 64, 72, 80]. With 5 edges, we get 5 - 1 = 4 bins, so Histogram A has 4 bars. The sequence ends at 80 because np.arange excludes its stop value, 85, and the next value after 80 would be 88, which is past 85.
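The edge count can be confirmed directly; the variable names below are ours.

```python
import numpy as np

edges_a = np.arange(48, 85, 8)
print(edges_a)  # [48 56 64 72 80]

# n edges define n - 1 bins, and each bin is drawn as one bar.
num_bars_a = len(edges_a) - 1
print(num_bars_a)  # 4
```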


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 63%.


Problem 4.2

What is the largest value that can be displayed in Histogram B? Give your answer as an integer.

Answer: 84

np.arange(48, 85, 4) produces the bin edges [48, 52, 56, 60, 64, 68, 72, 76, 80, 84]. The largest edge is 84, which is the right edge of the rightmost bin and therefore the largest value Histogram B can display.
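A short sketch of the same idea, using np.histogram to show that a value of 84 lands in the last bin while a value of 85 is not counted at all. The two test values are made up for illustration.

```python
import numpy as np

edges_b = np.arange(48, 85, 4)
largest_displayed = edges_b[-1]
print(largest_displayed)  # 84

# The last bin includes its right edge, so 84 is counted and 85 is dropped.
counts, _ = np.histogram([84, 85], bins=edges_b)
print(counts)
```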


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 71%.


Problem 4.3

Which of the following statements must be true? Select all that apply.

Answer: Option 1

  • Option 1 (TRUE): Both are density histograms, so the total area under each equals 1, regardless of the bin width chosen.
  • Option 2 (FALSE): A bar’s height represents the density across its entire bin, not just the interval [70, 71]. In Histogram A, the bin [64, 72) contains [70, 71], and there could still be mascots in [64, 70) or [71, 72) contributing to that bar. The same logic applies to the bin [68, 72) in Histogram B.
  • Option 3 (FALSE): Bars that “cover any part of” [65, 70] extend beyond that interval (e.g., the bin [64, 72) in Histogram A also covers [64, 65) and [70, 72)). Adding the full areas of these bars overcounts.
  • Option 4 (FALSE): Changing bin widths changes the heights. A wide bin in Histogram A may merge two narrow bins from Histogram B, so the tallest bar heights are not guaranteed to match.
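Options 1 and 4 can be illustrated numerically. The sketch below uses np.histogram with density=True on hypothetical heights (not the real "max" column), chosen so that every value falls inside both sets of bins. The total area is 1 for both bin widths, but the tallest bars differ.

```python
import numpy as np

# Hypothetical maximum heights in inches, all between 48 and 80.
heights = np.array([60, 62, 65, 66, 66, 70, 71, 72, 74, 78])

def area_and_peak(width):
    edges = np.arange(48, 85, width)
    density, _ = np.histogram(heights, bins=edges, density=True)
    # Area of each bar is height times width; add them all up.
    area = np.sum(density * np.diff(edges))
    return area, density.max()

area_a, peak_a = area_and_peak(8)  # Histogram A style bins
area_b, peak_b = area_and_peak(4)  # Histogram B style bins
print(area_a, area_b)
print(peak_a, peak_b)
```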

Difficulty: ⭐️⭐️

The average score on this problem was 78%.



Problem 5

Define two DataFrames single and double as follows.

single = mascots.groupby("range").count()
double = mascots.groupby(["min", "max"]).count()


Problem 5.1

Which statement below evaluates to True?

Answer: Option 1


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 68%.


Problem 5.2

Which of the following statements are true? Select all that apply.

Answer: Options 2 and 3


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 73%.


Problem 5.3

Which statement below evaluates to True?

Answer: Option 2


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 68%.



Problem 6

Suppose Pranav randomly selects 4 different schools from among those included in mascots. Fill in the blanks below to implement a simulation that estimates the probability that at least two of the schools Pranav selected have a mascot of type "animal".

mascot_types = np.array(mascots.get("type"))

repetitions = 10000
count_event = 0

for i in np.arange(repetitions):
    sample = np.random.choice(__(a)__)
    
    num_animals = np.count_nonzero(__(b)__)
    
    if __(c)__:
        count_event += 1

prob = __(d)__


Problem 6.1

What goes in blank (a)?

Answer: Option 4


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 73%.


Problem 6.2

What goes in blank (b)?

Answer: sample == "animal"


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 53%.


Problem 6.3

What goes in blank (c)?

Answer: num_animals >= 2


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 66%.


Problem 6.4

What goes in blank (d)?

Answer: Option 2
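Putting the four blanks together gives a simulation like the sketch below. Blanks (b) and (c) come from the answers above. The option text for blanks (a) and (d) is not reproduced here, so those two lines are assumptions consistent with the problem statement (4 different schools means sampling without replacement). The mascot_types array is a hypothetical stand-in for the real data.

```python
import numpy as np

# Hypothetical stand-in for np.array(mascots.get("type")).
mascot_types = np.array(["animal"] * 50 + ["human"] * 40 + ["other"] * 10)

repetitions = 10000
count_event = 0

for i in np.arange(repetitions):
    # Assumed blank (a): choose 4 different schools, so no replacement.
    sample = np.random.choice(mascot_types, 4, replace=False)

    # Blank (b): count how many of the 4 are "animal".
    num_animals = np.count_nonzero(sample == "animal")

    # Blank (c): the event is "at least two animals".
    if num_animals >= 2:
        count_event += 1

# Assumed blank (d): proportion of repetitions in which the event occurred.
prob = count_event / repetitions
print(prob)
```

With this made-up array the exact probability is about 0.69, so the estimate should land close to that.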


Difficulty: ⭐️⭐️

The average score on this problem was 79%.



Problem 7

The bar chart below was produced from the data in mascots. It shows how many schools use each color.


Problem 7.1

If you knew the exact heights of each bar, how could you determine the total number of schools in mascots?

Answer: Option 3

Each school can have multiple colors, and any school with more than one color is counted in multiple bars. For example, a school with school colors blue and gold contributes 1 to both the blue bar and the gold bar. This means summing the bar heights overcounts schools, and we have no way to know how many schools were double- or triple-counted just from the bar chart, so we cannot recover the total number of schools.


Difficulty: ⭐️⭐️

The average score on this problem was 87%.


Problem 7.2

In the bar chart, the height of the blue bar is 2100 and the height of the gold bar is 1500. Use this information and the first five rows of mascots (provided with the data description) to determine the maximum possible number of schools in mascots whose school colors are blue and gold only. Give your answer as an integer.

Answer: 1498

A school whose colors are “blue and gold only” must appear in both the blue bar (height 2100) and the gold bar (height 1500), so the answer is at most \min(2100, 1500) = 1500.

To maximize the count, we want as many of the 1500 gold-using schools as possible to also use blue, and to use no other color. However, the first five rows of mascots show specific schools that use gold along with a color other than blue (or use gold without blue at all). Each such school is one of the 1500 gold schools but cannot be counted in our group.

From the first five rows, exactly 2 schools using gold are disqualified in this way, so the maximum possible number of “blue and gold only” schools is 1500 - 2 = 1498.


Difficulty: ⭐️⭐️⭐️⭐️

The average score on this problem was 44%.


Problem 7.3

Write one line of code that uses the mascots DataFrame to calculate the sum of the heights of all bars in the bar chart above.

Answer: mascots.get("colors").apply(len).sum()

The sum of the bar heights counts each school once for every color it has (a school with 3 colors contributes 1 to each of 3 bars, for a total of 3). So the total bar height equals the total number of (school, color) pairs across the dataset.

The "colors" column contains a list for each school, and the length of that list is the number of colors the school has. Applying len to each list and then summing gives exactly the total count of (school, color) pairs, which equals the sum of bar heights.
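To see that the two quantities agree, the sketch below builds a tiny hypothetical table with a "colors" column of lists, then compares the one-line answer with the total obtained by tallying each color's bar height separately. It uses plain pandas in place of babypandas.

```python
from collections import Counter
import pandas as pd

# Hypothetical schools and colors, for illustration only.
toy = pd.DataFrame({
    "school": ["A", "B", "C"],
    "colors": [["blue", "gold"], ["red"], ["blue", "white", "gold"]],
})

# The one-line answer: total number of (school, color) pairs.
total_bar_height = toy.get("colors").apply(len).sum()

# Independent check: compute each bar's height, then add the bars.
bar_heights = Counter(c for colors in toy.get("colors") for c in colors)

print(total_bar_height, dict(bar_heights))
```

Here the bars sum to 6 even though there are only 3 schools, which is the overcounting described in Problem 7.1.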


Difficulty: ⭐️⭐️⭐️⭐️

The average score on this problem was 31%.



Problem 8

Austin and Ray are contestants on Season One of America’s Next Top Mascot, a new reality TV show. In each episode, contestants will compete for one of the mascot roles in the DataFrame below, called antm. The data includes the name of each mascot and the minimum and maximum heights, in inches, for the costume.


Problem 8.1

The America’s Next Top Mascot costume director wants to identify every pair of costumes where one costume’s maximum height exactly equals the other costume’s minimum height. She calls these “single-height costume pairs” because there is only a single height that can wear both costumes.

Write one line of code that uses .merge to calculate the number of single-height costume pairs in antm. Your answer should be a Python expression that evaluates to an integer.

Answer: antm.merge(antm, left_on="min", right_on="max").shape[0]

A “single-height costume pair” is a pair of costumes where one costume’s "max" equals the other costume’s "min". To find these, we merge antm with itself, matching the "min" column of the left copy to the "max" column of the right copy. Each row in the merged DataFrame corresponds to exactly one such pair, so .shape[0] gives the count.
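The self-merge can be tried on a small example. The table below is hypothetical and much smaller than antm; it only shares the column names. The sketch uses plain pandas in place of babypandas.

```python
import pandas as pd

# Hypothetical costumes with heights in inches.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "min": [60, 64, 64, 70],
    "max": [64, 70, 68, 72],
})

# Match each left row's "min" with every right row whose "max" is equal.
pairs = toy.merge(toy, left_on="min", right_on="max")
num_pairs = pairs.shape[0]

# Columns shared by both copies get the suffixes _x (left) and _y (right).
print(pairs.get(["name_x", "name_y"]))
print(num_pairs)  # 3
```

The three pairs are (B, A) and (C, A), which meet at 64 inches, and (D, B), which meet at 70 inches.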


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 67%.


Problem 8.2

What is the number of single-height costume pairs? In other words, what number does a correct answer to part (a) evaluate to? Give your answer as an integer.

Answer: 16

This is the value that the expression in part (a) evaluates to when run on antm. Going through the "min" and "max" columns of antm and counting how many (costume A, costume B) pairs satisfy A.min == B.max, we find 16 such pairs. Equivalently, for each "max" value in the table, count how many costumes have that value as their "min", then sum those counts across all rows; the total is 16.


Difficulty: ⭐️⭐️⭐️

The average score on this problem was 54%.


Problem 8.3

The last episode of the season will reveal the mascot roles to be featured next season. Next season’s mascots are stored in a DataFrame called antm2 which has the same column names as antm, but contains different data (i.e. a different set of mascots).

We don’t have all of antm2 available, but do know some information about it. First, we know that antm2.merge(antm2, on="min") gives a DataFrame with 61 rows.

Additionally, we have the table at right, which shows the number of mascots in antm2 with each minimum height requirement, except one value is missing. How many mascots in antm2 have a minimum height requirement of 66 inches? Give your answer as an integer.

Answer: 7

When we merge a DataFrame with itself on "min", every row pairs up with every row that shares the same "min" value, including itself. So if a particular "min" value appears n times in antm2, it contributes n \times n = n^2 rows to the merged DataFrame.

The total number of rows is therefore the sum of squares of the counts in the table:

n_1^2 + n_2^2 + \dots + x^2 = 61

where x is the missing count for "min" = 66. Plugging in the known counts from the table, the squares of the visible values sum to 12, leaving:

x^2 = 61 - 12 = 49 \implies x = 7

So 7 mascots in antm2 have a minimum height requirement of 66 inches.
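The sum-of-squares rule can be checked on a small example, and the last step of the arithmetic can be done in code. The toy table below is hypothetical; only the numbers 61 and 12 come from the problem and solution above. The sketch uses plain pandas in place of babypandas.

```python
import math
import pandas as pd

# Hypothetical table: "min" of 60 appears twice, 62 appears once.
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "min": [60, 60, 62],
    "max": [70, 72, 74],
})

merged_rows = toy.merge(toy, on="min").shape[0]

# Count the rows for each "min" value, square each count, and add them up.
counts = toy.groupby("min").size()
sum_of_squares = int((counts ** 2).sum())
print(merged_rows, sum_of_squares)  # 5 5

# For antm2: 61 total rows, minus 12 from the visible counts, leaves x^2.
x = math.isqrt(61 - 12)
print(x)  # 7
```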


Difficulty: ⭐️⭐️⭐️⭐️

The average score on this problem was 32%.


