Spring 2025 Final Exam



Instructor(s): Janine Tiefenbruck

This exam was administered in-person. Students were allowed one page of double-sided handwritten notes. No calculators were allowed. Students had 3 hours to take this exam.


The Hunger Games is a young adult dystopian fictional novel. The events take place in the future in the fictional country of Panem, which consists of 12 impoverished districts and a wealthy metropolitan area called the Capitol.

The plot centers around an annual televised competition called the Hunger Games, in which children from the districts are forced to compete in a battle to the death. The participants, called tributes, are randomly selected via a lottery system.

The competition takes place in an arena that is specially designed for this purpose, and usually lasts several days or weeks, until only one tribute survives. The entire event is turned into a spectacle, which is broadcast throughout Panem as a way for the oppressive government to remind district residents of their powerlessness.

The protagonist of The Hunger Games is Katniss Everdeen, a 16-year-old girl from District 12 who competes in the 74th annual Hunger Games. After a life of poverty and near starvation, her experience in the Hunger Games arena further fuels her hatred of the government and lights a fire in her to fight back against the oppression.

Notes:


Problem 1

In an annual ceremony known as the reaping, tributes are selected to represent their district in the Hunger Games. One male and one female tribute from each district are randomly selected via a lottery drawing.

Every child between the ages of 12 and 18 (inclusive) has tickets entered into the drawing for their sex and district (e.g. girls from District 12). The number of tickets entered is dependent on age.

Starting at age 12, each child receives one ticket in the lottery. For each year after that, they receive one additional ticket, added to the total from the previous year. For example, 13-year-olds have two tickets, 14-year-olds have three tickets, and so on.

In this problem, we will consider only tickets corresponding to girls from District 12, and look at the distribution of these tickets according to the age of the person they represent. A density histogram for these tickets is shown below.


Problem 1.1

Which of the following statements about this distribution is correct?

Answer: The mean is less than the median

The histogram shows that most of the tickets belong to older girls, especially those aged 17 and 18, and that comparatively few tickets belong to younger girls. The distribution is therefore skewed left: the small values in the tail pull the mean down, so the mean is less than the median.


The histogram from the previous page is repeated below for your reference.


Problem 1.2

Suppose the rules of the Hunger Games were changed to eliminate 18-year-olds. If we plotted a new density histogram of the distribution of ages for tickets corresponding to girls from District 12 aged 12 to 17, how would the height of the [13, 14) bar change?

Let h be the height of this bar in the original histogram. Give its height in the new histogram in terms of h.

Answer: 4/3 * h

The total area of a density histogram is 1. When the 18-year-olds are removed, the remaining bars must be rescaled so that the total area is still 1. The bar being removed has a height of 0.25 and a width of 1, so it accounts for 1/4 of the tickets, and the remaining data is exactly 3/4 of the original data. To rescale, we divide the height h of each remaining bar by 3/4, which is the same as multiplying h by 4/3.
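As a quick numeric check of the rescaling, here is a minimal sketch. The only value taken from the histogram is the 0.25 height of the [18, 19) bar; the value of h below is a made-up placeholder, since the result does not depend on it.

```python
# Height of the [18, 19) bar, as stated in the solution above.
removed_height = 0.25
bin_width = 1

# Area that remains once the 18-year-olds' bar is removed.
remaining_area = 1 - removed_height * bin_width

# Hypothetical original height of the [13, 14) bar (not read from the plot).
h = 0.08

# Rescale so that the remaining bars again have a total area of 1.
new_h = h / remaining_area
scale = new_h / h
print(scale)
```

Whatever value h takes, the ratio new_h / h is 1 / 0.75, or 4/3.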



Problem 1.3

What is the most common age among girls from District 12 aged 12 to 18? Remember, the distribution above is for all tickets, and older girls have more tickets.

Answer: 15

Each bar's height represents the proportion of tickets belonging to girls of that age, but girls of different ages hold different numbers of tickets. To compare the number of girls at each age, we divide the height of each bar by the number of tickets a girl of that age receives (1 for age 12, 2 for age 13, and so on up to 7 for age 18). The result is proportional to the number of girls of that age. The bar for age 15 gives the largest value, 0.045, so 15 is the most common age.
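The idea of dividing each bar's height by the tickets per girl can be sketched as follows. The counts of girls are hypothetical, chosen only to illustrate the technique; they are not the data behind the exam's histogram.

```python
import numpy as np

ages = np.arange(12, 19)       # ages 12 through 18
tickets_per_girl = ages - 11   # 1 ticket at age 12, up to 7 at age 18

# Hypothetical number of girls at each age (an assumption for illustration).
girls = np.array([10, 12, 14, 18, 15, 11, 9])

# Tickets contributed by each age group, and the density-histogram heights.
# With bins of width 1, a bar's height equals its proportion of all tickets.
tickets = girls * tickets_per_girl
heights = tickets / tickets.sum()

# Undo the ticket weighting: this is proportional to the number of girls.
relative_girls = heights / tickets_per_girl

tallest_bar_age = ages[np.argmax(heights)]
most_common_age = ages[np.argmax(relative_girls)]
print(tallest_bar_age, most_common_age)
```

In this made-up data the tallest bar is at age 16, yet the most common age is 15, which shows why the heights cannot be compared directly.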



Problem 2

As we saw in the last problem, children aged 12 to 18 (inclusive) have tickets entered into a drawing at the reaping. 12-year-olds have one ticket, 13-year-olds have two tickets, 14-year-olds have three tickets, and so on, gaining one ticket per year of age.

In this problem, we’ll look at the ages of all boys from District 3 and determine the probability that a boy of a certain age is selected in the drawing.


Problem 2.1

Suppose that there are only five boys from District 3 and their ages are as follows (in no particular order):

17, 12, 15, 14, 12.

Determine the probability that a 17-year-old is selected in the drawing.

Give your answer as an unsimplified fraction where the numerator is the number of tickets corresponding to a 17-year-old and the denominator is the total number of tickets.

Answer: 6/15

A shortcut for finding how many tickets each child has is to subtract 11 from their age. Doing this for the five boys gives 6, 1, 4, 3, and 1 tickets, for a total of 15 tickets. The 17-year-old holds 6 of them, so the unsimplified fraction is 6/15.


Problem 2.2

Now, we’ll solve the problem more generally. Fill in the blanks below to define a function pick_prob that takes as input an array containing the ages of all boys in District 3, and a single age between 12 and 18 (inclusive). The function should return the probability of randomly selecting a boy of that age during the reaping.

def pick_prob(ages, one_age):
    age_tickets = __(a)__
    total_tickets = __(b)__
    return age_tickets / total_tickets

Answer:

(a): sum((ages == one_age) * (one_age - 11))

(b): (ages - 11).sum()

In part (a), we find how many tickets belong to boys of the given age. The comparison ages == one_age produces an array of Booleans that is True for each boy of that age; multiplying by one_age - 11, the number of tickets each such boy holds, and summing gives the total number of tickets for that age.

In part (b), we find the total number of tickets by converting every age in the input to its corresponding number of tickets and adding them all up.
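With both blanks filled in, the function can be checked against the five boys from Problem 2.1. This is a minimal sketch; the variable names district_3 and prob_17 are introduced here for illustration only.

```python
import numpy as np

def pick_prob(ages, one_age):
    # Tickets held by boys of the given age: each match contributes one_age - 11.
    age_tickets = sum((ages == one_age) * (one_age - 11))
    # Tickets held by all boys: each boy contributes his age minus 11.
    total_tickets = (ages - 11).sum()
    return age_tickets / total_tickets

# Ages from Problem 2.1.
district_3 = np.array([17, 12, 15, 14, 12])
prob_17 = pick_prob(district_3, 17)
print(prob_17)
```

The result agrees with the 6/15 found by hand in Problem 2.1.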


Problem 2.3

Using pick_prob, write one line of code that evaluates to the probability that a 14-year-old boy is not chosen during the reaping if the boys in District 3 are aged 12, 14, 14, 15, 17, and 18.

Answer: 1 - pick_prob(np.array([12, 14, 14, 15, 17, 18]), 14)

To solve this problem we can simply take the complement of the probability that we do select a 14-year-old boy during the reaping. With the given ages of the boys in the problem statement, we can use the function we defined above to calculate this. As a result, to arrive at our answer we can simply take 1 - pick_prob(np.array([12, 14, 14, 15, 17, 18]), 14).
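The same probability can be worked out by hand from the ticket counts, which gives a value to compare against. A minimal sketch, with variable names chosen for illustration:

```python
import numpy as np

ages = np.array([12, 14, 14, 15, 17, 18])
tickets = ages - 11                         # tickets per boy: 1, 3, 3, 4, 6, 7

tickets_for_14 = tickets[ages == 14].sum()  # both 14-year-olds together
total_tickets = tickets.sum()

# Complement rule: P(not a 14-year-old) = 1 - P(a 14-year-old).
prob_not_14 = 1 - tickets_for_14 / total_tickets
print(prob_not_14)
```

The two 14-year-olds hold 6 of the 24 tickets, so the probability that neither is chosen is 1 - 6/24.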



Problem 3

So far, we have seen one way that children have tickets entered into the reaping: they receive one ticket when they are 12 years old, and then each year thereafter, an additional ticket is added onto the previous year’s total. This means 13-year-olds have two tickets, 14-year-olds have three tickets, and so on. We’ll call these tickets age tickets.

In this problem only, we’ll consider another way that a child may choose to enter tickets into the reaping in addition to the mandatory age tickets. If a child wishes, they can guarantee food rations for their family members, including themselves, at the price of one ticket per person. We’ll call these tickets food tickets. Like age tickets, food tickets are compounded each year, adding onto last year’s total.

As an example, let’s calculate the number of tickets that Katniss Everdeen has entered into the drawing at the reaping. Katniss is 16 years old, and every year, she has bought food for 3 family members (herself, her mother, and her sister Prim). This means:

This pattern continues, and by the time Katniss is 16, she has 20 tickets.

In other words, Katniss had 4 tickets entered when she was 12 years old, and 4 more with each passing year. The array np.arange(4, 24, 4) contains the number of tickets Katniss entered each year, starting at age 12, up to and including her current age of 16 years old.


Problem 3.1

Fill in the blanks below to define the function tix_array which takes in a child’s current age between 12 and 18 (inclusive) and a number of family members, k. The function returns an array similar to Katniss’s array above, representing the number of tickets they entered into the reaping each year since they were 12 years old, assuming that they buy food for their whole family every year.

Tip: tix_array(16, 3) should be the same as the array np.arange(4, 24, 4).

def tix_array(age, k):
    return np.arange(__(a)__, __(b)__, __(c)__)

Answer:

(a): k + 1

(b): (age - 10) * (k + 1)

(c): k + 1

At age 12, each child gets 1 age ticket and k food tickets, so the array must start at k + 1, the number of tickets they have when they first enter. Next, we want the array to go up to and include the child's current age. A child of the given age has been entered for age - 11 years, so the last value should be (age - 11) * (k + 1). Because np.arange excludes its stop value, we stop at (age - 10) * (k + 1), one step past the last value we want. Finally, each year the child adds the same number of tickets: one for the increase in age and another k for food. Thus, the step is k + 1.
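Filling in the three blanks, with the stop value written in terms of the parameter age, gives the sketch below, which is checked against the tip from the problem statement.

```python
import numpy as np

def tix_array(age, k):
    # Start at k + 1 tickets (age 12), add k + 1 per year, and stop one step
    # past the current year's total because np.arange excludes its stop value.
    return np.arange(k + 1, (age - 10) * (k + 1), k + 1)

katniss = tix_array(16, 3)
print(katniss)
```

For Katniss, this produces the same array as np.arange(4, 24, 4), ending at her current total of 20 tickets.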


Problem 3.2

The DataFrame reaping contains information on the children of District 12 between the ages of 12 and 18. For each child, we have their "name", "age", "family_size" which includes themselves, and a boolean variable "buying_food". A value of True means the child always buys food for their entire family, and False means the child never buys food for anyone. The first few rows of reaping are shown below, but there are many more rows than pictured.

Fill in the blanks in the code below to add a new column, "tickets", to reaping that contains the number of tickets that the child will have entered into the drawing in the current year.

Hint: In Python, True is treated as 1 and False is treated as 0 when doing arithmetic!

tickets_per_year = __(d)__ * __(e)__ + 1
current_tickets = tickets_per_year * (__(f)__)
reaping = reaping.assign(tickets = current_tickets)

Answer:

(d): reaping.get("buying_food")

(e): reaping.get("family_size")

(f): reaping.get("age") - 11

The "buying_food" column tells us whether a child buys extra food rations. It contains Boolean values, and because Python treats True as 1 and False as 0 in arithmetic, we can use this column directly in the calculation.

We use "family_size" because the child needs one food ticket for each family member. Multiplying it by the "buying_food" column gives the correct number of food tickets per year, which is either the full family size (if they are buying food) or 0 (if they are not). Adding 1 accounts for the age ticket.

Finally, we use our shortcut to calculate the number of years a child has been entering tickets: subtract 11 from each child's age in the "age" column.
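To see the calculation on concrete values, here is a minimal sketch. It uses pandas in place of babypandas (an assumption; the .get and .assign calls used here behave the same way in both), and every row except Katniss's is made up for illustration.

```python
import pandas as pd

# Hypothetical rows; only Katniss's values come from the problem statement.
reaping = pd.DataFrame({
    "name": ["Katniss", "Child A", "Child B"],
    "age": [16, 12, 18],
    "family_size": [3, 5, 2],
    "buying_food": [True, False, True],
})

# Food tickets per year (0 if not buying food) plus 1 age ticket per year.
tickets_per_year = reaping.get("buying_food") * reaping.get("family_size") + 1
# Multiply by the number of years the child has been entered.
current_tickets = tickets_per_year * (reaping.get("age") - 11)
reaping = reaping.assign(tickets=current_tickets)
print(reaping)
```

Katniss ends up with (3 + 1) * 5 = 20 tickets, matching the example in the problem setup, while the 12-year-old who buys no food has a single ticket.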


Problem 3.3

For this subpart, assume that the tix_array function was defined correctly in Problem 3.1, and that the "tickets" column was added correctly to the reaping DataFrame in Problem 3.2. Fill in the blanks in the code below so that the following expression evaluates to True.

reaping.get("tickets").iloc[7] == tix_array(__(g)__, __(h)__)[-1]

Answer:

(g): reaping.get("age").iloc[7]

(h): reaping.get("family_size").iloc[7]

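The left side of the comparison accesses the number of tickets for the child at integer position 7, which is the eighth row of reaping. The last element of tix_array's output is the number of tickets a child has in the current year, so we pass in that same child's current age and family size, which we extract from their respective columns with .iloc[7]. Note that tix_array assumes the child buys food every year, so the comparison holds when this child's "buying_food" value is True.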
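The claim that the last element of tix_array's output equals the current-year ticket count can be checked directly. The sketch below compares it with (k + 1) * (age - 11), the "tickets" formula for a child who buys food for k family members; the range of k values tried is an arbitrary choice.

```python
import numpy as np

def tix_array(age, k):
    return np.arange(k + 1, (age - 10) * (k + 1), k + 1)

# Compare the final entry with the per-year total times years entered,
# for every allowed age and a handful of family sizes.
all_match = all(
    tix_array(age, k)[-1] == (k + 1) * (age - 11)
    for age in range(12, 19)
    for k in range(0, 6)
)
print(all_match)
```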



Problem 4

After being selected at the reaping, tributes are transported to the Capitol to prepare for the Hunger Games. While they are there, they attend a training camp to practice skills that might be helpful in the arena. At the training camp, there are 8 different stations such as camouflage, knife throwing, archery, plant identification, etc. At each of the 8 stations, tributes are scored on their skills from 1 to 10.

These 8 scores are combined into an overall score as follows:

Overall scores therefore range from 0 to 12. Which of the following functions takes as input an array containing a tribute’s 8 scores from the stations and correctly outputs their overall score? Select all that apply.

Hint: In Python, True + True evaluates to 2.

def function1(stations):
    overall = 0
    for score in stations:
        if score > 5:
            overall = overall + 1
        if score > 8:
            overall = overall + 1
        if overall >= 12:
            return 12
    return overall

def function2(stations):
    overall = 0
    for score in stations:
        if score > 5:
            overall = overall + 1
        elif score > 8:
            overall = overall + 2
    return min(overall, 12)

def function3(stations):
    overall = 0
    for i in np.arange(8):
        if stations[i] > 8:
            overall = overall + 2
        elif stations[i] > 5:
            overall = overall + 1
    return min(overall, 12)

def function4(stations):
    overall = 0
    for score in stations:
        add = score > 5
        add = (score > 8) + add
        overall = overall + add
    return min(overall, 12)

def function5(stations):
    return min(12, np.count_nonzero(stations > 5) + np.count_nonzero(stations > 8))

Answer: Functions 1, 3, 4, and 5

The best way to go about this problem is individually check each function for correctness.

Function 1 correctly checks for scores > 5 and adds 1 for each valid instance. It then checks again if the same score is > 8 and adds another 1 to the total count. Finally there is a cap set to 12 to use if needed.

Function 2 uses the elif conditional so if score > 5, it skips the score > 8 check. This means that scores > 8 only get the plus 1 rather than the plus 2. Thus, this function is incorrect.

Function 3 iterates through all 8 stations. If the score is greater than 8, it adds 2; otherwise (using elif), if the score is greater than 5, it adds 1. The total is capped at 12 using the min function. Thus, this is a correct implementation.

Function 4 is correct because it adds 1 point if the score is greater than 5 and then adds 1 more point if it's also greater than 8. This works because the comparisons evaluate to Booleans, which behave as 0 and 1 in arithmetic. This is exactly what we are looking for in the problem statement.

Function 5 provides an elegant one-liner for the same problem, using the count_nonzero function to count the number of scores greater than 5 and then adding that to the number of scores greater than 8. Together, this does the same thing as adding 1 for each score greater than 5 but at most 8, and adding 2 for each score greater than 8.
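Running a correct function and the incorrect one on the same input makes the difference concrete. The sketch below copies function2 and function5 from the problem; the stations array is a made-up set of scores.

```python
import numpy as np

def function2(stations):
    overall = 0
    for score in stations:
        if score > 5:
            overall = overall + 1
        elif score > 8:  # never reached: any score above 8 is also above 5
            overall = overall + 2
    return min(overall, 12)

def function5(stations):
    return min(12, np.count_nonzero(stations > 5) + np.count_nonzero(stations > 8))

# Hypothetical scores: five are above 5, and three of those are above 8.
stations = np.array([9, 10, 6, 3, 7, 9, 5, 2])

correct = function5(stations)  # 5 + 3
buggy = function2(stations)    # counts each high score only once
print(correct, buggy)
```

Function 5 gives 8 for these scores, while Function 2 gives only 5 because its elif branch never runs.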


Problem 5

The night before the Hunger Games begins, each tribute is interviewed in front of a live audience. During this interview, the host asks each tribute a few personal questions and reveals their overall score from the training camp. These interviews are broadcast across the country, so that the residents of Panem can get to know the tributes better and form opinions about who they want to win.

The Capitol wants to understand public perceptions of the tributes after the interviews for the 74th Hunger Games. They conduct a survey of a sample of residents from all 12 districts, asking them two questions:

  1. “What district do you live in?”

  2. “Who do you think will win this year’s Hunger Games?”

The survey results are in the DataFrame survey, with columns "District" and "Tribute" which contain each person’s answers to the two questions above. The first few rows of survey are shown below.

In this problem, we will try to estimate the proportion of residents from a given district who think a certain tribute will win the Hunger Games.


Problem 5.1

What proportion of residents in District 11 think Peeta will win? Write one line of code that evaluates to this proportion in our sample, based on the data in survey.

Answer: survey[(survey.get("Tribute") == "Peeta") & (survey.get("District") == 11)].shape[0] / survey[survey.get("District") == 11].shape[0]

This question is a matter of querying. For the numerator, we want everyone who answered the survey, is from District 11, and thinks Peeta will win. We query on those two conditions and use .shape[0] to count the rows. For the denominator, we want everyone from District 11 who answered the survey, so we query on that condition alone and again use .shape[0].
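Here is the same query run on a tiny example. The sketch uses pandas in place of babypandas (an assumption; the syntax used is shared by both), and the survey rows are invented for illustration.

```python
import pandas as pd

# Hypothetical survey responses.
survey = pd.DataFrame({
    "District": [11, 11, 11, 11, 12, 12, 4],
    "Tribute": ["Peeta", "Rue", "Peeta", "Katniss", "Katniss", "Peeta", "Katniss"],
})

# Numerator: respondents from District 11 who picked Peeta.
peeta_11 = survey[(survey.get("Tribute") == "Peeta") & (survey.get("District") == 11)]
# Denominator: all respondents from District 11.
from_11 = survey[survey.get("District") == 11]

prop = peeta_11.shape[0] / from_11.shape[0]
print(prop)
```

Two of the four District 11 respondents chose Peeta, so the proportion is 0.5. The Peeta vote from District 12 is not counted.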


Problem 5.2

Next, we want to create a 95% confidence interval for the proportion of all residents from a given district who think a certain tribute will win. Fill in the blanks in the function win_ci below. This function takes the name of a tribute and the number of a district and returns the endpoints of a 95% bootstrapped confidence interval for the proportion of all residents of that district who think that tribute will win, based on the data in survey.

For example win_ci("Peeta", 11) returns the endpoints of a 95% confidence interval for the proportion of all residents from District 11 who think Peeta will win.

def win_ci(tribute, district):
    only_district = survey[survey.get("District") == district]
    props = np.array([])
    for i in np.arange(10000):
        resample = __(a)__
        tribute_count = __(b)__
        boot_prop = tribute_count / __(c)__
        props = np.append(props, boot_prop)
    return [np.percentile(props, 2.5), np.percentile(props, 97.5)]

(a): only_district.sample(only_district.shape[0], replace=True)

For the first blank, we create a bootstrapped sample from just the rows in the given district, which are stored in the only_district DataFrame. Bootstrapping means sampling with replacement, so we call .sample with replace=True. The resample must have the same number of rows as only_district, so we set the size argument to only_district.shape[0].

(b): resample[resample.get("Tribute") == tribute].shape[0]

Now we want to find how many times the given tribute appears in the bootstrapped sample. To do that, we query the resample for the given tribute and then count the rows of the result using .shape[0].

(c): resample.shape[0]

The denominator of the bootstrapped proportion is the total number of people in the resample, so we use .shape[0] to get the size of the resample.
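Putting the three blanks together gives a function that can be run end to end. This is a sketch under stated assumptions: pandas stands in for babypandas, the survey data is invented, and a reps parameter (not part of the original function) is added so that fewer repetitions can be used to keep the example fast.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # for a repeatable illustration

# Hypothetical survey: 50 respondents from District 11, 30 of whom chose Peeta.
survey = pd.DataFrame({
    "District": [11] * 50 + [12] * 20,
    "Tribute": ["Peeta"] * 30 + ["Rue"] * 20 + ["Katniss"] * 20,
})

def win_ci(tribute, district, reps=1000):
    only_district = survey[survey.get("District") == district]
    props = np.array([])
    for i in np.arange(reps):
        # (a) resample the district's rows with replacement, same size
        resample = only_district.sample(only_district.shape[0], replace=True)
        # (b) count the rows in the resample for this tribute
        tribute_count = resample[resample.get("Tribute") == tribute].shape[0]
        # (c) divide by the size of the resample
        boot_prop = tribute_count / resample.shape[0]
        props = np.append(props, boot_prop)
    return [np.percentile(props, 2.5), np.percentile(props, 97.5)]

lower, upper = win_ci("Peeta", 11)
print(lower, upper)
```

The interval should surround the observed proportion of 30/50 = 0.6, since the bootstrapped proportions are centered near the sample's own proportion.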


Problem 5.3

Suppose we were to plot a histogram of props within the function win_ci. Which of the following best describes this histogram?

Answer: The histogram is roughly normal because of the Central Limit Theorem (CLT).

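The array props holds the proportions computed from 10,000 bootstrap resamples, so its histogram shows the distribution of the resampled proportion. A proportion is the mean of a collection of 0s and 1s, and by the CLT, the distribution of a sample mean is roughly normal regardless of the shape of the original data, provided the sample is large enough.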


Problem 5.4

Suppose we now compute the following:

win_ci("Katniss", 4)

[0.25, 0.72]

win_ci("Katniss", 12)

[0.50, 0.70]

Which of the following reasons best explains why the second interval is narrower than the first?

Answer: There are more survey participants from District 12 than District 4.

Confidence intervals get narrower when there is an increase in sample size. This is because the variation present in the bootstrapped estimates is smaller. Therefore, we can say there were more survey participants from District 12 than District 4.


Problem 5.5

Suppose we want to redo our survey so that our confidence interval for the proportion of District 12 residents who think Katniss will win has a width of at most 0.10. We will assume that our new sample’s standard deviation will be the same as our original sample’s standard deviation. Which of the following best describes how to achieve this?

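Answer: Our new sample should have four times as many people from District 12. It doesn’t matter how many people are from other districts.

The current interval, [0.50, 0.70], has a width of 0.20, so we need to cut the width in half. The width of our confidence interval is proportional to the standard deviation of the sample proportion, which scales as $\frac{1}{\sqrt{n}}$, where n is the number of respondents used to build the interval. Since win_ci uses only the rows from the given district, n here is the number of respondents from District 12. To halve the width, $\sqrt{n}$ must increase by a factor of 2. Therefore, we need 4 times as much data from District 12, as the square root of 4 is 2.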
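The square-root relationship can be checked numerically. The sketch below assumes the CLT-based approximation that a 95% confidence interval is about 4 standard deviations of the sample proportion wide, that is, 4 * SD / sqrt(n). The values of sd and n are hypothetical, chosen so that the starting width is 0.20.

```python
import numpy as np

def approx_ci_width(sd, n):
    # Approximate width of a 95% CI: 2 SDs of the sample mean on each side.
    return 4 * sd / np.sqrt(n)

sd = 0.5  # hypothetical sample standard deviation, held fixed
n = 100   # hypothetical number of respondents the interval is built from

old_width = approx_ci_width(sd, n)
new_width = approx_ci_width(sd, 4 * n)
print(old_width, new_width)
```

Quadrupling n while holding the standard deviation fixed cuts the width from 0.20 to 0.10.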


