Discussion 4: Functions, DataFrames, and Control Flow



The problems in this worksheet are taken from past exams. Work on them on paper, since the exams you take in this course will also be on paper.

We encourage you to complete this worksheet in a live discussion section. Solutions will be made available after all discussion sections have concluded. You don’t need to submit your answers anywhere.

Note: We do not plan to cover all problems here in the live discussion section; the problems we don’t cover can be used for extra practice.


Problem 1

In the ikea DataFrame, the first word of each string in the 'product' column represents the product line. For example, the HEMNES line of products includes several different products, such as beds, dressers, and bedside tables.

The code below assigns a new column to the ikea DataFrame containing the product line associated with each product.

(ikea.assign(product_line = ikea.get('product')
                                .apply(extract_product_line)))


Problem 1.1

What are the input and output types of the extract_product_line function?


Problem 1.2

Complete the return statement in the extract_product_line function below.

For example, extract_product_line('HEMNES Daybed frame with 3 drawers, white, Twin') should return 'HEMNES'.

def extract_product_line(x):
    return _________

What goes in the blank?



Problem 2

Complete the implementation of the to_minutes function below. This function takes as input a string formatted as 'x hr, y min' where x and y represent integers, and returns the corresponding number of minutes, as an integer (type int in Python).

For example, to_minutes('3 hr, 5 min') should return 185.

def to_minutes(time):
    first_split = time.split(' hr, ')
    second_split = first_split[1].split(' min')
    return _________

What goes in the blank?
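
To see what the two given lines produce, the short sketch below traces the intermediate values for the example input. It only inspects first_split and second_split; the return expression is left for you to complete.

```python
# Trace the two split calls from to_minutes on the example input.
time = '3 hr, 5 min'

first_split = time.split(' hr, ')             # split around the ' hr, ' separator
second_split = first_split[1].split(' min')   # split the remainder around ' min'

print(first_split)    # ['3', '5 min']
print(second_split)   # ['5', '']
```

Note that every element of both lists is a string, not an integer.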


Problem 3

Consider the function tom_nook, defined below. Recall that if x is an integer, x % 2 is 0 if x is even and 1 if x is odd.

def tom_nook(crossing):
    bells = 0
    for nook in np.arange(crossing):
        if nook % 2 == 0:
            bells = bells + 1
        else:
            bells = bells - 2
    return bells

What value does tom_nook(8) evaluate to?
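
As a reminder of how the % operator behaves, the sketch below lists the remainders for the first few non-negative integers. It uses Python's built-in range; np.arange(5) iterates over the same integers 0 through 4. The name remainders is used only for this illustration.

```python
# Remainders of 0, 1, 2, 3, 4 when divided by 2.
remainders = [n % 2 for n in range(5)]

print(remainders)   # [0, 1, 0, 1, 0]
```

The sequence starts at 0, so the first value visited is even.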


Problem 4

The DataFrame evs consists of 32 rows, each of which contains information about a different EV model.

The first few rows of evs are shown below.




We also have a DataFrame that contains the distribution of “BodyStyle” for all “Brands” in evs, other than Nissan.

Suppose we’ve run the following few lines of code.

tesla = evs[evs.get("Brand") == "Tesla"]
bmw = evs[evs.get("Brand") == "BMW"]
audi = evs[evs.get("Brand") == "Audi"]

combo = tesla.merge(bmw, on="BodyStyle").merge(audi, on="BodyStyle")

How many rows does the DataFrame combo have?
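
As background for merge questions like this one, here is a minimal sketch of how the row count of a default (inner) merge can be reasoned about: each row on the left pairs with every row on the right that shares its key. The helper inner_merge_row_count and the toy key lists are hypothetical and are not taken from evs.

```python
from collections import Counter

def inner_merge_row_count(left_keys, right_keys):
    """Count the rows an inner merge would produce, given each side's key column."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    # Each left row pairs with every right row that has the same key.
    # Keys missing from the right side contribute 0 rows.
    return sum(left_counts[key] * right_counts[key] for key in left_counts)

left = ['a', 'a', 'b']          # toy key column of the left table
right = ['a', 'b', 'b', 'c']    # toy key column of the right table

print(inner_merge_row_count(left, right))   # 2*1 + 1*2 = 4
```

For a chain of merges, the same reasoning can be applied one merge at a time, using the key column of the intermediate result.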


Problem 5

The sums function takes in an array of numbers and outputs the cumulative sum for each item in the array. The cumulative sum for an element is the current element plus the sum of all the previous elements in the array.

For example:

>>> sums(np.array([1, 2, 3, 4, 5]))
array([1, 3, 6, 10, 15])
>>> sums(np.array([100, 1, 1]))
array([100, 101, 102])

The incomplete definition of sums is shown below.

def sums(arr):
    res = ___(a)___
    res = np.append(res, arr[0])
    for i in ___(b)___:
        res = np.append(res, ___(c)___)
    return res
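
The definition above reassigns res each time it calls np.append. The sketch below, using made-up arrays named original and longer, shows why: np.append returns a new array and leaves its input unchanged.

```python
import numpy as np

original = np.array([10, 20])
longer = np.append(original, 30)   # returns a new array with 30 at the end

print(original)   # [10 20]
print(longer)     # [10 20 30]
```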


Problem 5.1

Fill in blank (a).


Problem 5.2

Fill in blank (b).


Problem 5.3

Fill in blank (c).



Problem 6

Teresa and Sophia are bored while waiting in line at Bistro and decide to start flipping a UCSD-themed coin, with a picture of King Triton’s face as the heads side and a picture of his mermaid-like tail as the tails side.


Teresa flips the coin 21 times and sees 13 heads and 8 tails. She stores this information in a DataFrame named teresa that has 21 rows and 2 columns, such that:

Then, Sophia flips the coin 11 times and sees 4 heads and 7 tails. She stores this information in a DataFrame named sophia that has 11 rows and 2 columns, such that:


Problem 6.1

How many rows are in the following DataFrame? Give your answer as an integer.

    teresa.merge(sophia, on="flips")

Hint: The answer is less than 200.



Problem 6.2

Let A be your answer to the previous part. Now, suppose that:

Suppose we again merge teresa and sophia on the "flips" column. In terms of A, how many rows are in the new merged DataFrame?



Problem 7

In recent years, there has been an explosion of board games that teach computer programming skills, including CoderMindz, Robot Turtles, and Code Monkey Island. Many such games were made possible by Kickstarter crowdfunding campaigns.

Suppose that in one such game, players must prove their understanding of functions and conditional statements by answering questions about the function wham, defined below. Like players of this game, you’ll also need to answer questions about this function.

1 def wham(a, b):
2   if a < b:
3       return a + 2
4   if a + 2 == b:
5       print(a + 3)
6       return b + 1
7   elif a - 1 > b:
8       print(a)
9       return a + 2
10  else:
11      return a + 1
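
Two rules are useful when tracing a function shaped like this one: a return statement ends the call immediately, and an elif or else belongs to the nearest preceding if at the same indentation. The toy function describe below is hypothetical and is not related to wham; it only illustrates those two rules.

```python
def describe(n):
    if n < 0:
        return 'negative'        # the call ends here for negative n
    if n == 0:                   # a fresh if, reached only when n >= 0
        print('checking zero')   # runs before the value is returned
        return 'zero'
    elif n > 100:                # checked only when n == 0 is False
        return 'large'
    else:
        return 'small positive'

print(describe(-5))    # negative
print(describe(500))   # large
print(describe(7))     # small positive
```

Calling print(describe(0)) would display two lines: first the text printed inside the function, then the returned value.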


Problem 7.1

What is printed when we run print(wham(6, 4))?


Problem 7.2

Give an example of a pair of integers a and b such that wham(a, b) returns a + 1.


Problem 7.3

Which of the following lines of code will never be executed, for any input?



Problem 8

We’ll be looking at a DataFrame named sungod that contains information on the artists who have performed at Sun God in years past. For each year that the festival was held, we have one row for each artist that performed that year. The columns are:

The rows of sungod are arranged in no particular order. The first few rows of sungod are shown below (though sungod has many more rows than pictured here).

Assume:

Fill in the blank in the code below so that chronological is a DataFrame with the same rows as sungod, but ordered chronologically by appearance on stage. That is, earlier years should come before later years, and within a single year, artists should appear in the DataFrame in the order they appeared on stage at Sun God. Note that groupby automatically sorts the index in ascending order.

chronological = sungod.groupby(___________).max().reset_index()
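
The note about groupby sorting can be seen on a small example. This sketch assumes pandas (the methods used here behave the same way in babypandas) and a made-up DataFrame named toy with hypothetical columns 'team' and 'score'; it is not the sungod data.

```python
import pandas as pd

toy = pd.DataFrame({
    'team': ['c', 'a', 'b', 'a'],
    'score': [5, 9, 2, 4],
})

# groupby sorts the group labels in ascending order;
# reset_index moves them from the index back into a column.
grouped = toy.groupby('team').max().reset_index()

print(grouped.get('team').tolist())    # ['a', 'b', 'c']
print(grouped.get('score').tolist())   # [9, 2, 5]
```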

Problem 9

Another DataFrame called music contains a row for every music artist that has ever released a song. The columns are:

You want to know how many musical genres have been represented at Sun God since its inception in 1983. Which of the following expressions produces a DataFrame called merged that could help determine the answer?


Problem 10

Consider an artist that has only appeared once at Sun God. At the time of their Sun God performance, we’ll call the artist

Complete the function below so it outputs the appropriate description for any input artist who has appeared exactly once at Sun God.

def classify_artist(artist):
    filtered = merged[merged.get('Artist') == artist]
    year = filtered.get('Year').iloc[0]
    top_hit_year = filtered.get('Top_Hit_Year').iloc[0]
    if ___(a)___ > 0:
        return 'up-and-coming'
    elif ___(b)___:
        return 'outdated'
    else:
        return 'trending'


Problem 10.1

What goes in blank (a)?



Problem 10.2

What goes in blank (b)?



Problem 11

King Triton, UCSD’s mascot, is quite the traveler! For this question, we will be working with the flights DataFrame, which details several facts about each of the flights that King Triton has been on over the past few years. The first few rows of flights are shown below.


Here’s a description of the columns in flights:

Suppose we create a DataFrame called socal containing only King Triton’s flights departing from SAN, LAX, or SNA (John Wayne Airport in Orange County). socal has 10 rows; the bar chart below shows how many of these 10 flights departed from each airport.


Consider the DataFrame that results from merging socal with itself, as follows:

double_merge = socal.merge(socal, left_on='FROM', right_on='FROM')

How many rows does double_merge have?


Problem 12

We define a “route” to be a departure and arrival airport pair. For example, all flights from 'SFO' to 'SAN' make up the “SFO to SAN route”. This is different from the “SAN to SFO route”.

Fill in the blanks below so that most_frequent.get('FROM').iloc[0] and most_frequent.get('TO').iloc[0] correspond to the departure and destination airports of the route that King Triton has spent the most time flying on.

most_frequent = flights.groupby(__(a)__).__(b)__
most_frequent = most_frequent.reset_index().sort_values(__(c)__)


Problem 12.1

What goes in blank (a)?


Problem 12.2

What goes in blank (b)?


Problem 12.3

What goes in blank (c)?



Problem 13

We define the seasons as follows:

Season    Months
Spring    March, April, May
Summer    June, July, August
Fall      September, October, November
Winter    December, January, February


Problem 13.1

We want to create a function date_to_season that takes in a date as formatted in the 'DATE' column of flights and returns the season corresponding to that date. Which of the following implementations of date_to_season works correctly? Select all that apply.

Option 1:

def date_to_season(date):
    month_as_num = int(date.split('-')[1])
    if month_as_num >= 3 and month_as_num < 6:
        return 'Spring'
    elif month_as_num >= 6 and month_as_num < 9:
        return 'Summer'
    elif month_as_num >= 9 and month_as_num < 12:
        return 'Fall'
    else:
        return 'Winter'

Option 2:

def date_to_season(date):
    month_as_num = int(date.split('-')[1])
    if month_as_num >= 3 and month_as_num < 6:
        return 'Spring'
    if month_as_num >= 6 and month_as_num < 9:
        return 'Summer'
    if month_as_num >= 9 and month_as_num < 12:
        return 'Fall'
    else:
        return 'Winter'

Option 3:

def date_to_season(date):
    month_as_num = int(date.split('-')[1])
    if month_as_num < 3:
        return 'Winter'
    elif month_as_num < 6:
        return 'Spring'
    elif month_as_num < 9:
        return 'Summer'
    elif month_as_num < 12:
        return 'Fall'
    else:
        return 'Winter' 
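
All three options begin by extracting the month with the same line. The sketch below traces that line on a hypothetical date string; the 'YYYY-MM-DD' format is an assumption based on the split('-') call, since the description of the 'DATE' column is not reproduced here.

```python
date = '2021-11-05'            # hypothetical value, assumed 'YYYY-MM-DD'

parts = date.split('-')        # pieces are strings
month_as_num = int(parts[1])   # the second piece is the month

print(parts)          # ['2021', '11', '05']
print(month_as_num)   # 11
```

Because int ignores a leading zero, a month written as '05' becomes 5.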


Problem 13.2

Assuming we’ve defined date_to_season correctly in the previous part, which of the following lines of code correctly computes the season for each flight in flights?


