Discussion 2: Arrays and DataFrames



The problems in this worksheet are taken from past exams. Work on them on paper, since the exams you take in this course will also be on paper.

We encourage you to complete this worksheet in a live discussion section. Solutions will be made available after all discussion sections have concluded. You don’t need to submit your answers anywhere.

Note: We do not plan to cover all problems here in the live discussion section; the problems we don’t cover can be used for extra practice.


Problem 1

prices is an array of prices, in dollars, of different products at the grocery store. Similarly, calories is an array of the calories in these same products, in the same order.


Problem 1.1

What does type(prices[0]) evaluate to?


Problem 1.2

What does type(calories[0]) evaluate to?


Problem 1.3

When we divide two arrays of the same length, their corresponding elements get divided, and the result is a new array of the same length as the two originals. In one sentence, interpret the meaning of min(prices / calories).
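As a quick illustration of the element-wise division described above, here is a minimal sketch. The arrays distances and hours, and their values, are hypothetical and not part of the problem.

```python
import numpy as np

# Hypothetical arrays (not from the worksheet), both of length 3.
distances = np.array([10.0, 30.0, 45.0])  # miles traveled on three trips
hours = np.array([2.0, 5.0, 9.0])         # hours taken on the same trips

# Element-wise division: 10/2, 30/5, 45/9.
speeds = distances / hours

print(speeds)       # [5. 6. 5.]
print(len(speeds))  # 3, the same length as the two original arrays
```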


Problem 1.4

True or False: min(prices / calories) is the same as max(calories / prices).



Problem 2

Consider the following four assignment statements.

bass = "5"
tuna = 2
sword = ["4.0", 5, 12.5, -10, "2023"]
gold = [4, "6", "CSE", "doc"]


Problem 2.1

What is the value of the expression bass * tuna?


Problem 2.2

Which of the following expressions results in an error?


Problem 2.3

Which of the following expressions evaluates to "DSC10"?



Problem 3

Evaluate the expression (np.arange(1, 7, 2.5) * np.arange(8, 2, -2))[2].


Problem 4

For the problems that follow, we will work with a dataset consisting of various skyscrapers in the US, which we’ve loaded into a DataFrame called sky. The first few rows of sky are shown below (though the full DataFrame has more rows):

 

Each row of sky corresponds to a single skyscraper. For each skyscraper, we have:

Below, identify the data type of the result of each of the following expressions, or select “error” if you believe the expression results in an error.


Problem 4.1

sky.sort_values('height')


Problem 4.2

sky.sort_values('height').get('material').loc[0]


Problem 4.3

sky.sort_values('height').get('material').iloc[0]


Problem 4.4

sky.get('floors').max()


Problem 4.5

sky.index[0]



Problem 5


Problem 5.1

Write a single line of code that evaluates to the name of the tallest skyscraper in the sky DataFrame.


Problem 5.2

Write a single line of code that evaluates to the average number of floors across all skyscrapers in the DataFrame.



Problem 6

Consider the following assignment statement.

puffin = np.array([5, 9, 13, 17, 21])


Problem 6.1

Provide arguments to call np.arange with so that the array penguin is identical to the array puffin.

penguin = np.arange(____)


Problem 6.2

Fill in the blanks so that the array parrot is also identical to the array puffin.
Hint: Start by choosing y so that parrot has length 5.

parrot = __(x)__ * np.arange(0, __(y)__, 2) + __(z)__



Problem 7

Suppose students is a DataFrame of all students who took DSC 10 last quarter. students has one row per student, where:


Problem 7.1

What type is students.get("Overall")? If this expression errors, select “this errors.”


Problem 7.2

What type is students.get("PID")? If this expression errors, select “this errors.”


Vanessa is one student who took DSC 10 last quarter. Her PID is A12345678, she earned the sixth-highest overall percentage grade in the class, and her favorite animal is the giraffe.


Problem 7.3

Supposing that students is already sorted by "Overall" in descending order, fill in the blanks so that animal_one and animal_two both evaluate to "giraffe".

animal_one = students.get(__(x)__).loc[__(y)__]
animal_two = students.get(__(x)__).iloc[__(z)__]


Problem 7.4

If students wasn’t already sorted by "Overall" in descending order, which of your answers would need to change?



Problem 8

You are given a DataFrame called sports, indexed by 'Sport' and containing one column, 'PlayersPerTeam'. The first few rows of the DataFrame are shown below:

Sport          PlayersPerTeam
baseball       9
basketball     5
field hockey   11
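To experiment with a DataFrame shaped like this one, the three rows shown can be rebuilt as in the sketch below. It assumes plain pandas in place of the course's babypandas, and it contains only the rows displayed above, not the full DataFrame.

```python
import pandas as pd

# Only the three rows shown above; the real DataFrame may have more.
# Assumption: plain pandas stands in for the course's babypandas.
sports = pd.DataFrame({
    'Sport': ['baseball', 'basketball', 'field hockey'],
    'PlayersPerTeam': [9, 5, 11],
}).set_index('Sport')

# 'Sport' is the index, so 'PlayersPerTeam' is the only column.
print(list(sports.columns))  # ['PlayersPerTeam']

# .loc looks up a row by its index label.
print(sports.get('PlayersPerTeam').loc['field hockey'])  # 11
```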


Which of the following evaluates to 'basketball'?


Problem 9

Suppose you are given a DataFrame of employees for a given company. The DataFrame, called employees, is indexed by 'employee_id' (string) with a column called 'years' (int) that contains the number of years each employee has worked for the company.
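For practice with a DataFrame of this shape, here is a minimal sketch. The employee IDs and year counts are made up for illustration, and plain pandas is assumed in place of the course's babypandas.

```python
import pandas as pd

# Hypothetical data: the IDs and year counts are invented for illustration.
employees = pd.DataFrame({
    'employee_id': ['1001', '1002', '1003'],
    'years': [3, 12, 7],
}).set_index('employee_id')

# Sort from most to fewest years; the index labels move with their rows.
by_tenure = employees.sort_values(by='years', ascending=False)

print(by_tenure.index[0])          # 1002
print(type(by_tenure.index[0]))    # <class 'str'>, since the IDs are strings
```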


Problem 9.1

Suppose that the code

employees.sort_values(by='years', ascending=False).index[0]

outputs '2476'.

True or False: The number of years that employee 2476 has worked for the company is greater than the number of years that any other employee has worked for the company.


Problem 9.2

What will be the output of the following code?

employees.assign(start=2021-employees.get('years'))
employees.sort_values(by='start').index.iloc[-1]


