These problems are taken from past quizzes and exams. Work on them
on paper, since the quizzes and exams you take in this
course will also be on paper.
We encourage you to complete these
problems during discussion section. Solutions will be made available
after all discussion sections have concluded. You don’t need to submit
your answers anywhere.
Note: We do not plan to cover all of
these problems during the discussion section; the problems we don’t
cover can be used for extra practice.
The following code computes an array containing the unique kinds of dogs that are heavier than 20 kg or taller than 40 cm on average.
foo = df.__(a)__.__(b)__
np.array(foo[__(c)__].__(d)__)
Fill in blank (a).
Fill in blank (b).
Fill in blank (c).
Which of the following should fill in blank (d)?
.index
.unique()
.get('kind')
.get(['kind'])
You have a DataFrame called prices that contains information about food prices at 18 different grocery stores. There is a column called 'broccoli' that contains the price in dollars for one pound of broccoli at each grocery store. There is also a column called 'ice_cream' that contains the price in dollars for a pint of store-brand ice cream.
Using the code,
prices.plot(kind='hist', y='broccoli', bins=np.arange(0.8, 2.11, 0.1), density=True)
we produced the histogram below:
How many grocery stores sold broccoli for a price greater than or equal to $1.30 per pound, but less than $1.40 per pound (the tallest bar)?
Suppose we now plot the same data with different bins, using the following line of code:
prices.plot(kind='hist', y='broccoli', bins=[0.8, 1, 1.1, 1.5, 1.8, 1.9, 2.5], density=True)
What would be the height on the y-axis for the bin corresponding to the interval [\$1.10, \$1.50)? Input your answer below.
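As a reminder of how bar heights work when `density=True`, here is a minimal sketch in plain Python. The list `example_prices`, the bin edges, and the helper `density_heights` are hypothetical and are not taken from the `prices` DataFrame. The sketch assumes every value falls inside the bins and that the last bin includes its right endpoint, as in numpy and matplotlib.

```python
# Hypothetical prices in dollars; NOT the data from the `prices` DataFrame.
example_prices = [0.9, 1.0, 1.2, 1.2, 1.3, 1.4, 1.6, 1.7, 1.9, 2.0]


def density_heights(values, bin_edges):
    """Return the height of each bar in a density histogram.

    Each bin is [left, right), except the last bin, which also includes
    its right endpoint. Assumes every value lies inside the bins.
    """
    total = len(values)
    last = len(bin_edges) - 2  # index of the final bin
    heights = []
    for i in range(len(bin_edges) - 1):
        left, right = bin_edges[i], bin_edges[i + 1]
        if i == last:
            count = sum(left <= v <= right for v in values)
        else:
            count = sum(left <= v < right for v in values)
        proportion = count / total                   # area of the bar
        heights.append(proportion / (right - left))  # height = area / width
    return heights


edges = [0.5, 1.0, 1.5, 2.5]  # hypothetical, unevenly sized bins
heights = density_heights(example_prices, edges)

# The bar areas (height * width) always add up to 1.
widths = [edges[i + 1] - edges[i] for i in range(len(edges) - 1)]
total_area = sum(h * w for h, w in zip(heights, widths))
```

The point of the sketch is that a bar's area, not its height, equals the proportion of values in its bin, so a wide bin can hold many values and still be short.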
You are interested in finding out the number of stores in which a pint of ice cream was cheaper than a pound of broccoli. Will you be able to determine the answer to this question by looking at the plot produced by the code below?
prices.get(['broccoli', 'ice_cream']).plot(kind='barh')
Yes
No
You are interested in finding out the number of stores in which a pint of ice cream was cheaper than a pound of broccoli. Will you be able to determine the answer to this question by looking at the plot produced by the code below?
prices.get(['broccoli', 'ice_cream']).plot(kind='hist')
Yes
No
Some code and the scatterplot it produced are shown below:
(prices.get(['broccoli', 'ice_cream']).plot(kind='scatter', x='broccoli', y='ice_cream'))
Can you use this plot to figure out the number of stores in which a pint of ice cream was cheaper than a pound of broccoli?
If so, say how many such stores there are and explain how you came to that conclusion.
If not, explain why this scatterplot cannot be used to answer the question.
Suppose df is a DataFrame and b is any boolean array whose length is the same as the number of rows of df.
True or False: For any such boolean array b, df[b].shape[0] is less than or equal to df.shape[0].
True
False
You are given a DataFrame called books that contains columns 'author' (string), 'title' (string), 'num_chapters' (int), and 'publication_year' (int).
Suppose that after doing books.groupby('author').max(), one row says
author | title | num_chapters | publication_year |
---|---|---|---|
Charles Dickens | Oliver Twist | 53 | 1838 |
Based on this data, can you conclude that Charles Dickens is the alphabetically last of all author names in this dataset?
Yes
No
Based on this data, can you conclude that Charles Dickens wrote Oliver Twist?
Yes
No
Based on this data, can you conclude that Oliver Twist has 53 chapters?
Yes
No
Based on this data, can you conclude that Charles Dickens wrote a book with 53 chapters that was published in 1838?
Yes
No
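For intuition about what `.groupby(...).max()` does, the sketch below mimics the aggregation in plain Python on hypothetical rows. The rows and the helper `groupby_max` are made up for illustration and are not part of `books`. The key behavior is that the maximum is taken within each group separately for every column, with strings compared alphabetically.

```python
# Hypothetical rows: (author, title, num_chapters, publication_year).
rows = [
    ("Author A", "Zebra Tales", 10, 1990),
    ("Author A", "Apple Days", 30, 2005),
    ("Author B", "Middle Road", 12, 1975),
]


def groupby_max(rows):
    """Mimic grouping by the first field and then calling .max().

    Within each group, the maximum of every remaining column is taken
    independently of the other columns.
    """
    groups = {}
    for author, *rest in rows:
        groups.setdefault(author, []).append(rest)
    return {
        author: tuple(max(column) for column in zip(*members))
        for author, members in groups.items()
    }


result = groupby_max(rows)
# result["Author A"] combines values from two different books.
```

Here the aggregated row for "Author A" pairs the title of one hypothetical book with the chapter count and year of another, so the values in a single output row need not come from a single input row.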
Included is a DataFrame named sungod that contains information on the artists who have performed at Sun God in years past. For each year that the festival was held, we have one row for each artist that performed that year. The columns are:
'Year' (int): the year of the festival
'Artist' (str): the name of the artist
'Appearance_Order' (int): the order in which the artist appeared in that year’s festival (1 means they came onstage first)
The rows of sungod are arranged in no particular order. The first few rows of sungod are shown below (though sungod has many more rows than pictured here).
Assume:
Only one artist ever appeared at a time (for example, we can’t have two separate artists with a 'Year' of 2015 and an 'Appearance_Order' of 3).
An artist may appear in multiple different Sun God festivals (they could be invited back).
We have already run import babypandas as bpd and import numpy as np.
On the graph paper below, draw the histogram that would be produced by this code.
(sungod.take(np.arange(5))
       .plot(kind='hist', density=True,
             bins=np.arange(0, 7, 2), y='Appearance_Order'))
In your drawing, make sure to label the height of each bar in the histogram on the vertical axis. You can scale the axes however you like, and the two axes don’t need to be on the same scale.
King Triton, UCSD’s mascot, is quite the traveler! For this question, we will be working with the flights DataFrame, which details several facts about each of the flights that King Triton has been on over the past few years. The first few rows of flights are shown below.
Here’s a description of the columns in flights:
'DATE': the date on which the flight occurred. Assume that there were no “redeye” flights that spanned multiple days.
'FLIGHT': the flight number. Note that this is not unique; airlines reuse flight numbers on a daily basis.
'FROM' and 'TO': the 3-letter airport code for the departure and arrival airports, respectively. Note that it’s not possible to have a flight from and to the same airport.
'DIST': the distance of the flight, in miles.
'HOURS': the length of the flight, in hours.
'SEAT': the kind of seat King Triton sat in on the flight; the only possible values are 'WINDOW', 'MIDDLE', and 'AISLE'.
Which of these correctly evaluates to the number of flights King Triton took to San Diego (airport code 'SAN')?
flights.loc['SAN'].shape[0]
flights[flights.get('TO') == 'SAN'].shape[0]
flights[flights.get('TO') == 'SAN'].shape[1]
len(flights.sort_values('TO', ascending=False).loc['SAN'])
Fill in the blanks below so that the result also evaluates to the number of flights King Triton took to San Diego (airport code 'SAN').
flights.groupby(__(a)__).count().get('FLIGHT').__(b)__
What goes in blank (a)?
'DATE'
'FLIGHT'
'FROM'
'TO'
What goes in blank (b)?
.index[0]
.index[-1]
.loc['SAN']
.iloc['SAN']
.iloc[0]
True or False: If we change .get('FLIGHT') to .get('SEAT'), the results of the above code block will not change. (You may assume you answered the previous two subparts correctly.)
True
False
Consider the DataFrame san, defined below.
san = flights[(flights.get('FROM') == 'SAN') & (flights.get('TO') == 'SAN')]
Which of these DataFrames must have the same number of rows as san?
flights[(flights.get('FROM') == 'SAN') and (flights.get('TO') == 'SAN')]
flights[(flights.get('FROM') == 'SAN') | (flights.get('TO') == 'SAN')]
flights[(flights.get('FROM') == 'LAX') & (flights.get('TO') == 'SAN')]
flights[(flights.get('FROM') == 'LAX') & (flights.get('TO') == 'LAX')]
The American Kennel Club (AKC) organizes information about dog breeds. We’ve loaded their dataset into a DataFrame called df. The index of df contains the dog breed names as str values.
The columns are:
'kind' (str): the kind of dog (herding, hound, toy, etc.). There are six total kinds.
'size' (str): small, medium, or large.
'longevity' (float): typical lifetime (years).
'price' (float): average purchase price (dollars).
'kids' (int): suitability for children. A value of 1 means high suitability, 2 means medium, and 3 means low.
'weight' (float): typical weight (kg).
'height' (float): typical height (cm).
The rows of df are arranged in no particular order. The first five rows of df are shown below (though df has many more rows than pictured here).
Assume we have already run import babypandas as bpd and import numpy as np.
The following code computes the breed of the cheapest toy dog.
df[__(a)__].__(b)__.__(c)__
Fill in part (a).
Fill in part (b).
Which of the following can fill in blank (c)? Select all that apply.
loc[0]
iloc[0]
index[0]
min()
In September 2020, Governor Gavin Newsom announced that by 2035, all new vehicles sold in California must be zero-emissions vehicles. Electric vehicles (EVs) are among the most popular zero-emissions vehicles (though other examples include plug-in hybrids and hydrogen fuel cell vehicles).
The DataFrame evs consists of 32 rows, each of which contains information about a different EV model.
"Brand" (str): The vehicle’s manufacturer.
"Model" (str): The vehicle’s model name.
"BodyStyle" (str): The vehicle’s body style.
"Seats" (int): The vehicle’s number of seats.
"TopSpeed" (int): The vehicle’s top speed, in kilometers per hour.
"Range" (int): The vehicle’s range, or distance it can travel on a single charge, in kilometers.
The first few rows of evs are shown below (though remember, evs has 32 rows total).
Assume that:
The only values in the "Brand" column are "Tesla", "BMW", "Audi", and "Nissan".
We have already run import babypandas as bpd and import numpy as np.
Suppose we’ve run the following line of code.
counts = evs.groupby("Brand").count()
What value does counts.get("Range").sum() evaluate to?
What value does counts.index[3] evaluate to?
Consider the following incomplete assignment statement.
result = evs______.mean()
In each part, fill in the blank above so that result evaluates to the specified quantity.
A DataFrame, indexed by "Brand", whose "Seats" column contains the average number of "Seats" per "Brand". (The DataFrame may have other columns in it as well.)
A number, corresponding to the average "TopSpeed" of all EVs manufactured by Audi in evs.
Nintendo collected data on the heights of a sample of Animal Crossing: New Horizons players. A histogram of the heights in their sample is given below.
What percentage of players in Nintendo’s sample are at least 62.5 inches tall? Give your answer as an integer rounded to the nearest multiple of 5.
You are given a DataFrame called restaurants that contains information on a variety of local restaurants’ daily number of customers and daily income. There is a row for each restaurant for each date in a given five-year time period.
The columns of restaurants are 'name' (string), 'year' (int), 'month' (int), 'day' (int), 'num_diners' (int), and 'income' (float).
Assume that in our data set, no two different restaurants go by the same 'name' (chain restaurants, for example).
What type of visualization would be best to display the data in a way that helps to answer the question “Do more customers bring in more income?”
scatterplot
line plot
bar chart
histogram
What type of visualization would be best to display the data in a way that helps to answer the question “Have restaurants’ daily incomes been declining over time?”
scatterplot
line plot
bar chart
histogram
Suppose there are 200 students enrolled in DSC 10, and that the histogram below displays the distribution of the number of Instagram followers each student has, measured in 100s. That is, if a student is represented in the first bin, they have between 0 and 200 Instagram followers.
How many students in DSC 10 have between 200 and 800 Instagram followers? Give your answer as an integer.
Suppose the height of a bar in the above histogram is h. How many students are represented in the corresponding bin, in terms of h?
Hint: Just as in the first subpart, you’ll need to use the assumption from the start of the problem.
20 \cdot h
100 \cdot h
200 \cdot h
400 \cdot h
800 \cdot h