These problems are taken from past quizzes and exams. Work on them
**on paper**, since the quizzes and exams you take in this
course will also be on paper.

We encourage you to complete these
problems during discussion section. Solutions will be made available
after all discussion sections have concluded. You don’t need to submit
your answers anywhere. **Note: We do not plan to cover all of
these problems during the discussion section**; the problems we don’t
cover can be used for extra practice.

The following code computes an array containing the unique kinds of
dogs that are heavier than 20 kg **or** taller than 40 cm
on average.

```
foo = df.__(a)__.__(b)__
np.array(foo[__(c)__].__(d)__)
```

Fill in blank (a).

Fill in blank (b).

Fill in blank (c).

Which of the following should fill in blank (d)?

`.index`

`.unique()`

`.get('kind')`

`.get(['kind'])`

You have a DataFrame called `prices` that contains
information about food prices at 18 different grocery stores. There is a
column called `'broccoli'` that contains the price in dollars
for one pound of broccoli at each grocery store. There is also a column
called `'ice_cream'` that contains the price in dollars for a
pint of store-brand ice cream.

Using the code,

`prices.plot(kind='hist', y='broccoli', bins=np.arange(0.8, 2.11, 0.1), density=True)`

we produced the histogram below:

How many grocery stores sold broccoli for a price greater than or equal to $1.30 per pound, but less than $1.40 per pound (the tallest bar)?

Suppose we now plot the same data with different bins, using the following line of code:

`prices.plot(kind='hist', y='broccoli', bins=[0.8, 1, 1.1, 1.5, 1.8, 1.9, 2.5], density=True)`

What would be the height on the y-axis for the bin corresponding to the interval [\$1.10, \$1.50)? Input your answer below.
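With `density=True`, the area of each bar (not its height) equals the proportion of values that fall in that bin, so a bar's height is that proportion divided by the bin's width. The sketch below works through this arithmetic in plain Python. The helper name `density_heights`, the prices, and the bin edges are all made up for illustration; they are not the data from this problem.

```python
def density_heights(values, edges):
    """Return the height of each bar in a density histogram.

    Bins are [edges[i], edges[i+1]), except the last bin, which also
    includes its right endpoint.
    """
    n = len(values)
    last = len(edges) - 2
    heights = []
    for i in range(len(edges) - 1):
        left, right = edges[i], edges[i + 1]
        if i == last:
            count = sum(left <= v <= right for v in values)
        else:
            count = sum(left <= v < right for v in values)
        proportion = count / n                       # area of the bar
        heights.append(proportion / (right - left))  # height = area / width
    return heights


# Hypothetical data: 10 made-up prices, NOT the prices from the problem.
toy_prices = [0.5, 0.7, 1.2, 1.4, 1.6, 1.8, 2.5, 3.0, 3.5, 3.9]
toy_edges = [0, 1, 2, 4]  # bin widths are 1, 1, and 2

heights = density_heights(toy_prices, toy_edges)
# The last bin holds 4 of the 10 values but is twice as wide as the
# middle bin, so its bar is half as tall: heights == [0.2, 0.4, 0.2]
```

Multiplying each height by its bin width and adding the results gives 1, which is a quick way to check a hand computation.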

You are interested in finding out the number of stores in which a pint of ice cream was cheaper than a pound of broccoli. Will you be able to determine the answer to this question by looking at the plot produced by the code below?

`prices.get(['broccoli', 'ice_cream']).plot(kind='barh')`

Yes

No

You are interested in finding out the number of stores in which a pint of ice cream was cheaper than a pound of broccoli. Will you be able to determine the answer to this question by looking at the plot produced by the code below?

`prices.get(['broccoli', 'ice_cream']).plot(kind='hist')`

Yes

No

Some code and the scatterplot it produced are shown below:

`(prices.get(['broccoli', 'ice_cream']).plot(kind='scatter', x='broccoli', y='ice_cream'))`

Can you use this plot to figure out the number of stores in which a pint of ice cream was cheaper than a pound of broccoli?

If so, say how many such stores there are and explain how you came to that conclusion.

If not, explain why this scatterplot cannot be used to answer the question.

Suppose `df` is a DataFrame and `b` is any
boolean array whose length is the same as the number of rows of
`df`.

True or False: For any such boolean array `b`,
`df[b].shape[0]` is less than or equal to
`df.shape[0]`.

True

False

You are given a DataFrame called `books` that contains
columns `'author'` (string), `'title'` (string),
`'num_chapters'` (int), and `'publication_year'` (int).

Suppose that after doing `books.groupby('author').max()`,
one row says

author | title | num_chapters | publication_year |
---|---|---|---|
Charles Dickens | Oliver Twist | 53 | 1838 |

Based on this data, can you conclude that Charles Dickens is the alphabetically last of all author names in this dataset?

Yes

No

Based on this data, can you conclude that Charles Dickens wrote
*Oliver Twist*?

Yes

No

Based on this data, can you conclude that *Oliver Twist* has
53 chapters?

Yes

No

Based on this data, can you conclude that Charles Dickens wrote a book with 53 chapters that was published in 1838?

Yes

No
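When reasoning about a row of a grouped result, it helps to recall that `groupby(...).max()` aggregates each column separately within a group. The sketch below imitates that behavior in plain Python. The helper `group_max` is a hypothetical stand-in rather than babypandas itself, and the authors and titles are invented.

```python
def group_max(rows, key):
    """Imitate groupby(key).max(): take each column's max independently per group."""
    result = {}
    for row in rows:
        group = result.setdefault(row[key], {})
        for col, value in row.items():
            if col == key:
                continue
            # Strings compare alphabetically, numbers numerically.
            if col not in group or value > group[col]:
                group[col] = value
    # Like groupby, order the groups by their key.
    return dict(sorted(result.items()))


# Hypothetical rows, NOT the contents of `books`.
toy_rows = [
    {"author": "A. Writer", "title": "Zebra Tales", "num_chapters": 10, "publication_year": 1990},
    {"author": "A. Writer", "title": "Apple Days", "num_chapters": 40, "publication_year": 1985},
    {"author": "B. Scribe", "title": "Moon Notes", "num_chapters": 12, "publication_year": 2001},
]

toy_max = group_max(toy_rows, "author")
# toy_max["A. Writer"] combines values drawn from different input rows:
# {"title": "Zebra Tales", "num_chapters": 40, "publication_year": 1990}
```

In this toy data, no single book by "A. Writer" has both 40 chapters and a publication year of 1990, yet both values appear in the same output row.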

Included is a DataFrame named `sungod` that contains
information on the artists who have performed at Sun God in years past.
**For each year that the festival was held, we have one row for
each artist that performed that year.** The columns are:

- `'Year'` (`int`): the year of the festival
- `'Artist'` (`str`): the name of the artist
- `'Appearance_Order'` (`int`): the order in which the artist appeared in that year’s festival (1 means they came onstage first)

The rows of `sungod` are arranged in **no particular
order**. The first few rows of `sungod` are shown
below (though `sungod` has **many more rows**
than pictured here).

Assume:

- Only one artist ever appeared at a time (for example, we can’t have two separate artists with a `'Year'` of 2015 and an `'Appearance_Order'` of 3).
- An artist may appear in multiple different Sun God festivals (they could be invited back).
- We have already run `import babypandas as bpd` and `import numpy as np`.

On the graph paper below, draw the histogram that would be produced by this code.

```
(sungod.take(np.arange(5))
       .plot(kind='hist', density=True,
             bins=np.arange(0, 7, 2), y='Appearance_Order')
);
```

In your drawing, make sure to label the height of each bar in the histogram on the vertical axis. You can scale the axes however you like, and the two axes don’t need to be on the same scale.

King Triton, UCSD’s mascot, is quite the traveler! For this question,
we will be working with the `flights` DataFrame, which
details several facts about each of the flights that King Triton has
been on over the past few years. The first few rows of
`flights` are shown below.

Here’s a description of the columns in `flights`:

- `'DATE'`: the date on which the flight occurred. Assume that there were no “redeye” flights that spanned multiple days.
- `'FLIGHT'`: the flight number. Note that this is not unique; airlines reuse flight numbers on a daily basis.
- `'FROM'` and `'TO'`: the 3-letter airport code for the departure and arrival airports, respectively. Note that it’s not possible to have a flight from and to the same airport.
- `'DIST'`: the distance of the flight, in miles.
- `'HOURS'`: the length of the flight, in hours.
- `'SEAT'`: the kind of seat King Triton sat in on the flight; the only possible values are `'WINDOW'`, `'MIDDLE'`, and `'AISLE'`.

Which of these correctly evaluates to the number of flights King
Triton took to San Diego (airport code `'SAN'`)?

`flights.loc['SAN'].shape[0]`

`flights[flights.get('TO') == 'SAN'].shape[0]`

`flights[flights.get('TO') == 'SAN'].shape[1]`

`len(flights.sort_values('TO', ascending=False).loc['SAN'])`

Fill in the blanks below so that the result also evaluates to the
number of flights King Triton took to San Diego (airport code
`'SAN'`).

`flights.groupby(__(a)__).count().get('FLIGHT').__(b)__`

What goes in blank (a)?

`'DATE'`

`'FLIGHT'`

`'FROM'`

`'TO'`

What goes in blank (b)?

`.index[0]`

`.index[-1]`

`.loc['SAN']`

`.iloc['SAN']`

`.iloc[0]`

True or False: If we change `.get('FLIGHT')` to
`.get('SEAT')`, the results of the above code block will not
change. (You may assume you answered the previous two subparts
correctly.)

True

False

Consider the DataFrame `san`, defined below.

`san = flights[(flights.get('FROM') == 'SAN') & (flights.get('TO') == 'SAN')]`

Which of these DataFrames **must** have the same number
of rows as `san`?

`flights[(flights.get('FROM') == 'SAN') and (flights.get('TO') == 'SAN')]`

`flights[(flights.get('FROM') == 'SAN') | (flights.get('TO') == 'SAN')]`

`flights[(flights.get('FROM') == 'LAX') & (flights.get('TO') == 'SAN')]`

`flights[(flights.get('FROM') == 'LAX') & (flights.get('TO') == 'LAX')]`

The American Kennel Club (AKC) organizes information about dog
breeds. We’ve loaded their dataset into a DataFrame called
`df`. The index of `df` contains the dog breed
names as `str` values.

The columns are:

- `'kind' (str)`: the kind of dog (herding, hound, toy, etc.). There are six total kinds.
- `'size' (str)`: small, medium, or large.
- `'longevity' (float)`: typical lifetime (years).
- `'price' (float)`: average purchase price (dollars).
- `'kids' (int)`: suitability for children. A value of `1` means high suitability, `2` means medium, and `3` means low.
- `'weight' (float)`: typical weight (kg).
- `'height' (float)`: typical height (cm).

The rows of `df` are arranged in **no particular
order**. The first five rows of `df` are shown below
(though `df` has **many more rows** than
pictured here).

Assume we have already run `import babypandas as bpd` and
`import numpy as np`.

The following code computes the breed of the cheapest toy dog.

`df[__(a)__].__(b)__.__(c)__`

Fill in part (a).

Fill in part (b).

Which of the following can fill in blank (c)? **Select all that
apply.**

`loc[0]`

`iloc[0]`

`index[0]`

`min()`

In September 2020, Governor Gavin Newsom announced that by 2035, all new vehicles sold in California must be zero-emissions vehicles. Electric vehicles (EVs) are among the most popular zero-emissions vehicles (though other examples include plug-in hybrids and hydrogen fuel cell vehicles).

The DataFrame `evs` consists of **32** rows,
each of which contains information about a different EV model.

- `"Brand"` (str): The vehicle’s manufacturer.
- `"Model"` (str): The vehicle’s model name.
- `"BodyStyle"` (str): The vehicle’s body style.
- `"Seats"` (int): The vehicle’s number of seats.
- `"TopSpeed"` (int): The vehicle’s top speed, in kilometers per hour.
- `"Range"` (int): The vehicle’s range, or distance it can travel on a single charge, in kilometers.

The first few rows of `evs` are shown below (though
remember, `evs` has 32 rows total).

Assume that:

- The only four values in the `"Brand"` column are `"Tesla"`, `"BMW"`, `"Audi"`, and `"Nissan"`.
- We have already run `import babypandas as bpd` and `import numpy as np`.

Suppose we’ve run the following line of code.

`counts = evs.groupby("Brand").count()`

What value does `counts.get("Range").sum()` evaluate
to?

What value does `counts.index[3]` evaluate to?
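As a reminder of the mechanics, `groupby(...).count()` produces one row per distinct group value, ordered by that value, and each entry counts the rows in the group (assuming no missing values). The sketch below is a rough plain-Python analogue built on `collections.Counter`. The brand list is invented and is not the contents of `evs`.

```python
from collections import Counter

# Hypothetical brand column with 6 rows, NOT the contents of `evs`.
toy_brands = ["Kia", "Ford", "Kia", "Volvo", "Ford", "Kia"]

counts = Counter(toy_brands)  # rows per brand
index = sorted(counts)        # group labels in sorted order, like the grouped index
total = sum(counts.values())  # adding the group sizes recovers the row count

# index == ["Ford", "Kia", "Volvo"] and total == 6 == len(toy_brands)
```

The group sizes always add up to the number of rows in the original table, and positions in the grouped index follow sorted order rather than order of first appearance.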

Consider the following incomplete assignment statement.

`result = evs______.mean()`

In each part, fill in the blank above so that `result` evaluates to the specified quantity.

A DataFrame, indexed by `"Brand"`, whose
`"Seats"` column contains the average number of
`"Seats"` per `"Brand"`. (The DataFrame may have
other columns in it as well.)

A number, corresponding to the average `"TopSpeed"` of all
EVs manufactured by Audi in `evs`.

Nintendo collected data on the heights of a sample of *Animal
Crossing: New Horizons* players. A histogram of the heights in their
sample is given below.

What **percentage** of players in Nintendo’s sample are
at least 62.5 inches tall? Give your answer as an integer rounded to the
**nearest multiple of 5**.

You are given a DataFrame called `restaurants` that
contains information on a variety of local restaurants’ daily number of
customers and daily income. There is a row for each restaurant for each
date in a given five-year time period.

The columns of `restaurants` are `'name'` (string),
`'year'` (int), `'month'` (int), `'day'` (int),
`'num_diners'` (int), and `'income'` (float).

Assume that in our data set, there are not two different restaurants
that go by the same `'name'` (chain restaurants, for
example).

What type of visualization would be best to display the data in a way that helps to answer the question “Do more customers bring in more income?”

scatterplot

line plot

bar chart

histogram

What type of visualization would be best to display the data in a way that helps to answer the question “Have restaurants’ daily incomes been declining over time?”

scatterplot

line plot

bar chart

histogram

Suppose there are 200 students enrolled in DSC 10, and that the histogram below displays the distribution of the number of Instagram followers each student has, measured in 100s. That is, if a student is represented in the first bin, they have between 0 and 200 Instagram followers.

How many students in DSC 10 have between 200 and 800 Instagram followers? Give your answer as an integer.

Suppose the height of a bar in the above histogram is $h$. How many students are represented in the corresponding bin, in terms of $h$?

*Hint: Just as in the first subpart, you’ll need to use the
assumption from the start of the problem.*

$20 \cdot h$

$100 \cdot h$

$200 \cdot h$

$400 \cdot h$

$800 \cdot h$