
Welcome! The problems shown below should be worked on **on
paper**, since the quizzes and exams you take in this course will
also be on paper.

We encourage you to complete this worksheet in
a live discussion section. Solutions will be made available after all
discussion sections have concluded. You don’t need to submit your
answers anywhere.

Evaluate the expression `(np.arange(1, 7, 2.5) * np.arange(8, 2, -2))[2]`.

**Answer:** `24.0`

Although this question is daunting at first, it is best solved by breaking it into parts. First, consider `np.arange(1, 7, 2.5)`. To evaluate it, we must know what `np.arange()` does: it creates a `numpy` array containing regularly spaced values between a start value and an end value (the start is inclusive, the end is exclusive). In this case, the start value is 1, the end value is 7, and the regular interval, or step size, is 2.5. So `np.arange(1, 7, 2.5)` outputs the `numpy` array `np.array([1.0, 3.5, 6.0])`, because we start at 1 and keep adding 2.5, stopping at the last value that is less than 7. The resulting array contains only `float` values because the step size, 2.5, is not an `int`, and all elements in an array must have the same data type.

Now that we have evaluated the first part, let us evaluate `np.arange(8, 2, -2)`. This part may seem tricky because of the negative step size, but the logic is the same as before. The output is `np.array([8, 6, 4])`: we start at 8 and keep decreasing by 2, stopping before we reach 2. Now that we have evaluated both `np.arange(1, 7, 2.5)` and `np.arange(8, 2, -2)`, it is time to multiply.
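If you want to check the two `np.arange` calls after working them by hand, a quick sketch (assuming `numpy` is installed and imported under the conventional alias `np`) is:

```python
import numpy as np

# Start at 1, add 2.5 each step, stop before reaching 7 (the end is exclusive).
first = np.arange(1, 7, 2.5)

# Start at 8, add -2 each step, stop before reaching 2.
second = np.arange(8, 2, -2)

print(first.tolist())   # [1.0, 3.5, 6.0]
print(second.tolist())  # [8, 6, 4]
print(first.dtype)      # float64, because the step 2.5 is a float
```

The second array keeps an integer data type because its start, end, and step are all integers.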

Multiplying two `numpy` arrays of the same length is simply element-wise multiplication. In our case, we multiply `np.array([1.0, 3.5, 6.0]) * np.array([8, 6, 4])`, which results in `np.array([8.0, 21.0, 24.0])`. Again paying attention to data types, `np.array([8.0, 21.0, 24.0])` contains `float` values rather than `int` values because multiplying an `int` by a `float` gives a `float`. Now that we have evaluated `(np.arange(1, 7, 2.5) * np.arange(8, 2, -2))` to be `np.array([8.0, 21.0, 24.0])`, we just need to access the element at position 2 (positions start at 0), which is `24.0`.
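Putting the whole expression together in code, again assuming `numpy` is available as `np`, confirms the element-wise product and the final indexing step:

```python
import numpy as np

# Element-wise product: [1.0 * 8, 3.5 * 6, 6.0 * 4]
product = np.arange(1, 7, 2.5) * np.arange(8, 2, -2)

# Positions start at 0, so [2] selects the third element.
result = product[2]

print(product.tolist())  # [8.0, 21.0, 24.0]
print(result)            # 24.0
```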

For the problems that follow, we will work with a dataset consisting of various skyscrapers in the US, which we’ve loaded into a DataFrame called `sky`. The first few rows of `sky` are shown below (though the full DataFrame has more rows):

Each row of `sky` corresponds to a single skyscraper. For each skyscraper, we have:

- its name, which is stored in the index of `sky` (string)
- the `'material'` it is made up of (string)
- the `'city'` in the US where it is located (string)
- the number of `'floors'` (levels) it contains (int)
- its `'height'` in meters (float), and
- the `'year'` in which it was opened (int)
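Since the real rows of `sky` are not reproduced here, the sketches that follow use a tiny hypothetical stand-in built with plain `pandas`. The building names ("Tower A" and so on) and all values are invented for illustration, and we assume the course's DataFrame behaves like a `pandas` DataFrame for the calls used. This first sketch just mirrors the schema above, with names in the index:

```python
import pandas as pd

# HYPOTHETICAL stand-in for `sky`: names and values are invented,
# not taken from the real dataset. Building names live in the index.
sky = pd.DataFrame(
    {
        "material": ["steel", "concrete", "steel", "steel"],
        "city": ["New York City", "Chicago", "New York City", "Chicago"],
        "floors": [60, 13, 32, 80],
        "height": [240.0, 50.0, 110.5, 300.0],
        "year": [1930, 1899, 1990, 2000],
    },
    index=["Tower A", "Tower B", "Tower C", "Tower D"],
)

# Integer columns for 'floors' and 'year', a float column for 'height'.
print(sky.dtypes)
```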

Below, identify the data type of the result of each of the following expressions, or select “error” if you believe the expression results in an error.

`sky.sort_values('height')`

- int or float
- Boolean
- string
- array
- Series
- DataFrame
- error

**Answer:** DataFrame

`sky` is a DataFrame. All the `sort_values` method does is change the order of the rows in the Series or DataFrame it is called on; it does not change the data structure. As such, `sky.sort_values('height')` is also a DataFrame.
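A minimal sketch, using a hypothetical two-column stand-in for `sky` with invented values, shows that sorting reorders the rows but still returns a DataFrame with the same columns:

```python
import pandas as pd

# HYPOTHETICAL stand-in for `sky` (invented names and values).
sky = pd.DataFrame(
    {"material": ["steel", "concrete", "steel", "steel"],
     "height": [240.0, 50.0, 110.5, 300.0]},
    index=["Tower A", "Tower B", "Tower C", "Tower D"],
)

sorted_sky = sky.sort_values("height")  # same columns, rows reordered by height

print(type(sorted_sky).__name__)  # DataFrame
print(list(sorted_sky.index))     # ['Tower B', 'Tower C', 'Tower A', 'Tower D']
```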

The average score on this problem was 87%.

`sky.sort_values('height').get('material').loc[0]`

- int or float
- Boolean
- string
- array
- Series
- DataFrame
- error

**Answer:** error

`sky.sort_values('height')` is a DataFrame, and `sky.sort_values('height').get('material')` is a Series corresponding to the `'material'` column, sorted by `'height'` in increasing order. So far, there are no errors.

Remember, the `.loc` *accessor* is used to access elements in a Series based on their index. `sky.sort_values('height').get('material').loc[0]` asks for the element in the `sky.sort_values('height').get('material')` Series with index 0. However, the index of `sky` is made up of building names. Since there is no building named `0`, `.loc[0]` causes an error.
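With the same kind of hypothetical stand-in (invented names and values), this sketch shows that `.loc[0]` looks for an index label equal to `0`; in `pandas`, a missing label raises a `KeyError`:

```python
import pandas as pd

# HYPOTHETICAL stand-in for `sky` (invented names and values).
sky = pd.DataFrame(
    {"material": ["steel", "concrete", "steel", "steel"],
     "height": [240.0, 50.0, 110.5, 300.0]},
    index=["Tower A", "Tower B", "Tower C", "Tower D"],
)

# A Series whose index labels are building names.
materials = sky.sort_values("height").get("material")

error_name = None
try:
    materials.loc[0]  # .loc searches for an index *label* equal to 0
except KeyError:
    error_name = "KeyError"

print(error_name)  # KeyError
```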

The average score on this problem was 79%.

`sky.sort_values('height').get('material').iloc[0]`

- int or float
- Boolean
- string
- array
- Series
- DataFrame
- error

**Answer:** string

As we mentioned above, `sky.sort_values('height').get('material')` is a Series containing values from the `'material'` column (but sorted). Remember, there is no element in this Series with an index of 0, so `sky.sort_values('height').get('material').loc[0]` errors. However, `.iloc[0]` works differently from `.loc[0]`: `.iloc[0]` gives us the first element in a Series, regardless of what’s in the index. So `sky.sort_values('height').get('material').iloc[0]` gives us back a value from the `'material'` column, which is made up of strings, so the result is a string. (Specifically, it is the `'material'` of the skyscraper with the smallest `'height'`.)
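Continuing with a hypothetical stand-in (invented values), `.iloc[0]` picks by position, so it returns the material of the shortest building as a plain string:

```python
import pandas as pd

# HYPOTHETICAL stand-in for `sky` (invented names and values).
sky = pd.DataFrame(
    {"material": ["steel", "concrete", "steel", "steel"],
     "height": [240.0, 50.0, 110.5, 300.0]},
    index=["Tower A", "Tower B", "Tower C", "Tower D"],
)

materials = sky.sort_values("height").get("material")
first_material = materials.iloc[0]  # position 0, whatever its label is

print(first_material)                 # concrete (Tower B is the shortest)
print(type(first_material).__name__)  # str
```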

The average score on this problem was 89%.

`sky.get('floors').max()`

- int or float
- Boolean
- string
- array
- Series
- DataFrame
- error

**Answer:** int or float

The Series `sky.get('floors')` is made up of integers, and `sky.get('floors').max()` evaluates to the largest number in the Series, which is also an integer.
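A sketch with invented floor counts: in `pandas`, the maximum of an integer column comes back as a NumPy integer scalar (such as `numpy.int64`), which is what "int" means for this question:

```python
import pandas as pd

# HYPOTHETICAL stand-in for `sky` (invented names and values).
sky = pd.DataFrame(
    {"floors": [60, 13, 32, 80]},
    index=["Tower A", "Tower B", "Tower C", "Tower D"],
)

most_floors = sky.get("floors").max()  # largest value in an integer Series

print(most_floors)        # 80
print(type(most_floors))  # a NumPy integer type
```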

The average score on this problem was 91%.

`sky.index[0]`

- int or float
- Boolean
- string
- array
- Series
- DataFrame
- error

**Answer:** string

`sky.index` contains the values `'Bayard-Condict Building'`, `'The Yacht Club at Portofino'`, `'City Investing Building'`, etc. `sky.index[0]` is then `'Bayard-Condict Building'`, which is a string.
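Because the index holds building names, indexing into it by position yields a name. A sketch with hypothetical names (not the real buildings listed above):

```python
import pandas as pd

# HYPOTHETICAL stand-in for `sky` (invented names and values).
sky = pd.DataFrame(
    {"floors": [60, 13, 32, 80]},
    index=["Tower A", "Tower B", "Tower C", "Tower D"],
)

first_name = sky.index[0]  # first entry of the index, i.e. a building name

print(first_name)                 # Tower A
print(type(first_name).__name__)  # str
```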

The average score on this problem was 91%.

Write a single line of code that evaluates to the name of the tallest skyscraper in the `sky` DataFrame.

**Answer:**
`sky.sort_values(by='height', ascending=False).index[0]`

To answer this question, we first sort by the column we are interested in. We sort the entire DataFrame by the `'height'` column, and because we want the name of the tallest building, we set the `ascending` parameter to `False` so that the heights are in descending order. This gives us `sky.sort_values(by='height', ascending=False)`. After sorting in descending order, the tallest building is in the first row of the sorted DataFrame, so we only need that row's name, which is stored in the index. We can access a DataFrame's index with `.index`, and since we want the first entry, we write `.index[0]`. Putting it all together, the name of the tallest skyscraper in the `sky` DataFrame is given by `sky.sort_values(by='height', ascending=False).index[0]`.
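A sketch of this answer on a hypothetical stand-in for `sky` (invented names and heights):

```python
import pandas as pd

# HYPOTHETICAL stand-in for `sky` (invented names and values).
sky = pd.DataFrame(
    {"height": [240.0, 50.0, 110.5, 300.0]},
    index=["Tower A", "Tower B", "Tower C", "Tower D"],
)

# Sort tallest-first, then take the first label in the index.
# Column names are case-sensitive: by="Height" would raise a KeyError here.
tallest = sky.sort_values(by="height", ascending=False).index[0]

print(tallest)  # Tower D
```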

Write a single line of code that evaluates to the average number of floors across all skyscrapers in the DataFrame.

**Answer:** `sky.get('floors').mean()`

To answer this question, we first get the number of floors each skyscraper has, which `sky.get('floors')` does by returning the `'floors'` column as a Series. Then we compute the average of that Series with the `.mean()` method, which gives the average number of floors across all skyscrapers. Putting this all together, we get `sky.get('floors').mean()`.
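On a hypothetical stand-in with invented floor counts, the same expression computes the arithmetic mean of the column:

```python
import pandas as pd

# HYPOTHETICAL stand-in for `sky` (invented names and values).
sky = pd.DataFrame(
    {"floors": [60, 13, 32, 80]},
    index=["Tower A", "Tower B", "Tower C", "Tower D"],
)

avg_floors = sky.get("floors").mean()  # (60 + 13 + 32 + 80) / 4

print(avg_floors)  # 46.25
```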

Write a single line of code that evaluates to the height of the tallest skyscraper in New York City.

**Answer:**
`sky[sky.get('city') == 'New York City'].get('height').max()`

To answer this question, we first query the DataFrame to include only skyscrapers located in New York City, with `sky[sky.get('city') == 'New York City']`. The resulting DataFrame includes only skyscrapers from New York City, so we can now focus on the tallest one. We first get the heights of all the buildings in the resulting DataFrame with `.get('height')`, and then find the largest height with the `.max()` Series method. Putting it all together, we have `sky[sky.get('city') == 'New York City'].get('height').max()`.
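A sketch on a hypothetical stand-in (invented names and values) where the tallest building overall is in Chicago, which shows why the query step matters:

```python
import pandas as pd

# HYPOTHETICAL stand-in for `sky` (invented names and values).
sky = pd.DataFrame(
    {"city": ["New York City", "Chicago", "New York City", "Chicago"],
     "height": [240.0, 50.0, 110.5, 300.0]},
    index=["Tower A", "Tower B", "Tower C", "Tower D"],
)

# Keep only the rows where 'city' is New York City.
nyc = sky[sky.get("city") == "New York City"]
nyc_tallest_height = nyc.get("height").max()

print(nyc_tallest_height)  # 240.0 (Tower D in Chicago, at 300.0, is filtered out)
```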