Discussion 7: Standardization and The Normal Distribution



The problems in this worksheet are taken from past exams. Work on them on paper, since the exams you take in this course will also be on paper.

We encourage you to complete this worksheet in a live discussion section. Solutions will be made available after all discussion sections have concluded. You don’t need to submit your answers anywhere.

Note: We do not plan to cover all problems here in the live discussion section; the problems we don’t cover can be used for extra practice.


Problem 1

Rank these three students in ascending order of their exam performance relative to their classmates.
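To compare performance relative to classmates, convert each score to standard units using that student's own class mean and SD. The students' data is not reproduced in this copy of the worksheet, so the names and numbers in this minimal sketch are invented for illustration only. `standard_units` is a hypothetical helper name.

```python
def standard_units(score, class_mean, class_sd):
    """Number of standard deviations `score` lies above the class mean."""
    return (score - class_mean) / class_sd

# Hypothetical data (NOT from the exam): (name, score, class mean, class SD)
students = [
    ("Student A", 80, 70, 5),   # 2 SDs above the class mean
    ("Student B", 90, 85, 10),  # 0.5 SDs above the class mean
    ("Student C", 60, 66, 4),   # 1.5 SDs below the class mean
]

z_scores = {name: standard_units(score, mean, sd)
            for name, score, mean, sd in students}

# Ascending order of relative performance = ascending order of z-score
ranking = sorted(z_scores, key=z_scores.get)
print(ranking)
```

Note that Student B has the highest raw score but is not the strongest performer relative to their class.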


Problem 2

The data visualization below shows all Olympic gold medals for women’s gymnastics, broken down by the age of the gymnast.

Based on this data, rank the following three quantities in ascending order: the median age at which gold medals are earned, the mean age at which gold medals are earned, the standard deviation of the age at which gold medals are earned.
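To see how a long right tail affects these three quantities, here is a small sketch on a made-up list of ages. These are not the medal data from the visualization. It relies on `np.median`, `np.mean`, and `np.std`. Note that `np.std` divides by n by default.

```python
import numpy as np

# Hypothetical ages, invented for illustration (NOT the medal data).
# One older gymnast creates a right tail that pulls the mean above the median.
ages = np.array([14, 15, 15, 16, 16, 16, 17, 18, 20, 25])

median_age = np.median(ages)
mean_age = np.mean(ages)
sd_age = np.std(ages)  # ddof=0 by default: divides by n

print(median_age, mean_age, sd_age)
```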


Problem 3

Among all Costco members in San Diego, the average monthly spending in October 2023 was $350 with a standard deviation of $40.


Problem 3.1

The amount Ciro spent at Costco in October 2023 was -1.5 in standard units. What is this amount in dollars? Give your answer as an integer.
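Standard units measure distance from the mean in SDs, z = (x - mean) / SD. Solving for x gives x = mean + z * SD. Below is a minimal sketch of that inverse conversion. The helper name `from_standard_units` is our own, and the example uses hypothetical numbers rather than the problem's.

```python
def from_standard_units(z, mean, sd):
    """Convert a value in standard units back to the original units.

    Inverts z = (x - mean) / sd, so x = mean + z * sd.
    """
    return mean + z * sd

# Hypothetical example: mean 100, SD 15, a value 2 SDs above the mean
print(from_standard_units(2, 100, 15))
```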


Problem 3.2

What is the minimum possible percentage of San Diego members that spent between $250 and $450 in October 2023?
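With no information about the shape of the distribution, this question calls for Chebyshev's inequality. For any distribution, at least 1 - 1/k² of the values lie within k SDs of the mean. To use it, find k by converting an endpoint of the interval to standard units. Here is a small sketch of the bound itself, where `chebyshev_lower_bound` is a hypothetical helper name:

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of ANY distribution within k SDs of its mean.

    Chebyshev's inequality: at least 1 - 1/k**2 of the values lie in
    [mean - k*sd, mean + k*sd]. The bound is only informative for k > 1.
    """
    return 1 - 1 / k**2

# Print the bound as a percentage for a few values of k
for k in [1.5, 2, 3]:
    print(k, round(100 * chebyshev_lower_bound(k), 3), "%")
```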


Problem 3.3

Now, suppose we’re given that the distribution of monthly spending in October 2023 for all San Diego members is roughly normal. Given this fact, fill in the blanks:

"In October 2023, 95% of San Diego members spent between __(m)__ dollars and __(n)__ dollars."


What are m and n? Give your answers as integers rounded to the nearest multiple of 10.
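For a normal distribution, the rule of thumb is that about 95% of values fall within 2 SDs of the mean. The sketch below uses `scipy.stats.norm.cdf` to check that rule and `scipy.stats.norm.ppf` to find the exact z-value that cuts off the central 95%. That value is close to, but not exactly, 2. This is only a check on the rule of thumb, not a replacement for it.

```python
from scipy import stats

# Area under the standard normal curve within 2 SDs of the mean
within_2_sd = stats.norm.cdf(2) - stats.norm.cdf(-2)

# z-value leaving 2.5% in the upper tail (so the central 95% is within +/- z_95)
z_95 = stats.norm.ppf(0.975)

print(within_2_sd, z_95)
```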



Problem 4

Researchers from the San Diego Zoo, located within Balboa Park, collected physical measurements of several species of penguins in a region of Antarctica.

One piece of information they tracked for each of 330 penguins was its mass in grams. The average penguin mass is 4200 grams, and the standard deviation is 840 grams.


Problem 4.1

Consider the histogram of mass below.


Select the true statement below.


Problem 4.2

For your convenience, we show the histogram of mass again below.


Recall, there are 330 penguins in our dataset. Their average mass is 4200 grams, and the standard deviation of mass is 840 grams.

Per Chebyshev’s inequality, at least what percentage of penguins have a mass between 3276 grams and 5124 grams? Input your answer as a percentage between 0 and 100, without the % symbol. Round to three decimal places.


Problem 4.3

Per Chebyshev’s inequality, at least what percentage of penguins have a mass between 1680 grams and 5880 grams?
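Chebyshev's inequality is stated for intervals centered at the mean. When an interval is lopsided, one safe approach is to use the widest centered interval that still fits inside it. That interval extends only as far as the nearer endpoint. This is a sketch that assumes low < mean < high. The function name and the test values are hypothetical.

```python
def chebyshev_bound_for_interval(low, high, mean, sd):
    """Lower bound (as a proportion) on how much of a distribution lies
    in [low, high], using Chebyshev's inequality.

    Assumes low < mean < high. The centered interval [mean - k*sd, mean + k*sd]
    with k set by the NEARER endpoint lies inside [low, high], so at least
    1 - 1/k**2 of the values are in [low, high] as well.
    """
    k = min(mean - low, high - mean) / sd  # SDs out to the nearer endpoint
    if k <= 1:
        return 0.0  # Chebyshev gives no useful guarantee for k <= 1
    return 1 - 1 / k**2

# Hypothetical: mean 100, SD 10, interval from 3 SDs below to 2 SDs above
print(chebyshev_bound_for_interval(70, 120, 100, 10))
```

The extra room on the wider side cannot be used because Chebyshev says nothing about how the values outside the centered interval are split between the two tails.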


Problem 4.4

The distribution of mass in grams is not roughly normal. Is the distribution of mass in standard units roughly normal?
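Converting to standard units subtracts a constant and then divides by a positive constant. This shifts and rescales the horizontal axis but does not move any value relative to the others. One way to explore the effect on shape is to standardize a skewed array, then compare the histogram counts and the skewness. The data here is synthetic exponential data generated with a fixed seed, not the penguin masses. `scipy.stats.skew` is used only as a numerical summary of shape.

```python
import numpy as np
from scipy import stats

# Synthetic right-skewed data (NOT the penguin measurements)
rng = np.random.default_rng(42)
data = rng.exponential(scale=1000, size=1000) + 2700

# Convert to standard units: subtract the mean, divide by the SD
z = (data - np.mean(data)) / np.std(data)

# With 10 equal-width bins spanning each array's own range, the counts
# should match (barring floating-point ties at bin edges), because the
# transformation preserves every value's relative position.
counts_original, _ = np.histogram(data, bins=10)
counts_standard, _ = np.histogram(z, bins=10)

print(counts_original)
print(counts_standard)
print(stats.skew(data), stats.skew(z))
```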


Problem 4.5

Suppose boot_means is an array containing the means of many bootstrap resamples of our sample of 330 penguins. Fill in the blanks below so that [left, right] is a 68% confidence interval for the true mean mass of penguins.

left = np.percentile(boot_means, __(a)__)
right = np.percentile(boot_means, __(b)__)
[left, right]

What goes in blank (a)? What goes in blank (b)?
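A percentile-based bootstrap interval keeps the middle level% of the bootstrap means by cutting (100 - level)/2 percent from each tail. This sketch shows the whole pipeline on synthetic normal data that stands in for the 330 penguins. These are not the real measurements. `bootstrap_means` and `percentile_ci` are our own names. Resampling with replacement uses NumPy's `Generator.choice`. The example builds a 95% interval so it does not simply fill in the blanks above.

```python
import numpy as np

def bootstrap_means(sample, repetitions, seed=0):
    """Means of `repetitions` resamples drawn with replacement from `sample`,
    each the same size as the original sample."""
    rng = np.random.default_rng(seed)
    means = np.empty(repetitions)
    for i in range(repetitions):
        resample = rng.choice(sample, size=len(sample), replace=True)
        means[i] = np.mean(resample)
    return means

def percentile_ci(boot_means, level):
    """Middle `level`% of the bootstrap distribution.

    Leaves (100 - level) / 2 percent of the bootstrap means in each tail.
    """
    tail = (100 - level) / 2
    left = np.percentile(boot_means, tail)
    right = np.percentile(boot_means, 100 - tail)
    return [left, right]

# Synthetic stand-in for the 330 penguin masses (NOT the real data)
rng = np.random.default_rng(1)
fake_masses = rng.normal(4200, 840, size=330)

boot = bootstrap_means(fake_masses, repetitions=1000)
print(percentile_ci(boot, 95))
```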


Problem 4.6

Which of the following are correct interpretations of this confidence interval? Select all that apply.



Problem 5

An IKEA chair designer is experimenting with some new ideas for armchair designs. She has the idea of making the arm rests shaped like bell curves, or normal distributions. A cross-section of the armchair design is shown below.


This was created by taking the portion of the standard normal distribution from z=-4 to z=4 and adjoining two copies of it, one centered at z=0 and the other centered at z=8. Let’s call this shape the armchair curve.

Since the area under the standard normal curve from z=-4 to z=4 is approximately 1, the total area under the armchair curve is approximately 2.
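The claim that the area from z = -4 to z = 4 is approximately 1 can be checked with the standard normal CDF. The area under the curve between a and b is `cdf(b) - cdf(a)`.

```python
from scipy import stats

# Area under the standard normal curve between z = -4 and z = 4
area_one_bell = stats.norm.cdf(4) - stats.norm.cdf(-4)

# The armchair curve is two copies of this bell, so its total area is double
print(area_one_bell, 2 * area_one_bell)
```

The missing area of about 0.00006 lies in the tails beyond ±4, which is why "approximately" is appropriate here.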

Complete the implementation of the two functions below:

  1. area_left_of(z) should return the area under the armchair curve to the left of z, assuming -4 <= z <= 12, and
  2. area_between(x, y) should return the area under the armchair curve between x and y, assuming -4 <= x <= y <= 12.

import scipy

def area_left_of(z):
    '''Returns the area under the armchair curve to the left of z.
       Assume -4 <= z <= 12'''
    if ___(a)___: 
        return ___(b)___ 
    return scipy.stats.norm.cdf(z)

def area_between(x, y):
    '''Returns the area under the armchair curve between x and y. 
    Assume -4 <= x <= y <= 12.'''
    return ___(c)___


Problem 5.1

What goes in blank (a)?


Problem 5.2

What goes in blank (b)?


Problem 5.3

What goes in blank (c)?



Problem 6

Suppose you have correctly implemented the function area_between(x, y) so that it returns the area under the armchair curve between x and y, assuming the inputs satisfy -4 <= x <= y <= 12.

Note: You can still do this question, even if you didn’t know how to do the previous one.


Problem 6.1

What is the approximate value of area_between(-2, 10)?


Problem 6.2

What is the approximate value of area_between(0.37, 8.37)?
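To check answers to Problems 6.1 and 6.2 without the CDF-based functions from Problem 5, one alternative is to integrate the armchair curve's height numerically with `scipy.integrate.quad`. This sketch defines the height piecewise from the description in Problem 5. The function names are hypothetical, and numerical integration is a swapped-in technique, not the approach the problem intends.

```python
from scipy import integrate, stats

def armchair_height(z):
    """Height of the armchair curve at z, for -4 <= z <= 12.

    Left bell: standard normal density on [-4, 4].
    Right bell: the same density shifted right by 8, on (4, 12].
    """
    if z <= 4:
        return stats.norm.pdf(z)
    return stats.norm.pdf(z - 8)

def area_between_numeric(x, y):
    """Numerically integrate the armchair curve from x to y."""
    area, _error_estimate = integrate.quad(armchair_height, x, y)
    return area

print(area_between_numeric(-2, 10))
print(area_between_numeric(0.37, 8.37))
```

Comparing the printed values with reasoning by hand, for example using the symmetry of each bell, is a good sanity check.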


