Below are practice problems tagged for Lecture 18 (rendered directly from the original exam/quiz sources).
After you graduate, you are hired by TritonCard! On your new work
computer, you install numpy, but something goes wrong with
the installation — your copy of numpy doesn’t come with
np.random.multinomial. To demonstrate your resourcefulness
to your new employer, you decide to implement your own version of
np.random.multinomial.
Below, complete the implementation of the function
manual_multinomial so that
manual_multinomial(n, p) works the same way as
np.random.multinomial(n, p). That is,
manual_multinomial should take in an integer n
and an array of probabilities p. It should return an array
containing the counts in each category when we randomly draw
n items from a categorical distribution where the
probabilities of drawing an item from each category are given in the
array p. The array returned by
manual_multinomial(n, p) should have a length of
len(p) and a sum of n.
For instance, to simulate flipping a coin five times, we could call
manual_multinomial(5, np.array([0.5, 0.5])), and the output
might look like array([2, 3]).
def manual_multinomial(n, p):
    values = np.arange(len(p))
    choices = np.random.choice(values, size=__(a)__, replace=__(b)__, p=p)
    value_counts = np.array([])
    for value in values:
        value_count = __(c)__
        value_counts = np.append(value_counts, value_count)
    return value_counts
What goes in blank (a)?
Answer: n
The size argument in np.random.choice provides the
number of samples we draw. In the manual_multinomial
function, we randomly draw n items, and so the size should
be n.
The average score on this problem was 81%.
What goes in blank (b)?
Answer: True
Here, we are using np.random.choice to simulate picking
n elements from values. We draw with
replacement since we are allowed to have repeated elements. For example,
if we were flipping a coin five times, we would need to have repeated
elements, since there are only two possible outcomes of a coin flip but
we are flipping the coin more than two times.
The average score on this problem was 79%.
What goes in blank (c)?
Answer:
np.count_nonzero(choices == value)
The choices variable contains an array of the
n randomly drawn values selected from values.
In each iteration of the for-loop, we want to count the number of
elements in choices that are equal to the given
value. To do this, we can use
np.count_nonzero(choices == value). In the end,
value_counts is an array that says how many times we
selected 0, how many times we selected 1, and so on.
The average score on this problem was 37%.
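Putting the three answers together, the completed function can be sketched as follows. This simply fills blanks (a), (b), and (c) with the answers above; the example call at the bottom is the coin-flip example from the problem statement.

```python
import numpy as np

def manual_multinomial(n, p):
    # One integer label per category: 0, 1, ..., len(p) - 1.
    values = np.arange(len(p))
    # Draw n labels with replacement, using p as the category probabilities.
    choices = np.random.choice(values, size=n, replace=True, p=p)
    value_counts = np.array([])
    for value in values:
        # Count how many of the n draws landed in this category.
        value_count = np.count_nonzero(choices == value)
        value_counts = np.append(value_counts, value_count)
    return value_counts

# Simulate flipping a fair coin five times.
counts = manual_multinomial(5, np.array([0.5, 0.5]))
print(counts)
```

One small difference from np.random.multinomial is worth noting: because value_counts starts as an empty float array, the counts come back as floats (for example, array([2., 3.])) rather than integers.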
You want to use the data in apts to test both of the
following pairs of hypotheses:
Pair 1:
Pair 2:
In apts, there are 467 apartments that are either one
bedroom or two bedroom apartments. You perform the following simulation
under the assumption of the null hypothesis.
prop_1br = np.array([])
abs_diff = np.array([])
for i in np.arange(10000):
    prop = np.random.multinomial(467, [0.5, 0.5])[0]/467
    prop_1br = np.append(prop_1br, prop)
    abs_diff = np.append(abs_diff, np.abs(prop-0.5))
You then calculate some percentiles of prop_1br. The
following four expressions all evaluate to True.
np.percentile(prop_1br, 2.5) == 0.4
np.percentile(prop_1br, 5) == 0.42
np.percentile(prop_1br, 95) == 0.58
np.percentile(prop_1br, 97.5) == 0.6
What is prop_1br.mean() to two decimal places?
Answer: 0.5
The given percentiles are symmetric about 0.5: the 2.5th and 97.5th percentiles (0.4 and 0.6) are each 0.1 away from 0.5, and the 5th and 95th percentiles (0.42 and 0.58) are each 0.08 away from 0.5. When a distribution is symmetric, its mean sits at its center, so the mean should be very close to 0.5.
Another way we can look at it is by noticing that prop
is drawn from a [0.5, 0.5]
distribution in np.random.multinomial() (because we are simulating
under the null hypothesis). This means that we expect the
simulated proportions to be centered around 0.5.
The average score on this problem was 84%.
What is np.std(prop_1br) to two decimal places?
Answer: 0.05
If we look again at the percentiles, we notice that the distribution seems to resemble a normal distribution. In a normal distribution, about 95% of values lie within two standard deviations of the mean. Since the 2.5th and 97.5th percentiles bound the middle 95% of the distribution, the 97.5th percentile is about two standard deviations above the mean (and the 2.5th percentile is about two standard deviations below it). Thus,
0.5 + 2 \cdot \text{SD} = 0.6
\therefore Solving for SD, we get \text{SD} = 0.05
The average score on this problem was 45%.
What is np.percentile(abs_diff, 95) to two decimal
places?
Answer: 0.1
Each time through our for-loop, we execute the following lines of code:
prop_1br = np.append(prop_1br, prop)
abs_diff = np.append(abs_diff, np.abs(prop-0.5))
Additionally, we’re told the following statements evaluate to True:
np.percentile(prop_1br, 2.5) == 0.4
np.percentile(prop_1br, 5) == 0.42
np.percentile(prop_1br, 95) == 0.58
np.percentile(prop_1br, 97.5) == 0.6
We can combine these pieces of information to find the answer to this question.
First, consider the shape of the distribution of
prop_1br. We know it’s symmetrical around 0.5, and beyond
that, we can infer that it’s a normal distribution.
Now, think about how this relates to the distribution of
abs_diff. abs_diff is generated by finding the
absolute difference between prop_1br and 0.5. Because of
this, abs_diff is an array of distances (which are nonnegative by
definition) from 0.5.
We know that prop_1br is normal, and symmetrical about
0.5. So, the distribution of how far away prop_1br is from
0.5 will look like we took the distribution of prop_1br,
moved it to be centered at 0, and folded it in half so that all negative
values become positive. This is because the previous center at 0.5
represents a distance of 0 from 0.5. Similarly, a value of 0.6 would
represent a distance of 0.1 from 0.5, and a value of 0.4 would also
represent a distance of 0.1 from 0.5.
Now, the problem becomes much simpler to solve. Before, we were told
that 95% of the data in prop_1br lies between 0.4 and 0.6
(thanks to the lines of code that evaluate to True). This is the same as
telling us that 95% of the data in prop_1br lies within a
distance of 0.1 of 0.5 (because 0.4 and 0.6 are both 0.1 away from
0.5).
Because of this, the 95th percentile of abs_diff is 0.1, since 95% of
the data in prop_1br lies within a distance of 0.1 of 0.5
(meaning that 95% of the data in abs_diff is between 0 and 0.1).
The average score on this problem was 10%.
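The folding argument above can be checked numerically. The sketch below does not rerun the simulation from the problem; instead it assumes a stand-in for prop_1br that is roughly normal with mean 0.5 and SD 0.05 (the values inferred in the solutions), and compares the 97.5th percentile of that array with the 95th percentile of its absolute distance from 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed stand-in for prop_1br: roughly normal, centered at 0.5, with SD 0.05.
prop_1br = rng.normal(loc=0.5, scale=0.05, size=100_000)
# Distance of each value from 0.5, exactly as abs_diff is built in the problem.
abs_diff = np.abs(prop_1br - 0.5)

# Distance from 0.5 to the 97.5th percentile of the original distribution.
upper_distance = np.percentile(prop_1br, 97.5) - 0.5
# 95th percentile of the folded distribution.
folded_95 = np.percentile(abs_diff, 95)

print(upper_distance, folded_95)
```

Both printed numbers should land near 0.1 and near each other. They will not be exactly 0.1, since the "2 standard deviations" rule is itself an approximation.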
Which simulated test statistics should be used to test the first pair of hypotheses?
prop_1br
abs_diff
Answer: prop_1br
Our first pair of hypotheses’ alternative hypothesis asks if one number is greater than the other. Because of this, we can’t use an absolute value test statistic to answer the question, since all absolute value cares about is the distance the simulation is from the null assumption, not whether one value is greater than the other.
The average score on this problem was 82%.
Which simulated test statistics should be used to test the second pair of hypotheses?
prop_1br
abs_diff
Answer: abs_diff
Our second pair of hypotheses’ alternative hypothesis asks if one number is not equal to the other. Because of this, we have to use a test statistic that measures distance in both directions, not just in one direction. Therefore, we use the absolute value.
The average score on this problem was 83%.
Your observed data in apts is such that you reject the
null for the first pair of hypotheses at the 5% significance level, but
fail to reject the null for the second pair at the 5% significance
level. What could the value of the following proportion have been?
\frac{\text{\# of one bedroom apartments in \texttt{apts}}}{\text{\# of one bedroom apartments in \texttt{apts}+ \# of two bedroom apartments in \texttt{apts}}}
Give your answer as a number to two decimal places.
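Answer: 0.59
Rejecting the null for the first pair at the 5% significance level means the observed proportion falls above the 95th percentile of prop_1br, so it is greater than 0.58. Failing to reject the null for the second pair means the observed absolute difference from 0.5 falls below the 95th percentile of abs_diff, which is 0.1, so the observed proportion is less than 0.6. The only value with two decimal places strictly between 0.58 and 0.6 is 0.59.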
The average score on this problem was 20%.
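To see how a single observed value can lead to rejecting with a one-sided statistic but not with a two-sided one, here is a minimal sketch that runs the simulation and computes both p-values. The observed proportion of 0.54 is hypothetical and chosen only for illustration. An actual run of this simulation produces a narrower distribution than the stylized percentiles stated in the problem (its SD is about 0.5 / sqrt(467), roughly 0.023), so 0.54 plays the role here that 0.59 plays in the problem.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate 10,000 samples of 467 apartments under the null (50/50 split).
counts = rng.multinomial(467, [0.5, 0.5], size=10_000)
prop_1br = counts[:, 0] / 467
abs_diff = np.abs(prop_1br - 0.5)

observed_prop = 0.54  # hypothetical observed proportion, for illustration only

# One-sided: how often is the simulated proportion at least as large as observed?
p_value_pair_1 = np.count_nonzero(prop_1br >= observed_prop) / 10_000
# Two-sided: how often is the simulated distance from 0.5 at least as large as observed?
p_value_pair_2 = np.count_nonzero(abs_diff >= abs(observed_prop - 0.5)) / 10_000

print(p_value_pair_1, p_value_pair_2)
```

The two-sided p-value counts both tails, so it comes out at roughly double the one-sided p-value. That is why an observed value can fall below the 5% cutoff for the one-sided test while staying above it for the two-sided test.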
Suppose you want to estimate the proportion of UCSD students that love dogs using a survey with a yes/no question. If you poll 400 students, what is the widest possible width for a 95% confidence interval?
0.01
0.05
0.1
0.2
0.5
None of the above
Answer: Option 3: 0.1
Since we’re looking at the proportion of UCSD students that love dogs,
we’ll set a “yes” vote to a value of 1 and a “no” vote to a value of 0.
(Try to see why this makes the mean of “yes”/“no” votes also the
proportion of “yes” votes.) Also, by the Central Limit Theorem, the
distribution of the sample mean is approximately normal. Now recall that
a 95% confidence interval for the population mean is given by
[sample mean - 2 * (sample std / np.sqrt(sample size)), sample mean + 2 * (sample std / np.sqrt(sample size))].
As a result, the width of a 95% confidence interval is
4 * (sample std / np.sqrt(sample size)). The sample
size is fixed at 400, so the width is largest when the
sample std is largest. The sample std is largest when we receive
an equal number of yes and no votes (that is, 200 of each).
The standard deviation in this case is 0.5*, and so the widest
possible width for a 95% confidence interval is
4 * 0.5/np.sqrt(400), which
evaluates to 0.1.
*To make the calculation of the standard deviation faster, try to see why calculating the std of a dataset with 200 1’s and 200 0’s is the same as calculating the std of a data set with only a single 1 and a single 0.
The average score on this problem was 34%.
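The calculation above can be checked directly. The sketch below builds the evenly split sample of 400 votes, computes its standard deviation and the resulting interval width, and compares it with a lopsided split. The 300/100 split is an arbitrary choice for comparison.

```python
import numpy as np

# 200 "yes" votes (1) and 200 "no" votes (0): the split that maximizes the SD.
votes = np.array([1] * 200 + [0] * 200)
sample_std = np.std(votes)
width = 4 * sample_std / np.sqrt(len(votes))
print(sample_std, width)

# A lopsided split (chosen arbitrarily) has a smaller SD, so a narrower interval.
lopsided = np.array([1] * 300 + [0] * 100)
lopsided_width = 4 * np.std(lopsided) / np.sqrt(len(lopsided))
print(lopsided_width)
```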
You need to estimate the proportion of American adults who want to be vaccinated against Covid-19. You plan to survey a random sample of American adults, and use the proportion of adults in your sample who want to be vaccinated as your estimate for the true proportion in the population. Your estimate must be within 0.04 of the true proportion, 95% of the time. Using the fact that the standard deviation of any dataset of 0’s and 1’s is no more than 0.5, calculate the minimum number of people you would need to survey. Input your answer below, as an integer.
Answer: 625
Note: Before reviewing these solutions, it’s highly recommended to revisit the lecture on “Choosing Sample Sizes,” since this problem follows the main example from that lecture almost exactly.
While this solution is long, keep in mind from the start that our goal is to solve for the smallest sample size necessary to create a confidence interval that achieves certain criteria.
The Central Limit Theorem tells us that the distribution of the sample mean is roughly normal, regardless of the distribution of the population from which the samples are drawn. At first, it may not be clear how the Central Limit Theorem is relevant, but remember that proportions are means too – for instance, the proportion of adults who want to be vaccinated is equal to the mean of a collection of 1s and 0s, where we have a 1 for each adult that wants to be vaccinated and a 0 for each adult who doesn’t want to be vaccinated. What this means (😉) is that the Central Limit Theorem applies to the distribution of the sample proportion, so we can use it here too.
Not only do we know that the distribution of sample proportions is roughly normal, but we know its mean and standard deviation, too:
\begin{align*} \text{Mean of Distribution of Possible Sample Means} &= \text{Population Mean} = \text{Population Proportion} \\ \text{SD of Distribution of Possible Sample Means} &= \frac{\text{Population SD}}{\sqrt{\text{Sample Size}}} \end{align*}
Using this information, we can create a 95% confidence interval for the population proportion, using the fact that in a normal distribution, roughly 95% of values are within 2 standard deviations of the mean:
\left[ \text{Population Proportion} - 2 \cdot \frac{\text{Population SD}}{\sqrt{\text{Sample Size}}}, \: \text{Population Proportion} + 2 \cdot \frac{\text{Population SD}}{\sqrt{\text{Sample Size}}} \right]
However, this interval depends on the population proportion (mean) and SD, which we don’t know. (If we did know these parameters, there would be no need to collect a sample!) Instead, we’ll use the sample proportion and SD as rough estimates:
\left[ \text{Sample Proportion} - 2 \cdot \frac{\text{Sample SD}}{\sqrt{\text{Sample Size}}}, \: \text{Sample Proportion} + 2 \cdot \frac{\text{Sample SD}}{\sqrt{\text{Sample Size}}} \right]
Note that the width of this interval – that is, its right endpoint minus its left endpoint – is: \text{width} = 4 \cdot \frac{\text{Sample SD}}{\sqrt{\text{Sample Size}}}
In the problem, we’re told that we want our interval to be accurate to within 0.04, which is equivalent to wanting the width of our interval to be less than or equal to 0.08 (since the interval extends the same amount above and below the sample proportion). As such, we need to pick the smallest sample size necessary such that:
\text{width} = 4 \cdot \frac{\text{Sample SD}}{\sqrt{\text{Sample Size}}} \leq 0.08
We can re-arrange the inequality above to solve for our sample’s size:
\begin{align*} 4 \cdot \frac{\text{Sample SD}}{\sqrt{\text{Sample Size}}} &\leq 0.08 \\ \frac{\text{Sample SD}}{\sqrt{\text{Sample Size}}} &\leq 0.02 \\ \frac{1}{\sqrt{\text{Sample Size}}} &\leq \frac{0.02}{\text{Sample SD}} \\ \frac{\text{Sample SD}}{0.02} &\leq \sqrt{\text{Sample Size}} \\ \left( \frac{\text{Sample SD}}{0.02} \right)^2 &\leq \text{Sample Size} \end{align*}
All we now need to do is pick the smallest sample size that satisfies the above inequality. But there’s an issue – we don’t know what our sample SD is, because we haven’t collected our sample! Notice that in the inequality above, as the sample SD increases, so does the minimum necessary sample size. In order to ensure we don’t collect too small of a sample (which would result in the width of our confidence interval being larger than desired), we can use an upper bound for the SD of our sample. In the problem, we’re told that the largest possible SD of a sample of 0s and 1s is 0.5 – this means that if we replace our sample SD with 0.5, we will find a sample size such that the width of our confidence interval is guaranteed to be less than or equal to 0.08. This sample size may be larger than necessary, but that’s better than it being smaller than necessary.
By substituting 0.5 for the sample SD in the last inequality above, we get
\begin{align*} \left( \frac{\text{Sample SD}}{0.02} \right)^2 &\leq \text{Sample Size} \\ \left( \frac{0.5}{0.02} \right)^2 &\leq \text{Sample Size} \\ 25^2 &\leq \text{Sample Size} \implies \text{Sample Size} \geq 625 \end{align*}
We need to pick the smallest possible sample size that is greater than or equal to 625; that’s just 625.
The average score on this problem was 40%.
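The rearranged inequality can be wrapped in a small helper. This is only a sketch: the function name min_sample_size and its parameters are made up for illustration, and the rounding step is there to guard against floating-point noise before taking the ceiling.

```python
import math

def min_sample_size(max_sd, max_width, num_sds=2):
    # width = 2 * num_sds * SD / sqrt(n) <= max_width
    # rearranges to n >= (2 * num_sds * SD / max_width) ** 2.
    bound = (2 * num_sds * max_sd / max_width) ** 2
    # Round away floating-point noise, then take the smallest integer >= bound.
    return math.ceil(round(bound, 9))

# Within 0.04 of the true proportion means a width of at most 0.08.
n = min_sample_size(max_sd=0.5, max_width=0.08)
print(n)
```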
Suppose you draw a sample of size 100 from a population with mean 50 and standard deviation 15. What is the probability that your sample has a mean between 50 and 53? Input the probability below, as a number between 0 and 1, rounded to two decimal places.
Answer: 0.48
This problem is testing our understanding of the Central Limit Theorem and normal distributions. Recall, the Central Limit Theorem tells us that the distribution of the sample mean is roughly normal, with the following characteristics:
\begin{align*} \text{Mean of Distribution of Possible Sample Means} &= \text{Population Mean} = 50 \\ \text{SD of Distribution of Possible Sample Means} &= \frac{\text{Population SD}}{\sqrt{\text{Sample Size}}} = \frac{15}{\sqrt{100}} = 1.5 \end{align*}
Given this information, it may be easier to express the problem as “We draw a value from a normal distribution with mean 50 and SD 1.5. What is the probability that the value is between 50 and 53?” Note that this probability is equal to the proportion of values between 50 and 53 in a normal distribution whose mean is 50 and whose SD is 1.5 (since probabilities can be thought of as proportions).
In class, we typically worked with the standard normal distribution, in which the mean was 0, the SD was 1, and the x-axis represented values in standard units. Let’s convert the quantities of interest in this problem to standard units, keeping in mind that the mean and SD we’re using now are the mean and SD of the distribution of possible sample means, not of the population.
\begin{align*} 50 \text{ in standard units} &= \frac{50 - 50}{1.5} = 0 \\ 53 \text{ in standard units} &= \frac{53 - 50}{1.5} = 2 \end{align*}
Now, our problem boils down to finding the proportion of values in a standard normal distribution that are between 0 and 2, or the proportion of values in a normal distribution that are in the interval [\text{mean}, \text{mean} + 2 \text{ SDs}].
From class, we know that in a normal distribution, roughly 95% of values are within 2 standard deviations of the mean, i.e. the proportion of values in the interval [\text{mean} - 2 \text{ SDs}, \text{mean} + 2 \text{ SDs}] is 0.95.

Since the normal distribution is symmetric about the mean, half of the values in this interval are to the right of the mean, and half are to the left. This means that the proportion of values in the interval [\text{mean}, \text{mean} + 2 \text{ SDs}] is \frac{0.95}{2} = 0.475, which rounds to 0.48, and thus the desired result is 0.48.
The average score on this problem was 48%.
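As a check on the 95% rule of thumb, the same probability can be computed from the normal curve directly. This sketch uses NormalDist from Python's standard library as the normal CDF; that is a choice made here, not something the original solution uses.

```python
from statistics import NormalDist

# Distribution of possible sample means: mean 50, SD 15 / sqrt(100) = 1.5.
sample_mean_dist = NormalDist(mu=50, sigma=15 / 100 ** 0.5)

# Area under the curve between 50 and 53.
prob = sample_mean_dist.cdf(53) - sample_mean_dist.cdf(50)
print(round(prob, 2))
```

The area comes out slightly above 0.475, because "within 2 SDs" covers a bit more than 95% of a normal distribution. Both values round to 0.48.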
It’s your first time playing a new game called Brunch Menu. The deck contains 96 cards, and each player will be dealt a hand of 9 cards. The goal of the game is to avoid having certain cards, called Rotten Egg cards, which come with a penalty at the end of the game. But you’re not sure how many of the 96 cards in the game are Rotten Egg cards. So you decide to use the Central Limit Theorem to estimate the proportion of Rotten Egg cards in the deck based on the 9 random cards you are dealt in your hand.
You are dealt 3 Rotten Egg cards in your hand of 9 cards. You then construct a CLT-based 95% confidence interval for the proportion of Rotten Egg cards in the deck based on this sample. Approximately, how wide is your confidence interval?
Choose the closest answer, and use the following facts:
The standard deviation of a collection of 0s and 1s is \sqrt{(\text{Prop. of 0s}) \cdot (\text{Prop of 1s})}.
\sqrt{18} is about \frac{17}{4}.
\frac{17}{9}
\frac{17}{27}
\frac{17}{81}
\frac{17}{96}
Answer: \frac{17}{27}
A Central Limit Theorem-based 95% confidence interval for a population proportion is given by the following:
\left[ \text{Sample Proportion} - 2 \cdot \frac{\text{Sample SD}}{\sqrt{\text{Sample Size}}}, \text{Sample Proportion} + 2 \cdot \frac{\text{Sample SD}}{\sqrt{\text{Sample Size}}} \right]
Note that this interval uses the fact that (about) 95% of values in a normal distribution are within 2 standard deviations of the mean. It’s key to divide by \sqrt{\text{Sample Size}} when computing the standard deviation because the distribution that is roughly normal is the distribution of the sample mean (and hence, sample proportion), not the distribution of the sample itself.
The width of the above interval – that is, the right endpoint minus the left endpoint – is
\text{width} = 4 \cdot \frac{\text{Sample SD}}{\sqrt{\text{Sample Size}}}
From the provided hint, we have that
\text{Sample SD} = \sqrt{(\text{Prop. of 0s}) \cdot (\text{Prop of 1s})} = \sqrt{\frac{3}{9} \cdot \frac{6}{9}} = \frac{\sqrt{18}}{9}
Then, since we know that the sample size is 9 and that \sqrt{18} is about \frac{17}{4}, we have
\text{width} = 4 \cdot \frac{\text{Sample SD}}{\sqrt{\text{Sample Size}}} = 4 \cdot \frac{\frac{\sqrt{18}}{9}}{\sqrt{9}} = 4 \cdot \frac{\sqrt{18}}{9 \cdot 3} = 4 \cdot \frac{\frac{17}{4}}{27} = \frac{17}{27}
The average score on this problem was 51%.
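The width can also be computed without the \sqrt{18} \approx \frac{17}{4} shortcut and then compared against the answer choices. A short sketch, where the options dictionary just restates the four choices from the problem:

```python
import math

sample_size = 9
prop_rotten = 3 / 9  # proportion of 1s (Rotten Egg cards) in the hand

# SD of a collection of 0s and 1s: sqrt(prop of 0s * prop of 1s).
sample_sd = math.sqrt(prop_rotten * (1 - prop_rotten))
width = 4 * sample_sd / math.sqrt(sample_size)

# The four answer choices, keyed by how they appear in the problem.
options = {"17/9": 17 / 9, "17/27": 17 / 27, "17/81": 17 / 81, "17/96": 17 / 96}
closest = min(options, key=lambda name: abs(options[name] - width))
print(width, closest)
```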
Which of the following are limitations of trying to use the Central Limit Theorem for this particular application? Select all that apply.
The CLT is for large random samples, and our sample was not very large.
The CLT is for random samples drawn with replacement, and our sample was drawn without replacement.
The CLT is for normally distributed data, and our data may not have been normally distributed.
The CLT is for sample means and sums, not sample proportions.
Answer: Options 1 and 2
Option 1: The Central Limit Theorem (CLT) applies to large random samples, and a sample of 9 is very small. This makes it difficult to rely on the CLT for this problem.
Option 2: Recall that the CLT assumes the sample is drawn with replacement. When we are dealt nine cards, no card is returned to the deck, which means that we are sampling without replacement.
Option 3: This is wrong because the CLT states that the distribution of the sample mean of a large random sample is approximately normal even if the data itself is not normally distributed. This means that if we had a large enough sample, we could use the CLT whether or not our data was normally distributed.
Option 4: This is wrong because the CLT does apply to the distribution of the sample proportion. Recall that proportions can be treated like means.
The average score on this problem was 77%.
In figure skating, skaters move around an ice rink performing a series of skills, such as jumps and spins. Ylesia has been training for the Olympics, and she has a set routine that she plans to perform.
Let’s say that Ylesia performs a skill successfully if she does not
fall during that skill. Each skill comes with its own probability of
success, as some skills are harder and some are easier. Suppose that the
probabilities of success for each skill in Ylesia’s Olympic routine are
stored in an array called skill_success.
For example, if Ylesia’s Olympic routine happened to only contain
three skills, skill_success might be the array with values
0.92, 0.84, 0.92. However, her routine can contain any number of
skills.
Ylesia wants to simulate one Olympic routine to see how many times
she might fall. Fill in the function count_falls below,
which takes as input an array skill_success and returns as
output the number of times Ylesia falls during her Olympic routine.
def count_falls(skill_success):
    falls = 0
    for p in skill_success:
        result = np.random.multinomial(1, __(a)__)
        falls = __(b)__
    return falls
Answer: (a): [p, 1-p], (b): falls + result[1] OR (a): [1-p, p], (b): falls + result[0]
Blank (a): For each skill, we simulate a single attempt that succeeds with
probability p and ends in a fall with probability 1-p, so we put [p, 1-p]
in blank (a).
Blank (b): The call in blank (a) produces an array,
result, with index 0 being how many times she succeeded
(corresponds to p), and index 1 being how many times she fell
(corresponds to 1-p). Since index 1 corresponds to the scenario in which
she falls, in order to correctly increase the number of falls, we increase
falls by result[1]. Therefore, blank (b) is
falls + result[1].
Likewise, you can change the order with (a): [1-p, p]
and (b): falls + result[0] and it would still correctly
simulate how many times she falls.
The average score on this problem was 59%.
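Filling in the first version of the answer gives a runnable function. The example array is the three-skill routine mentioned in the problem statement.

```python
import numpy as np

def count_falls(skill_success):
    falls = 0
    for p in skill_success:
        # One attempt at this skill: index 0 counts successes, index 1 counts falls.
        result = np.random.multinomial(1, [p, 1 - p])
        falls = falls + result[1]
    return falls

# Example routine with three skills.
skill_success = np.array([0.92, 0.84, 0.92])
print(count_falls(skill_success))
```

Each call simulates one routine, so repeated calls will generally print different numbers between 0 and the number of skills.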
Fill in the blanks below so that prob_no_falls evaluates
to the exact probability of Ylesia performing her entire routine without
falling.
prob_no_falls = __(a)__
for p in skill_success:
    prob_no_falls = __(b)__
prob_no_falls
Answer: (a): 1, (b): prob_no_falls * p
Blank (a): This blank initializes
prob_no_falls. This should be set to 1 because we’re
computing a probability product, and starting with 1 ensures the initial
value doesn’t affect the multiplication of subsequent
probabilities.
Blank (b): Each iteration updates
prob_no_falls by multiplying it by each probability of
success (p) in skill_success. This is because
the probability of Ylesia not falling throughout multiple independent
skills is the product of her not falling during each skill.
The average score on this problem was 72%.
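With the blanks filled in, the loop can be run on the three-skill example array from the problem statement. The comparison with np.prod is an addition here, included only as a cross-check.

```python
import numpy as np

skill_success = np.array([0.92, 0.84, 0.92])

# Start at 1 so the first multiplication leaves the first probability unchanged.
prob_no_falls = 1
for p in skill_success:
    # Skills are treated as independent, so success probabilities multiply.
    prob_no_falls = prob_no_falls * p

print(prob_no_falls)
# Cross-check: the product of all entries in the array.
print(np.prod(skill_success))
```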
Fill the blanks below so that approx_prob_no_falls
evaluates to an estimate of the probability that Ylesia performs her
entire routine without falling, based on 10,000 trials. Feel free to use
the function you defined in part (a) as part of your solution.
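results = np.array([])
for i in np.arange(10000):
    results = np.append(results, __(a)__)
approx_prob_no_falls = __(b)__
approx_prob_no_falls
Answer: (a): count_falls(skill_success),
(b): np.count_nonzero(results == 0) / 10000, though there
are many other correct solutions.
Blank (a): Each of the 10,000 trials simulates one routine and records its number of falls, which is count_falls(skill_success).
Blank (b): The estimate is the proportion of trials with zero falls, which is np.count_nonzero(results == 0) / 10000.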
The average score on this problem was 66%.
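A complete run of the simulation might look like the sketch below. It redefines count_falls so the block stands on its own, uses the three-skill example array from the problem statement, and fixes a seed only so the output is reproducible. The comparison with the product of the success probabilities is added here as a sanity check.

```python
import numpy as np

np.random.seed(10)  # seed chosen arbitrarily, only for reproducibility

def count_falls(skill_success):
    falls = 0
    for p in skill_success:
        # Index 0 counts successes, index 1 counts falls.
        result = np.random.multinomial(1, [p, 1 - p])
        falls = falls + result[1]
    return falls

skill_success = np.array([0.92, 0.84, 0.92])

# Record the number of falls in each of 10,000 simulated routines.
results = np.array([])
for i in np.arange(10000):
    results = np.append(results, count_falls(skill_success))

# Proportion of simulated routines with no falls at all.
approx_prob_no_falls = np.count_nonzero(results == 0) / 10000
exact_prob_no_falls = np.prod(skill_success)
print(approx_prob_no_falls, exact_prob_no_falls)
```

The estimate should land close to the exact product, with the gap shrinking as the number of trials grows.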
Bertie Bott’s Every Flavor Beans are a popular treat in the wizarding world. They are jellybean candies sold in boxes of 100 beans, containing a variety of flavors including chocolate, peppermint, spinach, liver, grass, earwax, and paper. Luna’s favorite flavor is bacon.
Luna wants to estimate the proportion of bacon-flavored beans produced at the Bertie Bott’s bean factory. She buys a box of Bertie Bott’s Every Flavor Beans and finds that 4 of the 100 beans inside are bacon-flavored. Using this sample, she decides to construct an 86\% CLT-based confidence interval for the proportion of bacon-flavored beans produced at the factory.
Let’s begin by solving a related problem that will help us in the later parts of this question. Consider the following fact:
For a sample of size 100 consisting of 0’s and 1’s, the maximum possible width of an 86\% CLT-based confidence interval is approximately 0.15.
Use this fact to find the value of z such that
scipy.stats.norm.cdf(z) evaluates to 0.07.
Give your answer as a number to one decimal place.
Answer: -1.5
The 86% confidence interval for the population mean is given by:
\left[ \text{sample mean} - |z| \cdot \frac{\text{sample SD}}{\sqrt{\text{sample size}}}, \ \text{sample mean} + |z| \cdot \frac{\text{sample SD}}{\sqrt{\text{sample size}}} \right]
Since the width is equal to the difference between the right and left endpoints,
\text{width} = 2 \cdot |z| \cdot \frac{\text{sample SD}}{\sqrt{\text{sample size}}}
We solve for |z|. The maximum width of our CI is given to be 0.15, so we must also use the maximum possible standard deviation, 0.5. We substitute the known values to obtain:
0.15 = 2 \cdot |z| \cdot \frac{0.5}{\sqrt{100}}
which leaves |z| = 1.5 after
computation. An 86\% confidence interval covers the middle 86\% of the
normal curve, leaving 14\% in the two tails, or 7\% in each tail. To find the z such that
scipy.stats.norm.cdf(z) evaluates to 0.07, we
realize that z is the point, in standard units,
to the left of which lies 7\% of the area under the normal curve. Note
that scipy.stats.norm.cdf(0) evaluates to 0.5
(recall: half of the area is left of the mean, which is zero in standard
units). Since 0.07 is less than 0.5, we must take a negative value for z. Thus z =
-1.5.
The average score on this problem was 55%.
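The value of z can also be found numerically by inverting the standard normal CDF. This sketch uses NormalDist from Python's standard library, whose cdf method computes the same quantity as scipy.stats.norm.cdf for the standard normal. Using it in place of scipy is a choice made here, not part of the original solution.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

# The point with 7% of the area to its left.
z = standard_normal.inv_cdf(0.07)
print(round(z, 1))

# Plugging |z| back into the width formula with SD 0.5 and sample size 100.
max_width = 2 * abs(z) * 0.5 / 100 ** 0.5
print(round(max_width, 2))
```

The unrounded z is slightly closer to zero than -1.5, which is consistent with the problem describing the maximum width as "approximately" 0.15.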
Suppose that Luna’s sample has a standard deviation of 0.2. What are the endpoints of her 86\% confidence interval? Give each endpoint as a number to two decimal places.
Answer: [0.01, \ 0.07]
Recall the formula for the width of an 86\% confidence interval:
\text{width} = 2 \cdot |z| \cdot \frac{\text{sample SD}}{\sqrt{\text{sample size}}}
where we found |z| = 1.5 in part (a). Instead of using the maximum sample SD, we will now use 0.2 and compute the new width of the confidence interval. This results in
\text{width} = 2 \cdot 1.5 \cdot \frac{0.2}{\sqrt{100}} = 0.06
Since this is a CLT-based confidence interval for the population mean, the interval must be centered at the sample mean, which is \frac{4}{100} = 0.04 because 4 of the 100 beans are bacon-flavored. We compute the interval using the structure from part (a), which leaves
\left[ 0.04 - \frac{1}{2} \cdot 0.06, \ 0.04 + \frac{1}{2} \cdot 0.06 \right] = [0.01, \ 0.07]
The average score on this problem was 37%.
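The endpoints can be reproduced in a few lines. The variable names below are chosen for illustration, and z = 1.5 is the value found in part (a).

```python
sample_mean = 4 / 100   # 4 bacon-flavored beans out of 100
sample_sd = 0.2         # given in the problem
sample_size = 100
z = 1.5                 # |z| from part (a)

# Half the width of the interval.
margin = z * sample_sd / sample_size ** 0.5
left, right = sample_mean - margin, sample_mean + margin
print(round(left, 2), round(right, 2))
```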
Hermione thinks she can do a better job of estimating the proportion of bacon-flavored beans, though she’ll need a bigger sample to do so. Hermione will collect a new sample and use it to construct another 86\% confidence interval for the same parameter.
Under the assumption that Hermione’s sample will have the same standard deviation as Luna’s sample, which was 0.2, how many boxes of Bertie Bott’s Every Flavor Beans must Hermione buy to guarantee that the width of her 86\% confidence interval is at most 0.012? Give your answer as an integer.
Remember: There are 100 beans in each box.
Answer: 25 boxes
Recall the formula for the width of an 86\% confidence interval:
\text{width} = 2 \cdot |z| \cdot \frac{\text{sample SD}}{\sqrt{\text{sample size}}}
where we must again use the fact that |z| = 1.5 from part (a). Here, we want a width that is no larger than 0.012, given that our sample SD remains 0.2. Plugging everything in:
0.012 \geq 2 \cdot 1.5 \cdot \frac{0.2}{\sqrt{n}}
Rearranging the expression to solve for n, we get
\begin{align*} n &\geq \left( \frac{3 \cdot 0.2}{0.012} \right)^2 \\ n &\geq \left( \frac{600}{12} \right)^2 \\ n &\geq (50)^2 \\ n &\geq 2500 \end{align*}
However, 2500 isn’t our final answer. The question asks for the number of boxes Hermione must buy, given that each box contains 100 beans. The bound we computed above for n corresponds to the minimum number of beans Hermione must observe. To get the minimum number of boxes, we simply divide the bound by 100. The final answer is 25 boxes.
The average score on this problem was 37%.
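The same steps, from the bound on the number of beans to the number of boxes, can be sketched in code. The variable names are illustrative, and the rounding step only guards against floating-point noise before taking the ceiling.

```python
import math

z = 1.5             # |z| from part (a)
sample_sd = 0.2     # assumed equal to the SD of Luna's sample
max_width = 0.012
beans_per_box = 100

# width = 2 * z * SD / sqrt(n) <= max_width  rearranges to  n >= (2 * z * SD / max_width) ** 2.
bound = (2 * z * sample_sd / max_width) ** 2
min_beans = math.ceil(round(bound, 9))

# Beans only come in whole boxes of 100.
min_boxes = math.ceil(min_beans / beans_per_box)
print(min_beans, min_boxes)
```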