Basics of Confidence Intervals

AP Stats ~ Lesson 8A: Confidence Intervals
OBJECTIVES:
DETERMINE the point estimate and margin of error from a confidence interval.
INTERPRET a confidence interval in context.
INTERPRET a confidence level in context.
DESCRIBE how the sample size and confidence level affect the length of a confidence
interval.
EXPLAIN how practical issues like nonresponse, undercoverage, and response bias can
affect the interpretation of a confidence interval.
If you had to give one number to estimate an unknown population parameter, what would it
be?
If you were estimating a population mean 𝜇, you would probably use the sample mean 𝑥̄. If you
were estimating a population proportion 𝑝, you would probably choose the sample proportion 𝑝̂,
because these statistics are unbiased estimators of their corresponding parameters. In both
cases, you are providing a POINT ESTIMATE of the parameter of interest.
A POINT ESTIMATOR is a statistic that provides an estimate of a population parameter. The
value of that statistic from a sample is called a POINT ESTIMATE.
An ideal point estimator has no bias and low variability. Since variability is almost always
present when calculating statistics from different samples, we must extend our thinking about
estimating parameters to acknowledge that repeated sampling could yield different results.
Example: In each of the following settings, determine the point estimator you would use and calculate the
value of the point estimate.
(a) The makers of a new golf ball want to estimate the median distance the new balls will travel when hit by a
mechanical driver. They select a random sample of 10 balls and measure the distance each ball travels after
being hit by the mechanical driver. Here are the distances (in yards):
285   286   284   285   282   284   287   290   288   285
(b) The golf ball manufacturer would also like to investigate the variability of the distance travelled by the golf
balls by estimating the interquartile range.
(c) The math department wants to know what proportion of its students own a graphing calculator, so they take
a random sample of 100 students and find that 28 own a graphing calculator.
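Here is a short Python sketch of the three point estimates above. It assumes the quartile convention common in AP Statistics: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the overall median when n is odd. Other software may use a different quartile rule and report a slightly different IQR. The helper name `quartiles` is hypothetical.

```python
from statistics import median

# Distances (yards) for the random sample of 10 golf balls in part (a)
distances = [285, 286, 284, 285, 282, 284, 287, 290, 288, 285]

def quartiles(data):
    """Hypothetical helper: Q1 and Q3 as medians of the lower and upper halves.
    Assumption: the overall median is left out of both halves when n is odd."""
    s = sorted(data)
    n = len(s)
    half = n // 2
    lower = s[:half]
    upper = s[half + (n % 2):]
    return median(lower), median(upper)

# (a) point estimator for the population median: the sample median
sample_median = median(distances)

# (b) point estimator for the population IQR: the sample IQR
q1, q3 = quartiles(distances)
sample_iqr = q3 - q1

# (c) point estimator for the population proportion: the sample proportion p-hat
p_hat = 28 / 100

print(sample_median, sample_iqr, p_hat)  # 285.0 3 0.28
```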
So, how close is our point estimate likely to be to the actual parameter? How would our
sample means or proportions vary if we took many, many SRSs?
Think about what we know.
• We know that the sampling distribution of 𝑥̄ describes how the values of 𝑥̄ vary in
repeated sampling.
• We know that if the population distribution is Normal, the sampling distribution of 𝑥̄ is
Normal, too. (If the population is not Normal, the central limit theorem says the sampling
distribution of 𝑥̄ is approximately Normal when n is large.)
• We know that the mean of the sampling distribution is the same as the unknown
population mean (𝑥̄ is an unbiased estimator of 𝜇).
• We know that the standard deviation of the sampling distribution, 𝜎/√n, gets smaller as
the sample size n gets larger.
It stands to reason, then, that even if we don’t know the true mean or standard deviation of
the population, we know how the sample mean behaves in repeated samples: the sampling
distribution of 𝑥̄ is centered at 𝜇, and its standard deviation (𝜎/√n) is smaller than the
population’s.
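A quick simulation can illustrate these facts. The sketch below draws many SRSs from a hypothetical Normal population with 𝜇 = 100 and 𝜎 = 15. These parameter values are assumptions chosen only for illustration. It then records the sample mean from each SRS.

```python
import random
from statistics import mean, pstdev

random.seed(1)  # fixed seed so the run is reproducible

MU, SIGMA = 100, 15   # hypothetical Normal population (parameters "unknown" in practice)
N = 25                # size of each SRS
REPS = 10_000         # number of repeated samples

# Take many samples of size N and record each sample mean x-bar
xbars = [sum(random.gauss(MU, SIGMA) for _ in range(N)) / N for _ in range(REPS)]

print(f"mean of the x-bars: {mean(xbars):.2f}   (mu = {MU})")
print(f"SD of the x-bars:   {pstdev(xbars):.2f}   (sigma/sqrt(n) = {SIGMA / N ** 0.5:.2f})")
```

The mean of the simulated 𝑥̄ values should land close to 𝜇. Their standard deviation should land close to 𝜎/√n = 3, not close to the population standard deviation of 15.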
This leads us to our “Big Idea”.
The sampling distribution of 𝑥̄ tells us how close to 𝜇 the sample mean 𝑥̄ is likely to
be. We can use this information to construct a CONFIDENCE INTERVAL (sometimes called an
interval estimate). All confidence intervals we construct will have a form similar to this:
point estimate ± margin of error
The point estimate can be 𝑥̄ or 𝑝̂. It is our best guess for the unknown population parameter
(𝜇 or 𝑝). The margin of error shows how close we believe our guess is to the parameter, and
is based on the variability of our estimate.
A C% confidence interval gives an interval of plausible* values for a parameter. The
interval is calculated from the data and has the form
point estimate ± margin of error
The difference between the point estimate and the true parameter value will be less than
the margin of error in C% of all samples.
The confidence level C gives the overall success rate of the method for calculating the
confidence interval. That is, in C% of all possible samples, the method would yield an
interval that captures the true parameter value.
* Note: Plausible does not mean the same thing as possible! Just about any value of a parameter is possible, but based
on our data, the values in our interval are reasonable or believable values of our parameter.
The confidence level is the overall capture rate if the method is used many times. The sample
mean will vary from sample to sample, but when we use the method estimate ± margin of error to
get an interval based on each sample, C% of these intervals capture the unknown population
mean 𝜇.
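A simulation makes the capture-rate idea concrete. The sketch below assumes a hypothetical Normal population with known 𝜎, so the 95% interval is 𝑥̄ ± 1.96 • 𝜎/√n. It builds one interval from each of many random samples and counts how often the intervals capture 𝜇. Each individual interval either captures 𝜇 or it doesn't; only the long-run rate is about 95%.

```python
import random
from math import sqrt

random.seed(2)  # fixed seed so the run is reproducible

MU, SIGMA = 100, 15    # hypothetical population; MU plays the "unknown" parameter
N, REPS = 25, 10_000   # sample size and number of repeated samples
Z_STAR = 1.96          # critical value for 95% confidence

captured = 0
for _ in range(REPS):
    xbar = sum(random.gauss(MU, SIGMA) for _ in range(N)) / N
    moe = Z_STAR * SIGMA / sqrt(N)        # margin of error (assumes sigma is known)
    if xbar - moe <= MU <= xbar + moe:    # did this particular interval capture mu?
        captured += 1

capture_rate = captured / REPS
print(f"{capture_rate:.3f} of the intervals captured mu")
```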
Interpreting Confidence Intervals and Confidence Levels
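THIS IS A PHRASE TO MEMORIZE!!!!
Interpreting a confidence interval: “We are C% confident that the interval from ___ to ___
captures the [parameter in context].”
Interpreting a confidence level: “If we take many samples of the same size from this population
and use this method, about C% of the resulting intervals will capture the [parameter in context].”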
Example: A large company is concerned that many of its employees are in poor physical condition, which can
result in decreased productivity. To determine how many steps each employee takes per day, on average, the
company provides a pedometer to 50 randomly selected employees to use for one 24-hour period. After
collecting the data, the company statistician reports a 95% confidence interval of 4547 steps to 8473 steps.
(a) Interpret the confidence interval.
(b) What is the point estimate that was used to create the interval? What is the
margin of error?
(c) Recent guidelines suggest that people aim for 10,000 steps per day. Is there convincing evidence
that the employees of this company are not meeting the guideline, on average? Explain.
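Every interval has the form point estimate ± margin of error. So the point estimate is the midpoint of the interval, and the margin of error is half its width. Below is a minimal Python sketch of that arithmetic for the pedometer interval, plus the check for part (c). The function name `estimate_and_margin` is hypothetical.

```python
def estimate_and_margin(lower, upper):
    """Hypothetical helper: recover the point estimate and margin of error
    from an interval of the form estimate ± margin of error."""
    point_estimate = (lower + upper) / 2    # midpoint of the interval
    margin_of_error = (upper - lower) / 2   # half the width of the interval
    return point_estimate, margin_of_error

lower, upper = 4547, 8473                   # 95% confidence interval (steps)
est, moe = estimate_and_margin(lower, upper)
print(est, moe)                             # 6510.0 1963.0

guideline = 10_000
print(lower <= guideline <= upper)          # False: 10,000 is not in the interval
```

Since 10,000 lies above every plausible value in the interval, the interval gives convincing evidence that the company's employees average fewer than 10,000 steps per day.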
The confidence level tells us how likely it is that the method we are using will produce an
interval that captures the population parameter if we use it many times.
The confidence level does not tell us the chance that a particular confidence
interval captures the population parameter.
Instead, the confidence interval gives us a set of plausible values for the parameter.
We interpret confidence levels and confidence intervals in much the same way whether
we are estimating a population mean, proportion, or some other parameter.
Let’s be sure we’ve got it. There are only 2 possibilities for any one interval:
Our interval may be one of the 95% of intervals that capture the population mean, or else it’s
(unhappily) one of the 5% that don’t.
We cannot know whether our interval is one of the 95%, so by saying we’re 95% confident,
what we’re saying is that the method we are using produces intervals that capture the
parameter 95% of the time in repeated sampling.
Before we take the sample, there is a 95% chance that the method will produce an interval that
captures the true parameter; equivalently, a 95% chance of getting a sample mean that’s within
about 2 (more precisely, 1.96) standard deviations of the mystery parameter. After we actually
construct the confidence interval, the probability that it captures the population parameter
is NOT 95%; it is either 1 or 0.
Example: According to the American Community Survey, a 95% confidence interval for the
median household income in Texas during the years 2009–2011 is $58,929 ± $218.
<http://www.census.gov/hhes/www/income/data/statemedian/>
Interpret the confidence interval and the confidence level.
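The ± form converts directly into interval endpoints. Here is a small arithmetic sketch using the numbers quoted above:

```python
estimate = 58_929   # median household income, Texas 2009-2011 (dollars)
margin = 218        # margin of error (dollars)

lower, upper = estimate - margin, estimate + margin
print(f"${lower:,} to ${upper:,}")  # $58,711 to $59,147
```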
Constructing Confidence Intervals:
What if we want a confidence level greater than 95% (like 99%)? Or a confidence level less than 95%
(like 90%)? What will happen to our confidence interval?
We’ve already determined that our confidence interval is found by the point estimate ± margin of error.
This leads to a more general formula for confidence intervals:
statistic ± (critical value) • (standard deviation of statistic)
The CRITICAL VALUE (sometimes referred to as z*) is a multiplier that makes the interval wide enough to
achieve the desired confidence level. The critical value depends both on the confidence level C and the
sampling distribution of the statistic. (These critical values are based on the number of standard
deviations away from the mean. For example, our 68-95-99.7 rule states that about 95% of the values in a
Normal distribution lie within 2 standard deviations of the mean. We will be a bit more precise, and use
z* = 1.96 when we want a 95% confidence level.)
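The critical values for other confidence levels come from the standard Normal distribution. z* cuts off the middle C of the area, leaving (1 − C)/2 in each tail. The sketch below uses Python's `statistics.NormalDist` (available in Python 3.8 and later). It assumes the sampling distribution of the statistic is Normal, and the helper name `z_star` is hypothetical.

```python
from statistics import NormalDist

def z_star(conf_level):
    """Hypothetical helper: critical value z* so that the middle conf_level
    of the standard Normal distribution lies between -z* and z*."""
    return NormalDist().inv_cdf((1 + conf_level) / 2)

for C in (0.90, 0.95, 0.99):
    print(f"{C:.0%}: z* = {z_star(C):.3f}")
# 90%: z* = 1.645
# 95%: z* = 1.960
# 99%: z* = 2.576
```

Greater confidence gives a larger z*. So a 99% interval is wider than a 95% interval from the same data, and a 90% interval is narrower.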
Properties of Confidence Intervals:
• The margin of error is (critical value) • (standard deviation of statistic).
• The user chooses the confidence level, and the margin of error follows from this choice.
• The critical value depends on the confidence level and the sampling distribution of the
statistic.
• Greater confidence requires a larger critical value.
• The standard deviation of the statistic depends on the sample size n: larger samples give
smaller standard deviations (for 𝑥̄, it is 𝜎/√n), and therefore smaller margins of error at the
same confidence level.
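To see how the confidence level and the sample size affect the length of an interval, here is a sketch that computes the margin of error z* • 𝜎/√n for a sample mean. It assumes the population standard deviation 𝜎 is known and the sampling distribution of 𝑥̄ is Normal. The value 𝜎 = 15 and the function name `margin_of_error` are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(conf_level, sigma, n):
    """Hypothetical helper: z* * sigma / sqrt(n), assuming sigma is known
    and the sampling distribution of x-bar is Normal."""
    z_star = NormalDist().inv_cdf((1 + conf_level) / 2)   # critical value
    return z_star * sigma / sqrt(n)                       # z* times SD of x-bar

SIGMA = 15  # hypothetical population standard deviation

for C in (0.90, 0.95, 0.99):
    for n in (25, 100):
        print(f"C = {C:.0%}, n = {n:>3}: margin of error = {margin_of_error(C, SIGMA, n):.2f}")
```

Raising C increases the margin of error. Quadrupling n (from 25 to 100) cuts it in half, because √n doubles.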
Here are two important cautions to keep in mind when constructing and interpreting
confidence intervals.
• Our method of calculation assumes that the data come from an SRS of size n from the
population of interest. Other random sampling designs (such as stratified samples) may be
preferable in some settings, but they require more complex calculations than the ones used here.
• The margin of error in a confidence interval covers only chance variation due to random
sampling or random assignment! It does not account for problems such as nonresponse,
undercoverage, or response bias. Remember that the way a survey or experiment is
conducted may influence our results. Usually not in a good way.
Homework:
page 488: #1-19 odds, 20-25 all
Read pp. 492-504