6. Estimating with Confidence
Point Estimate, Confidence Intervals, Margin of Error, Differences in Proportions
1. Confidence Interval Basics
Say you meet someone saying "We are 90% confident that the true proportion of all of history's NFL Kickers who were 29+ years old at their prime was from 0.61 to 0.82."
The "0.61-0.82" is the confidence interval itself, although it is more proper to write it as (0.61, 0.82). It represents a set of plausible values for the true proportion, based on a sample. Plausible does not mean possible, because in reality any value from 0 to 1 is possible. In statistics, plausible is closer to "expected," or "we would not be surprised if this were the true value." So, in this case, our boundaries of plausibility are from 61% to 82% of the NFL Kickers.
The point estimate is the midpoint of that interval. We can get it by adding the lowest and highest values of the confidence interval and dividing by 2. So in this case, our point estimate is (0.61 + 0.82) / 2 = 0.715.
The margin of error is the largest distance we expect between the estimate and the actual population value at our confidence level. We can calculate it by taking the upper end of the confidence interval and subtracting the point estimate. So in this case, the margin of error is 0.82 - 0.715 = 0.105. Alternatively, we can subtract the lower end of the confidence interval from the point estimate, which is 0.715 - 0.61 = 0.105. We get the same answer both ways.
There is more than one way to represent a confidence interval. Once we have the point estimate and the margin of error, we can also write it with the formula: point estimate ± margin of error.
Here, that is 0.715 ± 0.105. Both ways are correct.
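To double-check the arithmetic above, here is a minimal Python sketch that recovers the point estimate and margin of error from the two ends of an interval. The function name summarize_interval is made up for this illustration. It uses half the interval's width for the margin of error, which is the same number as "upper end minus point estimate."

```python
def summarize_interval(lower, upper):
    """Return (point_estimate, margin_of_error) for the interval (lower, upper)."""
    point_estimate = (lower + upper) / 2   # midpoint of the interval
    margin_of_error = (upper - lower) / 2  # half the width of the interval
    return point_estimate, margin_of_error


# The NFL Kickers interval from the text: (0.61, 0.82)
estimate, margin = summarize_interval(0.61, 0.82)
print(f"{estimate:.3f} ± {margin:.3f}")
```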
Note: The margin of error is affected primarily by 2 factors. The first is sample size: the larger your sample is, the lower your margin of error will be. With more subjects in the study, outliers and subjects that don't follow the overall pattern carry less weight, which follows from the law of large numbers. This ultimately means that a larger sample will more closely resemble the true population.
Secondly, decreasing your confidence level will decrease your margin of error. Think about it like this. Let's say you want to be 100% confident that the true value falls into a range. That range would have to cover every possible value (for a proportion, everything from 0 to 1), since being 100% confident means the value must be in the range you mentioned. If you are only 75% confident, you accept that the true value might not fall into the range. This lets the confidence interval be narrower. And a narrower confidence interval means a smaller margin of error, since the point estimate stays the same regardless of the confidence level.
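Both effects in the note can be seen with a few numbers. The sketch below is only an illustration: it borrows the margin of error formula that section 2 introduces, reuses 0.715 as a stand-in sample proportion, and the sample sizes (50, 200, 800) are made up. The critical values in Z_STAR are the standard normal values for 90%, 95%, and 99% confidence.

```python
from math import sqrt

# Standard critical values (z*) for common confidence levels.
Z_STAR = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}


def margin_of_error(p_hat, n, confidence):
    """Margin of error for one sample proportion: z* x sqrt(p_hat(1-p_hat)/n)."""
    return Z_STAR[confidence] * sqrt(p_hat * (1 - p_hat) / n)


p_hat = 0.715  # stand-in value, reused from the kicker example

# Factor 1: a bigger sample gives a smaller margin of error.
for n in (50, 200, 800):  # hypothetical sample sizes
    print(f"n = {n:>3}, 95% confidence: ± {margin_of_error(p_hat, n, 0.95):.3f}")

# Factor 2: a lower confidence level gives a smaller margin of error.
for confidence in (0.99, 0.95, 0.90):
    print(f"n = 200, {confidence:.0%} confidence: ± {margin_of_error(p_hat, 200, confidence):.3f}")
```

Notice that going from 200 to 800 subjects (4 times as many) only cuts the margin of error in half, because n sits under a square root.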
2. Constructing a Confidence Interval
Going back, the formula for a confidence interval is point estimate ± margin of error.
But how exactly do we get the point estimate and the margin of error if the problem does not give them to us?
The point estimate is more precisely called the statistic, which in this context of sampling will be p̂. We can get this by dividing the number of subjects who "qualify" by the total sample size. (This will make more sense once we get to our example.)
The margin of error = (critical value) x (standard deviation of statistic)
The "critical value" depends only on the confidence level, and we can easily look it up in a z-table or with a calculator. For example, the critical value for a 95% confidence interval is 1.96.
The standard deviation of the statistic is called the standard error when we have to estimate it from the sample. Both use the same formula; the only change is that the sample proportion p̂ replaces the population proportion p. Standard deviation = √[p(1-p)/n], whereas standard error = √[p̂(1-p̂)/n].
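If you would rather compute a critical value than look it up, Python's standard library can do it. This is a small sketch, assuming the critical value comes from the standard normal distribution (which is where 1.96 comes from); the helper name critical_value is my own.

```python
from statistics import NormalDist


def critical_value(confidence):
    """z* such that the middle `confidence` share of a standard normal lies within ±z*."""
    tail = (1 - confidence) / 2            # area left over in each tail
    return NormalDist().inv_cdf(1 - tail)  # z-score with that much area above it


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {critical_value(level):.3f}")
```

For 95% this gives about 1.960, matching the value quoted above.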
Conditions:
1. Random- The sample was randomly chosen.
2. Independence- When sampling without replacement, the sample size is less than or equal to 10% of the population size.
3. Large Counts- np̂ ≥ 10 AND n(1-p̂) ≥ 10.
*If at least one of these conditions is broken (we will elaborate in lesson #7), then we have to interpret the data with much caution.*
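The three conditions can be turned into a quick checklist. This is a rough sketch rather than an official procedure: the function name check_conditions is made up, and whether the sample was random cannot be computed from the numbers, so it is passed in as a plain True/False. The numbers come from Example #2 below (17 of 54 sampled players, population of 541).

```python
def check_conditions(n, p_hat, population_size, random_sample=True):
    """Return a dict saying whether each of the three conditions is met."""
    return {
        "random": random_sample,                                    # must be judged from the study design
        "independence (10%)": n <= 0.10 * population_size,          # sample is at most 10% of the population
        "large counts": n * p_hat >= 10 and n * (1 - p_hat) >= 10,  # enough "successes" and "failures"
    }


checks = check_conditions(n=54, p_hat=17 / 54, population_size=541)
for name, met in checks.items():
    print(f"{name}: {'met' if met else 'NOT met'}")
```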
Now Let's Apply Fantasy Football!
Example #1
Prompt: "We are 95% confident that the true proportion of rookies who have played at least 2 years of college is 0.31 to 0.40." In the context of this *made-up* statement, what is the point estimate? What is the margin of error? How else can we properly represent the confidence interval?
Work/Explanation:
Point estimate = (0.31 + 0.40) / 2
= 0.355
Margin of error = 0.40 - 0.355
= 0.045
Confidence Interval = point estimate ± margin of error, so it can be represented as 0.355 ± 0.045. It can also be (0.31, 0.40). Both ways are correct.
Answer: At a 95% confidence level with this *made-up* statement and the confidence interval mentioned above, the point estimate is 0.355, the margin of error is 0.045, and the confidence interval can be represented as either 0.355 ± 0.045 or (0.31, 0.40).
Example #2
Prompt: Construct the 95% confidence interval for the true proportion of running backs in a population of all positions that averaged at least 13 PPR points per game from 2018-2023. The population size was 541 players, and a random sample of 54 players was taken. 17 of them were running backs.
Work/Explanation: The first thing we do is figure out what p̂ is. (We know that it is p̂ instead of p because we are using a sample, which means estimation. That also means the spread we use in the margin of error is a standard error instead of a standard deviation.) Let's look at what the prompt told us. 17 players in our sample are running backs, and the sample size is 54. Therefore, p̂ = 17/54 ≈ 0.31. Now that we have p̂, we can proceed with our equation.
Confidence Interval = point estimate ± margin of error
= statistic ± (critical value) x (standard error)
= p̂ ± 1.96 x √[p̂(1-p̂)/n]
= 0.31 ± 1.96 x √[0.31(1-0.31)/54]
= 0.31 ± 0.12
= (0.19, 0.43)
Answer: We are 95% confident that the true proportion of running backs in a population of all positions that averaged at least 13 PPR points per game from 2018-2023 is from 0.19 to 0.43.
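The same calculation can be sketched in a few lines of Python. The function name proportion_interval is hypothetical, and z_star defaults to the 1.96 used in this example. One difference from the hand calculation: the code keeps p̂ = 17/54 unrounded instead of rounding it to 0.31 first, so the ends come out near 0.19 and 0.44, and the last digit can differ slightly from work done by hand.

```python
from math import sqrt


def proportion_interval(successes, n, z_star=1.96):
    """Confidence interval for one proportion: p_hat ± z* x sqrt(p_hat(1-p_hat)/n)."""
    p_hat = successes / n                           # the statistic (point estimate)
    standard_error = sqrt(p_hat * (1 - p_hat) / n)  # estimated spread of p_hat
    margin = z_star * standard_error                # margin of error
    return p_hat - margin, p_hat + margin


# Example #2: 17 running backs in a sample of 54 players
lower, upper = proportion_interval(17, 54)
print(f"({lower:.2f}, {upper:.2f})")
```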
End of Example
3. Constructing a Confidence Interval for a Difference of Proportions
Imagine this scenario. In a large public university, you are studying the demographics of 2 grade levels: freshmen and sophomores. You take a random sample of 100 freshmen and find that the proportion who play a sport is 0.28. In a random sample of 100 sophomores, the proportion who play a sport is 0.29. Most people would conclude that the true proportions of freshmen and sophomores who play a sport are different just because 0.28 is different from 0.29. However, that's not how statisticians view it. One way they can assess whether there really is a difference is to construct a confidence interval for the difference.
Now, we have 2 samples, not just one. This is the formula for a confidence interval for the difference in proportions:
(p̂1 - p̂2) ± z* x √[p̂1(1-p̂1)/n1 + p̂2(1-p̂2)/n2]
The subscripts 1 and 2 differentiate our samples. For example, freshmen could be #1 and sophomores #2. The z* is just another way of writing the critical value, which depends on our confidence level. Each n is a sample size.
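Now, let's solve this. But first, let's summarize our scenario. We have a random sample of 100 freshmen, n1, and a random sample of 100 sophomores, n2. The proportion of freshmen who play a sport in this sample is 0.28, p̂1, while for sophomores it is 0.29, p̂2. Let's use a 95% confidence interval for this problem, so z* = 1.96.
Confidence Interval = (p̂1 - p̂2) ± z* x √[p̂1(1-p̂1)/n1 + p̂2(1-p̂2)/n2]
= (0.28 - 0.29) ± 1.96 x √[0.28(1-0.28)/100 + 0.29(1-0.29)/100]
= -0.01 ± 0.125
= (-0.135, 0.115)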
The #1 rule is that if your confidence interval contains 0, then there is no convincing evidence of a difference at the designated confidence level. In our case, the 95% confidence interval does contain 0. Therefore, there is no convincing evidence of a difference in the true proportion of freshmen and sophomores who play a sport in the large university we looked at.
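To see the freshmen and sophomores interval for yourself, here is a minimal sketch of the difference-of-proportions calculation. The function name difference_interval is made up for illustration, and z_star defaults to 1.96 for 95% confidence.

```python
from math import sqrt


def difference_interval(p1, n1, p2, n2, z_star=1.96):
    """Confidence interval for p1 - p2 from two independent samples."""
    difference = p1 - p2
    # The two samples' variances add, then we take the square root.
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z_star * standard_error
    return difference - margin, difference + margin


# Freshmen: 0.28 of 100. Sophomores: 0.29 of 100.
lower, upper = difference_interval(0.28, 100, 0.29, 100)
print(f"({lower:.3f}, {upper:.3f})")
print("0 is inside the interval:", lower <= 0 <= upper)
```

The interval runs from a little below -0.13 to a little above 0.11, so 0 sits comfortably inside it.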
And we also should not forget about conditions. Since we have 2 samples, we need to check every condition twice.
Random- We took a random sample of freshmen and a random sample of sophomores. This is met.
Independence- Our first sample had 100 freshmen. There are likely at least 1000 freshmen in this large university. Our second sample had 100 sophomores. Likewise, there are likely at least 1000 sophomores in the university. This is met.
Large Counts- Remember the condition is np̂ ≥ 10 AND n(1-p̂) ≥ 10. For freshmen, 100(0.28) = 28 and 100(1-0.28) = 72 are both ≥ 10. For sophomores, 100(0.29) = 29 and 100(1-0.29) = 71 are both ≥ 10. This is met.
All 3 conditions are met.
Now Let's Apply Fantasy Football!
Example #1
Prompt: The sample proportion, p̂1, of Quarterbacks in a population of all positions that averaged at least 13 PPR points per game from 2018-2023 was 0.389. The randomly selected sample had 54 players from a population of 541. For 2013-2017, the sample proportion, p̂2, for this same metric was 0.50. The randomly selected sample had 26 players from a population of 262. Construct a 95% confidence interval for the difference between the true proportion of Quarterbacks in a population of all positions that averaged at least 13 PPR points per game in the 2018-2023 seasons and the true proportion in 2013-2017. Then use this confidence interval to explain whether there is convincing evidence of a difference.
Work/Explanation:
We will begin with checking for conditions for both of our samples.
Random: Both samples were randomly selected. This is met.
Independence: For the 2018-2023 sample, 54 ≤ 0.10(541) = 54.1. For the 2013-2017 sample, 26 ≤ 0.10(262) = 26.2. This is met.
Large counts: For the 2018-2023 sample, 54(0.389) ≈ 21 and 54(1-0.389) ≈ 33 are both ≥ 10. For the 2013-2017 sample, 26(0.50) = 13 and 26(1-0.50) = 13 are both ≥ 10. This is met.
All conditions are met.
Now, to get ready for our equation, let's define our variables.
It comes down to preference, but since the 2018-2023 sample was mentioned first in the prompt, I will give it subscript 1.
Therefore, p̂1 = 0.389, n1 = 54.
And because the 2013-2017 sample was mentioned later, its subscript will be 2.
Therefore, p̂2 = 0.50, n2 = 26.
Now, we can use our formula and plug in values.
Confidence Interval = (p̂1 - p̂2) ± z* x √[p̂1(1-p̂1)/n1 + p̂2(1-p̂2)/n2]
= (0.389 - 0.50) ± 1.96 x √[0.389(1-0.389)/54 + 0.50(1-0.50)/26]
= -0.111 ± 0.232
= (-0.343, 0.121)
Answer: The 95% confidence interval for the difference in the true proportion of Quarterbacks in a population of all positions that averaged at least 13 PPR points per game from the 2018-2023 seasons, and those in 2013-2017 is (-0.343, 0.121). Because 0 is a plausible number in our interval, there is no convincing evidence for a difference between the true proportions of these two.
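As a check on this answer, the sketch below recomputes the interval from the prompt's numbers and applies the "is 0 in the interval?" rule in one step. The function name convincing_difference is hypothetical, and the True/False it returns is just shorthand for the rule described in this lesson, not a formal significance test.

```python
from math import sqrt


def convincing_difference(p1, n1, p2, n2, z_star=1.96):
    """Return (lower, upper, convincing) for the difference p1 - p2.

    `convincing` is True only when 0 is NOT inside the interval.
    """
    margin = z_star * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    lower = (p1 - p2) - margin
    upper = (p1 - p2) + margin
    return lower, upper, not (lower <= 0 <= upper)


# Quarterbacks: 2018-2023 sample (0.389 of 54) vs 2013-2017 sample (0.50 of 26)
lower, upper, convincing = convincing_difference(0.389, 54, 0.50, 26)
print(f"({lower:.3f}, {upper:.3f})")
print("Convincing evidence of a difference:", convincing)
```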
End of Example
2 Takeaways for Fantasy Football
1. The more confident you are, the less precise you can be
This is what makes statistics tricky. If you want to be more sure of something, you have to sacrifice specificity. If you are okay with some risk of error, then you can be more precise. It is all about how much you are willing to risk being wrong, and this can have major implications in Fantasy Football. Every player decides how risky or how safe to be as they draft.
2. Seek convincing evidence
There is so much media coverage of differences in proportions that are taken out of context and overblown in magnitude. Until actual analysis has been done on these differences, we should not overreact. Don't read too much into one group having better stats than another in some category, as the gap may just be a coincidence of the samples.