Day 75: SINning in Stats
Today we begin the journey to Inferential Statistics in my AP Statistics class. I use the development of the sampling distribution of a statistic both to set up good habits and to develop the conceptual understanding of why we need to check conditions for the elements of the sampling distribution: Center, Spread, and Shape.
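For readers who like to see Center, Spread, and Shape emerge from a simulation, here is a minimal sketch. It is not what we do in class; the population proportion `p = 0.3`, the sample size `n = 100`, the number of repetitions, and the function name are all made-up values for illustration. It draws many random samples, records each sample proportion, and compares the simulated center and spread with p and sqrt(pq/n).

```python
import random
import statistics
from math import sqrt

def simulate_sample_proportions(p, n, reps, seed=0):
    """Draw `reps` random samples of size n; return each sample's proportion of successes."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

# Hypothetical values chosen only for illustration.
p, n = 0.3, 100
p_hats = simulate_sample_proportions(p, n, reps=5000)

center = statistics.mean(p_hats)      # Center: should land near p
spread = statistics.pstdev(p_hats)    # Spread: should land near sqrt(pq/n)
theory_sd = sqrt(p * (1 - p) / n)

# Shape: a rough Normality check is the share of sample proportions
# within one theoretical standard deviation of p (about 68% for a Normal model).
within_one_sd = sum(abs(ph - p) <= theory_sd for ph in p_hats) / len(p_hats)

print(f"center = {center:.4f} (p = {p})")
print(f"spread = {spread:.4f} (sqrt(pq/n) = {theory_sd:.4f})")
print(f"share within 1 SD = {within_one_sd:.3f}")
```

Rerunning with a small n or a p close to 0 or 1 is a quick way to watch the shape drift away from Normal.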
I found that students really struggled with all of the apparently different conditions for the various inference methods we study in the second semester. I really wanted to streamline the process of checking the conditions. After looking at and comparing the various assumptions and conditions, I realized that the two sampling distributions used in the inference procedures about proportions or means boiled down to two things: a random element in the data collection method (simple random sample or randomized experiment), and sample size, where the sample must be small enough relative to the population for the standard deviation formula to hold and large enough for the shape to be approximately Normal. In addition, I needed to help my students understand the difference between assumptions and conditions.
Here is a summary of what I found:
Independence Assumption: The sampled values must be independent of each other
The Sample Size Assumption: The sample size, n, must be large enough
Assumptions are hard, often impossible, to check. Still, we need to check whether the assumptions are reasonable by checking conditions that provide information about the assumptions. The corresponding conditions to check before using the Normal model for the distribution of sample proportions or means are the Randomization Condition, the 10% Condition, and the Success/Failure Condition or Large Enough Sample Condition.
Conditions you can check:
Randomization Condition: The data must be representative of the population. (That is, it must come from a randomized experiment, or from a simple random sample of the population; the sampling method must be unbiased.)
10% Condition: The sample size, n, must be no larger than 10% of the population.
Success/Failure Condition (for proportions): The sample size must be large enough so that we can expect at least 10 “successes” and 10 “failures”. That is, np ≥ 10 and nq ≥ 10.
OR Large Enough Sample Condition (for means): If the population is unimodal and symmetric, any size sample is sufficient. Otherwise, a larger sample is needed.
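The checkable conditions for a proportion can be sketched as a small helper. This is only an illustration under stated assumptions: the function name `check_proportion_conditions`, its parameters, and the example numbers are hypothetical. Randomization cannot be computed from the numbers, so it is passed in as a yes/no judgment about how the data were collected.

```python
def check_proportion_conditions(n, p, population_size, randomized):
    """Return True/False for each checkable condition in a one-proportion setting.

    n               -- sample size
    p               -- proportion of successes used for the check
    population_size -- size of the population being sampled (hypothetical input)
    randomized      -- True if the data came from an SRS or a randomized experiment
    """
    q = 1 - p
    return {
        # Randomization Condition: a judgment call supplied by the user
        "Randomization": bool(randomized),
        # 10% Condition: sample is no larger than 10% of the population
        "10%": n <= 0.10 * population_size,
        # Success/Failure Condition: at least 10 expected successes and failures
        "Success/Failure": n * p >= 10 and n * q >= 10,
    }

# Made-up example: 200 students sampled at random from a district of 10,000.
good = check_proportion_conditions(n=200, p=0.3, population_size=10_000, randomized=True)

# Made-up example: 20 students from a school of 150, with a rare outcome.
poor = check_proportion_conditions(n=20, p=0.1, population_size=150, randomized=True)

for name, result in (("good", good), ("poor", poor)):
    print(name, result)
```

In the second example the sample is too large relative to the population (20 is more than 10% of 150) and too small to expect 10 successes (np = 2), so two of the three checks fail.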
I also wanted to have some kind of cognitive framework to fit these ideas. How could I combine the ideas into chunks that include the essence of the assumptions and conditions? Well, Random made its own sense, but Independence and the 10% Condition were intertwined, and the approximately Normal shape was tied to having a large enough sample, whether through the count of successes and failures or simply through a larger sample.
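One way to see why Independence and the 10% Condition go together, offered as a sketch rather than something from my class notes: when we sample without replacement from a population of size N (a symbol I am adding here for population size, not the N in Normal), the standard deviation of the sample proportion picks up a finite population correction factor. Keeping n at or below 10% of N keeps that factor close to 1, so the simpler formula that treats the draws as independent is a reasonable approximation.

```latex
\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} \cdot \sqrt{\frac{N-n}{N-1}},
\qquad
n \le 0.10\,N
\;\Longrightarrow\;
\sqrt{\frac{N-n}{N-1}} \;\ge\; \sqrt{\frac{0.9\,N}{N-1}} \;>\; \sqrt{0.9} \approx 0.949
```

So under the 10% Condition, using sqrt(pq/n) overstates the spread by only about 5% at worst.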
Acronyms are great memory devices, especially when first learning about something new and complex. The acronym is a simple organizational tool that reminds the user of the complex ideas. My next thought was what kind of acronym could I come up with for these? RIN (for Random, Independence, and Normal)? But there was no real hook or interesting connection with RIN. How about SIN, where S stood for random Sample? That was somewhat suspect because not all inference is about sampling, so I did stretch it a little to say SRS or random assignment (they both sound essy). And I use the catchy phrase, “It is a SIN to not check the conditions.” I have found that my kiddos don’t forget to check them. Phewww!
Here is an example of how we talk through these ever-important assumptions and conditions. As the year progresses, we fine-tune and focus in on the important distinctions, but the acronym SIN and these three words give us a simple framework to talk about the distinctions. To set the groundwork for next semester, I have students recognize the assumptions/conditions AND what each condition guarantees.