When I’m consulted about which statistical analyses to use, or which to pay attention to, I almost always deliver the ‘keep it simple, smarty’ message.
Why?
1. Because more people will be able to understand (and isn’t that the point?)
2. Because everyone will be able to understand more quickly (and attention spans are short!)
3. Because a simple approach usually describes or explains the data just as well as a more complicated one
4. Because overcomplicated statistical methods can lead to misleading or misinterpreted results
My current favorite example is a very simple one. In higher ed, I see means and standard deviations used a lot to report data from scales, such as those in assessment rubrics (e.g., student work exceeds, meets, is approaching, or is below standard). With this type of data, I advocate looking first at frequencies or percentages. The mean may seem simpler, because it’s one number rather than several, but in many cases it’s much harder to interpret. Here’s an example:
Fake results from a fake student writing assessment (n = 12)
Version 1: Assigns numbers to the example rubric (exceeds = 4, meets = 3, approaching = 2, below = 1)
Grammar: mean score 3.3 (standard deviation .75)
Sentence Structure: mean score 2.8 (standard deviation .83)
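Version 2: Percentage of students at each rubric level (percentages are rounded)

| Rubric level | Grammar | Sentence Structure |
|---|---|---|
| below | 0% | 0% |
| approaching | 17% | 42% |
| meets | 42% | 33% |
| exceeds | 42% | 25% |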
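To show that Version 1 and Version 2 describe the same 12 students, here is a minimal Python sketch that computes both summaries from one set of scores. The per-level counts are an assumption, back-calculated from the percentages (for example, 17% of 12 is 2 students), and the names `counts`, `expand`, and `summarize` are hypothetical. It uses the sample standard deviation, which is what reproduces the .75 and .83 above.

```python
from collections import Counter
from statistics import mean, stdev

# Rubric coding from Version 1: exceeds = 4, meets = 3, approaching = 2, below = 1
LEVELS = {1: "below", 2: "approaching", 3: "meets", 4: "exceeds"}

# Assumed counts per score for the 12 students, back-calculated from the Version 2 percentages
counts = {
    "Grammar": {1: 0, 2: 2, 3: 5, 4: 5},
    "Sentence Structure": {1: 0, 2: 5, 3: 4, 4: 3},
}


def expand(level_counts):
    """Turn {score: number of students} into a flat list of individual scores."""
    return [score for score, n in level_counts.items() for _ in range(n)]


def summarize(scores):
    """Return the mean, the sample SD, and the rounded percentage at each rubric level."""
    n = len(scores)
    tally = Counter(scores)
    pct = {LEVELS[s]: round(100 * tally[s] / n) for s in LEVELS}
    return mean(scores), stdev(scores), pct


for skill, level_counts in counts.items():
    m, sd, pct = summarize(expand(level_counts))
    print(f"{skill}: mean {m:.2f} (SD {sd:.2f}) | {pct}")
```

Run as-is, it prints Grammar: mean 3.25 (SD 0.75) and Sentence Structure: mean 2.83 (SD 0.83), which round to the 3.3 and 2.8 in Version 1, alongside the same percentages as the table. The percentages carry the whole distribution; the two summary numbers do not.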
Which makes more immediate sense to you? Can you see the differences more easily via the percentages than via the means?*
*Some statisticians object outright to using the mean and standard deviation with these types of ordinal variables. While I don’t take a hard-and-fast position on that, I do think it’s important to consider what you’re trying to learn when choosing which numbers to pay attention to. In most cases, I find colleagues are looking for more nuanced changes over time, or for what the range of scores looks like, which a mean and standard deviation alone can’t provide.
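To illustrate the point about nuance, here is a second minimal Python sketch with two hypothetical classes. The counts are invented for illustration only and do not come from the assessment above. The two classes share exactly the same mean and standard deviation on the 1 to 4 rubric scale, yet their scores are spread across the levels very differently.

```python
from statistics import mean, stdev

# Hypothetical counts, invented for illustration: score -> number of students (n = 12 each)
class_a = {1: 0, 2: 6, 3: 0, 4: 6}  # half approaching, half exceeds
class_b = {1: 2, 2: 0, 3: 6, 4: 4}  # two below, half meets, a third exceeds


def expand(level_counts):
    """Turn {score: number of students} into a flat list of individual scores."""
    return [score for score, n in level_counts.items() for _ in range(n)]


for name, level_counts in [("Class A", class_a), ("Class B", class_b)]:
    scores = expand(level_counts)
    print(f"{name}: mean {mean(scores):.2f}, SD {stdev(scores):.2f}, counts {level_counts}")
```

Both lines report mean 3.00 and SD 1.04. Only the counts (or percentages) reveal that Class A has no students at meets while Class B has two students below standard.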
While this is a very simple example, I think the same goes for choosing among more complicated statistical approaches. I’m not saying that we shouldn’t use advanced statistical methods, only that if a simple option yields sufficiently explanatory results, why overcomplicate things?
What do you think? When is it worth choosing complicated over simple? How can we simplify our explanations of more complicated analyses to make them more palatable, while keeping them accurate and authentic?