In my little treatise on statistics, I didn't spend much time on opinion polling. Primarily that's because, while I have a lot of experience in the realm of physical measurement statistics, I have a lot less in the area of opinion surveys. That being said, I do have a good bit of experience with one type of opinion poll, and that is the product evaluation.
Now there are differences between opinion polling and product evaluation, but, generally speaking, both types ask for subjective judgments about something, which could be how good a job the President is doing or how good a shave you got from an unidentified razor.
I use the razor as an analogy because that's the sort of product testing I did, so I can speak with experience.
Product testing and opinion polling both depend on two things: the population being polled and the nature of the poll questions. In addition, as we shall see, there is the matter of how the responses are grouped.
In our testing, we selected a random group of shavers. Sort of. We chose our mailing list from a list of members of the American Society for Quality Control (as the American Society for Quality was known then). We chose that group because we wanted people who would be more likely to seriously evaluate the product. In thinking back over the masses of data I saw, that probably was the case. Most tests were comparison tests, where testers received our product and that of a competitor. They were asked to rate each after a set number of uses and then pick which they thought was better.
You know those commercials where they say, "Four out of five people loved our product"? This is just that sort of test, and believe me, if you're going to make that claim, you'd better have some good-looking data to support it, because the Feds just love to occasionally call your bluff. Avoiding doing time, or at least paying hefty fines, comes down to how you handle your data.
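To see what "good looking data" means here, a standard sanity check (my own illustrative sketch, not the procedure we actually used) is to ask how likely the observed preference count would be if testers really had no preference at all. The standard library is enough:

```python
from math import comb

def tail_probability(n, k, p=0.5):
    """Exact binomial tail: probability of k or more 'prefer ours'
    responses out of n testers if the true preference rate were p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical numbers: if 80 of 100 testers preferred our razor,
# how plausible is that under the "no real difference" (p = 0.5) assumption?
p_value = tail_probability(100, 80)
print(f"{p_value:.2e}")  # far below any reasonable significance threshold
```

A result that extreme under the no-preference assumption is what lets you defend the claim; four out of five from a handful of testers would not survive the same arithmetic.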
For example, "Four out of five people couldn't tell the difference" isn't the same as "Four out of five thought we were better." There are a lot of ways to ask the tester questions that can give you the answer you want. All of them may be asked in a particular survey, but only the one that gives the desired result gets reported. For example:
- Rate the product as follows: Poor, Fair, Good, Very Good, Excellent (this can be done for various characteristics)
- Rate product A against B: A is much better, A is somewhat better, No Difference, B is somewhat better, B is much better
- Which product would you buy?
There's also the business of grouping results. It's typical to group Poor with Fair as well as Very Good with Excellent, so we really only have three categories, but I have seen situations where Good, Very Good, and Excellent were all combined. When comparing the results of two products, one of which is merely okay while the other performs very well, the aggregated data might show no statistical significance, because the "okay" product's "Good" results carry as much weight as the better product's "Very Good" and "Excellent" results.
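The grouping effect is easy to demonstrate. Here's a small sketch with hypothetical counts (not our actual test data) showing how an "okay" product and a clearly better one become indistinguishable once Good, Very Good, and Excellent are all combined:

```python
# Hypothetical response counts for 100 testers each (not real test data)
product_a = {"Poor": 5, "Fair": 15, "Good": 60, "Very Good": 15, "Excellent": 5}
product_b = {"Poor": 5, "Fair": 15, "Good": 10, "Very Good": 40, "Excellent": 30}

def collapse(counts):
    """Apply the aggressive grouping: Poor+Fair vs. Good+Very Good+Excellent."""
    return {
        "Negative": counts["Poor"] + counts["Fair"],
        "Positive": counts["Good"] + counts["Very Good"] + counts["Excellent"],
    }

print(collapse(product_a))  # {'Negative': 20, 'Positive': 80}
print(collapse(product_b))  # {'Negative': 20, 'Positive': 80}
# The collapsed tables are identical, so any significance test run on them
# finds nothing -- even though the full five-category distributions
# clearly favor product B.
```

The full distributions differ dramatically; the collapsed ones are literally the same table. Which version gets reported decides what "the data shows."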
And that's where the problems come from.
When you hear about competing surveys for, say, political candidates where one says Smith is going to win and the other says Jones is a shoo-in, there are several things that can be happening. One survey may say, "Which candidate can do a better job?" while the other says "Which candidate will you vote for?" Those are two different questions, because factors beyond how good an alderman the candidate is come into play. A person might think a female candidate would do a better job than a man, but the gender angle would sway that person to vote for the man. Or someone might be a "straight ticket" voter, so it doesn't matter how good one candidate is relative to another. That person will always vote for a Democrat, a Republican, a conservative, or a liberal regardless of ability.
Or when you hear that people "found no difference" between low-priced product A and high-priced product B, you might be seeing grouping in action. In fact, you can see it in interactive surveys. I am frequently asked by Dell, among others, to complete surveys on the performance of technical support people. What I've noticed is that if I rate something 7 or above (0 being bad and 10 being great), the survey goes on to the next question or set of questions. If I rate something 5 or lower, I'll be asked a question about what was wrong or what could be improved (the reaction to a six varies with the type of question). So basically, it doesn't matter whether I rate something 7, 8, 9, or 10, or whether I rate it 1, 2, 3, 4, or 5. Those categories are essentially clumped together.
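The branching behavior I'm describing amounts to a bucketing function something like this (my reconstruction of the apparent logic from the outside, not Dell's actual code):

```python
def follow_up(rating):
    """Infer the survey's effective grouping of a 0-10 rating.
    Ratings of 7+ move straight on, 5 and below trigger a
    'what went wrong' question, and 6 varies by question."""
    if rating >= 7:
        return "next question"        # 7, 8, 9, 10 all treated alike
    if rating <= 5:
        return "ask what went wrong"  # 0 through 5 all treated alike
    return "depends on the question"  # 6 is the gray zone
```

Every rating from 7 through 10 takes the same path, so an 11-point scale has quietly become a 3-point one before anyone tabulates anything.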
Then there's the whole business of populations. As noted, our sample was drawn from a large number of quality professionals. This means they would tend to be better educated and possibly higher wage earners than the average person. In our tests, though, all that mattered was that we had a variety of ages and hair types (we had male and female testers). We also kept track of test results by user to weed out those who never seemed to find differences. The good thing is that we could compare historical data, because the nature of the test group was very consistent.
One of the big problems with polls is that you can't be sure of the nature of the sample. If today's poll is top-heavy with high-income conservatives compared to the last survey, which had a larger blue-collar component, then saying that a Democratic candidate is doing worse today may not mean anything.
So, if you're going to evaluate a product claim or a popularity poll, you won't be able to do it without understanding how the data was collected, how it was grouped, and what the makeup of the population was. And, most of the time, you can't get any of that information without going to a lot of trouble. That is why I say that I don't put much stock in polls. You're better off making up your own mind.
That way, at least, you know who's responsible for the decision.