Bad Samples
Some samples are bad in the sense that the method used to collect the data dooms the sample, so that it is likely to be somehow biased. That is, it is not representative of the population from which it has been obtained. The following definition refers to one of the most common and most serious misuses of statistics.
Definition
A voluntary response sample (or self-selected sample) is one in which the respondents themselves decide whether to be included.
Caution
Never use voluntary response sample data to draw conclusions about a population.
[collapse=Example1]
Voluntary Response Sample
Newsweek magazine ran a survey about the Napster Web site, which had been providing free access to downloadable copies of music CDs. Readers were asked this question: "Will you still use Napster if you have to pay a fee?" Readers could register their responses on the Web site newsweek.msnbc.com. Among the 1873 responses received, 19% said yes, it is still cheaper than buying CDs. Another 5% said yes, they felt more comfortable using it with a charge. When Newsweek or anyone else runs a poll on the Internet, individuals decide for themselves whether to participate, so they constitute a voluntary response sample. But people with strong opinions are more likely to participate, so it is very possible that the responses are not representative of the whole population.[/collapse]
[collapse=Example2]
Why was the Literary Digest poll so wrong?
Founded in 1890, the Literary Digest magazine was famous for its success in conducting polls to predict winners in presidential elections. The magazine correctly predicted the winners in the presidential elections of 1916, 1920, 1924, 1928, and 1932. In the 1936 presidential contest between Alf Landon and Franklin D. Roosevelt, the magazine sent out 10 million ballots and received 1,293,669 ballots for Landon and 972,897 ballots for Roosevelt, so it appeared that Landon would capture 57% of the vote. The size of this poll is extremely large when compared to the sizes of other typical polls, so it appeared that the poll would correctly predict the winner once again. James A. Farley, Chairman of the Democratic National Committee at the time, praised the poll by saying this: "Any sane person cannot escape the implication of such a gigantic sampling of popular opinion as is embraced in The Literary Digest straw vote. I consider this conclusive evidence as to the desire of the people of this country for a change in the National Government. The Literary Digest poll is an achievement of no little magnitude. It is a poll fairly and correctly conducted." Well, Landon received 16,679,583 votes to the 27,751,597 votes cast for Roosevelt. Instead of getting 57% of the vote as suggested by the Literary Digest poll, Landon received only 37% of the vote. The results for Roosevelt are shown in Figure 1-1. The Literary Digest magazine suffered a humiliating defeat and soon went out of business.
In that same 1936 presidential election, George Gallup used a much smaller poll of 50,000 subjects and he correctly predicted that Roosevelt would win. How could it happen that a larger Literary Digest poll could be so wrong by such a large margin? What went wrong? As you learn the basics of statistics in this chapter, we will return to the Literary Digest poll and explain why it was so wrong in predicting the winner of the 1936 presidential contest.
What went wrong in the Literary Digest poll?
The Literary Digest magazine conducted its poll by sending out 10 million ballots. The magazine received 2.3 million responses. The poll results suggested incorrectly that Alf Landon would win the presidency. In his much smaller poll of 50,000 people, George Gallup correctly predicted that Franklin D. Roosevelt would win. The lesson here is that it is not necessarily the size of the sample that makes it effective, but the sampling method. The Literary Digest ballots were sent to magazine subscribers as well as to registered car owners and those who used telephones. On the heels of the Great Depression, this group included disproportionately more wealthy people, who tended to vote Republican. But the real flaw in the Literary Digest poll is that it resulted in a voluntary response sample. Gallup used an approach in which he obtained a representative sample based on demographic factors. (Gallup modified his methods after he made a wrong prediction in the famous 1948 Dewey/Truman election. Gallup stopped polling too soon, and he failed to detect a late surge in support for Truman.) The Literary Digest poll is a classic illustration of the flaws inherent in basing conclusions on a voluntary response sample.[/collapse]
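The percentages quoted in the Literary Digest example can be checked with a few lines of arithmetic. The sketch below uses only the ballot and vote counts given in the example; the variable names are illustrative. It assumes a two-candidate denominator (Landon plus Roosevelt only), which gives Landon an actual share of roughly 37.5%. The 37% quoted in the example may rest on a denominator that also counts other candidates, whose totals are not given here.

```python
# Ballot counts returned to the Literary Digest (from the example above).
landon_ballots = 1_293_669
roosevelt_ballots = 972_897
ballots_sent = 10_000_000

# Actual 1936 vote counts for the two candidates (from the example above).
landon_votes = 16_679_583
roosevelt_votes = 27_751_597

# Share of returned ballots favoring Landon (the poll's prediction).
ballots_returned = landon_ballots + roosevelt_ballots
poll_share_landon = landon_ballots / ballots_returned

# Fraction of the mailed ballots that came back at all.
response_rate = ballots_returned / ballots_sent

# Landon's actual share, assuming a two-candidate denominator.
actual_share_landon = landon_votes / (landon_votes + roosevelt_votes)

print(f"Ballots returned:        {ballots_returned:,}")
print(f"Response rate:           {response_rate:.1%}")
print(f"Poll share for Landon:   {poll_share_landon:.1%}")
print(f"Actual share for Landon: {actual_share_landon:.1%}")
```

Fewer than one in four of the mailed ballots came back, so most of the people contacted never entered the sample at all.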
The following are common examples of voluntary response samples. By their very nature they are seriously flawed, and we should not draw conclusions about a population from such biased samples:
- Polls conducted through the Internet in which subjects can decide whether to respond
- Mail-in polls, in which subjects can decide whether to reply
- Telephone call-in polls, in which newspaper, radio, or television announcements ask that you voluntarily call a special number to register your opinion
With such voluntary response samples, we can only make valid conclusions about the specific group of people who chose to participate, but a common practice is to incorrectly state or imply conclusions about a larger population. From a statistical viewpoint, such a sample is fundamentally flawed and should not be used for making general statements about a larger population.
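A small simulation can illustrate why a large voluntary response sample can mislead while a much smaller random sample does not. The following is a minimal sketch under invented assumptions: the population size, the true level of support, and the response probabilities are all hypothetical values chosen for illustration, not data from any poll discussed above. The key assumption is that people who oppose a proposal are three times as likely to respond as people who support it.

```python
import random


def simulate(pop_size=100_000, true_support=0.60,
             p_respond_supporter=0.10, p_respond_opponent=0.30,
             random_n=1_000, seed=0):
    """Compare a voluntary response sample with a simple random sample.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)

    # True means the person supports the proposal, False means opposes.
    population = [rng.random() < true_support for _ in range(pop_size)]
    actual = sum(population) / pop_size

    # Voluntary response: each person decides whether to respond, and
    # opponents are assumed to be more motivated to respond.
    volunteers = [
        supports for supports in population
        if rng.random() < (p_respond_supporter if supports
                           else p_respond_opponent)
    ]
    voluntary_estimate = sum(volunteers) / len(volunteers)

    # Simple random sample: the pollster chooses who is included.
    random_sample = rng.sample(population, random_n)
    random_estimate = sum(random_sample) / random_n

    return actual, voluntary_estimate, len(volunteers), random_estimate


actual, vol_est, n_vol, rand_est = simulate()
print(f"Actual support in population:       {actual:.1%}")
print(f"Voluntary sample (n={n_vol:,}): {vol_est:.1%}")
print(f"Random sample (n=1,000):            {rand_est:.1%}")
```

Under these assumptions the voluntary sample is many times larger than the random sample, yet its estimate lands far from the true value, because the self-selection mechanism rather than the sample size determines who ends up being counted.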