
The Usability of Punched Ballots

by Dr. Bob Bailey

December, 2000

 

Improving usability

Theresa LePore, the supervisor of elections in Palm Beach County, Florida, has received much criticism for the ballot she designed for this year's presidential election. In fact, she made several good decisions. For example, she attempted to improve the ballot for older voters by making the characters larger. She also wanted to have all presidential candidates on one page. Her solution was to use what has become known as the "butterfly ballot."

To ensure adequate reading performance, however, she should have focused on at least five issues:

  • font size,
  • font type,
  • text versus background color,
  • the light level where the voting occurred, and
  • the overall layout of the ballot.

Font size – For the majority of voters, a font size of 10 points would have been satisfactory. Most books are printed using type that is 10 or 11 points (a "point" is 1/72 of an inch). To accommodate older users, however, the research suggests that the characters should have been at least 12 points (maybe even 14 points). It is acceptable to use smaller font sizes when users can move closer to the text (or move the text closer to them), which makes the image in the eye (the angle subtended on the retina) larger.
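The trade-off between type size and viewing distance can be sketched numerically. This is only an illustration and not part of the original analysis: the viewing distances (16 and 12 inches) and the function name are assumptions, and the full point size is treated as the character height, which overstates the height of the actual letters.

```python
import math

POINTS_PER_INCH = 72  # a "point" is 1/72 of an inch


def visual_angle_arcmin(font_points, distance_inches):
    """Visual angle, in minutes of arc, subtended by type of a given point size."""
    height_inches = font_points / POINTS_PER_INCH
    angle_radians = 2 * math.atan(height_inches / (2 * distance_inches))
    return math.degrees(angle_radians) * 60


# Assumed viewing distances in inches; these are not taken from the article.
for points, distance in [(10, 16), (12, 16), (10, 12)]:
    angle = visual_angle_arcmin(points, distance)
    print(f"{points}-point type at {distance} in: {angle:.1f} arc minutes")
```

Under these assumptions, moving 10-point type from 16 to 12 inches enlarges the retinal image more than switching to 12-point type at 16 inches does.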

Using all uppercase letters, which she elected to do, made the characters slightly larger for users. The most recent research on using uppercase versus lowercase letters for names shows clearly that there is no reliable difference between them in reading performance.

Font type – She used a "sans serif" font for the names. This decision was acceptable. There is one study that suggests that people over age 60 read "serif" fonts faster than sans serif fonts. In this case, the speed of reading is not as important as the accuracy of reading. Florida law allows each voter five minutes in the voting booth.

Text vs. background – The fastest and most accurate readability comes from using black text on a white background. This is what she did. The ballot appears to be black print on "white" card stock.

Illumination level – We do not know about the illumination level where the votes were cast. One recent research study found that in 71% of over 50 different "public places" in Florida, the light level was too low for adequate reading. Older adults need more illumination in order to see well.

In general, because the main usability issue was reading accurately, not reading quickly, Ms. LePore did an adequate job of dealing with these basic human factors issues.

Layout of the ballot

The issues surrounding the layout of the ballot are much more difficult to deal with, and are not nearly as easy to detect and resolve. It is difficult, even for usability experts, to identify some types of layout and formatting problems. For this reason, usability professionals make considerable use of usability tests.

Usability testing – Usability tests are intended to identify and correct problems before products are used by large numbers of users. In Ms. LePore's case, she would be interested in finding and fixing most of the serious problems voters would have on Election Day. In her case, a usability test would require several people pretending to vote while using the proposed ballot. While voting, these test participants would be observed by experienced usability testers. The testers would note and record any difficulties that the "voters" appeared to be having.

After voting, the participants would be individually interviewed about any concerns they had or any problems they may have experienced. This information would be used to change the ballot, and then a second round of usability testing would take place. Sometimes it takes three to five (or more) iterations (design, test, redesign) to achieve the desired outcome, i.e., to meet the performance goals for the ballot.

The Buchanan problem

Pat Buchanan got 3,411 machine-counted votes for president in this heavily Democratic county (62% voted for Al Gore and 35% voted for George Bush). The number of votes for Buchanan was higher than he received in any other Florida county. One explanation for the large number of votes relates to the way Palm Beach County's punch-card style ballot was laid out for the presidential race. Candidates were listed on both sides of the front page, with a vertical row of holes between them where the voters punched their choices. The top hole was for Bush, listed at top left; the second hole was for Buchanan, listed at top right; and the third hole was for Gore, listed under Bush on the left. The layout is shown below.

[Figure: the Palm Beach County butterfly ballot]

Informal evaluations – Theresa LePore designed the ballot and then had it reviewed. Her usability testing, however, was limited in its scope. It initially consisted of seeking approval by two other members of the canvassing board of which she was a member. These two evaluators were intelligent, and highly experienced in conducting elections – one was a county commissioner (Carol Roberts) and the other was a judge (Charles Burton). Even so, the probability of one or the other of these two people detecting the "Buchanan" problem by simply looking at the ballot was very low. I calculated it as being about two chances in 100.

Ms. LePore then sent the ballot to both the Democratic and Republican National Committees for review. If we assume that the two groups had a total of ten people look at the ballot, the probability that one or more people in this group would have found the "Buchanan" problem was also low. I calculated that they had about one chance in ten of finding the problem. Obviously, none of these reviewers identified the "Buchanan" problem.
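These two estimates can be reproduced with the formula given in the footnote. The sketch below assumes the roughly 1% problem rate that the article uses for the "Buchanan" problem and treats each reviewer as an independent chance to notice it; the function name is illustrative.

```python
def chance_of_detection(p, reviewers):
    """Probability that at least one of the reviewers encounters the problem."""
    return 1 - (1 - p) ** reviewers


P_PROBLEM = 0.01  # assumed: the problem affects about 1% of people

# Two canvassing board members: about 2 chances in 100.
print(round(chance_of_detection(P_PROBLEM, 2), 3))   # 0.02

# Ten reviewers from the two party committees: about 1 chance in 10.
print(round(chance_of_detection(P_PROBLEM, 10), 3))  # 0.096
```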

Number of test participants

Ms. LePore was not familiar with usability testing, but neither are many other highly experienced designers. For example, shortly after the Florida voting issue became known, one highly experienced system developer wrote: "Would usability testing (which often only uses 5-20 people of each background) have caught it? I think so." He links users to Jakob Nielsen's Web site, where Nielsen has suggested that "100% of usability problems can be found using only 15 subjects." Neither is correct in their estimates of the number of test subjects needed.

How many usability test participants would have been required for Ms. LePore to feel confident of finding these types of problems?

This answer can be calculated.* If the voters in Palm Beach County had voted for Buchanan at the same rate as those in the other Florida counties, Buchanan would have received around 600 votes, instead of 3,407. Many have proposed that this suggests that about 2,800 votes (3,400 minus 600) were erroneously made. We do not know for sure – the votes may have been correctly made for Buchanan. Keep in mind that Buchanan received over 8,000 votes in Palm Beach County in the 1996 presidential primary when he was running against Bob Dole.

For our purposes, we will assume that the "Buchanan" problem was only a difficulty for about 1% of all the voters (2,800 "erroneous" votes divided by the 269,951 actual and potential Gore voters). My calculations show that Ms. LePore would have required 289 test participants to find 95% of the problems, which most likely would have led to detection of the "Buchanan" problem before the election. More than four hundred (423) Democratic test participants would have been required to find 99% of the problems.
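One way to carry out this kind of calculation is to solve the footnote's formula for the number of participants. The sketch below is an illustration under stated assumptions, not the author's own worksheet: it takes the problem rate directly from the 2,800 and 269,951 figures above, and the function name is hypothetical. Because the result is sensitive to how the problem rate is rounded, the counts it prints are approximate and may not reproduce the quoted figures exactly.

```python
import math


def participants_needed(p, confidence):
    """Smallest n for which 1 - (1 - p)**n reaches the target confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


# Assumed problem rate: "erroneous" votes over actual and potential Gore voters.
p_buchanan = 2_800 / 269_951

for confidence in (0.95, 0.99):
    n = participants_needed(p_buchanan, confidence)
    print(f"{confidence:.0%} chance of detection: about {n} participants")
```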

What most of Ms. LePore's critics are ignoring is that more than 99% of the voters had no trouble voting when using Ms. LePore's ballot. They obviously intended to vote for Mr. Gore and actually did vote for him. Of significant interest to us is what was different about the 1% of people who had problems. Taken further, what could be done to change the ballot so that virtually everyone voted without problems?

Several questions can be asked about those who had problems:

  • Were they using ballots that had been printed differently?
  • Were they much older or much younger?
  • Were they more or less intelligent?
  • Were they more or less educated?
  • Were they first-time or long-time voters?
  • Were they much taller or much shorter?
  • Did they forget their glasses?
  • Did they have trouble reading English?
  • Did they vote early in the day or late?
  • Did they vote when they were "fresh" or when they were very tired?
  • Did they have difficulty following instructions?
  • Did they receive no help or instructions, or special instructions from the voting staff?
  • Did they have accessibility problems (low vision, movement control, etc.)?
  • Did they have a condition that would hamper their voting, such as Parkinson's disease?
  • Were they taking a prescription (or illegal) drug that affected their concentration?
  • Were they very nervous (did they have high anxiety)?
  • Were they motivated just to vote, not to vote correctly?

A good usability tester would have tried to determine which of the above reasons (and possibly others) most affected the voters. Where possible, the ballot would have been changed to better accommodate the users that had problems.

The "multiple votes" problem

The same reasoning and calculations can be used with the other major problem of multiple votes. In Palm Beach County there were 19,020 other ballots that were not considered valid (they were disqualified) because the voters had voted (punched) for more than one presidential candidate. In the official results, there were 432,286 ballots completed in Palm Beach County. This means that 4.4% of the ballots were considered invalid (19,020/432,286).

How many test participants would have been required to detect the problem almost certainly? The same formula can be applied. I calculate that 65 participants would have been required to complete a sample ballot in order to find 95% of the problems (94 subjects to detect 99% of problems). This is far fewer than were required for the "Buchanan" problem because a higher percentage of voters actually ended up making the "multiple votes" error.
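The formula can also be checked by simulation. The sketch below is an illustrative assumption-laden model rather than a description of a real test: each simulated participant independently makes the "multiple votes" error at the 19,020/432,286 rate, and a test "detects" the problem if at least one participant makes the error. The function name, trial count, and seed are arbitrary choices.

```python
import random


def detection_rate(p, participants, trials=5_000, seed=0):
    """Fraction of simulated tests in which at least one participant errs."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    detected = 0
    for _ in range(trials):
        if any(rng.random() < p for _ in range(participants)):
            detected += 1
    return detected / trials


p_multiple = 19_020 / 432_286  # about 4.4% of ballots

for n in (65, 94):
    simulated = detection_rate(p_multiple, n)
    predicted = 1 - (1 - p_multiple) ** n
    print(f"n={n}: simulated {simulated:.3f}, formula {predicted:.3f}")
```

Comparing the two columns shows how closely the closed-form estimate tracks repeated simulated tests, and how near 65 and 94 participants come to the 95% and 99% targets under this simple model.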

The "dimpled ballot" problem

Even the highly publicized "dimpled ballot" problem could have been identified before the election.

Palm Beach County had the initial machine count on November 7, then a machine recount on November 8, and then the absentee ballots were added. They then manually counted all 432,286 ballots cast. After the manual recount, Gore had gained about 215 more votes than did Bush. The manual recount was complicated by about 3,300 ballots that did not have clear punches for either candidate. These included those that were mispunched (hole in the wrong place), partially punched (the chad was still hanging), pin-hole punched (some light could be seen through the hole), some that were almost punched (dimpled), etc. Each of these ballots was closely reviewed by the three-member canvassing board.

Would it have been possible to have done a usability test that would have identified these punched-card variations before the election? It would have required highly experienced usability testers. They would have required truly representative test participants, the actual ballots (not samples), some of the actual Votomatic punchcard machines and styluses, and test items that were truly representative of the voting experience (including the ability to not vote for certain candidates). The number of subjects needed to detect 95% of the errors would have been 115, and to detect 99% would have been 166.

Problems associated with using the punchcard machines have been known for many years. Many changes have been made to the machines to reduce these problems. In addition, a set of instructions on how to vote is provided on (a) the sample ballots, (b) the actual ballots and (c) the walls of the voting booth itself in large letters. The instructions say:

"STEP 3 – To vote, hold the voting instrument straight up. Punch straight down through the ballot card for the candidates of your choice." (The bolding was on the voter's instructions in the ballot.)

One final point should be made. To help shift to the voter some of the responsibility for having each ballot counted, a final instruction, in all capital letters, is shown at the bottom of the "instructions" page:

"AFTER VOTING, CHECK YOUR BALLOT CARD TO BE SURE YOUR VOTING SELECTIONS ARE CLEARLY AND CLEANLY PUNCHED AND THERE ARE NO CHIPS LEFT HANGING ON THE BACK OF THE CARD."

Conclusion

My conclusion is that Theresa LePore should not be so severely criticized for making design decisions that led to the "Buchanan" and "Multiple votes" problems. In the past, few (if any) ballots (and their related instructions) have received the kind of rigorous usability testing that would have identified these problems before the actual election. Having a certain number of voter problems, and uncounted votes, has been more or less considered an acceptable part of holding elections with millions of voters. For elections that were not too close, the traditional ways of casting and counting votes have been "good enough."

Generally, usability testing has been considered too expensive. I figure that it would have cost about $20,000 to run the necessary performance tests on LePore's Palm Beach ballot. These usability tests would have enabled ballot designers to find and rectify the "Buchanan" problem, the "Multiple votes" problem, and maybe even the "dimpled ballot" problem. The two presidential candidates spent about one billion dollars trying to get elected.

Footnote

*Calculation of required number of test participants:
A reasonable estimate of the number of participants required to detect the problem can be made by using the formula:
1 - (1 - p)^n, where p = the probability of the usability problem occurring, and n = the number of test participants required.
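Read as the probability that at least one of n participants encounters the problem, the formula can be solved for n. The target detection probability, written c here, is a symbol introduced only for this illustration; the inequality reverses in the last step because ln(1 - p) is negative.

```latex
\begin{align*}
1 - (1 - p)^n &\ge c \\
(1 - p)^n &\le 1 - c \\
n \,\ln(1 - p) &\le \ln(1 - c) \\
n &\ge \frac{\ln(1 - c)}{\ln(1 - p)}
\end{align*}
```

For example, with p = 0.01 and c = 0.95 the bound is about 298.1, so 299 participants would be needed under these assumptions.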


Contact Dr. Bob Bailey at (801) 201-2002 or bob@webusability.com
Copyright 2002 - 2005