Cleaning RCV Ballots (for Analysis)
Accepted, safe, conservative ways to interpret spoiled RCV ballots
[In future posts, I analyze past Ranked Choice Voting (RCV) elections. This post explains one subordinate challenge and how I handle it. You may want to skip it until you find yourself interested in it.]
In Ranked Choice Voting (RCV) elections, some voters cast ballots that do not conform to the desired format. Here are common ways that ballots do not conform:
Ranking too few candidates: Some ballots only rank one or two candidates when it’s possible to rank many more. For instance, a ballot might indicate Alice≻Betty≻BLANK.
Leaving intermediate ranks blank: Because most RCV ballots require that you indicate your first, second, third, etc., candidates, it’s possible to omit some ranks. For instance, a ballot might indicate Alice≻BLANK≻Betty.
Ranking a candidate more than once: Some RCV ballots include multiple rankings for the same candidate. For instance, a ballot might indicate Alice≻Betty≻Alice.
Interpreting nonconforming ballots presents options. One could reject all nonconforming ballots, or one could repair them. New York City repairs nonconforming ballots before processing them, in the following ways:
Ranking too few candidates: A ballot that ranks too few candidates is not considered nonconforming because New York uses Instant Runoff Voting (IRV), which is well-defined for shorter ballots. (In fact, the “runoff” part of IRV can be viewed as shortening some, but not all, ballots.) So, Alice≻Betty≻BLANK simply becomes Alice≻Betty.
Leaving intermediate ranks blank: New York repairs intermediate blanks by deleting them. So, Alice≻BLANK≻Betty becomes Alice≻Betty.
Ranking a candidate more than once: New York repairs repeated candidates by keeping the highest ranking and deleting all lower rankings of the same candidate. So, Alice≻Betty≻Alice≻Carol becomes Alice≻Betty≻Carol.
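The three repairs described above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this post, not New York's actual tabulation code; the function name repair_ballot and the use of None to stand for a BLANK rank are my own assumptions.

```python
BLANK = None  # assumption: an unmarked rank is represented as None


def repair_ballot(ranks):
    """Apply the repairs described above to one ballot.

    `ranks` lists the candidates in ranked order, with BLANK for any
    rank the voter left empty. Returns a new list in which blanks are
    deleted and only the highest ranking of each candidate is kept.
    """
    seen = set()
    repaired = []
    for candidate in ranks:
        if candidate is BLANK:
            continue  # trailing and intermediate blanks are simply dropped
        if candidate in seen:
            continue  # a lower, repeated ranking of the same candidate
        seen.add(candidate)
        repaired.append(candidate)
    return repaired


print(repair_ballot(["Alice", "Betty", BLANK]))             # ['Alice', 'Betty']
print(repair_ballot(["Alice", BLANK, "Betty"]))             # ['Alice', 'Betty']
print(repair_ballot(["Alice", "Betty", "Alice", "Carol"]))  # ['Alice', 'Betty', 'Carol']
```

Because the function never changes the relative order of the rankings it keeps, a ballot that already conforms passes through unchanged.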
The first two rules are quite reasonable and very probably reflect the voter’s intention. While it’s possible that a voter made a mistake when omitting a candidate, it’s reasonable to assume the remaining preferences are accurate (and, therefore, should be counted).
The repair for repeated candidates is less obviously representative of the voter’s intentions. The ballot Alice≻Betty≻Alice≻Carol could just as easily be interpreted as Alice≻Betty≻Carol (keeping Alice’s higher ranking) or as Betty≻Alice≻Carol (keeping her lower one). In my election analyses, I also adopt New York’s rule, but not without some reservations.
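To make the ambiguity concrete, here is a small sketch that produces both readings of a ballot with a repeated candidate. The function name dedupe and its keep parameter are hypothetical, introduced only for this illustration; keep="highest" corresponds to the New York rule described above, while keep="lowest" is the alternative reading.

```python
def dedupe(ranks, keep="highest"):
    """Remove repeated candidates from a ranked list.

    keep="highest" retains each candidate's first (best) ranking;
    keep="lowest" retains each candidate's last (worst) ranking.
    """
    if keep == "highest":
        # dict.fromkeys preserves the order of first occurrence
        return list(dict.fromkeys(ranks))
    if keep == "lowest":
        # walk the ballot backwards so the last occurrence wins,
        # then restore the original direction
        return list(dict.fromkeys(reversed(ranks)))[::-1]
    raise ValueError("keep must be 'highest' or 'lowest'")


ballot = ["Alice", "Betty", "Alice", "Carol"]
print(dedupe(ballot, keep="highest"))  # ['Alice', 'Betty', 'Carol']
print(dedupe(ballot, keep="lowest"))   # ['Betty', 'Alice', 'Carol']
```

The two readings differ in who counts as the voter's first choice, which is exactly the piece of information that matters most in an instant runoff.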

