David Burton


David Burton <...> Mon, Jul 11, 2016 at 6:03 PM
To: John <...>
Hi John,

First of all, Barr, Coggeshall & Zhao looked at the entire United States. That drastically increases the chance that someone else will be found who shares a given person's name and dob, compared to populations which are geographically limited. The reason your dentist doesn't have trouble with duplicate name+dob matches is that all his patients are nearby, which tremendously decreases the pool of possible matches.

Second, I think they ignored middle names: what they called a "full name" appears to be just first + last. That greatly inflates the number of matches. If they hadn't ignored middle names, they wouldn't get anything close to 8%. My guess is that they'd get less than 1%.

I have a very common first name, and a fairly common surname. There are five other "David Burton"s registered to vote right here in Wake County, including two other "David A. Burton"s. It would not be terribly surprising if there were another David Andrew Burton with the same dob, somewhere in America. But 8% is still way too high.

In 2012, the SBOE matched NC voters' name+dob against probably 2/3 [is that the right number?] of all the voters who voted in the entire country. It got only 35,750 matches out of 4,542,488 NC voters, and about 760 of those were actually the same people (actual or impersonated). That leaves only 34,990 coincidental matches: 34,990 / 4,542,488 = 0.770%, which is far short of 8%.
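For anyone who wants to re-run that arithmetic, here is a minimal Python sketch. It uses only the figures quoted above; the variable names are illustrative, and the "about 760" same-person count is treated as exact.

```python
# Figures quoted in the email for the 2012 SBOE name+dob comparison.
total_matches = 35_750        # NC voters whose name+dob matched a voter elsewhere
same_person_matches = 760     # "about 760" judged to be the same person
nc_voters = 4_542_488         # NC voters checked

# Whatever is left after removing same-person matches is coincidental.
coincidental_matches = total_matches - same_person_matches
coincidental_rate_pct = 100 * coincidental_matches / nc_voters

print(f"{coincidental_matches:,} coincidental matches")
print(f"{coincidental_rate_pct:.3f}% of NC voters")
```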

Of course, including North Carolina, national voter turnout was only about 129 million, which is only 55% of the total U.S. voting-age population (some of whom can't legally vote, e.g. because they're felons).  If everyone of voting age had been checked, then the number of people checked might have almost tripled (1 / (0.55 x (2/3)) = 2.73), so the number of coincidental matches would probably have increased as well, perhaps to 0.77% x 2.73 = 2.1%.
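The same extrapolation can be written out as a short Python sketch. It assumes, as the paragraph above does, that the coincidental match rate grows in proportion to the number of people checked. The 2/3 share is the unconfirmed figure from the email, and the variable names are illustrative.

```python
# Inputs taken from the email; the 2/3 share is flagged there as unconfirmed.
turnout_share = 0.55       # 2012 turnout as a share of voting-age population
checked_share = 2 / 3      # share of those voters included in the comparison
observed_rate_pct = 0.77   # coincidental match rate actually observed

# Assumption: matches scale linearly with the size of the pool checked.
scale = 1 / (turnout_share * checked_share)
projected_rate_pct = observed_rate_pct * scale

print(f"scale factor: {scale:.2f}")
print(f"projected coincidental match rate: {projected_rate_pct:.1f}%")
```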

2% is still far short of 8%. Even if NC has a below-average number of people with especially common names (but is there any evidence of that?), it seems highly unlikely that 8% of the U.S. population shares a full name + dob with someone else in the USA.

So I'm pretty confident that their very high "8%" figure is due mostly to ignoring middle names and initials. If you read their article, you'll see that they never mention middle names or initials. What they call "full names" are apparently just first name + last name. Ignoring middle names will multiply the number of collisions.

However, there's some fuzziness in the definitions. I'm registered to vote as "David A. Burton," but my actual name is "David Andrew Burton." The other David A. Burton at my church is "David Allen Burton." So if we had the same dob, Interstate Crosscheck would match us, yet a match that used full middle names would not. But even matching on middle initials alone, I would not match "David R. Burton."

There are six "David Burton"s registered to vote in Wake County. Three of us are "David A. Burton."  If you match middle name or initial, as available, you get two matched pairs. (If I were registered as "David Andrew Burton" instead of "David A. Burton" you'd get zero.) If you ignore middle name and initial, you get 15 matched pairs.
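Here is a rough Python sketch of that pair counting. The six middle-name entries are a hypothetical stand-in for the actual registrations (only the "three are David A." pattern comes from the counts above), and the matching rule is one plausible reading of "middle name or initial, as available": compare full middle names when both are on file, otherwise compare initials, and treat a missing middle name as matching anything.

```python
from itertools import combinations

def middles_compatible(a: str, b: str) -> bool:
    """Compare two middle-name fields using whatever detail is available."""
    a = a.rstrip(".").lower()
    b = b.rstrip(".").lower()
    if not a or not b:
        return True              # assumption: a blank middle name matches anything
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]      # only an initial on file: compare initials
    return a == b                # both full middle names on file: compare them

def count_matched_pairs(middles, use_middle=True):
    """Count pairs of same first+last voters that would be treated as matching."""
    return sum(
        1
        for a, b in combinations(middles, 2)
        if not use_middle or middles_compatible(a, b)
    )

# Hypothetical middle-name fields for six voters sharing first and last name.
registered = ["A.", "Allen", "Arthur", "R.", "L.", "M."]
with_full_middle = ["Andrew", "Allen", "Arthur", "R.", "L.", "M."]

print(count_matched_pairs(registered))                    # middle name or initial
print(count_matched_pairs(with_full_middle))              # first voter uses full middle name
print(count_matched_pairs(registered, use_middle=False))  # middle ignored entirely
```

With middle names ignored, the count is simply every possible pair among the six, 6 x 5 / 2.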

There are 73 "David Williams"s registered to vote in Wake County, plus a "Dave Williams." Seven have middle initial A, six have middle initial B, seven have middle initial C, etc., and three are registered with no middle initial or name at all (so those would presumably match the names of all of the other 70). If you match middle name or initial, as available, on average each David Williams matches about a half-dozen others. (That's a guesstimate; I didn't tabulate them.) If you ignore middle name and initial, they each match 72 others.

My guess is that ignoring middle names and initials inflates the Barr, Coggeshall & Zhao "8%" figure by very roughly a factor of ten.


On Mon, Jul 11, 2016 at 3:35 PM, John <...> wrote:
Dave, please look over the attached analysis of how name and DOB-matching could lead to an 8% false match rate.

Doesn't seem to make sense to me and I'm wondering if it makes sense to you?

Very interested in your thoughts...

Thanks !!