NC 2012 Interstate Crosscheck election fraud Monte Carlo (statistical) simulator (source code)


  1. Usage instructions
  2. Perl source code
  3. Sample output
  4. Simulated v. calculated distribution
  5. Email discussion


Usage instructions:

perl -h

NC 2012 Interstate Crosscheck election fraud Monte Carlo statistical simulator

USAGE:

  perl [options]

Where [options] is any or all of:

  '-d'          enable debug prints
  '-n12345678'  specify number of simulations to do (default = 1,000,000)
  '-sFilename'  specify file where intermediate results are periodically saved
  '-rFilename'  specify file(s) from which intermediate results should be loaded
                ('-rFilename' may be repeated to load multiple sets of results)
  '-q'          "quick mode" - does half as many rand() calls and runs ~32% faster
  '-b'          just benchmark the computer, do no simulations
  '-h' or '-?'  print these instructions

EXAMPLES:

1. Run 1,500,000 simulations (instead of the default of one million):
perl -n1,500,000
(commas are optional.)

2. Use a save-file so that you can restart the simulations if the program
gets stopped before completion:
perl -n200000 -srun1.txt
(The data is saved at the end of each row of progress-dots.)

3. View result before it's done (or afterward):
type run1.txt
('type' is for Windows; use 'cat' on Linux.)

4. Restart the simulations from where they left off:
perl -n200000 -rrun1.txt
(Note: run1.txt must already exist.)

5. Restart the simulations where they left off, and periodically update the
save-file so you can restart the program again later:
perl -n200000 -rrun1.txt -srun1.txt
(Note: run1.txt need not already exist.)

6. Run four instances simultaneously (perhaps on a 4-core computer):
At 1st cmd prompt:
  perl -n250000 -rrun1.txt -srun1.txt
At 2nd cmd prompt:
  perl -n250000 -rrun2.txt -srun2.txt
At 3rd cmd prompt:
  perl -n250000 -rrun3.txt -srun3.txt
At 4th cmd prompt:
  perl -n250000 -rrun4.txt -srun4.txt

7. Make a combined report from the data from four different simulation runs:
perl -rrun1.txt -rrun2.txt -rrun3.txt -rrun4.txt

by Dave Burton
M: 919-244-3316

This is free, uncopyrighted, open source software.

*** To view this 'help' message one screen-full at a time, pipe it to 'more':
perl -h | more
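
Example 7 above combines several save-files into one report. As a rough illustration of what that involves, here is a minimal standalone sketch in modern Perl 5. It assumes the save-file layout written by report_results() in the source below (lines of the form "count-of-coincidences : number-of-runs : percentage", followed by an "Average = ..." line). The name merge_savefiles is hypothetical and is not part of the program.

```perl
use strict;
use warnings;

# Sum the per-count tallies from several save-files.
# Each data line is assumed to look like "  3  :     214  :  21.400000".
sub merge_savefiles {
    my @files = @_;
    my %runs_with;    # number of coincidences => number of runs
    for my $file (@files) {
        open(my $in, '<', $file) or die "could not read '$file': $!\n";
        while (my $line = <$in>) {
            my @fields = split /\s*:\s*/, $line;
            next unless @fields == 3;    # skips the "Average = ..." line
            my ($k, $count) = @fields;
            s/[\s,]//g for $k, $count;   # drop whitespace and commas
            next unless $k =~ /^\d+$/ && $count =~ /^\d+$/;
            $runs_with{$k} += $count;
        }
        close $in;
    }
    return %runs_with;
}

# Usage: perl merge.pl run1.txt run2.txt ...   (merge.pl is a placeholder name)
my %total = merge_savefiles(@ARGV);
my ($runs, $sum) = (0, 0);
for my $k (sort { $a <=> $b } keys %total) {
    $runs += $total{$k};
    $sum  += $k * $total{$k};
    printf "%3d  :%8d\n", $k, $total{$k};
}
printf "Average = %7.5f over %d runs\n", $sum / $runs, $runs if $runs;
```

Because only the raw run counts are added together, the percentages and the average can be recomputed from the merged totals, whatever the sizes of the individual runs.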


Perl source code:


# by David A. Burton
# Cary, NC  USA
# +1 919-481-2183
# Email:
# This is free, uncopyrighted, open source software.
# However, as a courtesy, I ask that you please retain this notice in copies of the program.

# TLIB Version Control fills in the version information for us:
$version_str = "";
#--=>keyflag<=-- "&(#)%n, version %v, %d "
$version_str = "&(#), version 25, 03-Dec-20 ";

# Number of simulations to run:
$numruns = 1000000;

# Note: On my 2011 Dell i5-2310 PC, 1 million simulations takes 170 minutes, or 115 minutes in Quick Mode

# The 2012 NC Interstate Crosscheck found 35750 cases w/ voter name & date-of-birth matching voters in other States
$number_of_name_and_dob_matches = 35750;

# This program is written for Perl 5.008, but works (more slowly) even with
# Perl 4.0036, if you delete this line:

use Time::HiRes 'time';  # for Perl 4 you'll need to delete this line

# immediate output of debug prints
$| = 1;

$numruns_defaulted = 1;  # changed to 0 (false) if they specify '-n...'

# echo the command line
print "\nperl $0 " . join(' ',@ARGV) . "\n\n";

# What version of Perl is this?
$hasperl5 = 0;
$perlver = "3 or earlier";
if ($] =~ /\$\$Revision\:\s*([0-9.]+)\s/) {
   $perlver = $1;  # probably 4.something
} elsif ($] =~ /([0-9][0-9.]*)/) {
   $perlver = $1;  # probably 5.something or 6.something
   $hasperl5 = 1;
}
print "You are using Perl version $perlver\n";

$debugmode=0;  # for debug prints

$start_time = time();  # for measuring the program's runtime

$savefile = '';  # file for periodically saving intermediate results
@restorefiles = ();  # files from which to read previously calculated results
$benchmarkonly = 0;  # 1 if '-b' was specified
$quickmode = 0;  # 1 if '-q' was specified

$numruns_with_commas = &commafy($numruns);  # initial value (in case user doesn't specify '-n...')

# parse command-line options
while (($#ARGV >= 0) && ('-' eq substr($ARGV[0],0,1))) {
   if ($ARGV[0] =~ /^\-(\-|)[h\?]/i) {
      &showhelp();  # display 'help' and exit
      exit 1;
   } elsif ($ARGV[0] =~ /^\-d$/i) {
      $debugmode++;  # turn on debug prints; specify twice for extra verbosity
   } elsif ($ARGV[0] =~ /^\-b$/i) {
      $benchmarkonly++;  # just benchmark the computer, do no simulations
   } elsif ($ARGV[0] =~ /^\-q$/i) {
      $quickmode++;  # "quick mode" -- does half as many rand() calls
   } elsif ($ARGV[0] =~ /^\-n([0-9\,]+)$/i) {  # specify number of simulations (default = 1 million)
      $numruns = $1;
      $numruns =~ s/,//g;
      if ($numruns <= 0) {
         $numruns = 1;
      }
      $numruns_defaulted = 0;
   } elsif ($ARGV[0] =~ /^\-s(.+)$/i) {
      $savefile = $1;  # specify file into which results should periodically be saved
   } elsif ($ARGV[0] =~ /^\-r(.+)$/i) {
      push(@restorefiles,$1);  # specify file(s) from which results should be restored
   } else {
      printf "\nERROR: unrecognized command-line option: '%s'\n\n", $ARGV[0];
      &showhelp();  # display 'help' and exit
      exit 1;
   }
   shift @ARGV;
}

# Detect whether HiRes is available for timing
$loResTimer = !$hasperl5;  # Perl 4 never has HiRes available
if ($hasperl5) {
   # Perl 5 should have HiRes, but let's double-check
   if ((0.0+int($start_time)) == $start_time) {
      # start_time is an exact integer -- looks suspiciously like HiRes is unavailable
      &num_coincidences();  # do something which takes more than 1 millisecond, but less than 1 second
      $start_time = time();
      if ((0.0+int($start_time)) == $start_time) {
         # yep, HiRes is unavailable
         $loResTimer = 1;
      }
   }
}

# benchmark this computer
if ($loResTimer) {
   # special Perl4 benchmarking kluge, since Time::HiRes is unavailable; wait for clock to 'tick'
   do {
      $end_time = time();
   } while ($end_time == $start_time);
   $start_time = $end_time;
}

$cntr = 0;
do {
   # time a run of at least ten simulations
   &num_coincidences();
   $cntr++;
   $end_time = time();
} while (($cntr < 10) || ($end_time == $start_time));
$speed = ($cntr / ($end_time - $start_time));
if ($hasperl5) {
   $passmark = $speed / (94/1616);  # My i5-2310 CPU has a Passmark rating of 1616, and it does 94 simulations / sec
} else {
   $passmark = $speed / (65/1616);  # Perl 4 is slower than Perl 5
   $speed *= 0.95;  # Perl 4 seems to slow down a bit for longer runs
}
if ($quickmode) {
   $passmark *= .684;  # correct for fact that num_coincidences runs faster w/ $quickmode=1
}
$passmark = int($passmark + 0.5);
$speed = int($speed + 0.5);
$speed = &commafy($speed);
print "Speed = $speed simulations/second (estimated single-thread Passmark score $passmark)\n";

if (-1 == $#ARGV) {
   print "\n" . "- "x29 . "-\nNote: for instructions, ctrl-break or ctrl-C now, and run:\n perl $0 -h\n" . "- "x29 . "-\n\n";
}

sub showhelp {
   print "\nNC 2012 Interstate Crosscheck election fraud Monte Carlo statistical simulator\n" .
         "\n" .
         "USAGE:\n" .
         "\n" .
         "  perl $0 [options]\n" .
         "\n" .
         "Where [options] is any or all of:\n" .
         "\n" .
         "  '-d'          enable debug prints\n" .
         "  '-n12345678'  specify number of simulations to do (default = $numruns_with_commas)\n" .
         "  '-sFilename'  specify file where intermediate results are periodically saved\n" .
         "  '-rFilename'  specify file(s) from which intermediate results should be loaded\n" .
         "                ('-rFilename' may be repeated to load multiple sets of results)\n" .
         "  '-q'          \"quick mode\" - does half as many rand() calls and runs ~32% faster\n" .
         "  '-b'          just benchmark the computer, do no simulations\n" .
         "  '-h' or '-?'  print these instructions\n" .
         "\n" .
         "EXAMPLES:\n" .
         "\n" .
         "1. Run 1,500,000 simulations (instead of the default of one million):\n" .
         "perl -n1,500,000\n" .
         "(commas are optional.)\n" .
         "\n" .
         "2. Use a save-file so that you can restart the simulations if the program\n" .
         "gets stopped before completion:\n" .
         "perl -n200000 -srun1.txt\n" .
         "(The data is saved at the end of each row of progress-dots.)\n" .
         "\n" .
         "3. View result before it's done (or afterward):\n" .
         "type run1.txt\n" .
         "('type' is for Windows; use 'cat' on Linux.)\n" .
         "\n" .
         "4. Restart the simulations from where they left off:\n" .
         "perl -n200000 -rrun1.txt\n" .
         "(Note: run1.txt must already exist.)\n" .
         "\n" .
         "5. Restart the simulations where they left off, and periodically update the\n" .
         "save-file so you can restart the program again later:\n" .
         "perl -n200000 -rrun1.txt -srun1.txt\n" .
         "(Note: run1.txt need not already exist.)\n" .
         "\n" .
         "6. Run four instances simultaneously (perhaps on a 4-core computer):\n" .
         "At 1st cmd prompt:\n" .
         "  perl -n250000 -rrun1.txt -srun1.txt\n" .
         "At 2nd cmd prompt:\n" .
         "  perl -n250000 -rrun2.txt -srun2.txt\n" .
         "At 3rd cmd prompt:\n" .
         "  perl -n250000 -rrun3.txt -srun3.txt\n" .
         "At 4th cmd prompt:\n" .
         "  perl -n250000 -rrun4.txt -srun4.txt\n" .
         "\n" .
         "7. Make a combined report from the data from four different simulation runs:\n" .
         "perl -rrun1.txt -rrun2.txt -rrun3.txt -rrun4.txt\n" .
         "\n" .
         "by Dave Burton\n" .
         "\n" .
         "M: 919-244-3316\n" .
         "\n" .
         "This is free, uncopyrighted, open source software.\n" .
         "\n" .
         "*** To view this 'help' message one screen-full at a time, pipe it to 'more':\n" .
         "perl $0 -h | more\n";
}

# '-b' was specified, so exit after reporting benchmark results
if ($benchmarkonly) {
   exit 0;
}

if ($debugmode) {
   print "dbg: save file = '$savefile'\n";
   print "dbg: restore files = '" . join("','", @restorefiles) . "'\n";
}

# we don't actually use this
$num_args = $#ARGV+1;

# Initialize the buckets. bucket[N] keeps track of how many simulations
# had N innocent coincidences of Last4SSN matching.
@buckets = ();
for ($i=0; $i <= $number_of_name_and_dob_matches; $i++) {
   $buckets[$i] = 0;
}
# We go ahead and make 35751 buckets, even though fewer than 20 will ever be used,
# because it can't hurt, and it uses only an extra 1.8 MB of RAM and hardly affects
# performance at all.

# report the results, or save them to a file
sub report_results {
   local($outpfile) = shift;
   local($i, $highest_num, $sum, $percentage, $avg);
   $highest_num = 0;
   for ($i=0; $i <= $#buckets; $i++) {
      if ($buckets[$i]) {
         $highest_num = $i;
      }
   }
   $sum = 0;
   for ($i=0; $i<=$highest_num; $i++) {
      $sum += ($buckets[$i] * $i);
      $percentage = 100 * ($buckets[$i] / $numruns);
      printf $outpfile "%3d  :%8d  : %10.6f\n", $i, $buckets[$i], $percentage;
   }
   $avg = $sum / $numruns_done;
   printf $outpfile "Average = %7.5f\n", $avg;
}

# save current (intermediate) results to a text file
sub save_buckets {
   local($outfile) = shift;
   open( OUTPUT, ">$outfile" ) || die "ERROR: could not write '$outfile', $!\n";
   &report_results('OUTPUT');  # write the bucket table to the save-file
   close OUTPUT;
}

# Load intermediate results from a text file which was created by save_buckets().
# Note that this can be called multiple times to combine results from several files.
sub load_buckets {
   local($inpfile) = shift;
   local($sum) = 0;
   if (open(INPUT, "$inpfile")) {
      while (<INPUT>) {
         @tmp = split(/\s*\:\s*/, $_);
         if (2 == $#tmp) {
            ($num,$cnt,$pct) = @tmp;
            $num =~ s/[\s\,]//g;  # delete whitespace and commas
            $cnt =~ s/[\s\,]//g;
            $buckets[$num] += $cnt;
            $sum += $cnt;
         }
      }
      close INPUT;
      print "Loaded $sum simulations from '$inpfile'\n";
   } elsif ($inpfile ne $savefile) {
      die "ERROR: could not read '$inpfile', $!\n";
   } # else if savefile and restorefile are identical, then it's okay if it doesn't initially exist
}

# if '-r...' was specified, then load initial buckets from file, to resume where we left off
for $fn (@restorefiles) {
   &load_buckets($fn);
}
$number_of_runs_preloaded = 0;
for ($i=0; $i <= $number_of_name_and_dob_matches; $i++) {
   if ($buckets[$i]) {
      $number_of_runs_preloaded += $buckets[$i];
   }
}
# $numruns_done is needed for calculating the average
$numruns_done = $number_of_runs_preloaded;

if ($numruns_defaulted && ($#restorefiles > 1) && ($number_of_runs_preloaded > 2)) {
   # we're just making a combined report, so don't default numruns to a million
   $numruns = $numruns_done;
}

if ($number_of_runs_preloaded > $numruns) {
   $numruns = $number_of_runs_preloaded;
} else {
   $remaining_numruns = $numruns - $number_of_runs_preloaded;
   $remaining_numruns_with_commas = &commafy($remaining_numruns);
   $estimated_runtime = $remaining_numruns / $speed;
   $readable_estimated_runtime = &human_time($estimated_runtime);
   print "Estimated run time = $readable_estimated_runtime for $remaining_numruns_with_commas simulations\n";
}
if (($estimated_runtime > (60*60)) && ('' eq $savefile)) {
   print "Note: for long simulation runs like this, you really should use '-sSavefile' so\nthat you can resume if it is interrupted.\n";
}

# put commas into an integer if it is > 4 digits long
sub commafy {
   local($number) = shift;
   local(@pieces) = ();
   $number .= '';
   # if ($debugmode) { print "dbg: number='$number'\n"; }
   if (length($number) > 4) {
      while (length($number) > 0) {
         if (length($number) <= 3) {
            # we could omit this 'if' clause for Perl 5, but Perl 4 needs it
            unshift(@pieces, $number);
            $number = '';
         } else {
            unshift(@pieces, substr($number,-3));
            substr($number,-3) = '';
         }
         # if ($debugmode) {
         #    $tm1 = join(',',@pieces);
         #    print "dbg: number='$number', pieces='$tm1'\n";
         # }
      }
      $number = join(',',@pieces);
   }
   # if ($debugmode) { print "dbg: number='$number'\n"; }
   return $number;
}

# Return a random integer between 1 and 9999, inclusive. (Won't return 0.)
sub rand10k {
   $result = rand(9999); # that's >= 0.0, and < 9999.0  (it can never return 9999)
   $result = 1 + int($result);
   return $result;  # valid SSNs cannot end in 0000
}

# convert input in floating point seconds to nice, human-friendly time (e.g., "xx.x minutes")
sub human_time {
   local($seconds) = shift;
   local($result) = '';
   if ($seconds >= 600) {
      $minutes = $seconds / 60;
      $minutes = int(($minutes * 10) + 0.5) / 10.0;
      $result = sprintf("%3.1f minutes", $minutes);
   } elsif ($seconds >= 60) {
      $minutes = $seconds / 60;
      $minutes = int(($minutes * 100) + 0.5) / 100.0;
      $result = sprintf("%4.2f minutes", $minutes);
   } else {
      $result = sprintf("%4.2f seconds", $seconds);
   }
   return $result;
}

# Run one test: of 35,750 voters, how many match last-4-SSNs by innocent coincidence?
# The expected value, of course, is 35750/9999 = ~3.575
sub num_coincidences {
   local($coincidences) = 0;
   local($i);  # keep the loop counter local so the caller's $i is not clobbered
   if ($quickmode) {
      $ssn1 = int(rand(9999));  # &rand10k(); -- 'inlined' for better performance
      for ($i=0; $i<$number_of_name_and_dob_matches; $i++) {
         $ssn2 = int(rand(9999));  # &rand10k();
         if ($ssn1 == $ssn2) {
            $coincidences++;
         }
      }
   } else {
      for ($i=0; $i<$number_of_name_and_dob_matches; $i++) {
         $ssn1 = int(rand(9999));  # &rand10k(); -- 'inlined' for better performance
         $ssn2 = int(rand(9999));  # &rand10k();
         if ($ssn1 == $ssn2) {
            $coincidences++;
         }
      }
   }
   return $coincidences;
}

$simulations_per_dot = 50;
$dots_per_line = 60;

$modulo_of_dot = int($dots_per_line/3);
if ($quickmode) {
   $calls2rand = $numruns * (1 + $number_of_name_and_dob_matches);
} else {
   $calls2rand = $numruns * 2 * $number_of_name_and_dob_matches;
}
$calls2rand = &commafy($calls2rand);
$numruns_with_commas = &commafy($numruns);
print "\n$numruns_with_commas simulations";

if (($numruns - $number_of_runs_preloaded) >= 1000) {
   print " requires $calls2rand calls to rand(), which takes a while!\n";
   print "So, after every $simulations_per_dot" . "th simulation it prints a dot ($dots_per_line/line), as a progress indicator.";
}
print "\n\n";

# for the progress indicator
$dotcolumn = $dotrow = 0;

# Main loop to run the simulations and tabulate the results.
# Print "." as progress indicator every $simulations_per_dot simulations, up to $dots_per_line dots per line.
for ($i=$number_of_runs_preloaded; $i<$numruns; $i++) {
   $buckets[ &num_coincidences() ] ++;
   if (($i % $simulations_per_dot) == $modulo_of_dot) {
      # print a dot
      if ($dotcolumn == $dots_per_line) {
         $pctdone = ($i * 100) / $numruns;
         printf("%5.1f%%\n", $pctdone);
         $dotcolumn = 0;
         $numruns_done = $i;
         if ('' ne $savefile) {
            &save_buckets($savefile);
         }
      }
      print ".";
      $dotcolumn++;
   }
}
$numruns_done = $numruns;
if ($dotcolumn > 0) {
   print "\n";
   $dotcolumn = 0;
}
# save results one last time at the end
if ('' ne $savefile) {
   &save_buckets($savefile);
}

# remind the user that the simulation results are also in the Savefile, if he specified '-sSavefile'
if ($debugmode && ('' ne $savefile)) {
   print "Note: results of $numruns_done simulations were saved to '$savefile'\n";
}

# report the results:
print "First column is number of coincidences per 35,750 matches\n";
print " : second column is number of runs (out of $numruns_with_commas) which had that number of coincidences\n";
print "  : third column is percentage of runs which had that number of coincidences\n";
&report_results('STDOUT');

# report the run-time:
$end_time = time();
$run_time = $end_time - $start_time;  # in seconds
$run_time = &human_time($run_time);
if ($debugmode || ($number_of_runs_preloaded < $numruns)) {
   print "Run time = $run_time\n";
}

exit 0;


Sample output:

First column is number of coincidences per 35,750 matches
 : second column is number of runs (out of 25,000,000) which had that number of coincidences
  : third column is percentage of runs which had that number of coincidences
     0  :  700479  :   2.801916
     1  : 2503288  :  10.013152
     2  : 4469133  :  17.876532
     3  : 5333504  :  21.334016
     4  : 4768914  :  19.075656
     5  : 3410612  :  13.642448
     6  : 2031999  :   8.127996
     7  : 1038296  :   4.153184
     8  :  463046  :   1.852184
     9  :  184655  :   0.738620
    10  :   65817  :   0.263268
    11  :   21470  :   0.085880
    12  :    6339  :   0.025356
    13  :    1851  :   0.007404
    14  :     461  :   0.001844
    15  :     103  :   0.000412
    16  :      27  :   0.000108
    17  :       5  :   0.000020
    18  :       1  :   0.000004
Average = 3.57600
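
As a rough sanity check on the reported average, the mean and its standard error can be worked out by hand. This is only a sketch, and it assumes the model that the simulator's rand() calls implement: each of the n = 35,750 comparisons matches independently with probability p = 1/9999, over N = 25,000,000 runs. K denotes the number of coincidences in one run.

```latex
\begin{align*}
\mathbb{E}[K] &= np = \frac{35750}{9999} \approx 3.57536 \\
\operatorname{Var}[K] &= np(1-p) \approx 3.57500 \\
\operatorname{SE}(\bar{K}) &= \sqrt{\frac{np(1-p)}{N}}
  \approx \sqrt{\frac{3.575}{25\,000\,000}} \approx 0.00038
\end{align*}
```

Under those assumptions, the simulated average of 3.57600 lies within about two standard errors of the expected 3.57536.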

Simulated v. calculated distribution:

First column is number of coincidences, k, per 35,750 matches
 : second & third columns are copied from the results of 25,000,000 simulations (above)
  : fourth column is percentage calculated by my online binomial probability distribution calculator
    : fifth column is cumulative percentage (k or fewer coincidences) from the binomial calculator
     k  |      simulated         |                  calculated
     0  :  700479  :   2.801916% :   2.80004041934694%      :   2.80004041934694%
     1  : 2503288  :  10.013152  :  10.01214692855101       :  12.81218734789794
     2  : 4469133  :  17.876532  :  17.89979198583566       :  30.71197933373361
     3  : 5333504  :  21.334016  :  21.33365886209420       :  52.04563819582780
     4  : 4768914  :  19.075656  :  19.06917141786560       :  71.11480961369340
     5  : 3410612  :  13.642448  :  13.63565916189286       :  84.75046877558626
     6  : 2031999  :   8.127996  :   8.125068959489566      :  92.87553773507582
     7  : 1038296  :   4.153184  :   4.149722300002787      :  97.02526003507861
     8  :  463046  :   1.852184  :   1.854414935099515      :  98.87967497017813
     9  :  184655  :   0.738620  :   0.7365973040199915     :  99.61627227419812
    10  :   65817  :   0.263268  :   0.2633199064110674     :  99.87959218060919
    11  :   21470  :   0.085880  :   0.08557214583945469    :  99.96516432644864
    12  :    6339  :   0.025356  :   0.02549062245912742    :  99.99065494890777
    13  :    1851  :   0.007404  :   0.007008969989723296   :  99.99766391889749
    14  :     461  :   0.001844  :   0.001789497617543090   :  99.99945341651503
    15  :     103  :   0.000412  :   0.0004264151954425543  :  99.99987983171048
    16  :      27  :   0.000108  :   0.00009525622005113322 :  99.99997508793053
    17  :       5  :   0.000020  :   0.00002002686282731367 :  99.99999511479336
    18  :       1  :   0.000004  :   0.00000397646134453779 :  99.99999909125470
Predicted average = 35,750 / 9999 = 3.575357535753...
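
The "calculated" columns can also be approximated without an online calculator. The following is a minimal sketch in modern Perl 5, not the author's calculator. It assumes a binomial model with n = 35,750 trials and match probability p = 1/9999, starts from P(0) = (1-p)^n, and steps upward with the recurrence P(k+1) = P(k) * (n-k)/(k+1) * p/(1-p). The variable names are illustrative only.

```perl
use strict;
use warnings;

# Binomial model: n independent comparisons, each matching with probability p.
my $n = 35750;       # name + date-of-birth matches (figure from the source)
my $p = 1 / 9999;    # chance that two last-4-SSN values (0001..9999) agree

# P(0) = (1-p)^n, computed through logarithms for numerical stability.
my $prob       = exp($n * log(1 - $p));
my $cumulative = 0;
my @pmf;             # $pmf[$k] = probability of exactly $k coincidences

for my $k (0 .. 18) {
    $cumulative += $prob;
    push @pmf, $prob;
    printf "%3d  : %12.8f%%  : %12.8f%%\n", $k, 100 * $prob, 100 * $cumulative;
    # Recurrence: P(k+1) = P(k) * (n-k)/(k+1) * p/(1-p)
    $prob *= ($n - $k) / ($k + 1) * $p / (1 - $p);
}
```

The recurrence avoids evaluating large factorials directly, so ordinary double-precision arithmetic should be enough to reproduce the fourth and fifth columns to the number of digits printed here.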

Emails about this software: