NC 2012 Interstate Crosscheck election fraud Monte Carlo (statistical) simulator (source code)

Contents:

  1. Usage instructions
  2. Perl source code
  3. Sample output
  4. Simulated v. calculated distribution
  5. Email discussion

 

Usage instructions:


perl test_voter_fraud_stats.pl -h


NC 2012 Interstate Crosscheck election fraud Monte Carlo statistical simulator

USAGE:

  perl test_voter_fraud_stats.pl [options]

Where [options] is any or all of:

  '-d'          enable debug prints
  '-n12345678'  specify number of simulations to do (default = 1,000,000)
  '-sFilename'  specify file where intermediate results are periodically saved
  '-rFilename'  specify file(s) from which intermediate results should be loaded
                ('-rFilename' may be repeated to load multiple sets of results)
  '-q'          "quick mode" - does half as many rand() calls and runs ~32% faster
  '-b'          just benchmark the computer, do no simulations
  '-h' or '-?'  print these instructions

EXAMPLES:

1. Run 1,500,000 simulations (instead of the default of one million):
perl test_voter_fraud_stats.pl -n1,500,000
(commas are optional.)

2. Use a save-file so that you can restart the simulations if the program
gets stopped before completion:
perl test_voter_fraud_stats.pl -n200000 -srun1.txt
(The data is saved at the end of each row of progress-dots.)

3. View result before it's done (or afterward):
type run1.txt
('type' is for Windows; use 'cat' on Linux.)

4. Restart the simulations from where they left off:
perl test_voter_fraud_stats.pl -n200000 -rrun1.txt
(Note: run1.txt must already exist.)

5. Restart the simulations where they left off, and periodically update the
save-file so you can restart the program again later:
perl test_voter_fraud_stats.pl -n200000 -rrun1.txt -srun1.txt
(Note: run1.txt need not already exist.)

6. Run four instances simultaneously (perhaps on a 4-core computer):
At 1st cmd prompt:
  perl test_voter_fraud_stats.pl -n250000 -rrun1.txt -srun1.txt
At 2nd cmd prompt:
  perl test_voter_fraud_stats.pl -n250000 -rrun2.txt -srun2.txt
At 3rd cmd prompt:
  perl test_voter_fraud_stats.pl -n250000 -rrun3.txt -srun3.txt
At 4th cmd prompt:
  perl test_voter_fraud_stats.pl -n250000 -rrun4.txt -srun4.txt

7. Make a combined report from the data from four different simulation runs:
perl test_voter_fraud_stats.pl -rrun1.txt -rrun2.txt -rrun3.txt -rrun4.txt

by Dave Burton
http://www.sealevel.info/
M: 919-244-3316

This is free, uncopyrighted, open source software.

*** To view this 'help' message one screen-full at a time, pipe it to 'more':
perl test_voter_fraud_stats.pl -h | more
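
Why '-q' should not change the results: a short derivation, assuming Perl's rand() behaves like an ideal uniform generator. Both modes compare int(rand(9999)) values, which take 9999 equally likely values (0 through 9998). Normal mode redraws both numbers for every comparison; quick mode draws ssn1 once per simulation and redraws only ssn2. Either way each comparison matches with probability 1/9999, and (given ssn1) the 35,750 comparisons in one simulation are independent, so the number of coincidences X has the same binomial distribution in both modes:

```latex
\begin{aligned}
\text{normal mode:}\quad & P(\mathit{ssn}_1 = \mathit{ssn}_2) = \sum_{s=0}^{9998} \frac{1}{9999}\cdot\frac{1}{9999} = \frac{1}{9999} \\
\text{quick mode:}\quad & P(\mathit{ssn}_2 = s) = \frac{1}{9999} \ \text{ for the fixed } s = \mathit{ssn}_1 \\
\text{both modes:}\quad & X \sim \operatorname{Binomial}\!\left(n, \tfrac{1}{9999}\right), \qquad n = 35750 \\
& P(X = k) = \binom{n}{k} \left(\frac{1}{9999}\right)^{k} \left(\frac{9998}{9999}\right)^{n-k}
\end{aligned}
```

Under that assumption, quick mode should change only the run time, not the distribution of the results.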

 


Perl source code:
test_voter_fraud_stats.pl


#!/usr/bin/perl

# by David A. Burton
# Cary, NC  USA
# +1 919-481-2183
# Email: http://www.burtonsys.com/email/
# This is free, uncopyrighted, open source software.
# However, as a courtesy, I ask that you please retain this notice in copies of the program.

# TLIB Version Control fills in the version information for us:
$version_str = "";
#--=>keyflag<=-- "&(#)%n, version %v, %d "
$version_str = "&(#)test_voter_fraud_stats.pl, version 25, 03-Dec-20 ";

# Number of simulations to run:
$numruns = 1000000;

# Note: On my 2011 Dell i5-2310 PC, 1 million simulations takes 170 minutes, or 115 minutes in Quick Mode


# The 2012 NC Interstate Crosscheck found 35750 cases w/ voter name & date-of-birth matching voters in other States
$number_of_name_and_dob_matches = 35750;


# This program is written for Perl 5.008, but works (more slowly) even with
# Perl 4.0036, if you delete this line:

use Time::HiRes 'time';  # for Perl 4 you'll need to delete this line


# autoflush STDOUT, so progress dots and debug prints appear immediately
$| = 1;


$numruns_defaulted = 1;  # changed to 0 (false) if they specify '-n...'


# echo the command line
print "\nperl $0 " . join(' ',@ARGV) . "\n\n";


# What version of Perl is this?
$hasperl5 = 0;
$perlver = "3 or earlier";
if ($] =~ /\$\$Revision\:\s*([0-9.]+)\s/) {
   $perlver = $1;  # probably 4.something
} elsif ($] =~ /([0-9][0-9.]*)/) {
   $perlver = $1;  # probably 5.something or 6.something
   $hasperl5 = 1;
}
print "You are using Perl version $perlver\n";


$debugmode=0;  # for debug prints


$start_time = time();  # for measuring the program's runtime

$savefile = '';  # file for periodically saving intermediate results
@restorefiles = ();  # files from which to read previously calculated results
$benchmarkonly = 0;  # 1 if '-b' was specified
$quickmode = 0;  # 1 if '-q' was specified

$numruns_with_commas = &commafy($numruns);  # initial value (in case user doesn't specify '-n...')


# parse command-line options
while (($#ARGV >= 0) && ('-' eq substr($ARGV[0],0,1))) {
   if ($ARGV[0] =~ /^\-(\-|)[h\?]/i) {
      &showhelp();  # display 'help' and exit
      exit 1;
   } elsif ($ARGV[0] =~ /^\-d$/i) {
      $debugmode++;  # turn on debug prints; specify twice for extra verbosity
   } elsif ($ARGV[0] =~ /^\-b$/i) {
      $benchmarkonly++;  # just benchmark the computer, do no simulations
   } elsif ($ARGV[0] =~ /^\-q$/i) {
      $quickmode++;  # "quick mode" -- does half as many rand() calls
   } elsif ($ARGV[0] =~ /^\-n([0-9\,]+)$/i) {  # specify number of simulations (default = 1 million)
      $numruns = $1;
      $numruns =~ s/,//g;
      if ($numruns <= 0) {
         $numruns = 1;
      }
      $numruns_defaulted = 0;
   } elsif ($ARGV[0] =~ /^\-s(.+)$/i) {
      $savefile = $1;  # specify file into which results should periodically be saved
   } elsif ($ARGV[0] =~ /^\-r(.+)$/i) {
      push(@restorefiles,$1);  # specify file(s) from which results should be restored
   } else {
      printf "\nERROR: unrecognized command-line option: '%s'\n\n", $ARGV[0];
      &showhelp();  # display 'help' and exit
      exit 1;
   }
   shift @ARGV;
}


# Detect whether HiRes is available for timing
$loResTimer = !$hasperl5;  # Perl 4 never has HiRes available
if ($hasperl5) {
   # Perl 5 should have HiRes, but let's double-check
   if ((0.0+int($start_time)) == $start_time) {
      # start_time is an exact integer -- looks suspiciously like HiRes is unavailable
      &num_coincidences();  # do something which takes more than 1 millisecond, but less than 1 second
      $start_time = time();
      if ((0.0+int($start_time)) == $start_time) {
         # yep, HiRes is unavailable
         $loResTimer = 1;
      }
   }
}

# benchmark this computer
if ($loResTimer) {
   # special Perl4 benchmarking kluge, since Time::HiRes is unavailable; wait for clock to 'tick'
   do {
      $end_time = time();
   } while ($end_time == $start_time);
   $start_time = $end_time;
}

$cntr = 0;
do {
   # time a run of at least ten simulations
   &num_coincidences();
   $cntr++;
   $end_time = time();
} while (($cntr < 10) || ($end_time == $start_time));
$speed = ($cntr / ($end_time - $start_time));
if ($hasperl5) {
   $passmark = $speed / (94/1616);  # My i5-2310 CPU has a Passmark rating of 1616, and it does 94 simulations / sec
} else {
   $passmark = $speed / (65/1616);  # Perl 4 is slower than Perl 5
   $speed *= 0.95;  # Perl 4 seems to slow down a bit for longer runs
}
if ($quickmode) {
   $passmark *= .684;  # correct for fact that num_coincidences runs faster w/ $quickmode=1
}
$passmark = int($passmark + 0.5);
# keep $speed numeric (no rounding, no commas), since it is used later to estimate run time
$speed_with_commas = &commafy(int($speed + 0.5));
print "Speed = $speed_with_commas simulations/second (estimated single-thread Passmark score $passmark)\n";


if (-1 == $#ARGV) {
   print "\n" . "- "x29 . "-\nNote: for instructions, ctrl-break or ctrl-C now, and run:\n perl $0 -h\n" . "- "x29 . "-\n\n";
}


sub showhelp {
   print "\nNC 2012 Interstate Crosscheck election fraud Monte Carlo statistical simulator\n" .
         "\n" .
         "USAGE:\n" .
         "\n" .
         "  perl $0 [options]\n" .
         "\n" .
         "Where [options] is any or all of:\n" .
         "\n" .
         "  '-d'          enable debug prints\n" .
         "  '-n12345678'  specify number of simulations to do (default = $numruns_with_commas)\n" .
         "  '-sFilename'  specify file where intermediate results are periodically saved\n" .
         "  '-rFilename'  specify file(s) from which intermediate results should be loaded\n" .
         "                ('-rFilename' may be repeated to load multiple sets of results)\n" .
         "  '-q'          \"quick mode\" - does half as many rand() calls and runs ~32% faster\n" .
         "  '-b'          just benchmark the computer, do no simulations\n" .
         "  '-h' or '-?'  print these instructions\n" .
         "\n" .
         "EXAMPLES:\n" .
         "\n" .
         "1. Run 1,500,000 simulations (instead of the default of one million):\n" .
         "perl test_voter_fraud_stats.pl -n1,500,000\n" .
         "(commas are optional.)\n" .
         "\n" .
         "2. Use a save-file so that you can restart the simulations if the program\n" .
         "gets stopped before completion:\n" .
         "perl test_voter_fraud_stats.pl -n200000 -srun1.txt\n" .
         "(The data is saved at the end of each row of progress-dots.)\n" .
         "\n" .
         "3. View result before it's done (or afterward):\n" .
         "type run1.txt\n" .
         "('type' is for Windows; use 'cat' on Linux.)\n" .
         "\n" .
         "4. Restart the simulations from where they left off:\n" .
         "perl test_voter_fraud_stats.pl -n200000 -rrun1.txt\n" .
         "(Note: run1.txt must already exist.)\n" .
         "\n" .
         "5. Restart the simulations where they left off, and periodically update the\n" .
         "save-file so you can restart the program again later:\n" .
         "perl test_voter_fraud_stats.pl -n200000 -rrun1.txt -srun1.txt\n" .
         "(Note: run1.txt need not already exist.)\n" .
         "\n" .
         "6. Run four instances simultaneously (perhaps on a 4-core computer):\n" .
         "At 1st cmd prompt:\n" .
         "  perl test_voter_fraud_stats.pl -n250000 -rrun1.txt -srun1.txt\n" .
         "At 2nd cmd prompt:\n" .
         "  perl test_voter_fraud_stats.pl -n250000 -rrun2.txt -srun2.txt\n" .
         "At 3rd cmd prompt:\n" .
         "  perl test_voter_fraud_stats.pl -n250000 -rrun3.txt -srun3.txt\n" .
         "At 4th cmd prompt:\n" .
         "  perl test_voter_fraud_stats.pl -n250000 -rrun4.txt -srun4.txt\n" .
         "\n" .
         "7. Make a combined report from the data from four different simulation runs:\n" .
         "perl test_voter_fraud_stats.pl -rrun1.txt -rrun2.txt -rrun3.txt -rrun4.txt\n" .
         "\n" .
         "by Dave Burton\n" .
         "http://www.sealevel.info/\n" .
         "M: 919-244-3316\n" .
         "\n" .
         "This is free, uncopyrighted, open source software.\n" .
         "\n" .
         "*** To view this 'help' message one screen-full at a time, pipe it to 'more':\n" .
         "perl $0 -h | more\n";
   exit(1);
}


# '-b' was specified, so exit after reporting benchmark results
if ($benchmarkonly) {
   exit 0;
}


if ($debugmode) {
   print "dbg: save file = '$savefile'\n";
   print "dbg: restore files = '" . join("','", @restorefiles) . "'\n";
}


# we don't actually use this
$num_args = $#ARGV+1;


# Initialize the buckets. bucket[N] keeps track of how many simulations
# had N innocent coincidences of Last4SSN matching.
@buckets = ();
for ($i=0; $i <= $number_of_name_and_dob_matches; $i++) {
   $buckets[$i] = 0;
}
# We go ahead and make 35751 buckets, even though fewer than 20 will ever be used,
# because it can't hurt, and it uses only an extra 1.8 MB of RAM and hardly affects
# performance at all.


# report the results, or save them to a file
sub report_results {
   local($outpfile) = shift;
   local($i, $highest_num, $sum, $percentage, $avg);
   $highest_num = 0;
   for ($i=0; $i <= $#buckets; $i++) {
      if ($buckets[$i]) {
         $highest_num = $i;
      }
   }
   $sum = 0;
   for ($i=0; $i<=$highest_num; $i++) {
      $sum += ($buckets[$i] * $i);
      $percentage = 100 * ($buckets[$i] / $numruns_done);  # relative to runs completed so far
      printf $outpfile "%3d  :%8d  : %10.6f\n", $i, $buckets[$i], $percentage;
   }
   $avg = $sum / $numruns_done;
   printf $outpfile "Average = %7.5f\n", $avg;
}


# save current (intermediate) results to a text file
sub save_buckets {
   local($outfile) = shift;
   open( OUTPUT, ">$outfile" ) || die "ERROR: could not write '$outfile', $!\n";
   &report_results(OUTPUT);
   close OUTPUT;
}


# Load intermediate results from a text file which was created by save_buckets().
# Note that this can be called multiple times to combine results from several files.
sub load_buckets {
   local($inpfile) = shift;
   local($sum) = 0;
   local($num,$cnt,$pct,@tmp);
   if (open(INPUT, "$inpfile")) {
      while (<INPUT>) {
         @tmp = split(/\s*\:\s*/, $_);
         if (2 == $#tmp) {
            ($num,$cnt,$pct) = @tmp;
            $num =~ s/[\s\,]//g;  # delete whitespace and commas
            $cnt =~ s/[\s\,]//g;
            $buckets[$num] += $cnt;
            $sum += $cnt;
         }
      }
      close INPUT;
      print "Loaded $sum simulations from '$inpfile'\n";
   } elsif ($inpfile ne $savefile) {
      die "ERROR: could not read '$inpfile', $!\n";
   } # else if savefile and restorefile are identical, then it's okay if it doesn't initially exist
}


# if '-r...' was specified, then load initial buckets from file, to resume where we left off
for $fn (@restorefiles) {
   &load_buckets($fn);
}
$number_of_runs_preloaded = 0;
for ($i=0; $i <= $number_of_name_and_dob_matches; $i++) {
   if ($buckets[$i]) {
      $number_of_runs_preloaded += $buckets[$i];
   }
}
# $numruns_done is needed for calculating the average
$numruns_done = $number_of_runs_preloaded;

if ($numruns_defaulted && ($#restorefiles > 0) && ($number_of_runs_preloaded > 2)) {  # two or more restore files
   # we're just making a combined report, so don't default numruns to a million
   $numruns = $numruns_done;
}


if ($number_of_runs_preloaded > $numruns) {
   $numruns = $number_of_runs_preloaded;
} else {
   $remaining_numruns = $numruns - $number_of_runs_preloaded;
   $remaining_numruns_with_commas = &commafy($remaining_numruns);
   $estimated_runtime = $remaining_numruns / $speed;
   $readable_estimated_runtime = &human_time($estimated_runtime);
   print "Estimated run time = $readable_estimated_runtime for $remaining_numruns_with_commas simulations\n";
}
if (($estimated_runtime > (60*60)) && ('' eq $savefile)) {
   print "Note: for long simulation runs like this, you really should use '-sSavefile' so\nthat you can resume if it is interrupted.\n";
}


# put commas into an integer if it is > 4 digits long
sub commafy {
   local($number) = shift;
   local(@pieces) = ();
   $number .= '';
   # if ($debugmode) { print "dbg: number='$number'\n"; }
   if (length($number) > 4) {
      while (length($number) > 0) {
         if (length($number) <= 3) {
            # we could omit this 'if' clause for Perl 5, but Perl 4 needs it
            unshift(@pieces,$number);
            $number = '';
         } else {
            unshift(@pieces,substr($number,-3));
            substr($number,-3) = '';
         }
         # if ($debugmode) {
         #    $tm1 = join(',',@pieces);
         #    print "dbg: number='$number', pieces='$tm1'\n";
         # }
      }
      $number = join(',',@pieces);
   }
   # if ($debugmode) { print "dbg: number='$number'\n"; }
   return $number;
}


# Return a random integer between 1 and 9999, inclusive. (Won't return 0.)
sub rand10k {
   local($result);
   $result = rand(9999); # that's >= 0.0, and < 9999.0  (it can never return 9999)
   $result = 1 + int($result);
   return $result;  # valid SSNs cannot end in 0000
}


# convert input in floating point seconds to nice, human-friendly time (e.g., "xx.x minutes")
sub human_time {
   local($seconds) = shift;
   local($result) = '';
   if ($seconds >= 600) {
      $minutes = $seconds / 60;
      $minutes = int(($minutes * 10) + 0.5) / 10.0;
      $result = sprintf("%3.1f minutes", $minutes);
   } elsif ($seconds >= 60) {
      $minutes = $seconds / 60;
      $minutes = int(($minutes * 100) + 0.5) / 100.0;
      $result = sprintf("%4.2f minutes", $minutes);
   } else {
      $result = sprintf("%4.2f seconds", $seconds);
   }
   return $result;
}


# Run one test: of 35,750 voters, how many match last-4-SSNs by innocent coincidence?
# The expected value, of course, is 35750/9999 = ~3.575
sub num_coincidences {
   local($coincidences) = 0;
   local($i);
   local($ssn1);
   local($ssn2);
   if ($quickmode) {
      $ssn1 = int(rand(9999));  # like &rand10k() without the +1 (irrelevant when comparing), inlined for speed
      for ($i=0; $i<$number_of_name_and_dob_matches; $i++) {
         $ssn2 = int(rand(9999));  # &rand10k();
         if ($ssn1 == $ssn2) {
            $coincidences++;
         }
      }
   } else {
      for ($i=0; $i<$number_of_name_and_dob_matches; $i++) {
         $ssn1 = int(rand(9999));  # like &rand10k() without the +1 (irrelevant when comparing), inlined for speed
         $ssn2 = int(rand(9999));  # &rand10k();
         if ($ssn1 == $ssn2) {
            $coincidences++;
         }
      }
   }
   return $coincidences;
}



$simulations_per_dot = 50;
$dots_per_line = 60;

$modulo_of_dot = int($dots_per_line/3);
if ($quickmode) {
   $calls2rand = $numruns * (1 + $number_of_name_and_dob_matches);
} else {
   $calls2rand = $numruns * 2 * $number_of_name_and_dob_matches;
}
$calls2rand = &commafy($calls2rand);
$numruns_with_commas = &commafy($numruns);
print "\n$numruns_with_commas simulations";

if (($numruns - $number_of_runs_preloaded) >= 1000) {
   print " requires $calls2rand calls to rand(), which takes a while!\n";
   print "So, after every $simulations_per_dot" . "th simulation it prints a dot ($dots_per_line/line), as a progress indicator.";
}
print "\n\n";

# for the progress indicator
$dotcolumn = $dotrow = 0;


# Main loop to run the simulations and tabulate the results.
# Print "." as progress indicator every $simulations_per_dot simulations, up to $dots_per_line dots per line.
for ($i=$number_of_runs_preloaded; $i<$numruns; $i++) {
   $buckets[ &num_coincidences() ] ++;
   if (($i % $simulations_per_dot) == $modulo_of_dot) {
      # print a dot
      if ($dotcolumn == $dots_per_line) {
         $pctdone = (($i + 1) * 100) / $numruns;
         printf("%5.1f%%\n", $pctdone);
         $dotcolumn = 0;
         $dotrow++;
         $numruns_done = $i + 1;  # runs completed so far, including this one
         if ('' ne $savefile) {
            &save_buckets($savefile);
         }
      }
      print ".";
      $dotcolumn++;
   }
}
$numruns_done = $numruns;
if ($dotcolumn > 0) {
   print "\n";
   $dotcolumn = 0;
   $dotrow++;
}
# save results one last time at the end
if ('' ne $savefile) {
   &save_buckets($savefile);
}


# remind the user that the simulation results are also in the Savefile, if he specified '-sSavefile'
if ($debugmode && ('' ne $savefile)) {
   print "Note: results of " . &commafy($numruns_done) . " simulations were saved to '$savefile'\n";
}


# report the results:
print "First column is number of coincidences per 35,750 matches\n";
print " : second column is number of runs (out of $numruns_with_commas) which had that number of coincidences\n";
print "  : third column is percentage of runs which had that number of coincidences\n";
&report_results(STDOUT);


# report the run-time:
$end_time = time();
$run_time = $end_time - $start_time;  # in seconds
$run_time = &human_time($run_time);
if ($debugmode || ($number_of_runs_preloaded < $numruns)) {
   print "Run time = $run_time\n";
}

exit 0;

__END__


Sample output:


First column is number of coincidences per 35,750 matches
 : second column is number of runs (out of 25,000,000) which had that number of coincidences
  : third column is percentage of runs which had that number of coincidences
     0  :  700479  :   2.801916
     1  : 2503288  :  10.013152
     2  : 4469133  :  17.876532
     3  : 5333504  :  21.334016
     4  : 4768914  :  19.075656
     5  : 3410612  :  13.642448
     6  : 2031999  :   8.127996
     7  : 1038296  :   4.153184
     8  :  463046  :   1.852184
     9  :  184655  :   0.738620
    10  :   65817  :   0.263268
    11  :   21470  :   0.085880
    12  :    6339  :   0.025356
    13  :    1851  :   0.007404
    14  :     461  :   0.001844
    15  :     103  :   0.000412
    16  :      27  :   0.000108
    17  :       5  :   0.000020
    18  :       1  :   0.000004
Average = 3.57600
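
The Average line above can be re-derived from the histogram: it is the count-weighted mean, the sum of k times count(k) divided by the total number of runs, which is the same arithmetic report_results() performs. A minimal Perl sketch (the bucket values are copied from the sample output; the sub name histogram_mean is just illustrative, not part of the program):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Bucket counts copied from the 25,000,000-run sample output above:
# $buckets[k] = number of simulations that had k coincidences.
my @buckets = (700479, 2503288, 4469133, 5333504, 4768914, 3410612,
               2031999, 1038296,  463046,  184655,   65817,   21470,
                  6339,    1851,     461,     103,      27,       5,
                     1);

# Illustrative helper: returns (total runs, count-weighted mean).
sub histogram_mean {
    my @counts = @_;
    my ($runs, $weighted_sum) = (0, 0);
    for my $k (0 .. $#counts) {
        $runs         += $counts[$k];
        $weighted_sum += $k * $counts[$k];
    }
    return ($runs, $weighted_sum / $runs);
}

my ($runs, $mean) = histogram_mean(@buckets);
my $expected = 35750 / 9999;    # n * p, with p = 1/9999

printf "runs = %d, simulated average = %7.5f, expected = %7.5f\n",
       $runs, $mean, $expected;
```

Run as-is, it prints runs = 25000000, simulated average = 3.57600, expected = 3.57536; the first two agree with the run count in the header and with the Average line above.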


Simulated v. calculated distribution:


First column is number of coincidences, k, per 35,750 matches
 : second & third columns are copied from the results of 25,000,000 simulations (above)
  : fourth column is percentage calculated by my online binomial probability distribution calculator
    : fifth column is cumulative percentage from the binomial calculator (probability of k or fewer coincidences)
     k  |      simulated         |                  calculated
   -----+------------------------+-----------------------------------------------
     0  :  700479  :   2.801916% :   2.80004041934694%      :   2.80004041934694%
     1  : 2503288  :  10.013152  :  10.01214692855101       :  12.81218734789794
     2  : 4469133  :  17.876532  :  17.89979198583566       :  30.71197933373361
     3  : 5333504  :  21.334016  :  21.33365886209420       :  52.04563819582780
     4  : 4768914  :  19.075656  :  19.06917141786560       :  71.11480961369340
     5  : 3410612  :  13.642448  :  13.63565916189286       :  84.75046877558626
     6  : 2031999  :   8.127996  :   8.125068959489566      :  92.87553773507582
     7  : 1038296  :   4.153184  :   4.149722300002787      :  97.02526003507861
     8  :  463046  :   1.852184  :   1.854414935099515      :  98.87967497017813
     9  :  184655  :   0.738620  :   0.7365973040199915     :  99.61627227419812
    10  :   65817  :   0.263268  :   0.2633199064110674     :  99.87959218060919
    11  :   21470  :   0.085880  :   0.08557214583945469    :  99.96516432644864
    12  :    6339  :   0.025356  :   0.02549062245912742    :  99.99065494890777
    13  :    1851  :   0.007404  :   0.007008969989723296   :  99.99766391889749
    14  :     461  :   0.001844  :   0.001789497617543090   :  99.99945341651503
    15  :     103  :   0.000412  :   0.0004264151954425543  :  99.99987983171048
    16  :      27  :   0.000108  :   0.00009525622005113322 :  99.99997508793053
    17  :       5  :   0.000020  :   0.00002002686282731367 :  99.99999511479336
    18  :       1  :   0.000004  :   0.00000397646134453779 :  99.99999909125470
Predicted average = 35,750 / 9999 = 3.575357535753...
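
The fourth and fifth columns can also be reproduced locally. A minimal Perl sketch, assuming the same model as the simulation (n = 35,750 comparisons, each matching with probability p = 1/9999): it starts from P(0) = (1 - p)^n and applies the standard recurrence P(k+1) = P(k) * (n - k) / (k + 1) * p / (1 - p), which avoids computing huge binomial coefficients. The sub name binomial_pmf is hypothetical, not part of test_voter_fraud_stats.pl:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed model (same as the simulation): n comparisons, each matching
# independently with probability p = 1/9999.
my $num_matches = 35750;
my $p_match     = 1 / 9999;
my $max_k       = 18;    # largest k seen in the 25,000,000 simulations

# Hypothetical helper: returns P(X = k) for k = 0 .. $kmax, X ~ Binomial(n, p).
# P(0) = (1 - p)^n, and P(k+1) = P(k) * (n - k) / (k + 1) * p / (1 - p).
sub binomial_pmf {
    my ($n, $p, $kmax) = @_;
    my @pmf = ((1 - $p) ** $n);
    for my $k (0 .. $kmax - 1) {
        push @pmf, $pmf[$k] * ($n - $k) / ($k + 1) * $p / (1 - $p);
    }
    return @pmf;
}

my @pmf = binomial_pmf($num_matches, $p_match, $max_k);
my $cumulative = 0;
for my $k (0 .. $#pmf) {
    $cumulative += $pmf[$k];
    # percentage for exactly k, then cumulative percentage for k or fewer
    printf "%3d  : %16.10f%%  : %16.10f%%\n", $k, 100 * $pmf[$k], 100 * $cumulative;
}
```

Its output should agree closely with the "calculated" columns above, apart from ordinary double-precision rounding in the last digits.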


Emails about this software: