Sunday, 10 September 2023

About the 2023 Singapore Presidential Election Sample Count

Singapore's ninth President was elected on the first of this month. Like many Singaporeans, I stayed up to watch the results. At around 11 PM, a sample count was displayed, and it was mentioned that the final percentages usually deviate from the sample count by about 5 percent.

Now, as a programmer, and by extension, a numbers and statistics nerd, I could not let this one go. Thus, once it was over and the results confirmed, I was up hitting my Python console. I simply had to confirm this.

Five percent, yo.

And this is how I began.

I was going to need some random functionality and some statistics crunching. Thus these two libraries were imported.
import random
import statistics

I followed up by declaring a list, pe_results. And then I populated the list with actual results. A 0 was a vote for Tharman Shanmugaratnam, a 1 was for Ng Kok Song, and a 2 was for Tan Kin Lian.
import random
import statistics

pe_results = []

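# range() excludes its end value, so stop at 2480761 to include the last vote
for vote in range(1, 2480761):
    if (vote >= 1 and vote <= 1746427): pe_results.append(0)
    # strictly greater than, so vote 1746427 is not counted for two candidates
    if (vote > 1746427 and vote <= 1746427 + 390041): pe_results.append(1)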
    if (vote >= 1746427 + 390041 + 1): pe_results.append(2)
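
As an aside, the same list can be built without a loop by repeating each candidate's code the right number of times. This is only a sketch: the names TS_VOTES, NKS_VOTES and TKL_VOTES are made up for illustration, and the third total is an assumption, derived from 2,480,760 valid votes overall (which is what the loop's range suggests) minus the other two totals. The printed percentages make a quick sanity check against the final results.

```python
# Vote totals per candidate. The first two appear in the loop above.
TS_VOTES = 1746427
NKS_VOTES = 390041
# Assumption: 2,480,760 total valid votes minus the two totals above.
TKL_VOTES = 2480760 - TS_VOTES - NKS_VOTES

# 0 = Tharman Shanmugaratnam, 1 = Ng Kok Song, 2 = Tan Kin Lian
pe_results = [0] * TS_VOTES + [1] * NKS_VOTES + [2] * TKL_VOTES

print("Total votes:", len(pe_results))
for candidate, votes in enumerate((TS_VOTES, NKS_VOTES, TKL_VOTES)):
    # Each candidate's share of the total, as a percentage
    print(candidate, round(votes / len(pe_results) * 100, 2))
```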


Next, another list was needed, diff.
import random
import statistics

pe_results = []

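# range() excludes its end value, so stop at 2480761 to include the last vote
for vote in range(1, 2480761):
    if (vote >= 1 and vote <= 1746427): pe_results.append(0)
    # strictly greater than, so vote 1746427 is not counted for two candidates
    if (vote > 1746427 and vote <= 1746427 + 390041): pe_results.append(1)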
    if (vote >= 1746427 + 390041 + 1): pe_results.append(2)

diff = []

Then I used the shuffle() function from the random library to put the list pe_results into random order. And then I declared another list, sample, comprising the first 100 values of pe_results.
diff = []

random.shuffle(pe_results)
sample = pe_results[:100]
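
Shuffling almost 2.5 million items just to keep 100 of them is a lot of work. A possible alternative, sketched below, is random.sample(), which draws 100 distinct ballots without reordering the whole list. The list is rebuilt compactly here so the snippet runs on its own; the third vote total is an assumption (2,480,760 total valid votes minus the other two).

```python
import random

# Compact rebuild of the vote list. 344292 is an assumed total for the
# third candidate: 2,480,760 overall minus the other two totals.
pe_results = [0] * 1746427 + [1] * 390041 + [2] * 344292

# Draw 100 ballots without replacement, leaving pe_results untouched
sample = random.sample(pe_results, 100)

print(len(sample))
```

Statistically this should behave like shuffle-then-slice, since both pick 100 ballots uniformly at random without replacement.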


Now I had counts for the three candidates, all starting at 0. And I tallied up the counts using a For loop.
sample = pe_results[:100]

ts_count = 0
nks_count = 0
tkl_count = 0

for vote in sample:
    if (vote == 0): ts_count+=1
    if (vote == 1): nks_count+=1
    if (vote == 2): tkl_count+=1
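
For what it's worth, the tally can also be done with Counter from the standard collections module. This is a sketch, not what I actually ran; the sample below is a hypothetical stand-in so the snippet is self-contained.

```python
from collections import Counter

# Hypothetical stand-in for a 100-vote sample (not real data)
sample = [0] * 70 + [1] * 16 + [2] * 14

# Counter maps each candidate code to how many times it appears
counts = Counter(sample)

ts_count = counts[0]
nks_count = counts[1]
tkl_count = counts[2]

print(ts_count, nks_count, tkl_count)
```

A Counter returns 0 for a key it has never seen, so a candidate with no votes in the sample does not raise an error.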


After that, I needed the actual final results for each candidate, taken from the link at the start of this blogpost.
ts_count = 0
nks_count = 0
tkl_count = 0

for vote in sample:
    if (vote == 0): ts_count+=1
    if (vote == 1): nks_count+=1
    if (vote == 2): tkl_count+=1

final_ts = 70.4
final_nks = 15.72
final_tkl = 13.88


Next, I populated the diff list with the absolute differences between the sample percentages and the final percentages. Because the sample holds exactly 100 votes, each count doubles as a percentage. I used the abs() function because a deviation is a deviation, regardless of which direction it is.
final_ts = 70.4
final_nks = 15.72
final_tkl = 13.88

diff.append(abs(ts_count - final_ts))
diff.append(abs(nks_count - final_nks))
diff.append(abs(tkl_count - final_tkl))


And then finally, I printed the Mean and Median of the diff list, using the statistics library, as well as the highest and lowest values.
diff.append(abs(ts_count - final_ts))
diff.append(abs(nks_count - final_nks))
diff.append(abs(tkl_count - final_tkl))

print ("Mean: " + str(statistics.mean(diff)))
print ("Median: " + str(statistics.median(diff)))
print ("Highest: " + str(max(diff)))
print ("Lowest: " + str(min(diff)))


Just to be sure, I ran this 10 different times so there would be more data.
# range(10) gives exactly 10 runs; range(1, 10) would stop at 9
for x in range(10):
    random.shuffle(pe_results)
    sample = pe_results[:100]

    ts_count = 0
    nks_count = 0
    tkl_count = 0

    for vote in sample:
        if (vote == 0): ts_count+=1
        if (vote == 1): nks_count+=1
        if (vote == 2): tkl_count+=1

    final_ts = 70.4
    final_nks = 15.72
    final_tkl = 13.88

    diff.append(abs(ts_count - final_ts))
    diff.append(abs(nks_count - final_nks))
    diff.append(abs(tkl_count - final_tkl))

print ("Mean: " + str(statistics.mean(diff)))
print ("Median: " + str(statistics.median(diff)))
print ("Highest: " + str(max(diff)))
print ("Lowest: " + str(min(diff)))
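
Ten runs is still a small number, so here is a sketch of how the same experiment could be wrapped in a function and repeated many more times. The function name run_trials, its parameters and the seed value are all made up for illustration, random.sample() stands in for shuffle-and-slice, and the third vote total is an assumption (2,480,760 total valid votes minus the other two).

```python
import random
import statistics

# Final percentages, keyed by candidate code
FINAL = {0: 70.4, 1: 15.72, 2: 13.88}

# 344292 is an assumed total for the third candidate
pe_results = [0] * 1746427 + [1] * 390041 + [2] * 344292

def run_trials(votes, trials, sample_size=100, seed=None):
    """Return absolute sample-vs-final differences, three per trial."""
    rng = random.Random(seed)  # a seed makes the run repeatable
    diffs = []
    for _ in range(trials):
        sample = rng.sample(votes, sample_size)
        for candidate, final_pct in FINAL.items():
            # Convert the count to a percentage so any sample size works
            sample_pct = sample.count(candidate) / sample_size * 100
            diffs.append(abs(sample_pct - final_pct))
    return diffs

diffs = run_trials(pe_results, trials=1000, seed=2023)
within_5 = sum(d <= 5 for d in diffs) / len(diffs)

print("Mean: " + str(statistics.mean(diffs)))
print("Median: " + str(statistics.median(diffs)))
print("Share within 5 points: " + str(within_5))
```

The last line reports what fraction of the comparisons landed within 5 percentage points, which is the claim being tested in the first place.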

And these were the results! Both the Mean and Median came to about 3 and change. No matter how many times I ran the code, it didn't vary much.
Mean: 3.5851851851851855
Median: 3.2799999999999994
Highest: 9.4
Lowest: 0.12
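
A mean of 3 and change is roughly what textbook sampling theory would suggest for a 100-vote sample. The sketch below is a back-of-envelope check, not part of my original run: it uses the standard error of a proportion, sqrt(p(1-p)/n), and a normal approximation for the expected absolute deviation (standard deviation times sqrt(2/pi)). The variable names are made up for illustration.

```python
import math

SAMPLE_SIZE = 100
final_pct = {"ts": 70.4, "nks": 15.72, "tkl": 13.88}

std_dev = {}
expected_abs = {}
for name, pct in final_pct.items():
    p = pct / 100
    # Standard deviation of the sample percentage, in percentage points
    std_dev[name] = math.sqrt(p * (1 - p) / SAMPLE_SIZE) * 100
    # Expected absolute deviation under a normal approximation
    expected_abs[name] = std_dev[name] * math.sqrt(2 / math.pi)
    print(name, round(std_dev[name], 2), round(expected_abs[name], 2))

# Average over the three candidates, comparable to the Mean printed above
print("Average:", round(sum(expected_abs.values()) / len(expected_abs), 2))
```

The same formula says the deviation shrinks with the square root of the sample size, so a bigger sample would be expected to land closer to the final result.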

The Final Conclusion

It looks like the deviation is generally even lower than the 5 percent mentioned, though a single comparison can land anywhere from almost nothing (0.12) to nearly double that (9.4). But it was a fun experiment, nevertheless.

You can count on me, Singapore.
T___T
