Saturday, 19 February 2022

Mean, Median and Mode in Python (Part 1/3)

The basic ways to analyze numeric data come in the form of formulae that describe the numbers: the Mean, Median and Mode of a dataset. These calculations were a staple of statistics long before computer programming came along, and languages such as Python now automate them for us. However, it is still useful to know how to derive these numbers ourselves.

To do this, I am going to explain each measure and implement it in Python. At the same time, I am going to use Python's NumPy library to check that each implementation is correct. For the purpose of this exercise, we will disregard edge cases such as negative numbers and zero values.

The Mean

The mean is the average value of an entire dataset. To calculate it, we add up all the values and divide this total by the number of values.
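As a quick worked example of that arithmetic, here is a minimal sketch using three made-up values (the list and the variable names are illustrative only, not part of the test data used later):

```python
# Made-up sample values, chosen only to keep the arithmetic easy to follow
values = [2, 4, 9]

total = 2 + 4 + 9   # add up all the values: 15
count = 3           # how many values there are

mean = total / count
print(mean)  # 5.0
```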

Let's write a function to do this. We'll call it tt_mean(). In it, we accept a parameter, vals, which is a list.
import numpy as np
import statistics as stat

def tt_mean(vals):
    pass  # placeholder so the stub runs; the body is filled in next

We use a for loop to iterate through the list, adding up the values. The running sum is stored in a variable, total.
import numpy as np
import statistics as stat

def tt_mean(vals):
    total = 0
    
    for v in vals:
        total = total + v


Then we divide that total by the number of values in the list and return the result.
import numpy as np
import statistics as stat

def tt_mean(vals):
    total = 0
    
    for v in vals:
        total = total + v
    
    return total / len(vals)
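
For comparison, the same calculation can be sketched with Python's built-in sum() instead of an explicit loop. The name tt_mean_builtin() is hypothetical. The empty-list check is also an assumption on my part, since the function above would raise a ZeroDivisionError for an empty list.

```python
def tt_mean_builtin(vals):
    """Hypothetical variant of tt_mean() that uses the built-in sum()."""
    # Assumption: treat an empty list as an error rather than dividing by zero
    if len(vals) == 0:
        raise ValueError("cannot take the mean of an empty list")

    return sum(vals) / len(vals)


print(tt_mean_builtin([2, 4, 9]))  # 5.0
```

The explicit loop is still worth writing once, because it shows exactly what sum() does for us.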


Now compare the result against that of NumPy's mean() function. Here, the test dataset is a list of 10 numeric values.
import numpy as np
import statistics as stat

def tt_mean(vals):
    total = 0
    
    for v in vals:
        total = total + v
    
    return total / len(vals)

test = [1, 3, 10, 45, 7, 8, 8, 10, 10, 8]

print(tt_mean(test))

print(np.mean(test))


An exact match! Both calls print 11.0.
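
Rather than comparing the two printed values by eye, the check can also be done in code. This is a minimal sketch that assumes the standard library's statistics.mean() as the reference, since that module is already imported as stat. NumPy's np.mean() could be swapped in the same way. math.isclose() is used because floating-point results are safer to compare with a tolerance than with ==.

```python
import math
import statistics as stat


def tt_mean(vals):
    total = 0

    for v in vals:
        total = total + v

    return total / len(vals)


test = [1, 3, 10, 45, 7, 8, 8, 10, 10, 8]

# True when our result agrees with the library's result within a small tolerance
matches = math.isclose(tt_mean(test), stat.mean(test))
print(matches)  # True
```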


Next

This one was easy. The next one is the Median, and that one will be slightly more complicated.
