Monday 13 May 2024

Web Tutorial: Python Matplotlib Bar Chart (Part 1/2)

Let's do some charting!

Once again we'll be working with football statistics for Liverpool Football Club, and we will use Python's matplotlib library. Python is among the top choices for data analytics, and its charting capabilities are just the tip of the iceberg.

We start off by importing some libraries. We want numpy, and we want the pyplot functionality of matplotlib.
import numpy as np
import matplotlib.pyplot as plt


Next, we create a dictionary, data.
import numpy as np
import matplotlib.pyplot as plt

data = {

}


The data dictionary will hold statistics split by season, with each season's year as the key.
data = {
    2017: {

    },
    2018: {

    },
    2019: {

    },
    2020: {

    },
    2021: {

    },
    2022: {

    }

}


In 2017, we have statistics for these players.
data = {
    2017: {
        "Mohamed Salah": {},
        "Roberto Firminho": {},
        "Sadio Mane": {},
        "Alex Oxlade-Chamberlain": {}

    },
    2018: {

    },
    2019: {

    },
    2020: {

    },
    2021: {

    },
    2022: {

    }
}


Each player, in turn, has two numerical properties: goals and appearances.
data = {
    2017: {
        "Mohamed Salah": {"goals": 44, "appearances": 52},
        "Roberto Firminho": {"goals": 27, "appearances": 54},
        "Sadio Mane": {"goals": 20, "appearances": 44},
        "Alex Oxlade-Chamberlain": {"goals": 5, "appearances": 42}
    },
    2018: {

    },
    2019: {

    },
    2020: {

    },
    2021: {

    },
    2022: {

    }
}


I filled in the statistics for the rest of the seasons.
data = {
    2017: {
        "Mohamed Salah": {"goals": 44, "appearances": 52},
        "Roberto Firminho": {"goals": 27, "appearances": 54},
        "Sadio Mane": {"goals": 20, "appearances": 44},
        "Alex Oxlade-Chamberlain": {"goals": 5, "appearances": 42}
    },
    2018: {
        "Mohamed Salah": {"goals": 27, "appearances": 52},
        "Roberto Firminho": {"goals": 16, "appearances": 48},
        "Sadio Mane": {"goals": 26, "appearances": 50},
        "Alex Oxlade-Chamberlain": {"goals": 0, "appearances": 2}

    },
    2019: {
        "Mohamed Salah": {"goals": 23, "appearances": 48},
        "Roberto Firminho": {"goals": 12, "appearances": 52},
        "Sadio Mane": {"goals": 22, "appearances": 47},
        "Alex Oxlade-Chamberlain": {"goals": 8, "appearances": 43}

    },
    2020: {
        "Mohamed Salah": {"goals": 31, "appearances": 51},
        "Roberto Firminho": {"goals": 9, "appearances": 48},
        "Sadio Mane": {"goals": 16, "appearances": 48},
        "Alex Oxlade-Chamberlain": {"goals": 1, "appearances": 17},
        "Diogo Jota": {"goals": 13, "appearances": 30}

    },
    2021: {
        "Mohamed Salah": {"goals": 31, "appearances": 51},
        "Roberto Firminho": {"goals": 11, "appearances": 35},
        "Sadio Mane": {"goals": 23, "appearances": 51},
        "Alex Oxlade-Chamberlain": {"goals": 3, "appearances": 29},
        "Diogo Jota": {"goals": 21, "appearances": 55},
        "Luis Diaz": {"goals": 6, "appearances": 26}

    },
    2022: {
        "Mohamed Salah": {"goals": 30, "appearances": 51},
        "Roberto Firminho": {"goals": 13, "appearances": 35},
        "Alex Oxlade-Chamberlain": {"goals": 1, "appearances": 13},
        "Diogo Jota": {"goals": 7, "appearances": 28},
        "Luis Diaz": {"goals": 5, "appearances": 21}

    }
}
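To see how the three levels of nesting fit together, here is a small sketch that looks up one value: season first, then player, then stat. It uses a trimmed copy of the 2018 season only, so it runs on its own. Note that the season keys are integers, not strings.

```python
# A trimmed copy of the tutorial's data: season -> player -> stats
data = {
    2018: {
        "Mohamed Salah": {"goals": 27, "appearances": 52},
        "Roberto Firminho": {"goals": 16, "appearances": 48},
        "Sadio Mane": {"goals": 26, "appearances": 50},
        "Alex Oxlade-Chamberlain": {"goals": 0, "appearances": 2},
    }
}

# Index by season (an int key), then player name, then stat name
mane_goals = data[2018]["Sadio Mane"]["goals"]
print(mane_goals)  # 26

# The season keys are ints, so looking up the string "2018" would fail
print(2018 in data, "2018" in data)  # True False
```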


And now we're going to call a function, barChart(), after the data. We'll write the function itself in a moment.
    2022: {
        "Mohamed Salah": {"goals": 30, "appearances": 51},
        "Roberto Firminho": {"goals": 13, "appearances": 35},
        "Alex Oxlade-Chamberlain": {"goals": 1, "appearances": 13},
        "Diogo Jota": {"goals": 7, "appearances": 28},
        "Luis Diaz": {"goals": 5, "appearances": 21}
    }
}

barChart()


Then we create the function itself at the beginning of the code, right after the part where we import the libraries. It has four parameters: labels, vals, season and stat. The first two are lists and the last two are strings.
import numpy as np
import matplotlib.pyplot as plt

def barChart(labels, vals, season, stat):
    pass  # placeholder: the body is built up step by step below

data = {
    2017: {
        "Mohamed Salah": {"goals": 44, "appearances": 52},
        "Roberto Firminho": {"goals": 27, "appearances": 54},
        "Sadio Mane": {"goals": 20, "appearances": 44},
        "Alex Oxlade-Chamberlain": {"goals": 5, "appearances": 42}
    },


Now that these are in place, let's prepare some values to pass into barChart() as arguments. For now, we'll assume we want the goals for the 2018 season. We get a list of player names by using the list() function.
    2022: {
        "Mohamed Salah": {"goals": 30, "appearances": 51},
        "Roberto Firminho": {"goals": 13, "appearances": 35},
        "Alex Oxlade-Chamberlain": {"goals": 1, "appearances": 13},
        "Diogo Jota": {"goals": 7, "appearances": 28},
        "Luis Diaz": {"goals": 5, "appearances": 21}
    }
}

players = list()

barChart()


Into list(), we pass the keys of data[2018], the element of data whose key is 2018. We get those keys with the keys() method.
players = list(data[2018].keys())

barChart()


We do something similar for values, which is another list, except that this time we don't want the keys; we want the values.
players = list(data[2018].keys())
values = list(data[2018].values())

barChart()


Now, each element in values will be a dictionary with goals and appearances properties. We only want the goals, so we declare an empty list, stats, and iterate through values using a for loop.
players = list(data[2018].keys())
values = list(data[2018].values())

stats = []
for v in values:
    pass  # placeholder: the loop body comes in the next step


barChart()


Then we use the append() method to add the goals property of the current element, v, to stats.
players = list(data[2018].keys())
values = list(data[2018].values())

stats = []
for v in values:
    stats.append(v["goals"])

barChart()


We finish up by passing these arguments into barChart().
players = list(data[2018].keys())
values = list(data[2018].values())

stats = []
for v in values:
    stats.append(v["goals"])

barChart(players, stats, "2018", "goals")
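Why do players and stats line up? A dictionary's keys() and values() iterate in the same order (insertion order, which Python guarantees from version 3.7), so players[i] and stats[i] describe the same player. As a sketch, and not the author's code, the for loop can also be written as a list comprehension; the zip() at the end pairs the two lists to confirm the alignment.

```python
data = {
    2018: {
        "Mohamed Salah": {"goals": 27, "appearances": 52},
        "Roberto Firminho": {"goals": 16, "appearances": 48},
        "Sadio Mane": {"goals": 26, "appearances": 50},
        "Alex Oxlade-Chamberlain": {"goals": 0, "appearances": 2},
    }
}

players = list(data[2018].keys())
# Same result as the for loop with append(), written as one expression
stats = [v["goals"] for v in data[2018].values()]

# Pair the lists up: each name should sit next to its own goal count
for name, goals in zip(players, stats):
    print(name, goals)
```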


Now back to the barChart() function. Start by calling the figure() method of plt and setting its figsize argument to a pair of values: the width and height of the figure in inches.
def barChart(labels, vals, season, stat):
    plt.figure(figsize = (10, 5))
    
data = {
    2017: {
        "Mohamed Salah": {"goals": 44, "appearances": 52},
        "Roberto Firminho": {"goals": 27, "appearances": 54},
        "Sadio Mane": {"goals": 20, "appearances": 44},
        "Alex Oxlade-Chamberlain": {"goals": 5, "appearances": 42}
    },
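As a quick sanity check on figsize, the sketch below creates a figure and reads its size back. The dpi=100 argument and the Agg backend are my assumptions (Agg renders without a window, so this runs headless); with them, the rendered image should measure the width and height in inches multiplied by dpi.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# figsize is (width, height) in inches; dpi is dots (pixels) per inch
fig = plt.figure(figsize=(10, 5), dpi=100)

width_in, height_in = fig.get_size_inches()
print(width_in, height_in)  # 10.0 5.0

# Pixel size of the rendered image = inches * dpi
print(width_in * fig.dpi, height_in * fig.dpi)  # 1000.0 500.0
plt.close(fig)
```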


Next, we pass labels and vals to the bar() method. The last argument, color, is an RGB tuple in which each component runs from 0 to 1. This is Liverpool FC, so we'll go with a brilliant red.
def barChart(labels, vals, season, stat):
    plt.figure(figsize = (10, 5))

    plt.bar(labels, vals, color=(1, 0.2, 0.2))
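To see that the tuple really is a red, matplotlib's colors module can convert it to the familiar hex notation and back. This is a small sketch using to_hex() and to_rgb() from matplotlib.colors:

```python
from matplotlib.colors import to_hex, to_rgb

# The tutorial's bar colour: full red, 20% green, 20% blue (each from 0 to 1)
liverpool_red = (1, 0.2, 0.2)

print(to_hex(liverpool_red))  # #ff3333

# Converting the hex string back gives the same colour as a float tuple
print(to_rgb("#ff3333"))
```

So passing color="#ff3333" to bar() should give the same red.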


Finally, we use the show() method to display the chart.
def barChart(labels, vals, season, stat):
    plt.figure(figsize = (10, 5))

    plt.bar(labels, vals, color=(1, 0.2, 0.2))

    plt.show()


And there's the bar chart! With lovely red bars.
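If you run the script somewhere without a display (a server, say), plt.show() may not open a window at all. One option, sketched below under that assumption, is to save the chart to an image file with plt.savefig() instead. The function name barChartToFile and the file name bar_chart.png are hypothetical.

```python
import os

import matplotlib
matplotlib.use("Agg")  # assumption: no display available, so use a file-only backend
import matplotlib.pyplot as plt


def barChartToFile(labels, vals, season, stat, filename="bar_chart.png"):
    # Hypothetical variant of barChart(): same drawing steps, but saves to disk.
    # season and stat are unused until titles are added.
    plt.figure(figsize=(10, 5))
    plt.bar(labels, vals, color=(1, 0.2, 0.2))
    plt.savefig(filename)  # write the current figure to a PNG file
    plt.close()            # free the figure, since nothing will display it


barChartToFile(["Mohamed Salah", "Sadio Mane"], [27, 26], "2018", "goals")
print(os.path.exists("bar_chart.png"))  # True
```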


We haven't added a title or axis labels to the bar chart yet. This is where the parameters season and stat come in. We use the xlabel() method and pass in the string "Players". Then we call the ylabel() method with a label built from the stat parameter. And finally, we call the title() method with a title built from the season parameter.
def barChart(labels, vals, season, stat):
    plt.figure(figsize = (10, 5))

    plt.bar(labels, vals, color=(1, 0.2, 0.2))

    plt.xlabel("Players")
    plt.ylabel("No. of " + stat)
    plt.title("Liverpool FC Player Stats for " + season)

    plt.show()
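Because the labels are built with "+", season and stat must both be strings, which is why the call passes "2018" rather than 2018. As an alternative sketch (not the author's code), f-strings convert non-string values automatically. The snippet runs the same calls outside a function, skips plt.show(), and reads the labels back from the current axes; the Agg backend is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

labels = ["Mohamed Salah", "Sadio Mane"]
vals = [27, 26]
season = 2018  # an int this time, not the string "2018"
stat = "goals"

plt.figure(figsize=(10, 5))
plt.bar(labels, vals, color=(1, 0.2, 0.2))
plt.xlabel("Players")
# f-strings turn season into text; "... for " + season would raise a TypeError
plt.ylabel(f"No. of {stat}")
plt.title(f"Liverpool FC Player Stats for {season}")

ax = plt.gca()
print(ax.get_xlabel())  # Players
print(ax.get_ylabel())  # No. of goals
print(ax.get_title())   # Liverpool FC Player Stats for 2018
```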


You can see where on the chart the text appears.


I want the values to appear at the top of the bars. For this, I have to iterate through vals using a for loop. And because I also want the index of each value, I first use enumerate() on vals.
def barChart(labels, vals, season, stat):
    plt.figure(figsize = (10, 5))

    plt.bar(labels, vals, color=(1, 0.2, 0.2))

    for index, value in enumerate(vals):
        pass  # placeholder: the loop body comes in the next step

    plt.xlabel("Players")
    plt.ylabel("No. of " + stat)
    plt.title("Liverpool FC Player Stats for " + season)
    plt.show()


In the for loop, we use the text() method to put the values on the chart. The first argument is the x value, the second is the y value, and the last is the current element of vals, converted to a string with the str() function. For the first argument, index is a reasonable choice, because that's how the positions of the bars are determined: with text labels, matplotlib places the bars at x = 0, 1, 2 and so on. For the second argument, how high the text appears depends on how high the bar is (because we want the text on top), which in turn depends on the current element of vals.
def barChart(labels, vals, season, stat):
    plt.figure(figsize = (10, 5))

    plt.bar(labels, vals, color=(1, 0.2, 0.2))

    for index, value in enumerate(vals):
        plt.text(index, value, str(value))

    plt.xlabel("Players")
    plt.ylabel("No. of " + stat)
    plt.title("Liverpool FC Player Stats for " + season)
    plt.show()
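To back up the claim that index lines up with the bars, this sketch draws the 2018 goals and reads each bar's centre and height back from the BarContainer that plt.bar() returns. It assumes the default bar width and centre alignment, and the Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

labels = ["Mohamed Salah", "Roberto Firminho", "Sadio Mane", "Alex Oxlade-Chamberlain"]
vals = [27, 16, 26, 0]

plt.figure(figsize=(10, 5))
bars = plt.bar(labels, vals, color=(1, 0.2, 0.2))

# String labels become categories at x = 0, 1, 2, ...; each bar is centred on one
centres = [bar.get_x() + bar.get_width() / 2 for bar in bars]
print([round(float(c), 6) for c in centres])  # [0.0, 1.0, 2.0, 3.0]

# Each bar's height is its value, which is why `value` works as the text's y
print([float(bar.get_height()) for bar in bars])  # [27.0, 16.0, 26.0, 0.0]
plt.close()
```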


This needs cleaning up.


We want the text to sit a bit higher than the bar, leaving some space between the text and the top of the bar, so we add 1 to the y value.
for index, value in enumerate(vals):
    plt.text(index, value + 1, str(value))


Then we want to shift the text left so that it appears right in the middle of the bar. This is tricky, but basically I subtract an amount from index that grows with the number of characters in the value, which is why we need both the str() and len() functions here.
for index, value in enumerate(vals):
    plt.text(index - (len(str(value)) * 0.02), value + 1, str(value))
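The manual offset works, but how far it needs to move depends on the font size and figure width. An alternative, sketched here and not the author's approach, is to leave x at index and let matplotlib centre the text with the ha (horizontal alignment) argument of text(). The Agg backend is again an assumption for headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

labels = ["Mohamed Salah", "Roberto Firminho", "Sadio Mane", "Alex Oxlade-Chamberlain"]
vals = [27, 16, 26, 0]

plt.figure(figsize=(10, 5))
plt.bar(labels, vals, color=(1, 0.2, 0.2))

texts = []
for index, value in enumerate(vals):
    # ha="center" centres the text on x = index, so no length-based offset is needed
    texts.append(plt.text(index, value + 1, str(value), ha="center"))

print([t.get_text() for t in texts])  # ['27', '16', '26', '0']
plt.close()
```

Matplotlib 3.4 and later also provide Axes.bar_label(), which places value labels on bars directly.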


Much better. But now we have a problem at the top, where the text on the tallest bar runs into the top edge of the chart.


Here, we use the ylim() method. We pass in 0 as the first argument to tell matplotlib that the lowest value on the scale is 0. The second argument defines the upper limit, and we want this to be a fair bit higher than the maximum value, so that there's room for the numbers. We use the max() function, passing in the vals list to get the biggest number, and then add 5 to it.
def barChart(labels, vals, season, stat):
    plt.figure(figsize = (10, 5))

    plt.bar(labels, vals, color=(1, 0.2, 0.2))

    for index, value in enumerate(vals):
        plt.text(index - (len(str(value)) * 0.02), value + 1, str(value))

    plt.ylim(0, max(vals) + 5)

    plt.xlabel("Players")
    plt.ylabel("No. of " + stat)
    plt.title("Liverpool FC Player Stats for " + season)
    plt.show()
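To confirm the new limits, plt.ylim() called with no arguments returns the current (bottom, top) pair. A minimal sketch with the 2018 goals, where the tallest bar is 27, so the top should be 27 + 5 = 32 (Agg backend assumed):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

labels = ["Mohamed Salah", "Roberto Firminho", "Sadio Mane", "Alex Oxlade-Chamberlain"]
vals = [27, 16, 26, 0]

plt.figure(figsize=(10, 5))
plt.bar(labels, vals, color=(1, 0.2, 0.2))

# Bottom at 0, top 5 above the tallest bar; the value labels sit at value + 1
plt.ylim(0, max(vals) + 5)

bottom, top = plt.ylim()  # with no arguments, ylim() returns the current limits
print(float(bottom), float(top))  # 0.0 32.0
plt.close()
```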


Looking good!


Next, we want to show the average value as a horizontal line. To that end, we use the axhline() method. We set the y parameter using NumPy's nanmean() function, passing in vals as the argument. This takes the average of vals while ignoring any NaN (not-a-number) values. Since we've pretty much made sure that vals is all numbers, this is a bit of overkill, but let's go with it.
plt.ylim(0, max(vals) + 5)
plt.axhline(y=np.nanmean(vals))

plt.xlabel("Players")
plt.ylabel("No. of " + stat)
plt.title("Liverpool FC Player Stats for " + season)
plt.show()


There it is. The default color is a deep blue, but you can simply pass in the color argument like we did for the bars, to customize this.
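Here is a small sketch of the difference nanmean() makes, followed by an axhline() call with a custom color and a dashed line style, as the paragraph above suggests. The second list, with a NaN standing in for a missing value, is made up for illustration, and the colour (0.2, 0.2, 1) is an arbitrary choice.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

vals = [27, 16, 26, 0]  # 2018 goals from the tutorial's data
print(np.nanmean(vals))  # 17.25

# With a gap (NaN) in the data, mean() propagates it while nanmean() skips it
with_gap = [27, 16, 26, float("nan")]
print(np.mean(with_gap), np.nanmean(with_gap))  # nan 23.0

plt.figure(figsize=(10, 5))
plt.bar(["Mohamed Salah", "Roberto Firminho", "Sadio Mane", "Alex Oxlade-Chamberlain"],
        vals, color=(1, 0.2, 0.2))
# color and linestyle are standard Line2D keyword arguments
avg_line = plt.axhline(y=np.nanmean(vals), color=(0.2, 0.2, 1), linestyle="--")
plt.close()
```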


One final thing. Player names can be a bit long, so to mitigate the problem, let's rotate the labels. We use the xticks() method and pass in 90 as the rotation argument, which is measured in degrees.
plt.ylim(0, max(vals) + 5)
plt.xticks(rotation=90)
plt.axhline(y=np.nanmean(vals))
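Rotated labels can still be clipped at the bottom of the figure. A sketch, assuming you want to fix that, adds plt.tight_layout(), which adjusts the margins to fit the tick labels; the code then reads the rotation back from each tick label (Agg backend assumed).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

labels = ["Mohamed Salah", "Roberto Firminho", "Sadio Mane", "Alex Oxlade-Chamberlain"]
vals = [27, 16, 26, 0]

plt.figure(figsize=(10, 5))
plt.bar(labels, vals, color=(1, 0.2, 0.2))
plt.xticks(rotation=90)  # rotation is given in degrees
plt.tight_layout()       # enlarge the margins so the rotated names are not cut off

# Read the rotation back from each x tick label
rotations = [label.get_rotation() for label in plt.gca().get_xticklabels()]
print(rotations)
plt.close()
```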


Well done!
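For reference, here is one way the snippets in this post fit together as a single script. It is assembled from the code above, with only the 2018 season kept in data to keep it short (paste in the other seasons as needed); the comments are mine.

```python
import numpy as np
import matplotlib.pyplot as plt


def barChart(labels, vals, season, stat):
    plt.figure(figsize=(10, 5))

    # Red bars, one per player
    plt.bar(labels, vals, color=(1, 0.2, 0.2))

    # Value labels just above each bar, nudged left to sit roughly centred
    for index, value in enumerate(vals):
        plt.text(index - (len(str(value)) * 0.02), value + 1, str(value))

    # Headroom for the labels, rotated player names, and an average line
    plt.ylim(0, max(vals) + 5)
    plt.xticks(rotation=90)
    plt.axhline(y=np.nanmean(vals))

    plt.xlabel("Players")
    plt.ylabel("No. of " + stat)
    plt.title("Liverpool FC Player Stats for " + season)
    plt.show()


data = {
    # Other seasons omitted here; see the full dictionary earlier in the post
    2018: {
        "Mohamed Salah": {"goals": 27, "appearances": 52},
        "Roberto Firminho": {"goals": 16, "appearances": 48},
        "Sadio Mane": {"goals": 26, "appearances": 50},
        "Alex Oxlade-Chamberlain": {"goals": 0, "appearances": 2},
    }
}

# Player names and their goal counts for the 2018 season
players = list(data[2018].keys())
values = list(data[2018].values())

stats = []
for v in values:
    stats.append(v["goals"])

barChart(players, stats, "2018", "goals")
```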


There are many more ways to customize this chart. Check out the matplotlib documentation.

Next

Querying the dataset and generating new charts.
