The Story Tells With Data. Part II.

Python offers many chart types and other visual displays of information, but a handful will cover most of your needs. One of them is the waterfall chart, which lends itself particularly well to showing the components of change.

A waterfall chart is a form of data visualization that helps in understanding the cumulative effect of sequentially introduced positive or negative values.

In this post, I will explore how to build a waterfall chart with Matplotlib (through the waterfallcharts package), and then look at an easier implementation with Plotly. At first glance waterfall charts may seem complicated, but they are just a variation of bar charts, the most common type of chart that we all use.

A waterfall chart visually illustrates how a starting value becomes a final value through a series of intermediate additions and subtractions. For example, the chart can start from the opening monthly balance of a checking account. Positive values show deposits, transfers in, etc. Negative values show checks written, drafts from the account, ATM withdrawals, etc. The final bar represents the balance in the account at the end of the month.
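
The checking-account example comes down to a running sum. Here is a minimal sketch of that arithmetic; the opening balance and the transaction amounts are hypothetical, chosen only for illustration.

```python
from itertools import accumulate

# Hypothetical checking-account figures, for illustration only.
opening_balance = 1000
transactions = [
    ("Deposit", 500),
    ("Transfer in", 200),
    ("Check written", -300),
    ("ATM cash out", -150),
]

# Running balance: the opening balance, then the balance after each transaction.
amounts = [amount for _, amount in transactions]
running = list(accumulate(amounts, initial=opening_balance))

closing_balance = running[-1]
for (label, amount), balance in zip(transactions, running[1:]):
    print(f"{label:<14} {amount:>+6} -> balance {balance}")
print("Closing balance:", closing_balance)
```

Each intermediate balance is where one bar of the waterfall ends and the next one begins.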

How to read this chart:

  • Illustrates the cumulative effect of sequential or categorical positive and negative values applied to the starting value.
  • Totals and major subtotals are represented by full columns, while sub-components of the incoming and outgoing streams are represented by color-coded floating blocks.
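
To make the "floating blocks" idea concrete, here is a rough sketch of how the position of each block could be computed. The function name `floating_bars` and the sample numbers are my own assumptions for illustration, not part of any library.

```python
def floating_bars(start, changes):
    """Return one (bottom, height, is_increase) tuple per change, plus the final level.

    Each block starts where the previous one ended, which is what makes it "float".
    """
    bars = []
    level = start
    for change in changes:
        new_level = level + change
        bottom = min(level, new_level)  # lower edge of the block
        bars.append((bottom, abs(change), change >= 0))
        level = new_level
    return bars, level


# Hypothetical values: start at 100, then +40, -30, +20, -60.
bars, final_level = floating_bars(100, [40, -30, 20, -60])
for bottom, height, is_increase in bars:
    print(bottom, height, "up" if is_increase else "down")
print("Final level:", final_level)
```

The starting value and the final level would be drawn as full columns from zero, and each tuple could be passed to Matplotlib as `plt.bar(x, height, bottom=bottom)` with a color picked from `is_increase`.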

Also, the additions and subtractions can be either time-based or category-based. A time-based waterfall can represent any period, such as the values coming in and going out over a month, a year, or a decade. A category-based waterfall can represent the various sources of revenue and the various sources of expenses for a given period.

Example of the category-based waterfall chart.

Let’s build a simple waterfall chart in Python. I used Jupyter Notebook to run my code, Pandas for data wrangling, NumPy for simple math operations, and Matplotlib for the visualizations.

I use the waterfallcharts library for Python by Christopher Paul Csiszar.

# install library
!pip install waterfallcharts

Then import all libraries:

import pandas as pd
import numpy as np
import waterfall_chart
import matplotlib.pyplot as plt

I have a small data frame with the results for the year:

Here we can see 5 sources of income and 5 sources of expenses.
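
The table itself is not reproduced in the text, so here is a stand-in with the three columns that the code relies on (`label`, `total`, `type`). The rows, the amounts, and the `'Expense'` type value are invented placeholders, not the real figures behind the charts in this post, and there are fewer rows than in the original data.

```python
import pandas as pd

# Placeholder rows: names and amounts are invented for illustration
# and are NOT the figures behind the charts in this post.
df = pd.DataFrame(
    {
        "label": ["Grants", "Donations", "Salaries", "Programs"],
        "total": [50000, 20000, 45000, 30000],
        # 'Income' is the value the code checks for; 'Expense' is an assumed name.
        "type": ["Income", "Income", "Expense", "Expense"],
    }
)
print(df)
```

All amounts are stored as positive numbers; the `type` column says whether a row adds to or subtracts from the balance.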

For the waterfall chart, we need two variables: the first for labels, the second for numeric values.

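# labels as a plain Python list, one entry per bar
x_list = df['label'].tolist()
# amounts: income stays positive, everything else becomes negative
y_list = [int(total) if kind == 'Income' else -int(total) for total, kind in zip(df['total'], df['type'])]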
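
The same sign flip can also be written without a Python loop. This is just an alternative sketch using pandas' `Series.where`; the small `sample` frame is a made-up placeholder.

```python
import pandas as pd

# Made-up placeholder rows, for illustration only.
sample = pd.DataFrame(
    {
        "label": ["Grants", "Salaries"],
        "total": [50000, 45000],
        "type": ["Income", "Expense"],
    }
)

# Keep the amount where the row is income, otherwise use the negated amount.
signed = sample["total"].where(sample["type"] == "Income", -sample["total"])
signed_list = signed.astype(int).tolist()
print(signed_list)
```

Either version gives a list of signed integers, so which one to use is mostly a matter of taste for a table this small.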

Now we are ready to plot!

# set the figure size first so that it applies to the figure created below
plt.rcParams.update({'figure.figsize': (13, 8)})
# x_list - variable for labels
# y_list - variable for numbers
# net_label - the plot calculates the total value, and we can name that bar
# sorted_value - sort the bars by size, from largest to smallest
waterfall_chart.plot(x_list, y_list, net_label='End of the year balance', sorted_value=True)
plt.title("Year Balance")

The waterfall chart below shows the cash flow from 5 sources of income and 5 sources of expenses, along with the net end of year balance.

We can see that there is a deficit of $17,000 for the year. Most of the income came from Grants, while expenses were spread between Programs, Salaries, and Other expenses.

Considerations for waterfall charts:

  • it is useful to label the bars directly in a waterfall chart
  • it is important to differentiate the positive and negative values
  • it is a good idea to sort the values: smallest to largest or largest to smallest
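
For the sorting point, the labels and values have to be reordered together. Here is a small sketch, with hypothetical labels and amounts, that sorts by absolute size from largest to smallest.

```python
# Hypothetical labels and signed amounts, for illustration only.
labels = ["Salaries", "Grants", "Programs", "Donations"]
values = [-45000, 50000, -30000, 20000]

# Sort the pairs by absolute size, largest first, keeping labels and values aligned.
pairs = sorted(zip(labels, values), key=lambda pair: abs(pair[1]), reverse=True)
sorted_labels = [label for label, _ in pairs]
sorted_values = [value for _, value in pairs]

print(sorted_labels)
print(sorted_values)
```

Dropping `reverse=True` gives the smallest-to-largest order instead.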

Another way to create a waterfall chart is the Plotly library. Here the X-axis holds the bar labels, and the Y-axis shows the changes together with the final value. To make the graph look like the previous one, we calculate the total with the sum() function and add it to the end of our list.

import plotly.graph_objects as go

# total of all changes, added as the last bar
total = round(sum(y_list))
labels = df['label'].tolist() + ["Total"]
values = y_list + [total]
# every bar is a relative change except the last one, which is drawn as a total
measure = ["relative"] * len(y_list) + ["total"]

fig = go.Figure(go.Waterfall(
    name = "20", orientation = "v",
    measure = measure,
    x = labels,
    textposition = "outside",
    text = values,
    y = values,
    decreasing = {"marker":{"color":"Maroon", "line":{"color":"red", "width":2}}},
    increasing = {"marker":{"color":"Teal"}},
    totals = {"marker":{"color":"deepskyblue", "line":{"color":"blue", "width":3}}}
))
fig.update_layout(
    title = "Year balance",
    showlegend = True
)
fig.show()

Waterfall chart by Plotly.

The waterfall chart from Plotly is interactive: when we hover over a bar, we can see its value and some additional information about it.
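
Plotly's `go.Waterfall` also takes a `measure` list that says, for each bar, whether it is a `"relative"` change or a `"total"`. That makes it possible to show a subtotal as a full column in the middle of the chart. Below is a rough sketch of preparing such lists; the helper name `with_subtotal`, the labels "Total income" and "Net", and the amounts are all my own assumptions for illustration.

```python
def with_subtotal(income, expenses, subtotal_label="Total income", total_label="Net"):
    """Build x, y and measure lists for a waterfall with one subtotal.

    income and expenses are lists of (label, signed_amount) pairs.
    """
    x, y, measure = [], [], []
    for label, amount in income:
        x.append(label)
        y.append(amount)
        measure.append("relative")
    # A "total" bar shows the running sum so far; its y value is only a placeholder.
    x.append(subtotal_label)
    y.append(0)
    measure.append("total")
    for label, amount in expenses:
        x.append(label)
        y.append(amount)
        measure.append("relative")
    x.append(total_label)
    y.append(0)
    measure.append("total")
    return x, y, measure


# Hypothetical amounts, for illustration only.
x, y, measure = with_subtotal(
    income=[("Grants", 50000), ("Donations", 20000)],
    expenses=[("Salaries", -45000), ("Programs", -30000)],
)
print(list(zip(x, y, measure)))
```

The three lists can then be passed as `x`, `y` and `measure` to `go.Waterfall`.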

Here I tried to explain how to build a beautiful waterfall chart with bars that resemble stair steps. It is especially useful when we want to compare planned targets with actual values.
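
For a plan-versus-actual comparison, one possible setup is to start from the planned total, use the difference per category as the floating bars, and end at the actual total. Here is a minimal sketch of that arithmetic with made-up numbers.

```python
# Made-up planned and actual figures, for illustration only.
planned = {"Grants": 60000, "Donations": 15000, "Salaries": -45000}
actual = {"Grants": 50000, "Donations": 20000, "Salaries": -48000}

planned_total = sum(planned.values())

# One bar per category: how far the actual value is from the plan.
variances = {name: actual[name] - planned[name] for name in planned}

# Starting from the plan and applying every variance lands on the actual total.
actual_total = planned_total + sum(variances.values())

print("Planned total:", planned_total)
print("Variances:", variances)
print("Actual total:", actual_total)
```

The variance values play the role of `y_list` in the charts above, with the planned total as the starting column.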

Data Scientist | Python Developer | Mom