
Waterfall Plot

A waterfall plot shows the cumulative effect of sequentially introduced positive or negative values.

To use waterfall_plot(), import the lets_plot.bistro module.

import pandas as pd

from lets_plot import *
from lets_plot.bistro import *
LetsPlot.setup_html()
data = {
    "Accounts": ["Product revenue", "Services revenue", "Fixed costs", "Variable costs"],
    "Values": [830_000, 290_000, -360_000, -150_000],
}
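To make the "cumulative effect" concrete, the bar positions for this dataset can be worked out by hand. The following is a minimal sketch in plain Python, independent of Lets-Plot. The helper name waterfall_steps is hypothetical, and the sketch assumes that each bar starts where the previous one ended and that the total equals the last running value.

```python
from itertools import accumulate

accounts = ["Product revenue", "Services revenue", "Fixed costs", "Variable costs"]
values = [830_000, 290_000, -360_000, -150_000]


def waterfall_steps(values, start=0):
    """Return (bar_begin, bar_end) pairs; each bar begins where the previous one ended."""
    # Running totals, with the starting level as the first element.
    levels = list(accumulate(values, initial=start))
    return list(zip(levels[:-1], levels[1:]))


steps = waterfall_steps(values)
total = steps[-1][1]  # the end of the last bar is the overall total

for name, (begin, end) in zip(accounts, steps):
    print(f"{name:>16}: {begin:>9,} -> {end:>9,}")
print(f"{'Total':>16}: {total:>9,}")
```

Positive values move the running level up and negative values move it down, which is what the plots below visualize.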

Default View

waterfall_plot(data, "Accounts", "Values")

Improved View

(waterfall_plot(data, "Accounts", "Values",
                size=.75, width=.8, total_title="Profit",
                hline=element_line(linetype='solid', size=1),
                connector=element_line(linetype='dotted'),
                label=element_text(size=20, family="Courier", face='bold'),
                label_format="$~s")
    + scale_y_continuous(name="Values", format="$~s")
    + ggtitle("Company Profit (in USD)")
    + ggsize(1000, 500)
    + theme_minimal()
    + theme(plot_title=element_text(size=20, face='bold', hjust=.5)))

Additional Parameters

measure and group

df = pd.DataFrame({
    "Company": ["Badgersoft"] * 7 + ["AIlien Co."] * 7,
    "Accounts": ["initial", "revenue", "costs", "Q1", "revenue", "costs", "Q2"] * 2,
    "Values": [200, 200, -100, None, 250, -100, None,
               150, 50, -100, None, 100, -100, None],
    "Measure": ['absolute', 'relative', 'relative', 'total', 'relative', 'relative', 'total'] * 2,
})
company_df = df[df["Company"] == "Badgersoft"]

waterfall_plot(df, "Accounts", "Values", measure="Measure", group="Company") + \
    facet_grid(x="Company", scales='free_x')
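The "Measure" column tells the plot how to interpret each row. As a rough sketch of the semantics, the Badgersoft rows can be resolved in plain Python. This assumes the usual waterfall convention: 'absolute' sets the running value, 'relative' adds to it, and 'total' displays it without changing it. The function resolve_measures is hypothetical and not part of Lets-Plot.

```python
def resolve_measures(values, measures):
    """Return the running level reached after each row, given its measure type."""
    running = 0
    levels = []
    for value, measure in zip(values, measures):
        if measure == "absolute":
            running = value       # start (or restart) from this value
        elif measure == "relative":
            running += value      # add the change to the running level
        elif measure == "total":
            pass                  # show the current level; the value (None here) is ignored
        else:
            raise ValueError(f"unknown measure: {measure!r}")
        levels.append(running)
    return levels


accounts = ["initial", "revenue", "costs", "Q1", "revenue", "costs", "Q2"]
values = [200, 200, -100, None, 250, -100, None]
measures = ["absolute", "relative", "relative", "total", "relative", "relative", "total"]

levels = resolve_measures(values, measures)
for name, level in zip(accounts, levels):
    print(f"{name:>8}: {level}")
```

Under these assumptions the "Q1" and "Q2" rows need no value of their own, which is why they are None in the data frame.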

calc_total

calc_total=False disables the calculation of the total.

If the measure column is specified, however, the calc_total setting has no effect.

gggrid([
    waterfall_plot(data, "Accounts", "Values", calc_total=False),
    waterfall_plot(company_df, "Accounts", "Values", measure="Measure", calc_total=False),
])

Labels

Several parameters control the text labels on the bars:

  • relative_labels: content and formatting of the annotation labels on relative change bars (takes the result of a layer_labels() call);

  • absolute_labels: content and formatting of the annotation labels on absolute value bars (takes the result of a layer_labels() call);

  • label: style settings for all text labels (takes the result of an element_text() call).

waterfall_plot(data, "Accounts", "Values",
               relative_labels=layer_labels().line("@{..flow_type..}d:\n@..label.."),
               absolute_labels=layer_labels().line("Result:\n@..label.."),
               label=element_text(face="bold_italic"))

Hiding Labels

gggrid([
    waterfall_plot(data, "Accounts", "Values", relative_labels='none') + ggtitle("Hide relative labels only"),
    waterfall_plot(data, "Accounts", "Values", absolute_labels='none') + ggtitle("Hide absolute labels only"),
    waterfall_plot(data, "Accounts", "Values", label='blank') + ggtitle("Hide all labels"),
])

Tooltips

Tooltips for relative and absolute measures are specified independently, through the relative_tooltips and absolute_tooltips parameters.

relative_tooltips = layer_tooltips().title("Account: @..xlabel..")\
                                    .format("@..initial..", " $,.3~s")\
                                    .format("@..value..", " $,.3~s")\
                                    .line("@{..flow_type..}d from @..initial.. to @..value..")\
                                    .disable_splitting()
absolute_tooltips = 'none'

gggrid([
    waterfall_plot(data, "Accounts", "Values",
                   relative_tooltips='detailed', absolute_tooltips='detailed') + \
        ggtitle("'detailed' tooltips"),
    waterfall_plot(data, "Accounts", "Values",
                   relative_tooltips=relative_tooltips, absolute_tooltips=absolute_tooltips) + \
        ggtitle("Custom tooltips"),
])

sorted_value

waterfall_plot(data, "Accounts", "Values", sorted_value=True)
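With sorted_value=True the categories are reordered by the size of their values rather than kept in data order. A minimal sketch of such an ordering in plain Python follows. It assumes a descending sort by absolute value, which is an assumption worth checking against the API reference rather than a statement of the library's exact behavior.

```python
data = {
    "Accounts": ["Product revenue", "Services revenue", "Fixed costs", "Variable costs"],
    "Values": [830_000, 290_000, -360_000, -150_000],
}

# Assumed rule: largest change first, regardless of sign.
ordered = sorted(
    zip(data["Accounts"], data["Values"]),
    key=lambda pair: abs(pair[1]),
    reverse=True,
)

for name, value in ordered:
    print(f"{name:>16}: {value:>9,}")
```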

threshold/max_values

gggrid([
    waterfall_plot(data, "Accounts", "Values") + ggtitle("Default"),
    waterfall_plot(data, "Accounts", "Values", threshold=300_000) + ggtitle("Specified threshold"),
    waterfall_plot(data, "Accounts", "Values", max_values=2) + ggtitle("Specified max_values"),
])
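Both parameters reduce the number of bars by merging small categories. The sketch below illustrates one plausible reading in plain Python: threshold merges every category whose absolute value is below the threshold, and max_values keeps only the given number of largest categories (by absolute value). The helper group_small, the "Other" label and its position at the end are assumptions made for illustration, not the library's implementation.

```python
def group_small(accounts, values, threshold=None, max_values=None):
    """Merge small categories into a single ("Other", sum) entry (illustrative only)."""
    pairs = list(zip(accounts, values))
    if threshold is not None:
        keep = {name for name, value in pairs if abs(value) >= threshold}
    elif max_values is not None:
        largest = sorted(pairs, key=lambda p: abs(p[1]), reverse=True)[:max_values]
        keep = {name for name, _ in largest}
    else:
        return pairs  # nothing to merge

    kept = [(name, value) for name, value in pairs if name in keep]
    merged = [value for name, value in pairs if name not in keep]
    # Append the merged entry only if something was actually merged.
    return kept + ([("Other", sum(merged))] if merged else [])


accounts = ["Product revenue", "Services revenue", "Fixed costs", "Variable costs"]
values = [830_000, 290_000, -360_000, -150_000]

by_threshold = group_small(accounts, values, threshold=300_000)
by_max_values = group_small(accounts, values, max_values=2)
print(by_threshold)
print(by_max_values)
```

For this dataset the two settings happen to merge the same pair of categories ("Services revenue" and "Variable costs"), so the overall total is unchanged in both cases.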

base

waterfall_plot(data, "Accounts", "Values", base=400_000)

Combining waterfall_plot() with Other Geometry Layers

Waterfall plots can be enhanced by adding background and foreground layers. Foreground layers can be added using the regular + operator. Background layers can be added using the background_layers parameter.

Limitations:

  • layers must provide their own data;

  • data coordinates must be numeric.

# background layer and its data
quarter_data = {
    "period_start": [0.5, 3.5],
    "period_end": [3.5, 6.5],
    "ai_introduced": [False, True],
}
quarter_layer = geom_band(
    aes(
        xmin="period_start",
        xmax="period_end",
        paint_a="ai_introduced"
    ),
    data=quarter_data,
    alpha=0.2,
    # we use "paint_a" to color the bands based on a separate category (e.g., quarters),
    # so they have their own color palette independent from the waterfalls
    fill_by="paint_a", color_by="paint_a"
)

# foreground layers and their data
quarter_label_data = {
    "name": ["Q1", "Q2"],
    "x": [2, 5],
    "y": [600, 600],
}
quarter_ai_status_data = {
    "text": ["Before AI\nintroduction", "After AI\nintroduction"],
    "x": [1.5, 4.5],
    "y": [100, 100],
}
text_layers = geom_text(aes(x="x", y="y", label="name"), data=quarter_label_data, size=8) + \
    geom_text(aes(x="x", y="y", label="text"), data=quarter_ai_status_data, size=12)

# whole plot
(waterfall_plot(company_df, "Accounts", "Values", measure="Measure",
                background_layers=quarter_layer)  # background layer
  + text_layers                                   # foreground layers
  + scale_hue("paint_a", guide="none")            # color for the background layer (bands)
  + ggtitle("Waterfall with additional layers"))
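The band limits in quarter_data are consistent with the bars sitting at consecutive integer x positions starting from 0, each occupying half a unit on either side of its centre. Under that assumption (inferred from the example above, not a documented guarantee), the limits can be derived from bar indices instead of being typed by hand. The helper band_limits is hypothetical.

```python
def band_limits(first_index, last_index):
    """x-range covering the bars first_index..last_index (inclusive, zero-based)."""
    return first_index - 0.5, last_index + 0.5


accounts = ["initial", "revenue", "costs", "Q1", "revenue", "costs", "Q2"]

# Zero-based index of the first and last bar belonging to each quarter.
quarters = {"Q1": (1, 3), "Q2": (4, 6)}
limits = [band_limits(first, last) for first, last in quarters.values()]

quarter_data = {
    "period_start": [start for start, _ in limits],
    "period_end": [end for _, end in limits],
    "ai_introduced": [False, True],
}
print(quarter_data)
```

This reproduces the numeric coordinates used for the background layer, which satisfies the requirement that layer data coordinates be numeric.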

Customize Colors

Let’s look at the names of the flow types using the show_legend parameter:

wp = waterfall_plot(company_df, "Accounts", "Values", measure="Measure", show_legend=True)
wp

Use these names to customize the colors:

wp + scale_fill_manual(values={
        "Increase": "#66c2a5",
        "Decrease": "#fc8d62",
        "Absolute": "#e78ac3",
        "Total": "#8da0cb",
    })

If desired, you can also change the names of the flow types in the legend:

wp + scale_fill_manual(values={
        "Increase": "#66c2a5",
        "Decrease": "#fc8d62",
        "Absolute": "#e78ac3",
        "Total": "#8da0cb",
    }, labels=["inc", "dec", "abs", "total"])

You can use a constant fill color for the boxes and the 'flow_type' color for their borders:

waterfall_plot(company_df, "Accounts", "Values", measure="Measure", size=.75,
               fill="gray90", color="flow_type")

To paint the text labels, combine color="flow_type" and label=element_text(color='inherit'):

waterfall_plot(company_df, "Accounts", "Values", measure="Measure",
               fill="gray90",
               color="flow_type",                   #  Map the border color to the flow type
               label=element_text(color='inherit')) #  Make the text labels inherit the box border color

The same can be done, for example, only for the relative text labels:

waterfall_plot(company_df, "Accounts", "Values", measure="Measure",
               fill="gray90",
               color="flow_type",                                                 #  Map color to flow type
               label=element_text(color="indigo"),                                #  Choose some default color for absolute labels
               relative_labels=layer_labels().line("@..label..").inherit_color()) #  Inherit color for the relative text labels