Correlation Plot

The corr_plot builder takes a dataframe (either a Pandas DataFrame or a plain Python dict) as input and builds a correlation plot.

It lets you combine ‘tile’, ‘point’ and ‘label’ layers in a matrix of ‘full’, ‘lower’ or ‘upper’ type.

A call to the terminal build() method creates the resulting ‘plot’ object. This ‘plot’ object can be further refined using the regular Lets-Plot (ggplot) API, such as + ggtitle(), + ggsize() and so on.

The Ames Housing dataset for this demo was downloaded from House Prices - Advanced Regression Techniques (train.csv), (c) Kaggle.

import numpy as np
import pandas as pd

from lets_plot import *
from lets_plot.bistro.corr import *

LetsPlot.setup_html()

mpg_df = pd.read_csv('https://raw.githubusercontent.com/JetBrains/lets-plot-docs/master/data/mpg.csv')\
    .drop(columns=['Unnamed: 0']).select_dtypes(include=np.number)
print(mpg_df.shape)
mpg_df.head()

(234, 5)

	displ	year	cyl	cty	hwy
0	1.8	1999	4	18	29
1	1.8	1999	4	21	29
2	2.0	2008	4	20	31
3	2.0	2008	4	21	30
4	2.8	1999	6	16	26
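To make concrete what each cell of a correlation plot encodes, here is a minimal sketch that computes pairwise coefficients for a plain Python dict of columns, using the first five rows shown above. It assumes the Pearson coefficient; the helper names `pearson` and `corr_matrix` are hypothetical and are not part of the Lets-Plot API.

```python
from math import sqrt

# First five rows of three mpg columns, copied from the table above.
data = {
    "displ": [1.8, 1.8, 2.0, 2.0, 2.8],
    "cty": [18, 21, 20, 21, 16],
    "hwy": [29, 29, 31, 30, 26],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def corr_matrix(columns):
    """Nested dict: corr[a][b] is the coefficient between columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


corr = corr_matrix(data)
for name, row in corr.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

The result is symmetric with ones on the diagonal, which is why a ‘lower’ or ‘upper’ matrix carries the same information as a ‘full’ one.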

Combining ‘tile’, ‘point’ and ‘label’ layers.

When combining layers, corr_plot chooses an acceptable plot configuration by default.

gggrid([
    corr_plot(mpg_df).tiles().build() + ggtitle("Tiles"),
    corr_plot(mpg_df).points().build() + ggtitle("Points"), 
    corr_plot(mpg_df).tiles().labels().build() + ggtitle("Tiles and labels"),
    corr_plot(mpg_df).points().labels().tiles().build() + ggtitle("Tiles, points and labels")
], ncol=2)

The default plot configuration adapts as the options change: compare the ‘Tiles and labels’ plots above and below.

You can also override the default plot configuration using the ‘type’ parameter: compare the ‘Tiles, points and labels’ plots above and below.

gggrid([
    corr_plot(mpg_df).tiles().labels(color="white").build() + ggtitle("Tiles and labels"),
    (corr_plot(mpg_df)
     .tiles(type="upper")
     .points(type="lower")
     .labels(type="full").build() + ggtitle("Tiles, points and labels"))
], ncol=2)
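One way to picture the ‘full’, ‘lower’ and ‘upper’ types is as a selection of cells from the square correlation matrix. The sketch below is only an illustration in plain Python: the function `matrix_cells` is hypothetical, and the rule that the diagonal is kept by default only for a ‘full’ matrix is an assumption about the library's defaults. How "lower" and "upper" map to screen positions also depends on axis ordering, which this sketch ignores.

```python
def matrix_cells(names, matrix_type="full", diag=None):
    """Return the (row, column) pairs a layer of the given type would cover.

    Assumption: when diag is None, the diagonal is kept only for 'full'.
    """
    if matrix_type not in ("full", "lower", "upper"):
        raise ValueError(f"unknown matrix type: {matrix_type!r}")
    if diag is None:
        diag = matrix_type == "full"

    selected = []
    for i, row in enumerate(names):
        for j, col in enumerate(names):
            if i == j:
                keep = diag
            elif matrix_type == "lower":
                keep = i > j  # below the main diagonal
            elif matrix_type == "upper":
                keep = i < j  # above the main diagonal
            else:
                keep = True
            if keep:
                selected.append((row, col))
    return selected


columns = ["displ", "cty", "hwy"]
for kind in ("full", "lower", "upper"):
    print(kind, matrix_cells(columns, kind))
```

Seen this way, putting tiles in the ‘upper’ cells and points in the ‘lower’ cells fills two disjoint triangles, while ‘full’ labels cover every cell.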

Customizing colors.

Instead of the default blue-grey-red gradient, you can define your own low-mid-high colors or choose one of the available ‘Brewer’ diverging palettes.

Let’s create a gradient resembling one of the Seaborn gradients.

bld = corr_plot(mpg_df).points().labels().tiles()

# Configure a gradient resembling one of the Seaborn gradients.
gradient = (bld
            .palette_gradient(low='#417555', mid='#EDEDED', high='#963CA7')
            .build()) + ggtitle("Custom gradient")

# Configure Brewer 'BrBG' palette.
brewer = (bld
          .palette_BrBG()
          .build()) + ggtitle("Brewer")

gggrid([
    gradient,
    brewer
], ncol=2)
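As a rough illustration of what a low-mid-high gradient does, the sketch below maps a correlation value in [-1, 1] to a color by piecewise linear interpolation in RGB, using the three colors from the custom gradient above. This is an assumption made for illustration only: the function names are hypothetical, and the interpolation Lets-Plot actually performs may use a different color space.

```python
def hex_to_rgb(color):
    """'#RRGGBB' -> (r, g, b) with integer channels in 0..255."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """(r, g, b) -> '#RRGGBB' (uppercase hex digits)."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


def lerp(a, b, t):
    """Channel-wise linear interpolation between colors a and b, t in [0, 1]."""
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))


def diverging_color(r, low, mid, high):
    """Map a correlation r in [-1, 1] to a color on a low-mid-high gradient."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    lo, mi, hi = (hex_to_rgb(c) for c in (low, mid, high))
    if r < 0:
        # r = -1 gives the low color, r -> 0 approaches the mid color.
        return rgb_to_hex(lerp(lo, mi, r + 1.0))
    # r = 0 gives the mid color, r = 1 gives the high color.
    return rgb_to_hex(lerp(mi, hi, r))


palette = dict(low="#417555", mid="#EDEDED", high="#963CA7")
for r in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(r, diverging_color(r, **palette))
```

The point of a diverging gradient is that the neutral mid color sits at zero correlation, so strong negative and strong positive values stand out in opposite hues.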