Preserving Data–Statistic Bijection in Lets-Plot
Some statistical geometries in Lets-Plot (such as geom_sina()) generate their own statistical data, while still keeping a one-to-one correspondence with the original input data points.
Previously, this correspondence was lost when aesthetics were mapped: if you mapped an aesthetic (e.g., color) to a column from the original dataset, all points could receive an aggregated value instead of their own.
Now, Lets-Plot preserves the bijection between data and statistics for such geometries. This means you can safely map aesthetics to variables from the original dataset, and they will be correctly aligned with the statistical output.
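The idea of a one-to-one stat can be sketched in plain Python. This is only an illustration of the concept, not Lets-Plot's actual implementation: the function name `qq_stat_with_index`, the toy rows, and the plotting position `(rank + 0.5) / n` are all assumptions made for the example. The point is that the stat remembers which input row produced each output point, so other columns can be joined back afterwards.

```python
from statistics import NormalDist


def qq_stat_with_index(sample):
    """Hypothetical Q-Q stat: one (row_index, theoretical, sample) triple per input row."""
    n = len(sample)
    # Sort row indices by sample value; the permutation records
    # which original row each sorted point came from.
    order = sorted(range(n), key=lambda i: sample[i])
    dist = NormalDist()
    result = []
    for rank, row in enumerate(order):
        p = (rank + 0.5) / n  # assumed plotting position, strictly inside (0, 1)
        result.append((row, dist.inv_cdf(p), sample[row]))
    return result


# Toy data (hypothetical values, not taken from mpg.csv).
rows = [
    {"model": "car A", "hwy": 29, "displ": 1.8},
    {"model": "car B", "hwy": 17, "displ": 5.3},
    {"model": "car C", "hwy": 26, "displ": 2.0},
    {"model": "car D", "hwy": 20, "displ": 4.0},
]

stat = qq_stat_with_index([r["hwy"] for r in rows])

# Because every stat point carries its row index, columns from the
# original data can be attached to it without any aggregation.
points = [
    {"theoretical": t, "sample": s,
     "model": rows[i]["model"], "displ": rows[i]["displ"]}
    for i, t, s in stat
]

for point in points:
    print(point)
```

Each printed point keeps the `model` and `displ` of the row it was computed from, which is what makes mappings such as `color="displ"` meaningful for these stats.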
import pandas as pd
from lets_plot import *
LetsPlot.setup_html()
df = pd.read_csv("https://raw.githubusercontent.com/JetBrains/lets-plot-docs/refs/heads/master/data/mpg.csv")
print(df.shape)
df.head()
Map Columns to the Aesthetics
Sina Stat
ggplot(df, aes("drv", "hwy")) + \
    geom_violin() + \
    geom_sina(aes(color="displ", size="cyl"), seed=42) + \
    scale_size(range=[2, 4])
Q-Q Stat
ggplot(df) + \
    geom_qq(aes(sample="hwy", color="displ", size="cyl")) + \
    scale_size(range=[3, 6])
Show Column Values in Tooltips
For the above-mentioned statistics, the tooltips can display not only the mapped values, but also any columns from the original dataframe.
ggplot(df, aes(sample="hwy")) + \
    geom_qq_line(color='teal') + \
    geom_qq(size=3, shape=21, color="black", fill="gold", alpha=.5,
            tooltips=layer_tooltips().title("@manufacturer @model")
                # Values computed by the stat are referenced as @..name..
                .line("theoretical|@..theoretical..")
                .line("highway mileage (sample)|@..sample..")
                # Columns of the original dataframe are referenced as @name
                .line("city mileage|@cty")
                .line("engine displacement in liters|@displ")
                .line("year of manufacturing|@year")
                .line("number of cylinders|@cyl")
                .line("type of transmission|@trans")
                .line("drive type|@drv")
                .line("fuel type|@fl")
                .line("vehicle class|@class")
                .format("year", "d")
                .min_width(300)
                .anchor("bottom_right")) + \
    ggsize(1000, 600)