geom_pointdensity()¶
geom_pointdensity() is like geom_point(), but smarter in crowded spots. It plots each data point and colors it by how many other points are packed around it, so dense clusters land at one end of the color scale and sparse areas at the other. Instead of a plain scatterplot, you get a built-in heatmap of local point density, without needing a separate 2D density layer.
import pandas as pd
from lets_plot import *
LetsPlot.setup_html()
Prepare Data¶
df = pd.read_csv("https://raw.githubusercontent.com/JetBrains/lets-plot-docs/refs/heads/master/data/diamonds.csv")
# Encode diamond colour as an ordinal score: J (worst) = 1 ... D (best) = 7
df["color"] = df["color"].map({"D": 7, "E": 6, "F": 5, "G": 4, "H": 3, "I": 2, "J": 1})
# Label each row by whether its cut is "Ideal"; used later for faceting
df = df.assign(is_ideal=(df["cut"] == "Ideal").map({True: "Quality: ideal", False: "Quality: not ideal"}))
print(df.shape)
df.head()
fair_cut_df = df[df["cut"] == "Fair"].drop(columns=["cut", "is_ideal"]).reset_index(drop=True)
print(fair_cut_df.shape)
fair_cut_df.head()
Default View¶
p = ggplot(fair_cut_df, aes("carat", "price"))
p + geom_pointdensity()
Parameters¶
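adjust¶
The adjust parameter scales the bandwidth used for the density estimate: values below 1 make the coloring more local and detailed, while values above 1 smooth it over a wider area.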
gggrid([
p + geom_pointdensity() + ggtitle("adjust=1 (default)"),
p + geom_pointdensity(adjust=.1) + ggtitle("adjust=.1"),
p + geom_pointdensity(adjust=10) + ggtitle("adjust=10"),
])
method¶
The method parameter tells geom_pointdensity() how to estimate "how crowded is it here?" around each point.
Here are the options:
'neighbours' - for every point, it counts how many other points fall within some radius. Use when: you have a few thousand points (or fewer) and you want a very local, discrete crowding measure that treats each point individually.
'kde2d' - builds a smooth 2D density surface (kernel density estimate) and then looks up that smooth density at each point. Use when: you have a ton of points (tens of thousands or more), or you want something smoother and less noisy than direct neighbour counts.
'auto' (default) - it chooses for you. For smaller datasets it behaves like 'neighbours'; for larger datasets it switches to 'kde2d', because that scales better. Use when: you're not sure about performance trade-offs and just want a sensible default.
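To make the difference between the two estimators concrete, here is a minimal pure-Python sketch of both ideas. It is an illustration only, not the implementation used by geom_pointdensity(): the function names, the `radius` and `bandwidth` arguments, the Gaussian kernel, and the brute-force O(n²) loops are all assumptions chosen for readability.

```python
import math

def neighbour_counts(points, radius):
    """For each point, count the OTHER points lying within `radius` of it."""
    counts = []
    for i, (xi, yi) in enumerate(points):
        n = sum(
            1
            for j, (xj, yj) in enumerate(points)
            if j != i and math.hypot(xi - xj, yi - yj) <= radius
        )
        counts.append(n)
    return counts

def kde_at_points(points, bandwidth):
    """Evaluate a 2D Gaussian kernel density estimate at each data point."""
    # Normalisation: one isotropic Gaussian per point, averaged over all points.
    norm = 2 * math.pi * bandwidth ** 2 * len(points)
    densities = []
    for xi, yi in points:
        total = sum(
            math.exp(-((xi - xj) ** 2 + (yi - yj) ** 2) / (2 * bandwidth ** 2))
            for xj, yj in points
        )
        densities.append(total / norm)
    return densities

# Three points close together and one far away (made-up data).
points = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (5.0, 5.0)]
counts = neighbour_counts(points, radius=0.5)
densities = kde_at_points(points, bandwidth=0.5)
print(counts)
print(densities)
```

The neighbour counts are integers that jump as points enter or leave the radius, while the kernel estimate varies smoothly with distance. That is the practical meaning of "discrete" versus "smoother" in the descriptions above.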
gggrid([
p + geom_pointdensity(aes(color='..count..')) + ggtitle("method='auto' (default)"),
p + geom_pointdensity(aes(color='..count..'), method='neighbours') + ggtitle("method='neighbours'"),
p + geom_pointdensity(aes(color='..count..'), method='kde2d') + ggtitle("method='kde2d'"),
])
Sometimes there are additional reasons to specify the method explicitly. Consider a faceted plot that relies on the default:
ggplot(df, aes("carat", "price")) + \
geom_pointdensity() + \
facet_grid(x="is_ideal")
Although both facets show similar distributions and a comparable number of points, the two panels look noticeably different. This is because the choice of method is made independently for each data group, so a different method was applied in each facet.
This can easily be corrected by specifying the method explicitly:
ggplot(df, aes("carat", "price")) + \
geom_pointdensity(method='kde2d') + \
facet_grid(x="is_ideal")
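The per-group behaviour of 'auto' can be pictured with a toy sketch. Everything here is hypothetical: the `resolve_method` helper, the threshold value, and the group sizes are invented for illustration, and the actual cutoff used by the library is not stated in this document.

```python
# Illustrative only: NOT the library's actual cutoff.
HYPOTHETICAL_THRESHOLD = 25_000

def resolve_method(n_points, method="auto", threshold=HYPOTHETICAL_THRESHOLD):
    """Return the method that would be applied to a group of `n_points` points."""
    if method != "auto":
        # An explicit method is used as-is, regardless of group size.
        return method
    return "neighbours" if n_points < threshold else "kde2d"

# Made-up group sizes that happen to fall on either side of the threshold.
group_sizes = {"group A": 20_000, "group B": 30_000}

auto_choice = {name: resolve_method(n) for name, n in group_sizes.items()}
explicit_choice = {name: resolve_method(n, method="kde2d") for name, n in group_sizes.items()}

print(auto_choice)
print(explicit_choice)
```

With 'auto', two groups of similar size can end up on opposite sides of the cutoff and be colored by different estimators. Passing an explicit method removes that dependence on group size.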
Improved Appearance¶
p + \
geom_pointdensity(aes(alpha="color", color='..count..'),
tooltips=layer_tooltips().line("neighbours count|@..count..")
.line("diamond colour\nfrom 1 (worst) to 7 (best)|@color")
.line("clarity|@clarity")) + \
scale_color_viridis(name="neighbours count") + \
scale_alpha(range=[.1, .9], guide='none') + \
ggtb() + \
ggsize(1000, 600) + \
theme_classic()