geom_pointdensity()

geom_pointdensity() works like geom_point(), but it also colors each point by how many other points lie close to it. Points in dense clusters land at one end of the color scale and points in sparse areas at the other. The scatterplot therefore doubles as a map of local point density, with no separate 2D density layer needed.

In [1]:
import pandas as pd

from lets_plot import *
In [2]:
LetsPlot.setup_html()

Prepare Data

In [3]:
df = pd.read_csv("https://raw.githubusercontent.com/JetBrains/lets-plot-docs/refs/heads/master/data/diamonds.csv")
df["color"] = df["color"].map({"D": 7, "E": 6, "F": 5, "G": 4, "H": 3, "I": 2, "J": 1})
df = df.assign(is_ideal=(df["cut"] == "Ideal").map({True: "Quality: ideal", False: "Quality: not ideal"}))
print(df.shape)
df.head()
(53940, 11)
Out[3]:
   carat      cut  color  clarity  depth  table  price     x     y     z            is_ideal
0   0.23    Ideal      6      SI2   61.5   55.0    326  3.95  3.98  2.43      Quality: ideal
1   0.21  Premium      6      SI1   59.8   61.0    326  3.89  3.84  2.31  Quality: not ideal
2   0.23     Good      6      VS1   56.9   65.0    327  4.05  4.07  2.31  Quality: not ideal
3   0.29  Premium      2      VS2   62.4   58.0    334  4.20  4.23  2.63  Quality: not ideal
4   0.31     Good      1      SI2   63.3   58.0    335  4.34  4.35  2.75  Quality: not ideal
In [4]:
fair_cut_df = df[df["cut"] == "Fair"].drop(columns=["cut", "is_ideal"]).reset_index(drop=True)
print(fair_cut_df.shape)
fair_cut_df.head()
(1610, 9)
Out[4]:
   carat  color  clarity  depth  table  price     x     y     z
0   0.22      6      VS2   65.1   61.0    337  3.87  3.78  2.49
1   0.86      6      SI2   55.1   69.0   2757  6.45  6.33  3.52
2   0.96      5      SI2   66.3   62.0   2759  6.27  5.95  4.07
3   0.70      5      VS2   64.5   57.0   2762  5.57  5.53  3.58
4   0.70      5      VS2   65.3   55.0   2762  5.63  5.58  3.66

Default View

In [5]:
p = ggplot(fair_cut_df, aes("carat", "price"))
In [6]:
p + geom_pointdensity()
Out[6]:

Parameters

adjust

In [7]:
gggrid([
    p + geom_pointdensity() + ggtitle("adjust=1 (default)"),
    p + geom_pointdensity(adjust=.1) + ggtitle("adjust=.1"),
    p + geom_pointdensity(adjust=10) + ggtitle("adjust=10"),
])
Out[7]:

method

The method parameter tells geom_pointdensity() how to estimate how crowded the area around each point is.

Here are the options:

  • 'neighbours' - for every point, it counts how many other points fall within some radius.

    Use when: you have a few thousand points (or fewer) and you want a very local, discrete crowding measure that treats each point individually.

  • 'kde2d' - builds a smooth 2D density surface (kernel density estimate) and then looks up that smooth density at each point.

    Use when: you have a large number of points (tens of thousands or more), or you want something smoother and less noisy than direct neighbour counts.

  • 'auto' (default) - it chooses for you. For smaller datasets it behaves like 'neighbours'; for larger datasets it switches to 'kde2d', because that scales better.

    Use when: you’re not sure about performance trade-offs and just want a sensible default.
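
To make the difference between the two estimators concrete, here is a minimal pure-Python sketch of both ideas. It is not the lets-plot implementation: the function names, the radius and the bandwidth are assumptions chosen for illustration, and the library may pick its radius, kernel and bandwidth differently.

```python
import math


def neighbour_counts(points, radius):
    """For each point, count the OTHER points within `radius` (brute force, O(n^2))."""
    counts = []
    for i, (xi, yi) in enumerate(points):
        n = 0
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                n += 1
        counts.append(n)
    return counts


def kde_at_points(points, bandwidth):
    """Evaluate an isotropic Gaussian kernel density estimate at each point."""
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)
    densities = []
    for xi, yi in points:
        total = 0.0
        for xj, yj in points:
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        densities.append(norm * total)
    return densities


# Toy data: a tight 5 x 5 grid of points plus one isolated point far away.
cluster = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
points = cluster + [(5.0, 5.0)]

counts = neighbour_counts(points, radius=0.6)       # radius is an assumed value
densities = kde_at_points(points, bandwidth=0.2)    # bandwidth is an assumed value

print("cluster count:", counts[0], "| isolated count:", counts[-1])
# cluster count: 24 | isolated count: 0
print("isolated point has the lowest density:", densities[-1] == min(densities))
# isolated point has the lowest density: True
```

The neighbour count is an integer that jumps as points enter or leave the radius, while the kernel estimate varies smoothly with distance. A real kde2d-style method would typically evaluate the density on a grid and look values up, rather than summing over all pairs as this sketch does.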

In [8]:
gggrid([
    p + geom_pointdensity(aes(color='..count..')) + ggtitle("method='auto' (default)"),
    p + geom_pointdensity(aes(color='..count..'), method='neighbours') + ggtitle("method='neighbours'"),
    p + geom_pointdensity(aes(color='..count..'), method='kde2d') + ggtitle("method='kde2d'"),
])
Out[8]:

Sometimes you may have additional reasons to explicitly specify the method:

In [9]:
ggplot(df, aes("carat", "price")) + \
    geom_pointdensity() + \
    facet_grid(x="is_ideal")
Out[9]:

Although both facets show similarly distributed data and a comparable number of points, the two pictures look very different. This is because a different method was applied in each facet: the choice of method is made independently for each data group.

This can easily be corrected by specifying the method explicitly:

In [10]:
ggplot(df, aes("carat", "price")) + \
    geom_pointdensity(method='kde2d') + \
    facet_grid(x="is_ideal")
Out[10]:
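
The per-group decision behind the mismatched facets can be illustrated with a small sketch. The cut-off value and the function names below are hypothetical: the notebook does not state which rule or threshold lets-plot actually uses, only that smaller groups behave like 'neighbours' and larger ones like 'kde2d'.

```python
from collections import Counter

# Hypothetical cut-off, for illustration only; not taken from lets-plot.
ASSUMED_THRESHOLD = 5000


def choose_method(n_points, threshold=ASSUMED_THRESHOLD):
    """Mimic an 'auto' rule: small groups get 'neighbours', large ones 'kde2d'."""
    return "neighbours" if n_points <= threshold else "kde2d"


def methods_per_group(group_labels, threshold=ASSUMED_THRESHOLD):
    """Apply the rule to each group separately, based only on that group's size."""
    sizes = Counter(group_labels)
    return {label: choose_method(n, threshold) for label, n in sizes.items()}


# Two groups of comparable size that happen to fall on opposite sides of the cut-off.
labels = ["A"] * 4000 + ["B"] * 6000
print(methods_per_group(labels))
# {'A': 'neighbours', 'B': 'kde2d'}
```

Because each group is judged on its own size, two facets of similar size can end up with different estimators. Passing method explicitly removes that dependence on group size.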

Improved Appearance

In [11]:
p + \
    geom_pointdensity(aes(alpha="color", color='..count..'),
                      tooltips=layer_tooltips().line("neighbours count|@..count..")
                                               .line("diamond colour\nfrom 1 (worst) to 7 (best)|@color")
                                               .line("clarity|@clarity")) + \
    scale_color_viridis(name="neighbours count") + \
    scale_alpha(range=[.1, .9], guide='none') + \
    ggtb() + \
    ggsize(1000, 600) + \
    theme_classic()
Out[11]: