Statistical Annotations: geomBracket() and geomBracketDodge()¶

geomBracket() is a general-purpose layer for annotating ranges. It is particularly effective for adding significance bars (p-values) to categorical plots.

geomBracketDodge() is a specialized layer designed for annotating ranges between dodged positions (usually dodged groups within a category).

Note: geomBracket() does not compute statistics internally. It is a visualization tool that renders the results of your analysis. In this demo, we use the Mann–Whitney U test to calculate p-values before passing them to the plot.

In [1]:
%useLatestDescriptors
%use dataframe
%use lets-plot
In [2]:
LetsPlot.getInfo()
Out[2]:
Lets-Plot Kotlin API v.4.13.0. Frontend: Notebook with dynamically loaded JS. Lets-Plot JS v.4.9.0.
Outputs: Web (HTML+JS), Kotlin Notebook (Swing), Static SVG (hidden)
In [3]:
// In the examples below, we will use the Mann–Whitney U test to calculate p-values

// Mann-Whitney U test (two-sided) matching scipy.stats.mannwhitneyu defaults:
// - average-rank tie handling
// - tie correction in variance
// - continuity correction in z-score
fun mannWhitneyPValue(x: List<Double>, y: List<Double>): Double {
    val nx = x.size
    val ny = y.size
    val n = nx + ny
    val combined = (x.map { it to 0 } + y.map { it to 1 }).sortedBy { it.first }

    // Assign average ranks and accumulate tie correction term T = Σ(t³ - t)
    val ranks = DoubleArray(n)
    var tieCorrection = 0.0
    var i = 0
    while (i < n) {
        var j = i
        while (j < n - 1 && combined[j].first == combined[j + 1].first) j++
        val avgRank = (i + j + 2).toDouble() / 2.0   // 1-based
        for (k in i..j) ranks[k] = avgRank
        val t = (j - i + 1).toDouble()
        tieCorrection += t * t * t - t
        i = j + 1
    }

    val r1 = combined.indices.filter { combined[it].second == 0 }.sumOf { ranks[it] }
    val u1 = r1 - nx.toLong() * (nx + 1) / 2.0
    val mean = nx.toDouble() * ny / 2.0

    // Variance with tie correction (matches scipy formula)
    val variance = nx.toDouble() * ny / 12.0 * (n + 1 - tieCorrection / (n * (n - 1)))
    val std = kotlin.math.sqrt(variance)

    // Z with continuity correction (scipy default: use_continuity=True)
    val z = (kotlin.math.abs(u1 - mean) - 0.5) / std

    // Two-sided p-value: erfc(z / sqrt(2))  [Abramowitz & Stegun approximation]
    fun erfc(a: Double): Double {
        val t = 1.0 / (1.0 + 0.3275911 * a)
        val poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))))
        return poly * kotlin.math.exp(-a * a)
    }
    return erfc(z / kotlin.math.sqrt(2.0))
}

// Matches mizani label_pvalue(add_p=True, accuracy=0.001):
// p < 0.001 is shown as "p < 0.001", otherwise "p = X.XXX"
fun formatPValue(p: Double): String =
    if (p < 0.001) "p < 0.001" else "p = ${"%.3f".format(p)}"
In [4]:
val mpgDf = DataFrame.readCSV("https://raw.githubusercontent.com/JetBrains/lets-plot-docs/refs/heads/master/data/mpg.csv")
val mpg = mpgDf.toMap()
println("${mpgDf.rowsCount()} x ${mpgDf.columnsCount()}")
mpgDf.head()
234 x 12
Out[4]:

DataFrame: rowsCount = 5, columnsCount = 12

untitledmanufacturermodeldisplyearcyltransdrvctyhwyflclass
1audia41.80000019994auto(l5)f1829pcompact
2audia41.80000019994manual(m5)f2129pcompact
3audia42.00000020084manual(m6)f2031pcompact
4audia42.00000020084auto(av)f2130pcompact
5audia42.80000019996auto(l5)f1626pcompact

Adding Significance Bars (p-values) to Categorical Plots¶

Compute Significance Data¶

In [5]:
fun getPValuesData(
    df: DataFrame<*>,
    catCol: String,
    valCol: String,
    baseDy: Double,
    step: Double
): DataFrame<*> {
    val categories = df[catCol].distinct().toList().map { it.toString() }
    val pairs = categories.flatMapIndexed { i, a ->
        categories.drop(i + 1).map { b -> a to b }
    }
    val xmins = pairs.map { it.first }
    val xmaxs = pairs.map { it.second }
    val ys = pairs.indices.map { i -> baseDy + i * step }
    val ps = pairs.map { (a, b) ->
        val x = df.filter { it[catCol].toString() == a }[valCol].toList().map { it.toString().toDouble() }
        val y = df.filter { it[catCol].toString() == b }[valCol].toList().map { it.toString().toDouble() }
        mannWhitneyPValue(x, y)
    }
    return dataFrameOf(
        xmins.toColumn("xmin"),
        xmaxs.toColumn("xmax"),
        ys.toColumn("y"),
        ps.toColumn("p")
    )
}

val pValuesData = getPValuesData(mpgDf, "drv", "hwy", 47.0, 4.0)
println("${pValuesData.rowsCount()} rows")
pValuesData
3 rows
Out[5]:

DataFrame: rowsCount = 3, columnsCount = 4

xminxmaxyp
f447.0000000.000000
fr51.0000000.000000
4r55.0000000.041046
In [6]:
val p = letsPlot(mpg) { x = "drv"; y = "hwy"; fill = "drv" } +
    geomBoxplot(alpha = .25) +
    geomJitter(height = 0.0, shape = 1, alpha = .25, showLegend = false, seed = 42) { color = "drv" }

Basic p-value Annotations¶

In [7]:
p + geomBracket(data = pValuesData.toMap()) {  // <-- Pass p-values data to the brackets layer
    xmin = "xmin"; xmax = "xmax"; y = "y"; label = "p"
}
Out[7]:
9.26515·10 ​−28​ 5.98547·10 ​−11​ 0.0410456 f 4 r 10 15 20 25 30 35 40 45 50 55 hwy drv drv f 4 r

Format Labels with labelFormat¶

In [8]:
p + geomBracket(data = pValuesData.toMap(), labelFormat = ".2~g") {
    xmin = "xmin"; xmax = "xmax"; y = "y"; label = "p"
}
Out[8]:
9.3·10 ​−28​ 6·10 ​−11​ 0.041 f 4 r 10 15 20 25 30 35 40 45 50 55 hwy drv drv f 4 r

Format Labels with formatPValue()¶

A helper that formats p-values in the style of mizani's label_pvalue(add_p=True): values below 0.001 are shown as "p < 0.001", others as "p = X.XXX".

In [9]:
val pValuesDataLabeled = pValuesData.add("label") { formatPValue(this["p"] as Double) }

p + geomBracket(data = pValuesDataLabeled.toMap()) {
    xmin = "xmin"; xmax = "xmax"; y = "y"; label = "label"
}
Out[9]:
p < 0.001 p < 0.001 p = 0.041 f 4 r 10 15 20 25 30 35 40 45 50 55 hwy drv drv f 4 r

Apply Custom Formatting Logic (Significance Stars)¶

Map p-values to custom strings (like *** or ns) using your own function.

In [10]:
fun starsFormatter(p: Double): String = when {
    p <= 0.001 -> "***"
    p <= 0.01  -> "**"
    p <= 0.05  -> "*"
    else       -> "ns"
}

val pValuesDataStars = pValuesDataLabeled.add("star") { starsFormatter(this["p"] as Double) }

p + geomBracket(data = pValuesDataStars.toMap()) {
    xmin = "xmin"; xmax = "xmax"; y = "y"; label = "star"
}
Out[10]:
*** *** * f 4 r 10 15 20 25 30 35 40 45 50 55 hwy drv drv f 4 r

Annotating Dodged Groups¶

To draw brackets between dodged groups, use geomBracketDodge(). Instead of using data coordinates, this layer uses group indices:

  • x: The main category on the x-axis.
  • istart, iend: The zero-based indices of the dodged groups to compare.
In [11]:
// Group indices must be specified in the same order as they appear on the plot for other layers
val groupOrder = mapOf(4.0 to 0, 6.0 to 1, 8.0 to 2, 5.0 to 3)

val pValuesGroupedData = mpgDf["class"].distinct().toList().map { cls ->
    val subDf = mpgDf.filter { it["class"].toString() == cls.toString() }
    getPValuesData(subDf, "cyl", "hwy", 47.0, 4.0)
        .add("x") { cls }
        .add("start") { groupOrder[this["xmin"].toString().toDoubleOrNull()] }
        .add("end") { groupOrder[this["xmax"].toString().toDoubleOrNull()] }
        .add("star") { starsFormatter(this["p"] as Double) }
        .select("x", "y", "start", "end", "star")
}.concat()

println("${pValuesGroupedData.rowsCount()} rows")
pValuesGroupedData.head(5)
19 rows
Out[11]:

DataFrame: rowsCount = 5, columnsCount = 5

xystartendstar
compact47.00000001***
compact51.00000003ns
compact55.00000013*
midsize47.00000012*
midsize51.00000010***
In [12]:
letsPlot(mpg) { x = "class"; y = "hwy"; fill = asDiscrete("cyl") } +
    geomBoxplot(alpha = .25) +
    geomPoint(
        position = positionJitterDodge(jitterWidth = .2, jitterHeight = 0.0, seed = 42),
        shape = 1, alpha = .25, showLegend = false
    ) { color = asDiscrete("cyl") } +
    geomBracketDodge(data = pValuesGroupedData.toMap()) {
        x = "x"; y = "y"; istart = "start"; iend = "end"; label = "star"
    } +
    scaleFillBrewer(palette = "Accent", format = "d") +
    scaleColorBrewer(palette = "Accent", format = "d") +
    ggsize(1000, 500)
Out[12]:
*** ns * * *** * ** *** *** ns ** * ** ** *** * *** ns ns compact midsize minivan subcompact suv pickup 2seater 10 15 20 25 30 35 40 45 50 55 60 65 hwy class cyl 4 6 8 5

Customizing Bracket Geometry¶

Bottom-Axis (Upside Down) Annotations¶

To place brackets at the bottom of a plot, use negative values for lenstart and lenend.
You may also need to adjust vjust to position the labels correctly outside the bracket.

In [13]:
val upsideDownData = getPValuesData(mpgDf, "drv", "hwy", 8.0, -4.0)
    .add("label") { formatPValue(this["p"] as Double) }

println("${upsideDownData.rowsCount()} rows")
upsideDownData
3 rows
Out[13]:

DataFrame: rowsCount = 3, columnsCount = 5

xminxmaxyplabel
f48.0000000.000000p < 0.001
fr4.0000000.000000p < 0.001
4r0.0000000.041046p = 0.041
In [14]:
p + geomBracket(
    data = upsideDownData.toMap(),
    lenstart = -5, lenend = -5,  // Negative values to reverse the direction of the brackets
    vjust = 2.0                  // Specify vjust to move the labels under the brackets
) {
    xmin = "xmin"; xmax = "xmax"; y = "y"; label = "label"
}
Out[14]:
p < 0.001 p < 0.001 p = 0.041 f 4 r 0 5 10 15 20 25 30 35 40 45 hwy drv drv f 4 r

Precise Alignment with tipLengthUnit = "identity"¶

By default, bracket tips use a fixed length. Setting tipLengthUnit = "identity" allows you to specify tip lengths in data units.
This is essential when you want a bracket tip to reach a specific coordinate, such as the exact top of a boxplot or a particular data point.

In [15]:
// Recall the y-coordinates from the datasets
val maxByDrv = (mpg["drv"] as List<*>).zip(mpg["hwy"] as List<*>)
    .groupBy { it.first.toString() }
    .mapValues { (_, pairs) -> pairs.maxOf { it.second.toString().toInt() } }
println("Upper limits of boxes for each category:\n$maxByDrv\n")
Upper limits of boxes for each category:
{f=44, 4=28, r=26}

In [16]:
// The tip lengths are chosen so that each tip reaches either the previous bracket or the box
// Example (first bracket):
//   lenstart = y - upper_box_limit_at_xmin - 1
//     -> 47 - 44 - 1 = 2
//   lenend   = y - upper_box_limit_at_xmax - 1
//     -> 47 - 28 - 1 = 18
p + geomBracket(
    data = pValuesDataLabeled.toMap(),
    tipLengthUnit = "identity"  // identity units (data space)
) {
    xmin = "xmin"; xmax = "xmax"; y = "y"; label = "label"
    lenstart = listOf(2, 1, 1)
    lenend = listOf(18, 24, 1)
}
Out[16]:
p < 0.001 p < 0.001 p = 0.041 f 4 r 10 15 20 25 30 35 40 45 50 55 hwy drv drv f 4 r

Non-Statistical Annotations: Range Grouping¶

Brackets can be used for more than just drawing p-values.
For example, they can be used to highlight cluster boundaries in the following scatter plot:

In [17]:
fun getContinuousBracketsData(
    df: DataFrame<*>,
    catCol: String,
    primaryValCol: String,
    secondaryValCol: String,
    stepMult: Double
): DataFrame<*> {
    val secondaryVals = df[secondaryValCol].toList().map { it.toString().toDouble() }
    val step = (secondaryVals.max() - secondaryVals.min()) * stepMult
    val base = secondaryVals.max() + step

    val categories = listOf("r", "4", "f")
        .filter { cat -> df[catCol].toList().any { it.toString() == cat } }

    val primaryValsPerCat = categories.map { cat ->
        df.filter { it[catCol].toString() == cat }[primaryValCol].toList().map { it.toString().toDouble() }
    }
    return dataFrameOf(
        categories.toColumn(catCol),
        primaryValsPerCat.map { it.min() }.toColumn("min_$primaryValCol"),
        primaryValsPerCat.map { it.max() }.toColumn("max_$primaryValCol"),
        categories.indices.map { i -> base + i * step }.toColumn("${secondaryValCol}_level")
    )
}

val hwyBracketsData = getContinuousBracketsData(mpgDf, "drv", "hwy", "cty", 0.1)
val ctyBracketsData = getContinuousBracketsData(mpgDf, "drv", "cty", "hwy", 0.05)

letsPlot() +
    geomPoint(data = mpg, alpha = .25, showLegend = false) { x = "hwy"; y = "cty"; color = "drv" } +
    // Horizontal brackets for hwy ranges
    geomBracket(data = hwyBracketsData.toMap()) {
        xmin = "min_hwy"; xmax = "max_hwy"; y = "cty_level"; label = "drv"; color = "drv"
    } +
    // Vertical brackets for cty ranges
    geomBracket(data = ctyBracketsData.toMap()) {
        x = "hwy_level"; ymin = "min_cty"; ymax = "max_cty"; label = "drv"; color = "drv"
    } +
    themeMinimal() +
    labs(x = "hwy", y = "cty")
Out[17]:
r 4 f r 4 f 15 20 25 30 35 40 45 50 10 15 20 25 30 35 40 cty hwy