Annotating Regression with smooth_labels()

The geom_smooth() layer includes a labels parameter designed to display statistical summaries of the fitted model directly on the plot. This parameter accepts a smooth_labels() object, which provides access to model-specific variables like $R^2$ and the regression equation.

smooth_labels() is a specialized annotation helper designed for smooth layers. It shares the same logic and API as the standard layer_labels() function, supporting familiar methods like .line(), .format(), and .size().

Unlike layer_labels(), it also exposes regression-specific variables and markers that other layers do not provide.

Supported variables and markers:

  • ..r2.. — R² (coefficient of determination). A goodness-of-fit measure showing what fraction of the variance in the response is explained by the fitted model. Values are typically between 0 and 1 (higher means the model explains more of the observed variation).

  • ..adjr2.. — adjusted R². A variant of R² that accounts for model complexity: it penalizes adding extra terms/parameters and is therefore more suitable for comparing models with different numbers of predictors (e.g., different polynomial degrees). Adjusted R² can be lower than R² and may even be negative for a very poor fit.

  • ..aic.. — Akaike Information Criterion (AIC) of the fitted model.

  • ..bic.. — Bayesian Information Criterion (BIC) of the fitted model.

  • ..f.. — F-statistic for the overall model significance test.

  • ..df1.. — numerator degrees of freedom for the F-test.

  • ..df2.. — denominator degrees of freedom for the F-test.

  • ..p.. — p-value for the overall model F-test.

  • ..method.. — smoothing method label (lm or loess).

  • ..n.. — number of observations used in model fitting.

  • ..cilevel.. — confidence level used for the R² confidence interval.

  • ..cilow.. — lower bound of the confidence interval for R².

  • ..cihigh.. — upper bound of the confidence interval for R².

  • ~eq — fitted equation. Inserts the model equation into the annotation (can be configured with the .eq() method).
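
To make the goodness-of-fit variables above more concrete, here is a minimal NumPy sketch that computes R², adjusted R², and the overall F-statistic for a degree-2 polynomial fit using the usual textbook definitions. It is an illustration only: the formulas are the standard least-squares ones, and it is an assumption that they match what geom_smooth() reports for ..r2.., ..adjr2.., ..f.., ..df1.., ..df2.. and ..n.. in every case.

```python
import numpy as np

# Same toy data as the examples in this notebook.
x = np.array([0.0, 1.5, 1.7, 2.0])
y = np.array([0.0, 1.0, 1.8, 4.0])

deg = 2                         # polynomial degree, as in geom_smooth(deg=2)
coeffs = np.polyfit(x, y, deg)  # least-squares fit, highest power first
y_hat = np.polyval(coeffs, x)   # fitted values at the observed x

n = len(x)                      # number of observations
df1 = deg                       # number of predictors, excluding the intercept
df2 = n - deg - 1               # residual degrees of freedom

ss_res = np.sum((y - y_hat) ** 2)     # variation left unexplained by the fit
ss_tot = np.sum((y - y.mean()) ** 2)  # total variation around the mean

r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / df2   # penalizes the extra polynomial term
f_stat = (r2 / df1) / ((1 - r2) / df2)  # overall model F-statistic

print(f"R2={r2:.4f}  adjR2={adj_r2:.4f}  F={f_stat:.3f}  df1={df1}  df2={df2}  n={n}")
```

With only four points and three fitted coefficients, a single residual degree of freedom remains, which is why adjusted R² can fall noticeably below R² on data this small.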

import numpy as np

from lets_plot import *
LetsPlot.setup_html()
np.random.seed(42)
plot = ggplot({'x': [0, 1.5, 1.7, 2], 'y': [0, 1, 1.8, 4]}, aes('x', 'y')) + geom_point() 

Basic Annotation

By default, smooth_labels() adds the coefficient of determination ($R^2$) to the plot with no additional configuration.

plot + geom_smooth(
    deg=2, 
    labels = smooth_labels()    # <-- Default displays R² value.
)

Customizing Content and Style

plot + geom_smooth(
    deg=2, 
    labels = smooth_labels()
        .line(r'\(R\^2=\)@..r2..')  # Add custom R² label using LaTeX notation
        .line('~eq')                # Add the auto-generated equation on a separate line
        .size(20)) + \
    theme(label_text=element_text(
        family='DejaVu Sans',
        face='bold-italic',
        color='gray60'
    ))

t = [0.0, 1.5, 1.7, 2.0, 0, 0.5, 0.7, 2]
y = [0.0, 1.0, 1.8, 4.0, 2, 5.5, 6.0, 4.5]
g = ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']

plot_groups = ggplot({'t': t, 'y': y, 'g': g}, aes(x='t', y='y', color='g')) + geom_point(show_legend=False)

Grouping and Equation Customization

plot_groups + geom_smooth(
    deg=2,
    show_legend=False,
    labels = smooth_labels()
        .line(r'\(R\^2=\)@..r2.., \(R_{{adj}}\^2=\)@..adjr2.., ~eq')  # Combine all markers into a single line
        .eq(lhs='y(t)', rhs='t', format='.3f', threshold=0.01)        # Set variable name, precision, and hide small coefficients
        .format('..r2..', '.4f')
        .format('..adjr2..', '.4f')
        .label_x(['right', 'left'])                   # Position labels horizontally
        .label_y(['bottom', 'top'])                   # Position labels vertically
        .inherit_color()
        .size(20)) + theme_classic()
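
The effect of the format and threshold arguments passed to .eq() above can be pictured with a small pure-Python sketch. The helper format_equation below is hypothetical and is not part of Lets-Plot. It assumes that a coefficient whose absolute value is below threshold is dropped from the equation, and that terms are listed from the lowest power to the highest; the library's actual ordering and rendering may differ.

```python
def format_equation(coeffs, lhs="y", rhs="x", fmt=".2f", threshold=0.0):
    """Render a polynomial as text; coeffs[i] multiplies rhs**i.

    Hypothetical helper for illustration only, not part of Lets-Plot.
    """
    terms = []
    for power, c in enumerate(coeffs):
        if abs(c) < threshold:
            continue  # assumed behavior: negligible terms are hidden
        magnitude = format(abs(c), fmt)
        if power == 0:
            body = magnitude
        elif power == 1:
            body = f"{magnitude}{rhs}"
        else:
            body = f"{magnitude}{rhs}^{power}"
        terms.append(("-" if c < 0 else "+", body))

    if not terms:
        return f"{lhs} = 0"

    # The first term carries its sign only when negative.
    first_sign, first_body = terms[0]
    text = ("-" if first_sign == "-" else "") + first_body
    for sign, body in terms[1:]:
        text += f" {sign} {body}"
    return f"{lhs} = {text}"


# Made-up coefficients: the 0.004 intercept falls below the threshold and is hidden.
result = format_equation([0.004, -2.25, 1.5], lhs="y(t)", rhs="t", fmt=".3f", threshold=0.01)
print(result)  # y(t) = -2.250t + 1.500t^2
```

Raising threshold trims more terms from the label, while format only changes how many digits the remaining coefficients show.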