Pooling

Functions for aggregating token-level activations into a single representation.

Pooling strategies can be prefixed with score: (post-probe) or activation: (pre-probe) to control when reduction happens. See parse_pooling_strategy for details.


lmprobe.pooling.parse_pooling_strategy

parse_pooling_strategy(strategy: str) -> ParsedPooling

Parse a pooling strategy string into its components.

Supports prefix convention: "score:mean", "activation:max", or bare names like "mean" (uses default stage).

Parameters:

- strategy (str, required): Pooling strategy, optionally prefixed with score: or activation:.

Returns:

- ParsedPooling: Parsed components.

Raises:

- ValueError: If the strategy or prefix is not recognized.
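As a rough sketch of the prefix convention described above, parsing might look like the following. The constant names, the ParsedPooling fields, and the single DEFAULT_STAGE are assumptions for illustration; the real lmprobe implementation may differ (for example, it may pick the default stage per strategy).

```python
from typing import NamedTuple

# Assumed vocabularies; the real library may accept more strategies.
VALID_STAGES = {"score", "activation"}
VALID_BASES = {"mean", "max", "min", "last_token", "first_token"}
DEFAULT_STAGE = "activation"  # assumption: a single default stage for bare names

class ParsedPooling(NamedTuple):
    stage: str  # "score" or "activation"
    base: str   # e.g. "mean"

def parse_pooling_strategy(strategy: str) -> ParsedPooling:
    stage, sep, base = strategy.partition(":")
    if not sep:  # bare name like "mean": fall back to the default stage
        stage, base = DEFAULT_STAGE, strategy
    if stage not in VALID_STAGES:
        raise ValueError(f"Unknown pooling prefix: {stage!r}")
    if base not in VALID_BASES:
        raise ValueError(f"Unknown pooling strategy: {base!r}")
    return ParsedPooling(stage=stage, base=base)
```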


lmprobe.pooling.get_pooling_fn

get_pooling_fn(strategy: str) -> Callable[[torch.Tensor, torch.Tensor | None], torch.Tensor]

Get the pooling function for a strategy name.

For score-level pooling ("score:mean", "max", etc.), this returns pool_all so that all token activations are preserved for classification before score reduction.

For activation-level pooling ("mean", "activation:max", etc.), this returns the appropriate activation pooling function.

Parameters:

- strategy (str, required): Name of the pooling strategy, optionally prefixed with score: or activation:.

Returns:

- Callable: The pooling function.

Raises:

- ValueError: If the strategy is not recognized.
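The dispatch rule above can be sketched as follows. This is illustrative only: the pooler table is an assumption, pool_mean here is a simplified stand-in that ignores padding, and bare-name stage handling is reduced to "treat as activation-level", which glosses over the library's actual per-strategy defaults.

```python
def pool_all(activations, attention_mask=None):
    # Keep every token; reduction happens later, on the per-token scores.
    return activations

def pool_mean(activations, attention_mask=None):
    # Simplified stand-in: mean over the sequence axis, ignoring padding.
    return activations.mean(dim=1)

# Assumed lookup table for activation-level poolers.
ACTIVATION_POOLERS = {"mean": pool_mean, "all": pool_all}

def get_pooling_fn(strategy):
    if strategy.startswith("score:"):
        # Score-level pooling: classification must see all tokens first.
        return pool_all
    if strategy.startswith("activation:"):
        strategy = strategy[len("activation:"):]
    try:
        return ACTIVATION_POOLERS[strategy]
    except KeyError:
        raise ValueError(f"Unknown pooling strategy: {strategy!r}") from None
```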

lmprobe.pooling.resolve_pooling

resolve_pooling(pooling: str | None, train_pooling: str | None, inference_pooling: str | None) -> tuple[str, str]

Resolve pooling parameters to concrete train/inference strategies.

Parameters:

- pooling (str | None, required): Base pooling strategy for both train and inference.
- train_pooling (str | None, required): Override for training. Takes precedence over pooling.
- inference_pooling (str | None, required): Override for inference. Takes precedence over pooling.

Returns:

- tuple[str, str]: (train_strategy, inference_strategy)

Raises:

- ValueError: If no pooling strategy is specified, or if invalid strategies are used.
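The precedence rules reduce to a few lines. A minimal sketch, assuming the stated override behavior; the real resolve_pooling likely adds strategy validation and richer error messages.

```python
def resolve_pooling(pooling, train_pooling, inference_pooling):
    # Per-phase overrides win; otherwise both phases share the base strategy.
    train = train_pooling if train_pooling is not None else pooling
    inference = inference_pooling if inference_pooling is not None else pooling
    if train is None or inference is None:
        raise ValueError("No pooling strategy specified")
    return train, inference
```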

lmprobe.pooling.pool_last_token

pool_last_token(activations: Tensor, attention_mask: Tensor | None = None) -> torch.Tensor

Extract the last non-padding token's activation.

Parameters:

- activations (Tensor, required): Shape (batch, seq_len, hidden_dim).
- attention_mask (Tensor | None, default None): Shape (batch, seq_len). 1 for real tokens, 0 for padding. If None, assumes no padding (uses last position).

Returns:

- Tensor: Shape (batch, hidden_dim).

lmprobe.pooling.pool_mean

pool_mean(activations: Tensor, attention_mask: Tensor | None = None) -> torch.Tensor

Compute mean activation across all non-padding tokens.

Parameters:

- activations (Tensor, required): Shape (batch, seq_len, hidden_dim).
- attention_mask (Tensor | None, default None): Shape (batch, seq_len). 1 for real tokens, 0 for padding. If None, assumes no padding.

Returns:

- Tensor: Shape (batch, hidden_dim).

lmprobe.pooling.pool_first_token

pool_first_token(activations: Tensor, attention_mask: Tensor | None = None) -> torch.Tensor

Extract the first token's activation.

Parameters:

- activations (Tensor, required): Shape (batch, seq_len, hidden_dim).
- attention_mask (Tensor | None, default None): Ignored for first_token pooling (the first token is never padding).

Returns:

- Tensor: Shape (batch, hidden_dim).

lmprobe.pooling.pool_max

pool_max(activations: Tensor, attention_mask: Tensor | None = None) -> torch.Tensor

Compute max activation per dimension across all non-padding tokens.

Parameters:

- activations (Tensor, required): Shape (batch, seq_len, hidden_dim).
- attention_mask (Tensor | None, default None): Shape (batch, seq_len). 1 for real tokens, 0 for padding. If None, assumes no padding.

Returns:

- Tensor: Shape (batch, hidden_dim).

lmprobe.pooling.pool_min

pool_min(activations: Tensor, attention_mask: Tensor | None = None) -> torch.Tensor

Compute min activation per dimension across all non-padding tokens.

Parameters:

- activations (Tensor, required): Shape (batch, seq_len, hidden_dim).
- attention_mask (Tensor | None, default None): Shape (batch, seq_len). 1 for real tokens, 0 for padding. If None, assumes no padding.

Returns:

- Tensor: Shape (batch, hidden_dim).

lmprobe.pooling.pool_all

pool_all(activations: Tensor, attention_mask: Tensor | None = None) -> torch.Tensor

Return all token activations unchanged.

Parameters:

- activations (Tensor, required): Shape (batch, seq_len, hidden_dim).
- attention_mask (Tensor | None, default None): Not used, but accepted for API consistency.

Returns:

- Tensor: Shape (batch, seq_len, hidden_dim), unchanged.

lmprobe.pooling.reduce_scores

reduce_scores(scores: Tensor, strategy: str, attention_mask: Tensor | None = None) -> torch.Tensor

Reduce per-token scores to a single score per sequence.

Used for score-level pooling after classification. Supports all base strategies: max, min, mean, last_token, first_token.

The strategy may include a score: prefix (e.g., "score:mean"), which is stripped before processing.

Parameters:

- scores (Tensor, required): Shape (batch, seq_len) or (batch, seq_len, n_classes).
- strategy (str, required): Base strategy name (e.g., "max", "mean", "score:mean").
- attention_mask (Tensor | None, default None): Shape (batch, seq_len). 1 for real tokens, 0 for padding.

Returns:

- Tensor: Shape (batch,) or (batch, n_classes).