Pooling

Functions for aggregating token-level activations into a single representation.

Pooling strategies can be prefixed with score: (post-probe) or activation: (pre-probe) to control when reduction happens. See parse_pooling_strategy for details.


lmprobe.pooling.parse_pooling_strategy

parse_pooling_strategy(strategy: str) -> ParsedPooling

Parse a pooling strategy string into its components.

Supports prefix convention: "score:mean", "activation:max", or bare names like "mean" (uses default stage).

Parameters:

- strategy (str, required): Pooling strategy, optionally prefixed with score: or activation:.

Returns:

- ParsedPooling: Parsed components.

Raises:

- ValueError: If the strategy or prefix is not recognized.
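As a rough sketch of the prefix convention described above, parsing might look like the following. The constant names, the ParsedPooling fields, and the single DEFAULT_STAGE are assumptions for illustration; the real lmprobe implementation may differ (for example, it may pick the default stage per strategy).

```python
from typing import NamedTuple

# Assumed vocabularies; the real library may accept more strategies.
VALID_STAGES = {"score", "activation"}
VALID_BASES = {"mean", "max", "min", "last_token", "first_token"}
DEFAULT_STAGE = "activation"  # assumption: a single default stage for bare names

class ParsedPooling(NamedTuple):
    stage: str  # "score" or "activation"
    base: str   # e.g. "mean"

def parse_pooling_strategy(strategy: str) -> ParsedPooling:
    stage, sep, base = strategy.partition(":")
    if not sep:  # bare name like "mean": fall back to the default stage
        stage, base = DEFAULT_STAGE, strategy
    if stage not in VALID_STAGES:
        raise ValueError(f"Unknown pooling prefix: {stage!r}")
    if base not in VALID_BASES:
        raise ValueError(f"Unknown pooling strategy: {base!r}")
    return ParsedPooling(stage=stage, base=base)
```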


lmprobe.pooling.get_pooling_fn

get_pooling_fn(strategy: str) -> Callable[[torch.Tensor, torch.Tensor | None], torch.Tensor]

Get the pooling function for a strategy name.

For score-level pooling ("score:mean", "max", etc.), this returns pool_all so that all token activations are preserved for classification before score reduction.

For activation-level pooling ("mean", "activation:max", etc.), this returns the appropriate activation pooling function.

Parameters:

- strategy (str, required): Name of the pooling strategy, optionally prefixed with score: or activation:.

Returns:

- Callable: The pooling function.

Raises:

- ValueError: If the strategy is not recognized.
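The dispatch rule above can be sketched as follows. This is illustrative only: the pooler table is an assumption, pool_mean here is a simplified stand-in that ignores padding, and bare-name stage handling is reduced to "treat as activation-level", which glosses over the library's actual per-strategy defaults.

```python
def pool_all(activations, attention_mask=None):
    # Keep every token; reduction happens later, on the per-token scores.
    return activations

def pool_mean(activations, attention_mask=None):
    # Simplified stand-in: mean over the sequence axis, ignoring padding.
    return activations.mean(dim=1)

# Assumed lookup table for activation-level poolers.
ACTIVATION_POOLERS = {"mean": pool_mean, "all": pool_all}

def get_pooling_fn(strategy):
    if strategy.startswith("score:"):
        # Score-level pooling: classification must see all tokens first.
        return pool_all
    if strategy.startswith("activation:"):
        strategy = strategy[len("activation:"):]
    try:
        return ACTIVATION_POOLERS[strategy]
    except KeyError:
        raise ValueError(f"Unknown pooling strategy: {strategy!r}") from None
```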

lmprobe.pooling.resolve_pooling

resolve_pooling(pooling: str | None, train_pooling: str | None, inference_pooling: str | None) -> tuple[str, str]

Resolve pooling parameters to concrete train/inference strategies.

Parameters:

- pooling (str | None, required): Base pooling strategy for both train and inference.
- train_pooling (str | None, required): Override for training. Takes precedence over pooling.
- inference_pooling (str | None, required): Override for inference. Takes precedence over pooling.

Returns:

- tuple[str, str]: (train_strategy, inference_strategy)

Raises:

- ValueError: If no pooling strategy is specified, or if invalid strategies are used.
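The precedence rules reduce to a few lines. A minimal sketch, assuming the stated override behavior; the real resolve_pooling likely adds strategy validation and richer error messages.

```python
def resolve_pooling(pooling, train_pooling, inference_pooling):
    # Per-phase overrides win; otherwise both phases share the base strategy.
    train = train_pooling if train_pooling is not None else pooling
    inference = inference_pooling if inference_pooling is not None else pooling
    if train is None or inference is None:
        raise ValueError("No pooling strategy specified")
    return train, inference
```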

lmprobe.pooling.pool_last_token

pool_last_token(activations: Tensor, attention_mask: Tensor | None = None) -> torch.Tensor

Extract the last non-padding token's activation.

Parameters:

- activations (Tensor, required): Shape (batch, seq_len, hidden_dim).
- attention_mask (Tensor | None, default None): Shape (batch, seq_len). 1 for real tokens, 0 for padding. If None, assumes no padding (uses last position).

Returns:

- Tensor: Shape (batch, hidden_dim).

lmprobe.pooling.pool_mean

pool_mean(activations: Tensor, attention_mask: Tensor | None = None) -> torch.Tensor

Compute mean activation across all non-padding tokens.

Parameters:

- activations (Tensor, required): Shape (batch, seq_len, hidden_dim).
- attention_mask (Tensor | None, default None): Shape (batch, seq_len). 1 for real tokens, 0 for padding. If None, assumes no padding.

Returns:

- Tensor: Shape (batch, hidden_dim).

lmprobe.pooling.pool_first_token

pool_first_token(activations: Tensor, attention_mask: Tensor | None = None) -> torch.Tensor

Extract the first token's activation.

Parameters:

- activations (Tensor, required): Shape (batch, seq_len, hidden_dim).
- attention_mask (Tensor | None, default None): Ignored for first_token pooling (the first token is never padding).

Returns:

- Tensor: Shape (batch, hidden_dim).

lmprobe.pooling.pool_max

pool_max(activations: Tensor, attention_mask: Tensor | None = None) -> torch.Tensor

Compute max activation per dimension across all non-padding tokens.

Parameters:

- activations (Tensor, required): Shape (batch, seq_len, hidden_dim).
- attention_mask (Tensor | None, default None): Shape (batch, seq_len). 1 for real tokens, 0 for padding. If None, assumes no padding.

Returns:

- Tensor: Shape (batch, hidden_dim).

lmprobe.pooling.pool_min

pool_min(activations: Tensor, attention_mask: Tensor | None = None) -> torch.Tensor

Compute min activation per dimension across all non-padding tokens.

Parameters:

- activations (Tensor, required): Shape (batch, seq_len, hidden_dim).
- attention_mask (Tensor | None, default None): Shape (batch, seq_len). 1 for real tokens, 0 for padding. If None, assumes no padding.

Returns:

- Tensor: Shape (batch, hidden_dim).

lmprobe.pooling.pool_all

pool_all(activations: Tensor, attention_mask: Tensor | None = None) -> torch.Tensor

Return all token activations unchanged.

Parameters:

- activations (Tensor, required): Shape (batch, seq_len, hidden_dim).
- attention_mask (Tensor | None, default None): Not used, but accepted for API consistency.

Returns:

- Tensor: Shape (batch, seq_len, hidden_dim), unchanged.

lmprobe.pooling.reduce_scores

reduce_scores(scores: Tensor, strategy: str, attention_mask: Tensor | None = None) -> torch.Tensor

Reduce per-token scores to a single score per sequence.

Used for score-level pooling after classification. Supports all base strategies: max, min, mean, last_token, first_token.

The strategy may include a score: prefix (e.g., "score:mean"), which is stripped before processing.

Parameters:

- scores (Tensor, required): Shape (batch, seq_len) or (batch, seq_len, n_classes).
- strategy (str, required): Base strategy name (e.g., "max", "mean", "score:mean").
- attention_mask (Tensor | None, default None): Shape (batch, seq_len). 1 for real tokens, 0 for padding.

Returns:

- Tensor: Shape (batch,) or (batch, n_classes).