Pooling
Functions for aggregating token-level activations into a single representation.
Pooling strategies can be prefixed with score: (post-probe) or activation: (pre-probe) to control when reduction happens. See parse_pooling_strategy for details.
lmprobe.pooling.parse_pooling_strategy
Parse a pooling strategy string into its components.
Supports prefix convention: "score:mean", "activation:max",
or bare names like "mean" (uses default stage).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| strategy | str | Pooling strategy, optionally prefixed with score: or activation:. | required |
Returns:
| Type | Description |
|---|---|
| ParsedPooling | Parsed components. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the strategy or prefix is not recognized. |
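The prefix convention can be illustrated with a minimal sketch. This is not the library's implementation: `split_strategy` and `SplitStrategy` are hypothetical names, the fields of the real ParsedPooling are not documented here, and the set of base names is assumed from the reduce_scores description. The sketch leaves the stage as `None` for bare names because the default stage is not specified on this page.

```python
from typing import NamedTuple, Optional

# Assumed base strategy names (taken from the reduce_scores description).
KNOWN_STRATEGIES = {"max", "min", "mean", "last_token", "first_token"}
KNOWN_STAGES = {"score", "activation"}


class SplitStrategy(NamedTuple):
    """Hypothetical stand-in for ParsedPooling; the real fields may differ."""
    stage: Optional[str]  # "score", "activation", or None for a bare name
    base: str


def split_strategy(strategy: str) -> SplitStrategy:
    """Split "stage:base" into its parts and validate both."""
    stage, sep, base = strategy.partition(":")
    if not sep:
        # Bare name: no prefix given, so the stage is left unresolved here.
        stage, base = None, strategy
    elif stage not in KNOWN_STAGES:
        raise ValueError(f"Unknown pooling prefix: {stage!r}")
    if base not in KNOWN_STRATEGIES:
        raise ValueError(f"Unknown pooling strategy: {base!r}")
    return SplitStrategy(stage, base)


print(split_strategy("score:mean"))
print(split_strategy("mean"))
```

Both failure modes named in the Raises table (an unknown prefix and an unknown strategy) map to a `ValueError` in this sketch.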
lmprobe.pooling.get_pooling_fn
Get the pooling function for a strategy name.
For score-level pooling ("score:mean", "max", etc.), this returns
pool_all so that all token activations are preserved for classification
before score reduction.
For activation-level pooling ("mean", "activation:max", etc.),
this returns the appropriate activation pooling function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| strategy | str | Name of the pooling strategy, optionally prefixed with score: or activation:. | required |
Returns:
| Type | Description |
|---|---|
| Callable | The pooling function. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the strategy is not recognized. |
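To make the difference between the two stages concrete, the following sketch compares pooling before and after scoring. It assumes PyTorch tensors and uses a hypothetical linear probe (`weights`); neither the probe nor the variable names come from the library.

```python
import torch

torch.manual_seed(0)
activations = torch.randn(2, 3, 4)  # (batch, seq_len, hidden_dim)
weights = torch.randn(4)            # hypothetical linear probe: one score per vector

# Activation-level: pool the tokens first, then score the pooled vector.
pooled = activations.mean(dim=1)    # (batch, hidden_dim)
activation_level = pooled @ weights # (batch,)

# Score-level: keep every token (what pool_all does), score each token,
# then reduce the per-token scores.
per_token = activations @ weights              # (batch, seq_len)
score_level_mean = per_token.mean(dim=1)       # (batch,)
score_level_max = per_token.max(dim=1).values  # (batch,)

print(activation_level, score_level_mean, score_level_max)
```

For a purely linear score combined with mean pooling, the two orders give the same result. With max or min, or with a nonlinear probe output, they generally differ, which is why the stage matters.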
lmprobe.pooling.resolve_pooling
resolve_pooling(pooling: str | None, train_pooling: str | None, inference_pooling: str | None) -> tuple[str, str]
Resolve pooling parameters to concrete train/inference strategies.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pooling | str \| None | Base pooling strategy for both train and inference. | required |
| train_pooling | str \| None | Override for training. Takes precedence over pooling. | required |
| inference_pooling | str \| None | Override for inference. Takes precedence over pooling. | required |
Returns:
| Type | Description |
|---|---|
| tuple[str, str] | (train_strategy, inference_strategy) |
Raises:
| Type | Description |
|---|---|
| ValueError | If no pooling strategy is specified, or if invalid strategies are used. |
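The precedence rule described above can be sketched as follows. The function name `resolve` is hypothetical, validation of strategy names is omitted, and raising when either side remains unset is an assumption about how "no pooling strategy is specified" is interpreted.

```python
from typing import Optional, Tuple


def resolve(
    pooling: Optional[str],
    train_pooling: Optional[str],
    inference_pooling: Optional[str],
) -> Tuple[str, str]:
    """Hypothetical sketch: each override wins over the shared base strategy."""
    train = train_pooling if train_pooling is not None else pooling
    inference = inference_pooling if inference_pooling is not None else pooling
    if train is None or inference is None:
        # Assumption: an unresolved side counts as "no strategy specified".
        raise ValueError("No pooling strategy specified")
    return train, inference


print(resolve("mean", None, "score:max"))
print(resolve(None, "last_token", "last_token"))
```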
lmprobe.pooling.pool_last_token
Extract the last non-padding token's activation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| activations | Tensor | Shape (batch, seq_len, hidden_dim) | required |
| attention_mask | Tensor \| None | Shape (batch, seq_len). 1 for real tokens, 0 for padding. If None, assumes no padding (uses last position). | None |
Returns:
| Type | Description |
|---|---|
| Tensor | Shape (batch, hidden_dim) |
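One way to select the last non-padding token is to count the real tokens in each row of the mask. The sketch below assumes PyTorch tensors and right-padded sequences; `last_token` is a hypothetical name and the library's own handling (for example of left padding) may differ.

```python
import torch


def last_token(activations: torch.Tensor, attention_mask=None) -> torch.Tensor:
    """Sketch: pick the last real token per sequence, assuming right padding."""
    if attention_mask is None:
        return activations[:, -1, :]
    batch = activations.shape[0]
    # Index of the last real token = number of real tokens - 1.
    # A row with no real tokens yields -1, i.e. the final position.
    last_idx = attention_mask.long().sum(dim=1) - 1    # (batch,)
    return activations[torch.arange(batch), last_idx]  # (batch, hidden_dim)


acts = torch.arange(12, dtype=torch.float32).reshape(2, 3, 2)
mask = torch.tensor([[1, 1, 0], [1, 1, 1]])
out = last_token(acts, mask)
print(out)  # tensor([[ 2.,  3.], [10., 11.]])
```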
lmprobe.pooling.pool_mean
Compute mean activation across all non-padding tokens.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| activations | Tensor | Shape (batch, seq_len, hidden_dim) | required |
| attention_mask | Tensor \| None | Shape (batch, seq_len). 1 for real tokens, 0 for padding. If None, assumes no padding. | None |
Returns:
| Type | Description |
|---|---|
| Tensor | Shape (batch, hidden_dim) |
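A masked mean has to divide by the number of real tokens rather than by seq_len. The following sketch assumes PyTorch tensors; `masked_mean` is a hypothetical name, and the clamp that guards against all-padding rows is an illustrative choice, not documented behavior.

```python
import torch


def masked_mean(activations: torch.Tensor, attention_mask=None) -> torch.Tensor:
    """Sketch: mean over the token axis, ignoring padding positions."""
    if attention_mask is None:
        return activations.mean(dim=1)
    mask = attention_mask.unsqueeze(-1).to(activations.dtype)  # (batch, seq_len, 1)
    summed = (activations * mask).sum(dim=1)                   # (batch, hidden_dim)
    counts = mask.sum(dim=1).clamp(min=1)                      # avoid division by zero
    return summed / counts


acts = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = torch.tensor([[1, 1, 0]])
print(masked_mean(acts, mask))  # tensor([[2., 3.]])
```

Without the mask, the padded position (100.0 in this example) would be averaged in and dominate the result.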
lmprobe.pooling.pool_first_token
Extract the first token's activation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| activations | Tensor | Shape (batch, seq_len, hidden_dim) | required |
| attention_mask | Tensor \| None | Ignored for first_token pooling (first token is never padding). | None |
Returns:
| Type | Description |
|---|---|
| Tensor | Shape (batch, hidden_dim) |
lmprobe.pooling.pool_max
Compute max activation per dimension across all non-padding tokens.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| activations | Tensor | Shape (batch, seq_len, hidden_dim) | required |
| attention_mask | Tensor \| None | Shape (batch, seq_len). 1 for real tokens, 0 for padding. If None, assumes no padding. | None |
Returns:
| Type | Description |
|---|---|
| Tensor | Shape (batch, hidden_dim) |
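A common way to exclude padding from a max is to overwrite padded positions with negative infinity before reducing. The sketch below assumes PyTorch tensors; `masked_max` is a hypothetical name and may not match the library's internals.

```python
import torch


def masked_max(activations: torch.Tensor, attention_mask=None) -> torch.Tensor:
    """Sketch: per-dimension max over the token axis, ignoring padding."""
    if attention_mask is not None:
        pad = attention_mask.unsqueeze(-1) == 0  # (batch, seq_len, 1)
        # Padded positions can never win the max once set to -inf.
        activations = activations.masked_fill(pad, float("-inf"))
    return activations.max(dim=1).values         # (batch, hidden_dim)


acts = torch.tensor([[[1.0, -2.0], [0.5, 3.0], [9.0, 9.0]]])
mask = torch.tensor([[1, 1, 0]])
print(masked_max(acts, mask))  # tensor([[1., 3.]])
```

The same pattern applies to a masked min by filling with positive infinity and reducing with `min` instead.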
lmprobe.pooling.pool_min
Compute min activation per dimension across all non-padding tokens.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| activations | Tensor | Shape (batch, seq_len, hidden_dim) | required |
| attention_mask | Tensor \| None | Shape (batch, seq_len). 1 for real tokens, 0 for padding. If None, assumes no padding. | None |
Returns:
| Type | Description |
|---|---|
| Tensor | Shape (batch, hidden_dim) |
lmprobe.pooling.pool_all
Return all token activations unchanged.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| activations | Tensor | Shape (batch, seq_len, hidden_dim) | required |
| attention_mask | Tensor \| None | Not used, but accepted for API consistency. | None |
Returns:
| Type | Description |
|---|---|
| Tensor | Shape (batch, seq_len, hidden_dim), unchanged |
lmprobe.pooling.reduce_scores
Reduce per-token scores to a single score per sequence.
Used for score-level pooling after classification. Supports all base
strategies: max, min, mean, last_token, first_token.
The strategy may include a score: prefix (e.g., "score:mean"),
which is stripped before processing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| scores | Tensor | Shape (batch, seq_len) or (batch, seq_len, n_classes) | required |
| strategy | str | Base strategy name (e.g., "mean"). | required |
| attention_mask | Tensor \| None | Shape (batch, seq_len). 1 for real tokens, 0 for padding. | None |
Returns:
| Type | Description |
|---|---|
| Tensor | Shape (batch,) or (batch, n_classes) |
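As an illustration of score-level reduction, the sketch below strips an optional score: prefix and applies a masked mean to per-token scores of either supported shape. It assumes PyTorch tensors, covers only the mean strategy, and uses hypothetical helper names (`strip_score_prefix`, `reduce_mean_scores`).

```python
import torch


def strip_score_prefix(strategy: str) -> str:
    """Drop a leading "score:" if present, leaving the base strategy name."""
    prefix = "score:"
    return strategy[len(prefix):] if strategy.startswith(prefix) else strategy


def reduce_mean_scores(scores: torch.Tensor, attention_mask=None) -> torch.Tensor:
    """Sketch: masked mean over the token axis for 2-D or 3-D scores."""
    if attention_mask is None:
        return scores.mean(dim=1)
    mask = attention_mask.to(scores.dtype)
    if scores.dim() == 3:
        mask = mask.unsqueeze(-1)  # broadcast over n_classes
    return (scores * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)


mask = torch.tensor([[1, 1, 0]])
binary = torch.tensor([[0.2, 0.8, 0.0]])                            # (batch, seq_len)
multi = torch.tensor([[[0.2, 0.8], [0.6, 0.4], [0.0, 0.0]]])        # (batch, seq_len, n_classes)

assert strip_score_prefix("score:mean") == "mean"
print(reduce_mean_scores(binary, mask))  # shape (batch,)
print(reduce_mean_scores(multi, mask))   # shape (batch, n_classes)
```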