Scaling¶
Per-layer feature scaling for multi-layer probes.
lmprobe.scaling.PerLayerScaler ¶
Standardize features on a per-layer basis.
When using multiple layers (concatenated), each layer may have different activation magnitude distributions. This scaler normalizes each layer's features to zero mean and unit variance.
Two strategies are available:

- `"per_neuron"`: each neuron gets its own mean/std (more parameters, higher variance)
- `"per_layer"`: all neurons in a layer share one mean/std (fewer parameters, lower variance)

The `"per_layer"` strategy may be preferable when:

- the sample size is small relative to the hidden dimension
- neurons within a layer have similar activation distributions (symmetry assumption)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `n_layers` | `int` | Number of layers in the concatenated features. | *required* |
| `hidden_dim` | `int` | Hidden dimension per layer (features per layer). | *required* |
| `strategy` | `str` | Scaling strategy: `"per_neuron"` gives each neuron its own mean/std; `"per_layer"` shares one mean/std across all neurons in a layer. | `"per_neuron"` |
Attributes:

| Name | Type | Description |
|---|---|---|
| `means_` | `ndarray \| None` | Feature means. Shape depends on strategy: `(n_layers, hidden_dim)` for `"per_neuron"`, `(n_layers,)` for `"per_layer"`. |
| `stds_` | `ndarray \| None` | Feature standard deviations. Shape matches `means_`. |
Examples:

```python
>>> scaler = PerLayerScaler(n_layers=3, hidden_dim=128, strategy="per_neuron")
>>> X_train_scaled = scaler.fit_transform(X_train)
>>> X_test_scaled = scaler.transform(X_test)
>>> # Use per_layer for small sample sizes
>>> scaler = PerLayerScaler(n_layers=3, hidden_dim=128, strategy="per_layer")
```
fit ¶
Compute per-layer means and standard deviations.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `ndarray` | Feature matrix, shape `(n_samples, n_layers * hidden_dim)`. | *required* |

Returns:

| Type | Description |
|---|---|
| `PerLayerScaler` | Self, for method chaining. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If `X` has the wrong number of features. |
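Conceptually, fitting amounts to validating the feature width and recording the chosen statistics. A hedged sketch of the `"per_layer"` case (the reshape trick is an assumption about how layers are laid out in the concatenated features):

```python
import numpy as np

n_samples, n_layers, hidden_dim = 3, 2, 4
X = np.arange(n_samples * n_layers * hidden_dim, dtype=float).reshape(n_samples, -1)

# Guard mirroring the documented ValueError for a wrong feature count
if X.shape[1] != n_layers * hidden_dim:
    raise ValueError("X has the wrong number of features")

X3 = X.reshape(n_samples, n_layers, hidden_dim)
means_ = X3.mean(axis=(0, 2))   # shape (n_layers,)
stds_ = X3.std(axis=(0, 2))     # shape (n_layers,)
```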
transform ¶
Apply per-layer standardization.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `ndarray` | Feature matrix, shape `(n_samples, n_layers * hidden_dim)`. | *required* |

Returns:

| Type | Description |
|---|---|
| `ndarray` | Standardized features, same shape as input. |

Raises:

| Type | Description |
|---|---|
| `RuntimeError` | If the scaler has not been fitted. |
| `ValueError` | If `X` has the wrong number of features. |
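The standardization itself is just broadcasting the fitted statistics over the samples. A sketch for the `"per_neuron"` strategy (variable names and layout are illustrative, not the class's internals):

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_layers, hidden_dim = 5, 2, 4
X = rng.normal(loc=3.0, scale=2.0, size=(n_samples, n_layers * hidden_dim))
X3 = X.reshape(n_samples, n_layers, hidden_dim)

# "per_neuron" statistics computed on the same data we transform
means_ = X3.mean(axis=0)   # shape (n_layers, hidden_dim)
stds_ = X3.std(axis=0)

# Broadcast over the sample axis, then flatten back to the input shape
Z = ((X3 - means_) / stds_).reshape(n_samples, -1)
```

On the fit data, each neuron of `Z` has zero mean and unit variance; on held-out data it will only be approximately standardized.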
fit_transform ¶
Fit scaler and transform data in one step.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `ndarray` | Feature matrix, shape `(n_samples, n_layers * hidden_dim)`. | *required* |

Returns:

| Type | Description |
|---|---|
| `ndarray` | Standardized features, same shape as input. |
inverse_transform ¶
Reverse the standardization.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `ndarray` | Standardized feature matrix, shape `(n_samples, n_layers * hidden_dim)`. | *required* |

Returns:

| Type | Description |
|---|---|
| `ndarray` | Original-scale features, same shape as input. |

Raises:

| Type | Description |
|---|---|
| `RuntimeError` | If the scaler has not been fitted. |
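Inverting the standardization is multiplying by the stored stds and adding back the means, so a transform/inverse round trip recovers the input. A sketch under the same illustrative layout assumptions as above:

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, n_layers, hidden_dim = 6, 2, 3
X = rng.normal(loc=5.0, scale=3.0, size=(n_samples, n_layers * hidden_dim))
X3 = X.reshape(n_samples, n_layers, hidden_dim)

means_ = X3.mean(axis=0)   # "per_neuron" statistics
stds_ = X3.std(axis=0)

Z = (X3 - means_) / stds_                          # forward standardization
X_back = (Z * stds_ + means_).reshape(n_samples, -1)  # inverse
```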
get_layer_stats ¶
Get per-layer statistics for analysis.
Returns:

| Type | Description |
|---|---|
| `dict` | Dictionary of layer-level statistics. Contents depend on strategy: `"per_neuron"` includes `'mean_norms'`, `'std_norms'`, `'mean_per_layer'`, and `'std_per_layer'`; `"per_layer"` includes `'means'` and `'stds'` (the raw per-layer values). |

Raises:

| Type | Description |
|---|---|
| `RuntimeError` | If the scaler has not been fitted. |
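The exact formulas behind these keys are not specified beyond their names, but one plausible reconstruction for the `"per_neuron"` case (the norm/average choices here are assumptions, used only to illustrate the shapes involved):

```python
import numpy as np

# Stand-ins for fitted per-neuron arrays of shape (n_layers, hidden_dim)
means_ = np.ones((3, 8))
stds_ = np.full((3, 8), 2.0)

# Hypothetical layer-level summaries: one value per layer
stats = {
    "mean_norms": np.linalg.norm(means_, axis=1),  # L2 norm of each layer's means
    "std_norms": np.linalg.norm(stds_, axis=1),
    "mean_per_layer": means_.mean(axis=1),         # average mean within each layer
    "std_per_layer": stds_.mean(axis=1),
}
```

Summaries like these make it easy to spot layers whose activation scale differs wildly from the rest, which is the situation per-layer scaling exists to correct.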