# Layer Selection & Sweep
Different layers of a language model encode different information. Finding the right layer (or combination of layers) can be an impactful tuning decision.
## Layer specification syntax

| Spec | Description |
|---|---|
| `16` | Single layer (negative indexing: `-1` = last) |
| `[14, 15, 16]` | Multiple layers (concatenated) |
| `"middle"` | Middle third of layers |
| `"last"` | Last layer only |
| `"all"` | All layers concatenated |
| `"auto"` | Automatic via Group Lasso (`pip install lmprobe[auto]`) |
| `"fast_auto"` | Fast selection via coefficient importance |
| `"sweep"` | Train independent probe per layer |
| `"sweep:10"` | Sweep every 10th layer |
| `"sweep:55-65"` | Sweep layers 55 through 65 |
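To make the resolution rules concrete, here is a small hypothetical sketch of how a spec could map to concrete layer indices. The `resolve_layers` helper is illustrative only, not part of lmprobe's API:

```python
def resolve_layers(spec, n_layers):
    """Map a layer spec to a list of layer indices (illustrative sketch)."""
    if isinstance(spec, int):                 # single layer; negative indexes from the end
        return [spec % n_layers]
    if isinstance(spec, list):                # explicit list of layers
        return [i % n_layers for i in spec]
    if spec == "last":
        return [n_layers - 1]
    if spec == "all":
        return list(range(n_layers))
    if spec == "middle":                      # middle third of the stack
        third = n_layers // 3
        return list(range(third, 2 * third))
    raise ValueError(f"unsupported spec: {spec!r}")

print(resolve_layers(-1, 32))        # [31]
print(resolve_layers("middle", 32))  # [10, 11, ..., 19]
```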
## Layer sweep
Train an independent probe at every layer to identify which are most informative:
```python
result = Probe.sweep_layers(
    model="meta-llama/Llama-3.1-8B-Instruct",
    positive_prompts=positive_prompts,
    negative_prompts=negative_prompts,
    layers="all",
    classifier="ridge",
)

# Score all layers
scores = result.score(test_prompts, test_labels)
# {0: 0.52, 1: 0.55, ..., 31: 0.78}

# Best layer
best = result.best_layer(test_prompts, test_labels)
print(f"Best layer: {best}")

# Predict with a specific layer's probe
preds = result.probes[best].predict(test_prompts)
```
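Conceptually, a sweep fits one linear probe per layer on that layer's activations and ranks layers by held-out accuracy. Here is a minimal self-contained sketch of that idea using synthetic activations and a plain least-squares probe; the shapes and data are made up for illustration, not lmprobe internals:

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, n_train, n_test, dim = 4, 128, 64, 16

def make_split(n):
    """Synthetic per-layer activations; only layer 2 carries class signal."""
    y = rng.integers(0, 2, n)
    acts = rng.normal(size=(n_layers, n, dim))
    acts[2, :, 0] += 3.0 * (2 * y - 1)   # inject class signal into layer 2
    return acts, y

train_acts, y_train = make_split(n_train)
test_acts, y_test = make_split(n_test)

scores = {}
for layer in range(n_layers):
    # least-squares linear probe on +/-1 targets, scored by test accuracy
    w, *_ = np.linalg.lstsq(train_acts[layer], 2.0 * y_train - 1.0, rcond=None)
    preds = (test_acts[layer] @ w > 0).astype(int)
    scores[layer] = (preds == y_test).mean()

best = max(scores, key=scores.get)
print(best)  # the informative layer (2) wins on this synthetic data
```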
You can also use `"sweep"` as a layer spec string, which is useful when you want to sweep as part of a normal `Probe` workflow:
```python
probe = Probe(model=model, layers="sweep")        # sweep all
probe = Probe(model=model, layers="sweep:10")     # every 10th
probe = Probe(model=model, layers="sweep:55-65")  # a range
```
## Layer importance analysis

When using multiple layers (e.g., `layers="all"`), compute per-layer importance from the fitted classifier's coefficients:
```python
probe = Probe(
    model="meta-llama/Llama-3.1-8B-Instruct",
    layers="all",
    classifier="ridge",
)
probe.fit(positive_prompts, negative_prompts)

importances = probe.compute_layer_importance(metric="l2")
# array([0.03, 0.05, ..., 0.42]) — shape (n_layers,), sums to 1.0

# Use probe.candidate_layers_ to map index → layer number
best_idx = importances.argmax()
print(f"Most important layer: {probe.candidate_layers_[best_idx]}")
```
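The idea behind an `"l2"` importance metric can be sketched in a few lines: for a linear classifier fit on concatenated per-layer features, take the L2 norm of each layer's coefficient block and normalize so the result sums to 1. This is a generic sketch of the computation, not lmprobe's exact implementation:

```python
import numpy as np

n_layers, dim = 4, 8
rng = np.random.default_rng(1)
coef = rng.normal(size=n_layers * dim)   # flat coefficient vector over all layers
coef[2 * dim:3 * dim] *= 10.0            # make layer 2's block dominate

blocks = coef.reshape(n_layers, dim)     # one coefficient block per layer
importances = np.linalg.norm(blocks, axis=1)
importances /= importances.sum()         # normalize to sum to 1.0

print(importances.argmax())  # 2 — the dominant block
```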
## Fast auto layer selection
Automatically select the most important layers using importance analysis, then refit on just those layers:
```python
probe = Probe(
    model="meta-llama/Llama-3.1-8B-Instruct",
    layers="fast_auto",
    fast_auto_top_k=3,  # keep top 3 layers
    normalize_layers=True,
)
probe.fit(positive_prompts, negative_prompts)
print(f"Selected layers: {probe.selected_layers_}")
```
This is a two-stage process: first fit on all layers, then refit on the top-k layers by importance.
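The two stages can be sketched with synthetic data and a plain least-squares probe: fit on all layers concatenated, rank layers by the L2 norm of their coefficient blocks, then refit on only the top-k. Everything below is illustrative, not lmprobe internals:

```python
import numpy as np

rng = np.random.default_rng(2)
n_layers, n, dim, top_k = 6, 256, 8, 3
y = rng.integers(0, 2, n)
t = 2.0 * y - 1.0                            # +/-1 targets
acts = rng.normal(size=(n_layers, n, dim))
for informative in (1, 3):                   # layers 1 and 3 carry class signal
    acts[informative, :, 0] += 2.0 * t

# Stage 1: fit a linear probe on all layers concatenated,
# then score each layer by the L2 norm of its coefficient block.
X_all = np.concatenate(list(acts), axis=1)   # shape (n, n_layers * dim)
w, *_ = np.linalg.lstsq(X_all, t, rcond=None)
importance = np.linalg.norm(w.reshape(n_layers, dim), axis=1)

# Stage 2: refit on only the top-k most important layers.
selected = sorted(np.argsort(importance)[-top_k:].tolist())
X_top = np.concatenate([acts[l] for l in selected], axis=1)
w_top, *_ = np.linalg.lstsq(X_top, t, rcond=None)
print(selected)  # the informative layers (1 and 3) should be among the top-k
```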
## Automatic layer selection via Group Lasso

Use structured sparsity (Group Lasso) to let the optimizer choose which layers to keep. This is more principled than `fast_auto`, but slower:
```python
# Requires: pip install lmprobe[auto]
probe = Probe(
    model="meta-llama/Llama-3.1-8B-Instruct",
    layers="auto",
    auto_candidates=[0.25, 0.5, 0.75],  # fractional positions or explicit indices
    auto_alpha=0.01,                    # regularization strength
)
probe.fit(positive_prompts, negative_prompts)
print(f"Selected layers: {probe.selected_layers_}")
```
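The key property of Group Lasso is that its penalty — the sum of per-group L2 norms — drives entire coefficient blocks to exactly zero, so the surviving groups identify the selected layers. Here is a toy proximal-gradient sketch of that mechanism on a least-squares problem with one "layer" per group; it illustrates the idea only and is not lmprobe's solver (the data, penalty strength, and iteration count are all made up):

```python
import numpy as np

rng = np.random.default_rng(3)
n_layers, n, dim = 5, 200, 4
X = rng.normal(size=(n, n_layers * dim))      # concatenated per-layer features
w_true = np.zeros(n_layers * dim)
w_true[2 * dim:3 * dim] = 3.0                 # only "layer" 2 is informative
y = X @ w_true + 0.01 * rng.normal(size=n)

lam = 500.0                                   # group penalty strength
lr = 1.0 / np.linalg.norm(X, 2) ** 2          # step size from the spectral norm
w = np.zeros(n_layers * dim)
for _ in range(500):
    w -= lr * X.T @ (X @ w - y)               # gradient step on 0.5*||Xw - y||^2
    blocks = w.reshape(n_layers, dim)
    norms = np.linalg.norm(blocks, axis=1, keepdims=True)
    # group soft-threshold: shrink each layer's block, zeroing small ones
    shrink = np.maximum(0.0, 1.0 - lr * lam / np.maximum(norms, 1e-12))
    w = (blocks * shrink).ravel()

kept = [l for l in range(n_layers)
        if np.linalg.norm(w.reshape(n_layers, dim)[l]) > 1e-8]
print(kept)  # only the informative layer survives the penalty
```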
## Practical guidance
Where to start:
- Middle layers (12–20 in a 32-layer model) are often best for semantic properties
- Last layer is usually best for surface/output-level properties
- First few layers are mostly syntactic/positional
When to sweep:
- When you have no prior knowledge about the task
- When you want to verify your layer choice is principled
When to use "all" with concatenation:
- When signal is distributed across many layers
- Combined with PCA or Group Lasso to manage dimensionality
When to use `"fast_auto"`:
- When you want a data-driven choice without the overhead of Group Lasso
- A good default when you don't want to hand-tune `layers`