Cache¶
Functions and classes for managing the activation cache.
Configuration¶
lmprobe.cache.set_cache_backend ¶
Set the cache storage backend.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `backend` | `CacheBackend \| str \| None` | The backend to use: a `CacheBackend` instance or a backend name. | *required* |
lmprobe.cache.set_cache_limit ¶
Set maximum cache size in GB for LRU eviction.
This sets the target size cap. To actually enforce it, call
evict() — eviction is intentionally decoupled from writes.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `gb` | `float \| None` | Maximum cache size in GB. `None` disables the limit. `0` disables caching entirely. | `None` |
lmprobe.cache.set_cache_dtype ¶
Set the cache storage dtype; float16 or bfloat16 halves disk usage relative to float32.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dtype` | `str \| None` | Storage dtype: `"float16"`, `"bfloat16"`, `"float32"`, or `None` (no conversion). | `None` |
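For intuition on where the 2× saving comes from, half-precision values occupy two bytes where single-precision values occupy four; a stdlib illustration, independent of lmprobe:

```python
import struct

# struct format "e" is IEEE 754 half precision, "f" is single precision
half = struct.calcsize("e")    # 2 bytes per value
single = struct.calcsize("f")  # 4 bytes per value

n_values = 1_000_000
ratio = (single * n_values) / (half * n_values)  # 2.0: float16 halves storage
```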
lmprobe.cache.enable_cache_logging ¶
Enable cache logging to see cache hit/miss information.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `level` | `int` | Logging level. Use `logging.INFO` for basic hit/miss info, `logging.DEBUG` for detailed cache operations. | `INFO` |
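Level filtering behaves like standard library logging: records below the configured level are dropped. A stdlib sketch; the logger name `"lmprobe.cache"` is an assumption, not confirmed by the docs:

```python
import logging

records = []

class Capture(logging.Handler):
    """Collects emitted messages so we can inspect what passed the filter."""
    def emit(self, record):
        records.append(record.getMessage())

logger = logging.getLogger("lmprobe.cache")
logger.addHandler(Capture())
logger.setLevel(logging.INFO)

logger.info("cache hit: prompt 17")   # kept: INFO >= configured level
logger.debug("wrote shard to disk")   # dropped: DEBUG < INFO
```

Passing `logging.DEBUG` instead would keep both records, exposing the detailed cache operations mentioned above.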
Inspection¶
lmprobe.cache.cache_info ¶
Report cache size and breakdown.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `str \| None` | If provided, only report on this model. Otherwise, report all models. | `None` |

Returns:

| Type | Description |
|---|---|
| `CacheInfo` | Structured cache usage report. |
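A structured report of this kind typically aggregates per-model entries into a total; a sketch whose field names are illustrative, not the actual `CacheInfo`/`ModelCacheInfo` fields:

```python
from dataclasses import dataclass

@dataclass
class ModelCacheInfo:  # illustrative shape only
    model: str
    size_gb: float
    n_entries: int

@dataclass
class CacheInfo:
    models: list[ModelCacheInfo]

    @property
    def total_gb(self) -> float:
        # the whole-cache figure is just the sum of per-model sizes
        return sum(m.size_gb for m in self.models)

info = CacheInfo(models=[
    ModelCacheInfo("gpt2", 1.2, 300),
    ModelCacheInfo("gpt2-medium", 2.4, 300),
])
total = info.total_gb  # about 3.6
```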
lmprobe.cache.CacheInfo dataclass ¶
Cache usage report.
lmprobe.cache.ModelCacheInfo dataclass ¶
Cache info for a single model.
lmprobe.cache.discover_cached ¶
Introspect what's cached for a model+prompt combination.
Returns None if nothing is cached. Otherwise returns a
`CachedPromptInfo` describing available layers, pooling
strategies, logits, etc.
This is the public API for cache introspection — sharing.py
and other modules should use this rather than parsing internal
cache key names.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_name` | `str` | HuggingFace model ID. | *required* |
| `prompt` | `str` | The prompt text. | *required* |

Returns:

| Type | Description |
|---|---|
| `CachedPromptInfo \| None` | Description of cached tensors, or `None` if nothing is cached. |
lmprobe.cache.CachedPromptInfo dataclass ¶
What's cached for a single prompt.
Returned by `discover_cached` to describe the available
tensors without loading any data.
Cache operations¶
lmprobe.cache.clear_cache ¶
Clear all cached activations (both v1 and v2 formats).
Returns the number of cache entries deleted.
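The returned value is a deletion count; a filesystem sketch over a temporary directory, where the directory layout and this standalone `clear_cache` helper are hypothetical:

```python
import tempfile
from pathlib import Path

def clear_cache(cache_dir: Path) -> int:
    """Delete every file under cache_dir; return how many were removed."""
    deleted = 0
    for entry in sorted(cache_dir.rglob("*")):
        if entry.is_file():
            entry.unlink()
            deleted += 1
    return deleted

with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    for name in ("a.pt", "b.pt", "c.pt"):
        (root / name).write_bytes(b"\x00")
    count = clear_cache(root)  # 3 files removed
```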
lmprobe.cache.invalidate_extraction_cache ¶
Delete all cached data for an extraction (both v1 and v2).
CachedExtractor¶
lmprobe.cache.CachedExtractor ¶
Wraps an ActivationExtractor with per-prompt caching.
Checks which prompts are already cached before extraction, and only extracts the missing ones. Saves after each batch for interrupt resilience.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `extractor` | `ActivationExtractor` | The underlying extractor. | *required* |
extract ¶
extract(prompts: list[str], remote: bool = False, invalidate_cache: bool = False, max_retries: int | None = None, cache_only: bool = False) -> tuple[torch.Tensor, torch.Tensor] | None
Extract activations, using per-prompt cache when available.
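The wrapper's core loop (check the cache, extract only the misses, save after each batch) can be sketched without torch; the in-memory cache, the simplified `extract` signature, and `FakeExtractor` are illustrative stand-ins:

```python
class FakeExtractor:
    """Stands in for ActivationExtractor; returns a fake activation per prompt."""
    def extract(self, prompts: list[str]) -> list[str]:
        return [f"act({p})" for p in prompts]

class CachedExtractor:
    def __init__(self, extractor, batch_size: int = 2):
        self.extractor = extractor
        self.batch_size = batch_size
        self.cache: dict[str, str] = {}

    def extract(self, prompts: list[str]) -> list[str]:
        # only prompts without a cached activation are sent to the extractor
        missing = [p for p in prompts if p not in self.cache]
        for i in range(0, len(missing), self.batch_size):
            batch = missing[i:i + self.batch_size]
            for prompt, act in zip(batch, self.extractor.extract(batch)):
                # saving after each batch means an interrupt loses at most
                # the current batch, not the whole run
                self.cache[prompt] = act
        return [self.cache[p] for p in prompts]

ce = CachedExtractor(FakeExtractor())
first = ce.extract(["a", "b", "c"])   # all three are misses
second = ce.extract(["a", "b", "c"])  # all hits now, no re-extraction
```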