Cache

Functions and classes for managing the activation cache.


Configuration

lmprobe.cache.set_cache_backend

set_cache_backend(backend: CacheBackend | str | None) -> None

Set the cache storage backend.

Parameters:

  backend (CacheBackend | str | None, required)
      One of:
        • A CacheBackend instance
        • A URI string (e.g. "s3://bucket/prefix" or "/path/to/cache")
        • None to reset to the default (lazy re-initialization)
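To illustrate the URI form of the backend argument, here is a small, self-contained dispatch sketch: the URI scheme decides the kind of storage backend. The `resolve_backend` helper is hypothetical, written only for this example, and not part of lmprobe:

```python
from urllib.parse import urlparse

def resolve_backend(uri: str) -> str:
    """Illustrative only: map a cache URI to a backend kind by its scheme."""
    scheme = urlparse(uri).scheme
    if scheme == "s3":
        return "s3"        # e.g. "s3://bucket/prefix"
    if scheme in ("", "file"):
        return "local"     # e.g. "/path/to/cache" (no scheme -> local path)
    raise ValueError(f"unsupported cache URI scheme: {scheme!r}")

print(resolve_backend("s3://bucket/prefix"))  # s3
print(resolve_backend("/path/to/cache"))      # local
```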

lmprobe.cache.set_cache_limit

set_cache_limit(gb: float | None = None) -> None

Set maximum cache size in GB for LRU eviction.

This sets the target size cap. To actually enforce it, call evict() — eviction is intentionally decoupled from writes.

Parameters:

  gb (float | None, default None)
      Maximum cache size in GB. None disables the limit; 0 disables caching entirely.
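The cap-then-evict split described above can be sketched with a toy LRU store. The names below (`put`, `evict`, the byte-sized cap) are illustrative, not lmprobe internals: writes only update recency, and a separate eviction pass drops least-recently-used entries until the store fits under the cap:

```python
from collections import OrderedDict

CAP_BYTES = 3 * 1024          # stand-in for the GB limit
store = OrderedDict()         # key -> size in bytes, oldest-first

def put(key: str, size: int) -> None:
    store[key] = size
    store.move_to_end(key)    # writes update recency but never evict

def evict() -> int:
    """Drop least-recently-used entries until the store fits the cap."""
    removed = 0
    while sum(store.values()) > CAP_BYTES:
        store.popitem(last=False)   # pop the least recently used entry
        removed += 1
    return removed

for i in range(5):
    put(f"entry{i}", 1024)    # 5 KiB written against a 3 KiB cap
print(evict())                # 2  (the two oldest entries are dropped)
```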

lmprobe.cache.set_cache_dtype

set_cache_dtype(dtype: str | None = None) -> None

Set the cache storage dtype (e.g. "float16" for a 2x disk reduction versus "float32").

Parameters:

  dtype (str | None, default None)
      Storage dtype: "float16", "bfloat16", "float32", or None (no conversion).
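A quick arithmetic check of where the 2x comes from, using only the stdlib: float32 values take 4 bytes each and float16 values take 2 (bfloat16 is also 2 bytes per value, though `struct` has no format code for it):

```python
import struct

n = 1_000_000                            # one million activation values
fp32_bytes = n * struct.calcsize("f")    # float32: 4 bytes per value
fp16_bytes = n * struct.calcsize("e")    # float16: 2 bytes per value
print(fp32_bytes // fp16_bytes)          # 2  -> the "2x disk reduction"
```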

lmprobe.cache.enable_cache_logging

enable_cache_logging(level: int = logging.INFO) -> None

Enable cache logging to see cache hit/miss information.

Parameters:

  level (int, default logging.INFO)
      Logging level. Use logging.INFO for basic hit/miss info, logging.DEBUG for detailed cache operations.
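The level choice behaves like standard library logging. The sketch below shows the stdlib equivalent of picking a level; the logger name "lmprobe.cache" is an assumption for illustration, not confirmed by the docs:

```python
import logging

logging.basicConfig()
log = logging.getLogger("lmprobe.cache")  # logger name is an assumption

log.setLevel(logging.INFO)
print(log.isEnabledFor(logging.INFO))     # True  -> hit/miss lines appear
print(log.isEnabledFor(logging.DEBUG))    # False -> detailed ops stay quiet
```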

Inspection

lmprobe.cache.cache_info

cache_info(model: str | None = None) -> CacheInfo

Report cache size and breakdown.

Parameters:

  model (str | None, default None)
      If provided, only report on this model. Otherwise, report all models.

Returns:

  CacheInfo
      Structured cache usage report.

lmprobe.cache.CacheInfo dataclass

Cache usage report.

lmprobe.cache.ModelCacheInfo dataclass

Cache info for a single model.

lmprobe.cache.discover_cached

discover_cached(model_name: str, prompt: str) -> CachedPromptInfo | None

Introspect what's cached for a model+prompt combination.

Returns None if nothing is cached. Otherwise returns a CachedPromptInfo describing available layers, pooling strategies, logits, etc.

This is the public API for cache introspection — sharing.py and other modules should use this rather than parsing internal cache key names.

Parameters:

  model_name (str, required)
      HuggingFace model ID.

  prompt (str, required)
      The prompt text.

Returns:

  CachedPromptInfo | None
      Description of cached tensors, or None if nothing is cached.

lmprobe.cache.CachedPromptInfo dataclass

What's cached for a single prompt.

Returned by discover_cached to describe the available tensors without loading any data.


Cache operations

lmprobe.cache.clear_cache

clear_cache() -> int

Clear all cached activations (both v1 and v2 formats).

Returns the number of cache entries deleted.

lmprobe.cache.invalidate_extraction_cache

invalidate_extraction_cache(cache_dir: Path) -> None

Delete all cached data for an extraction (both v1 and v2).


CachedExtractor

lmprobe.cache.CachedExtractor

Wraps an ActivationExtractor with per-prompt caching.

Checks which prompts are already cached before extraction, and only extracts the missing ones. Saves after each batch for interrupt resilience.

Parameters:

  extractor (ActivationExtractor, required)
      The underlying extractor.

extract

extract(prompts: list[str], remote: bool = False, invalidate_cache: bool = False, max_retries: int | None = None, cache_only: bool = False) -> tuple[torch.Tensor, torch.Tensor] | None

Extract activations, using per-prompt cache when available.
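The check-cache-then-extract-only-missing flow can be sketched with a toy wrapper. The dict cache and the fake `extract_uncached` extractor below are illustrative, not lmprobe's implementation (which returns tensors and saves per batch):

```python
cache: dict[str, list[float]] = {}       # prompt -> fake activation vector

def extract_uncached(prompts: list[str]) -> dict[str, list[float]]:
    """Stand-in for the real extractor: 'compute' a vector per prompt."""
    return {p: [float(len(p))] for p in prompts}

def cached_extract(prompts: list[str]) -> list[list[float]]:
    missing = [p for p in prompts if p not in cache]   # cache check first
    if missing:
        cache.update(extract_uncached(missing))        # save once computed,
                                                       # for interrupt resilience
    return [cache[p] for p in prompts]                 # original order preserved

cached_extract(["a", "bb"])
print(len(cache))                        # 2
cached_extract(["bb", "ccc"])            # only "ccc" triggers extraction
print(len(cache))                        # 3
```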