Core Abstractions API

veloxquant_mlx.core

Abstract base classes

veloxquant_mlx.core.abstractions

All concrete implementations subclass these ABCs. You should program to these interfaces when building custom integrations.

Quantizer

from veloxquant_mlx.core.abstractions import Quantizer

class Quantizer(ABC):
    @abstractmethod
    def encode(self, x: mx.array) -> EncodedVector: ...

    @abstractmethod
    def decode(self, encoded: EncodedVector) -> mx.array: ...

    def encode_values(self, x: mx.array) -> EncodedVector:
        return self.encode(x)

    def decode_values(self, encoded: EncodedVector) -> mx.array:
        return self.decode(encoded)

All quantizers implement encode and decode with these signatures:

encode(x) — input shape [batch, heads, seq, head_dim], returns EncodedVector
decode(encoded) — returns mx.array of shape [batch, heads, seq, head_dim]

KVCache

from veloxquant_mlx.core.abstractions import KVCache

class KVCache(ABC):
    @abstractmethod
    def update(self, keys: mx.array, values: mx.array) -> tuple[mx.array, mx.array]: ...

    @property
    @abstractmethod
    def state(self) -> tuple[mx.array, mx.array]: ...

update(keys, values) is called once per generation step. It writes the new keys/values to the compressed cache and returns the full (dequantized) cache for attention computation.

Preconditioner

from veloxquant_mlx.core.abstractions import Preconditioner

Linear transforms applied before quantization.

class Preconditioner(ABC):
    @abstractmethod
    def apply(self, x: mx.array) -> mx.array: ...

    @abstractmethod
    def inverse(self, x: mx.array) -> mx.array: ...

Codebook

from veloxquant_mlx.core.abstractions import Codebook

class Codebook(ABC):
    @abstractmethod
    def quantize(self, x: mx.array) -> mx.array: ...   # returns indices

    @abstractmethod
    def dequantize(self, indices: mx.array) -> mx.array: ...

Context types

veloxquant_mlx.core.context

EncodedVector

from veloxquant_mlx.core.context import EncodedVector

@dataclass
class EncodedVector:
    indices: mx.array           # packed integer codes
    scale: mx.array | None      # per-channel or per-block scale
    metadata: dict              # algorithm-specific extra data
    original_shape: tuple[int, ...]
    dtype: mx.Dtype             # original dtype (usually mx.float16)

EncodedVector is the currency passed between encode() and decode(). Different algorithms store different things in metadata (e.g., cluster IDs for RaBitQ, rotation info for SpectralQuant).

QuantizationContext

from veloxquant_mlx.core.context import QuantizationContext

Request-scoped context passed through a quantization pipeline.

@dataclass
class QuantizationContext:
    layer_name: str
    step: int           # generation step (0-indexed)
    config: KVCacheConfig
    artifacts: ArtifactStore

TransformResult

from veloxquant_mlx.core.context import TransformResult

Output of a Preconditioner.apply() call, including pre-transform metadata needed for the inverse.

Registry

from veloxquant_mlx.core.registry import QuantizerRegistry

Plugin registry for quantizer discovery.

# Register a custom quantizer
@QuantizerRegistry.register("my_quantizer")
class MyQuantizer(Quantizer):
    def encode(self, x): ...
    def decode(self, encoded): ...

# Create by name
q = QuantizerFactory.create("my_quantizer", bits=2)

CLI reference

The veloxquant_mlx package exposes two CLI commands via python -m veloxquant_mlx:

`precompute`

python -m veloxquant_mlx precompute \
    --method {vecinfer,spectral,ratequant} \
    --model MODEL_PATH \
    --output ARTIFACT_DIR \
    [--num-samples N] \
    [--sequence-length L] \
    [--target-bits BITS]

Runs calibration for the specified method and saves artifacts.

`benchmark`

python -m veloxquant_mlx benchmark \
    --model MODEL_PATH \
    --method METHOD \
    --bits BITS \
    [--value-bits BITS] \
    [--seq-len SEQ] \
    [--num-runs N] \
    [--output JSON_PATH]

Benchmarks a configuration and prints/saves metrics. See Benchmarking guide.

Abstract base classes​

Quantizer​

KVCache​

Preconditioner​

Codebook​

Context types​

EncodedVector​

QuantizationContext​

TransformResult​

Registry​

CLI reference​

precompute​

benchmark​

See also​