Changelog
All notable changes to VeloxQuant-MLX are documented here.
v0.7.0 — Latest
New
- RaBitQ — randomised Hadamard + 1-bit sign packing with IVF clustering for extreme key compression
- SpectralQuant — eigenvector-rotated quantization with signal/noise codebooks and water-filling bit allocation
- CommVQ — RoPE-commutative residual VQ for exact positional encoding compatibility
- SpectralQuantKVCache, PolarQuantKVCache — new cache wrappers
- calibrate_spectral_rotation(), save_rotations(), load_cached_rotations()
- compute_participation_ratio(), compute_spectral_gap()
- water_fill_bits() — per-dimension water-filling allocator
- rabitq_hamming_score — Metal XOR+popcount Hamming distance kernel
- comm_vq_decode_metal — fused centroid gather + RoPE Metal kernel
- 212+ passing tests
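The XOR+popcount idea behind the Hamming distance kernel can be sketched on the CPU in plain Python. This is an illustration only, not the Metal kernel: `pack_signs`, `hamming_distance`, and `sign_inner_product` are hypothetical helper names, and the bit order is an assumption.

```python
def pack_signs(values):
    """Pack the sign of each value into one integer, one bit per dimension.

    Bit i is 1 when values[i] >= 0 and 0 otherwise (assumed convention).
    """
    word = 0
    for i, v in enumerate(values):
        if v >= 0:
            word |= 1 << i
    return word


def hamming_distance(a, b):
    """Count differing bits: XOR marks mismatches, popcount tallies them."""
    return bin(a ^ b).count("1")


def sign_inner_product(a, b, dim):
    """Inner product of the two +1/-1 sign vectors encoded by a and b.

    Matching positions contribute +1 and mismatches -1, so the result
    is dim - 2 * hamming_distance.
    """
    return dim - 2 * hamming_distance(a, b)


key = pack_signs([0.3, -1.2, 0.7, -0.1])    # signs + - + -  -> 0b0101
query = pack_signs([0.9, 0.4, -0.2, -0.5])  # signs + + - -  -> 0b0011
print(hamming_distance(key, query))          # 2
print(sign_inner_product(key, query, 4))     # 0
```

Because the score reduces to one XOR and one popcount per machine word, the same arithmetic maps naturally onto a GPU kernel.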
Changed
- KVCacheConfig gains signal_bits, noise_bits, rotations fields for SpectralQuant
- KVCacheFactory and KVCacheBuilder updated for all new cache types
v0.6.0
New
- PolarQuant — recursive polar coordinate decomposition for spherical key distributions
- PolarQuantizer, PolarQuantKVCache
- CommVQQuantizer — first version (flat codebook, no Metal fusion yet)
- TurboQuantProdAdaptive — distortion-driven dynamic bit allocation
Changed
- CompositeQuantizer — supports arbitrary-depth chains; cycle detection via CyclicPipelineError
v0.5.1
New
- Metal GPU kernels for VecInfer — hand-written Metal Shading Language shaders replacing pure-MLX hot paths
  - vecinfer_quantize_metal — fused nearest-centroid argmin, 13× speedup, 98% peak-memory reduction
  - vecinfer_dequant_metal — bit-exact drop-in for dequantize_vq
  - metal_available() — capability probe
- KVCacheConfig.use_metal_kernels — three-state flag (None = auto-detect, True = require, False = force MLX)
- VecInferKVCache now dispatches to Metal kernels when available (zero API change)
- 7 new parity tests in tests/cache/test_vecinfer_metal_parity.py
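The three-state flag could resolve to a concrete backend roughly as follows. This is a minimal sketch, not the library's dispatch code: `resolve_use_metal` is a hypothetical helper, and raising an error when Metal is required but missing is an assumption about what "require" means.

```python
from typing import Optional


def resolve_use_metal(flag: Optional[bool], metal_is_available: bool) -> bool:
    """Map a three-state flag to a concrete backend choice.

    None  -> auto-detect: use Metal only if it is available
    True  -> require Metal: fail loudly if it is missing (assumed behaviour)
    False -> force the pure-MLX path regardless of availability
    """
    if flag is None:
        return metal_is_available
    if flag and not metal_is_available:
        raise RuntimeError("Metal kernels were required but are not available")
    return flag


print(resolve_use_metal(None, True))    # True
print(resolve_use_metal(None, False))   # False
print(resolve_use_metal(False, True))   # False
```

Keeping `None` as the default lets existing callers pick up the faster path without any configuration change, while `True` and `False` give tests a way to pin one backend.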
v0.5.0
New
- VecInfer — product VQ with outlier-suppressing dual transform
  - calibrate_smooth_factors() — per-channel λᵢ = √max|Kᵢ|
  - walsh_hadamard_matrix(), apply_dual_transform_keys/queries()
  - train_codebook(), quantize_vq(), dequantize_vq()
  - compute_query_lut() — fused-score fast path
- VecInferKVCache — mlx_lm-compatible cache with update_and_fetch
- Benchmarks: 8× key compression at 2-bit, 16× at 1-bit on Llama-3.2-1B/3B
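The per-channel smooth factor λᵢ = √max|Kᵢ| can be illustrated in plain Python. This is a sketch under stated assumptions, not the library's calibration routine: the maximum is assumed to run over tokens for each channel, `smooth_factors` is a hypothetical name, and the divide-keys/multiply-queries usage is an assumption about how the dual transform applies the factors.

```python
import math


def smooth_factors(keys):
    """Per-channel factors lambda_i = sqrt(max over tokens of |K[t][i]|)."""
    n_channels = len(keys[0])
    return [
        math.sqrt(max(abs(row[i]) for row in keys))
        for i in range(n_channels)
    ]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy key matrix: 3 tokens x 2 channels, with channel 1 carrying an outlier.
keys = [[1.0, -16.0], [0.5, 4.0], [-0.25, 9.0]]
lam = smooth_factors(keys)
print(lam)  # [1.0, 4.0]

# Assumed usage: divide keys by lambda and multiply queries by lambda,
# which shrinks the outlier channel while leaving q.k unchanged.
query = [2.0, 0.5]
k_scaled = [k / l for k, l in zip(keys[0], lam)]
q_scaled = [q * l for q, l in zip(query, lam)]
print(dot(query, keys[0]), dot(q_scaled, k_scaled))  # -6.0 -6.0
```

A real calibration pass would also need a guard for channels that are entirely zero, since λᵢ = 0 makes the division undefined.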
Notes
- Throughput is slightly lower than fp16 (CUDA-style kernel fusion is not available on Metal at this version)
v0.3.6
Breaking change
- Package renamed: mlx_kv_quant → veloxquant_mlx
- All imports must be updated: from mlx_kv_quant import ... → from veloxquant_mlx import ...
- No backward-compatibility shim
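For codebases with many import sites, the rename can be applied mechanically. The helper below is a hypothetical sketch, not a tool shipped with the package; `rewrite_imports` and the placeholder name `some_name` are illustrative only.

```python
import re

# Match the old package name only as a whole identifier.
_OLD_NAME = re.compile(r"\bmlx_kv_quant\b")


def rewrite_imports(source: str) -> str:
    """Replace every whole-word occurrence of the old package name."""
    return _OLD_NAME.sub("veloxquant_mlx", source)


before = "from mlx_kv_quant import some_name\nimport mlx_kv_quant as kv\n"
print(rewrite_imports(before))
```

The pattern also rewrites occurrences inside strings and comments, so it is worth reviewing the resulting diff before committing.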
v0.3.5
New
- RateQuant becomes a first-class feature
  - allocate_bits_ratequant() — reverse-waterfilling allocator (arxiv:2605.06675)
  - calibrate_layer_sensitivities() — activation-norm sensitivity probe (1.6s)
  - fit_distortion_curve() — fits D(b) = α·β^(-b) per layer
- TurboQuantRVQKVCache — mlx_lm-compatible cache wrapper for RVQ
- KeyNormObserver, KeyNormReport — per-token key norm tracking
- KVCacheConfig.bit_width_inlier accepts list[int] for per-layer allocation
- 27 new tests (187 total passing)
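One way to fit a curve of the form D(b) = α·β^(-b) is ordinary least squares in log space, since ln D = ln α − b·ln β is linear in b. The sketch below shows that approach on synthetic data; `fit_exponential_distortion` is a hypothetical name, and whether the library fits the curve this way is an assumption.

```python
import math


def fit_exponential_distortion(bits, distortions):
    """Fit D(b) = alpha * beta**(-b) by least squares in log space.

    Taking logs gives ln D = ln(alpha) - b * ln(beta), a straight line in b.
    """
    n = len(bits)
    logs = [math.log(d) for d in distortions]
    mean_b = sum(bits) / n
    mean_y = sum(logs) / n
    slope = (
        sum((b - mean_b) * (y - mean_y) for b, y in zip(bits, logs))
        / sum((b - mean_b) ** 2 for b in bits)
    )
    intercept = mean_y - slope * mean_b
    return math.exp(intercept), math.exp(-slope)  # (alpha, beta)


# Synthetic measurements generated from alpha = 2, beta = 4.
bits = [1, 2, 3, 4]
distortions = [2 * 4 ** (-b) for b in bits]
alpha, beta = fit_exponential_distortion(bits, distortions)
print(round(alpha, 6), round(beta, 6))  # 2.0 4.0
```

With α and β known per layer, an allocator can compare how much distortion each layer would shed for one extra bit.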
Results (M4 24 GB)
| Model | fp16 PPL | RVQ 1-bit | RateQuant 1.5-bit | Compression |
|---|---|---|---|---|
| Falcon3 7B | 22.9 | 23.1 | 22.8 | 5.22× |
| Gemma3 4B | 39.8 | 37.8 | 36.3 | 5.22× |
v0.3.0
New
- QJL — Johnson-Lindenstrauss 1-bit sign sketch cache
- QJLQuantizer, QJLKVCache
- qjl_encode, qjl_inner_product Metal kernels
- DistortionObserver — cosine similarity and IP error tracking
- LatencyObserver — encode/decode timing profiling
- MemoryObserver — peak memory and compression ratio
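The idea of a 1-bit sign sketch can be shown with a small CPU example: keys keep only the sign of each Gaussian random projection, queries stay unquantized, and the inner product is recovered from a rescaled average. This is a sketch of the general technique under that assumption, not the library's kernels; `sign_sketch` and `estimate_inner_product` are hypothetical names.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sign_sketch(key, projections):
    """Keep only the sign (+1 or -1) of each random projection of the key."""
    return [1 if dot(row, key) >= 0 else -1 for row in projections]


def estimate_inner_product(query, key_signs, key_norm, projections):
    """Estimate <query, key> from the key's sign bits and its norm.

    For Gaussian rows s, E[sign(s.k) * (s.q)] = sqrt(2/pi) * <q, k> / ||k||,
    so rescaling the sample mean by sqrt(pi/2) * ||k|| is unbiased.
    """
    m = len(projections)
    total = sum(s * dot(row, query) for s, row in zip(key_signs, projections))
    return math.sqrt(math.pi / 2) * key_norm * total / m


rng = random.Random(0)
dim, m = 8, 4000
projections = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m)]
key = [rng.uniform(-1, 1) for _ in range(dim)]
query = [rng.uniform(-1, 1) for _ in range(dim)]

signs = sign_sketch(key, projections)
key_norm = math.sqrt(dot(key, key))
exact = dot(query, key)
estimate = estimate_inner_product(query, signs, key_norm, projections)
print(exact, estimate)  # the two values should be close, not identical
```

The estimate is noisy, with error shrinking roughly as one over the square root of the number of projections.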
v0.2.0
New
- TurboQuant RVQ — two-pass residual VQ with Gaussian + Laplacian codebooks
- TurboQuantRVQ quantizer with Walsh-Hadamard preprocessing
- turboquant_scalar_quantize, turboquant_hadamard_quantize Metal kernels
- turboquant_bit_pack, turboquant_bit_unpack — sub-byte packing
- KVCacheConfig, KVCacheFactory, KVCacheBuilder — unified configuration API
- NpyArtifactStore, MemoryArtifactStore — artifact persistence
- QuantizerRegistry — plugin registration
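Sub-byte packing stores several low-bit codes in each byte. The following is a minimal CPU sketch of the idea, not the Metal kernels listed above: `pack_codes` and `unpack_codes` are hypothetical names, and the lowest-bits-first layout is an assumption.

```python
def pack_codes(codes, bits):
    """Pack small integer codes into bytes, lowest bits first (assumed layout)."""
    if 8 % bits != 0:
        raise ValueError("this sketch only handles bit widths that divide 8")
    per_byte = 8 // bits
    out = bytearray((len(codes) + per_byte - 1) // per_byte)
    for i, code in enumerate(codes):
        if not 0 <= code < (1 << bits):
            raise ValueError(f"code {code} does not fit in {bits} bits")
        out[i // per_byte] |= code << ((i % per_byte) * bits)
    return bytes(out)


def unpack_codes(packed, bits, count):
    """Recover `count` codes from bytes produced by pack_codes."""
    per_byte = 8 // bits
    mask = (1 << bits) - 1
    return [
        (packed[i // per_byte] >> ((i % per_byte) * bits)) & mask
        for i in range(count)
    ]


codes = [3, 0, 2, 1, 1]            # five 2-bit codes
packed = pack_codes(codes, bits=2)
print(len(packed))                 # 2 bytes instead of 5
print(unpack_codes(packed, 2, 5))  # [3, 0, 2, 1, 1]
```

The element count is passed to `unpack_codes` explicitly because the final byte may be only partly filled.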
v0.1.0
Initial release
- Core abstractions: Quantizer, KVCache, Preconditioner, Codebook ABCs
- TurboQuantMSE — MSE-optimal rotation + Lloyd-Max scalar quantization
- ScalarCodebook, AdaptiveScalarCodebook
- RotationPreconditioner, JLSketchPreconditioner
- RingBuffer, AVLTree, BitPackBuffer data structures
- Basic test suite (48 tests)
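Lloyd-Max scalar quantization alternates between assigning samples to their nearest reconstruction level and moving each level to the mean of its samples. The sketch below runs that iteration on empirical samples; `lloyd_max` is a hypothetical name, and the library may instead derive levels from an analytical distribution.

```python
def lloyd_max(samples, levels, iterations=20):
    """Refine scalar reconstruction levels by alternating two steps:
    assign each sample to its nearest level, then move each level to
    the mean of the samples assigned to it.
    """
    levels = list(levels)
    for _ in range(iterations):
        buckets = [[] for _ in levels]
        for x in samples:
            nearest = min(range(len(levels)), key=lambda j: (x - levels[j]) ** 2)
            buckets[nearest].append(x)
        # Keep a level unchanged if no sample was assigned to it.
        levels = [
            sum(b) / len(b) if b else level
            for b, level in zip(buckets, levels)
        ]
    return levels


samples = [0.0, 0.0, 1.0, 1.0, 10.0, 10.0, 11.0, 11.0]
print(lloyd_max(samples, levels=[0.0, 11.0]))  # [0.5, 10.5]
```

Each step can only lower the mean squared error, which is why the iteration settles on levels at the centre of each cluster in this example.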
Full commit-level history: GitHub Commits