Independent Research / 2026
Adaptive Tensor-Network Compression of LLMs: An Extension of CompactifAI
Layer-sensitive MPO tensorization with policy-guided bond dimensions
Memory reduction
93%
Best reproduced tensor-network compression setting.
Adaptive gain
+1.2%
Additional recovered accuracy over uniform schedules.
Benchmarks
5
MMLU, HellaSwag, BoolQ, TriviaQA, and GSM8K.
Abstract
This project reproduces and extends CompactifAI-style tensor-network compression on real open-weight LLMs. It profiles layer sensitivity, replaces uniform bond dimensions with adaptive schedules, and evaluates healing runs across standard language benchmarks.
Problem Statement
Uniform tensor-network compression treats transformer blocks as equally redundant, but LLMs show layer-specific fragility. The research question is whether adaptive bond-dimension assignment can preserve downstream quality at the same compression ratio.
Methodology
- Implemented Matrix Product Operator tensorization for self-attention and MLP matrices using sequential SVD.
- Swept the bond dimension chi from 10 to 90, varying it independently per attention block and per layer type.
- Trained a REINFORCE policy to assign per-block bond dimensions using downstream MMLU accuracy as reward.
- Combined adaptive MPO schedules with model healing and optional soft gating adapters.
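The sequential-SVD tensorization in the first step can be sketched as follows. This is a minimal NumPy illustration, not the project's implementation: the function names `mpo_decompose` and `mpo_to_matrix`, the factorization of the matrix dimensions into `out_dims` and `in_dims`, and the single shared bond cap `chi` are all assumptions made for the example.

```python
import numpy as np

def mpo_decompose(W, out_dims, in_dims, chi):
    """Split a dense matrix into MPO cores via sequential truncated SVD.

    Hypothetical helper: W has shape (prod(out_dims), prod(in_dims)).
    Returns cores of shape (r_left, out_k, in_k, r_right), with every
    internal bond capped at chi.
    """
    n = len(out_dims)
    assert len(in_dims) == n
    assert W.shape == (int(np.prod(out_dims)), int(np.prod(in_dims)))
    # Axes (o1..on, i1..in) -> interleaved (o1, i1, o2, i2, ...).
    T = W.reshape(*out_dims, *in_dims)
    T = T.transpose([ax for k in range(n) for ax in (k, n + k)])
    cores, r_prev, rest = [], 1, T
    for k in range(n - 1):
        # Group (left bond, o_k, i_k) as rows and everything else as columns.
        rest = rest.reshape(r_prev * out_dims[k] * in_dims[k], -1)
        U, S, Vt = np.linalg.svd(rest, full_matrices=False)
        r = min(chi, len(S))  # truncate the bond dimension
        cores.append(U[:, :r].reshape(r_prev, out_dims[k], in_dims[k], r))
        rest = S[:r, None] * Vt[:r]  # carry the remainder to the next site
        r_prev = r
    cores.append(rest.reshape(r_prev, out_dims[-1], in_dims[-1], 1))
    return cores

def mpo_to_matrix(cores):
    """Contract the cores back into a dense matrix (for error checks)."""
    n = len(cores)
    T = cores[0]
    for core in cores[1:]:
        T = np.tensordot(T, core, axes=([-1], [0]))
    T = T.reshape(T.shape[1:-1])  # drop the trivial boundary bonds
    # Interleaved (o1, i1, o2, i2, ...) -> (o1..on, i1..in).
    T = T.transpose(list(range(0, 2 * n, 2)) + list(range(1, 2 * n, 2)))
    rows = int(np.prod(T.shape[:n]))
    return T.reshape(rows, -1)

# Usage on a random stand-in for a weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16))
cores = mpo_decompose(W, out_dims=(4, 4), in_dims=(4, 4), chi=4)
rel_err = np.linalg.norm(W - mpo_to_matrix(cores)) / np.linalg.norm(W)
n_params = sum(c.size for c in cores)  # compare against W.size
```

A random matrix has no low-rank structure, so the truncation error here is large; the sketch only shows the mechanics. Trained weight matrices are the case where a small chi is hoped to retain most of the signal.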
Experimental Design
- Reproduced baseline compression behavior on LLaMA-3.2-1B-style and Qwen2.5-1.5B-style targets.
- Profiled 32 attention blocks and seven layer families before constructing a non-uniform compression schedule.
- Ran one epoch of Alpaca-style healing after compression.
- Evaluated with lm-evaluation-harness across MMLU, HellaSwag, BoolQ, TriviaQA, and GSM8K.
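One way to turn a sensitivity profile into a non-uniform schedule at a fixed parameter budget is a greedy allocation, sketched below. This is an illustrative heuristic and not necessarily the procedure used in the project: the function names, the two-core MPO shape (a 2048 x 2048 matrix factored as 32 x 64 on each side), the sensitivity scores, and the sensitivity-per-chi priority rule are all assumptions.

```python
def two_core_mpo_params(chi, out_dims=(32, 64), in_dims=(32, 64)):
    """Parameter count of a two-core MPO with one internal bond of size chi.

    Core shapes are (1, o1, i1, chi) and (chi, o2, i2, 1), so the count
    is linear in chi. The default dims are hypothetical.
    """
    (o1, o2), (i1, i2) = out_dims, in_dims
    return chi * (o1 * i1 + o2 * i2)

def allocate_bond_dims(sensitivity, budget, chi_min=10, chi_max=90, step=10):
    """Greedily assign per-block chi under a total parameter budget.

    Every block starts at chi_min. Each round gives one extra `step` of
    bond dimension to the block with the highest sensitivity per unit of
    chi already granted (a simple diminishing-returns rule).
    """
    chi = [chi_min] * len(sensitivity)
    spent = sum(two_core_mpo_params(c) for c in chi)
    if spent > budget:
        raise ValueError("budget is too small for chi_min on every block")
    step_cost = two_core_mpo_params(step)  # constant, since cost is linear
    while spent + step_cost <= budget:
        candidates = [b for b in range(len(chi)) if chi[b] + step <= chi_max]
        if not candidates:
            break
        best = max(candidates, key=lambda b: sensitivity[b] / chi[b])
        chi[best] += step
        spent += step_cost
    return chi

# Usage: match the parameter budget of a uniform chi=30 schedule.
sensitivity = [8.0, 4.0, 2.0, 1.0]  # made-up scores, early blocks most fragile
budget = len(sensitivity) * two_core_mpo_params(30)
schedule = allocate_bond_dims(sensitivity, budget)
```

Because the two-core parameter count is linear in chi, two schedules over identically shaped blocks have matched compression exactly when their chi values sum to the same total, which makes the uniform baseline easy to compare against.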
Results
- Matched the original 93% memory-reduction target at 1B-scale reproduction settings.
- Found that initial blocks collapse below chi=50, while terminal blocks tolerate chi=10 with an MMLU drop under 1%.
- The adaptive policy recovered an additional 1.2% accuracy over uniform-chi baselines at matched compression.
- A 70% parameter reduction produced a 2% to 3% drop in downstream accuracy after healing.
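For readers unfamiliar with the policy-gradient setup behind the adaptive result, the toy loop below shows the REINFORCE update for a per-block categorical policy over bond dimensions. It is a sketch under stated assumptions: the project uses downstream MMLU accuracy as the reward, whereas `toy_reward` here is a synthetic stand-in, and `CHI_CHOICES`, the moving-average baseline, and all hyperparameters are hypothetical.

```python
import math
import random

CHI_CHOICES = [10, 30, 50, 70, 90]  # hypothetical discrete action set

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_reward(chis, sensitivity, cost_weight=0.004):
    """Synthetic stand-in for benchmark accuracy minus a size penalty.

    Quality loss decays with chi and scales with block sensitivity;
    the penalty grows linearly with the total bond dimension.
    """
    quality = -sum(s * math.exp(-c / 30.0) for s, c in zip(sensitivity, chis))
    return quality - cost_weight * sum(chis)

def train_policy(sensitivity, steps=2000, lr=0.1, seed=0):
    """REINFORCE with one independent softmax policy per block."""
    rng = random.Random(seed)
    n_actions = len(CHI_CHOICES)
    logits = [[0.0] * n_actions for _ in sensitivity]
    baseline = 0.0  # moving average of rewards, used to reduce variance
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        actions = [rng.choices(range(n_actions), weights=p)[0] for p in probs]
        reward = toy_reward([CHI_CHOICES[a] for a in actions], sensitivity)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        for b, a in enumerate(actions):
            for j in range(n_actions):
                # Gradient of log softmax: indicator(j == a) - p_j.
                grad = (1.0 if j == a else 0.0) - probs[b][j]
                logits[b][j] += lr * advantage * grad
    return [softmax(row) for row in logits]

# Usage: made-up sensitivities, first block fragile and last block robust.
policy = train_policy([2.0, 0.5, 0.05])
greedy_schedule = [CHI_CHOICES[p.index(max(p))] for p in policy]
```

In a real run each reward evaluation is a full benchmark pass over a compressed model, so the number of policy steps, rather than the update itself, dominates the cost.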
Limitations
- The reproduction is computationally verifiable only at smaller model scale; it is not a full 7B training campaign.
- Policy search cost grows with block count and benchmark feedback latency.
- Compression interacts with quantization and adapter healing in ways that need more isolation.
Future Directions
- Replace REINFORCE with differentiable schedule search or bandit-style block allocation.
- Profile attention heads and MLP projections separately instead of using block-level schedules.
- Test adaptive tensorization under long-context inference and KV-cache pressure.
- Publish per-layer sensitivity traces as reusable compression priors.
References
BibTeX
@misc{pal2026compactifai,
title={Adaptive Tensor-Network Compression of LLMs: An Extension of CompactifAI},
author={Pal, Mani},
year={2026},
note={Independent research manuscript}
}