| On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters | Efficiency & systems | Mind Lab +39 | Jun 1, 2026 | 224 |
| OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources | Efficiency & systems | Jinheon Baek +7 | May 28, 2026 | 77 |
| Trust-Region Behavior Blending for On-Policy Distillation | Efficiency & systems | Daniil Plyusov +6 | May 29, 2026 | 65 |
| NITP: Next Implicit Token Prediction for LLM Pre-training | Efficiency & systems | Xiangdong Zhang +5 | May 24, 2026 | 35 |
| UniSteer: Text-Guided Flow Matching in Activation Space for Versatile LLM Steering | Efficiency & systems | Yingdong Shi +6 | May 28, 2026 | 26 |
| KletterMix: Climbing Toward High-Quality German Pretraining Data | Efficiency & systems | Maurice Kraus +6 | Jun 2, 2026 | 17 |
| MobileMoE: Scaling On-Device Mixture of Experts | Efficiency & systems | Yanbei Chen +7 | May 26, 2026 | 15 |
| Do Language Models Need Sleep? Offline Recurrence for Improved Online Inference | Efficiency & systems | Sangyun Lee +3 | May 25, 2026 | 12 |
| Reflective Prompt Tuning through Language Model Function-Calling | Efficiency & systems | Farima Fatahi Bayat +3 | May 20, 2026 | 9 |
| Skip a Layer or Loop It? Learning Program-of-Layers in LLMs | Efficiency & systems | Ziyue Li +2 | Jun 4, 2026 | 8 |
| Value-Aware Stochastic KV Cache Eviction for Reasoning Models | Efficiency & systems | Ting-Yun Chang +5 | Jun 2, 2026 | 8 |
| PEFT-Arena: Understanding Parameter-Efficient Finetuning from a Stability-Plasticity Perspective | Efficiency & systems | Yangyi Huang +6 | May 27, 2026 | 8 |
| Breaking the Bubble: Asynchronous Pipeline Parallel Training with Bounded Weight Inconsistency | Efficiency & systems | Itay Elam +3 | Jun 5, 2026 | 7 |
| The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs | Efficiency & systems | Xu Wan +6 | Jun 2, 2026 | 7 |
| αDepth: Learning Single-Pass Soft Boundary Decomposition for Stereo Conversion | Efficiency & systems | Xiang Zhang +5 | May 29, 2026 | 7 |
| Adapting Multilingual Embedding Models to Turkish via Cross-Lingual Tokenizer Surgery and Offline Distillation | Efficiency & systems | M. Ali Bayram +2 | May 28, 2026 | 6 |
| The Hidden Power of Scaling Factor in LoRA Optimization | Efficiency & systems | Zicheng Zhang +12 | Jun 11, 2026 | 5 |
| Dynamic Linear Attention | Efficiency & systems | Xin Wang +9 | Jun 9, 2026 | 5 |
| Flash-GMM: A Memory-Efficient Kernel for Scalable Soft Clustering | Efficiency & systems | Gal Bloch +4 | Jun 9, 2026 | 3 |
| When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges | Efficiency & systems | Parth Darshan +1 | May 25, 2026 | 3 |
| STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations | Efficiency & systems | Rishit Dagli +6 | Jun 3, 2026 | 3 |
| Lius: Translation Model Based Instructional Linguistic Using Continual Instruction Tuning In Kupang Malay | Efficiency & systems | Joanito Agili Lopo +2 | Jun 10, 2026 | 2 |
| Compress-Distill: Reasoning Trace Compression for Efficient Knowledge Distillation | Efficiency & systems | Maxime Griot +2 | Jun 4, 2026 | 2 |
| Can Predicted Dynamics Exist in the Physical World? | Efficiency & systems | Barak Or | May 23, 2026 | 2 |
| The Good, the Bad, and the Ugly of Markov Boundary for Tabular Prediction | Efficiency & systems | Shu Wan +3 | May 28, 2026 | 2 |
| SigmaScale: LLM Compression with SVD-based Low-Rank Decomposition and Learned Scaling Matrices | Efficiency & systems | Ernests Lavrinovics +5 | Jun 5, 2026 | 1 |
| Pruning and Distilling Mixture-of-Experts into Dense Language Models | Efficiency & systems | Junhyuck Kim +5 | May 27, 2026 | 1 |
| Deep Embedded Multiplicative DMD for Algebra-Preserving Koopman Learning | Efficiency & systems | Kelan Gray +3 | Jun 3, 2026 | 1 |
| The Hamilton-Jacobi Theory of Deep Learning | Efficiency & systems | Jose Marie Antonio Miñoza +2 | May 27, 2026 | 1 |