| OCC-RAG: Optimal Cognitive Core for Faithful Question Answering | Safety & alignment | Maksim Savkin +9 | May 30, 2026 | 89 |
| Redesign Mixture-of-Experts Routers with Manifold Power Iteration | Safety & alignment | Songhao Wu +3 | Jun 10, 2026 | 84 |
| GENEB: Why Genomic Models Are Hard to Compare | Safety & alignment | Daria Ledneva +2 | Jun 3, 2026 | 46 |
| Human Psychometric Questionnaires Mischaracterize LLM Behavior | Safety & alignment | Woojung Song +5 | May 29, 2026 | 35 |
| A Geometric Account of Activation Steering through Angle-Norm Decomposition | Safety & alignment | Georgii Aparin +1 | Jun 4, 2026 | 21 |
| Is Position Bias in Dense Retrievers Built In-or Learned from Data? | Safety & alignment | Daegon Yu +2 | May 26, 2026 | 20 |
| Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models | Safety & alignment | Mingze Wang +5 | May 26, 2026 | 20 |
| LLM Explainability with Counterfactual Chains and Causal Graphs | Safety & alignment | Nirit Nussbaum-Hoffer +3 | Jun 4, 2026 | 16 |
| ICA Lens: Interpreting Language Models Without Training Another Dictionary | Safety & alignment | Sida Liu +1 | Jun 10, 2026 | 15 |
| Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It | Safety & alignment | Xinyu Zhou +6 | Jun 9, 2026 | 15 |
| UniSHARP: Universal Sharp Monocular View Synthesis | Safety & alignment | Meixi Song +6 | Jun 5, 2026 | 14 |
| Less is More: Early Stopping Rollout for On-Policy Distillation | Safety & alignment | Zhou Ziheng +4 | May 26, 2026 | 13 |
| PaperFlow: Profiling, Recommending, and Adapting Across Daily Paper Streams | Safety & alignment | Fuqiang Wang +10 | Jun 5, 2026 | 12 |
| Parallax: Parameterized Local Linear Attention for Language Modeling | Safety & alignment | Yifei Zuo +5 | May 27, 2026 | 11 |
| Why Muon Outperforms Adam: A Curvature Perspective | Safety & alignment | Shuche Wang +4 | Jun 3, 2026 | 10 |
| Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces | Safety & alignment | Chen He +4 | May 28, 2026 | 9 |
| ACL-Verbatim: hallucination-free question answering for research | Safety & alignment | Gábor Recski +4 | May 20, 2026 | 8 |
| PsychoSafe: Eliciting Psychologically-Informed Refusals in Large Language Models | Safety & alignment | Gianluca Barmina +9 | Jun 8, 2026 | 6 |
| Neural Networks Provably Learn Spectral Representations for Group Composition | Safety & alignment | Jianliang He +4 | Jun 2, 2026 | 6 |
| Rethinking Psychometric Evaluation of LLMs: When and Why Self-Reports Predict Behavior | Safety & alignment | Rafal Kocielnik +7 | Jun 10, 2026 | 5 |
| UniPET: a universal network for high-quality PET image denoising across varied dose reduction factors | Safety & alignment | Zhiwen Yang +6 | Jun 9, 2026 | 5 |
| Conditional Hypothesis Generation for LLM-Based Text Analysis with Researcher-Specified Covariates | Safety & alignment | Paiheng Xu +2 | Jun 2, 2026 | 5 |
| Emergent Misalignment Can Be Induced by Sycophancy and Reversed via Alignment Gating | Safety & alignment | Sicheng Wang +8 | Jun 8, 2026 | 4 |
| Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting | Safety & alignment | Avijit Ghosh +39 | Jun 8, 2026 | 4 |
| Broadening Access to Transportation Safety Data with Generative AI: A Schema-Grounded Framework for Spatial Natural Language Queries | Safety & alignment | Mahdi Azhdari +1 | May 20, 2026 | 4 |
| Large Language Models Are Overconfident in Their Own Responses | Safety & alignment | Mario Sanz-Guerrero +2 | Jun 2, 2026 | 3 |
| The Role of Feedback Alignment in Self-Distillation | Safety & alignment | Semih Kara +1 | Jun 9, 2026 | 3 |
| When the Chain of Thought Knows Better: Failure Modes in Multi-Turn Reasoning Models | Safety & alignment | Sai Kartheek Reddy Kasu +2 | Jun 9, 2026 | 2 |
| When Behavioral Safety Evaluation Fails: A Representation-Level Perspective | Safety & alignment | Enyi Jiang +3 | Jun 6, 2026 | 1 |
| ECI_{sem}: Semantic Residual Effective Contrastive Information for Evaluating Hard Negatives | Safety & alignment | Aarush Sinha +2 | Jun 5, 2026 | 1 |
| Probing Outcome-Level Resemblance and Mechanism-Level Alignment in LLM Risk Decisions: Evidence from the St. Petersburg Game | Safety & alignment | Chensong Huang +5 | Jun 3, 2026 | 1 |
| Measuring the Symmetry–Data Exchange Rate | Safety & alignment | Ahmed M. Adly | May 31, 2026 | 1 |