Literature

Papers

Notable = Hugging Face daily papers (community-upvoted) · every paper links to arXiv · citations from OpenAlex

Showing 1–32 of 32 notable papers

Paper	Topic	Authors	Published	HF upvotes
OCC-RAG: Optimal Cognitive Core for Faithful Question Answering	Safety & alignment	Maksim Savkin +9	May 30, 2026	89
Redesign Mixture-of-Experts Routers with Manifold Power Iteration	Safety & alignment	Songhao Wu +3	Jun 10, 2026	84
GENEB: Why Genomic Models Are Hard to Compare	Safety & alignment	Daria Ledneva +2	Jun 3, 2026	46
Human Psychometric Questionnaires Mischaracterize LLM Behavior	Safety & alignment	Woojung Song +5	May 29, 2026	35
A Geometric Account of Activation Steering through Angle-Norm Decomposition	Safety & alignment	Georgii Aparin +1	Jun 4, 2026	21
Is Position Bias in Dense Retrievers Built In-or Learned from Data?	Safety & alignment	Daegon Yu +2	May 26, 2026	20
Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models	Safety & alignment	Mingze Wang +5	May 26, 2026	20
LLM Explainability with Counterfactual Chains and Causal Graphs	Safety & alignment	Nirit Nussbaum-Hoffer +3	Jun 4, 2026	16
ICA Lens: Interpreting Language Models Without Training Another Dictionary	Safety & alignment	Sida Liu +1	Jun 10, 2026	15
Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It	Safety & alignment	Xinyu Zhou +6	Jun 9, 2026	15
UniSHARP: Universal Sharp Monocular View Synthesis	Safety & alignment	Meixi Song +6	Jun 5, 2026	14
Less is More: Early Stopping Rollout for On-Policy Distillation	Safety & alignment	Zhou Ziheng +4	May 26, 2026	13
PaperFlow: Profiling, Recommending, and Adapting Across Daily Paper Streams	Safety & alignment	Fuqiang Wang +10	Jun 5, 2026	12
Parallax: Parameterized Local Linear Attention for Language Modeling	Safety & alignment	Yifei Zuo +5	May 27, 2026	11
Why Muon Outperforms Adam: A Curvature Perspective	Safety & alignment	Shuche Wang +4	Jun 3, 2026	10
Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces	Safety & alignment	Chen He +4	May 28, 2026	9
ACL-Verbatim: hallucination-free question answering for research	Safety & alignment	Gábor Recski +4	May 20, 2026	8
PsychoSafe: Eliciting Psychologically-Informed Refusals in Large Language Models	Safety & alignment	Gianluca Barmina +9	Jun 8, 2026	6
Neural Networks Provably Learn Spectral Representations for Group Composition	Safety & alignment	Jianliang He +4	Jun 2, 2026	6
Rethinking Psychometric Evaluation of LLMs: When and Why Self-Reports Predict Behavior	Safety & alignment	Rafal Kocielnik +7	Jun 10, 2026	5
UniPET: a universal network for high-quality PET image denoising across varied dose reduction factors	Safety & alignment	Zhiwen Yang +6	Jun 9, 2026	5
Conditional Hypothesis Generation for LLM-Based Text Analysis with Researcher-Specified Covariates	Safety & alignment	Paiheng Xu +2	Jun 2, 2026	5
Emergent Misalignment Can Be Induced by Sycophancy and Reversed via Alignment Gating	Safety & alignment	Sicheng Wang +8	Jun 8, 2026	4
Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting	Safety & alignment	Avijit Ghosh +39	Jun 8, 2026	4
Broadening Access to Transportation Safety Data with Generative AI: A Schema-Grounded Framework for Spatial Natural Language Queries	Safety & alignment	Mahdi Azhdari +1	May 20, 2026	4
Large Language Models Are Overconfident in Their Own Responses	Safety & alignment	Mario Sanz-Guerrero +2	Jun 2, 2026	3
The Role of Feedback Alignment in Self-Distillation	Safety & alignment	Semih Kara +1	Jun 9, 2026	3
When the Chain of Thought Knows Better: Failure Modes in Multi-Turn Reasoning Models	Safety & alignment	Sai Kartheek Reddy Kasu +2	Jun 9, 2026	2
When Behavioral Safety Evaluation Fails: A Representation-Level Perspective	Safety & alignment	Enyi Jiang +3	Jun 6, 2026	1
ECI_{sem}: Semantic Residual Effective Contrastive Information for Evaluating Hard Negatives	Safety & alignment	Aarush Sinha +2	Jun 5, 2026	1
Probing Outcome-Level Resemblance and Mechanism-Level Alignment in LLM Risk Decisions: Evidence from the St. Petersburg Game	Safety & alignment	Chensong Huang +5	Jun 3, 2026	1
Measuring the Symmetry–Data Exchange Rate	Safety & alignment	Ahmed M. Adly	May 31, 2026	1