MODELYST
Literature

Papers

Showing 1–32 of 32 notable papers
Paper | Topic | Authors | Published | HF ▲
OCC-RAG: Optimal Cognitive Core for Faithful Question Answering | Safety & alignment | Maksim Savkin +9 | May 30, 2026 | 89
Redesign Mixture-of-Experts Routers with Manifold Power Iteration | Safety & alignment | Songhao Wu +3 | Jun 10, 2026 | 84
GENEB: Why Genomic Models Are Hard to Compare | Safety & alignment | Daria Ledneva +2 | Jun 3, 2026 | 46
Human Psychometric Questionnaires Mischaracterize LLM Behavior | Safety & alignment | Woojung Song +5 | May 29, 2026 | 35
A Geometric Account of Activation Steering through Angle-Norm Decomposition | Safety & alignment | Georgii Aparin +1 | Jun 4, 2026 | 21
Is Position Bias in Dense Retrievers Built In-or Learned from Data? | Safety & alignment | Daegon Yu +2 | May 26, 2026 | 20
Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models | Safety & alignment | Mingze Wang +5 | May 26, 2026 | 20
LLM Explainability with Counterfactual Chains and Causal Graphs | Safety & alignment | Nirit Nussbaum-Hoffer +3 | Jun 4, 2026 | 16
ICA Lens: Interpreting Language Models Without Training Another Dictionary | Safety & alignment | Sida Liu +1 | Jun 10, 2026 | 15
Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It | Safety & alignment | Xinyu Zhou +6 | Jun 9, 2026 | 15
UniSHARP: Universal Sharp Monocular View Synthesis | Safety & alignment | Meixi Song +6 | Jun 5, 2026 | 14
Less is More: Early Stopping Rollout for On-Policy Distillation | Safety & alignment | Zhou Ziheng +4 | May 26, 2026 | 13
PaperFlow: Profiling, Recommending, and Adapting Across Daily Paper Streams | Safety & alignment | Fuqiang Wang +10 | Jun 5, 2026 | 12
Parallax: Parameterized Local Linear Attention for Language Modeling | Safety & alignment | Yifei Zuo +5 | May 27, 2026 | 11
Why Muon Outperforms Adam: A Curvature Perspective | Safety & alignment | Shuche Wang +4 | Jun 3, 2026 | 10
Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces | Safety & alignment | Chen He +4 | May 28, 2026 | 9
ACL-Verbatim: hallucination-free question answering for research | Safety & alignment | Gábor Recski +4 | May 20, 2026 | 8
PsychoSafe: Eliciting Psychologically-Informed Refusals in Large Language Models | Safety & alignment | Gianluca Barmina +9 | Jun 8, 2026 | 6
Neural Networks Provably Learn Spectral Representations for Group Composition | Safety & alignment | Jianliang He +4 | Jun 2, 2026 | 6
Rethinking Psychometric Evaluation of LLMs: When and Why Self-Reports Predict Behavior | Safety & alignment | Rafal Kocielnik +7 | Jun 10, 2026 | 5
UniPET: a universal network for high-quality PET image denoising across varied dose reduction factors | Safety & alignment | Zhiwen Yang +6 | Jun 9, 2026 | 5
Conditional Hypothesis Generation for LLM-Based Text Analysis with Researcher-Specified Covariates | Safety & alignment | Paiheng Xu +2 | Jun 2, 2026 | 5
Emergent Misalignment Can Be Induced by Sycophancy and Reversed via Alignment Gating | Safety & alignment | Sicheng Wang +8 | Jun 8, 2026 | 4
Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting | Safety & alignment | Avijit Ghosh +39 | Jun 8, 2026 | 4
Broadening Access to Transportation Safety Data with Generative AI: A Schema-Grounded Framework for Spatial Natural Language Queries | Safety & alignment | Mahdi Azhdari +1 | May 20, 2026 | 4
Large Language Models Are Overconfident in Their Own Responses | Safety & alignment | Mario Sanz-Guerrero +2 | Jun 2, 2026 | 3
The Role of Feedback Alignment in Self-Distillation | Safety & alignment | Semih Kara +1 | Jun 9, 2026 | 3
When the Chain of Thought Knows Better: Failure Modes in Multi-Turn Reasoning Models | Safety & alignment | Sai Kartheek Reddy Kasu +2 | Jun 9, 2026 | 2
When Behavioral Safety Evaluation Fails: A Representation-Level Perspective | Safety & alignment | Enyi Jiang +3 | Jun 6, 2026 | 1
ECI_{sem}: Semantic Residual Effective Contrastive Information for Evaluating Hard Negatives | Safety & alignment | Aarush Sinha +2 | Jun 5, 2026 | 1
Probing Outcome-Level Resemblance and Mechanism-Level Alignment in LLM Risk Decisions: Evidence from the St. Petersburg Game | Safety & alignment | Chensong Huang +5 | Jun 3, 2026 | 1
Measuring the Symmetry--Data Exchange Rate | Safety & alignment | Ahmed M. Adly | May 31, 2026 | 1