Papers

Showing 101–155 of 155 notable papers
| Paper | Topic | Authors | Published | HF ▲ |
| --- | --- | --- | --- | --- |
| SAM: State-Adaptive Memory for Long-Horizon Reasoning Agent | Agents | Yuyang Hu +7 | May 23, 2026 | 9 |
| Economy of Minds: Emerging Multi-Agent Intelligence with Economic Interactions | Agents | Zhenting Qi +15 | Jun 1, 2026 | 8 |
| FineVerify: Scaling Test-Time Compute with Fine-Grained Self-Verification for Agentic Search | Agents | James Xu Zhao +3 | May 30, 2026 | 8 |
| AgensFlow: A Coordination-Policy Substrate for Multi-Agent Systems | Agents | Nicole Koenigstein | May 26, 2026 | 8 |
| FastKernels: Benchmarking GPU Kernel Generation in Production | Agents | Gabriele Oliaro +7 | May 22, 2026 | 8 |
| LACUNA: Safe Agents as Recursive Program Holes | Agents | Yaoyu Zhao +5 | May 27, 2026 | 7 |
| Agentic CLEAR: Automating Multi-Level Evaluation of LLM Agents | Agents | Asaf Yehudai +2 | May 21, 2026 | 7 |
| DuMate-DeepResearch: An Auditable Multi-Agent System with Recursive Search and Rubric-Grounded Reasoning | Agents | Lingyong Yan +15 | Jun 5, 2026 | 6 |
| Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory | Agents | Ruida Wang +5 | Jun 2, 2026 | 6 |
| HarnessForge: Joint Harness and Policy Evolution for Adaptive Agent Systems | Agents | Mingju Chen +4 | Jun 1, 2026 | 6 |
| Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents | Agents | Ziyan Liu +9 | May 28, 2026 | 6 |
| AUDITFLOW: Executable Symbolic Environments for Structured Financial Reporting Verification | Agents | Yan Wang +9 | Jun 2, 2026 | 6 |
| OR-Space: A Full-Lifecycle Workspace Benchmark for Industrial Optimization Agents | Agents | Chenyu Zhou +5 | May 27, 2026 | 6 |
| ESC-Skills: Discovering and Self-Evolving Skills for Emotional Support Conversations | Agents | Jie Zhu +7 | May 27, 2026 | 6 |
| Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization | Agents | Anmol Agarwal +8 | May 26, 2026 | 6 |
| AgentFugue: Agent Scaling for Long-Horizon Tasks through Collective Reasoning | Agents | Yuyang Hu +7 | May 23, 2026 | 6 |
| How Far Will They Go? Red-Teaming Online Influence with Large Language Models | Agents | Daniel C. Ruiz +4 | May 20, 2026 | 6 |
| Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests | Agents | Thanawat Lodkaew +5 | Jun 5, 2026 | 5 |
| What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems | Agents | Chen Huang +2 | Jun 3, 2026 | 5 |
| SePO: Self-Evolving Prompt Agent for System Prompt Optimization | Agents | Wangcheng Tao +2 | Jun 3, 2026 | 5 |
| Absorbing Complexity: An Interaction-Native Knowledge Harness for Financial LLM Agents | Agents | Ailiya Borjigin +6 | Jun 1, 2026 | 5 |
| Decoupling Communication from Policy: Robust MARL under Bandwidth Constraints | Agents | Alexi Canesse +3 | May 20, 2026 | 5 |
| Measuring Epistemic Resilience of LLMs Under Misleading Medical Context | Agents | Hongjian Zhou +21 | Jun 10, 2026 | 4 |
| EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge | Agents | Yunhan Wang +4 | Jun 11, 2026 | 4 |
| Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents | Agents | Yujun Zhou +10 | Jun 11, 2026 | 4 |
| Towards Retrieving Interaction Spaces for Agentic Search | Agents | Shengyao Zhuang +4 | Jun 5, 2026 | 4 |
| DAR: Deontic Reasoning with Agentic Harnesses | Agents | Guangyao Dou +3 | Jun 3, 2026 | 4 |
| STREAM: A Data-Centric Framework for Mining High-Value Task-Oriented Dialogues from Streaming Media | Agents | Liang Xue +5 | May 24, 2026 | 4 |
| See What I See, Know What I Think: Dense Latent Communication Across Heterogeneous Agents | Agents | Siyi Chen +9 | Jun 11, 2026 | 3 |
| Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents | Agents | Kushal Raj Bhandari +6 | Jun 10, 2026 | 3 |
| Which Models Are Our Models Built On? Auditing Invisible Dependencies in Modern LLMs | Agents | Sanjay Adhikesaven +2 | Jun 10, 2026 | 3 |
| Decentralized Multi-Agent Systems with Shared Context | Agents | Yuzhen Mao +1 | Jun 9, 2026 | 3 |
| Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory | Agents | Haoran Sun +10 | Jun 8, 2026 | 3 |
| EvoDS: Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management | Agents | Zherui Yang +3 | Jun 2, 2026 | 3 |
| Evaluating Large Language Models in Dynamic Clinical Decision-Making with Standardized Patient Cases | Agents | Cheng Liang +5 | Jun 3, 2026 | 3 |
| Agent libOS: A Library-OS-Inspired Runtime for Long-Running, Capability-Controlled LLM Agents | Agents | Yingqi Zhang | Jun 2, 2026 | 3 |
| The Meta-Agent Challenge: Are Current Agents Capable of Autonomous Agent Development? | Agents | Xinyu Lu +10 | Jun 3, 2026 | 3 |
| The Cold-Start Safety Gap in LLM Agents | Agents | Chung-En Sun +2 | Jun 5, 2026 | 2 |
| ToolSense: A Diagnostic Framework for Auditing Parametric Tool Knowledge in LLMs | Agents | Ashutosh Hathidara +3 | Jun 4, 2026 | 2 |
| PaperMentor: A Human-Centered Multi-Agent Writing Tutor for AI Research Papers on Overleaf | Agents | Jiarui Liu +19 | Jun 7, 2026 | 2 |
| LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models | Agents | Prateek Kumar Sikdar | Jun 1, 2026 | 2 |
| AgentCL: Toward Rigorous Evaluation of Continual Learning in Language Agents | Agents | Yiheng Shu +5 | Jun 2, 2026 | 2 |
| Discovering Cooperative Pipelines: Autoresearch for Sequential Social Dilemmas | Agents | Víctor Gallego | May 28, 2026 | 2 |
| τ-Rec: A Verifiable Benchmark for Agentic Recommender Systems | Agents | Bharath Sivaram Narasimhan +1 | Jun 8, 2026 | 1 |
| Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops | Agents | Ziqian Zhong +5 | Jun 8, 2026 | 1 |
| PBSD: Privileged Bayesian Self-Distillation for Long-Horizon Credit Assignment | Agents | Yang Tian +7 | Jun 8, 2026 | 1 |
| Honest Lying: Understanding Memory Confabulation in Reflexive Agents | Agents | Prakhar Dixit +2 | May 28, 2026 | 1 |
| Parametric Social Identity Injection and Diversification in Public Opinion Simulation | Agents | Hexi Wang +4 | Jun 1, 2026 | 1 |
| AURA: Intent-Directed Probing for Implicit-Need Surfacing in Situated LLM Agents | Agents | Yang Li +3 | Jun 4, 2026 | 1 |
| LLM Anonymization Against Agentic Re-Identification | Agents | Ziwen Li +2 | Jun 1, 2026 | 1 |
| Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning | Agents | Yu Xia +6 | Jun 2, 2026 | 1 |
| AI, Take the Wheel: What Drives Delegation and Trust in Human-Computer Cooperative Question Answering? | Agents | Maharshi Gor +6 | May 27, 2026 | 1 |
| Beyond Recall: Behavioral Specification as an Interpretive Layer for AI Personalization | Agents | Aarik Gulaya | May 27, 2026 | 1 |
| ORACLE: Anticipating Scams from Partial Trajectories in Streaming App Usage | Agents | Wenbo Gao +8 | May 9, 2026 | 1 |
| Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems | Agents | Aman Priyanshu +2 | May 26, 2026 | 1 |