MODELYST
Literature

Papers

Showing 1–74 of 74 notable papers
Paper | Topic | Authors | Published | HF ▲
ABot-Earth 0.5: Generative 3D Earth Model | Robotics | Ming Qian +27 | Jun 8, 2026 | 466
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players | Robotics | Fangfu Liu +9 | May 27, 2026 | 423
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments | Robotics | Qiuyue Wang +39 | May 28, 2026 | 140
Cosmos 3: Omnimodal World Models for Physical AI | Robotics | Aditi +39 | Jun 1, 2026 | 108
InterleaveThinker: Reinforcing Agentic Interleaved Generation | Robotics | Dian Zheng +6 | Jun 11, 2026 | 77
SpatialBench: Is Your Spatial Foundation Model an All-Round Player? | Robotics | Haosong Peng +12 | May 26, 2026 | 71
Self-Improving Language Models with Bidirectional Evolutionary Search | Robotics | Guowei Xu +6 | May 27, 2026 | 59
LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories | Robotics | Baochang Ren +17 | Jun 11, 2026 | 53
GEM: Generative Supervision Helps Embodied Intelligence | Robotics | Ruowen Zhao +11 | May 27, 2026 | 41
Task-Focused Memorization for Multimodal Agents | Robotics | Tao Zou +4 | May 29, 2026 | 38
OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs | Robotics | Yifei Li +6 | Jun 2, 2026 | 31
WorldOlympiad: Can Your World Model Survive a Triathlon? | Robotics | Yuke Zhao +10 | Jun 9, 2026 | 30
Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration? | Robotics | Liyang Li +7 | May 31, 2026 | 30
AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization | Robotics | Yu Li +10 | Jun 5, 2026 | 29
Robots Need More than VLA and World Models | Robotics | Elis Karcini +8 | Jun 4, 2026 | 27
LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence | Robotics | Xiang An +29 | May 25, 2026 | 27
Direct 3D-Aware Object Insertion via Decomposed Visual Proxies | Robotics | Jingbo Gong +8 | Jun 4, 2026 | 26
RobotValues: Evaluating Household Robots When Human Values Conflict | Robotics | Jongwook Han +2 | Jun 2, 2026 | 26
World Pilot: Steering Vision-Language-Action Models with World-Action Priors | Robotics | Zefu Lin +6 | Jun 10, 2026 | 23
WALL-WM: Carving World Action Modeling at the Event Joints | Robotics | Shalfun Li +30 | Jun 1, 2026 | 23
WorldCraft: From Camera Navigation to Object Manipulation in Interactive Video World Models | Robotics | Bohai Gu +11 | May 24, 2026 | 22
NVIDIA OmniDreams: Real-Time Generative World Model for Closed-Loop Autonomous Vehicle Simulation | Robotics | NVIDIA +33 | Jun 2, 2026 | 22
The Road Ahead in Autonomous Driving: The KITScenes Multimodal Dataset | Robotics | Richard Schwarzkopf +23 | Jun 1, 2026 | 18
CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists | Robotics | Junlin Yang +9 | May 25, 2026 | 18
GE-Sim 2.0: A Roadmap Towards Comprehensive Closed-loop Video World Simulators for Robotic Manipulation | Robotics | Boxiang Qiu +14 | May 26, 2026 | 17
Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving | Robotics | Kewei Zhang +11 | May 22, 2026 | 17
Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation? | Robotics | Rui Zhao +8 | Jun 4, 2026 | 16
Rethinking VLM Representation for VLA Initialization | Robotics | Weifeng Lin +7 | May 25, 2026 | 15
AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing | Robotics | Jisong Cai +12 | Jun 8, 2026 | 14
DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation | Robotics | Jusuk Lee +8 | May 28, 2026 | 13
The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages | Robotics | Eric Onyame +4 | May 27, 2026 | 13
World Model Self-Distillation: Training World Models to Solve General Tasks | Robotics | Sebastian Stapf +3 | Jun 10, 2026 | 12
Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models | Robotics | Yifu Yuan +22 | Jun 9, 2026 | 11
MineExplorer: Evaluating Open-World Exploration of MLLM Agents in Minecraft | Robotics | Tianjie Ju +9 | May 29, 2026 | 11
Light-WAM: Efficient World Action Models with State-Fusion Action Decoding | Robotics | Ziang Li +7 | Jun 6, 2026 | 10
AffordanceVLA: A Vision-Language-Action Model Empowering Action Generation through Affordance-Aware Understanding | Robotics | Qize Yu +12 | Jun 4, 2026 | 10
RoboStressBench: Benchmarking VLM Robustness to Physical Visual Stress in Embodied Scenes | Robotics | Leyi Wu +13 | May 30, 2026 | 10
VisualThink-VLA: Visual Intermediate Reasoning for Effective and Low-Latency Vision-Language-Action Policies | Robotics | Mingjian Gao +11 | May 28, 2026 | 10
Category-Level 3D Correspondence in Camera Space via Morphable Object Priors | Robotics | Leonhard Sommer +3 | May 27, 2026 | 10
PlatonicNav: Unveiling Semantic Correspondence in Navigation with Platonic Topological Maps | Robotics | Junlin Long +7 | Jun 1, 2026 | 9
Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring | Robotics | Seongheon Park +6 | May 29, 2026 | 9
World-Language-Action Model for Unified World Modeling, Language Reasoning, and Action Synthesis | Robotics | Yi Yang +11 | Jun 4, 2026 | 8
AFUN: Towards an Affordance Foundation Model for Functionality Understanding | Robotics | Zhaoning Wang +4 | Jun 1, 2026 | 8
BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling | Robotics | Gianluca Barmina +4 | Jun 8, 2026 | 7
Flash-WAM: Modality-Aware Distillation for World Action Models | Robotics | Arman Akbari +8 | Jun 3, 2026 | 7
GRAIL: Generating Humanoid Loco-Manipulation from 3D Assets and Video Priors | Robotics | Tianyi Xie +19 | Jun 3, 2026 | 7
RoboSemanticBench: Diagnosing Semantic Grounding in Action Prediction for VLA Models | Robotics | Bin Yu +11 | Jun 1, 2026 | 7
FRAPPE: Full Input, Residual Output Autoencoding with Projection Pursuit Encoder | Robotics | Dan Jacobellis +1 | May 27, 2026 | 7
PEAM: Parametric Embodied Agent Memory through Contrastive Internalization of Experience in Minecraft | Robotics | Yuchen Guo +4 | May 26, 2026 | 7
ECHO: Terminal Agents Learn World Models for Free | Robotics | Vaishnavi Shrivastava +3 | May 23, 2026 | 7
RepWAM: World Action Modeling with Representation Visual-Action Tokenizers | Robotics | Junke Wang +7 | Jun 11, 2026 | 6
SPACENUM: Revisiting Spatial Numerical Understanding in VLMs | Robotics | Jianshu Zhang +6 | May 22, 2026 | 6
Silent Failures in Physical AI: A Literature Review of Runtime Action Authorization for Autonomous Systems | Robotics | Barak Or | May 23, 2026 | 6
Learning High-Frequency Continuous Action Chunks in Latent Space | Robotics | Kunyun Wang +4 | May 24, 2026 | 6
DRIFT: A Residual Flow Adapter for Decoding Continuous Outputs in Vision-Language Models | Robotics | Zhuoming Liu +5 | Jun 4, 2026 | 5
Next Forcing: Causal World Modeling with Multi-Chunk Prediction | Robotics | Gangwei Xu +6 | Jun 9, 2026 | 5
Hy-Embodied-0.5-VLA: From Vision-Language-Action Models to a Real-World Robot Learning Stack | Robotics | He Zhang +25 | Jun 12, 2026 | 4
MuJoCo-Drones-Gym: A GPU-Accelerated Multi-Drone Simulator for Control and Reinforcement Learning | Robotics | Manan Tayal | Jun 6, 2026 | 4
Robotic Policy Adaptation via Weight-Space Meta-Learning | Robotics | Christian Bianchi +6 | Jun 5, 2026 | 4
SEAOTTER: Sensor Embedded Autoencoding with One-Time Transcode for Efficient Reconstruction | Robotics | Dan Jacobellis +1 | Jun 2, 2026 | 4
Can LLMs Introspect? A Reality Check | Robotics | Shashwat Singh +2 | May 25, 2026 | 4
Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning | Robotics | Zhiyuan Zhou +6 | Jun 9, 2026 | 3
VoLo: A Physical Orchestrator for Open-Vocabulary Long-Horizon Manipulation | Robotics | Siyi Chen +11 | Jun 5, 2026 | 3
PaintBench: Deterministic Evaluation of Precise Visual Editing | Robotics | Kai Xu +5 | May 29, 2026 | 3
AURA: Action-Gated Memory for Robot Policies at Constant VRAM | Robotics | Josef Chen | Jun 1, 2026 | 3
WEAVER, Better, Faster, Longer: An Effective World Model for Robotic Manipulation | Robotics | Arnav Kumar Jain +4 | Jun 11, 2026 | 2
Revisiting Articulated Parts Perception in Robot Manipulation | Robotics | Xiaoqian Wu +5 | Jun 6, 2026 | 2
OASIS: From Simulation Data Collection to Real-World Humanoid Loco-Manipulation | Robotics | Zehao Yu +6 | Jun 7, 2026 | 2
TBD-VLA: Temporal Block Diffusion Vision Language Action Model | Robotics | Sung-Wook Lee +2 | Jun 5, 2026 | 2
StressDream: Steering Video World Models for Robust Policy Evaluation and Improvement | Robotics | Junwon Seo +8 | May 29, 2026 | 2
Light Interaction: Training-Free Inference Acceleration for Interactive Video World Models | Robotics | Jiacheng Lu +5 | May 29, 2026 | 2
FreeForm: Reduced-Order Deformable Simulation from Particle-Based Skinning Eigenmodes | Robotics | Donglai Xiang +7 | May 28, 2026 | 1
Memory-Bound but Not Bandwidth-Limited: The Physical AI Inference Gap in Batch-1 LLM Decode | Robotics | Josef Chen | May 28, 2026 | 1
Reducing Political Manipulation with Consistency Training | Robotics | Long Phan +5 | May 21, 2026 | 1