| ABot-Earth 0.5: Generative 3D Earth Model | Robotics | Ming Qian +27 | Jun 8, 2026 | 466 |
| Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players | Robotics | Fangfu Liu +9 | May 27, 2026 | 423 |
| Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments | Robotics | Qiuyue Wang +39 | May 28, 2026 | 140 |
| Cosmos 3: Omnimodal World Models for Physical AI | Robotics | Aditi +39 | Jun 1, 2026 | 108 |
| InterleaveThinker: Reinforcing Agentic Interleaved Generation | Robotics | Dian Zheng +6 | Jun 11, 2026 | 77 |
| SpatialBench: Is Your Spatial Foundation Model an All-Round Player? | Robotics | Haosong Peng +12 | May 26, 2026 | 71 |
| Self-Improving Language Models with Bidirectional Evolutionary Search | Robotics | Guowei Xu +6 | May 27, 2026 | 59 |
| LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories | Robotics | Baochang Ren +17 | Jun 11, 2026 | 53 |
| GEM: Generative Supervision Helps Embodied Intelligence | Robotics | Ruowen Zhao +11 | May 27, 2026 | 41 |
| Task-Focused Memorization for Multimodal Agents | Robotics | Tao Zou +4 | May 29, 2026 | 38 |
| OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs | Robotics | Yifei Li +6 | Jun 2, 2026 | 31 |
| WorldOlympiad: Can Your World Model Survive a Triathlon? | Robotics | Yuke Zhao +10 | Jun 9, 2026 | 30 |
| Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration? | Robotics | Liyang Li +7 | May 31, 2026 | 30 |
| AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization | Robotics | Yu Li +10 | Jun 5, 2026 | 29 |
| Robots Need More than VLA and World Models | Robotics | Elis Karcini +8 | Jun 4, 2026 | 27 |
| LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence | Robotics | Xiang An +29 | May 25, 2026 | 27 |
| Direct 3D-Aware Object Insertion via Decomposed Visual Proxies | Robotics | Jingbo Gong +8 | Jun 4, 2026 | 26 |
| RobotValues: Evaluating Household Robots When Human Values Conflict | Robotics | Jongwook Han +2 | Jun 2, 2026 | 26 |
| World Pilot: Steering Vision-Language-Action Models with World-Action Priors | Robotics | Zefu Lin +6 | Jun 10, 2026 | 23 |
| WALL-WM: Carving World Action Modeling at the Event Joints | Robotics | Shalfun Li +30 | Jun 1, 2026 | 23 |
| WorldCraft: From Camera Navigation to Object Manipulation in Interactive Video World Models | Robotics | Bohai Gu +11 | May 24, 2026 | 22 |
| NVIDIA OmniDreams: Real-Time Generative World Model for Closed-Loop Autonomous Vehicle Simulation | Robotics | NVIDIA +33 | Jun 2, 2026 | 22 |
| The Road Ahead in Autonomous Driving: The KITScenes Multimodal Dataset | Robotics | Richard Schwarzkopf +23 | Jun 1, 2026 | 18 |
| CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists | Robotics | Junlin Yang +9 | May 25, 2026 | 18 |
| GE-Sim 2.0: A Roadmap Towards Comprehensive Closed-loop Video World Simulators for Robotic Manipulation | Robotics | Boxiang Qiu +14 | May 26, 2026 | 17 |
| Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving | Robotics | Kewei Zhang +11 | May 22, 2026 | 17 |
| Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation? | Robotics | Rui Zhao +8 | Jun 4, 2026 | 16 |
| Rethinking VLM Representation for VLA Initialization | Robotics | Weifeng Lin +7 | May 25, 2026 | 15 |
| AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing | Robotics | Jisong Cai +12 | Jun 8, 2026 | 14 |
| DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation | Robotics | Jusuk Lee +8 | May 28, 2026 | 13 |
| The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages | Robotics | Eric Onyame +4 | May 27, 2026 | 13 |
| World Model Self-Distillation: Training World Models to Solve General Tasks | Robotics | Sebastian Stapf +3 | Jun 10, 2026 | 12 |
| Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models | Robotics | Yifu Yuan +22 | Jun 9, 2026 | 11 |
| MineExplorer: Evaluating Open-World Exploration of MLLM Agents in Minecraft | Robotics | Tianjie Ju +9 | May 29, 2026 | 11 |
| Light-WAM: Efficient World Action Models with State-Fusion Action Decoding | Robotics | Ziang Li +7 | Jun 6, 2026 | 10 |
| AffordanceVLA: A Vision-Language-Action Model Empowering Action Generation through Affordance-Aware Understanding | Robotics | Qize Yu +12 | Jun 4, 2026 | 10 |
| RoboStressBench: Benchmarking VLM Robustness to Physical Visual Stress in Embodied Scenes | Robotics | Leyi Wu +13 | May 30, 2026 | 10 |
| VisualThink-VLA: Visual Intermediate Reasoning for Effective and Low-Latency Vision-Language-Action Policies | Robotics | Mingjian Gao +11 | May 28, 2026 | 10 |
| Category-Level 3D Correspondence in Camera Space via Morphable Object Priors | Robotics | Leonhard Sommer +3 | May 27, 2026 | 10 |
| PlatonicNav: Unveiling Semantic Correspondence in Navigation with Platonic Topological Maps | Robotics | Junlin Long +7 | Jun 1, 2026 | 9 |
| Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring | Robotics | Seongheon Park +6 | May 29, 2026 | 9 |
| World-Language-Action Model for Unified World Modeling, Language Reasoning, and Action Synthesis | Robotics | Yi Yang +11 | Jun 4, 2026 | 8 |
| AFUN: Towards an Affordance Foundation Model for Functionality Understanding | Robotics | Zhaoning Wang +4 | Jun 1, 2026 | 8 |
| BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling | Robotics | Gianluca Barmina +4 | Jun 8, 2026 | 7 |
| Flash-WAM: Modality-Aware Distillation for World Action Models | Robotics | Arman Akbari +8 | Jun 3, 2026 | 7 |
| GRAIL: Generating Humanoid Loco-Manipulation from 3D Assets and Video Priors | Robotics | Tianyi Xie +19 | Jun 3, 2026 | 7 |
| RoboSemanticBench: Diagnosing Semantic Grounding in Action Prediction for VLA Models | Robotics | Bin Yu +11 | Jun 1, 2026 | 7 |
| FRAPPE: Full Input, Residual Output Autoencoding with Projection Pursuit Encoder | Robotics | Dan Jacobellis +1 | May 27, 2026 | 7 |
| PEAM: Parametric Embodied Agent Memory through Contrastive Internalization of Experience in Minecraft | Robotics | Yuchen Guo +4 | May 26, 2026 | 7 |
| ECHO: Terminal Agents Learn World Models for Free | Robotics | Vaishnavi Shrivastava +3 | May 23, 2026 | 7 |
| RepWAM: World Action Modeling with Representation Visual-Action Tokenizers | Robotics | Junke Wang +7 | Jun 11, 2026 | 6 |
| SPACENUM: Revisiting Spatial Numerical Understanding in VLMs | Robotics | Jianshu Zhang +6 | May 22, 2026 | 6 |
| Silent Failures in Physical AI: A Literature Review of Runtime Action Authorization for Autonomous Systems | Robotics | Barak Or | May 23, 2026 | 6 |
| Learning High-Frequency Continuous Action Chunks in Latent Space | Robotics | Kunyun Wang +4 | May 24, 2026 | 6 |
| DRIFT: A Residual Flow Adapter for Decoding Continuous Outputs in Vision-Language Models | Robotics | Zhuoming Liu +5 | Jun 4, 2026 | 5 |
| Next Forcing: Causal World Modeling with Multi-Chunk Prediction | Robotics | Gangwei Xu +6 | Jun 9, 2026 | 5 |
| Hy-Embodied-0.5-VLA: From Vision-Language-Action Models to a Real-World Robot Learning Stack | Robotics | He Zhang +25 | Jun 12, 2026 | 4 |
| MuJoCo-Drones-Gym: A GPU-Accelerated Multi-Drone Simulator for Control and Reinforcement Learning | Robotics | Manan Tayal | Jun 6, 2026 | 4 |
| Robotic Policy Adaptation via Weight-Space Meta-Learning | Robotics | Christian Bianchi +6 | Jun 5, 2026 | 4 |
| SEAOTTER: Sensor Embedded Autoencoding with One-Time Transcode for Efficient Reconstruction | Robotics | Dan Jacobellis +1 | Jun 2, 2026 | 4 |
| Can LLMs Introspect? A Reality Check | Robotics | Shashwat Singh +2 | May 25, 2026 | 4 |
| Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning | Robotics | Zhiyuan Zhou +6 | Jun 9, 2026 | 3 |
| VoLo: A Physical Orchestrator for Open-Vocabulary Long-Horizon Manipulation | Robotics | Siyi Chen +11 | Jun 5, 2026 | 3 |
| PaintBench: Deterministic Evaluation of Precise Visual Editing | Robotics | Kai Xu +5 | May 29, 2026 | 3 |
| AURA: Action-Gated Memory for Robot Policies at Constant VRAM | Robotics | Josef Chen | Jun 1, 2026 | 3 |
| WEAVER, Better, Faster, Longer: An Effective World Model for Robotic Manipulation | Robotics | Arnav Kumar Jain +4 | Jun 11, 2026 | 2 |
| Revisiting Articulated Parts Perception in Robot Manipulation | Robotics | Xiaoqian Wu +5 | Jun 6, 2026 | 2 |
| OASIS: From Simulation Data Collection to Real-World Humanoid Loco-Manipulation | Robotics | Zehao Yu +6 | Jun 7, 2026 | 2 |
| TBD-VLA: Temporal Block Diffusion Vision Language Action Model | Robotics | Sung-Wook Lee +2 | Jun 5, 2026 | 2 |
| StressDream: Steering Video World Models for Robust Policy Evaluation and Improvement | Robotics | Junwon Seo +8 | May 29, 2026 | 2 |
| Light Interaction: Training-Free Inference Acceleration for Interactive Video World Models | Robotics | Jiacheng Lu +5 | May 29, 2026 | 2 |
| FreeForm: Reduced-Order Deformable Simulation from Particle-Based Skinning Eigenmodes | Robotics | Donglai Xiang +7 | May 28, 2026 | 1 |
| Memory-Bound but Not Bandwidth-Limited: The Physical AI Inference Gap in Batch-1 LLM Decode | Robotics | Josef Chen | May 28, 2026 | 1 |
| Reducing Political Manipulation with Consistency Training | Robotics | Long Phan +5 | May 21, 2026 | 1 |