Papers

Showing 101–136 of 136 notable papers
Paper | Topic | Authors | Published | HF ▲
Stream3D-VLM: Online 3D Spatial Understanding with Incremental Geometry Priors | Vision & multimodal | Hanxun Yu +6 | Jun 5, 2026 | 4
Benchmark Everything Everywhere All at Once | Vision & multimodal | Shiyun Xiong +7 | Jun 4, 2026 | 4
MechVQA: Benchmarking and Enhancing Multimodal LLMs on Comprehensive Mechanical Drawing Understanding | Vision & multimodal | Qian Kou +6 | May 29, 2026 | 4
ClawHub Security Signals: When VirusTotal, Static Analysis, and SkillSpector Disagree | Vision & multimodal | Vincent Koc +5 | May 31, 2026 | 4
MindZero: Learning Online Mental Reasoning With Zero Annotations | Vision & multimodal | Shunchi Zhang +5 | May 29, 2026 | 4
Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training | Vision & multimodal | Michal Chudoba +3 | Jun 10, 2026 | 3
Phase Marginalization for Patch-Grid Instability in Vision Transformers | Vision & multimodal | Oğuzhan Ercan | Jun 6, 2026 | 3
SDR: Set-Distance Rewards for Radiology Report Generation | Vision & multimodal | Halil Ibrahim Gulluk +3 | May 30, 2026 | 3
WorldBench: A Challenging and Visually Diverse Multimodal Reasoning Benchmark | Vision & multimodal | Yida Yin +11 | Jun 4, 2026 | 3
Learning Geometric Representations from Videos for Spatial Intelligent Multimodal Large Language Models | Vision & multimodal | Haibo Wang +1 | Jun 4, 2026 | 3
Video2LoRA: Parametric Video Internalization for Vision-Language Models | Vision & multimodal | Manan Suri +2 | Jun 3, 2026 | 3
BRepCLIP: Contrastive Multimodal Pretraining on BRep Primitives for CAD Understanding | Vision & multimodal | Muhammad Usama +3 | Jun 3, 2026 | 3
SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes | Vision & multimodal | Tianhui Liu +8 | May 29, 2026 | 3
Prior Availability in Industrial Visual Sim-to-Real: A Review of CAD-Guided and CAD-Unavailable Regimes | Vision & multimodal | Chenxi Tao +1 | May 28, 2026 | 3
Flat-Pack Bench: Evaluating Spatio-Temporal Understanding in Large Vision-Language Models through Furniture Assembly | Vision & multimodal | Aditya Chetan +7 | May 20, 2026 | 3
Chartographer: Counterfactual Chart Generation for Evaluating Vision-Language Models | Vision & multimodal | Yifan Jiang +3 | May 26, 2026 | 3
Seeing the Needle in the Haystack: Towards Weakly-Supervised Log Instance Anomaly Localization via Counterfactual Perturbation | Vision & multimodal | Yutszyuk Wong +3 | May 9, 2026 | 3
Dense Supervision, Sparse Updates: On the Sparsity and Geometry of On-Policy Distillation | Vision & multimodal | Guo Yu +5 | Jun 11, 2026 | 2
ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages | Vision & multimodal | Tanmoy Kanti Halder +4 | Jun 11, 2026 | 2
WebChallenger: A Reliable and Efficient Generalist Web Agent | Vision & multimodal | Jayoo Hwang +2 | Jun 9, 2026 | 2
Can Generalist Agents Automate Data Curation? | Vision & multimodal | Feiyang Kang +7 | Jun 2, 2026 | 2
Revising Context, Shifting Simulated Stance: Auditing LLM-Based Stance Simulation in Online Discussions | Vision & multimodal | Xinnong Zhang +4 | Jun 4, 2026 | 2
When Graph Tokens Sink: A Mechanistic Analysis of Graph Language Models | Vision & multimodal | Ding Zhang +5 | Jun 2, 2026 | 2
SynCred-Bench: Benchmarking Synthetic Credibility in AI-Generated Visual Misinformation | Vision & multimodal | Junxiao Yang +6 | Jun 2, 2026 | 2
Pressure-Testing Deception Probes in LLMs: Scaling, Robustness, and the Geometry of Deceptive Representations | Vision & multimodal | Sachin Kumar | May 27, 2026 | 2
Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models | Vision & multimodal | Guangzhao He +3 | Jun 1, 2026 | 2
ChartArena: Benchmarking Chart Parsing across Languages, Scenarios, and Formats | Vision & multimodal | Shangpin Peng +12 | May 31, 2026 | 2
Leveraging Morphology for Historical Script Metrological Analysis | Vision & multimodal | Malamatenia Vlachou Efstathiou +3 | Jun 8, 2026 | 1
APEX: A Network-Native Time-Series Foundation Model for Forecasting and Anomaly Detection for Wireless Edge Operations | Vision & multimodal | Swadhin Pradhan +2 | Jun 10, 2026 | 1
Cosine Misleads: Auxiliary Losses Reshape Vision Language Models, Not Their Latents | Vision & multimodal | XiuYu Zhang +2 | Jun 4, 2026 | 1
Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense | Vision & multimodal | Shuhao Zhang +4 | May 29, 2026 | 1
Quality-Guided Semi-Supervised Learning for Medical Image Segmentation | Vision & multimodal | Kumar Abhishek +1 | Jun 1, 2026 | 1
Benchmarking Composed Image Retrieval for Applied Earth Observation | Vision & multimodal | Bill Psomas +8 | May 23, 2026 | 1
Tiny but Trusted: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection | Vision & multimodal | Xiaona Zhou +4 | May 28, 2026 | 1
Beyond 3D VQAs: Injecting 3D Spatial Priors into Vision-Language Models for Enhanced Geometric Reasoning | Vision & multimodal | Chun-Hsiao Yeh +5 | May 28, 2026 | 1
Don't Guess, Just Ask: Resolving Ambiguity in Referring Segmentation via Multi-turn Clarification | Vision & multimodal | Yuting Yang +4 | May 24, 2026 | 1