Papers

Showing 1–85 of 85 notable papers
| Paper | Topic | Authors | Published | HF ▲ |
|---|---|---|---|---|
| Latent Spatial Memory for Video World Models | Image & video gen | Weijie Wang +9 | Jun 8, 2026 | 66 |
| CollectionLoRA: Collecting 50 Effects in 1 LoRA via Multi-Teacher On-Policy Distillation | Image & video gen | Fangtai Wu +9 | May 25, 2026 | 61 |
| Representation Forcing for Bottleneck-Free Unified Multimodal Models | Image & video gen | Yuqing Wang +12 | May 29, 2026 | 59 |
| Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions | Image & video gen | Xin Jin +10 | Jun 8, 2026 | 58 |
| minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models | Image & video gen | Min Zhao +11 | May 28, 2026 | 58 |
| YoCausal: How Far is Video Generation from World Model? A Causality Perspective | Image & video gen | You-Zhe Xie +5 | May 28, 2026 | 54 |
| Geometry-Aware Representation Denoising for Robust Multi-view 3D Reconstruction | Image & video gen | Jin Hyeon Kim +10 | May 25, 2026 | 41 |
| Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models | Image & video gen | Bowen Ping +5 | Jun 9, 2026 | 40 |
| $D^2$-Monitor: Dynamic Safety Monitoring for Diffusion LLMs via Hesitation-Aware Routing | Image & video gen | Aoxi Liu +7 | May 25, 2026 | 39 |
| GenClaw: Code-Driven Agentic Image Generation | Image & video gen | Junyan Ye +6 | May 28, 2026 | 38 |
| dMoE: dLLMs with Learnable Block Experts | Image & video gen | Sicheng Feng +4 | May 29, 2026 | 36 |
| SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer | Image & video gen | Yuyang Zhao +8 | May 28, 2026 | 36 |
| Qwen-Image-Flash: Beyond Objective Design | Image & video gen | Tianhe Wu +23 | Jun 2, 2026 | 35 |
| Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration | Image & video gen | Yiren Song +4 | May 17, 2026 | 33 |
| Echo-Memory: A Controlled Study of Memory in Action World Models | Image & video gen | Wayne King +15 | Jun 8, 2026 | 32 |
| JLT: Clean-Latent Prediction in Latent Diffusion Transformers | Image & video gen | Funing Fu +4 | May 26, 2026 | 32 |
| OmniDirector: General Multi-Shot Camera Cloning without Cross-Paired Data | Image & video gen | Jiwen Liu +10 | Jun 11, 2026 | 29 |
| World Models Meet Language Models: On the Complementarity of Concrete and Abstract Reasoning | Image & video gen | Yucheng Zhou +3 | Jun 2, 2026 | 29 |
| VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization | Image & video gen | Junhao Cheng +6 | Jun 1, 2026 | 29 |
| Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation | Image & video gen | Yuxuan Bian +11 | Jun 3, 2026 | 28 |
| VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion | Image & video gen | Hidir Yesiltepe +6 | May 28, 2026 | 26 |
| Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models | Image & video gen | Haozhan Shen +3 | May 27, 2026 | 25 |
| Colored Noise Diffusion Sampling | Image & video gen | Hadar Davidson +2 | May 28, 2026 | 25 |
| Triplet-Block Diffusion RWKV | Image & video gen | Ke Lin +4 | May 25, 2026 | 25 |
| ARM: An AutoRegressive Large Multimodal Model with Unified Discrete Representations | Image & video gen | Junke Wang +18 | Jun 9, 2026 | 24 |
| LoomVideo: Unifying Multimodal Inputs into Video Generation and Editing | Image & video gen | Jianzong Wu +13 | Jun 4, 2026 | 24 |
| OSP-Next: Efficient High-Quality Video Generation with Sparse Sequence Parallelism, HiF8 Quantization, and Reinforcement Learning | Image & video gen | Yunyang Ge +6 | May 27, 2026 | 24 |
| Bootstrap Your Generator: Unpaired Visual Editing with Flow Matching | Image & video gen | Yoad Tewel +3 | Jun 2, 2026 | 22 |
| Linearizing Vision Transformer with Test-Time Training | Image & video gen | Yining Li +5 | May 28, 2026 | 20 |
| Balancing Fidelity and Diversity in Diffusion Models via Symmetric Attention Decomposition: Hopfield Perspective | Image & video gen | Hyunmin Cho +2 | May 26, 2026 | 20 |
| LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation | Image & video gen | Qixin Hu +4 | Jun 1, 2026 | 19 |
| Recursive Flow Matching | Image & video gen | Jiahe Huang +3 | May 26, 2026 | 19 |
| VideoMDM: Towards 3D Human Motion Generation From 2D Supervision | Image & video gen | Amir Mann +3 | Jun 11, 2026 | 18 |
| Complexity-Balanced Diffusion Splitting | Image & video gen | Noam Issachar +2 | Jun 4, 2026 | 17 |
| RT-Lynx: Putting the GEMM Sparsity In a Right Way for Diffusion Models | Image & video gen | Xing Cong +5 | May 26, 2026 | 16 |
| SwiftVR: Real-Time One-Step Generative Video Restoration | Image & video gen | Jiaqi Yan +7 | Jun 8, 2026 | 15 |
| Geometry Matters: 3D Foundation Priors for Learning Semantic Correspondence | Image & video gen | Artur Jesslen +2 | May 28, 2026 | 15 |
| Where, What, Why, and Importance: Structured Defect Grounding for Text-to-Image Feedback | Image & video gen | Huaisong Zhang +9 | Jun 4, 2026 | 14 |
| Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them | Image & video gen | Woojung Han +5 | Jun 4, 2026 | 14 |
| AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation | Image & video gen | Haobo Li +8 | Jun 2, 2026 | 14 |
| Decoupled Residual Denoising Diffusion Models for Unified and Data Efficient Image-to-Image Translation | Image & video gen | Ziyue Lin +8 | May 31, 2026 | 14 |
| LVSA: Training-Free Sparse Attention for Long Video Diffusion | Image & video gen | Gael Glorian +4 | May 29, 2026 | 14 |
| Everything at Every Scale: Scale-Invariant Diffusion with Continuous Super-Resolution | Image & video gen | Zixin Jessie Chen +6 | May 25, 2026 | 13 |
| DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory | Image & video gen | Zhenhao Yang +7 | May 29, 2026 | 12 |
| AdaState: Self-Evolving Anchors for Streaming Video Generation | Image & video gen | Yusuf Dalva +1 | May 28, 2026 | 12 |
| NeuROK: Generative 4D Neural Object Kinematics | Image & video gen | Chen Geng +5 | May 28, 2026 | 12 |
| Squeezing Capacity from Multimodal Large Language Models for Subject-driven Generation | Image & video gen | Shuhong Zheng +4 | May 25, 2026 | 12 |
| MoVerse: Real-Time Video World Modeling with Panoramic Gaussian Scaffold | Image & video gen | Yang Zhou +6 | Jun 11, 2026 | 11 |
| Policy and World Modeling Co-Training for Language Agents | Image & video gen | Ning Lu +11 | Jun 1, 2026 | 11 |
| PhyGenHOI: Physically-Aware 4D Generation of Dynamic Human-Object Interactions | Image & video gen | Omer Benishu +2 | May 28, 2026 | 11 |
| Text-to-Image Models Need Less from Text Encoders Than You Think | Image & video gen | Nurit Spingarn +3 | Jun 2, 2026 | 10 |
| MAOAM: Unified Object and Material Selection with Vision-Language Models | Image & video gen | Jaden Park +7 | Jun 2, 2026 | 10 |
| High-Fidelity Two-Step Image Generation via Teacher-Aligned End-to-End Distillation | Image & video gen | Dongyang Liu +9 | Jun 10, 2026 | 9 |
| i1: A Simple and Fully Open Recipe for Strong Text-to-Image Models | Image & video gen | Boya Zeng +6 | Jun 9, 2026 | 9 |
| Lumos-Nexus: Efficient Frequency Bridging with Homogeneous Latent Space for Video Unified Models | Image & video gen | Jiazheng Xing +11 | May 29, 2026 | 8 |
| SmartDirector: Keyframe-Conditioned Cinematic Video Generation with Narrative Pacing Control | Image & video gen | Zhida Zhang +7 | May 27, 2026 | 8 |
| Learning A Unified Risk Map for Autonomous Driving in Partially Observable Environments | Image & video gen | Jie Jia +6 | May 21, 2026 | 8 |
| MRT: Masked Region Transformer for Layered Image Generation and Editing at Scale | Image & video gen | Zhicong Tang +8 | May 26, 2026 | 8 |
| Bridging the Agent-World Gap: Text World Models for LLM-based Agents | Image & video gen | Yixia Li +15 | Jun 8, 2026 | 7 |
| GDSD: Reinforcement Learning as Guided Denoiser Self-Distillation for Diffusion Language Models | Image & video gen | Xiaohang Tang +6 | May 28, 2026 | 7 |
| One-Forcing: Towards Stable One-Step Autoregressive Video Generation | Image & video gen | Jiaqi Feng +3 | May 22, 2026 | 7 |
| RepFusion: Leveraging Multimodal Priors for Denoising in Representation Space | Image & video gen | Xichen Pan +5 | Jun 12, 2026 | 6 |
| Injecting Image Guidance into Text-Conditioned Diffusion Models at Inference | Image & video gen | Agata Żywot +6 | May 24, 2026 | 5 |
| MaskAlign: Token-Subset Representation Alignment for Efficient Diffusion Training | Image & video gen | Lianyu Pang +7 | Jun 7, 2026 | 4 |
| MilliVid: Hierarchical Latents for Long-Range Consistency in Video Generation | Image & video gen | Ishaan Preetam Chandratreya +6 | Jun 8, 2026 | 4 |
| A Cookbook of 3D Vision: Data, Learning Paradigms, and Application | Image & video gen | Hongyang Du +10 | Jun 2, 2026 | 4 |
| Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation | Image & video gen | Samson Gourevitch +6 | May 21, 2026 | 4 |
| EverAnimate: Minute-Scale Human Animation via Latent Flow Restoration | Image & video gen | Wuyang Li +6 | May 14, 2026 | 4 |
| MotiMotion: Motion-Controlled Video Generation with Visual Reasoning | Image & video gen | Lee Hsin-Ying +5 | May 21, 2026 | 4 |
| Avatar V: Scaling Video-Reference Avatar Video Generation | Image & video gen | Benjamin Liang +22 | Jun 11, 2026 | 3 |
| Late-Layer Fusion is Enough: Dual-Path Vision Token Routing for Multimodal Large Language Models under Visual Saturation | Image & video gen | Siyuan Liu +1 | Jun 8, 2026 | 3 |
| EVA01: Unified Native 3D Understanding and Generation via Mixture-of-Transformers | Image & video gen | Zongyuan Yang +10 | May 16, 2026 | 3 |
| When Confidence Misleads: Suffix Anchoring and Anchor-Proximity Confidence Modulation for Diffusion Language Models | Image & video gen | Jungwon Park +4 | May 27, 2026 | 3 |
| Cross-scale Aligned Supervision for Training GANs | Image & video gen | Sangeek Hyun +2 | May 26, 2026 | 3 |
| IDEAL: In-DEpth ALignment Makes A Discrete Representation AutoEncoder | Image & video gen | Yitong Chen +7 | Jun 9, 2026 | 2 |
| Data-Efficient Autoregressive-to-Diffusion Language Models via On-Policy Distillation | Image & video gen | Xingyu Su +8 | Jun 4, 2026 | 2 |
| Discrete-WAM: Unified Discrete Vision-Action Token Editing for World-Policy Learning | Image & video gen | Ziyang Yao +12 | Jun 4, 2026 | 2 |
| Compositional Text-to-Image Generation Via Region-aware Bimodal Direct Preference Optimization | Image & video gen | Zhuohan Liu +3 | May 27, 2026 | 2 |
| Guidance Contrastive Token Credit Assignment for Discrete Policy Optimization | Image & video gen | Shufan Li +4 | May 29, 2026 | 2 |
| MoZoo:Unleashing Video Diffusion power in animal fur and muscle simulation | Image & video gen | Dongxia Liu +9 | Apr 8, 2026 | 2 |
| MBench: A Comprehensive Benchmark on Memory Capability for Video World Models | Image & video gen | Shengjun Zhang +13 | Jun 8, 2026 | 1 |
| Building Social World Models with Large Language Models | Image & video gen | Haofei Yu +3 | Jun 9, 2026 | 1 |
| FlowLet: Conditional 3D Brain MRI Synthesis using Wavelet Flow Matching | Image & video gen | Danilo Danese +4 | Jun 8, 2026 | 1 |
| Streaming Video Generation with Streaming Force Control | Image & video gen | Hanhui Wang +5 | Jun 5, 2026 | 1 |
| Is This Edit Correct? A Multi-Dimensional Benchmark for Reasoning-Aware Image Editing | Image & video gen | Yixuan Ding +4 | Apr 16, 2026 | 1 |