Latest Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training
Tianqing Fang, Zhisong Zhang, Xiaoyang Wang, et al.
21 days ago

Co-Producing AI: Toward an Augmented, Participatory Lifecycle
Rashid Mushkani, Hugo Berard, Toumadher Ammar, et al.
21 days ago

iLRM: An Iterative Large 3D Reconstruction Model
Gyeongjin Kang, Seungtae Nam, Xiangyu Sun, et al.
21 days ago

villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
Xiaoyu Chen, Hangxing Wei, Pushi Zhang, et al.
21 days ago

C3: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations
Chengqian Ma, Wei Tao, Yiwen Guo
21 days ago

RecGPT Technical Report
Chao Yi, Dian Chen, Gaoyang Guo, et al.
21 days ago

Phi-Ground Tech Report: Advancing Perception in GUI Grounding
Miaosen Zhang, Ziqiang Xu, Jialiang Zhu, et al.
21 days ago

Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
Luoxin Chen, Jinming Gu, Liankai Huang, et al.
21 days ago

Solution-aware vs global ReLU selection: partial MILP strikes back for DNN verification
Yuke Liao, Blaise Genest, Kuldeep Meel, et al.
24 days ago

CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks
Ping Yu, Jack Lanchantin, Tianlu Wang, et al.
24 days ago

Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation
Kaining Ying, Henghui Ding, Guanquan Jie, et al.
24 days ago

Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision
Xiao Fang, Minhyek Jeon, Zheyang Qin, et al.
24 days ago

VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning
Ruifeng Yuan, Chenghao Xiao, Sicong Leng, et al.
24 days ago

Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
Jingwei Zuo, Maksim Velikanov, Ilyas Chahed, et al.
24 days ago

BANG: Dividing 3D Assets via Generative Exploded Dynamics
Longwen Zhang, Qixuan Zhang, Haoran Jiang, et al.
24 days ago

ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents
Yilei Jiang, Yaozhi Zheng, Yuxuan Wan, et al.
24 days ago

MIRepNet: A Pipeline and Foundation Model for EEG-Based Motor Imagery Classification
Dingkun Liu, Zhu Chen, Jingwei Luo, et al.
24 days ago

ChemDFM-R: An Chemical Reasoner LLM Enhanced with Atomized Chemical Knowledge
Zihan Zhao, Bo Chen, Ziping Wan, et al.
24 days ago

X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again
Zigang Geng, Yibing Wang, Yeyao Ma, et al.
24 days ago

HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels
HunyuanWorld Team, Zhenwei Wang, Yuhao Liu, et al.
24 days ago

AlphaEarth Foundations: An embedding field model for accurate and efficient global mapping from sparse label data
Christopher F. Brown, Michal R. Kazmierski, Valerie J. Pasquarella, et al.
25 days ago

Toward long-range ENSO prediction with an explainable deep learning model
Qi Chen, Yinghao Cui, Guobin Hong, et al.
25 days ago

OmniArch: Building Foundation Model for Scientific Computing
Tianyu Chen, Haoyi Zhou, Ying Li, et al.
25 days ago

UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding
Shuquan Lian, Yuhang Wu, Jia Ma, et al.
a month ago

DualSG: A Dual-Stream Explicit Semantic-Guided Multivariate Time Series Forecasting Framework
Kuiye Ding, Fanda Fan, Yao Wang, et al.
a month ago

When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios
Kele Shao, Keda Tao, Kejia Zhang, et al.
a month ago

SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment
Yixin Song, Zhenliang Xue, Dongliang Wei, et al.
a month ago

Reconstructing 4D Spatial Intelligence: A Survey
Yukang Cao, Jiahao Lu, Zhisheng Huang, et al.
a month ago

Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning
Zedong Wang, Siyuan Li, Dan Xu
a month ago

ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts
Yuying Ge, Yixiao Ge, Chen Li, et al.
a month ago