HyperAI

Latest Papers

Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

PaperRegister: Boosting Flexible-grained Paper Search via Hierarchical Register Indexing
Zhuoqun Li, Xuanang Chen, Hongyu Lin, et al.
5 days ago
DINOv3
Oriane Siméoni, Huy V. Vo, Maximilian Seitzer, et al.
5 days ago
SSRL: Self-Search Reinforcement Learning
Yuchen Fan, Kaiyan Zhang, Heng Zhou, et al.
5 days ago
Thyme: Think Beyond Images
Yi-Fan Zhang, Xingyu Lu, Shukang Yin, et al.
5 days ago
Grounding Multilingual Multimodal LLMs With Cultural Knowledge
Jean de Dieu Nyandwi, Yueqi Song, Simran Khanuja, et al.
6 days ago
HiFiTTS-2: A Large-Scale High Bandwidth Speech Dataset
Ryan Langman, Xuesong Yang, Paarth Neekhara, et al.
6 days ago
CryptoScope: Utilizing Large Language Models for Automated Cryptographic Logic Vulnerability Detection
Zhihao Li, Zimo Ji, Tao Zheng, et al.
6 days ago
Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation
Junde Wu, Jiayuan Zhu, Yunli Qi, et al.
6 days ago
Puppeteer: Rig and Animate Your 3D Models
Chaoyue Song, Xiu Li, Fan Yang, et al.
6 days ago
STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer
Yushi Lan, Yihang Luo, Fangzhou Hong, et al.
6 days ago
PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts
Mo Yu, Tsz Ting Chung, Chulun Zhou, et al.
6 days ago
ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing
Lingen Li, Guangzhi Wang, Zhaoyang Zhang, et al.
6 days ago
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale
NextStep Team, Chunrui Han, Guopeng Li, et al.
6 days ago
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
Runqi Qiao, Qiuna Tan, Peiqing Yang, et al.
6 days ago
COREVQA: A Crowd Observation and Reasoning Entailment Visual Question Answering Benchmark
Ishant Chintapatla, Kazuma Choji, Naaisha Agarwal, et al.
9 days ago
RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization
Wen Huang, Jiarui Yang, Tao Dai, et al.
9 days ago
GMF-Drive: Gated Mamba Fusion with Spatial-Aware BEV Representation for End-to-End Autonomous Driving
Jian Wang, Chaokang Jiang, Haitao Xu
9 days ago
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Lin Long, Yichen He, Wentao Ye, et al.
9 days ago
Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing
Xu Wang, Chenkai Xu, Yijie Jin, et al.
9 days ago
AWorld: Dynamic Multi-Agent System with Stable Maneuvering for Robust GAIA Problem Solving
Zhitian Xie, Qintong Wu, Chengyue Yu, et al.
9 days ago
Story2Board: A Training-Free Approach for Expressive Storyboard Generation
David Dinkevich, Matan Levy, Omri Avrahami, et al.
9 days ago
Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation
Bowen Xue, Qixin Yan, Wenjing Wang, et al.
9 days ago
Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery
Jiatong Li, Weida Wang, Qinggang Zhang, et al.
9 days ago
Llama-Nemotron: Efficient Reasoning Models
Akhiad Bercovich, Itay Levy, Izik Golan, et al.
10 days ago
Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark
Goeric Huybrechts, Srikanth Ronanki, Sai Muralidhar Jayanthi, et al.
10 days ago
Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation
Junyan Ye, Dongzhi Jiang, Zihao Wang, et al.
10 days ago
Virtual staining of label-free tissue in imaging mass spectrometry
Yijie Zhang, Luzhe Huang, Nir Pillar, et al.
10 days ago
VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models
Lingjie Jiang, Shaohan Huang, Xun Wu, et al.
10 days ago
HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches
Jiejun Tan, Zhicheng Dou, Yan Yu, et al.
10 days ago
Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models
Wen Wang, Bozhen Fang, Chenchen Jing, et al.
10 days ago