Nemotron-Pretraining-Dataset-sample Sampling Dataset
Date
a month ago
Size
79.87 MB
Publish URL
Paper URL
License
Other
Nemotron-Pretraining-Dataset-sample is a small sampled version of the Nemotron pretraining dataset that NVIDIA released in 2025. The associated paper is "NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model".
The dataset contains 10 representative subsets drawn from different components of the full SFT and pretraining corpus. They cover high-quality question-answering data, math-focused extracted content, code metadata, and SFT-style instruction data, which makes the sample suitable for inspecting the data and running quick experiments.
Nemotron-Pretraining-Dataset-sample.torrent
Seeding: 1, Downloading: 0, Completed: 11, Total Downloads: 46