HyperAI

Finance-Instruct-500k Financial Reasoning Dataset

Date: 4 days ago

Publish URL: huggingface.co

License: Apache 2.0

Finance-Instruct-500k is a financial reasoning dataset designed for training advanced language models on financial tasks, reasoning, and multi-turn dialogue.

The dataset contains more than 500,000 high-quality entries from the financial domain, covering financial question answering, reasoning, sentiment analysis, topic classification, multilingual named entity recognition, and conversational AI.

Dataset features:

  • Multi-turn dialogue: Rich conversational content that emphasizes contextual understanding and reasoning ability.
  • Diverse data sources: Includes data from multiple high-quality datasets such as Cinder and Sujet-Finance-Instruct-177k.
  • RAG-format data: For Retrieval-Augmented Generation (RAG) tasks, external data is prepended to the user field to improve contextual understanding.
  • Deduplication and preprocessing: Overlapping and irregular entries are removed to produce cleaner, higher-quality data.
  • XBRL tagging: Includes structured financial entity tags from Financial-NER-NLP for advanced extraction tasks.
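
The RAG-format and deduplication features above can be illustrated with a minimal Python sketch. It is not the dataset authors' actual pipeline: the `assistant` field name, the blank-line separator between context and question, the hash-based fingerprint, and the helper names `build_rag_record` and `deduplicate` are all assumptions made for illustration. Only the idea of placing external data ahead of the user content and dropping overlapping or irregular entries comes from the description.

```python
from __future__ import annotations

import hashlib


def build_rag_record(context: str, question: str, answer: str) -> dict:
    """Build one record with retrieved context placed ahead of the user question.

    The blank-line separator and the "assistant" key are assumptions.
    """
    return {
        "user": f"{context.strip()}\n\n{question.strip()}",
        "assistant": answer.strip(),
    }


def _fingerprint(record: dict) -> str:
    # Lowercase and collapse whitespace so trivially different copies collide.
    text = " ".join((record["user"] + " " + record["assistant"]).lower().split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first occurrence of each record; drop empty (irregular) ones."""
    seen: set[str] = set()
    unique: list[dict] = []
    for record in records:
        # Treat records with an empty user or assistant turn as irregular.
        if not record["user"].strip() or not record["assistant"].strip():
            continue
        fp = _fingerprint(record)
        if fp in seen:
            continue
        seen.add(fp)
        unique.append(record)
    return unique
```

Exact-match hashing after normalization only catches near-verbatim duplicates. Detecting paraphrased overlap would need a fuzzier technique such as MinHash, which this sketch does not attempt.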