
HyperAI Auto Modeling Data Format Specification

Introduction

HyperAI Data Format is a dataset organization standard defined by HyperAI for auto modeling and related products. Once a dataset is organized according to this specification, auto modeling can use it to automatically build deep learning models.

The HyperAI data format uses meta.csv, a CSV file, as the main format file of the dataset:

  • The first row contains the field types and field names, in the format: [type]_[name]
  • Each subsequent row represents one data sample.
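The header convention above can be parsed by splitting each column on its first underscore. A minimal sketch (the function name is illustrative, not part of any HyperAI toolchain):

```python
def parse_header(header_row: str) -> list[tuple[str, str]]:
    """Split each column of the meta.csv header into (field type, field name)."""
    fields = []
    for column in header_row.strip().split(","):
        # "[type]_[name]" splits on the first underscore only,
        # so field names may themselves contain underscores.
        field_type, _, field_name = column.partition("_")
        fields.append((field_type, field_name))
    return fields

# Example header for a hypothetical dataset:
print(parse_header("int_Age,txt_Comment,category_Label"))
# → [('int', 'Age'), ('txt', 'Comment'), ('category', 'Label')]
```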

Field Names

  1. Field names consist of uppercase and lowercase English letters.
  2. Fields whose names start with "*" are ignored during auto modeling training.
  3. Label is a dedicated field name that refers to the labels in the training data. Only one field may be named Label.
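The naming rules above can be checked mechanically. A minimal sketch, assuming columns are given as raw header cells; the function name is illustrative:

```python
def validate_fields(columns: list[str]) -> list[str]:
    """Return the active field names, enforcing the single-Label rule."""
    # Fields whose names start with "*" are ignored by training.
    active = [c for c in columns if not c.startswith("*")]
    names = [c.partition("_")[2] for c in active]
    if names.count("Label") > 1:
        raise ValueError("at most one field may be named Label")
    return names

print(validate_fields(["category_Label", "txt_Comment", "*int_Debug"]))
# → ['Label', 'Comment']
```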

Field Types

Field types indicate the data type of each column. Simple fields (int, float, category, txt) store their values directly in the corresponding cell of meta.csv. Complex fields (text, image, video, json) cannot be represented inline in meta.csv, so their cell value is instead a relative path to the file in the dataset that holds the field's value.

  • int - integer value
  • float - floating point number
  • category - categorical value
  • txt - short text value
  • text - text file, entire content within the file
  • image - image file, formats include: jpg, png, tif
  • video - video file, formats include: mp4
  • json - complex annotation data, with corresponding definition methods based on different problems
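The simple/complex distinction determines how a cell is interpreted. A minimal sketch of resolving a cell value (the helper is illustrative, not part of any HyperAI toolchain):

```python
import posixpath

SIMPLE_TYPES = {"int", "float", "category", "txt"}
COMPLEX_TYPES = {"text", "image", "video", "json"}

def resolve_value(field_type: str, cell: str, dataset_root: str) -> str:
    """Simple fields hold the value itself; complex fields hold a
    relative path into the dataset directory."""
    if field_type in SIMPLE_TYPES:
        return cell                                # the cell is the value
    if field_type in COMPLEX_TYPES:
        return posixpath.join(dataset_root, cell)  # the cell is a relative path
    raise ValueError(f"unknown field type: {field_type}")

print(resolve_value("image", "images/001.jpg", "/data/demo"))
# → /data/demo/images/001.jpg
```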

Data Formats for Various Problems

Object Detection

Since the Label content for object detection is extensive, a separate JSON file is used for annotation. For example, 001.jpg is an original image, and 001.json contains the annotations for the objects in that image along with their corresponding types.

json_Label,image_Source
labels/001.json,images/001.jpg

For a detailed description, see Object Detection.
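The meta.csv above can be read with a standard CSV parser. A minimal sketch that pairs each annotation file with its source image (the JSON annotation schema itself varies by problem, so only the path pairing is shown):

```python
import csv
import io

# The two-line meta.csv from above, inlined for the sketch.
meta_csv = "json_Label,image_Source\nlabels/001.json,images/001.jpg\n"

# Each data row pairs a JSON annotation file with its source image;
# both cells are paths relative to the dataset root.
reader = csv.DictReader(io.StringIO(meta_csv))
pairs = [(row["json_Label"], row["image_Source"]) for row in reader]
print(pairs)
# → [('labels/001.json', 'images/001.jpg')]
```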

Semantic Segmentation

001_mask.jpg and 001.jpg are two images of the same size; each pixel in 001_mask.jpg is the annotation for the pixel at the corresponding position in 001.jpg.

image_Label,image_Source
images/001_mask.jpg,images/001.jpg

Instance Segmentation

For a detailed description, see Instance Segmentation.

FAQ

  1. In annotation file names and file contents, use only English letters, numbers, underscores, and similar characters. Avoid Chinese characters to prevent unexpected encoding issues.
  2. All coordinates in the annotation specification are relative position coordinates. For example, in an 800×600 image, a point at pixel (X, Y) is recorded as (X/800, Y/600).
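The relative-coordinate convention can be sketched as a one-line conversion (the function name is illustrative):

```python
def to_relative(x: int, y: int, width: int, height: int) -> tuple[float, float]:
    """Convert an absolute pixel coordinate to the relative form
    used by the annotation specification."""
    return x / width, y / height

# For an 800x600 image, pixel (400, 300) becomes:
print(to_relative(400, 300, 800, 600))
# → (0.5, 0.5)
```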