
Step 1: Prepare dataset

This document explains the dataset structures that are compatible with NetsPresso. NetsPresso supports tasks such as image classification, object detection, and semantic segmentation.


1. Datasets for image classification

Supported image file types

  • 'jpg', 'png', 'jpeg'

Supported formats

  • ImageNet format

ImageNet format

Dataset structure

  • In the ImageNet format, each class has its own directory, named after the class. If the directory names are not the class names, provide a mapping.txt file.
  • Example of dataset structure
ImageNet format
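
A minimal Python sketch for inspecting such a dataset before uploading it. It assumes that each class directory sits directly under one root directory and holds its images directly; the function name and this exact layout are illustrative assumptions, not part of NetsPresso.

```python
from pathlib import Path

# Extensions listed above as supported for image classification.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_images_per_class(root):
    """Return {class directory name: [image file names]}.

    Assumed layout (hypothetical): root/<class directory>/<image file>.
    """
    classes = {}
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        # Keep only files whose extension is in the supported list.
        classes[class_dir.name] = sorted(
            p.name
            for p in class_dir.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
    return classes
```

An empty list for a class, or a directory name that is not a class name, is a hint that the dataset needs fixing or that a mapping.txt is required.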

mapping.txt for class information (optional)

  • Comments are not allowed, including lines starting with '#' or '//'.
  • Example of mapping.txt
class-1 car
class-2 banana
...
class-n rice cake
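
The example above can be read with a few lines of Python. This is a sketch under two assumptions that the example suggests but does not state: the first whitespace-separated token on each line is the directory name, and the rest of the line is the class name (which may contain spaces, as in "rice cake"). The function name is hypothetical, and skipping blank lines is also an assumption.

```python
def parse_mapping(text):
    """Parse mapping.txt content into {directory name: class name}."""
    mapping = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith(("#", "//")):
            raise ValueError(f"line {line_number}: comments are not allowed")
        # Split once so class names with spaces stay intact.
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(
                f"line {line_number}: expected '<directory> <class name>'"
            )
        directory, class_name = parts
        mapping[directory] = class_name
    return mapping
```

For instance, `parse_mapping("class-1 car\nclass-2 banana")` returns `{"class-1": "car", "class-2": "banana"}`.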

2. Datasets for object detection

Supported image file types

  • 'bmp', 'jpg', 'jpeg', 'png', 'tif', 'tiff', 'dng', 'webp', 'mpo'

Supported formats

  • YOLO format (dataset yaml file needed) — Recommended
  • COCO format
  • VOC format

Labeling tools such as CVAT and labelimg support these formats.

If you are using Roboflow, its YOLO v5 PyTorch, COCO, and Pascal VOC export formats are compatible with NetsPresso. (Use Export, then "download zip to computer", to download the dataset.)

YOLO format

Dataset structure

  • In the YOLO format, each image has a '.txt' label file with the same file name. An image that contains no objects does not need a '.txt' file. Make sure that every '.txt' file has a corresponding image file.
  • Example of dataset structure
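
The pairing rule above (every '.txt' file needs an image with the same name) can be checked with a short Python sketch. It assumes the images and labels of one split are stored in two directories passed in by the caller; those directories and the function name are illustrative assumptions, and the same directory can be passed twice if images and labels are stored together.

```python
from pathlib import Path

# Extensions listed above as supported for object detection.
IMAGE_EXTENSIONS = {
    ".bmp", ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".dng", ".webp", ".mpo",
}


def find_orphan_labels(image_dir, label_dir, label_suffix=".txt"):
    """Return label file names that have no image with the same stem."""
    image_stems = {
        p.stem
        for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    }
    return sorted(
        p.name
        for p in Path(label_dir).iterdir()
        if p.is_file()
        and p.suffix.lower() == label_suffix
        and p.stem not in image_stems
    )
```

Images without a label file are not reported, since an image with no objects does not need one.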

Example of txt file

  • Example image with 4 objects (3 people, 1 hammer).
0 0.415730 0.494949 0.394864 0.828283
0 0.659711 0.476010 0.205457 0.648990
0 0.848315 0.375000 0.229535 0.674242
42 0.776083 0.356061 0.062600 0.161616
  • One row per object
  • Each row represents {class number} {center_x} {center_y} {width} {height}
  • Box coordinates must be in normalized xywh format (values from 0 to 1). If your boxes are in pixels, divide center_x and width by the image width, and center_y and height by the image height.
    • For example, the image above is 623px wide and 396px high, and the first object in its label has center_x 259, center_y 196, width 246, height 328 in pixels. After normalization, the coordinates are center_x 0.415730, center_y 0.494949, width 0.394864, height 0.828283.
  • Class numbers are zero-indexed (starting from 0).
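
The normalization described above can be sketched in Python as follows. The function name is hypothetical, and the six-decimal output simply mirrors the example rows; it is not stated as a requirement.

```python
def to_yolo_row(class_id, center_x, center_y, width, height,
                image_width, image_height):
    """Format one pixel-space box as a normalized YOLO label row."""
    values = (
        center_x / image_width,   # x values are divided by the image width
        center_y / image_height,  # y values are divided by the image height
        width / image_width,
        height / image_height,
    )
    if not all(0.0 <= v <= 1.0 for v in values):
        raise ValueError("normalized values must lie between 0 and 1")
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in values)


# First object of the example image (623px x 396px):
print(to_yolo_row(0, 259, 196, 246, 328, 623, 396))
# 0 0.415730 0.494949 0.394864 0.828283
```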

data.yaml for class information

  • The YOLO format requires a YAML file that lists the class names and the number of classes.
  • The entries in names must be listed in the same order as the class numbers used in the dataset labels.
    • names: Names of the class
    • nc: The number of classes
  • Example of data.yaml
names:
- aeroplane
- bicycle
- bird
- boat
- bottle
- bus
- car
- cat
- chair
- cow
- diningtable
- dog
- horse
- motorbike
- person
- pottedplant
- sheep
- sofa
- train
- tvmonitor
nc: 20
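
A minimal consistency check for data.yaml, sketched in Python. It assumes the file has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`, which is not part of the standard library); the function name and the optional label-row check are illustrative assumptions.

```python
def check_data_config(config, label_rows=()):
    """Return a list of problems found in a parsed data.yaml mapping.

    label_rows is an optional iterable of YOLO label rows such as
    "0 0.415730 0.494949 0.394864 0.828283".
    """
    names = config.get("names")
    nc = config.get("nc")
    if not isinstance(names, list) or not isinstance(nc, int):
        return ["'names' must be a list and 'nc' must be an integer"]

    problems = []
    if len(names) != nc:
        problems.append(f"nc is {nc} but names has {len(names)} entries")
    for row in label_rows:
        class_id = int(row.split()[0])
        # Class numbers are zero-indexed positions in names.
        if not 0 <= class_id < len(names):
            problems.append(f"class number {class_id} has no entry in names")
    return problems
```

An empty list means that nc matches the length of names and that every class number seen in the label rows has a name.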

COCO format

Dataset structure

  • Please refer to the official COCO Data format for COCO label format.
  • The COCO format uses one '.json' file for all images in the same folder. Each set (train, test, and valid) should have its own '.json' file. Make sure that the file names listed in the '.json' file match the names of the image files.
  • Example of dataset structure
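
A short Python sketch for comparing the file names recorded in a COCO '.json' file with the image files in a folder. It relies on the standard COCO `images` list, in which each entry has a `file_name` field, and assumes those entries are bare file names without subdirectories. The function name is hypothetical; the annotation file is assumed to have been loaded already with `json.load`.

```python
def compare_coco_file_names(coco, image_file_names):
    """Compare file_name entries of a parsed COCO JSON with files on disk."""
    listed = {image["file_name"] for image in coco.get("images", [])}
    on_disk = set(image_file_names)
    return {
        # Referenced by the JSON but absent from the folder.
        "missing_on_disk": sorted(listed - on_disk),
        # Present in the folder but never referenced by the JSON.
        "not_in_json": sorted(on_disk - listed),
    }
```

Both lists being empty means that the '.json' file and the folder agree on the set of images.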

VOC format

Dataset structure

  • Please refer to the official VOC Data format for VOC label format.
  • In the VOC format, each image has one '.xml' file with the same file name. An image that contains no objects does not need an '.xml' file. Make sure that every '.xml' file has a corresponding image file.
  • Example of dataset structure
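
To inspect what a VOC '.xml' file contains, the following Python sketch reads the class name and bounding box of each object. It assumes the usual Pascal VOC layout, with `object` elements that each hold a `name` and a `bndbox` with `xmin`, `ymin`, `xmax`, and `ymax`; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def read_voc_objects(xml_text):
    """Return [(class name, xmin, ymin, xmax, ymax)] from one VOC annotation."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        # Some tools write coordinates as floats, so go through float first.
        coords = tuple(
            int(float(box.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append((obj.findtext("name"), *coords))
    return objects
```

An empty result indicates an annotation file with no objects, which, according to the rule above, does not need to exist at all.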

3. Datasets for semantic segmentation

Supported image file types

  • 'jpg', 'png'

Supported formats

  • UNet (like YOLO)

UNet (like YOLO)

Dataset structure

  • In the UNet (like YOLO) format, each image has a single-channel mask image file with the same file name. The mask file stores, for every pixel, an integer value that represents its segmentation class. The mask file and the image file must have the same image size. Make sure that every mask file has a corresponding image file.
  • Example of dataset structure
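
The mask requirements above can be checked with a small Python sketch. It assumes the mask has already been decoded into rows of integer class ids (for example with an imaging library, which is not shown here); the function name and its parameters are illustrative assumptions.

```python
def check_mask(mask, image_width, image_height, class_ids):
    """Return a list of problems for one mask given as rows of class ids."""
    problems = []
    # The mask must have exactly one value per image pixel.
    if len(mask) != image_height or any(len(row) != image_width for row in mask):
        problems.append("mask size differs from image size")
    # Every pixel value must be a known class id.
    unknown = {value for row in mask for value in row} - set(class_ids)
    if unknown:
        problems.append(f"unknown class ids in mask: {sorted(unknown)}")
    return problems
```

For example, `check_mask([[0, 1], [1, 7]], 2, 2, {0, 1})` reports the unknown class id 7.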

id2label.json for class information

  • The number of classes in id2label.json should be the same as the number of classes in the mask files.
  • The JSON file must be named exactly 'id2label.json'; any typo in the file name will cause an error.
  • Example of id2label.json
{
  "0": "road",
  "1": "sidewalk",
  "2": "building",
  "3": "wall",
  "4": "fence",
  "5": "pole",
  "6": "traffic light",
  "7": "traffic sign",
  "8": "vegetation",
  "9": "terrain",
  "10": "sky",
  "11": "person",
  "12": "rider",
  "13": "car",
  "14": "truck",
  "15": "bus",
  "16": "train",
  "17": "motorcycle",
  "18": "bicycle"
}
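
A minimal Python sketch for loading id2label.json before upload. It checks the exact file name and converts the string keys to integers so that they can be compared with the pixel values in the mask files. The function name is hypothetical.

```python
import json
from pathlib import Path


def load_id2label(path):
    """Load id2label.json and return {integer class id: class name}."""
    path = Path(path)
    if path.name != "id2label.json":
        raise ValueError(
            f"expected a file named 'id2label.json', got '{path.name}'"
        )
    with path.open(encoding="utf-8") as f:
        raw = json.load(f)
    # JSON object keys are always strings; mask pixels are integers.
    return {int(key): name for key, name in raw.items()}
```

The length of the returned dictionary is the number of classes, which should match the number of distinct classes used in the mask files.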

palette.json for color index for each class

  • Example of palette.json
{
  "0": [128, 64, 128],
  "1": [244, 35, 232],
  "2": [70, 70, 70],
  "3": [102, 102, 156],
  "4": [190, 153, 153],
  "5": [153, 153, 153],
  "6": [250, 170, 30],
  "7": [220, 220, 0],
  "8": [107, 142, 35],
  "9": [152, 251, 152],
  "10": [70, 130, 180],
  "11": [220, 20, 60],
  "12": [255, 0, 0],
  "13": [0, 0, 142],
  "14": [0, 0, 70],
  "15": [0, 60, 100],
  "16": [0, 80, 100],
  "17": [0, 0, 230],
  "18": [119, 11, 32]
}
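
The following Python sketch shows one way to sanity-check palette.json and to use it for previewing a mask. It rests on assumptions that the example suggests but does not state: each color is a list of three integers from 0 to 255 (read here as red, green, blue), and the palette should define the same class ids as id2label.json. Both function names are hypothetical.

```python
def check_palette(palette, id2label):
    """Return a list of problems found in parsed palette.json content."""
    problems = []
    if set(palette) != set(id2label):
        problems.append("palette.json and id2label.json have different class ids")
    for class_id, color in palette.items():
        valid = len(color) == 3 and all(
            isinstance(c, int) and 0 <= c <= 255 for c in color
        )
        if not valid:
            problems.append(f"class {class_id}: expected three integers in 0-255")
    return problems


def colorize(mask, palette):
    """Map rows of integer class ids to rows of color triples."""
    # palette keys are strings because they come from JSON.
    return [[palette[str(value)] for value in row] for row in mask]
```

With the example above, `colorize([[0, 1]], palette)` maps class 0 to `[128, 64, 128]` and class 1 to `[244, 35, 232]`.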