What is NetsPresso?

NetsPresso is a hardware-aware AI model optimization platform that automatically searches for, compresses, and deploys optimized models on actual hardware.

One-stop shop for optimized AI model development

NetsPresso combines three modules in one pipeline. You can use each module independently to suit your development stage, or run the entire pipeline to automatically build, compress, and deploy a production model.

NetsPresso Model Searcher: Automatically searches optimized models for a target device.

  • Model building: Build a new model by automatically training it on your dataset.
  • Automatic search: Find a model that fits the target hardware and meets the target performance.

NetsPresso Model Compressor: Compresses models for better computational efficiency.

  • Compress: Compress the model by easily applying compression techniques to it.
  • Accelerate: Make the model lighter and faster without sacrificing accuracy.

NetsPresso Model Launcher: Simplifies the deployment of accelerated models.

  • Convert: Convert the model to a format executable on the target device.
  • Package: Package the model so that it can be deployed directly to the device.

Technologies in NetsPresso

NetsPresso Model Searcher

  • Automate time-consuming and repetitive AI model development tasks
  • The infrastructure is ready to use, so anyone can start developing a model right away; a retraining pipeline is included for experts
  • Create multiple models within the search space using Neural Architecture Search
  • A single project run can produce multiple models that meet different goals, such as performance and trade-off targets

HW-Aware Model Searcher

  • Benchmarks can be run on the actual device without any device setup
  • Develop a model that meets the target latency on the chosen device
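
The search loop behind these capabilities can be sketched in miniature: sample architectures from a search space, discard candidates that miss the latency budget, and keep the best of the rest. Everything here (the search space, the latency cost model, and the accuracy proxy) is an invented stand-in for the on-device benchmarks and trained evaluations that NetsPresso automates, not the platform's actual algorithm.

```python
import random

# Toy stand-ins for NetsPresso's automated search: an invented search space,
# an invented latency cost model, and an invented accuracy proxy.
SEARCH_SPACE = {"depth": [2, 4, 6, 8], "width": [16, 32, 64, 128]}

def estimated_latency_ms(depth, width):
    # Hypothetical cost model: latency grows with depth and quadratically with width.
    return 0.05 * depth * (width / 16) ** 2

def proxy_score(depth, width):
    # Hypothetical accuracy proxy: more capacity scores higher, with diminishing returns.
    return 1.0 - 1.0 / (1 + depth * width / 64)

def search(target_latency_ms, trials=200, seed=0):
    """Randomly sample architectures; keep the best scorer under the latency budget."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        cfg = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        if estimated_latency_ms(**cfg) > target_latency_ms:
            continue  # violates the device latency target
        score = proxy_score(**cfg)
        if best is None or score > best[0]:
            best = (score, cfg)
    return best

best = search(target_latency_ms=2.0)
print(best)
```

Hardware-aware search replaces the analytic cost model above with latency measured on the real device, which is why the result actually holds on the target hardware.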

NetsPresso Model Compressor

Automatic Compression

  • Focus only on compression instead of spending time implementing complicated methods
  • Compress PyTorch and TensorFlow models immediately

Structured Pruning, Filter Decomposition

  • Structured pruning directly improves the inference speed of AI models by reducing the amount of computation
  • Filter decomposition factorizes the model's filters into smaller components while retaining the important information
  • Fine-tuning after compression can restore the model's accuracy
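
Filter-level structured pruning can be illustrated without any framework: rank a convolution layer's output filters by L1 norm and keep only the strongest ones, so the layer itself shrinks and its computation drops proportionally. The weight shape and keep ratio below are illustrative only, not taken from any NetsPresso model.

```python
import numpy as np

def prune_filters(weight, keep_ratio=0.5):
    """Structured pruning sketch: keep the output filters with the largest L1 norms.

    weight: array of shape (out_channels, in_channels, kH, kW).
    Returns the pruned weight and the indices of the kept filters.
    """
    norms = np.abs(weight).reshape(weight.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(weight.shape[0] * keep_ratio))
    keep = np.sort(np.argsort(norms)[::-1][:n_keep])  # highest-norm filters, in order
    return weight[keep], keep

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))  # illustrative layer: 8 filters of 3x3x3
pruned, kept = prune_filters(w, keep_ratio=0.5)
print(pruned.shape)  # (4, 3, 3, 3): half the filters, so roughly half the FLOPs
```

Because whole filters are removed rather than individual weights zeroed, the smaller layer runs faster on ordinary hardware with no sparse-kernel support; the following layer's input channels must be pruned to match, and fine-tuning then recovers accuracy.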

HW-Aware Model Profiling

  • Visualize the neural network to inspect its structure and identify which layers can be compressed
  • Profile the neural network on the target hardware to decide which layers to compress and by how much
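
A minimal version of such a profiling pass: estimate each layer's compute and rank layers by cost, so compression effort goes where it pays off most. The layer shapes here are made up for illustration, and NetsPresso profiles on the actual target hardware rather than relying on an analytic FLOP count alone.

```python
# Toy profiling pass: rank convolution layers of a small, invented CNN by
# estimated compute to decide where compression would pay off.
layers = [
    # (name, out_channels, in_channels, kernel, output_h, output_w)
    ("conv1", 32, 3, 3, 112, 112),
    ("conv2", 64, 32, 3, 56, 56),
    ("conv3", 128, 64, 3, 28, 28),
]

def conv_flops(out_c, in_c, k, h, w):
    # Multiply-accumulates of a standard convolution, counted as 2 FLOPs each.
    return 2 * out_c * in_c * k * k * h * w

profile = sorted(
    ((name, conv_flops(*spec)) for name, *spec in layers),
    key=lambda item: item[1],
    reverse=True,
)
for name, flops in profile:
    print(f"{name}: {flops / 1e6:.1f} MFLOPs")
```

In this toy network the middle and late convolutions dominate the compute, so they are the natural candidates for aggressive pruning or decomposition, while the cheap first layer can be left mostly intact.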

NetsPresso Model Launcher

Quantization

  • Quantization enables an AI model to run on the target device with hardware acceleration.
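
The core idea can be sketched as post-training affine quantization: map float tensor values onto 8-bit integers with a scale and a zero point, which is what lets integer-only accelerators run the model. This is the generic textbook scheme, not NetsPresso's specific implementation.

```python
import numpy as np

def quantize(x, num_bits=8):
    """Affine (asymmetric) quantization sketch: float -> uint8 with scale and zero point.

    Assumes x.max() > x.min(); a real implementation also handles constant tensors.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = float(x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - float(x.min()) / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover approximate float values to inspect the quantization error.
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
x = rng.normal(size=1000).astype(np.float32)
q, scale, zp = quantize(x)
err = float(np.abs(dequantize(q, scale, zp) - x).max())
print(q.dtype, f"max abs error = {err:.4f}")  # error stays within one quantization step
```

The tensor shrinks to a quarter of its float32 size and every value stays within one quantization step of the original, which is why accuracy typically survives 8-bit post-training quantization.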

Convert and package

  • Convert and compile AI models so that the hardware can understand and run them.
  • Package AI models with the necessary processing code so they are ready for deployment.

Device farm

  • Actual hardware is already installed in the NetsPresso Device Farm to provide the platform's hardware-aware features.