#2 Classification: ResNet50 on CIFAR100

This page shows how Nota compressed and optimized a ResNet50 model with NetsPresso Model Compressor and Nota's in-house search algorithm.


Let's briefly explain the input model and training details used in the example.

1) Model & Training Code

  • Data: CIFAR100
  • Model: ResNet50 (TensorFlow)

Model and training code are in the [NetsPresso Model Compressor ModelZoo](https://github.com/Nota-NetsPresso/NetsPresso-CompressionToolkit-ModelZoo/tree/main/models/tensorflow).
Training code is required to fine-tune the model after compression.

2) Train Config

The following table describes the configurations for training:

| Config | Value |
|---|---|
| Normalization | Mean: [0.4914, 0.4822, 0.4465], SD: [0.2023, 0.1994, 0.2010] |
| Data augmentation | RandomCrop(32, padding=4), RandomHorizontalFlip |
| Learning rate | 0.1 |
| Optimizer | SGD (momentum: 0.9) |
| Batch size | 128 |
| LR scheduler | ReduceLROnPlateau |
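The normalization and augmentation rows of the table can be sketched in NumPy as below. This is a minimal illustration that assumes 32x32x3 uint8 RGB images; the ModelZoo training code may implement these steps differently (for example with `tf.image` ops), and the function names here are hypothetical.

```python
import numpy as np

# Per-channel statistics from the training config table (RGB order assumed)
MEAN = np.array([0.4914, 0.4822, 0.4465], dtype=np.float32)
SD = np.array([0.2023, 0.1994, 0.2010], dtype=np.float32)


def random_crop_flip(image: np.ndarray, rng: np.random.Generator,
                     size: int = 32, padding: int = 4) -> np.ndarray:
    """RandomCrop(32, padding=4) followed by a horizontal flip with p=0.5."""
    # Zero-pad height and width by `padding` pixels on every side (32 -> 40)
    padded = np.pad(image, ((padding, padding), (padding, padding), (0, 0)))
    # Pick a random top-left corner so the crop stays inside the padded image
    top = rng.integers(0, padded.shape[0] - size + 1)
    left = rng.integers(0, padded.shape[1] - size + 1)
    crop = padded[top:top + size, left:left + size]
    # Flip left-right with probability 0.5
    if rng.random() < 0.5:
        crop = crop[:, ::-1]
    return crop


def normalize(image: np.ndarray) -> np.ndarray:
    """Scale uint8 pixels to [0, 1], then standardize each channel."""
    x = image.astype(np.float32) / 255.0
    return (x - MEAN) / SD


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3)).astype(np.uint8)
out = normalize(random_crop_flip(img, rng))
print(out.shape, out.dtype)  # (32, 32, 3) float32
```

Augmentation is applied only to training images, while normalization is applied to both training and evaluation images.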

Simple Compression


Stage 1: Pruning

First, we applied pruning. Pruning aims to reduce computational cost and speed up the model by removing less important, redundant filters.

Please refer to the attached document if you would like to learn more about pruning.

Because layers differ greatly in importance, model performance can change dramatically depending on which filters are removed.

NetsPresso Compression Toolkit (NPTK) offers two options for setting pruning ratios:

  1. Use our recommended values (automatically calculated)
  2. Use your own values (customized)

The recommendation function, based on SLAMP, suggests a pruning ratio for each layer within a few seconds.

1) L2 Norm


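With the L2 Norm method, filters with the smallest L2 norms are treated as the least important and removed first. The pruning ratio is the fraction of a layer's filters to remove; for example, pruning_ratio=0.5 removes 50% of that layer's filters.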
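The L2-norm criterion can be illustrated with a minimal NumPy sketch. It assumes the TensorFlow/Keras Conv2D kernel layout (kh, kw, in_channels, out_channels) and rounds the number of removed filters down; how NPTK rounds is not stated here, and `select_filters_l2` is a hypothetical helper, not an NPTK API.

```python
import numpy as np


def select_filters_l2(kernel: np.ndarray, pruning_ratio: float) -> np.ndarray:
    """Return sorted indices of the output filters to KEEP.

    kernel: Conv2D weights in Keras layout (kh, kw, in_ch, out_ch).
    pruning_ratio: fraction of filters to remove (0.5 removes half).
    """
    out_ch = kernel.shape[-1]
    # One L2 norm per output filter, computed over (kh, kw, in_ch)
    norms = np.linalg.norm(kernel.reshape(-1, out_ch), axis=0)
    n_remove = int(out_ch * pruning_ratio)  # rounding down is an assumption
    # argsort is ascending, so skipping the first n_remove drops the smallest-norm filters
    keep = np.argsort(norms)[n_remove:]
    return np.sort(keep)


rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3, 16, 64)).astype(np.float32)
keep = select_filters_l2(kernel, pruning_ratio=0.5)
pruned_kernel = kernel[..., keep]
print(pruned_kernel.shape)  # (3, 3, 16, 32)
```

Removing output filters also removes the matching input channels of the next layer, so its kernel would need to be sliced with the same `keep` indices along its input-channel axis.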

2) Fine-tuning

Compression typically causes a significant drop in accuracy, so fine-tuning is essential to recover the original performance.

To fine-tune the model, run the training code provided in the NetsPresso Compression Toolkit ModelZoo.
Unlike in the VGG19 example, we kept every setting the same except the learning rate, which we reduced to 1/10 of its original value (from 0.1 to 0.01).
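Since only the learning rate changes between training and fine-tuning, the derivation can be sketched in plain Python, assuming the settings live in a dictionary. The key names and `make_finetune_config` are hypothetical and not part of the ModelZoo code; the comments show how the values might map onto standard Keras objects.

```python
# Training settings from the Train Config table (key names are hypothetical)
TRAIN_CONFIG = {
    "optimizer": "SGD",
    "momentum": 0.9,
    "learning_rate": 0.1,
    "batch_size": 128,
    "lr_scheduler": "ReduceLROnPlateau",
}


def make_finetune_config(base: dict, lr_scale: float = 0.1) -> dict:
    """Copy the training config and scale only the learning rate."""
    cfg = dict(base)  # shallow copy; the original config is left untouched
    cfg["learning_rate"] = base["learning_rate"] * lr_scale
    return cfg


finetune = make_finetune_config(TRAIN_CONFIG)
# With TensorFlow, these values could be passed to, for example:
#   tf.keras.optimizers.SGD(learning_rate=finetune["learning_rate"],
#                           momentum=finetune["momentum"])
#   tf.keras.callbacks.ReduceLROnPlateau(...)  # factor/patience are not given here
```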

Stage 2: Filter Decomposition

Next, we applied Filter Decomposition (FD) to lighten the model further. FD aims to approximate the original filters' representation with fewer filters (ranks).

Because each layer carries a different amount of information, finding the optimal rank for each layer is crucial to minimizing the accuracy drop.
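To make "fewer filters (ranks)" concrete, here is a minimal NumPy sketch of Tucker-2 decomposition computed with truncated HOSVD, which is one standard way to obtain it; NPTK's own solver is not described on this page. A Keras kernel of shape (kh, kw, C_in, C_out) becomes a 1x1 conv (C_in to in_rank), a kh x kw core conv (in_rank to out_rank), and a 1x1 conv (out_rank to C_out). Reading the "[In Rank, Out Rank]" pairs later on this page as these two ranks is an assumption, and all function names are hypothetical.

```python
import numpy as np


def leading_vectors(kernel: np.ndarray, mode: int, rank: int) -> np.ndarray:
    """Top-`rank` left singular vectors of the mode-`mode` unfolding."""
    unfolded = np.moveaxis(kernel, mode, 0).reshape(kernel.shape[mode], -1)
    u, _, _ = np.linalg.svd(unfolded, full_matrices=False)
    return u[:, :rank]


def tucker2(kernel: np.ndarray, in_rank: int, out_rank: int):
    """Tucker-2 (HOSVD) of a Keras conv kernel (kh, kw, C_in, C_out)."""
    u_in = leading_vectors(kernel, 2, in_rank)    # (C_in, in_rank): first 1x1 conv
    u_out = leading_vectors(kernel, 3, out_rank)  # (C_out, out_rank): last 1x1 conv, transposed
    # Project the kernel onto both channel subspaces to get the kh x kw core conv
    core = np.einsum("hwio,ia,ob->hwab", kernel, u_in, u_out)
    return u_in, core, u_out


def reconstruct(u_in, core, u_out):
    """Map the core back to the original channel dimensions."""
    return np.einsum("hwab,ia,ob->hwio", core, u_in, u_out)


def num_params(kh, kw, c_in, c_out, in_rank, out_rank):
    original = kh * kw * c_in * c_out
    decomposed = c_in * in_rank + kh * kw * in_rank * out_rank + out_rank * c_out
    return original, decomposed


rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3, 64, 128))
u_in, core, u_out = tucker2(kernel, in_rank=16, out_rank=32)
print(core.shape)                         # (3, 3, 16, 32)
print(num_params(3, 3, 64, 128, 16, 32))  # (73728, 9728)
```

Lower ranks save more parameters but discard more of the kernel, which is why the per-layer rank choice matters for accuracy.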

Please refer to the attached document if you would like to learn more about FD.

NetsPresso Compression Toolkit offers two options for FD:

  1. Use our recommended values (automatically calculated)
  2. Use your own values (customized)

The recommendation function, based on VBMF (Variational Bayesian Matrix Factorization), suggests a rank for each layer within a minute.

1) Tucker Decomposition


The calibration ratio lets you easily increase or decrease a filter's rank to search for a better-performing model. The new rank is calculated by adding (removed rank x calibration ratio) to the remaining rank. The ratio defaults to 0.
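The calibration arithmetic can be written as a small Python function, where "remaining rank" is the recommended rank and "removed rank" is the original rank minus the recommended rank. Truncating to an integer and clamping to [1, original rank] are assumptions for illustration; the Calibration Ratio Description is the authoritative reference, particularly for how negative ratios (decreases) are handled.

```python
def calibrated_rank(original_rank: int, recommended_rank: int, ratio: float = 0.0) -> int:
    """New rank = remaining rank + (removed rank x calibration ratio)."""
    removed = original_rank - recommended_rank
    new_rank = recommended_rank + int(removed * ratio)  # truncation is an assumption
    # Clamp to a valid range (assumption; not specified on this page)
    return max(1, min(original_rank, new_rank))


print(calibrated_rank(64, 16))       # 16: ratio 0 keeps the recommendation
print(calibrated_rank(64, 16, 0.5))  # 40: 16 + 48 * 0.5
print(calibrated_rank(64, 16, 1.0))  # 64: back to the original rank
```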

More detailed information can be found in the Calibration Ratio Description.

This is what you will see when you click the Recommendation button.



Why is there no recommendation for the first layer?

Because the first layer has only 3 input channels ("In Channel" is 3), it contains little information to decompose.

The recommendation function automatically excludes such layers to avoid a significant drop in accuracy.

2) Fine-tuning

As after pruning, we fine-tuned the model after filter decomposition, using the provided training code.
Again, we kept every setting the same except the learning rate, which we reduced to 1/10 of its original value (from 0.1 to 0.01).

Result of Compression

| Name | Acc (%) | FLOPs | Params (M) | Model Size (MB) |
|---|---|---|---|---|
| Pruning | 77.04 (-0.99) | 859.28 (3.02x) | 5.59 (4.24x) | 22.94 (4.17x) |
| Pruning + FD | 76.92 (-1.11) | 613.43 (4.23x) | 2.64 (8.99x) | 11.51 (8.3x) |
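In the table, the parenthesized accuracy value appears to be the change from the original model in percentage points, and the other parenthesized values appear to be reduction factors (original divided by compressed). Under that reading, the original model's figures can be back-calculated from the reported numbers, as in this small sketch; the results are approximate because the reported factors are rounded.

```python
# Reported "Pruning + FD" row (Simple Compression)
acc, acc_delta = 76.92, -1.11
flops, flops_factor = 613.43, 4.23
params_m, params_factor = 2.64, 8.99

# Assumption: delta = compressed - original, factor = original / compressed
baseline_acc = acc - acc_delta
baseline_flops = flops * flops_factor
baseline_params_m = params_m * params_factor

print(f"{baseline_acc:.2f}")       # 78.03
print(f"{baseline_flops:.0f}")     # 2595
print(f"{baseline_params_m:.1f}")  # 23.7
```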

Search-Based Compression

To get the best results, we used Nota's in-house search algorithm to find better compression parameters (ranks and pruning ratios).

Unfortunately, the algorithm is for internal use only and is not provided in this trial version. If you are interested in our solution, please reach out to us.

The figure below illustrates our procedure.


The compression parameters found by our search algorithm are listed below. Using these parameters, you should be able to create a model with similar performance.


Why does my model's performance differ from Nota's?

Fine-tuning is stochastic (for example, data shuffling and random augmentation), so accuracy may vary slightly between runs.
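Fixing random seeds before fine-tuning reduces run-to-run variance. The sketch below seeds Python and NumPy and, if TensorFlow is installed, calls `tf.keras.utils.set_random_seed` (TF 2.7+) and `tf.config.experimental.enable_op_determinism` (TF 2.8+). Whether the ModelZoo training code already sets seeds is not stated here, and `set_seed` is a hypothetical helper.

```python
import random

import numpy as np


def set_seed(seed: int = 42) -> None:
    """Seed the common random number generators used during fine-tuning."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import tensorflow as tf
        tf.keras.utils.set_random_seed(seed)             # TF >= 2.7: seeds Python, NumPy and TF
        tf.config.experimental.enable_op_determinism()  # TF >= 2.8: deterministic ops (slower)
    except (ImportError, AttributeError):
        pass  # TensorFlow is missing or too old for these calls
```

Even with fixed seeds, results can still differ across hardware, drivers, and library versions, so a small gap from Nota's numbers is expected.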

Stage 1: Pruning

1) Pruning Compression Parameter

# layer name: pruning ratio
{
    'conv1': 0.7,
    'layer1.0.conv1': 0.9,
    'layer1.0.conv2': 0.7,
    'layer1.1.conv1': 0.9,
    'layer1.1.conv2': 0.7,
    'layer1.2.conv1': 0.9,
    'layer1.2.conv2': 0.7,
    'add_2': 0.5,
    'layer2.0.conv1': 0.8,
    'layer2.0.conv2': 0.7,
    'layer2.1.conv1': 0.9,
    'layer2.1.conv2': 0.9,
    'layer2.2.conv1': 0.9,
    'layer2.2.conv2': 0.9,
    'layer2.3.conv1': 0.9,
    'layer2.3.conv2': 0.9,
    'add_6': 0.4,
    'layer3.0.conv1': 0.6,
    'layer3.0.conv2': 0.4,
    'layer3.1.conv1': 0.9,
    'layer3.1.conv2': 0.9,
    'layer3.2.conv1': 0.9,
    'layer3.2.conv2': 0.9,
    'layer3.3.conv1': 0.9,
    'layer3.3.conv2': 0.9,
    'layer3.4.conv1': 0.9,
    'layer3.4.conv2': 0.7,
    'layer3.5.conv1': 0.7,
    'layer3.5.conv2': 0.6,
    'add_12': 0.2,
    'layer4.0.conv1': 0.4,
    'layer4.0.conv2': 0.4,
    'layer4.1.conv1': 0.9,
    'layer4.1.conv2': 0.9,
    'layer4.2.conv1': 0.3,
    'layer4.2.conv2': 0.3,
    'add_15': 0.4,
}
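To see what these ratios mean in filter counts, the sketch below applies a few of them to standard ResNet50 widths: a 64-filter stem, bottleneck conv1/conv2 widths of 64, 128, 256, and 512 per stage, and residual outputs of 256 to 2048 channels. Two assumptions are made: that `add_N` is the last residual add of each stage (indices 2, 6, 12, and 15 line up with ResNet50's 3, 4, 6, and 3 blocks per stage), and that removed counts are rounded down. The ModelZoo model's actual widths should be checked before relying on these numbers; the helper names are hypothetical.

```python
# Standard ResNet50 widths (assumed to match the ModelZoo model)
STEM_WIDTH = 64
STAGE_WIDTH = {"layer1": 64, "layer2": 128, "layer3": 256, "layer4": 512}
# Assumption: add_N is the last residual add of each stage (3, 4, 6, 3 blocks)
ADD_WIDTH = {"add_2": 256, "add_6": 512, "add_12": 1024, "add_15": 2048}


def original_width(name: str) -> int:
    if name == "conv1":
        return STEM_WIDTH
    if name in ADD_WIDTH:
        return ADD_WIDTH[name]
    # e.g. "layer3.0.conv1" -> stage "layer3"; conv1/conv2 share the bottleneck width
    return STAGE_WIDTH[name.split(".")[0]]


def kept_filters(name: str, ratio: float) -> int:
    width = original_width(name)
    return width - int(width * ratio)  # rounding the removed count down is an assumption


# A few entries copied from the list above
for name, ratio in {"conv1": 0.7, "layer1.0.conv1": 0.9,
                    "layer3.0.conv1": 0.6, "add_15": 0.4}.items():
    print(name, original_width(name), "->", kept_filters(name, ratio))
```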

Stage 2: Filter Decomposition

1) Filter Decomposition Compression Parameter

# layer name: [in rank, out rank]
{
    'layer1.0.shortcut.0': [13, 13],
    'layer1.0.conv3': [11, 8],
    'layer1.1.conv3': [6, 6],
    'layer1.2.conv2': [4, 8],
    'layer1.2.conv3': [7, 6],
    'layer2.0.conv1': [10, 10],
    'layer2.0.conv2': [15, 16],
    'layer2.0.shortcut.0': [44, 45],
    'layer2.0.conv3': [21, 16],
    'layer2.1.conv1': [7, 7],
    'layer2.1.conv2': [7, 9],
    'layer2.1.conv3': [8, 5],
    'layer3.0.conv1': [37, 37],
    'layer3.0.conv2': [50, 60],
    'layer3.0.shortcut.0': [98, 90],
    'layer3.0.conv3': [54, 41],
    'layer3.1.conv2': [12, 13],
    'layer3.2.conv1': [6, 16],
    'layer3.2.conv2': [14, 12],
    'layer3.3.conv2': [13, 11],
    'layer3.4.conv2': [14, 16],
    'layer3.5.conv2': [33, 36],
    'layer3.5.conv3': [42, 22],
    'layer4.0.conv1': [75, 75],
    'layer4.0.conv2': [100, 97],
    'layer4.0.shortcut.0': [144, 177],
    'layer4.0.conv3': [89, 79],
    'layer4.1.conv2': [21, 16],
    'layer4.2.conv1': [99, 99],
    'layer4.2.conv2': [136, 109],
    'layer4.2.conv3': [99, 98],
}

Result of Compression

| Name | Acc (%) | FLOPs | Params (M) | Model Size (MB) |
|---|---|---|---|---|
| Pruning | 77.31 (-0.72) | 403.60 (6.43x) | 5.97 (3.97x) | 24.51 (3.9x) |
| Pruning + FD | 76.63 (-1.4) | 224.70 (11.55x) | 2.17 (10.91x) | 9.54 (10.02x) |

NPTK-Simple vs NPTK-Search

Figure: comparison of the results from NPTK-Simple and NPTK-Search.
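The two Pruning + FD rows from the result tables above can also be compared directly. This short sketch does nothing beyond arithmetic on the reported values.

```python
# "Pruning + FD" rows from the Simple and Search result tables
simple = {"acc": 76.92, "flops": 613.43, "params_m": 2.64, "size_mb": 11.51}
search = {"acc": 76.63, "flops": 224.70, "params_m": 2.17, "size_mb": 9.54}

acc_gap = simple["acc"] - search["acc"]         # accuracy given up by Search (pp)
flops_gain = simple["flops"] / search["flops"]  # how many times fewer FLOPs Search needs
size_gain = simple["size_mb"] / search["size_mb"]
print(f"accuracy gap: {acc_gap:.2f} pp, "
      f"FLOPs: {flops_gain:.2f}x fewer, size: {size_gain:.2f}x smaller")
```

In other words, the searched parameters trade about 0.3 percentage points of accuracy for roughly 2.7x fewer FLOPs than the simple recommendation.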



We have walked you through how we compressed a CNN model with NetsPresso Compression Toolkit (NPTK). NPTK reduced FLOPs, parameters, and model size while keeping accuracy within 1.4 percentage points of the original model.

More posts on object detection and super resolution are coming. To stay up to date, sign up or subscribe at netspresso.ai.