Step 2: Review profiling results

Once your AI model is uploaded, you can review its detailed structure and performance profile before starting optimization.

What you can check in this step:

  • Total Latency
    The model’s end-to-end inference time, in milliseconds. (A sketch of measuring latency and peak memory locally follows this list.)

  • Peak Memory
    Maximum memory consumption during inference.

  • Compute Units
    Resource usage across CPU, GPU, and NPU (if applicable).

  • Framework / Runtime / Graph Optimization
    Information about the model’s framework version, runtime environment, and whether graph optimization has been applied.
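
The platform reports these values for you, but a rough local measurement can be a useful sanity check for Total Latency and Peak Memory. The sketch below assumes a PyTorch model on a CUDA device; the model, input, and function name are placeholders rather than part of the platform's API.

    import time
    import torch

    def profile_once(model, example_input, warmup=10, runs=50):
        """Rough local estimate of average latency (ms) and peak memory (MB)."""
        model.eval()
        device = next(model.parameters()).device

        with torch.no_grad():
            # Warm-up runs so one-time costs (kernel selection, caching) are excluded.
            for _ in range(warmup):
                model(example_input)

            if device.type == "cuda":
                torch.cuda.reset_peak_memory_stats(device)
                torch.cuda.synchronize(device)

            start = time.perf_counter()
            for _ in range(runs):
                model(example_input)
            if device.type == "cuda":
                torch.cuda.synchronize(device)
            latency_ms = (time.perf_counter() - start) * 1000 / runs

            peak_mb = (torch.cuda.max_memory_allocated(device) / 1024 ** 2
                       if device.type == "cuda" else float("nan"))
        return latency_ms, peak_mb

Numbers measured this way will not match the platform's profile exactly (different runtime, batch size, and graph optimizations), but large discrepancies are worth investigating.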

Layer-Level Analysis

View all layers in the model along with:

  • Layer name and operation type
  • Latency contribution and the compute unit it runs on
  • Robustness Score:
    Indicates how safely a layer can be quantized without significantly affecting model accuracy. A higher score means the layer tolerates reduced precision better (see the sketch after this list for the intuition).
  • Robustness Rank:
    Orders all layers by their robustness scores. Layers ranked as more robust are more "quantization-friendly" and can be safely optimized earlier in the pipeline.
  • Sorting and filtering options for precision, latency, and sensitivity

This helps you understand which layers are performance bottlenecks and which are critical to accuracy.
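
The exact scoring method is internal to the platform, but the intuition behind a robustness score can be reproduced offline: quantize one layer's weights, rerun inference, and measure how far the output drifts from the float baseline. The sketch below is only an illustration of that idea; it assumes a PyTorch model whose forward pass returns a single tensor and whose target layer has a weight attribute, and none of the names come from the platform's API.

    import copy
    import torch

    def fake_quantize(weight, num_bits=8):
        """Symmetric per-tensor fake quantization of a weight tensor."""
        qmax = 2 ** (num_bits - 1) - 1
        scale = weight.abs().max() / qmax
        return (weight / scale).round().clamp(-qmax, qmax) * scale

    def layer_robustness(model, example_input, layer_name):
        """Higher value = the output changes less when this layer is quantized."""
        model.eval()
        with torch.no_grad():
            baseline = model(example_input)

            perturbed_model = copy.deepcopy(model)
            layer = dict(perturbed_model.named_modules())[layer_name]
            layer.weight.copy_(fake_quantize(layer.weight))

            perturbed = perturbed_model(example_input)
            # Relative output error, inverted so a larger score means "more robust".
            rel_err = (perturbed - baseline).norm() / baseline.norm().clamp_min(1e-12)
        return 1.0 / (1.0 + rel_err.item())

Sorting layers by such a score, highest first, gives a rough priority order for quantization, which mirrors how the Robustness Rank orders layers.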

Model Visualization

On the right side, a graph view of the model architecture allows you to:

  • Visually trace the computation flow
  • Inspect tensor shapes and the operator structure (an offline cross-check is sketched after this list)
  • Identify repeated or heavy operations intuitively
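
The graph view is rendered in the browser, but the same structural information can be cross-checked offline if you have the original file. The sketch below assumes the uploaded model exists locally in ONNX format at model.onnx (a placeholder path) and uses the standard onnx Python package.

    import onnx

    model = onnx.load("model.onnx")  # placeholder path to your exported model

    # Print each node's operator type and tensor connections: the same
    # computation flow the graph view draws.
    for node in model.graph.node:
        print(f"{node.op_type:<16} inputs={list(node.input)} outputs={list(node.output)}")

    # Tensor shapes come from running shape inference over the graph.
    inferred = onnx.shape_inference.infer_shapes(model)
    for value_info in inferred.graph.value_info:
        dims = [d.dim_value or d.dim_param or "?"
                for d in value_info.type.tensor_type.shape.dim]
        print(value_info.name, dims)

Repeated operator patterns with identical shapes in this dump usually correspond to the repeated blocks you can spot visually in the graph view.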