Recommendation precision

get_recommendation_precision(self, input_model_path: str, output_dir: str, dataset_path: str | None, weight_precision: QuantizationPrecision = QuantizationPrecision.INT8, activation_precision: QuantizationPrecision = QuantizationPrecision.INT8, metric: SimilarityMetric = SimilarityMetric.SNR, threshold: float | int = 0, input_layers: List[Dict[str, int]] | None = None, wait_until_done: bool = True, sleep_interval: int = 30) → QuantizerMetadata

Get recommended precision for a model based on a specified quality threshold.

This function analyzes each layer of the given model and recommends precision settings
for layers that do not meet the specified threshold, helping to balance quantization quality and performance.
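
To make the threshold idea concrete, the following is a minimal sketch of how a per-layer SNR check could flag layers for a precision recommendation. It illustrates the concept only and is not NetsPresso's implementation: the helper names `snr_db` and `layers_below_threshold`, the decibel formulation of SNR, and the sample data are all assumptions introduced here.

```python
import math


def snr_db(reference, quantized):
    """Signal-to-noise ratio in decibels between two equal-length sequences."""
    signal = sum(r * r for r in reference)
    noise = sum((r - q) ** 2 for r, q in zip(reference, quantized))
    if noise == 0:
        return math.inf  # identical outputs: no quantization error
    if signal == 0:
        return -math.inf  # all-zero reference with nonzero error
    return 10.0 * math.log10(signal / noise)


def layers_below_threshold(layer_outputs, threshold):
    """Return the names of layers whose SNR is below the threshold.

    layer_outputs maps a layer name to a pair
    (reference_output, quantized_output). Hypothetical structure.
    """
    return [
        name
        for name, (reference, quantized) in layer_outputs.items()
        if snr_db(reference, quantized) < threshold
    ]


# Hypothetical per-layer outputs, for illustration only.
layer_outputs = {
    "conv1": ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),  # unchanged by quantization
    "conv2": ([1.0, 2.0, 3.0], [0.0, 0.0, 0.0]),  # heavily degraded
}
flagged = layers_below_threshold(layer_outputs, threshold=10)
print(flagged)
```

In this sketch, raising the threshold flags more layers, which is one way to picture the trade-off between quantization quality and performance.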

  • Parameters:
    • input_model_path (str) – The file path where the model is located.
    • output_dir (str) – The local folder path to save the quantized model.
    • dataset_path (str | None) – Path to the dataset, or None. Useful for certain quantization methods.
    • weight_precision (QuantizationPrecision) – Target precision for weights.
    • activation_precision (QuantizationPrecision) – Target precision for activations.
    • metric (SimilarityMetric) – Metric used to evaluate quantization quality.
    • threshold (Union[float, int]) – Quality threshold; layers below this threshold will
      receive precision recommendations.
    • input_layers (List[Dict[str, int]], optional) – Specifications for input shapes
      (e.g., to convert from dynamic to static batch size).
    • wait_until_done (bool) – If True, waits for the quantization process to finish
      before returning. If False, starts the process and returns immediately.
    • sleep_interval (int) – Interval, in seconds, between checks when wait_until_done
      is True.
  • Raises:
    Exception – If an error occurs during model quantization.
  • Returns:
    Quantization metadata.
  • Return type:
    QuantizerMetadata
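
The interaction between wait_until_done and sleep_interval can be pictured as a simple polling loop. The sketch below is a generic illustration under stated assumptions, not the library's code: `wait_for_completion`, the `check_status` callable, and the status strings are hypothetical names chosen for this example.

```python
import time


def wait_for_completion(check_status, sleep_interval=30, sleep=time.sleep):
    """Poll check_status() until it reports a terminal state.

    check_status is a hypothetical callable returning a status string.
    sleep is injectable so the loop can be exercised without real delays.
    """
    while True:
        status = check_status()
        if status in ("COMPLETED", "ERROR"):  # assumed terminal states
            return status
        sleep(sleep_interval)  # pause before the next check


# Simulated task that finishes on the third status check.
statuses = iter(["IN_PROGRESS", "IN_PROGRESS", "COMPLETED"])
waits = []  # records each requested pause instead of actually sleeping
final_status = wait_for_completion(
    lambda: next(statuses), sleep_interval=30, sleep=waits.append
)
print(final_status, waits)
```

A shorter interval notices completion sooner at the cost of more status checks. Passing wait_until_done=False corresponds to skipping a loop like this and returning right after the task starts.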

Example

from netspresso import NetsPresso
from netspresso.enums import QuantizationPrecision


# Authenticate with NetsPresso (replace the placeholders with real credentials).
netspresso = NetsPresso(email="YOUR_EMAIL", password="YOUR_PASSWORD")

# Analyze the model and request per-layer precision recommendations.
quantizer = netspresso.quantizer()
recommendation_metadata = quantizer.get_recommendation_precision(
    input_model_path="./examples/sample_models/test.onnx",
    output_dir="./outputs/quantized/automatic_quantization",
    dataset_path="./examples/sample_datasets/pickle_calibration_dataset_128x128.npy",
    weight_precision=QuantizationPrecision.INT8,
    activation_precision=QuantizationPrecision.INT8,
    threshold=0,
)

# Load the recommended precisions from the saved result file.
recommendation_precisions = quantizer.load_recommendation_precision_result(
    recommendation_metadata.recommendation_result_path
)