Custom Precision Quantization by Operator Type

custom_precision_quantization_by_operator_type(self, input_model_path: str, output_dir: str, dataset_path: str | None, precision_by_operator_type: List[PrecisionByOperator], default_weight_precision: QuantizationPrecision = QuantizationPrecision.INT8, default_activation_precision: QuantizationPrecision = QuantizationPrecision.INT8, metric: SimilarityMetric = SimilarityMetric.SNR, input_layers: List[Dict[str, int]] | None = None, wait_until_done: bool = True, sleep_interval: int = 30) → QuantizerMetadata

Apply custom quantization to a model, specifying precision for each operator type.

This function allows for highly customizable quantization by enabling the user to specify
the quantization precision (e.g., INT8, FP16) for each operator type within a model. The precision_by_operator_type parameter is a list of mappings where each entry indicates the quantization precision for a specific operator type, such as convolution (Conv), matrix multiplication (MatMul), etc.

Using precision_by_operator_type, users can selectively fine-tune the quantization
strategy for different operators within the model, based on performance requirements or hardware capabilities. Operators not explicitly specified in precision_by_operator_type will fall back to default_weight_precision and default_activation_precision.
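The fallback behavior can be sketched in plain Python. This is a hypothetical illustration only, not the SDK's internal implementation; the `PrecisionByOperator` stand-in class and the string precision values below are assumptions for the sketch.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the SDK's PrecisionByOperator class.
@dataclass
class PrecisionByOperator:
    type: str       # operator type, e.g. "Conv" or "MatMul"
    precision: str  # quantization precision, e.g. "INT8" or "FP16"

precision_by_operator_type = [
    PrecisionByOperator(type="Conv", precision="FP16"),
    PrecisionByOperator(type="MatMul", precision="INT8"),
]

# Stands in for default_weight_precision / default_activation_precision.
DEFAULT_PRECISION = "INT8"

def precision_for(op_type: str) -> str:
    """Return the configured precision for an operator type,
    falling back to the default when the type is not listed."""
    for entry in precision_by_operator_type:
        if entry.type == op_type:
            return entry.precision
    return DEFAULT_PRECISION

print(precision_for("Conv"))     # explicitly configured: FP16
print(precision_for("Softmax"))  # not listed, falls back to the default
```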

  • Parameters:
    • input_model_path (str) – The file path where the model is located.
    • output_dir (str) – The local folder path to save the quantized model.
    • dataset_path (str, optional) – Path to the calibration dataset, used by quantization modes that require calibration data.
    • precision_by_operator_type (List[PrecisionByOperator]) – List of PrecisionByOperator objects that specify the desired precision for each
      operator type in the model. Each entry includes:
      • type (str): The operator type (e.g., Conv, MatMul).
      • precision (QuantizationPrecision): The quantization precision level.
    • default_weight_precision (QuantizationPrecision) – Precision applied to the weights of operators not listed in precision_by_operator_type.
    • default_activation_precision (QuantizationPrecision) – Precision applied to the activations of operators not listed in precision_by_operator_type.
    • metric (SimilarityMetric) – Similarity metric used to evaluate the quality of the quantized model.
    • input_layers (List[InputShape], optional) – Target input shape for quantization (e.g., converting a dynamic batch to a static batch).
    • wait_until_done (bool) – If True, wait for the quantization to finish before returning. If False, submit the quantization request and return immediately.
    • sleep_interval (int) – Interval in seconds between checks when wait_until_done is True.
  • Raises:
    Exception – If an error occurs during model quantization.
  • Returns:
    Quantization metadata containing status, paths, etc.
  • Return type:
    QuantizerMetadata
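When wait_until_done is True, the call blocks and re-checks the job roughly every sleep_interval seconds. The loop below is a generic sketch of that polling pattern, not the SDK's actual code; the status strings and the check_status callback are assumptions for illustration.

```python
import time

def wait_for_completion(check_status, sleep_interval=30, max_checks=120):
    """Poll check_status() every sleep_interval seconds until it reports
    a terminal state, mirroring the wait_until_done/sleep_interval options."""
    for _ in range(max_checks):
        status = check_status()  # e.g. "IN_PROGRESS" or "COMPLETED" (assumed values)
        if status != "IN_PROGRESS":
            return status
        time.sleep(sleep_interval)
    raise TimeoutError("quantization job did not finish in time")

# Simulated status sequence for illustration.
statuses = iter(["IN_PROGRESS", "IN_PROGRESS", "COMPLETED"])
print(wait_for_completion(lambda: next(statuses), sleep_interval=0))
```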

Example

from netspresso import NetsPresso
from netspresso.enums import QuantizationPrecision


netspresso = NetsPresso(email="YOUR_EMAIL", password="YOUR_PASSWORD")

quantizer = netspresso.quantizer()

# Get a recommended per-operator precision configuration for the model
recommendation_metadata = quantizer.get_recommendation_precision(
    input_model_path="./examples/sample_models/test.onnx",
    output_dir="./outputs/quantized/automatic_quantization",
    dataset_path="./examples/sample_datasets/pickle_calibration_dataset_128x128.npy",
    weight_precision=QuantizationPrecision.INT8,
    activation_precision=QuantizationPrecision.INT8,
    threshold=0,
)
recommendation_precisions = quantizer.load_recommendation_precision_result(
    recommendation_metadata.recommendation_result_path
)

# Quantize the model using the recommended per-operator precisions
quantization_result = quantizer.custom_precision_quantization_by_operator_type(
    input_model_path="./examples/sample_models/test.onnx",
    output_dir="./outputs/quantized/custom_precision_quantization_by_operator_type",
    dataset_path="./examples/sample_datasets/pickle_calibration_dataset_128x128.npy",
    precision_by_operator_type=recommendation_precisions.operators,
)
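The recommended precisions can also be adjusted before submitting the job. The snippet below sketches that with stand-in objects; in practice the entries come from recommendation_precisions.operators, and whether precision is a plain string or an enum member on the real objects is an assumption here.

```python
from types import SimpleNamespace

# Stand-ins for entries of recommendation_precisions.operators.
operators = [
    SimpleNamespace(type="Conv", precision="INT8"),
    SimpleNamespace(type="Softmax", precision="INT8"),
]

# Force every Softmax operator to FP16 before passing the list to
# custom_precision_quantization_by_operator_type.
for op in operators:
    if op.type == "Softmax":
        op.precision = "FP16"

print([(op.type, op.precision) for op in operators])
```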