Custom Precision Quantization by Operator Type
custom_precision_quantization_by_operator_type(self, input_model_path: str, output_dir: str, dataset_path: str | None, precision_by_operator_type: List[PrecisionByOperator], default_weight_precision: QuantizationPrecision = QuantizationPrecision.INT8, default_activation_precision: QuantizationPrecision = QuantizationPrecision.INT8, metric: SimilarityMetric = SimilarityMetric.SNR, input_layers: List[Dict[str, int]] | None = None, wait_until_done: bool = True, sleep_interval: int = 30) → QuantizerMetadata
Apply custom quantization to a model, specifying precision for each operator type.
This function allows for highly customizable quantization by enabling the user to specify
the quantization precision (e.g., INT8, FP16) for each operator type within a model. The
precision_by_operator_type parameter is a list of mappings where each entry indicates
the quantization precision for a specific operator type, such as convolution (Conv),
matrix multiplication (MatMul), etc.
Using precision_by_operator_type, users can selectively fine-tune the quantization
strategy for different operators within the model, based on performance requirements
or hardware capabilities. Operators not explicitly specified in
precision_by_operator_type will fall back to default_weight_precision and
default_activation_precision.
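To illustrate the shape of this mapping, the sketch below builds a precision_by_operator_type list by hand using the documented fields (type and precision). The PrecisionByOperator class here is a stand-in dataclass for illustration only; in practice, import the real class from the netspresso SDK and use the QuantizationPrecision enum rather than plain strings.

```python
from dataclasses import dataclass

# Stand-in mirroring the documented fields of the SDK's PrecisionByOperator
# (`type`: operator type string, `precision`: quantization precision).
# Import the real class from the netspresso SDK in actual use.
@dataclass
class PrecisionByOperator:
    type: str       # operator type, e.g. "Conv", "MatMul"
    precision: str  # precision level, e.g. "INT8", "FP16"

# Keep accuracy-sensitive Conv operators at FP16 while quantizing MatMul
# to INT8; any operator type not listed here falls back to
# default_weight_precision / default_activation_precision.
precision_by_operator_type = [
    PrecisionByOperator(type="Conv", precision="FP16"),
    PrecisionByOperator(type="MatMul", precision="INT8"),
]

for entry in precision_by_operator_type:
    print(entry.type, entry.precision)
```

A list like this can be passed directly as the precision_by_operator_type argument, or obtained automatically from load_recommendation_precision_result as shown in the Example section.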
- Parameters:
  - input_model_path (str) – The file path where the model is located.
  - output_dir (str) – The local folder path to save the quantized model.
  - dataset_path (str | None) – Path to the calibration dataset. Required for quantization modes that calibrate on sample data.
  - precision_by_operator_type (List[PrecisionByOperator]) – List of PrecisionByOperator objects that specify the desired precision for each operator type in the model. Each entry includes: type (str): the operator type (e.g., Conv, MatMul); precision (QuantizationPrecision): the quantization precision level.
  - default_weight_precision (QuantizationPrecision) – Weight precision applied to operators not listed in precision_by_operator_type.
  - default_activation_precision (QuantizationPrecision) – Activation precision applied to operators not listed in precision_by_operator_type.
  - metric (SimilarityMetric) – Similarity metric used to evaluate quantization quality.
  - input_layers (List[Dict[str, int]], optional) – Target input shape for quantization (e.g., converting a dynamic batch to a static batch).
  - wait_until_done (bool) – If True, wait for the quantization result before returning. If False, request the quantization and return immediately.
  - sleep_interval (int) – Interval in seconds between status checks when wait_until_done is True.
- Raises:
  e – If an error occurs during the model quantization.
- Returns:
  Quantization metadata containing status, paths, etc.
- Return type:
  QuantizerMetadata
Example
from netspresso import NetsPresso
from netspresso.enums import QuantizationPrecision
netspresso = NetsPresso(email="YOUR_EMAIL", password="YOUR_PASSWORD")
quantizer = netspresso.quantizer()
recommendation_metadata = quantizer.get_recommendation_precision(
    input_model_path="./examples/sample_models/test.onnx",
    output_dir="./outputs/quantized/automatic_quantization",
    dataset_path="./examples/sample_datasets/pickle_calibration_dataset_128x128.npy",
    weight_precision=QuantizationPrecision.INT8,
    activation_precision=QuantizationPrecision.INT8,
    threshold=0,
)
recommendation_precisions = quantizer.load_recommendation_precision_result(recommendation_metadata.recommendation_result_path)
quantization_result = quantizer.custom_precision_quantization_by_operator_type(
    input_model_path="./examples/sample_models/test.onnx",
    output_dir="./outputs/quantized/custom_precision_quantization_by_operator_type",
    dataset_path="./examples/sample_datasets/pickle_calibration_dataset_128x128.npy",
    precision_by_operator_type=recommendation_precisions.operators,
)
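When wait_until_done=False, the call returns immediately and the caller is responsible for checking the job status later. The SDK's own status API is not shown on this page, so the helper below is a generic polling sketch (not part of the NetsPresso SDK) that mirrors the wait_until_done / sleep_interval behavior; get_status and the status strings are illustrative assumptions.

```python
import time

def wait_for_completion(get_status, sleep_interval=30, max_checks=120):
    """Poll get_status() every sleep_interval seconds until the job finishes.

    get_status is any zero-argument callable returning the current job
    status string. This helper is illustrative and not part of the SDK.
    """
    for _ in range(max_checks):
        status = get_status()
        if status in ("COMPLETED", "ERROR"):
            return status
        time.sleep(sleep_interval)
    raise TimeoutError("quantization job did not finish in time")

# Demo with a stub status source that completes on the third check:
statuses = iter(["IN_PROGRESS", "IN_PROGRESS", "COMPLETED"])
print(wait_for_completion(lambda: next(statuses), sleep_interval=0))  # → COMPLETED
```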