Recommendation precision
get_recommendation_precision(self, input_model_path: str, output_dir: str, dataset_path: str | None, weight_precision: QuantizationPrecision = QuantizationPrecision.INT8, activation_precision: QuantizationPrecision = QuantizationPrecision.INT8, metric: SimilarityMetric = SimilarityMetric.SNR, threshold: float | int = 0, input_layers: List[Dict[str, int]] | None = None, wait_until_done: bool = True, sleep_interval: int = 30) → QuantizerMetadata
Get recommended precision for a model based on a specified quality threshold.
This function analyzes each layer of the given model and recommends precision settings
for layers that do not meet the specified threshold, helping to balance quantization
quality and performance.
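To make the per-layer check concrete, here is a minimal sketch of how an SNR-style comparison against a threshold could work. It is not NetsPresso's implementation: the helper names `snr_db` and `layers_below_threshold`, the decibel formula, and the sample layer outputs are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def snr_db(reference: Sequence[float], quantized: Sequence[float]) -> float:
    """Signal-to-noise ratio in dB between a reference output and its quantized copy.

    Assumption: SNR = 10 * log10(sum(x^2) / sum((x - x_q)^2)).
    """
    signal = sum(r * r for r in reference)
    noise = sum((r - q) ** 2 for r, q in zip(reference, quantized))
    if noise == 0.0:
        return math.inf  # identical outputs: no quantization error
    if signal == 0.0:
        return -math.inf  # all-zero reference with non-zero error
    return 10.0 * math.log10(signal / noise)


def layers_below_threshold(
    layer_outputs: Dict[str, Tuple[Sequence[float], Sequence[float]]],
    threshold: float,
) -> List[str]:
    """Return the names of layers whose SNR falls below the threshold."""
    return [
        name
        for name, (reference, quantized) in layer_outputs.items()
        if snr_db(reference, quantized) < threshold
    ]


# Hypothetical per-layer outputs: (full-precision, quantized).
layer_outputs = {
    "conv1": ([1.0, -2.0, 3.0], [1.0, -2.0, 3.0]),  # no error
    "conv2": ([1.0, -2.0, 3.0], [0.0, 0.0, 0.0]),   # heavy error, 0 dB
    "fc": ([1.0, 2.0], [1.1, 1.9]),                 # small error, about 24 dB
}

flagged = layers_below_threshold(layer_outputs, threshold=20.0)
print(flagged)
```

In this sketch only `conv2` falls below the 20 dB threshold, so it is the only layer that would be a candidate for a higher-precision recommendation.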
- Parameters:
- input_model_path (str) – The file path where the model is located.
- output_dir (str) – The local folder path to save the quantized model.
- dataset_path (str | None) – Path to the calibration dataset, which only some quantization methods need.
- weight_precision (QuantizationPrecision) – Target precision for weights.
- activation_precision (QuantizationPrecision) – Target precision for activations.
- metric (SimilarityMetric) – Metric used to evaluate quantization quality.
- threshold (Union[float, int]) – Quality threshold; layers below this threshold will receive precision recommendations.
- input_layers (List[Dict[str, int]], optional) – Specifications for input shapes (e.g., to convert from dynamic to static batch size).
- wait_until_done (bool) – If True, waits for the quantization process to finish before returning. If False, starts the process and returns immediately.
- sleep_interval (int) – Interval, in seconds, between checks when wait_until_done is True.
- Raises:
e – If an error occurs during the model quantization.
- Returns:
Quantize metadata.
- Return type:
QuantizerMetadata
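The interplay of wait_until_done and sleep_interval follows a common polling pattern, sketched below. This is an illustration only, not the library's code: `poll_until_finished`, its `max_checks` guard, and the fake status source are hypothetical.

```python
import time
from typing import Callable


def poll_until_finished(
    is_finished: Callable[[], bool],
    sleep_interval: float = 30.0,
    max_checks: int = 1000,
) -> int:
    """Check a status callback repeatedly, sleeping between checks.

    Returns the number of checks performed before the task reported done.
    """
    for check in range(1, max_checks + 1):
        if is_finished():
            return check
        time.sleep(sleep_interval)  # wait before asking again
    raise TimeoutError(f"task not finished after {max_checks} checks")


# Hypothetical stand-in for a remote task: done on the third status check.
statuses = iter([False, False, True])
checks = poll_until_finished(lambda: next(statuses), sleep_interval=0.01)
print(checks)
```

A larger sleep_interval means fewer status requests but a longer delay before completion is noticed.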
Example
from netspresso import NetsPresso
from netspresso.enums import QuantizationPrecision

netspresso = NetsPresso(email="YOUR_EMAIL", password="YOUR_PASSWORD")

quantizer = netspresso.quantizer()
recommendation_metadata = quantizer.get_recommendation_precision(
    input_model_path="./examples/sample_models/test.onnx",
    output_dir="./outputs/quantized/automatic_quantization",
    dataset_path="./examples/sample_datasets/pickle_calibration_dataset_128x128.npy",
    weight_precision=QuantizationPrecision.INT8,
    activation_precision=QuantizationPrecision.INT8,
    threshold=0,
)
recommendation_precisions = quantizer.load_recommendation_precision_result(
    recommendation_metadata.recommendation_result_path
)