Models are typically trained in full precision, such as FP32, which can make them too large and too slow for inference in production. The latest release of the Intel Distribution of OpenVINO toolkit adds a new post-training optimization tool that converts models into low-precision formats, such as INT8. That means developers can reduce latency, memory use, and on-disk footprint without having to retrain their models.
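As an illustration, here is a minimal sketch of the kind of post-training quantization workflow the tool enables, using the Post-Training Optimization Tool's Python API. Module paths and argument details have varied across toolkit releases, and the model paths and the random calibration loader below are placeholders, not part of the announcement:

```python
import numpy as np
from openvino.tools.pot import (DataLoader, IEEngine, load_model,
                                save_model, create_pipeline)

# Placeholder calibration loader: yields (data, annotation) pairs.
# Default quantization only needs representative input data, so the
# annotation can be None. Real use would read actual dataset samples.
class RandomCalibrationLoader(DataLoader):
    def __init__(self, num_samples=300, shape=(1, 3, 224, 224)):
        self._num_samples = num_samples
        self._shape = shape

    def __len__(self):
        return self._num_samples

    def __getitem__(self, index):
        return np.random.rand(*self._shape).astype(np.float32), None

# Load the FP32 model in OpenVINO IR format (paths are placeholders).
model = load_model({"model_name": "model",
                    "model": "model.xml",
                    "weights": "model.bin"})

engine = IEEngine(config={"device": "CPU"},
                  data_loader=RandomCalibrationLoader())

# DefaultQuantization performs post-training INT8 quantization using
# statistics collected from the calibration data.
algorithms = [{
    "name": "DefaultQuantization",
    "params": {"target_device": "CPU",
               "preset": "performance",
               "stat_subset_size": 300},
}]

pipeline = create_pipeline(algorithms, engine)
int8_model = pipeline.run(model)
save_model(int8_model, save_path="optimized")  # writes the INT8 IR
```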
The new release also adds support for custom layers on Intel Movidius VPUs. Previously, some developers needed to customize certain deep learning “layers” in their trained models to meet use-case, latency, or other requirements, but could only do so on certain hardware. Now they can apply that customization across platforms, with custom layer support spanning Intel VPUs as well as CPUs and iGPUs.
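For a rough sense of how this looks in practice, the sketch below registers a custom-kernel description with the VPU (MYRIAD) plugin before loading a network. It assumes the Inference Engine Python API of this era; the file paths and the `custom_layers.xml` kernel description are placeholders, and exact configuration keys may differ by release:

```python
from openvino.inference_engine import IECore

ie = IECore()

# Point the MYRIAD (VPU) plugin at an XML file describing custom
# OpenCL kernels and how they bind to layers in the model.
# "custom_layers.xml" is a placeholder for a user-provided file.
ie.set_config({"VPU_CUSTOM_LAYERS": "custom_layers.xml"}, "MYRIAD")

# Load an IR model and compile it for the VPU; layers covered by the
# custom kernels are executed with the user-supplied implementations.
net = ie.read_network(model="model.xml", weights="model.bin")
exec_net = ie.load_network(network=net, device_name="MYRIAD")
```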