For debugging and profiling, users can now retrieve metrics for cpu_fallback, memory_defragmentation, and recipe_cache using the Metric APIs.
We have added a new kernel FusedSDPA, a fused implementation of the nn.functional.scaled_dot_product_attention() API on the HPU.
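For readers unfamiliar with the operation being fused, here is a minimal reference sketch of what scaled dot-product attention computes: softmax(QKᵀ / √d) · V. It is plain Python for a single head with no mask and no dropout, and it is meant only to illustrate the math. It is not the FusedSDPA kernel, and the function names below are illustrative, not part of any Habana API.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """Reference attention for one head.

    q: list of query vectors, each of length d
    k: list of key vectors, each of length d
    v: list of value vectors (one per key)
    Returns one output vector per query.
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)  # default scaling: 1 / sqrt(query dim)
    out = []
    for q_row in q:
        # Scaled similarity of this query against every key.
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v))
            for j in range(len(v[0]))
        ])
    return out


# Toy inputs: two queries, two keys, two values.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
result = scaled_dot_product_attention(q, k, v)
```

An unfused implementation runs the matrix multiply, softmax, and second matrix multiply as separate steps; a fused kernel combines them into one operation, which is where the performance benefit generally comes from.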
This release also includes performance improvements for LLM inference. For details, see Habana’s model performance page.
Lastly, a reminder: support for Habana Mixed Precision (HMP) is deprecated and will be dropped in the next release. Users should plan to switch to autocast for mixed precision support.
You can find more information on SynapseAI 1.11.0 on Habana’s release notes page.