Model Optimization and Debugging
Profiling and optimization take your model from merely functional to performant on Intel Gaudi. To achieve this, follow the steps outlined below and refer to the optimization guide documentation.
To get started, follow the Performance Optimization Guide Checklist and review the summary below.
Model optimization falls into three main categories:
- Initial model porting – Ensures the model is functional on Intel Gaudi processors, typically by running GPU Migration. Follow the Getting Started for Training or Inference.
- Model optimizations – General performance enhancements that apply to most models, such as managing Dynamic Shapes and using HPU Graphs for Training or Inference; see the sketch after this list.
- Profiling – Identifies bottlenecks on the host CPU or the Intel Gaudi device. It is recommended to follow the steps in the table below, starting with TensorBoard and the Intel Gaudi plug-in to identify specific issues.
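For illustration, the sketch below wraps a model in an HPU Graph for inference. It assumes the `habana_frameworks.torch` bridge and its `hpu.wrap_in_hpu_graph` helper; the model and tensor shapes are hypothetical, so refer to the HPU Graphs documentation for the exact API and supported usage.

```python
import torch
import habana_frameworks.torch as ht  # Intel Gaudi PyTorch bridge

# Hypothetical model used only for illustration.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
).to("hpu").eval()

# Capture the forward pass as an HPU Graph so that repeated inference calls
# replay the recorded graph instead of re-launching each op from the host.
model = ht.hpu.wrap_in_hpu_graph(model)

with torch.no_grad():
    logits = model(torch.randn(32, 128, device="hpu"))
    print(logits.cpu().shape)  # bring the result back to the host
```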
Activity | Details | Link |
---|---|---|
First step: PyTorch profiling with TensorBoard | Obtains Gaudi-specific performance recommendations in TensorBoard. | Profiling with PyTorch |
Review the PT_HPU_METRICS_FILE | Checks for excessive recompilations during runtime. | Setting HPU Metrics Review |
Intel Gaudi Profiling Trace Viewer | Uses the Intel Gaudi-specific Perfetto trace viewer for in-depth analysis of CPU and Intel Gaudi activity. | Getting Started with Perfetto Profiler |
Model logging | Sets ENABLE_CONSOLE to enable logging for debug and analysis. | Runtime Environment Variables |
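As a starting point for the first step in the table, the sketch below profiles a few training iterations with the PyTorch profiler and writes a TensorBoard trace. It assumes `ProfilerActivity.HPU` and `htcore.mark_step()` from the Intel Gaudi PyTorch bridge; the model, batch data, and log directory are hypothetical placeholders.

```python
import torch
import habana_frameworks.torch.core as htcore  # Intel Gaudi PyTorch bridge

device = torch.device("hpu")
model = torch.nn.Linear(512, 512).to(device)          # hypothetical model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
batches = [torch.randn(64, 512) for _ in range(8)]    # hypothetical input batches

with torch.profiler.profile(
    # ProfilerActivity.HPU is exposed by the Intel Gaudi software stack;
    # verify it is available in your installed PyTorch/Gaudi versions.
    activities=[torch.profiler.ProfilerActivity.CPU,
                torch.profiler.ProfilerActivity.HPU],
    schedule=torch.profiler.schedule(wait=1, warmup=2, active=4),
    on_trace_ready=torch.profiler.tensorboard_trace_handler("./profile_logs"),
    record_shapes=True,
) as prof:
    for batch in batches:
        optimizer.zero_grad()
        loss = model(batch.to(device)).sum()
        loss.backward()
        htcore.mark_step()   # flush the accumulated lazy-mode graph
        optimizer.step()
        htcore.mark_step()
        prof.step()          # advance the profiler schedule each iteration
```

Point TensorBoard at the trace directory (for example, `tensorboard --logdir ./profile_logs`) to review the Gaudi-specific recommendations.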
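The metrics file and console logging rows above are controlled through runtime environment variables. The following is a minimal sketch assuming the variables are read when the Gaudi PyTorch bridge initializes; exporting them in the shell before launching the script is the more common route, and the file path and log level shown are placeholders.

```python
import os

# Set these before the Intel Gaudi PyTorch bridge is initialized.
os.environ["PT_HPU_METRICS_FILE"] = "/tmp/hpu_metrics.json"  # hypothetical path for the runtime metrics dump
os.environ["ENABLE_CONSOLE"] = "true"                        # route runtime logs to the console
os.environ["LOG_LEVEL_ALL"] = "3"                            # assumption: numeric log level; see Runtime Environment Variables for the scale

import habana_frameworks.torch.core as htcore  # noqa: E402 - import after env setup
```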
Tutorial
- Profiling and Optimization

Videos
- Maximizing Model Performance