
Gaudi: Synapse AI Profiler

A how-to tutorial to configure the profiler and use it to gather profiling traces.

Video Transcript

Hi, my name is Milind Pandit. I’m a deep learning manager at Habana, and I lead a team of data scientists who help our customers get maximum value out of Habana products. This video, entitled “Gaudi: Synapse AI Profiler,” is meant for machine learning engineers, and it covers tools and techniques for profiling the performance of models trained on Gaudi.
The Habana Gaudi processor was custom-built for training acceleration. It features heterogeneous compute with a cluster of fully programmable tensor processing cores, or TPCs, and configurable matrix math engines, or MMEs. The ASIC integrates 32 gigabytes of high-bandwidth memory and networking ports for RDMA over Converged Ethernet. We also provide a full software stack supporting popular deep learning frameworks. The profiling subsystem instruments Habana hardware and software and generates diagnostic information about resource utilization, which is important for performance analysis and optimization. The “hl-prof-config” tool sets the options for running the profiler.
The tool functions in two modes: a Command Line Interface, or CLI, and a Graphical User Interface, or GUI. It lets you adjust the software and hardware settings of the profiling subsystem, including the session name, output directory, output formats, and basic hardware settings of the instrumentation. Running the tool from the command line gives a help message listing all of the available arguments.

We’re going to be profiling the training of a DLRM recommendation model, so let’s configure the profiling subsystem to target the Gaudi device and set the trace file name prefix to “dlrm-profile”. Next, we’ll demonstrate how the profiler is run in automatic mode to generate an end-to-end trace across an entire model training run. This mode works for any workload that uses Gaudi. As you can see, no profiling JSON files are present before the training run. We simply set two environment variables, then run the training as usual. Afterward, a “dlrm-profile” JSON trace file has been generated. Next, I’ll show you how the profiler traces can be loaded into the Habana Labs Trace Viewer, or HLTV, for analysis.
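The commands shown on screen in the video are not captured in the transcript, so the sketch below reconstructs the flow under stated assumptions: `hl-prof-config` and the `HABANA_PROFILE=1` enable switch are real parts of the SynapseAI profiling subsystem, but the specific flag spellings, the name of the second environment variable the video mentions, and the `dlrm_train.py` script name are illustrative. Verify the exact flags against the tool’s own help output for your SynapseAI release.

```shell
# Print the help message listing all available arguments.
hl-prof-config --help

# Target the Gaudi device and set the trace file name prefix to
# "dlrm-profile". Flag names here are illustrative -- check the help
# output above for the exact spelling in your release.
hl-prof-config -gaudi -p dlrm-profile

# Automatic (end-to-end) mode: enable the profiler via environment
# variables, then run training as usual. HABANA_PROFILE=1 is the usual
# enable switch; the video mentions a second variable whose name depends
# on your SynapseAI release.
export HABANA_PROFILE=1
python dlrm_train.py    # hypothetical training script name

# A dlrm-profile JSON trace file should now exist.
ls dlrm-profile*.json
```

These commands require a machine with a Gaudi device and the SynapseAI stack installed; on any other system they are a reading aid only.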
In a Chrome browser, visit https://hltv.habana.ai, click Load, and select the “dlrm-profile” JSON file we just generated. This is a visualization of the execution of a recipe on the hardware, and it enables you to quickly identify the bottlenecks responsible for slow performance.
This tool lets you zoom in and out of the trace visualization, this one lets you pan left and right, and this one lets you measure the time between any two events. Using the default configuration, the profiling results are divided into three processes: DMA, MME, and TPC. The DMA section displays the activity of the various memory buses, while the MME and TPC sections show the beginning and end of hardware events occurring on those engines. Clicking on an event shows additional information about it, including the user node name, the operation kernel name, and the data type.
Next, we will demonstrate iteration-specific profiling, which allows the data scientist to focus on specific training iterations. Iteration-specific profiling does require the training program to make Synapse API calls to turn profiling on and off at the selected iterations. Here, my training application takes an argument that specifies which iterations to generate profiling information for, and you will see some additional output for those specific iterations. A second “dlrm-profile” JSON file now appears alongside the first.
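The transcript does not show the actual command, so the following is a sketch under assumptions: the script name and the `--profile-iterations` flag are hypothetical, chosen only to illustrate “an argument that specifies which iterations to profile.” Internally, such an application would bracket the selected iterations with the Synapse profiler API (the `synProfilerStart`/`synProfilerStop` calls exposed by SynapseAI) so that only those iterations are traced.

```shell
# Hypothetical invocation -- the script name and flag are illustrative.
# The application is assumed to call the Synapse profiler API
# (synProfilerStart / synProfilerStop) around iterations 2, 6, and 12,
# so only those iterations appear in the trace.
export HABANA_PROFILE=1
python dlrm_train.py --profile-iterations 2,6,12

# An additional dlrm-profile JSON file is produced for the selected
# iterations, alongside any earlier end-to-end trace.
ls dlrm-profile*.json
```

As with automatic mode, this requires a Gaudi device and the SynapseAI stack; consult the SynapseAI profiling documentation for the exact API entry points your framework bridge exposes.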
If we open that file in HLTV, we can see that iterations 2, 6, and 12 have been profiled, and we can analyze the memory, TPC, and MME operations for those specific iterations. I hope I’ve shown you how flexible and useful the Synapse profiling subsystem is, and how HLTV helps you understand resource utilization and bottlenecks. Be sure to check out our other videos on advanced topics, like setting up your system with Gaudi and migrating your deep learning model from CPU to Gaudi. Thanks for your attention, and for more information, visit developer.habana.ai.
