Scalability of deep learning AI systems is increasingly important as compute demands grow. Gaudi2 networking builds on the innovative first-generation Gaudi architecture but brings even greater capacity and configuration flexibility to the AI data center.
Habana’s second-generation Gaudi2 advances AI networking with another first: it is the only AI processor to integrate 24 100-Gigabit RDMA over Converged Ethernet (RoCE) ports on chip, up from ten 100-Gigabit Ethernet ports on the original Gaudi. The HLS-Gaudi2 server features eight Gaudi2 mezzanine cards and two Intel Xeon Ice Lake CPUs in a dual-socket configuration.
The eight Gaudi2 cards connect in an all-to-all configuration, with 21 ports on each Gaudi2 dedicated to connecting to the other seven cards in the server (three 100-Gigabit links per pair of cards). The remaining three 100-Gigabit Ethernet ports per Gaudi2 (24 per server) are dedicated to scale-out and are exposed through six 400-Gigabit QSFP-DD ports.
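The port budget described above can be checked with a little arithmetic. The following is an illustrative sketch, using only the figures from the text; the variable names are our own, not Habana terminology.

```python
# Intra-server port budget for an HLS-Gaudi2 server, per the description above.
GAUDI_PER_SERVER = 8
PORTS_PER_GAUDI = 24          # 100 GbE RoCE ports integrated on each chip
PEERS = GAUDI_PER_SERVER - 1  # all-to-all: each card links to 7 others

scale_up_ports = 21                           # reserved for the in-server all-to-all
ports_per_peer = scale_up_ports // PEERS      # links between any two cards
scale_out_ports = PORTS_PER_GAUDI - scale_up_ports

assert ports_per_peer == 3    # 3 x 100 GbE links per pair of cards
assert scale_out_ports == 3   # 3 ports per card left over for scale-out

server_scale_out = scale_out_ports * GAUDI_PER_SERVER  # 24 x 100 GbE per server
assert server_scale_out * 100 == 6 * 400  # matches six 400 GbE QSFP-DD ports
```

The last assertion confirms why six 400-Gigabit QSFP-DD ports are enough to carry all 24 scale-out links: 24 × 100 Gb/s equals 6 × 400 Gb/s.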
For workloads that require more than eight Gaudi2s, you can scale out system capacity by connecting each HLS-Gaudi2 system to standard Ethernet switching of your choice using up to six 400-Gigabit Ethernet cables. As this eight-node Gaudi2 cluster shows, you can scale out more capacity across more racks. For larger deep learning compute demands, multiple clusters can be networked easily and cost-efficiently through standard Ethernet switching, scaling out as far as you need.
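As a rough sizing sketch of the scale-out fabric just described: with eight nodes and six 400-Gigabit uplinks each, the switch-port count and per-server scale-out bandwidth follow directly. This assumes a single switching tier and counts only the Gaudi2 uplinks; it is an illustration, not a reference design.

```python
# Illustrative sizing for the eight-node cluster described above.
NODES = 8                 # servers in the example cluster
UPLINKS_PER_NODE = 6      # 400 GbE QSFP-DD ports per HLS-Gaudi2 server
LINK_GBPS = 400           # bandwidth of each uplink

switch_ports_needed = NODES * UPLINKS_PER_NODE          # 400 GbE switch ports
per_node_bandwidth = UPLINKS_PER_NODE * LINK_GBPS       # Gb/s per server

print(switch_ports_needed)  # 48 x 400 GbE ports at the switch
print(per_node_bandwidth)   # 2400 Gb/s (2.4 Tb/s) of scale-out bandwidth per server
```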
The Gaudi2 data center shown here was built to conduct R&D at scale and inform hardware and software development. We are applying lessons from this data center to further optimize Habana’s software stack and to guide Gaudi3 development.
Why? Because at Habana we are all in on AI.