AI Solution

Using Data Parallelism to Train Custom AI Models Faster

Introduction

With powerful NVIDIA GPUs and techniques such as data parallelism, you can train your custom AI models faster.

Using multiple GPUs on a single node offers a performance boost, but a single node can hold only a limited number of GPUs. To go beyond that limit, we can distribute training across two or more Oracle Cloud Infrastructure (OCI) instances, or nodes, connected over a network.

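This is where parallelism comes in. Parallelism breaks the computation into smaller parts that run simultaneously on different computing resources. With data parallelism specifically, each GPU holds a full copy of the model and processes a different slice of every training batch; the resulting gradients are then averaged so that all copies stay in sync.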
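To make the idea concrete, here is a minimal sketch of one data-parallel training loop in plain Python. It is an illustration only, not the setup used in the demo: threads stand in for GPUs, the model is a single weight `w` fitting `y = w * x`, and the function names (`gradient`, `split`, `data_parallel_step`) are hypothetical. Averaging the per-shard gradients plays the role that an all-reduce plays in a real multi-GPU framework.

```python
from concurrent.futures import ThreadPoolExecutor


def gradient(w, shard):
    """Mean-squared-error gradient for y ~ w * x over one shard of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def split(data, num_workers):
    """Split data into equal shards (assumes len(data) divides evenly)."""
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]


def data_parallel_step(w, data, num_workers, lr=0.01):
    """One training step: every worker computes a gradient on its own shard."""
    shards = split(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Each "worker" (a thread here, a GPU in practice) uses the same weight.
        grads = list(pool.map(lambda shard: gradient(w, shard), shards))
    # Averaging the shard gradients stands in for an all-reduce across devices.
    avg_grad = sum(grads) / len(grads)
    return w - lr * avg_grad


# Toy dataset generated from y = 3 * x, so training should recover w close to 3.
data = [(float(x), 3.0 * x) for x in range(1, 9)]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, data, num_workers=4)

print(f"learned weight: {w:.4f}")
```

Because the shards are the same size, the averaged gradient equals the gradient over the full batch, so four workers take the same step that one worker would take alone. The speedup in a real system comes from each device handling only a quarter of the data per step.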

Demo

Demo: Using Data Parallelism to Train Custom AI Models Faster (1:56)

Prerequisites and setup

  1. Oracle Cloud account—sign-up page
  2. Oracle Cloud Infrastructure—documentation
  3. Oracle Cloud Marketplace NVIDIA GPU-Optimized VMI—documentation
  4. Oracle Cloud GPU instances—documentation