With powerful NVIDIA GPUs and techniques such as data parallelism, you can train your custom AI models faster.
Using multiple GPUs on a single node offers a performance boost, but hardware limits cap how many GPUs one machine can hold. To overcome this, you can distribute training across multiple Oracle Cloud Infrastructure (OCI) instances: two or more nodes connected over a network.
This is where parallelism comes in. Parallelism breaks the computation into smaller parts that run simultaneously on different computing resources. In data parallelism, for example, each worker holds a replica of the model and processes a different slice of each batch; the workers then average their gradients so every replica applies the same update.
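The pattern above can be sketched with nothing but the Python standard library. This is a toy illustration, not OCI- or GPU-specific code: the "model" is a single weight, each worker computes the gradient for its own shard of the batch, and the partial gradients are averaged before one update, which is the same shape of computation that frameworks apply across GPUs and nodes. All function names here are illustrative.

```python
from multiprocessing import Pool

def shard_gradient(args):
    w, shard = args
    # Gradient of the mean squared error 0.5 * (w*x - y)^2 with
    # respect to w, averaged over this worker's shard of the data.
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def parallel_step(w, data, num_workers=2, lr=0.01):
    # Split the batch into equal shards, one per worker (data parallelism).
    size = len(data) // num_workers
    shards = [data[i * size:(i + 1) * size] for i in range(num_workers)]
    with Pool(num_workers) as pool:
        grads = pool.map(shard_gradient, [(w, s) for s in shards])
    # "All-reduce" step (here a simple average), then one SGD update.
    return w - lr * sum(grads) / len(grads)

if __name__ == "__main__":
    # Toy dataset where y = 2*x; training should drive w toward 2.
    data = [(x, 2.0 * x) for x in range(1, 9)]
    w = 0.0
    for _ in range(100):
        w = parallel_step(w, data)
    print(round(w, 3))  # prints 2.0
```

In a real multi-node setup, a framework such as PyTorch replaces the process pool with one process per GPU and the averaging step with an all-reduce over the network, but the division of work is the same.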