AI infrastructure

Run the most demanding AI workloads faster, including generative AI, computer vision, and predictive analytics, anywhere in our distributed cloud. Use Oracle Cloud Infrastructure (OCI) Supercluster to scale up to 65,536 GPUs today and 131,072 GPUs soon.*

Now in GA: Largest, Fastest AI Supercomputer in the Cloud

OCI Supercluster: The Infrastructure Driving Generative AI at Scale

Why run on OCI AI infrastructure?

Performance and value

Boost AI training with OCI’s unique GPU bare metal instances and ultrafast RDMA cluster networking that reduce latency to as little as 2.5 microseconds. Get up to 220% better pricing on GPU VMs than with other cloud providers.

HPC storage

Take advantage of high performance mount targets (HPMTs) for up to 500 Gb/sec of sustained throughput. Use 61.44 TB of local storage capacity, the highest in the industry for instances with NVIDIA H100 GPUs.

Sovereign AI

Oracle’s distributed cloud enables you to deploy AI infrastructure anywhere to help meet performance, security, and AI sovereignty requirements.

OCI Supercluster with NVIDIA Blackwell and Hopper GPUs

Up to 131,072 GPUs, 8X more scalability
Network fabric innovations will enable OCI Supercluster to scale up to 131,072 NVIDIA B200 GPUs and more than 100,000 Blackwell GPUs in NVIDIA Grace Blackwell Superchips. OCI Supercluster scales up to 65,536 NVIDIA H200 GPUs today.

Learn more

OCI AI infrastructure for all your needs

Whether you’re looking to perform inferencing or fine-tuning or train large scale-out models for generative AI, OCI offers industry-leading bare metal and virtual machine GPU cluster options powered by an ultrahigh-bandwidth network and high performance storage to fit your AI needs.

Diagram: AI infrastructure products. OCI also offers older-generation NVIDIA P100 and V100 GPUs.

AI innovators use OCI to host, train, and run inference on next-generation AI models.

Read the announcement

Explore OCI Supercluster for large-scale AI training

Available now: Massive scale-out clusters with NVIDIA H100, A100, and L40S GPUs

Supercharged compute
• Bare metal instances without any hypervisor overhead
• Accelerated by NVIDIA H200, H100, L40S, and A100 GPUs
• Option to use AMD MI300X GPUs
• Data processing unit (DPU) for built-in hardware acceleration

Massive capacity and high-throughput storage
• Local storage: up to 61.44 TB of NVMe SSD capacity
• File storage: high performance mount target (HPMT) with up to 80 Gb/sec of throughput (now GA) and fully managed Lustre service (coming soon)
• Block storage: balanced, higher performance, and ultrahigh performance volumes with a performance SLA
• Object storage: distinct storage class tiers, bucket replication, and high capacity limits

Ultrafast networking
• Custom-designed RDMA over Converged Ethernet protocol (RoCE v2)
• 2.5 to 9.1 microseconds of latency for cluster networking
• Up to 3,200 Gb/sec of cluster network bandwidth
• Up to 200 Gb/sec of front-end network bandwidth
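
To put the bandwidth figures above in more familiar units, the following minimal Python sketch converts gigabits per second to gigabytes per second and estimates a best-case transfer time. The 1,000 GB payload and the function names are assumptions chosen for illustration; real transfers will be slower because of protocol overhead, storage limits, and contention.

```python
# Headline figures quoted in the list above, in gigabits per second.
CLUSTER_NETWORK_GBPS = 3200
FRONT_END_NETWORK_GBPS = 200


def gbps_to_gigabytes_per_sec(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8


def ideal_transfer_seconds(payload_gigabytes: float, link_gbps: float) -> float:
    """Best-case time to move a payload at full line rate, ignoring all overhead."""
    return payload_gigabytes / gbps_to_gigabytes_per_sec(link_gbps)


if __name__ == "__main__":
    payload_gb = 1000  # hypothetical payload size, for illustration only
    links = [("cluster", CLUSTER_NETWORK_GBPS), ("front-end", FRONT_END_NETWORK_GBPS)]
    for name, gbps in links:
        rate = gbps_to_gigabytes_per_sec(gbps)
        seconds = ideal_transfer_seconds(payload_gb, gbps)
        print(f"{name}: {rate:.0f} GB/s, {seconds:.1f} s for {payload_gb} GB")
```

Under these assumptions, 3,200 Gb/sec corresponds to 400 GB/sec and 200 Gb/sec to 25 GB/sec.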

Compute for Supercluster

OCI bare metal instances powered by AMD MI300X, NVIDIA L40S, NVIDIA H100, and NVIDIA A100 GPUs let you run large AI models for use cases that include deep learning, conversational AI, and generative AI. With OCI Supercluster, you can scale up to 32,768 A100 GPUs, 16,384 H100 GPUs, 16,384 MI300X GPUs, and 3,840 L40S GPUs per cluster.

Networking for Supercluster

High-speed RDMA cluster networking powered by NVIDIA ConnectX network interface cards with RDMA over Converged Ethernet version 2 lets you create large clusters of GPU instances with the same ultralow-latency networking and application scalability you expect on-premises.

You don’t pay extra for RDMA capability, block storage, or network bandwidth, and the first 10 TB of egress is free.

Storage for OCI Supercluster

Through OCI Supercluster, customers can access local, block, object, and file storage for exascale computing. Among major cloud providers, OCI offers the highest capacity of high performance local NVMe storage for more frequent checkpointing during training runs, resulting in faster recovery from failures.
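
The link between checkpoint frequency and recovery time can be illustrated with a simplified model. The sketch below is not an Oracle formula: it assumes a failure is equally likely at any moment between checkpoints and that training pauses while a checkpoint is written. All function names and numbers are hypothetical.

```python
def expected_lost_minutes(checkpoint_interval_minutes: float) -> float:
    """Average work lost per failure when failures fall uniformly within an interval."""
    return checkpoint_interval_minutes / 2


def checkpoint_overhead_fraction(write_minutes: float,
                                 checkpoint_interval_minutes: float) -> float:
    """Share of wall-clock time spent writing checkpoints, assuming training pauses."""
    return write_minutes / (checkpoint_interval_minutes + write_minutes)


if __name__ == "__main__":
    # Hypothetical comparison: hourly checkpoints versus one every 10 minutes.
    for interval in (60, 10):
        lost = expected_lost_minutes(interval)
        print(f"interval {interval} min: about {lost:.0f} min of work redone per failure")
    # Hypothetical write time of 1 minute with 9 minutes of training in between.
    print(f"overhead: {checkpoint_overhead_fraction(1, 9):.0%}")
```

In this model, faster checkpoint writes let you shorten the interval at the same overhead, which reduces the work repeated after each failure.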

HPC file systems, including BeeGFS, GlusterFS, Lustre, and WEKA, can be used for AI training at scale without compromising performance.

How OCI Supercluster works

Watch Chief Technical Architect Pradeep Vincent explain how OCI Supercluster powers the training and inferencing of machine learning models, scaling to tens of thousands of NVIDIA GPUs.

Read the blog

Train AI models on OCI bare metal instances powered by GPUs, RDMA cluster networking, and OCI Data Science.

Diagram: Deep learning training and inferencing

Protecting the billions of financial transactions that happen every day requires enhanced AI tools that can analyze large amounts of historical customer data. AI models running on OCI Compute powered by NVIDIA GPUs, together with model management tools such as OCI Data Science and open source models, help financial institutions mitigate fraud.

Diagram: Fraud detection augmented by AI

AI is often used to analyze various types of medical images (such as X-rays and MRIs) in a hospital. Trained models can help prioritize cases that need immediate review by a radiologist and report conclusive results on others.

Diagram: AI-based medical image analysis. Trained models running on OCI Compute powered by GPUs can help analyze medical images and provide immediate conclusive results or prioritize images for further review.

Drug discovery is a time-consuming and expensive process that can take many years and cost millions of dollars. By leveraging AI infrastructure and analytics, researchers can accelerate drug discovery. Additionally, OCI Compute powered by NVIDIA GPUs along with AI workflow management tools such as BioNeMo enables customers to curate and preprocess their data.

Diagram: Using AI to accelerate drug discovery

AI infrastructure customer successes

Explore more customer stories

Get started with OCI AI infrastructure

Try Oracle AI and get a 30-day trial

Oracle offers a free pricing tier for most AI services as well as a free trial account with US$300 in credits to try additional cloud services. AI services are a collection of offerings, including generative AI, with prebuilt machine learning models that make it easier for developers to apply AI to applications and business operations.

Try Oracle AI for free

Which Oracle AI and ML services offer a free pricing tier?
- OCI Speech
- OCI Language
- OCI Vision
- OCI Document Understanding
- Machine Learning in Oracle Database
- OCI Data Labeling
For OCI Data Science, you pay only the compute and storage charges.

Additional resources

Learn more about RDMA cluster networking, GPU instances, bare metal servers, and more.

Documentation
Related pages

See how much you can save with OCI

Oracle Cloud pricing is simple, with consistent low pricing worldwide, supporting a wide range of use cases. To estimate your low rate, check out the cost estimator and configure the services to suit your needs.

Try Cost Estimator

Access AI subject matter experts

Get help with building your next AI solution or deploying your workload on OCI AI infrastructure.

Speak to an AI expert

They can answer questions such as
- How do I get started with Oracle Cloud?
- What kinds of AI workloads can I run on OCI?
- What types of AI services does OCI offer?
