AI Solutions Hub

Enter a new era of productivity with generative AI solutions for your business. Leverage AI, embedded as you need it, across the full stack.


  • Deploy AI Apps Fast

    Help speed up your AI application deployment using Oracle Cloud and Kubernetes, enhancing scalability and reliability with cloud native strategies.


  • Scale NVIDIA NIM Inference

    Deploy NVIDIA NIM on OCI Kubernetes Engine for scalable, efficient inference using OCI Object Storage and NVIDIA GPUs for optimal performance.


Typical scenarios

Build with OCI Generative AI

Watch the sample solution video (1:36)

Harness the power of LLMs in a managed service

In the fast-paced world of software development, staying informed is crucial. Imagine having an AI assistant that can help quickly transform a complex webpage into content that’s bite-sized, easily consumable, and shareable. This is one of many things that Oracle Cloud Infrastructure (OCI) Generative AI can help you do.

Below is an example of how you can build such an AI assistant with OCI Generative AI.

The AI-powered GitHub trending projects summarizer is a personal content generation engine that automatically retrieves and summarizes the top 25 trending GitHub projects. OCI Generative AI helps extract, read, and compile each project’s README file into a concise, engaging, and informative summary that can be shared with others.
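The retrieval step can be as simple as pulling each project's README through the GitHub REST API. Below is an illustrative sketch (the fetch_readme helper and the example repository are not part of the sample); the returned Markdown is what gets summarized as summary_txt in the snippet further below.

import requests

def fetch_readme(owner, repo):
    """Fetch a repository's README as raw Markdown via the GitHub REST API."""
    url = f"https://api.github.com/repos/{owner}/{repo}/readme"
    # The "raw" media type returns the README body instead of a JSON wrapper.
    headers = {"Accept": "application/vnd.github.raw+json"}
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return response.text

# The Markdown text is then passed to OCI Generative AI for summarization.
summary_txt = fetch_readme("oracle", "oci-python-sdk")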

Try it out, with detailed steps and sample code on GitHub.

Choosing models

You can switch between the LLMs offered through OCI Generative AI simply by modifying the model_id variable in summarize_llm.py.

  • cohere.command-r-16k: A versatile model for general language tasks, such as text generation, summarization, and translation, with a context size of 16K tokens. Ideal for building conversational AI with a good balance of performance and cost-effectiveness.
  • cohere.command-r-plus: An enhanced version with more sophisticated understanding and deeper language capabilities. Best for complex tasks requiring nuanced responses and higher processing capacity.
  • meta.llama-3.3-70b-instruct: A 70B parameter model with 128K token context length and multilingual support.
  • meta.llama-3.1-405b-instruct: The largest publicly available LLM (405B parameters) with exceptional capabilities in reasoning, synthetic data generation, and tool use. Best for enterprise applications requiring maximum performance.

The above is a subset of available models. We’re constantly making newer models available.
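For example, switching the summarizer to a different model is a one-line change (the value shown is one of the models listed above; the variable name comes from summarize_llm.py as described):

# summarize_llm.py: select the OCI Generative AI model to use
model_id = "cohere.command-r-plus"  # swap in any model ID listed above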

Below is a code snippet to call OCI Generative AI:

content.text = """Generate an abstractive summary of the given Markdown contents. Here are the contents to summarize: {}""".format(summary_txt)


chat_detail.content = content.text 

chat_detail.serving_mode = oci.generative_ai_inference.models.OnDemandServingMode(model_id="meta.llama-3.1-405b-instruct") # configurable model chat_response = generative_ai_inference_client.chat(chat_detail)

Use OCI Generative AI Agents with RAG

Watch the sample solution video (1:44)

Provide improved access to knowledge bases

Retrieval-augmented generation (RAG) is one of the most important use cases for AI. RAG lets you augment the knowledge of an LLM without retraining it. It’s a way for the LLM to extract new information, from a database or elsewhere, and quickly present it to the end user.

This allows the LLM to acquire up-to-date knowledge regardless of when the LLM was trained and when inference was run. As a result, the updated data can make your LLM more intelligent with little to no effort.
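Conceptually, the pattern is "retrieve, then generate with the retrieved text in the prompt." The sketch below is a generic illustration only (the retrieve and generate callables are placeholders); OCI Generative AI Agents manages these steps for you, as described next.

# Generic RAG pattern (illustrative; retrieve and generate are placeholder callables)
def answer_with_rag(question, retrieve, generate):
    # 1. Retrieve passages relevant to the question, e.g., via a vector-store similarity search
    passages = retrieve(question, top_k=3)
    # 2. Augment the prompt with the retrieved text
    prompt = (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + "\n\n".join(passages) + "\n\n"
        "Question: " + question
    )
    # 3. Generate an answer grounded in the retrieved context with any LLM completion call
    return generate(prompt)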

After you upload documents to Oracle Cloud Infrastructure (OCI) Generative AI Agents, the service processes the data and makes it available to query through a chatbot.
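As a minimal sketch of the upload step, assuming the agent's knowledge base is configured with an Object Storage data source (the bucket, object name, and config file are placeholders):

import oci

# Upload a source document to the Object Storage bucket that the agent's
# knowledge base ingests from (bucket and object name are placeholders)
config = oci.config.from_file()
object_storage = oci.object_storage.ObjectStorageClient(config)
namespace = object_storage.get_namespace().data

with open("patient_admission_recommendations.pdf", "rb") as f:
    object_storage.put_object(
        namespace_name=namespace,
        bucket_name="agent-knowledge-base",
        object_name="patient_admission_recommendations.pdf",
        put_object_body=f,
    )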

Try it out, with detailed steps and sample code on GitHub.

Below is a code snippet for using the RAG agent in OCI:

import oci
from oci.generative_ai_agent_runtime import GenerativeAiAgentRuntimeClient
from oci.generative_ai_agent_runtime.models import ChatDetails

# Initialize the service client with the default config file
config = oci.config.from_file()
agent_runtime_client = GenerativeAiAgentRuntimeClient(config)

# Ask a question to the RAG agent
question = "What steps do I take if I have a new patient under the patient admission recommendations?"

chat_response = agent_runtime_client.chat(
    agent_endpoint_id="ocid1.test.oc1..<id>",
    chat_details=ChatDetails(user_message=question))

# Get the data from the response
print(chat_response.data)

Build with Oracle HeatWave GenAI

Watch the sample solution video (3:54)

Speed up AppDev with integrated GenAI

Generative AI can be especially good at helping to summarize sentiment, as this scenario shows. An ecommerce site may have hundreds of stock-keeping units, or SKUs, with dozens of reviews for each one. To help quickly summarize product reviews, developers can tap into HeatWave GenAI’s integrated capabilities, using in-database large language models and an automated, in-database vector store.

HeatWave GenAI can also help translate and analyze sentiment on demand. All operations can be automated with HeatWave GenAI, keeping summaries up-to-date as new reviews are added.

By keeping the data and processing within HeatWave, developers can scale their solutions as their GenAI needs grow, making AI as simple as a database query.
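An application can invoke the same in-database summarization directly. Below is a sketch using MySQL Connector/Python to call the SUMMARIZE_TRANSLATE procedure from the sample shown further below; the connection details are placeholders, and the procedure itself comes from the sample code on GitHub.

import mysql.connector

# Connect to the HeatWave-enabled MySQL DB system (connection details are placeholders)
conn = mysql.connector.connect(
    host="heatwave-host.example.com", user="app_user", password="...", database="demo"
)
cursor = conn.cursor()

# Summarize positive English-language reviews for product 1, then read the summary
# back from the session variable the procedure populates
cursor.execute('CALL SUMMARIZE_TRANSLATE(1, "POSITIVE", "en", @positive_english_summary)')
cursor.execute("SELECT @positive_english_summary")
print(cursor.fetchone()[0])

cursor.close()
conn.close()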

Try it out, with detailed steps and sample code on GitHub.

Below is a code snippet illustrating how to summarize positive reviews:

SELECT "################### Computing summaries for EXISTING reviews on a product ###################" AS "";

SELECT "" AS "";

CALL SUMMARIZE_TRANSLATE(1, "POSITIVE", "en", @positive_english_summary);
SELECT @positive_english_summary AS "--- English summary of positive reviews on the T-Shirt ---";

Build with open source models on OCI

Watch the sample solution video (1:30)

Leverage open source GenAI models on a unified platform

Open source LLMs, such as those created by Hugging Face, are powerful tools that let developers try out GenAI solutions relatively quickly. Kubernetes, combined with Oracle Cloud Infrastructure (OCI), enables GenAI solutions to scale, while also providing flexibility, portability, and resilience.

In this demo, you’ll see how easy it can be to deploy fine-tuned LLM inference containers on OCI Kubernetes Engine, a managed Kubernetes service that simplifies deployments and operations at scale for enterprises. The service enables developers to retain the custom model and data sets within their own tenancy without relying on a third-party inference API.

We’ll use Text Generation Inference as the inference framework to expose the LLMs.

Try it out, with detailed steps and sample code on GitHub.

Below is a code snippet illustrating how to deploy an open source LLM:

# select model from HuggingFace

model=HuggingFaceH4/zephyr-7b-beta


# deploy selected model (assumes GPU access; publishes the container port so it can be queried)
docker run --gpus all -p 8080:80 ghcr.io/huggingface/text-generation-inference:2.0 --model-id $model

# invoke the deployed model
curl IP_address:port/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":50}}' \
    -H 'Content-Type: application/json'
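The same request can be made from application code. Below is a minimal Python equivalent of the curl invocation above, using TGI's non-streaming /generate endpoint for simplicity; the host and port are placeholders for your OKE service address.

import requests

# Query the Text Generation Inference endpoint exposed by the container
# (host and port are placeholders for your OKE service address)
TGI_URL = "http://IP_address:port/generate"

payload = {
    "inputs": "What is Deep Learning?",
    "parameters": {"max_new_tokens": 50},
}
response = requests.post(TGI_URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["generated_text"])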

Build with Oracle Code Assist

Watch the sample solution video (3:40)

Boost developer productivity and enhance code consistency

Oracle Code Assist is an AI code companion designed to help boost developer velocity and enhance code consistency. Powered by large language models (LLMs) on Oracle Cloud Infrastructure (OCI) and fine-tuned and optimized for Java, SQL, and application development on OCI, Oracle Code Assist provides developers with context-specific suggestions. You can tailor it to your organization’s best practices and codebases.

Currently available in beta for JetBrains IntelliJ IDEA and Microsoft Visual Studio Code, the plugin can assist with documentation, legacy code comprehension, and code completion.

To learn how to join the beta program and get started, visit our GitHub repository.

OCI AI Blueprints

Deploy, scale, and monitor GenAI workloads in minutes with Oracle Cloud Infrastructure (OCI) AI Blueprints, complete with hardware recommendations, software components, and out-of-the-box monitoring.

Effectively deploy and scale LLMs with vLLM: lightning-fast inference, seamless integration, and zero hassle.

    • Choose from custom models or a variety of open source models on Hugging Face.

    • Automatically provision GPU nodes and store models in OCI Object Storage.

    • Get a ready-to-use API endpoint for instant model inference.

    • Enable autoscaling based on inference latency for mission-critical applications.

    • Easily integrate and scale inference workloads without deep technical expertise.

    • Monitor performance with built-in observability tools, such as Prometheus and Grafana.

Fine-tune smarter, not harder: benchmark performance and optimize AI training with data-driven insights.

    • Benchmark fine-tuning performance using the MLCommons methodology.

    • Fine-tune a quantized Llama 2 70B model with a standardized data set.

    • Track training time, resource utilization, and performance metrics.

    • Automatically log results in MLflow and visualize insights in Grafana.

    • Make data-driven infrastructure decisions for optimized fine-tuning jobs.

Supercharge LLM fine-tuning with low-rank adaptation (LoRA): faster, more efficient, and ready for deployment (see the sketch after this list).

    • Utilize LoRA for efficient fine-tuning of LLMs with minimal computational overhead.

    • Leverage your custom data sets or publicly available data sets from Hugging Face for training.

    • Track and analyze detailed training metrics logged in MLflow throughout the fine-tuning process.

    • Store the fine-tuned model and training results in an object storage bucket for seamless deployment.

    • Optimize performance with a design that helps ensure quick, effective model adaptation without heavy resource usage.

    • Scale the solution as needed, from small data sets to large-scale model fine-tuning.
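As a rough illustration of the LoRA approach, here is a minimal sketch using the Hugging Face peft library; the base model, hyperparameters, and data set are illustrative placeholders, and the blueprint itself handles GPU provisioning, MLflow logging, and model storage for you.

from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative values only; the blueprint wires up the model, data set, and tracking
base_model = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA attaches small low-rank adapter matrices to the attention projections,
# so only a fraction of the parameters are trained
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model

# A custom or Hugging Face data set supplies the training examples
dataset = load_dataset("imdb", split="train[:1%]")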