The human eye can usually decipher the contents and meaning of a straightforward photograph. Now, with generative AI’s capability to identify and interpret vast amounts of data, computer systems can emulate this task.
This solution shows you how to create a basic image-to-text app that lets users upload an image, enter a natural language prompt asking a question about the image, and receive a text response generated by the AI model. The app combines a simple interface built with Streamlit, a base64 image decoder, and the Oracle Cloud Infrastructure (OCI) Generative AI inference API, which processes multimodal data (text and images). It’s easy to put together and serves as an ideal entry point for trying out AI services on OCI.
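To make the base64 step concrete, here is a minimal sketch of how an uploaded image might be prepared for a multimodal request. It assumes the image is base64-encoded and wrapped in a data URL before being sent; the exact payload format the OCI inference API expects is not stated here, so treat that as an assumption. The function name `image_to_data_url` is hypothetical and chosen only for illustration.

```python
import base64
import mimetypes


def image_to_data_url(image_bytes: bytes, filename: str) -> str:
    """Encode raw image bytes as a base64 data URL (hypothetical helper)."""
    # Guess the MIME type from the file name; fall back to a generic
    # binary type when the extension is not recognized.
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        mime_type = "application/octet-stream"

    # base64 turns arbitrary bytes into ASCII text that is safe to embed
    # in a JSON request body.
    encoded = base64.b64encode(image_bytes).decode("ascii")

    # Assumption: the model accepts images as data URLs. If it expects
    # the bare base64 string instead, return `encoded` directly.
    return f"data:{mime_type};base64,{encoded}"
```

In a Streamlit app, the bytes would typically come from the object returned by `st.file_uploader`, for example `uploaded.getvalue()` together with `uploaded.name`, and the resulting string would be sent alongside the user's text prompt.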