Oracle Cloud Infrastructure (OCI) Stack Monitoring provides essential monitoring and alarm management for applications and infrastructure, such as hosts, GPUs, databases, and app servers.
Eliminate silos and get full-stack visibility into the health of your Oracle E-Business Suite, PeopleSoft, and GoldenGate systems as well as infrastructure, such as hosts, GPUs, databases, and app servers running on-premises or in the cloud. Extend monitoring to any infrastructure with Prometheus, collectd, or Telegraf integration. Monitor any unique condition with custom metrics.
Use Monitoring Templates to manage all alarm conditions for your E-Business Suite or PeopleSoft applications, database systems, app servers, and fleet of hosts, including GPU infrastructure. Alarm settings are automatically applied as your environment grows. During patching windows, you can suppress alarms easily using topology-aware Maintenance Windows.
With turnkey monitoring of GPU infrastructure, use the Enterprise Health and Alarms view to interactively monitor across the GPU fleet. Triage open alarms, assess activity across all GPUs, monitor GPU utilization, track GPU temperatures, and identify underutilized GPUs and errors. Monitor workload processes and correlate them with the underlying infrastructure.
OCI Stack Monitoring provides full-stack monitoring and alarm management of applications and infrastructure resources running on-premises or in the cloud. Stack Monitoring’s app-specific logic is bundled as a plugin to the agent that, in turn, is used to discover resources and collect metrics which are then sent to OCI. Stack Monitoring creates resources and application topologies from discovery results. Status and key performance data are shown in the Enterprise Health and Alarms user interface (UI), resource homepages, and fleet and application dashboards. Alarm creation is streamlined using Monitoring Templates that create alarms in OCI Monitoring and are automatically shown and summarized in the Stack Monitoring UI. Using machine learning, baselines for key performance metrics are automatically calculated and anomalies are highlighted in performance charts.
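To make the idea of a learned baseline concrete, here is a minimal sketch of one simple way to flag anomalies in a metric series. It is not the algorithm Stack Monitoring uses (the source does not describe it); it assumes a rolling mean and standard deviation over the preceding samples, and the names `find_anomalies`, `window`, and `k` are hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, k=3.0):
    """Return indices whose value lies outside mean +/- k*stdev of the preceding window.

    Illustrative only: a rolling-statistics baseline, not Stack Monitoring's method.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        baseline = mean(history)
        # A tiny floor on the band avoids flagging noise around a perfectly flat baseline.
        band = max(k * stdev(history), 1e-9)
        if abs(values[i] - baseline) > band:
            anomalies.append(i)
    return anomalies


# Hypothetical host CPU samples (percent) with one obvious spike.
cpu_percent = [41, 43, 42, 40, 44, 42, 43, 95, 42, 41]
print(find_anomalies(cpu_percent))
```

A fixed window keeps the sketch short; a production baseline would more likely account for daily or weekly seasonality.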
Discover and monitor all components of your E-Business Suite application deployment including Concurrent Manager, Workflow Background Engine, Notification Mailer, and the dependent WebLogic Servers, Oracle Databases, and hosts. Use the E-Business Suite homepage to check the status of all components and open alarms. With Stack View, you can quickly check vital signs across the stack, including E-Business Suite long-running programs, Concurrent Manager requests status, WebLogic memory utilization and thread pool status, Oracle Database wait times, host CPU, and memory.
Using similar workflows, you can discover and monitor PeopleSoft and its components, including Application Server Domain, PeopleSoft Internet Architecture (PIA), Process Scheduler, and the dependent WebLogic Server, Oracle Database, and hosts. Use the PeopleSoft homepage to check the status of all components and open alarms. Use the PeopleSoft Stack View to assess the status and load of application server and process scheduler domains and PIA, WebLogic resource usage and stuck threads, Oracle Database wait times, host CPU, and memory.
Start monitoring cloud or on-premises hosts as soon as an agent is deployed on them or as soon as an OCI compute instance is provisioned. Monitor status, alarms, resource usage (CPU, memory, swap, and filesystem usage), and load (disk activity and paging activity) across all cloud and on-premises hosts in a single view. Investigate hosts with high CPU usage to determine which apps consume the most CPU. Use anomalies shown in performance charts to understand whether high resource usage is within expected baselines. If needed, monitor conditions specific to your environment using Metric Extensions.
Discover and monitor databases and middleware in one place. Monitor complete Oracle Database systems (including PDBs, Listener, Automatic Storage Management, and Cluster), GoldenGate, and SQL Server databases. Monitor middleware such as WebLogic Servers, Managed File Transfer, SOA, and Oracle HTTP Server, as well as Tomcat, Apache HTTP Server, JBoss, JVM servers, Oracle Identity Manager, and Oracle Unified Directory.
Use Enterprise Health and Alarms to triage open alarms and understand slow performance, high resource usage, and errors across the database and middleware tiers. Use its interactive charts to dynamically correlate any two response and load metrics. Drill down on any performance metric to view historical trends and identify anomalies.
Monitor conditions that are unique to your environment using Metric Extensions. Follow the Metric Extensions guided workflow to define the metric name and type, custom scripts, or SQL queries. Test the metric iteratively: try it out on test resources, review the data, and edit the metric as needed. Once tested, publish and enable the Metric Extension on your resources. Monitor the data from Metric Extensions in any Stack Monitoring UI: homepages, Enterprise Health and Alarms, or dashboards. Enable anomaly detection to automatically learn baselines and identify anomalies in performance charts. Set up alarm rules on Metric Extensions to generate alarms when values cross performance thresholds.
Monitor the overall health of your GPU infrastructure fleet from a single Enterprise Health and Alarms view. Interact with this view to triage open alarms across hosts and GPUs, track CPU and memory utilization across all hosts, and assess GPU activity, memory utilization, power, temperature, and latency across all GPUs. Identify host availability issues or hotspots such as GPUs nearing maximum temperatures. Track errors and underutilized GPUs. Drill down from the enterprise view to a specific cluster network view. Continue troubleshooting using the built-in topology views to drill down from a cluster network to hosts and GPUs within network blocks or local blocks in the cluster network.
Simplify alarm management for applications, systems, and infrastructure fleets using Monitoring Templates. Monitoring Templates provide a resource-centric way to define and manage all alarm conditions for an E-Business Suite or PeopleSoft application, database system, or a fleet of application servers and hosts. During patching periods, Maintenance Windows provide a resource-centric way to mute alarms for a fleet of hosts or app servers or for applications such as E-Business Suite and PeopleSoft.
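The resource-centric idea behind muting alarms can be sketched in a few lines. This is a conceptual model only, not the Stack Monitoring API: the class `MaintenanceWindow`, the functions `expand` and `should_notify`, and the topology dictionary are all hypothetical names chosen for illustration. The sketch assumes a window placed on an application also covers every resource beneath it in the topology.

```python
from dataclasses import dataclass
from datetime import datetime


def expand(resource_id, topology):
    """Return resource_id plus every resource beneath it in the (hypothetical) topology."""
    members = {resource_id}
    for child in topology.get(resource_id, ()):
        members |= expand(child, topology)
    return members


@dataclass(frozen=True)
class MaintenanceWindow:
    resource_ids: frozenset  # resources covered after topology expansion
    start: datetime
    end: datetime

    def covers(self, resource_id, at):
        # The window applies from start (inclusive) to end (exclusive).
        return resource_id in self.resource_ids and self.start <= at < self.end


def should_notify(resource_id, fired_at, windows):
    """An alarm is delivered only if no window covers the resource at that time."""
    return not any(w.covers(resource_id, fired_at) for w in windows)


# Hypothetical topology: an application depends on an app server and a database.
topology = {"ebs-prod": ["weblogic-1", "db-1"], "db-1": ["host-db-1"]}
patching = MaintenanceWindow(
    resource_ids=frozenset(expand("ebs-prod", topology)),
    start=datetime(2024, 6, 1, 22, 0),
    end=datetime(2024, 6, 2, 2, 0),
)
print(should_notify("host-db-1", datetime(2024, 6, 1, 23, 30), [patching]))
print(should_notify("host-other", datetime(2024, 6, 1, 23, 30), [patching]))
```

Expanding the topology once, when the window is created, keeps the per-alarm check to a set lookup and a time comparison.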
Discovery of GPU infrastructure topology.
Top-down enterprise health and alarms monitoring.
One-click discovery for applications such as Oracle E-Business Suite and PeopleSoft as well as application stack technologies.
Automatic creation of application topology that associates applications with app servers and databases to enable troubleshooting of issues across the stack.
Each resource type is automatically monitored for key vital signs relating to its availability, response, load, error, and utilization, so DevOps teams do not need domain expertise to determine what’s important to monitor.
Anomaly detection enables rapid problem identification and resolution.
Monitoring Templates provide a resource-oriented way to set alarm rule conditions for an application, system, or fleet of resources.
Maintenance Windows provide a resource-oriented way to suppress alarms for resources undergoing maintenance operations.
Use Enterprise Health and Alarms to get visibility across your enterprise and quickly identify outages, open alarms, and performance hot spots.
Use resource homepages to access each resource’s status, key performance metrics, alarms, and associated resources.
Stack Views provide rapid insight into the critical KPIs for the application and its underlying infrastructure stack.
Unify metrics, traces, and logs across Observability and Management services using dashboards.
Add custom metrics to monitor conditions unique to your environment.
Extend your application topology by associating Stack Monitoring resources with other OCI service instances.
Monitor any type of infrastructure through integration with Prometheus, Telegraf, and collectd, and through process-based custom resources.
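For readers unfamiliar with what a Prometheus-compatible source exposes, the following sketch renders a gauge in the Prometheus text exposition format. It shows only the format an exporter emits; how Stack Monitoring is configured to ingest it is not covered in this overview. The function `render_gauge`, the metric name, and the label values are hypothetical, and the sketch assumes label values need no escaping.

```python
def render_gauge(name, help_text, samples):
    """Render one gauge metric in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs.
    Assumes label values contain no quotes, backslashes, or newlines.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        # Sort labels so the output is stable from one scrape to the next.
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric: pending jobs per host.
text = render_gauge(
    "queue_depth",
    "Pending jobs in the queue",
    [({"host": "app01"}, 12), ({"host": "app02"}, 3)],
)
print(text, end="")
```

An exporter would typically serve this text over HTTP at a `/metrics` path for a scraper to collect.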