HeatWave Lakehouse Features

Query engine for data in object storage and optionally in MySQL databases

Using standard SQL syntax, query data in object storage in a variety of file formats, including CSV, Parquet, Avro, and export files from other databases, and optionally combine it with transactional data in MySQL databases. Query processing is done entirely in the HeatWave engine, so you can use HeatWave for non-MySQL workloads and MySQL-compatible workloads alike. When loaded into the HeatWave cluster, data from any source is automatically transformed into a single optimized internal format. As a result, querying data in object storage is as fast as querying the database, an industry first.
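As a rough illustration of the idea, the sketch below loads rows from a CSV file into the same engine as a transactional table and joins the two with standard SQL. It uses Python's built-in SQLite as a stand-in and is not HeatWave's API; the table names, column names, and the `load_and_join` helper are all hypothetical.

```python
import csv
import io
import sqlite3


def load_and_join(csv_text):
    """Join CSV 'object storage' rows with a 'transactional' table (SQLite stand-in)."""
    conn = sqlite3.connect(":memory:")

    # Stand-in for transactional data that already lives in the database.
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Lin")])

    # Stand-in for a file in object storage: parse it and load it into
    # the engine's own table format before querying.
    conn.execute("CREATE TABLE clicks (customer_id INTEGER, page TEXT)")
    reader = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT INTO clicks VALUES (?, ?)",
        [(int(row["customer_id"]), row["page"]) for row in reader],
    )

    # Once both sources share one format, a single SQL query spans them.
    rows = conn.execute(
        """
        SELECT c.name, COUNT(*) AS clicks
        FROM customers AS c
        JOIN clicks AS k ON k.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()
    conn.close()
    return rows
```

Calling `load_and_join` with a small CSV string that has `customer_id` and `page` columns returns one `(name, clicks)` tuple per customer.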

Query results can be written to object storage, which makes them easy to share and inexpensive to store. This also enables developers to use HeatWave for MapReduce applications.

Support for JSON and JavaScript

You can use HeatWave to query semistructured JSON data in object storage, for example, to develop content management apps or real-time dashboards. With native JavaScript support in HeatWave Lakehouse, you can also use JavaScript to process and query data in object storage. For example, you can build dynamic content-loading applications using the rich features of JavaScript.

Support for unstructured documents with HeatWave Vector Store

With HeatWave Vector Store, you can upload and query unstructured documents.

Scale-out architecture

HeatWave’s unrivaled performance is a result of its scale-out architecture, which enables massive parallelism across up to 512 nodes when provisioning the cluster, loading data, and processing queries. Each HeatWave node within a cluster and each core within a node can process partitioned data in parallel, including parallel scans, joins, group-by, aggregation, and top-k processing. The algorithms are designed to overlap compute time with the communication of data across nodes, which helps achieve high scalability.
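The general pattern of partitioned, parallel group-by aggregation can be sketched as follows. This is a minimal single-machine illustration, not HeatWave's implementation: the function names are hypothetical, threads stand in for nodes and cores, and Python threads show only the structure of the computation rather than a real CPU speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Hash-partition (key, value) rows so each worker owns a disjoint set of keys."""
    parts = [[] for _ in range(n_parts)]
    for key, value in rows:
        parts[hash(key) % n_parts].append((key, value))
    return parts


def partial_sum(part):
    """Group-by sum over a single partition."""
    totals = Counter()
    for key, value in part:
        totals[key] += value
    return totals


def parallel_group_sum(rows, n_parts=4):
    """Aggregate each partition in its own worker, then merge the partial results."""
    parts = partition(rows, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(partial_sum, parts))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values for matching keys
    return dict(merged)
```

Because rows are partitioned by key, each key's total is computed entirely within one worker, and the final merge is a simple union of the per-partition results.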

Machine learning–powered automation with HeatWave Autopilot

HeatWave Autopilot provides workload-aware automation for HeatWave powered by machine learning (ML). HeatWave Autopilot capabilities, such as auto provisioning, auto query plan improvement (which learns various runtime statistics from past query executions to improve the execution plan for future queries), and auto parallel loading, have been enhanced for HeatWave Lakehouse. Additional capabilities for HeatWave Lakehouse include the following:

  • Auto schema inference automatically infers the mapping of file data to the corresponding schema definition for all supported file types, including CSV. As a result, you don’t need to manually define and update the schema mapping of files, saving time and effort.
  • Adaptive data sampling intelligently samples files in object storage to derive the information that enables HeatWave Autopilot’s predictions for automation. Using adaptive data sampling, HeatWave Autopilot can scan a 400 TB file and make predictions, such as schema mapping, in less than one minute.
  • Adaptive data flow lets HeatWave Lakehouse dynamically adapt to the performance of the underlying object store in any region to improve overall performance and availability.
  • Adaptive query optimization uses various statistics to adjust data structures and system resources after query execution has started, independently optimizing query execution for each node based on actual data distribution at runtime. This helps improve the performance of ad hoc queries by up to 25%.
  • Auto compression helps customers determine the optimal compression algorithm for each column, which improves load and query performance with faster data compression and decompression. By reducing memory usage, customers can cut costs by up to 20%.
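To make the idea of inferring a schema from a sample concrete, here is a minimal sketch. It is not how HeatWave Autopilot works internally: it uses plain uniform random sampling in place of adaptive sampling, and the type names and the `infer_schema` and `infer_type` helpers are hypothetical.

```python
import csv
import io
import random


def infer_type(values):
    """Return the narrowest type name that fits every non-empty sampled value."""

    def all_parse(cast):
        for value in values:
            if value == "":
                continue  # treat empty fields as NULLs that fit any type
            try:
                cast(value)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "BIGINT"
    if all_parse(float):
        return "DOUBLE"
    return "VARCHAR"


def infer_schema(csv_text, sample_size=100, seed=0):
    """Infer a column-name-to-type mapping from a random sample of CSV rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    rng = random.Random(seed)
    # Uniform sampling stands in for the adaptive sampling described above.
    sample = rows if len(rows) <= sample_size else rng.sample(rows, sample_size)
    columns = list(zip(*sample))
    return {name: infer_type(column) for name, column in zip(header, columns)}
```

A sample can miss rare values (for example, a single non-numeric entry in an otherwise numeric column), so a sketch like this trades accuracy for speed; a production system needs a strategy for handling such mismatches at load time.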

Built-in machine learning

With HeatWave AutoML, you can use data in object storage, the database, or both to build, train, deploy, and explain ML models. You don’t need to move the data to a separate ML cloud service or be an ML expert. HeatWave AutoML automates the machine learning pipeline, including algorithm selection, intelligent data sampling for model training, feature selection, and hyperparameter optimization—saving data analysts significant time and effort. HeatWave AutoML supports anomaly detection, forecasting, classification, regression, and recommender system tasks, even on text columns. You can use HeatWave AutoML at no additional cost.

Highly available, fully managed database service

Tasks such as high-availability management, patching, upgrades, and backups are automated with a fully managed service. Data loaded into the HeatWave cluster is automatically recovered in case of an unexpected compute node failure, without retransformation from external data formats.

Secure access control

With access control mechanisms, such as Oracle Cloud Infrastructure (OCI) resource principal authentication or pre-authenticated requests, you can have full control over access to data lake sources.