HeatWave Lakehouse FAQ

General

Why do I need a lakehouse?

We’ve seen exponential data growth in recent years. The vast majority of data is generated outside traditional OLTP applications (by sources such as Internet of Things sensors, connected devices and vehicles, web applications, and telemetry endpoints) and stored in file systems. For organizations that want to analyze this external data alongside their internal transactional data, extracting, transforming, and loading (ETL) it into a database for analysis is often too expensive or too complex. HeatWave Lakehouse makes it easy for organizations to get valuable real-time insights by querying data in object storage together with data in the database.

Is HeatWave Lakehouse a feature of HeatWave?

Yes. HeatWave Lakehouse is a capability of HeatWave, which provides transaction processing, lakehouse-scale analytics, and automated, integrated generative AI and machine learning in one cloud service, without the complexity, latency, risks, and cost of ETL duplication.

Do I need to run MySQL workloads to use HeatWave Lakehouse?

No. You can use HeatWave Lakehouse to query data in object storage with record price-performance without running any MySQL workload.

Will applications supporting MySQL need to change to benefit from HeatWave Lakehouse?

No. HeatWave Lakehouse is 100% compliant with MySQL syntax. Applications that work with MySQL can use HeatWave Lakehouse to query data in object storage without any changes.

Are there additional costs to use the HeatWave Lakehouse capability?

There are no changes to HeatWave pricing. The only additional charge is US$20 per TB per month for data loaded into the HeatWave storage layer in object storage.
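As a rough illustration of that storage charge, the sketch below multiplies the loaded data volume by the US$20 per TB per month rate. The function name and the assumption that partial terabytes are billed pro rata are hypothetical; the published price list governs actual billing granularity, and base HeatWave compute costs are not included.

```python
# Hypothetical estimator for the HeatWave Lakehouse storage-layer charge.
# Assumption: linear, pro-rata billing at the US$20 per TB per month rate;
# actual billing granularity may differ.

LAKEHOUSE_STORAGE_USD_PER_TB_MONTH = 20.0


def lakehouse_storage_cost(tb_loaded: float, months: float = 1.0) -> float:
    """Return the estimated additional storage charge in US dollars."""
    if tb_loaded < 0 or months < 0:
        raise ValueError("volume and duration must be non-negative")
    return tb_loaded * months * LAKEHOUSE_STORAGE_USD_PER_TB_MONTH


if __name__ == "__main__":
    # 500 TB held in the storage layer for one month
    print(f"${lakehouse_storage_cost(500):,.2f}")  # $10,000.00
```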

Performance and scalability

Are there performance benchmarks for HeatWave Lakehouse?

Yes. As demonstrated by a 500 TB TPC-H benchmark, the query performance of HeatWave Lakehouse is as follows:

  • 15X faster than Amazon Redshift, delivering 11X better price-performance
  • 18X faster than Databricks, delivering 15X better price-performance
  • 18X faster than Snowflake, delivering 19X better price-performance
  • 35X faster than Google BigQuery, delivering 22X better price-performance

The data load performance of HeatWave Lakehouse is as follows:

  • 2X faster than Snowflake, delivering 3X better price-performance
  • 6X faster than Databricks, delivering 6X better price-performance
  • 8X faster than Google BigQuery, delivering 7X better price-performance
  • 9X faster than Amazon Redshift, delivering 8X better price-performance
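Read together, each speedup and price-performance figure implies a relative cost. Assuming price-performance means the cost to complete the workload (hourly price multiplied by elapsed time), a service that is Y times faster with X times better price-performance must have an hourly price of Y/X times the competitor's. The helper below applies that assumed definition to the figures above; the names are hypothetical and the calculation is illustrative, not part of the published benchmark methodology.

```python
# Illustrative arithmetic only, not part of the published benchmark.
# Assumption: price-performance = cost to finish the workload
#   = hourly_price * elapsed_time (lower is better). Then
#   pp_gain = (P_other * T_other) / (P_hw * T_hw) = (P_other / P_hw) * speedup,
# so the implied hourly-price ratio is P_hw / P_other = speedup / pp_gain.

# (service, speedup, price-performance gain) copied from the lists above
QUERY_FIGURES = [
    ("Amazon Redshift", 15, 11),
    ("Databricks", 18, 15),
    ("Snowflake", 18, 19),
    ("Google BigQuery", 35, 22),
]
LOAD_FIGURES = [
    ("Snowflake", 2, 3),
    ("Databricks", 6, 6),
    ("Google BigQuery", 8, 7),
    ("Amazon Redshift", 9, 8),
]


def implied_price_ratio(speedup: float, pp_gain: float) -> float:
    """HeatWave hourly price divided by the other service's, under the assumption above."""
    if speedup <= 0 or pp_gain <= 0:
        raise ValueError("speedup and price-performance gain must be positive")
    return speedup / pp_gain


if __name__ == "__main__":
    for label, figures in (("query", QUERY_FIGURES), ("load", LOAD_FIGURES)):
        for service, speedup, gain in figures:
            ratio = implied_price_ratio(speedup, gain)
            print(f"{label:<5} vs {service:<16} implied price ratio {ratio:.2f}")
```

For example, 15X faster with 11X better price-performance against Amazon Redshift implies, under this assumed definition, a HeatWave configuration priced at about 15/11 ≈ 1.36 times the Redshift configuration per hour.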

As demonstrated by a 100 TB TPC-DS benchmark, HeatWave Lakehouse scales concurrent query execution better than the other cloud services tested. With eight concurrent client connections, the price-performance of HeatWave Lakehouse is:

  • 2X better than Amazon Redshift
  • 3.4X better than Google BigQuery
  • 5X better than Snowflake
  • 5.4X better than Databricks

Why does HeatWave Lakehouse scale out so well?

HeatWave’s performance comes from its scale-out architecture, which uses massive parallelism to provision the cluster, load data, and process queries across up to 512 cluster nodes. Data from files in object storage is transformed into HeatWave’s optimized in-memory hybrid columnar format, so query performance is the same for all supported file formats. Additionally, HeatWave Autopilot intelligently samples files to derive the information needed for automation and learns from previously executed queries to improve the execution of subsequent queries.
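Scale-out query processing generally follows a scatter-gather pattern: rows are partitioned across nodes, each node computes a partial result over its own partition, and the partial results are merged. The sketch below simulates that general pattern for a `SUM ... GROUP BY` with threads standing in for cluster nodes. It is a generic illustration, not HeatWave’s actual partitioning scheme or execution engine; `partition_rows`, `local_sum`, `scatter_gather_sum`, and `NUM_NODES` are hypothetical names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 4  # hypothetical cluster size for the simulation


def partition_rows(rows: list[tuple[str, int]], num_nodes: int) -> list[list[tuple[str, int]]]:
    """Scatter: hash-partition (key, value) rows so each key lands on one node."""
    partitions: list[list[tuple[str, int]]] = [[] for _ in range(num_nodes)]
    for key, value in rows:
        partitions[hash(key) % num_nodes].append((key, value))
    return partitions


def local_sum(partition: list[tuple[str, int]]) -> Counter:
    """Per-node work: SUM(value) GROUP BY key over this node's rows only."""
    totals: Counter = Counter()
    for key, value in partition:
        totals[key] += value
    return totals


def scatter_gather_sum(rows: list[tuple[str, int]], num_nodes: int = NUM_NODES) -> dict[str, int]:
    """Run local_sum on every partition concurrently, then merge the partial results."""
    partitions = partition_rows(rows, num_nodes)
    # Threads stand in for cluster nodes so the example stays self-contained.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_sum, partitions))
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # update() adds counts rather than replacing them
    return dict(merged)


if __name__ == "__main__":
    sales = [("EU", 10), ("US", 5), ("EU", 7), ("APAC", 3), ("US", 1)]
    print(sorted(scatter_gather_sum(sales).items()))  # [('APAC', 3), ('EU', 17), ('US', 6)]
```

Hash partitioning keeps every row for a given key on the same node, so each partial result is already final for its keys and the merge step is cheap. In a real cluster each partition would live in a separate node’s memory rather than in one Python process.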

How is it possible to query data in object storage as quickly as in the database?

When data is loaded into the HeatWave cluster, it is automatically transformed into a single optimized internal format, regardless of its source. As a result, querying data from object storage is as fast as querying data in the database.
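The idea that every source format ends up in one internal representation can be shown with a toy example: rows from a CSV file and from a JSON Lines file are converted into the same column-oriented structure, after which one query function gives identical results for either source. The schema, the helper names (`to_columns`, `load_csv`, `load_json_lines`, `total_amount`), and the sample data are hypothetical; HeatWave’s actual hybrid columnar format is far more elaborate and is not shown here.

```python
import csv
import io
import json
from typing import Any, Callable, Iterable

# Hypothetical schema: column name -> function that converts a raw value.
SCHEMA: dict[str, Callable[[Any], Any]] = {"region": str, "amount": float}

CSV_TEXT = "region,amount\nEU,10.5\nUS,4\nEU,2\n"
JSONL_TEXT = (
    '{"region": "EU", "amount": 10.5}\n'
    '{"region": "US", "amount": 4}\n'
    '{"region": "EU", "amount": 2}\n'
)


def to_columns(records: Iterable[dict[str, Any]],
               schema: dict[str, Callable[[Any], Any]]) -> dict[str, list[Any]]:
    """Convert row-oriented records into one typed list per column."""
    columns: dict[str, list[Any]] = {name: [] for name in schema}
    for record in records:
        for name, cast in schema.items():
            columns[name].append(cast(record[name]))
    return columns


def load_csv(text: str, schema: dict[str, Callable[[Any], Any]]) -> dict[str, list[Any]]:
    # DictReader yields every field as a string; the schema casts fix the types.
    return to_columns(csv.DictReader(io.StringIO(text)), schema)


def load_json_lines(text: str, schema: dict[str, Callable[[Any], Any]]) -> dict[str, list[Any]]:
    # One JSON object per non-empty line; values still pass through the casts.
    rows = (json.loads(line) for line in text.splitlines() if line.strip())
    return to_columns(rows, schema)


def total_amount(columns: dict[str, list[Any]], region: str) -> float:
    """A 'query' that sees only the columnar form, never the source format."""
    return sum(a for r, a in zip(columns["region"], columns["amount"]) if r == region)


if __name__ == "__main__":
    from_csv = load_csv(CSV_TEXT, SCHEMA)
    from_jsonl = load_json_lines(JSONL_TEXT, SCHEMA)
    print(from_csv == from_jsonl)  # True
    print(total_amount(from_csv, "EU"), total_amount(from_jsonl, "EU"))  # 12.5 12.5
```

Because the conversion happens once at load time, the query function never needs format-specific parsing, which is the property the answer above describes.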

Multicloud

Is HeatWave Lakehouse available on AWS?

Yes. HeatWave runs natively on AWS. With HeatWave Lakehouse, AWS customers can replace up to six AWS services with one, reducing complexity and getting the best price-performance in the industry for analytics.

With HeatWave Lakehouse, AWS customers can query hundreds of terabytes of data in Amazon S3 object storage in various file formats, including CSV, Parquet, Avro, JSON, and exports from other databases, without copying the S3 data into the database. They can continue to run applications on AWS with no changes and without incurring unreasonably high AWS data egress fees. AWS customers can also run HeatWave AutoML on HeatWave Lakehouse to automatically train machine learning models, run inference, and obtain explanations on files stored in S3, as well as run various kinds of machine learning analysis from the interactive HeatWave console.

Is HeatWave Lakehouse available on Microsoft Azure?

Yes, it’s available to Azure customers via Oracle Interconnect for Microsoft Azure.

Best practices

Is there a way to get help evaluating or getting started with HeatWave Lakehouse?

Absolutely. You can request a free expert-led workshop.

Are there blogs outlining best practices for HeatWave Lakehouse?

Definitely, and here are a few to get you started.