Hadoop Datasource

Overview

Oracle Datasource for Apache Hadoop (formerly Oracle Table Access for Apache Hadoop) turns Oracle Database tables into a Hadoop data source (i.e., an external table), enabling direct and consistent Hive QL and Spark SQL queries as well as direct Hadoop API access. The typical use case is joining master data or dimension data held in Oracle Database with facts or other big data held in Hadoop storage.

Oracle Datasource for Apache Hadoop optimizes query execution plans using predicate pushdown, projection pushdown, and partition pruning. Database tables are read in parallel according to the selected split pattern, over smart and secure connections (Kerberos, SSL, Oracle Wallet). Concurrency is regulated on both sides: by Hadoop (e.g., the maximum number of concurrent tasks) and by Oracle DBAs (e.g., the maximum connection pool size).
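The split-and-pushdown idea described above can be sketched in a few lines of Python. This is a minimal conceptual illustration under stated assumptions, not OD4H's implementation or API: the range-based splitter, the function names `range_splits` and `split_query`, and the `SALES` table with its columns are all hypothetical. Each split becomes one task's query, and that query carries only the projected columns and the pushed-down predicate, so the database filters rows before they reach Hadoop.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Split:
    """One unit of parallel work: the half-open key range [low, high)."""
    low: int
    high: int


def range_splits(min_key: int, max_key: int, num_splits: int) -> List[Split]:
    """Hypothetical splitter: divide the inclusive key range [min_key, max_key]
    into at most num_splits contiguous ranges whose sizes differ by at most one."""
    if num_splits < 1:
        raise ValueError("num_splits must be >= 1")
    total = max_key - min_key + 1
    splits: List[Split] = []
    start = min_key
    for i in range(num_splits):
        # Spread the remainder over the first splits.
        size = total // num_splits + (1 if i < total % num_splits else 0)
        if size == 0:
            break  # more splits requested than keys available
        splits.append(Split(start, start + size))
        start += size
    return splits


def split_query(table: str, columns: List[str], key: str, split: Split,
                predicate: Optional[str] = None) -> str:
    """Build the SQL a single task would send to the database.

    Projection pushdown: only the requested columns are selected.
    Predicate pushdown: the query's filter is evaluated by the database.
    Values are formatted into the text for readability only; a real
    connector would use bind variables instead.
    """
    conditions = [f"{key} >= {split.low}", f"{key} < {split.high}"]
    if predicate:
        conditions.append(f"({predicate})")
    return f"SELECT {', '.join(columns)} FROM {table} WHERE {' AND '.join(conditions)}"


if __name__ == "__main__":
    # Hypothetical table and columns; each printed query would run in its own task.
    for s in range_splits(1, 10, 3):
        print(split_query("SALES", ["CUST_ID", "AMOUNT"], "CUST_ID", s, "AMOUNT > 100"))
```

Splitting by key range is only one possible pattern; a real connector chooses split boundaries from table metadata and caps the number of concurrent tasks and database connections, which this sketch omits.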

Support for OutputFormat allows the results of Hadoop processing to be written back to Oracle Database for further data mining with your usual BI tools.

Oracle Datasource for Apache Hadoop is documented in the Oracle Big Data Connectors documentation.

Oracle Datasource for Apache Hadoop

Data Sheet: Oracle Big Data Connectors

White Paper: Oracle Datasource for Apache Hadoop (OD4H)