This interactive diagram shows the Oracle Big Data Conceptual Architecture.
To step through the slide show, click the Previous and Next buttons. To return to the first slide, click the First button.
The breadcrumb below the buttons tracks your navigation. Click links in the breadcrumb to jump to different slides that you have visited.
To view more detail about an object in a slide, left-click the object. If more detail is available, a different slide will be displayed; otherwise, you will remain on the same slide.
To view documentation for an object in a slide, right-click the object or tab to the object and press the Menu key. If a documentation link is configured for the object, the documentation will be displayed in a new browser tab.
Use your quick navigation keys to move between buttons, frames, checkboxes, headings, list items, and so on.
A dotted line around an object in a slide indicates that the object is optional. A line with alternating dots and dashes around an object indicates that the object can run as a thread or an operating system process. These object types are also described in the notes.
Slide 1
Oracle’s Information Management Conceptual Architecture shows the key components and flows in a compact form. Oracle’s Big Data Cloud services deliver a broad, integrated portfolio of products and engineered systems that help you acquire and organize diverse data sources and analyze them alongside your existing data to find new insights and capitalize on hidden relationships.
Input Events: Input events are data from different sources that are given as input to the Streaming Engine. Data sources are potential sources of the raw data that the business requires to address its information requirements. Sources include both internal and external systems, such as IoT devices, websites, and more. Data from these systems varies in structure and presentation method.
Streaming Engine: The streaming engine takes the data published by the producers, persists it, and reliably delivers it to consumers.
Actionable Events: Actionable events are events that let you take the next-best action for the plan to succeed.
Data Lake: A data lake is a storage repository that holds a vast amount of raw data in its original format, including structured, semi-structured, and unstructured data. With a data lake, you just load in the raw data, as-is, and then when you’re ready to use the data, that’s when you give it shape and structure. That’s called schema-on-read. A data lake allows for ad hoc discovery, organization, and enrichment of unmodelled data before it moves to more refined sets of analytics tools. It typically captures its data in a Hadoop cluster or Object Store.
Actionable Data Sets: An actionable data set is a piece of information that enables you to make an informed decision. Actionable data sets are usually derived by synthesizing vast amounts of data into crisp, concise statements.
Enterprise Data & Reporting: Enterprise data is a large-scale, formalized, and modeled store of business-critical data, typically represented by an Enterprise Data Warehouse. Just as a warehouse stores products ready for consumers, this data, once gathered, cleansed, and formatted for reporting and analysis, constitutes the bulk of traditional structured data warehouses, data marts, and OLAP. This component also includes the BI tools and infrastructure needed for timely and accurate reporting. In this phase, users may be engineers using the data for their systems, analysts, or decision makers.
Actionable Metrics: Actionable metrics translate data into something useful that helps you make decisions about your future plans. A metric generally adds context to data, such as how it compares to history or a benchmark.
Structured Enterprise Data: Structured enterprise data is data that originates from internal and external enterprise systems (for example, ERP and HR systems). It is usually processed and has a defined structure, and it includes data contained in relational databases and spreadsheets.
Execution: The interplay of the components and their assembly into solutions are divided into execution and innovation divisions. The execution division contains those tasks which support and inform daily operations. This arrangement of solutions on either side helps inform system requirements for security, governance, and timeliness.
Innovation: The interplay of the components and their assembly into solutions are divided into execution and innovation. The innovation division contains those tasks which drive new insights back to the business. This arrangement of solutions on either side helps inform system requirements for security, governance, and timeliness.
Discovery Lab: The Discovery Lab is a distinct design pattern within the conceptual architecture. It has a set of data stores, processing engines, and analysis tools that are separate from the everyday processing of data.
Data: Data is given as input to the discovery lab for analysis.
Discovery Output: The Discovery Lab provides deployable code (actionable events), scores, or interesting phenomena found in the data as its discovery output. This may be a fraud prediction, a next-best offer, and so on.
The key technologies of the Big Data Architecture are:
Apache Kafka: Apache Kafka is a publish-subscribe messaging system for exchanging data between processes, applications, and servers. In Kafka, messages are immediately written to the file system and replicated within the cluster to prevent data loss. Kafka is used for real-time streams of data, to collect big data, to do real-time analysis, or both.
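As a minimal sketch of this publish-subscribe model, the following Python snippet uses the open-source kafka-python client to publish a single activity message to a topic; the broker address and topic name are illustrative assumptions.

    from kafka import KafkaProducer

    # Connect to a Kafka broker (address is an example) and publish one message.
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("activity-log", b'user 1001 clicked "play"')
    producer.flush()  # block until the broker has acknowledged the message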
NoSQL: NoSQL databases are highly scalable and flexible database management systems that allow you to store and process unstructured as well as semi-structured data. Many use a key-value data model, and a variety of NoSQL databases are available in the market. NoSQL databases are frequently used to acquire and store big data; for example, they are often used to collect and store social media data.
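The key-value model itself is simple: each record is an opaque key mapped to a value whose structure can vary from record to record. The sketch below illustrates the idea with a plain Python dictionary standing in for a NoSQL table; the keys and fields are invented for illustration.

    import json

    store = {}  # stands in for a key-value table in a NoSQL database
    # Two "rows" with different shapes: NoSQL does not force a fixed schema.
    store["user:1001"] = json.dumps({"name": "Ana", "plan": "premium"})
    store["user:1002"] = json.dumps({"name": "Raj", "tags": ["action", "drama"]})
    profile = json.loads(store["user:1001"])  # look up a value by its key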
Spark Streaming: Spark Streaming is an extension of Spark that supports large-scale stream processing. Spark Streaming supports both Java and Scala, which makes it easy for users to map, filter, join, and reduce streams (among other operations) using functions in the Scala/Java programming language. Spark Streaming reads data from a Kafka topic, processes it, and writes the processed data to a new topic, where it becomes available to users and applications.
For more information, see Spark Streaming.
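Although the description above mentions the Java and Scala APIs, Spark also has a Python API. The following sketch uses Structured Streaming to read a Kafka topic, transform the records, and write them to a new topic; the broker address, topic names, and checkpoint path are assumptions, and the spark-sql-kafka package must be available to the Spark job.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("stream-example").getOrCreate()

    # Read raw events from the input topic.
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "movie_site")
              .load())

    # Kafka values are bytes; cast them to strings before further processing.
    parsed = events.selectExpr("CAST(value AS STRING) AS value")

    # Write the processed stream to a new topic for downstream consumers.
    query = (parsed.writeStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "broker:9092")
             .option("topic", "movie_site_processed")
             .option("checkpointLocation", "/tmp/checkpoints/movie_site")
             .start())
    query.awaitTermination()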
Object Store: The Object Store can store an unlimited amount of unstructured data of any content type, including analytic data and rich content, like images and videos.
Hadoop/HDFS: Hadoop is an open-source, distributed framework for enormous amounts of data. It allows you to store and process big data in a distributed environment across clusters of computers using simple programming models.
Data Warehouse: A data warehouse is a strategic collection of all types of data in support of the decision-making process at all levels of an enterprise. This includes all types of data stores that maintain information for historical and analytical purposes. The historical data stores consolidate large quantities of information in a manner that best maintains historical integrity. Analytical data stores are designed to support analysis by maximizing ease of access and query performance. Technologies and schema designs consist of dimensional data models, OLAP cubes, etc.
Data Visualization: Data visualization describes the presentation of abstract information in graphical form. Data visualization allows us to identify patterns, trends, and correlations that otherwise might go unnoticed in traditional reports, tables, or spreadsheets. You can discover the insights hidden in your data, with rich, interactive visuals using data visualization.
Data Virtualization: Data virtualization technology provides a single point of access to the data by aggregating it from a wide range of data sources. The process of data virtualization involves abstracting, transforming, federating and delivering data from disparate sources.
Notebooks/Analytics Services: Notebooks are used by data scientists for quick exploration tasks. Analytics notebooks enable easy access to both data and computing power. For example, Apache Zeppelin is a web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala, and more.
Data Lake: Oracle Object Storage, Oracle Big Data Cloud, Oracle Big Data
Enterprise Data and Reporting: Oracle Autonomous Data Warehouse, Oracle Exadata Cloud Service, Oracle Database Cloud Service
Data Virtualization: Big Data SQL
Data Visualization: Oracle Analytics Cloud
Discovery Lab: Oracle Advanced Analytics, Oracle R Advanced Analytics for Hadoop, Oracle Big Data Spatial and Graph, and Oracle Analytics Cloud.
Oracle Data Integration Platform Cloud: Oracle Data Integration Platform Cloud, with autonomous capabilities, helps you migrate and extract value from data by bringing together the capabilities of a complete Data Integration, Data Quality, and Data Governance solution in a single, unified, autonomous, cloud-based platform. The Oracle data integration products are Oracle GoldenGate, Oracle Data Integrator, and Oracle Enterprise Data Quality. In our Big Data Architecture, GoldenGate replicates data from Enterprise Data & Reporting to the Data Lake, and Data Integrator moves data from the Data Lake to Enterprise Data & Reporting.
For more information, see Oracle Data Integration Platform Cloud
Oracle Event Hub Cloud Service: Oracle Event Hub Cloud Service leverages Oracle Cloud and Apache Kafka to enable you to work with streaming data. You can quickly create, configure, and manage your Topics in the cloud while Oracle manages the underlying platform. An instance of Oracle Event Hub Cloud Service is called a Topic, and all messages are organized into Topics. Oracle Event Hub Cloud Service enables you to unify and organize this data and make it easily accessible and available for consumption at any time by anyone, from an engineer to an advanced analytic machine.
For more information, see Oracle Event Hub Cloud Service
Oracle Data Hub Cloud Service: Oracle Data Hub Cloud Service enables you to consistently provision and manage NoSQL database clusters such as Apache Cassandra on Oracle Cloud. Currently, you can use the DHCS Console/API/CLI to easily provision the Apache Cassandra database clusters within the Oracle Cloud Infrastructure Classic (OCI-Classic) platform. You can use this cluster as a data store for your big-data, cloud-native applications or to persistently store the messages from the Oracle Event Hub Cloud Service. You should use the DHCS when you want a consistent interface to provision, administer and monitor popular open source database clusters within the Oracle Cloud platform.
For more information, see Oracle Data Hub Cloud Service
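As an illustrative sketch (the contact point, keyspace, and table names are assumptions), a cloud-native application could persist events in an Apache Cassandra cluster provisioned through the Data Hub Cloud Service by using the open-source Python driver.

    from uuid import uuid4
    from cassandra.cluster import Cluster

    cluster = Cluster(["10.0.0.10"])   # cluster node address is an example
    session = cluster.connect()
    session.execute(
        "CREATE KEYSPACE IF NOT EXISTS movieplex "
        "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}")
    session.set_keyspace("movieplex")
    session.execute("CREATE TABLE IF NOT EXISTS events (id uuid PRIMARY KEY, payload text)")
    session.execute("INSERT INTO events (id, payload) VALUES (%s, %s)",
                    (uuid4(), '{"action": "play", "movie_id": 42}'))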
Oracle NoSQL Database: Oracle has its own NoSQL solution, which delivers fast performance and flexibility by supporting a wide variety of data types and multiple data access methods. It offers key-value and table data models. Oracle NoSQL Database stores unstructured, semi-structured, or structured data and is accessible through Java, C/C++, JavaScript, Python, and Node.js APIs, as well as REST APIs.
Oracle Stream Analytics: Oracle Stream Analytics allows users to process and analyze large scale real-time information by using sophisticated correlation patterns, enrichment, and machine learning. It offers real-time actionable business insight on streaming data and automates action to drive today’s agile businesses. From its interactive designer, users can explore real-time data through live charts, maps, visualizations, and graphically build streaming pipelines without any hand coding. These pipelines execute in a scalable and highly available clustered Big Data environment utilizing Spark integrated with Oracle’s Continuous Query Engine to address critical real-time use cases of modern enterprises.
For more information, see Oracle Stream Analytics
Activity data is the lifeblood of Internet-based applications. On social networking sites such as Facebook or LinkedIn, you can see who has viewed your profile. Even the ads show up according to your browsing history and recent activities. Therefore, log data processing is critical for Internet companies.
Apache Kafka is a technology that is designed for real-time collecting and delivering of log data. It collects and delivers high volumes of activity log data with low latency by using a messaging system. Unlike traditional offline processing of log data, in which data is processed and stored in a data warehouse or a Hadoop environment for batch processing jobs, reporting, and ad hoc analysis, Kafka is designed for real-time processing of log data. In addition, Kafka enables the transfer of log data between primary big data processing engines, including RDBMSs, Hadoop, and NoSQL.
Apache Kafka runs as a cluster on one or more servers. With Kafka, a stream of messages of a particular type is defined as a topic. A producer can publish messages to a topic, and the published messages are then stored in a set of servers called brokers in a Kafka cluster. Producers can be web servers or mobile apps, and the messages they send to Apache Kafka are typically logging information. These logs include events that indicate actions; for example, a certain event might record the link that a user clicked and when the link was clicked.
Consumers are various processes that want to find out about the events that are occurring in real time. They may want to generate analytics, monitor for unusual activity, generate personalized recommendations for users, and so on. Consumers may subscribe to one or more topics from the brokers, and consume the subscribed messages by pulling data from the brokers. After pulling a message, consumers perform message aggregation or other processing of these streams. In addition to real-time processors, consumers may also be Hadoop and data warehousing stores that load virtually all feeds for batch-oriented processing.
For more information, see Apache Kafka
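A consumer in this model can be sketched in a few lines of Python (again with the kafka-python client; the topic, broker, and group names are illustrative). It subscribes to a topic, pulls messages from the brokers, and processes each one.

    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "activity-log",
        bootstrap_servers="localhost:9092",
        group_id="analytics",              # consumers in a group share the work
        auto_offset_reset="earliest",      # start from the oldest retained message
    )
    for message in consumer:
        # message.value holds the bytes published by the producer
        print(message.topic, message.offset, message.value)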
In this use case, we demonstrate the Oracle MoviePlex application. The use case is based on the Big Data Conceptual Architecture.
Oracle MoviePlex is an online movie streaming company. With its web-based application, you can browse a catalog of movies, watch trailers, rent movies, review and rank movies, and get a personalized experience and recommendations. Like many other online stores, MoviePlex needed a cost-effective approach to tackle its “big data” challenges, and it recently implemented Oracle’s Big Data Management System to better manage its business, identify key opportunities, and enhance customer satisfaction.
Users accessing the MoviePlex application consume a massive amount of bandwidth, which is a potential big data challenge. By combining Movie Site activity data with Network data, you can answer questions like:
How do you monitor Network Traffic and find out who is consuming the most bandwidth?
How does the current revenue stream compare to the benchmark?
Which country or city is experiencing significant network issues?
Suppose you are watching a movie and suddenly face frequent buffering, loading problems, or network errors. What would you do? Probably log off the site. This is a performance challenge for the MoviePlex application.
In our use case, current Network activity and Movie Site activity are streamed using Kafka, which gives stream consumers the latest view of MoviePlex application usage and allocation. Network and Movie Site activity events (clicking the movie list, watching, logging in, and so on) are published to the streaming engine, that is, Kafka. Specifically, events are published to a specific topic in Kafka; for example, web site activity goes to a topic called Movie Site. We have one topic named Movie Site and one named Network. Spark Streaming then reads data from the Kafka topics, processes it, and writes the processed data to a new topic, where it becomes available to users and applications for further analysis. For example, a targeted special offer can be made during movie playback.
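A sketch of the producing side of this pipeline is shown below: two JSON events, one for the movie site topic and one for the network topic. The broker address, topic spellings, and event fields are assumptions made for illustration.

    import json, time
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="broker:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    # One movie-site activity event and one network measurement event.
    producer.send("movie_site", {"cust_id": 1001, "action": "play", "movie_id": 42, "ts": time.time()})
    producer.send("network", {"cust_id": 1001, "country": "FR", "latency_ms": 180, "ts": time.time()})
    producer.flush()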
A data lake stores structured and unstructured data and provides a method for organizing large volumes of highly diverse data from varied sources. A data lake tends to ingest data very quickly and prepare it later, on the fly, as you access it. The data lake can be built on many platforms, for example:
Oracle Object Storage: Oracle Cloud Infrastructure Object Storage is an Infrastructure as a Service (IaaS) product, which provides an object storage solution for files and unstructured data. You can use Oracle Cloud Infrastructure Object Storage to back up content to an off-site location, programmatically store and retrieve content, and share content with peers. Oracle Object Storage stores data as objects within a flat hierarchy of containers. With Object Storage, you can safely and securely store or retrieve data directly from the internet or from within the cloud platform. These objects could be image files, logs, HTML files, or any self-contained blob of bytes. Object storage allows data to be stored across multiple regions and scales to petabytes and beyond.
For more information, see Oracle Object Storage
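For illustration, the snippet below uses the OCI Python SDK to store and retrieve a raw activity file as an object; the bucket name and object path are assumptions, and the SDK reads credentials from the standard ~/.oci/config file.

    import oci

    config = oci.config.from_file()                       # ~/.oci/config profile
    client = oci.object_storage.ObjectStorageClient(config)
    namespace = client.get_namespace().data

    # Upload a day's worth of raw movie-site activity into the data lake bucket.
    with open("movie_site_2024-01-01.json", "rb") as f:
        client.put_object(namespace, "movieplex-datalake",
                          "raw/movie_site/2024-01-01.json", f)

    # Read it back later, for example before copying it into HDFS for analysis.
    obj = client.get_object(namespace, "movieplex-datalake",
                            "raw/movie_site/2024-01-01.json")
    raw_bytes = obj.data.content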
Oracle Big Data Cloud Service: Oracle Big Data Cloud Service gives you access to the resources of a preconfigured Oracle Big Data environment, including a complete installation of the Cloudera Distribution Including Apache Hadoop (CDH) and Apache Spark. It is an efficient long term store for Hadoop/HDFS. Use Oracle Big Data Cloud Service to capture and analyze the massive volumes of data generated by social media feeds, e-mail, web logs, photographs, smart meters, sensors, and similar devices.
For more information, see Oracle Big Data Cloud Service
Oracle Big Data Cloud: Oracle Big Data Cloud combines open source technologies such as Apache Spark and Apache Hadoop with unique innovations from Oracle to deliver a complete Big Data platform for running and managing Big Data Analytics applications.
For more information, see Oracle Big Data Cloud
Apache Hadoop is a fundamental building block in capturing and processing big data. At a high level, Apache Hadoop is designed to parallelize data processing across computing nodes (servers) to speed up computations and hide latency and complexity. Apache Hadoop is a batch and interactive data-processing system for enormous amounts of data. It is designed to process huge amounts of structured and unstructured data (terabytes to petabytes) and is implemented on racks of commodity servers as a Hadoop cluster. Servers can be added or removed from the cluster dynamically because Apache Hadoop is designed to be “self-healing”: it detects changes, including failures, adjusts to them, and continues to operate without interruption.
Apache Hadoop contains the following main core components:
HDFS is a distributed file system for storing information and it sits on top of the operating system that you are using.
Yet Another Resource Negotiator (YARN) is an extensible framework for job scheduling and cluster resource management.
MapReduce is a parallel processing framework that operates on local data, whenever possible. It abstracts the complexity of parallel processing. This enables developers to focus more on the business logic rather than on the processing framework.
Spark is another processing framework, and it is more widely used than MapReduce today. Spark is optimized to work with data in memory rather than on disk, as illustrated in the sketch below.
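To make these components concrete, here is a minimal batch sketch in PySpark that reads raw activity from HDFS and aggregates it; the paths and column names are assumptions for illustration.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("movieplex-batch").getOrCreate()

    # Read raw JSON activity stored on HDFS by the streaming pipeline.
    activity = spark.read.json("hdfs:///data/movie_site/")

    # Count events per country and day, then write the summary back to HDFS.
    daily = activity.groupBy("country", "event_date").agg(F.count("*").alias("events"))
    daily.write.mode("overwrite").parquet("hdfs:///data/movie_site_daily/")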
Object storage is the persistent storage repository for the data in your data lake. Combining object storage in the cloud with Spark is more flexible than a typical Hadoop/MapReduce configuration. If you need more compute, you can spin up a new Spark cluster and leave your storage alone. If you’ve just acquired many terabytes of new data, then just expand your object storage. In the cloud, compute and storage aren’t just elastic. They’re independently elastic. And that’s good, because your needs for compute and storage are also independently elastic.
Every click that takes place on the MoviePlex site is streamed into the persistent stores (HDFS or the Object Store) and then analyzed. It is easy to move data between the Hadoop cluster and the Object Store.
Event Hub (Kafka) can save data into long-term, persistent storage by using the OCI sink connector, which allows you to export data from a Kafka topic into an Oracle Cloud object storage instance. The Object Store holds all historical movie site activity and network usage data. In our use case, if we want to compare the current network stream to the historical benchmark, we can use the data stored in the Object Store. To perform analytics, the historical data is copied into HDFS and analyzed.
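Sink connectors of this kind are typically registered through the Kafka Connect REST API. The sketch below shows the general shape of such a registration; the connector class and properties are placeholders rather than the actual OCI sink connector configuration, so consult the connector’s own documentation for the real property names.

    import requests

    connector = {
        "name": "movieplex-objectstore-sink",
        "config": {
            # Placeholder class name; substitute the object-storage sink
            # connector class that you actually deploy.
            "connector.class": "<object-storage-sink-connector-class>",
            "topics": "movie_site,network",
            "flush.size": "1000",
        },
    }
    # Kafka Connect exposes a REST endpoint (default port 8083) for managing connectors.
    resp = requests.post("http://connect-host:8083/connectors", json=connector)
    resp.raise_for_status()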
Data residing in operational systems such as CRM, ERP, and warehouse management systems is typically very well structured, and these systems are designed to consistently store operational data one transaction at a time. However, this data is not always presented meaningfully to the end-user query tool.
Analytical reporting, in contrast, requires a database design that business users find directly usable. To achieve this, different database design techniques are required (for example, dimensional and star schemas with highly denormalized dimension tables).
In our Big Data Conceptual Architecture, Enterprise Data and Reporting is treated as the Data Warehouse or Data Mart solution that addresses these data design and extraction issues.
Dimensional data models are generally used for structured data analysis. They support most of the operational and performance reporting requirements of the business.
A data warehouse is a strategic collection of all types of data in support of the decision-making process at all levels of an enterprise. This includes all types of data stores that maintain information for historical and analytical purposes. The historical data stores consolidate large quantities of information in a manner that best maintains historical integrity.
Analytical data stores are designed to support analysis by maximizing ease of access and query performance. Technologies and schema designs consist of dimensional data models, OLAP cubes, etc.
The Data Warehouse is populated with many different types of data from a variety of sources, including structured data sources such as operational data, as well as system-generated data and some forms of content. Data can be ingested into the Data Warehouse using a combination of batch and real-time methods. Traditional extract, transform, and load (ETL) processes, or the extract, load, and transform (ELT) variant, are frequently used for batch data transfer.
Typically, the Data Warehouse has a Foundation Data layer and an Access & Performance layer. The Foundation Data layer is a canonical, business-neutral representation of the data (in third normal form, 3NF) and focuses on historical data management at the atomic level. The Access & Performance layer, also known as the analytical layer, holds business- or function-specific models, snapshots, aggregations, and summaries.
Oracle Cloud Products:
Oracle Autonomous Data Warehouse: Autonomous Data Warehouse Cloud (ADWC) is a fully managed database tuned and optimized for data warehouse workloads, with the market-leading performance of Oracle Database. It is Oracle’s solution for data warehousing and BI in the cloud. ADWC is compatible with all business analytics tools that support Oracle Database, and it uses applied machine learning to self-tune and automatically optimize performance while the database is running.
It is characterized by many exciting features:
ADWC is very easy to set up, manage, and use because it is built on a fully automated database. All the major tasks like provisioning, patching, upgrades, taking backups, and performance tuning are completely automated.
ADWC uses the Exadata machine, which is very fast, scalable, and reliable. It also takes advantage of Oracle Database capabilities such as parallelism, columnar processing, and compression.
ADWC allows you to scale compute and storage without downtime, offering high elasticity. You pay only for the resources that you consume.
Oracle Database Cloud Service: Oracle Database Cloud Service provides you the ability to deploy Oracle databases in the Cloud, with each database deployment containing a single Oracle database. You have full access to the features and operations available with Oracle Database, but with Oracle providing the computing power, physical storage and (optionally) tooling to simplify routine database maintenance and management operations.
For more information, see Oracle Database Cloud Service
Oracle Exadata Cloud Service: Exadata Cloud Service is offered on Oracle Cloud, using state-of-the-art Oracle-managed data centers. You can also choose Exadata Cloud at Customer which provides Exadata Cloud Service hosted in your data center.
For more information, see Oracle Exadata Cloud Service
Data from the Lake flows into the Enterprise Data and Reporting layer, where it is enriched, cleansed, and so on, making it a trusted source. Oracle Database holds both dimensional and fact data, including revenue information and information about sales, transactions, movies, users, and more. Suppose we want to find out whether sales revenue in a particular country, say the UK or France, has been impacted by significant network issues. We can then analyze the sales data and the streaming network data simultaneously.
Due to the expanded adoption of big data stores—such as Kafka, Hadoop, and NoSQL—Oracle customers are experiencing greater challenges integrating disparate data formats within their information management systems. In addition to various big data sources, programming environments are also expanding, including technologies such as Representational State Transfer (REST), Node.js, and Python.
This evolution raises important questions for Oracle customers who want to leverage valuable big data information. Some include:
How do you integrate big data with your Data Warehouse?
How do you analyze all data together?
Big Data SQL is a data virtualization technology that allows users and applications to use Oracle’s rich SQL language across data stored in Apache Kafka, Oracle Database, Hadoop and NoSQL stores. One query can combine data from all these sources. Oracle Information Management System unifies the data platform by providing a common query language, management platform, and security framework across Kafka, Hadoop, NoSQL, and Oracle Database.
In our IM architecture, Kafka contains stream data and can answer the question “What is going on right now?”, whereas the Database stores operational data and Hadoop stores historical data, so those two sources can answer the question “How did it use to be?”. Oracle Big Data SQL is a key component of the platform: it allows you to run SQL over these three sources and correlate real-time events with history.
For more information, see Oracle Big Data SQL
One of the key challenges is querying across all these sources so that business users have a full view of the data. We now have data in three different sources:
Kafka contains the streaming Network data and Movie site activity.
HDFS contains the web logs, or user behavior. It stores all the Network and Movie Site offline data.
Sales transactions and dimensional data for the MoviePlex app come from Oracle Database 12c.
In our use case, Big Data SQL easily blends real-time streams with history, benchmarks, and context. It helps us answer questions like:
Are we running at peak performance?
What is the opportunity cost of our current network latency?
How do you correlate between these stores?
How does the current network stream compare to the benchmark? And did this lead to sales (database)?
Big Data SQL combines data in flight with data in HDFS and Oracle Database. You can use Big Data SQL to run queries that join data across Oracle Database (sales data) and HDFS (movie site activity).
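As a hedged illustration of such a query (the table and column names are invented, and movie_site_activity is assumed to be an external table that Big Data SQL exposes over the HDFS data), an application could join activity derived from the stream with sales held in Oracle Database.

    import oracledb

    # Connection details are placeholders for the MoviePlex warehouse database.
    conn = oracledb.connect(user="moviedemo", password="<password>", dsn="dbhost/orclpdb1")

    sql = """
        SELECT s.country, SUM(s.revenue) AS revenue, COUNT(a.activity_id) AS clicks
          FROM sales s
          JOIN movie_site_activity a ON a.cust_id = s.cust_id
         GROUP BY s.country
         ORDER BY revenue DESC
    """
    with conn.cursor() as cur:
        for country, revenue, clicks in cur.execute(sql):
            print(country, revenue, clicks)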
In our use case, you use Oracle Analytics Cloud (OAC) to analyze the movie site activity. Here, OAC accesses MoviePlex application data by using Big Data SQL. You can use OAC to create a project and analyze the network errors occurring in different countries, or to visualize graphs that depict how the current revenue stream compares to the benchmark. Which country is experiencing network issues and a corresponding sales drop-off? Is revenue impacted for France or the UK?
Oracle Products:
Oracle Analytics Cloud: With Oracle Analytics Cloud, you can take data from any source, and explore and collaborate with real-time data. You can interact with your personal data, ingest and harmonize data sources, collate and manage disparate inputs, and handle data with coherence and consistency during organizational sharing. As you visually research and discover, you can review and visualize both personal and corporate data, and gain insights at key stages of the iterative information cycle.
For more information, see Oracle Analytics Cloud
Oracle BI Cloud Service: Oracle BI Cloud Service is one of the Platform as a Service (PaaS) services provided by Oracle Cloud. The service offers many self-service capabilities, such as creating reports for your line of business. You can use Oracle BI Cloud Service to easily and efficiently explore data, add your own data from external sources, and create and share analyses and dashboards that enable you to solve business problems.
For more information, see Oracle BI Cloud Service
Oracle Data Visualization Cloud Service: Oracle Data Visualization Cloud Service makes easy yet powerful visual analytics accessible to everyone.
For more information, see Oracle Data Visualization Cloud Service
The Discovery Lab is itself a distinct design pattern within the conceptual architecture. It includes a set of processing engines, and analysis tools that are separate from the everyday processing of data. This component facilitates the discovery of new knowledge of value to the business.
It is characterized by the following:
Specific focus on identifying commercial value for exploitation.
Small group of highly skilled individuals (aka Data Scientists, Data Mining practitioners, and so on).
Iterative development approach—data oriented and not development oriented.
Wide range of tools and techniques applied.
Discovery Lab outputs that may include new knowledge, data mining models or parameters, scored data, and others.
There are many tools for analytics and data discovery, for example: Apache Zeppelin, Jupyter, R, etc.
Apache Zeppelin is a web-based notebook used for data analytics. It provides a number of useful data-discovery features such as data ingestion, data discovery, data visualization and collaboration. You can construct striking data-driven, interactive and collaborative documents with SQL, Scala and more.
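In a Zeppelin (or Jupyter) notebook with a PySpark interpreter, a quick exploration of the data lake might look like the following sketch; the path and column names are assumptions carried over from the earlier examples.

    # A pre-configured SparkSession named `spark` is assumed, as notebooks usually provide.
    df = spark.read.parquet("hdfs:///data/movie_site_daily/")
    df.printSchema()

    # Top ten countries by total activity, rendered as a quick table in the notebook.
    df.groupBy("country").sum("events") \
      .orderBy("sum(events)", ascending=False) \
      .show(10)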
Oracle Products:
Oracle Advanced Analytics: Oracle Advanced Analytics allows data and business analysts to extract knowledge, discover new insights, and make predictions by working directly with large data volumes in Oracle Database. With Oracle Advanced Analytics, you can discover patterns hidden in massive data volumes, discover new insights, make predictions, and immediately transform raw data into actionable insights.
For more information, see Oracle Advanced Analytics
Oracle R Advanced Analytics for Hadoop: Oracle R Advanced Analytics for Hadoop is a collection of R packages that provide interfaces to work with Apache Hive tables, HDFS, the local R environment, and Oracle Database tables. It provides predictive analytic techniques, written in R or Java as Hadoop MapReduce jobs, that can be applied to data in HDFS files.
For more information, see Oracle R Advanced Analytics for Hadoop
Oracle Big Data Spatial and Graph: Oracle Big Data Spatial and Graph delivers advanced spatial and graph analytic capabilities to supported Apache Hadoop and NoSQL Database Big Data platforms. The spatial features include support for data enrichment of location information, spatial filtering and categorization based on distance and location-based analysis, and spatial data processing for vector and raster processing of digital map, sensor, satellite and aerial imagery values, and APIs for map visualization.
For more information, see Oracle Big Data Spatial and Graph
Oracle Analytics Cloud: With Oracle Analytics Cloud, you can take data from any source, and explore and collaborate with real-time data. You can interact with your personal data, ingest and harmonize data sources, collate and manage disparate inputs, and handle data with coherence and consistency during organizational sharing. As you visually research and discover, you can review and visualize both personal and corporate data, and gain insights at key stages of the iterative information cycle.
For more information, see Oracle Analytics Cloud