OCI Data Lake is a fully managed service for storing and governing data in a data lake. It gives users centralized management of data storage and security in the lake, and makes it easy to ingest and analyze data. Users and applications can seamlessly share data within the organization and apply fine-grained access control to objects in the data lake. Integrated engines that consume data from the lake honor these predefined access control rules.
OCI Data Lake enables customers to store and govern structured, semi-structured, and unstructured data. It is a single pane of glass for all data management needs. With it, users can build a data lake with fine-grained security in just a few minutes. OCI Data Lake is well integrated with other OCI services, facilitating easy ingestion, processing, and analysis of data in the data lake.
OCI Data Lake is integrated with OCI Data Integration for easy, no-code ingestion of data into the lake. When an OCI Data Lake is created, the entities in the data lake are automatically harvested into OCI Data Catalog so that data stewards can discover data. OCI Data Lake works seamlessly with OCI Data Flow, Oracle Big Data, and OCI Data Science notebooks for data processing and running analytics workloads. Users can query data in the lake using Autonomous Data Warehouse.
You have two options: store the data in a file model, by creating external or managed mounts, or store data in a relational model, by creating tables in the data lake.
An external mount is a reference to an Oracle Cloud Infrastructure (OCI) Object Storage location. The OCI Object Storage location for external mounts is not managed by the data lake. External mounts are used to provide fine-grained access control to data already existing in an OCI Object Storage location.
A managed mount is a reference to an OCI Object Storage location that is managed by the data lake service. Managed mounts provide enhanced security for the data files so that only permitted data lake users can access the data stored in the managed mount. The data in the managed mount is stored in the data lake.
An external table defines a structure for data that is stored in an OCI Object Storage location managed by you or in a mount within the data lake. The mount can be an external mount or a managed mount. When you delete an external table, only the table definition is deleted. The data referenced by the external table is not deleted.
A managed table defines a structure for data that is stored within the data lake and can only be accessed by OCI Data Lake users. When you delete a managed table, the table definition and the table data are deleted.
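The delete semantics above can be sketched with plain Python dictionaries. This is a conceptual model only; the names (`drop_table`, the storage dictionaries, the table kinds) are hypothetical and are not the OCI Data Lake API.

```python
# Conceptual sketch of external vs. managed table delete semantics.
# All names here are illustrative, not OCI Data Lake API calls.

object_storage = {"sales/2023.csv": b"raw bytes"}      # customer-managed location
lake_storage = {"hr/employees.parquet": b"raw bytes"}  # service-managed location

tables = {
    "sales_ext": {"kind": "external", "data_key": "sales/2023.csv"},
    "employees_mgd": {"kind": "managed", "data_key": "hr/employees.parquet"},
}

def drop_table(name: str) -> None:
    """Dropping an external table removes only the definition;
    dropping a managed table also removes the underlying data."""
    table = tables.pop(name)
    if table["kind"] == "managed":
        lake_storage.pop(table["data_key"])

drop_table("sales_ext")
assert "sales/2023.csv" in object_storage          # external data survives
drop_table("employees_mgd")
assert "hr/employees.parquet" not in lake_storage  # managed data is gone
```

The asymmetry is the key design point: the service owns the lifecycle of managed data, while externally referenced data always outlives its table definition.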
Data engineers can write ETL processes using the OCI Data Integration service in a no-code fashion. Data engineers can also use SDKs and APIs to ingest data into the lake, or create a Spark application in OCI Data Flow for data ingestion.
Yes, OCI Data Lake supports Terraform for creating OCI Data Lake resources.
OCI Data Flow streaming jobs can write data into the data lake.
Data stewards can discover data in the lake using OCI Data Catalog, which is attached/provisioned during the data lake creation process. The catalog is refreshed at regular intervals, giving data stewards the most updated view of their data lake.
No; when a data lake is provisioned, a catalog is created and managed by the service.
OCI Data Lake provides unified access control, which allows administrators to define access control policies for all data lake objects. From the console, administrators have a consolidated view to see who has access to data lake objects.
OCI Data Lake has two-layered security. The lake itself can only be accessed if the user has been given access through Oracle IAM policy. All objects in the data lake are governed by policies defined in the lake.
Yes, data lake administrators can create roles and grant permissions to roles, users, resource principals, groups, and dynamic groups.
Yes, users can assign read, write, or administrator permissions to roles, users, resource principals, groups, and dynamic groups.
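The role-and-grant model described above can be sketched in a few lines of Python. This is a minimal illustration of the concept under stated assumptions; the names (`Role`, `grant`, `allowed`) are hypothetical, not OCI Data Lake APIs.

```python
# Minimal sketch of a role/grant model like the one described above.
# Names and structure are illustrative, not the OCI Data Lake API.

from dataclasses import dataclass, field

@dataclass
class Role:
    name: str
    # Maps a data lake object name to the set of permissions granted on it.
    grants: dict = field(default_factory=dict)

def grant(role: Role, obj: str, permission: str) -> None:
    """Grant a permission (e.g. 'read', 'write', 'admin') on an object."""
    role.grants.setdefault(obj, set()).add(permission)

def allowed(role: Role, obj: str, permission: str) -> bool:
    """Check whether the role holds the permission on the object."""
    return permission in role.grants.get(obj, set())

analyst = Role("data_analyst")
grant(analyst, "sales_db.orders", "read")

assert allowed(analyst, "sales_db.orders", "read")
assert not allowed(analyst, "sales_db.orders", "write")
```

In the actual service, an administrator defines such grants in the lake, and the integrated engines enforce them at query time.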
No, OCI Data Lake does not support access control on files.
Yes, OCI Data Lake enables administrators to create column-level access control policies.
Yes, OCI Data Lake enables administrators to create row-level access control policies based on columns values.
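Column- and row-level policies can be sketched together: a column policy restricts which fields are visible, and a row policy filters rows based on a column's value. The sketch below models this in plain Python over rows represented as dictionaries; in OCI Data Lake the integrated engines enforce the policies, not client code, and the names here are hypothetical.

```python
# Illustrative sketch of column- and row-level access policies.
# In OCI Data Lake the query engines enforce these rules server-side.

rows = [
    {"name": "Ana", "region": "EU", "salary": 50000},
    {"name": "Bo", "region": "US", "salary": 60000},
]

visible_columns = {"name", "region"}  # column-level policy: hide 'salary'

def row_predicate(row: dict) -> bool:
    """Row-level policy based on a column value: EU rows only."""
    return row["region"] == "EU"

def apply_policies(rows: list) -> list:
    """Return only permitted rows, projected onto permitted columns."""
    return [
        {k: v for k, v in row.items() if k in visible_columns}
        for row in rows
        if row_predicate(row)
    ]

result = apply_policies(rows)
print(result)  # only the EU row, without the salary column
```

Running `apply_policies(rows)` yields `[{"name": "Ana", "region": "EU"}]`: the US row is filtered out by the row policy, and the salary column is removed by the column policy.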
Data engineers can process data in Spark applications using OCI Data Flow or in Big Data Service. Data scientists and data analysts can do exploratory analysis or create ML models on data in the data lake using OCI Data Science notebooks.
No. Instead, OCI Data Lake supports Spark APIs for easy reading and writing of data in various file formats.
Data analysts can use Spark SQL for DDL, DML, and queries.
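A typical DDL/DML/query flow follows standard SQL. As a self-contained, runnable stand-in, the sketch below uses Python's built-in sqlite3 instead of a Spark session (an assumption made purely so the example runs without a cluster); with Spark, each statement would be submitted via a Spark SQL session instead.

```python
import sqlite3

# DDL/DML/query flow of the kind a data analyst would run through Spark SQL,
# demonstrated with sqlite3 only so the sketch is self-contained and runnable.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")         # DDL
conn.execute("INSERT INTO orders VALUES (1, 20.0), (2, 5.5)")         # DML
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]  # query
print(total)  # 25.5
```

The SQL statements themselves carry over to Spark SQL largely unchanged; only the execution entry point differs.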
Yes, OCI Data Lake is integrated with the OCI Data Flow SQL endpoint, which exposes a JDBC/ODBC driver that allows data in the data lake to be visualized using business intelligence tools that support JDBC/ODBC drivers. Users can also use the driver to connect to the data lake from any SQL tool that supports JDBC/ODBC drivers.