Data Lake service FAQ

General

What is OCI Data Lake?

OCI Data Lake is a fully managed data lake service for better governance of data. It gives users centralized management of the storage and security of their data in the data lake, and makes it easy to ingest and analyze that data. Users and applications can share data within the organization and apply fine-grained access control to objects in the data lake. The integrated engines that consume data in the data lake honor these predefined access control rules.

Why OCI Data Lake?

OCI Data Lake enables customers to store and govern structured, semi-structured, and unstructured data. It is a single pane of glass for all data management needs. With it, users can build a data lake with fine-grained security in just a few minutes. OCI Data Lake is well integrated with other OCI services, facilitating easy ingestion, processing, and analysis of data in the data lake.

What other services are integrated with OCI Data Lake?

OCI Data Lake is integrated with OCI Data Integration for easy, no-code ingestion of data into the lake. When an OCI Data Lake is created, the entities in the data lake are automatically harvested into OCI Data Catalog so that data stewards can discover data. OCI Data Lake works with OCI Data Flow, Oracle Big Data Service, and OCI Data Science notebooks for data processing and running analytics workloads. Users can query data in the lake using Autonomous Data Warehouse.

Storage

What are the storage options in OCI Data Lake?

You have two options: store the data in a file model, by creating external or managed mounts, or store data in a relational model, by creating tables in the data lake.

What is the difference between external and managed mounts?

An external mount is a reference to an Oracle Cloud Infrastructure (OCI) Object Storage location. The OCI Object Storage location for external mounts is not managed by the data lake. External mounts are used to provide fine-grained access control to data that already exists in an OCI Object Storage location.

A managed mount is a reference to an OCI Object Storage location that is managed by the data lake service. Managed mounts provide enhanced security for the data files so that only permitted data lake users can access the data stored in the managed mount. The data in the managed mount is stored in the data lake.

What is the difference between external and managed tables?

An external table defines a structure for data that is stored in an OCI Object Storage location managed by you or in a mount within the data lake. The mount can be an external mount or a managed mount. When you delete an external table, only the table definition is deleted. The data referenced by the external table is not deleted.

A managed table defines a structure for data that is stored within the data lake and can only be accessed by OCI Data Lake users. When you delete a managed table, both the table definition and the table data are deleted.
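
The delete behavior of the two table types can be illustrated with a small in-memory model. This is only a sketch: the ToyLake class, the table names, and the storage locations (including the lake:// prefix) are hypothetical and are not part of the OCI Data Lake API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLake:
    """Toy in-memory model of table deletion; not the OCI Data Lake API."""
    # storage location -> data files kept at that location
    storage: dict[str, list[str]] = field(default_factory=dict)
    # table name -> (kind, storage location)
    tables: dict[str, tuple[str, str]] = field(default_factory=dict)

    def create_table(self, name: str, kind: str, location: str) -> None:
        if kind not in ("external", "managed"):
            raise ValueError(f"unknown table kind: {kind}")
        self.storage.setdefault(location, [])
        self.tables[name] = (kind, location)

    def drop_table(self, name: str) -> None:
        kind, location = self.tables.pop(name)
        if kind == "managed":
            # Managed table: the data is deleted together with the definition.
            self.storage.pop(location, None)
        # External table: only the definition is removed; the data stays.


lake = ToyLake()
# Data that already exists in Object Storage (hypothetical location).
lake.storage["oci://bucket@namespace/sales/"] = ["part-0000.parquet"]
lake.create_table("sales_ext", "external", "oci://bucket@namespace/sales/")
lake.create_table("sales_managed", "managed", "lake://managed/sales_managed/")
lake.storage["lake://managed/sales_managed/"].append("part-0000.parquet")

lake.drop_table("sales_ext")
lake.drop_table("sales_managed")
```

After both drops, the file under the external location is still present, while the managed location and its data are gone.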


Ingestion

How can I build my data lake using OCI Data Lake service?

Data engineers can build ETL processes without writing code by using the OCI Data Integration service. They can also use SDKs and APIs to ingest data into the lake, or create a Spark application in OCI Data Flow for data ingestion.

Can I create my data lake using Terraform?

Yes, OCI Data Lake supports Terraform for creating OCI Data Lake resources.

Does OCI Data Lake ingest streaming data?

OCI Data Flow streaming jobs can write data into the data lake.


Data discovery

How will data stewards discover data in the lake?

Data stewards can discover data in the lake using OCI Data Catalog, which is provisioned and attached during the data lake creation process. The catalog is refreshed at regular intervals, giving data stewards an up-to-date view of their data lake.

Can I use my existing data catalog or Hive Metastore with OCI Data Lake?

No. When a data lake is provisioned, a catalog is created and managed by the service.

Security

What is unified access control?

OCI Data Lake provides unified access control, which allows administrators to define access control policies for all data lake objects. The console gives administrators a consolidated view of who has access to data lake objects.

How does OCI Data Lake secure my data in the data lake?

OCI Data Lake has two layers of security. First, the lake itself can be accessed only by users who have been granted access through an Oracle IAM policy. Second, all objects in the data lake are governed by policies defined in the lake.
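
The two layers can be pictured as two checks that must both pass. The sketch below is a simplified illustration, not how the service is implemented: the can_access function, the user names, and the policy structures are all hypothetical.

```python
def can_access(user: str, obj: str, action: str,
               iam_allowed: set[str],
               lake_policies: dict[tuple[str, str], set[str]]) -> bool:
    """Return True only when both layers allow the request."""
    # Layer 1: an IAM policy must let the user reach the lake at all.
    if user not in iam_allowed:
        return False
    # Layer 2: a policy defined inside the lake must allow the action on the object.
    return action in lake_policies.get((user, obj), set())


# Hypothetical users: alice and bob may reach the lake, carol may not.
iam_allowed = {"alice", "bob"}
# Only alice has a lake-level policy on the table.
lake_policies = {("alice", "sales_table"): {"read", "write"}}

print(can_access("alice", "sales_table", "read", iam_allowed, lake_policies))  # True
print(can_access("bob", "sales_table", "read", iam_allowed, lake_policies))    # False
print(can_access("carol", "sales_table", "read", iam_allowed, lake_policies))  # False
```

Here bob passes the first layer but has no policy on the object, and carol is stopped at the first layer.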

Can I create roles and grant permissions to roles in OCI Data Lake?

Yes, data lake administrators can create roles and grant permissions to roles, users, resource principals, groups, and dynamic groups.

Governance

Can I secure my data in external/managed mounts?

Yes, users can assign read, write, or administrator permissions to roles, users, resource principals, groups, and dynamic groups.
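
As a rough illustration of how permissions granted to a role reach the users who hold it, consider the toy grant table below. The MountGrants class and the user, role, and mount names are hypothetical, and the assumption that a user's effective permissions are the union of direct and role grants is ours, not something stated in this FAQ.

```python
from collections import defaultdict

PERMISSIONS = {"read", "write", "administrator"}


class MountGrants:
    """Toy grant table for mounts; not the OCI Data Lake API."""

    def __init__(self) -> None:
        self.roles_of = defaultdict(set)  # user -> role names
        self.grants = defaultdict(set)    # (grantee, mount) -> permissions

    def add_to_role(self, user: str, role: str) -> None:
        self.roles_of[user].add(role)

    def grant(self, grantee: str, mount: str, permission: str) -> None:
        if permission not in PERMISSIONS:
            raise ValueError(f"unknown permission: {permission}")
        self.grants[(grantee, mount)].add(permission)

    def effective(self, user: str, mount: str) -> set:
        # Assumption: direct grants plus everything granted to the user's roles.
        perms = set(self.grants.get((user, mount), set()))
        for role in self.roles_of.get(user, set()):
            perms |= self.grants.get((role, mount), set())
        return perms


acl = MountGrants()
acl.add_to_role("alice", "data_engineer")
acl.grant("data_engineer", "raw_mount", "read")
acl.grant("data_engineer", "raw_mount", "write")
acl.grant("bob", "raw_mount", "read")
```

With these grants, alice can read and write through her role, bob can only read, and a user with no grants has no permissions on the mount.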

Can I write access policies to secure files in a mount?

No, OCI Data Lake does not support access control on files.

Can I restrict access to certain columns with sensitive data?

Yes, OCI Data Lake enables administrators to create column-level access control policies.

Can I restrict access to some rows in an OCI Data Lake table?

Yes, OCI Data Lake enables administrators to create row-level access control policies based on column values.
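
To make the effect of column-level and row-level policies concrete, the sketch below filters a small in-memory table. It only illustrates the idea: the apply_policies function, the sample rows, and the policy shapes are hypothetical and do not reflect the service's policy syntax.

```python
def apply_policies(rows, allowed_columns, row_predicate):
    """Return only the rows and columns the caller may see."""
    visible = []
    for row in rows:
        # Row-level rule: decided from the values in the row's columns.
        if not row_predicate(row):
            continue
        # Column-level rule: drop columns the caller is not allowed to read.
        visible.append({c: v for c, v in row.items() if c in allowed_columns})
    return visible


# Hypothetical table with one sensitive column (email).
orders = [
    {"order_id": 1, "region": "EMEA", "amount": 120, "email": "a@example.com"},
    {"order_id": 2, "region": "APAC", "amount": 75, "email": "b@example.com"},
]

result = apply_policies(
    orders,
    allowed_columns={"order_id", "region", "amount"},
    row_predicate=lambda r: r["region"] == "EMEA",
)
print(result)  # [{'order_id': 1, 'region': 'EMEA', 'amount': 120}]
```

The caller sees only EMEA rows, and the email column never appears in the result.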

Data access

How can I process or analyze data in OCI Data Lake?

Data engineers can process data in a Spark application using OCI Data Flow or in Big Data Service. Data scientists and data analysts can do exploratory analysis or create ML models on data in the data lake using OCI Data Science notebooks.

Do I have to write a new Spark application if I move my data to OCI Data Lake?

No, OCI Data Lake supports Spark APIs for easy reading and writing of data in various file formats.
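
For reference, the kind of code this refers to is the standard Spark DataFrame reader and writer. The sketch below uses only those standard PySpark calls. The function name and both paths are hypothetical, and this FAQ does not say how data lake locations are addressed, so the paths are left as parameters.

```python
from typing import Any


def convert_csv_to_parquet(spark: Any, source_path: str, target_path: str) -> None:
    """Read CSV files and rewrite them as Parquet using standard Spark calls.

    `spark` is expected to be a pyspark.sql.SparkSession; both paths are
    placeholders supplied by the caller.
    """
    df = (
        spark.read
        .format("csv")
        .option("header", "true")  # first line of each file holds column names
        .load(source_path)
    )
    # Overwrite any existing output at the target location.
    df.write.mode("overwrite").format("parquet").save(target_path)
```

In a Spark application, the session would typically come from SparkSession.builder.getOrCreate() and be passed in along with the source and target locations.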

How does OCI Data Lake help an analyst or data scientist access data in OCI Data Lake?

Data analysts can use Spark SQL to run DDL and DML statements and to query data.
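
As a minimal sketch of what that can look like, the example below issues one DDL statement, one DML statement, and one query through a Spark session. The table name, columns, and values are hypothetical, and the statements are generic Spark SQL rather than anything specific to OCI Data Lake.

```python
from typing import Any

STATEMENTS = [
    # DDL: define a table (hypothetical name and columns).
    "CREATE TABLE IF NOT EXISTS sales "
    "(order_id INT, region STRING, amount DOUBLE) USING parquet",
    # DML: add a couple of rows.
    "INSERT INTO sales VALUES (1, 'EMEA', 120.0), (2, 'APAC', 75.0)",
]

# Query: total amount per region.
QUERY = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"


def run_sales_example(spark: Any) -> Any:
    """Run the DDL and DML statements, then return the query result.

    `spark` is expected to be a pyspark.sql.SparkSession.
    """
    for statement in STATEMENTS:
        spark.sql(statement)
    return spark.sql(QUERY)
```

The returned object is a DataFrame, so an analyst could call show() on it or continue with further transformations.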

Can I visualize data in OCI Data Lake?

Yes, OCI Data Lake is integrated with the OCI Data Flow SQL endpoint, which exposes a JDBC/ODBC driver. Business intelligence tools that support JDBC/ODBC drivers can use it to visualize data in the data lake. Users can also use the driver to connect to the data lake from any SQL tool that supports JDBC/ODBC drivers.