AWS Lake Formation
AWS Lake Formation is a service that makes it easy to set up a secure data lake in days. A data lake is a centralized, curated, and secured repository that stores all your data, both in its original form and prepared for analysis. A data lake enables you to break down data silos and combine different types of analytics to gain insights and guide better business decisions.
However, setting up and managing a data lake involves many manual, complicated, and time-consuming tasks. This work includes loading data from diverse sources, monitoring those data flows, setting up partitions, turning on encryption and managing keys, defining transformation jobs and monitoring their operation, reorganizing data into a columnar format, configuring access control settings, deduplicating redundant data, matching linked records, granting access to data sets, and auditing access over time.
Creating a data lake with Lake Formation comes down to defining where your data resides and which data access and security policies you want to apply. Lake Formation then collects and catalogs data from databases and object storage, moves the data into your new Amazon S3 data lake, cleans and classifies data using machine learning algorithms, and secures access to your sensitive data. Your users can then access a centralized catalog that describes the available data sets and their appropriate usage, and work with those data sets using their choice of analytics and machine learning services, such as Amazon EMR for Apache Spark, Amazon Redshift, Amazon Athena, Amazon SageMaker, and Amazon QuickSight.
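The two inputs described above (where the data resides, and which access policy applies) can be sketched with the boto3 `lakeformation` client. This is a minimal illustration rather than a complete setup: the bucket ARN, role ARN, database name, and table name are hypothetical placeholders, and the sketch assumes the database and table already exist in the catalog and that the caller has Lake Formation administrator rights. The request builders are plain functions, so they can be inspected without AWS credentials; only `apply_requests` talks to AWS.

```python
from typing import Any, Dict, List


def build_register_request(bucket_arn: str) -> Dict[str, Any]:
    """Describe where the data resides: an S3 location to register."""
    return {
        "ResourceArn": bucket_arn,
        # Ask Lake Formation to use its service-linked role for this location.
        "UseServiceLinkedRole": True,
    }


def build_grant_request(principal_arn: str, database: str, table: str,
                        permissions: List[str]) -> Dict[str, Any]:
    """Describe an access policy: which principal may do what on which table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions),
    }


def apply_requests(register_req: Dict[str, Any],
                   grant_req: Dict[str, Any]) -> None:
    """Send both requests to Lake Formation (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builders work without boto3 installed

    client = boto3.client("lakeformation")
    client.register_resource(**register_req)
    client.grant_permissions(**grant_req)


if __name__ == "__main__":
    # All identifiers below are hypothetical examples.
    register_req = build_register_request("arn:aws:s3:::example-data-lake-bucket")
    grant_req = build_grant_request(
        "arn:aws:iam::111122223333:role/ExampleAnalystRole",
        "example_sales_db",
        "example_orders",
        ["SELECT"],
    )
    print(register_req)
    print(grant_req)
    # apply_requests(register_req, grant_req)  # uncomment to call AWS
```

Keeping the request construction separate from the AWS call is a design choice for this sketch: the policy can be reviewed or unit-tested as data before anything is applied to a real account.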