Use AWS Glue Knowledge Catalog views to research information


On this submit, we present you easy methods to use the brand new views characteristic the AWS Glue Knowledge Catalog. SQL views are a robust object used throughout relational databases. You need to use views to lower the time to insights of information by tailoring the info that’s queried. Moreover, you need to use the facility of SQL in a view to precise complicated boundaries in information throughout a number of tables that may’t be expressed with easier permissions. Knowledge lakes present clients the pliability required to derive helpful insights from information throughout many sources and lots of use instances. Knowledge shoppers can eat information the place they should throughout strains of enterprise, growing the speed of insights technology.

Clients use many various processing engines of their information lakes, every of which have their very own model of views with completely different capabilities. The AWS Glue Knowledge Catalog and AWS Lake Formation present a central location to handle your information throughout information lake engines.

AWS Glue has launched a new feature, SQL views, which lets you handle a single view object within the Knowledge Catalog that may be queried from SQL engines. You’ll be able to create a single view object with a distinct SQL model for every engine you need to question, resembling Amazon Athena, Amazon Redshift, and Spark SQL on Amazon EMR. You’ll be able to then handle entry to those sources utilizing the identical Lake Formation permissions which can be used to regulate tables within the information lake.

Answer overview

For this submit, we use the Women’s E-Commerce Clothing Review. The target is to create views within the Knowledge Catalog so you possibly can create a single frequent view schema and metadata object to make use of throughout engines (on this case, Athena). Doing so allows you to use the identical views throughout your information lakes to suit your use case. We create a view to masks the customer_id column on this dataset, then we’ll share this view to a different consumer in order that they’ll question this masked view.

Stipulations

Earlier than you possibly can create a view within the AWS Glue Knowledge Catalog, just remember to have an AWS Identity and Access Management (IAM) position with the next configuration:

  • The next belief coverage:
    {
      "Model": "2012-10-17",
      "Assertion": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": [
               "glue.amazonaws.com",
               "lakeformation.amazonaws.com"
            ]
          },
          "Motion": "sts:AssumeRole"
        }
      ]
    }

  • The next go position coverage:
    {
      "Model": "2012-10-17",
      "Assertion": [
        {
          "Sid": "Stmt1",
          "Action": [
            "iam:PassRole"
          ],
          "Impact": "Enable",
          "Useful resource": "*",
          "Situation": {
             "StringEquals": {
               "iam:PassedToService": [
                 "glue.amazonaws.com",
                 "lakeformation.amazonaws.com"
               ]
             }
           }
         }
       ]
    }

  • Lastly, additionally, you will want the next permissions:
    • "Glue:GetDatabase",
    • "Glue:GetDatabases",
    • "Glue:CreateTable",
    • "Glue:GetTable",
    • "Glue:UpdateTable",
    • "Glue:DeleteTable",
    • "Glue:GetTables",
    • "Glue:SearchTables",
    • "Glue:BatchGetPartition",
    • "Glue:GetPartitions",
    • "Glue:GetPartition",
    • "Glue:GetTableVersion",
    • "Glue:GetTableVersions"

Run the AWS CloudFormation template

You’ll be able to deploy the AWS CloudFormation template glueviewsblog.yaml to create the Lake Formation database and desk. The dataset will likely be loaded into an Amazon Simple Storage Service (Amazon S3) bucket.

For step-by-step directions, seek advice from Creating a stack on the AWS CloudFormation console.

When the stack is full, you possibly can see a desk known as clothing_parquet on the Lake Formation console, as proven within the following screenshot.

Create a view on the Athena console

Now that you’ve your Lake Formation managed desk, you possibly can open the Athena console and create a Knowledge Catalog view. Full the next steps:

  1. Within the Athena question editor, run the next question on the Parquet dataset:
SELECT * FROM "clothing_reviews"."clothing_parquet" restrict 10;

Within the question outcomes, the customer_id column is at the moment seen.

Subsequent, you create a view known as hidden_customerID and masks the customer_id column.

  1. Create a view known as hidden_customerID:
CREATE PROTECTED MULTI DIALECT VIEW clothing_reviews.hidden_customerid SECURITY DEFINER AS 
SELECT * FROM clothing_reviews.clothing_parquet

Within the following screenshot, you possibly can see a view known as hidden_customerID was efficiently created.

  1. Run the next question to masks the primary 4 characters of the customer_id column for the newly generated view:
ALTER VIEW clothing_reviews.hidden_customerid UPDATE DIALECT AS
SELECT '****' || substring(customer_id, 4) as customer_id,clothing_id,age,title,review_text,ranking,recommend_ind,positive_feedback,division_name,department_name,class_name 
FROM clothing_reviews.clothing_parquet

You’ll be able to see within the following screenshot that the view hidden_customerID has the customer_id column’s first 4 characters masked.

The unique desk clothing_parquet stays the identical unmasked.

Grant entry of the view to a different consumer to question

Knowledge Catalog views let you use Lake Formation to regulate entry. On this step, you grant this view to a different consumer known as amazon_business_analyst after which question from that consumer.

  1. Check in to the Lake Formation console as admin.
  2. Within the navigation pane, select Views.

As proven within the following screenshot, you possibly can see the hidden_customerid view.

  1. Check in because the amazon_business_analyst consumer and navigate to the Views web page.

This consumer has no visibility to the view.

  1. Grant permission to the amazon_business_analyst consumer from the info lake admin.
  1. Check in once more as amazon_business_analyst and navigate to the Views web page.

  1. On the Athena console, question the hidden_customerid view.

You’ve gotten efficiently shared a view to the consumer and queried it from the Athena console.

Clear up

To keep away from incurring future fees, delete the CloudFormation stack. For directions, seek advice from Deleting a stack on the AWS CloudFormation console.

Conclusion

On this submit, we demonstrated easy methods to use the AWS Glue Knowledge Catalog to create views. We then confirmed easy methods to alter the views and masks the info. You’ll be able to share the view with completely different customers to question utilizing Athena. For extra details about this new characteristic, seek advice from Using AWS Glue Data Catalog views.


In regards to the Authors

Leonardo Gomez is a Principal Analytics Specialist Options Architect at AWS. He has over a decade of expertise in information administration, serving to clients across the globe tackle their enterprise and technical wants. Join with him on LinkedIn

Michael Chess – is a Product Supervisor on the AWS Lake Formation group primarily based out of Palo Alto, CA. He makes a speciality of permissions and information catalog options within the information lake.

Derek Liu – is a Senior Options Architect primarily based out of Vancouver, BC. He enjoys serving to clients resolve huge information challenges by way of AWS analytic companies.

Leave a Reply

Your email address will not be published. Required fields are marked *