Stream data to Amazon S3 for real-time analytics using the Oracle GoldenGate S3 handler



Modern enterprise applications depend on timely and accurate data, with growing demand for real-time analytics. There is a growing need for efficient and scalable data storage solutions. Data is at times stored in different datasets and needs to be consolidated before meaningful and complete insights can be drawn from it. This is where replication tools help move the data from its source to the target systems in real time and transform it as necessary to help businesses with consolidation.

In this post, we provide a step-by-step guide for installing and configuring Oracle GoldenGate for streaming data from relational databases to Amazon Simple Storage Service (Amazon S3) for real-time analytics using the Oracle GoldenGate S3 handler.

Oracle GoldenGate for Oracle Database and Big Data adapters

Oracle GoldenGate is a real-time data integration and replication tool used for disaster recovery, data migrations, and high availability. It captures and applies transactional changes in real time, minimizing latency and keeping target systems synchronized with source databases. It supports data transformation, allowing changes during replication, and works with various database systems, including SQL Server, MySQL, and PostgreSQL. GoldenGate supports flexible replication topologies such as unidirectional, bidirectional, and multi-master configurations. Before using GoldenGate, make sure you have reviewed and adhere to the license agreement.

Oracle GoldenGate for Big Data provides adapters that facilitate real-time data integration from different sources to big data services like Hadoop, Apache Kafka, and Amazon S3. You can configure the adapters to control the data capture, transformation, and delivery process based on your specific requirements to support both batch-oriented and real-time streaming data integration patterns.

GoldenGate provides specific tools called S3 event handlers to integrate with Amazon S3 for data replication. These handlers allow GoldenGate to read from and write data to S3 buckets. This option allows you to use Amazon S3 for GoldenGate deployments across on-premises, cloud, and hybrid environments.

Solution overview

The following diagram illustrates our solution architecture.

In this post, we walk you through the following high-level steps:

  1. Install the GoldenGate software on Amazon Elastic Compute Cloud (Amazon EC2).
  2. Configure GoldenGate for Oracle Database and extract data from the Oracle database to trail files.
  3. Replicate the data to Amazon S3 using the GoldenGate for Big Data S3 handler.

Prerequisites

You must have the following prerequisites in place:

Install the GoldenGate software on Amazon EC2

You need to run GoldenGate on EC2 instances. The instances must have sufficient CPU, memory, and storage to handle the expected replication volume. For more details, refer to Operating System Requirements. After you determine the CPU and memory requirements, select a current generation EC2 instance type for GoldenGate.

Use the following formula to estimate the required trail space:

trail disk space = transaction log volume in 1 hour x number of hours down x 0.4
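To make the formula concrete, here is a small sketch that plugs in sample numbers (the 10 GB/hour redo volume and 4-hour outage window are assumptions for illustration only):

```shell
# Illustrative sizing only: assumes 10 GB of transaction log per hour
# and a tolerance for 4 hours of downtime.
log_volume_gb_per_hour=10
hours_down=4

# trail disk space = log volume per hour x hours down x 0.4
# bash integer arithmetic: multiply by 4, then divide by 10.
trail_space_gb=$(( log_volume_gb_per_hour * hours_down * 4 / 10 ))
echo "Estimated trail disk space: ${trail_space_gb} GB"
# prints: Estimated trail disk space: 16 GB
```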

When the EC2 instance is up and running, download the following GoldenGate software from the Oracle GoldenGate Downloads page:

  • GoldenGate 21.3.0.0
  • GoldenGate for Big Data 21c

Use the following steps to upload and install the file from your local machine to the EC2 instance. Make sure that your IP address is allowed in the inbound rules of the security group of your EC2 instance before initiating a session. For this use case, we install GoldenGate for Classic Architecture and Big Data. See the following code:

scp -i pem-key.pem 213000_fbo_ggs_Linux_x64_Oracle_shiphome.zip ec2-user@hostname:~/.
ssh -i pem-key.pem  ec2-user@hostname
unzip 213000_fbo_ggs_Linux_x64_Oracle_shiphome.zip

Install GoldenGate 21.3.0.0

Complete the following steps to install GoldenGate 21.3 on an EC2 instance:

  1. Create a home directory to install the GoldenGate software and run the installer:
    mkdir /u01/app/oracle/product/OGG_DB_ORACLE
    cd /fbo_ggs_Linux_x64_Oracle_shiphome/Disk1
    
    ls -lrt
    total 8
    drwxr-xr-x. 4 oracle oinstall 187 Jul 29 2021 install
    drwxr-xr-x. 12 oracle oinstall 4096 Jul 29 2021 stage
    -rwxr-xr-x. 1 oracle oinstall 918 Jul 29 2021 runInstaller
    drwxrwxr-x. 2 oracle oinstall 25 Jul 29 2021 response

  2. Run runInstaller:
    [oracle@hostname Disk1]$ ./runInstaller
    Starting Oracle Universal Installer...
    Checking Temp space: must be greater than 120 MB.   Actual 193260 MB    Passed
    Checking swap space: must be greater than 150 MB.   Actual 15624 MB    Passed

A GUI window will pop up to install the software.

  3. Follow the instructions in the GUI to complete the installation process. Provide the directory path you created as the home directory for GoldenGate.

After the GoldenGate software installation is complete, you can create the GoldenGate processes that read the data from the source. First, you configure the EXTRACT process.

  4. Create an extract parameter file for the source Oracle database. The following code is the sample file content:
    [oracle@hostname Disk1]$ vi eabc.prm
    
    -- Extract group name
    EXTRACT EABC
    SETENV (TNS_ADMIN = "/u01/app/oracle/product/19.3.0/network/admin")
    
    -- Extract database user login
    
    USERID ggs_admin@mydb, PASSWORD "********"
    
    -- Local trail on the remote host
    EXTTRAIL /u01/app/oracle/product/OGG_DB_ORACLE/dirdat/ea
    IGNOREREPLICATES
    GETAPPLOPS
    TRANLOGOPTIONS EXCLUDEUSER ggs_admin
    TABLE scott.emp;

  5. Add the EXTRACT at the GoldenGate prompt by running the following command:
    GGSCI> ADD EXTRACT EABC, TRANLOG, BEGIN NOW

  6. After you add the EXTRACT, check the status of the running programs with the info all command.

You will see that the EXTRACT status is in the STOPPED state, as shown in the following screenshot; this is expected.
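At this stage, the info all output looks similar to the following (illustrative values; your timings will differ):

```
GGSCI> info all

Program     Status      Group       Lag at Chkpt    Time Since Chkpt
MANAGER     RUNNING
EXTRACT     STOPPED     EABC        00:00:00        00:00:05
```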

  7. Start the EXTRACT process as shown in the following figure.

The status changes to RUNNING. The following are the different statuses:

  • STARTING – The process is starting.
  • RUNNING – The process has started and is running normally.
  • STOPPED – The process has stopped either normally (controlled manner) or due to an error.
  • ABENDED – The process has stopped in an uncontrolled manner. An abnormal end is known as an ABEND.

This will start the extract process, and a trail file will be created in the location mentioned in the extract parameter file.

  8. You can verify this by using the command stats <<group_name>>, as shown in the following screenshot.
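As a hypothetical example of what stats reports after a few changes are made to the source table (the operation counts below are made up for illustration):

```
GGSCI> stats EABC, TOTAL

Extracting from SCOTT.EMP to SCOTT.EMP:

*** Total statistics since 2024-07-01 10:15:00 ***
        Total inserts                        3.00
        Total updates                        1.00
        Total deletes                        0.00
        Total operations                     4.00
```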

Install GoldenGate for Big Data 21c

In this step, we install GoldenGate for Big Data on the same EC2 instance where we installed the GoldenGate Classic Architecture.

  1. Create a directory to install the GoldenGate for Big Data software. To copy the .zip file, follow these steps:
    mkdir /u01/app/oracle/product/OGG_BIG_DATA
    
    unzip 214000_ggs_Linux_x64_BigData_64bit.zip
    tar -xvf ggs_Linux_x64_BigData_64bit.tar
    
    GGSCI> CREATE SUBDIRS
    GGSCI> EDIT PARAM MGR
    PORT 7801
    
    GGSCI> START MGR

This will start the MANAGER program. Now you can install the dependencies required for the REPLICAT to run.

  2. Go to /u01/app/oracle/product/OGG_BIG_DATA/DependencyDownloader and run the aws.sh file with the latest version of the aws-java-sdk. This script downloads the AWS SDK, which provides client libraries for connectivity to the AWS Cloud.
    [oracle@hostname DependencyDownloader]$ ./aws.sh 1.12.748

Configure the S3 handler

To configure a GoldenGate Replicat to send data to an S3 bucket, you need to set up a Replicat parameter file and a properties file that defines how data is handled and sent to Amazon S3.

AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the access key and secret access key of your IAM user, respectively. Don't hardcode credentials or security keys in the parameter and properties files. There are several methods available to achieve this, such as the following:

#!/bin/bash

# Use environment variables that are already set in the OS
export AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY
export AWS_REGION="your_aws_region"

You can set these environment variables in your shell configuration file (e.g., .bashrc, .bash_profile, .zshrc) or use a secure method to set them temporarily:

export AWS_ACCESS_KEY_ID="your_access_key_id"
export AWS_SECRET_ACCESS_KEY="your_secret_access_key"
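One such secure method is to keep the exports in a separate file that only your user can read, and source it before starting GoldenGate. The following is a sketch; the file name ~/.ogg_s3_credentials is a hypothetical example, and a credentials vault or instance-profile-based approach is preferable where available:

```shell
# Sketch: store the exports in a mode-600 file instead of hardcoding
# them in GoldenGate parameter or properties files.
# The file name ~/.ogg_s3_credentials is a hypothetical example.
cred_file="$HOME/.ogg_s3_credentials"

cat > "$cred_file" <<'EOF'
export AWS_ACCESS_KEY_ID="your_access_key_id"
export AWS_SECRET_ACCESS_KEY="your_secret_access_key"
EOF

# Restrict the file to the owning user only.
chmod 600 "$cred_file"

# Load the variables into the current shell before starting GGSCI.
. "$cred_file"
```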

Configure the properties file

Create a properties file for the S3 handler. This file defines how GoldenGate will interact with your S3 bucket. Make sure that you have added the correct parameters as shown in the properties file.

The following code is an example of an S3 handler properties file (dirprm/reps3.properties):

[oracle@hostname dirprm]$ cat reps3.properties
gg.handlerlist=filewriter

gg.handler.filewriter.type=filewriter
gg.handler.filewriter.fileRollInterval=60s
gg.handler.filewriter.fileNameMappingTemplate=${tableName}${currentTimestamp}.json
gg.handler.filewriter.pathMappingTemplate=./dirout
gg.handler.filewriter.stateFileDirectory=./dirsta
gg.handler.filewriter.format=json
gg.handler.filewriter.finalizeAction=rename
gg.handler.filewriter.fileRenameMappingTemplate=${tableName}${currentTimestamp}.json
gg.handler.filewriter.eventHandler=s3

goldengate.userexit.writers=javawriter
#TODO Set S3 Event Handler - please update as needed
gg.eventhandler.s3.type=s3
gg.eventhandler.s3.region=eu-west-1
gg.eventhandler.s3.bucketMappingTemplate=s3bucketname
gg.eventhandler.s3.pathMappingTemplate=${tableName}_${currentTimestamp}
gg.eventhandler.s3.accessKeyId=$AWS_ACCESS_KEY_ID
gg.eventhandler.s3.secretKey=$AWS_SECRET_ACCESS_KEY

gg.classpath=/u01/app/oracle/product/OGG_BIG_DATA/dirprm/:/u01/app/oracle/product/OGG_BIG_DATA/DependencyDownloader/dependencies/aws_sdk_1.12.748/
gg.log=log4j
gg.log.level=DEBUG

#javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar -Daws.accessKeyId=my_access_key_id -Daws.secretKey=my_secret_key
javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar

Configure GoldenGate REPLICAT

Create the parameter file in /dirprm in the GoldenGate for Big Data home:

[oracle@hostname dirprm]$ vi rps3.prm
REPLICAT rps3
-- Command to add REPLICAT
-- add replicat rps3, exttrail AdapterExamples/trail/tr
SETENV(GGS_JAVAUSEREXIT_CONF = 'dirprm/reps3.properties')
TARGETDB LIBFILE libggjava.so SET property=dirprm/reps3.properties
REPORTCOUNT EVERY 1 MINUTES, RATE
MAP SCOTT.EMP, TARGET gg.handler.s3handler;;

[oracle@hostname OGG_BIG_DATA]$ ./ggsci
GGSCI > add replicat rps3, exttrail ./dirdat/tr/ea
Replicat added.

GGSCI > info all
Program     Status      Group       Lag at Chkpt    Time Since Chkpt
MANAGER     RUNNING
REPLICAT    STOPPED     RPS3        00:00:00        00:00:39

GGSCI > start *
Sending START request to Manager ...
Replicat group RPS3 starting.

Now you have successfully started the Replicat. You can verify this by running the info and stats commands followed by the Replicat name, as shown in the following screenshot.

To confirm that the file has been replicated to the S3 bucket, open the Amazon S3 console and open the bucket you created. You can see that the table data has been replicated to Amazon S3 in JSON file format.
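You can also verify from the command line with the AWS CLI, assuming it is installed and configured on the instance (replace s3bucketname with your actual bucket name):

```
# List the JSON files written by the S3 event handler.
aws s3 ls s3://s3bucketname/ --recursive
```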

Best practices

Make sure that you follow best practices for performance, compression, and security.


The following are best practices for compression:

  • Enable compression for trail files to reduce storage requirements and improve network transfer performance.
  • Use GoldenGate's built-in compression capabilities or file system-level compression tools.
  • Strike a balance between compression level and CPU overhead, because higher compression levels may impact performance.
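For example, GoldenGate's built-in network compression can be enabled on a data pump's RMTHOST parameter. The following fragment is a sketch; the host name, port, and remote trail path are placeholders, and COMPRESS applies to trail data sent over the network:

```
-- Sketch: compress trail data sent to a remote host.
RMTHOST target-host.example.com, MGRPORT 7809, COMPRESS
RMTTRAIL ./dirdat/rt
```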

Finally, when implementing Oracle GoldenGate for streaming data to Amazon S3 for real-time analytics, it's important to address various security considerations to protect your data and infrastructure. Follow the security best practices for Amazon S3 and the security features available for GoldenGate Classic Architecture.

Clean up

To avoid ongoing charges, delete the resources that you created as part of this post:

  1. Remove the S3 bucket and trail files if no longer needed, and stop the GoldenGate processes on Amazon EC2.
  2. Revert the changes that you made in the database (such as grants, supplemental logging, and archive log retention).
  3. To delete the entire setup, terminate your EC2 instance.

Conclusion

In this post, we provided a step-by-step guide for installing and configuring GoldenGate for Oracle Classic Architecture and Big Data for streaming data from relational databases to Amazon S3. With these instructions, you can successfully set up an environment and take advantage of real-time analytics using the GoldenGate handler for Amazon S3, which we will explore further in an upcoming post.

If you have any comments or questions, leave them in the comments section.


About the Authors

Prasad Matkar is a Database Specialist Solutions Architect at AWS based in the EMEA region. With a focus on relational database engines, he provides technical assistance to customers migrating and modernizing their database workloads to AWS.

Arun Sankaranarayanan is a Database Specialist Solution Architect based in London, UK. With a focus on purpose-built database engines, he assists customers in migrating and modernizing their database workloads to AWS.

Giorgio Bonzi is a Sr. Database Specialist Solutions Architect at AWS based in the EMEA region. With a focus on relational database engines, he provides technical assistance to customers migrating and modernizing their database workloads to AWS.
