Hello,
I have a data processing task. I want to import a large amount of data from SQL Server into Hive using Sqoop, process it, and generate some results. I have already implemented the processing in Hive, using Hive JDBC.
Now I want to deploy this task on Amazon EMR. I have read some material on AWS EMR about launching instances, what EMR is, how it works, and so on.
I have some questions about EMR:
1) EMR uses S3 buckets, which hold the input and output data for Hadoop processing (in the form of objects). But if I use Sqoop to import the tables from SQL Server, the data will be stored in the form of files.
2) As I said, I have implemented the task for my use case in Java. If I build a JAR of my program and create a Job Flow with a Custom JAR, will that work, or do I need to do something extra?
3) In my use case, I want to export my results back to EC2 with the help of Sqoop. Does EMR support Sqoop?
4) If Sqoop is supported, I have one more question:
I am using Hive. Any table I create or import using Sqoop is automatically stored in HDFS (in the /user/hive/warehouse directory). If I use S3 to store the data imported from SQL Server, how can I link it with HDFS (the /user/hive/warehouse directory) so that Hive can show and access the tables? I also want to import the tables from SQL Server daily or weekly, which is why I plan to use S3.
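To make question 4 concrete: one possible way to link data in S3 to Hive is an external table whose LOCATION points at the S3 path instead of /user/hive/warehouse. The sketch below only builds that HiveQL statement as a string. The class name, table name, columns, and bucket path are hypothetical placeholders, and the comma delimiter assumes the data was written as plain text with Sqoop's default field separator.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a HiveQL statement that maps a Hive table
// onto files already sitting in S3, so the data is not copied into
// /user/hive/warehouse.
class HiveS3TableSketch {

    // columns maps column name -> Hive type, in declaration order.
    static String externalTableDdl(String table,
                                   Map<String, String> columns,
                                   String s3Location) {
        StringJoiner cols = new StringJoiner(", ");
        for (Map.Entry<String, String> c : columns.entrySet()) {
            cols.add(c.getKey() + " " + c.getValue());
        }
        // Assumption: comma-delimited text files (Sqoop's default text output).
        return "CREATE EXTERNAL TABLE IF NOT EXISTS " + table
                + " (" + cols + ")"
                + " ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"
                + " LOCATION '" + s3Location + "'";
    }

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("id", "INT");        // placeholder column
        columns.put("name", "STRING");   // placeholder column
        // Placeholder bucket and prefix, not a real location.
        System.out.println(
                externalTableDdl("orders", columns, "s3://example-bucket/sqoop/orders/"));
    }
}
```

The resulting string could then be passed to `Statement.execute` on an existing Hive JDBC connection, in the same way as the other Hive queries in the program. Whether this fits the daily or weekly import is part of what I am asking.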
How should I handle all of this? I am new to Amazon services and am not sure how to proceed. Please help.