Prerequisite: how to install Hadoop
https://dev.to/zawhtutwin/installing-hadoop-single-node-cluster-in-aws-ec2-o39
We have already installed Hadoop with OpenJDK 1.8 in the previous guide.
JDK 1.8 is required because newer MySQL Server versions do not work well with the JDK 1.7 MySQL Connector/J driver. Cloudera ships its Docker image with JDK 1.7, so the objective of this manual installation is to get Sqoop working with JDK 8, which supports a wide range of MySQL Server versions, especially on RDS. For that reason we are not using the Cloudera Docker image; instead we will install everything manually.
Go to home folder
cd ~
Download Sqoop from the Apache archive
wget https://archive.apache.org/dist/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
Extract the file in the home folder
tar -xvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
Create a sqoop directory in /usr/lib
cd /usr/lib
sudo mkdir sqoop
Move the sqoop-1.4.7.bin__hadoop-2.6.0 folder into /usr/lib/sqoop
sudo mv ~/sqoop-1.4.7.bin__hadoop-2.6.0 /usr/lib/sqoop/
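If you want to double-check that the move worked, list the directory; it should show sqoop-1.4.7.bin__hadoop-2.6.0
ls /usr/lib/sqoop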
Add the $SQOOP_HOME environment variable in ~/.bashrc
nano ~/.bashrc
export SQOOP_HOME=/usr/lib/sqoop/sqoop-1.4.7.bin__hadoop-2.6.0
Then add it to the $PATH variable too
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SQOOP_HOME/bin
Save the .bashrc and source it
source ~/.bashrc
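To confirm the new variables are picked up in the current shell, you can print SQOOP_HOME and check that the sqoop binary now resolves from $SQOOP_HOME/bin
echo $SQOOP_HOME
which sqoop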
Then download the mysql-connector-java jar file from Maven Central
cd ~
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.30/mysql-connector-java-8.0.30.jar
Then copy mysql-connector-java-8.0.30.jar to the $SQOOP_HOME/lib folder
cp mysql-connector-java-8.0.30.jar $SQOOP_HOME/lib
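You can confirm the driver jar is in place before running any Sqoop job
ls $SQOOP_HOME/lib | grep mysql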
Go to the $SQOOP_HOME/conf folder and rename sqoop-env-template.sh to sqoop-env.sh
cd $SQOOP_HOME/conf
mv sqoop-env-template.sh sqoop-env.sh
Then edit the file as follows
export HADOOP_COMMON_HOME=/usr/lib/hadoop/hadoop-2.9.0
export HADOOP_MAPRED_HOME=/usr/lib/hadoop/hadoop-2.9.0
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
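Note that the Hadoop and Java paths above are the ones from the previous guide; if your OpenJDK 8 lives somewhere else, you can resolve its location from the java binary and adjust JAVA_HOME accordingly
readlink -f $(which java)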
Then check the installed Sqoop version
sqoop version
22/10/05 04:50:30 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Sqoop 1.4.7
git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
Compiled by maugli on Thu Dec 21 15:59:58 STD 2017
Then you can start importing from RDS or any remote MySQL database as follows.
sqoop import --connect jdbc:mysql://your_rds_dns_address/yourdatabase --table hr_users --username something --password 'something'
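If you want more control over the import, Sqoop also accepts options such as --target-dir, --num-mappers and --fields-terminated-by. The sketch below uses the same placeholder connection details as above, and the target directory name is just an example
sqoop import --connect jdbc:mysql://your_rds_dns_address/yourdatabase --table hr_users --username something --password 'something' --target-dir /user/ubuntu/hr_users_csv --num-mappers 1 --fields-terminated-by ','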
After the import, the data will be saved as CSV part files in HDFS. The location is /user/ubuntu/hr_users. You can verify it as follows.
hdfs dfs -ls /user/ubuntu/hr_users
To see the content of one of the part files.
hdfs dfs -cat /user/ubuntu/hr_users/part-m-00001
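To get a rough row count across all part files (the pattern assumes the default import location shown above)
hdfs dfs -cat /user/ubuntu/hr_users/part-m-* | wc -l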
Then you are ready to install Apache Hive
https://dev.to/zawhtutwin/installing-hive-in-aws-ec2-2g35/