This is a continuing series on how to build a data lake. Welcome to part 5.
In this blog post I’ll show you how to enable Isilon to integrate with Pivotal Hawq.
In the previous posts I explained the architecture and installation of our data lake. At this point we have installed and configured HDFS on EMC Isilon, deployed and configured Pivotal Hadoop, and installed Hawq. Now we need to initialize our Hawq server and enable the Pivotal Extension Framework (PXF) to access Isilon.
To enable Hawq to access EMC Isilon HDFS, we first initialize the Hawq service and then install the PXF RPM. To initialize the Hawq server, do the following.
On the Hawq server:
Confirm the Isilon setting:
grep -i isilon /etc/gphd/hawq/conf/gpinitsystem_config
Your DFS_URL should be the FQDN of the Isilon cluster, followed by port 8020 and the /hawq_data directory. In my environment it is:
DFS_URL=isilon2.cto.emc.local:8020/hawq_data
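As a quick sanity check on that value, here is a minimal sketch that splits a DFS_URL line into host, port and path and compares the port and path with the ones described above. The function name check_dfs_url is hypothetical and is not part of Hawq; it only checks the shape of the string and does not contact the cluster.

```shell
#!/usr/bin/env bash
# Sketch: validate a line of the form DFS_URL=host:port/path.
# check_dfs_url is a hypothetical helper, not a Hawq utility.
check_dfs_url() {
    local line="$1"
    local value="${line#DFS_URL=}"     # strip the key
    local hostport="${value%%/*}"      # everything before the first slash
    local path="/${value#*/}"          # everything after the first slash
    local host="${hostport%%:*}"       # part before the colon
    local port="${hostport##*:}"       # part after the colon

    # If nothing was stripped, the line did not start with DFS_URL=
    [ "$line" != "$value" ] || { echo "not a DFS_URL line"; return 1; }
    [ "$port" = "8020" ] || { echo "unexpected port: $port"; return 1; }
    [ "$path" = "/hawq_data" ] || { echo "unexpected path: $path"; return 1; }
    echo "ok: host=$host port=$port path=$path"
}

# Example with the value from my environment
check_dfs_url "DFS_URL=isilon2.cto.emc.local:8020/hawq_data"
```

On the Hawq server you could feed it the real setting with check_dfs_url "$(grep '^DFS_URL' /etc/gphd/hawq/conf/gpinitsystem_config)", assuming the line is not commented out.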
To start the service:
su gpadmin
source /usr/local/hawq/greenplum_path.sh
/etc/init.d/hawq init
This initializes the Hawq environment.
gpstart -a
This starts the Hawq service.
On the OneFS file system, in the HDFS directory under the hawq_data directory, we should now see the data that Hawq created during initialization.
Enable the Pivotal Extension Framework
First, download the Pivotal Extension Framework RPM (pxfd). It needs to be installed on the Hawq master and all Hawq segment nodes. I'll use a script that reads a hawq.txt file containing the FQDNs of the Hawq server and segment nodes:
for dest in $(<hawq.txt); do
scp /tmp/PHD/pxfd-1.0-1.noarch.rpm ${dest}:/tmp
done
This next script does the install
for dest in $(<hawq.txt); do
ssh ${dest} rpm -ivh /tmp/pxfd-1.0-1.noarch.rpm
done
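The two loops above can also be combined so that a failure on one host is reported instead of silently skipped. This is only a sketch: the DRY_RUN switch and the run and deploy_rpm names are hypothetical, and by default it just prints the commands it would run. The ssh -n flag keeps ssh from consuming the rest of the host list on stdin.

```shell
#!/usr/bin/env bash
# Sketch: copy and install the RPM host by host, recording failures.
# DRY_RUN, run and deploy_rpm are hypothetical names for illustration.
rpm_path="/tmp/PHD/pxfd-1.0-1.noarch.rpm"

run() {
    # Print the command unless DRY_RUN=0 is set explicitly
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$@"; else "$@"; fi
}

deploy_rpm() {
    local hostfile="$1" failed=0 dest
    while IFS= read -r dest; do
        [ -n "$dest" ] || continue    # skip blank lines
        # Install only if the copy succeeded; flag the host on any failure
        if ! run scp "$rpm_path" "${dest}:/tmp" ||
           ! run ssh -n "$dest" rpm -ivh "/tmp/${rpm_path##*/}"; then
            echo "FAILED: $dest" >&2
            failed=1
        fi
    done < "$hostfile"
    return "$failed"
}

# Usage (really execute): DRY_RUN=0 deploy_rpm hawq.txt
```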
On the Hawq server we need to enable the framework:
vim /data1/master/gpseg-1/postgresql.conf
Add the following line:
pfx_local_storage=false
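If you would rather script this edit than open vim, a minimal sketch follows. The ensure_line helper is hypothetical; it appends the line only when an identical line is not already in the file, and keeps a .bak copy first. The setting is passed in exactly as written above.

```shell
#!/usr/bin/env bash
# Sketch: append a setting to a config file only if it is missing.
# ensure_line is a hypothetical helper, not a Hawq utility.
ensure_line() {
    local file="$1" line="$2"
    # -x matches whole lines, -F treats the setting as a literal string
    if grep -qxF -- "$line" "$file"; then
        echo "already present"
    else
        cp -p "$file" "${file}.bak"          # keep a backup before editing
        # Add a newline first if the file does not end with one
        if [ -n "$(tail -c1 "$file")" ]; then
            echo >> "$file"
        fi
        printf '%s\n' "$line" >> "$file"
        echo "added"
    fi
}

# Usage on the Hawq master:
# ensure_line /data1/master/gpseg-1/postgresql.conf 'pfx_local_storage=false'
```

Running it twice leaves a single copy of the line, so it is safe to include in a setup script that may be re-run.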
Restart the service:
su - gpadmin -c "/etc/init.d/hawq restart"
As root, we need to share SSH keys with the Hawq server and segment nodes and export our Java profile. We will use the gpssh-exkeys utility to exchange the keys and gpssh to run the remaining commands on every host:
source /usr/local/hawq/greenplum_path.sh
cd /usr/local/hawq-1.1.4.0/bin/
./gpssh -f /tmp/hosts.txt "echo 'export JAVA_HOME=/usr/java/latest' > /etc/profile.d/java.sh"
./gpssh -f /tmp/hosts.txt chmod a+x /etc/profile.d/java.sh
./gpssh -f /tmp/hosts.txt /usr/lib/gphd/pxfd/start-pxf.sh
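The key exchange itself is not shown in the commands above. A minimal sketch of that step, run before the gpssh commands, might look like the following. It assumes the same /tmp/hosts.txt host file and the usual gpssh-exkeys -f <hostfile> form; the DRY_RUN switch and the run and exchange_keys names are hypothetical, and by default the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch only: exchange SSH keys, then confirm passwordless access.
# DRY_RUN, run and exchange_keys are hypothetical names for illustration.
run() {
    # Print the command unless DRY_RUN=0 is set explicitly
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$@"; else "$@"; fi
}

exchange_keys() {
    local hostfile="$1"
    # Refuse to continue without a non-empty host file
    [ -s "$hostfile" ] || { echo "host file missing or empty: $hostfile" >&2; return 1; }
    # Exchange keys between all hosts listed in the file
    run gpssh-exkeys -f "$hostfile" || return 1
    # Each host should now answer without a password prompt
    run gpssh -f "$hostfile" hostname
}

# Usage (really execute, after sourcing greenplum_path.sh):
# DRY_RUN=0 exchange_keys /tmp/hosts.txt
```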
Next we start the nodes:
su gpadmin
source /usr/local/hawq/greenplum_path.sh
cd /usr/local/hawq-1.1.4.0/bin/
gpssh -f /tmp/hosts.txt /usr/lib/gphd/pxfd/start-pxf.sh
Hawq and PXF are now up and running and integrated with EMC Isilon for HDFS.