This is a continuing series on how to build a data lake. Welcome to part 4.
The control center server (PCC) will push all the software and configuration information to our PHD nodes, Hawq master, and Hawq segment servers. Create a temp directory and upload the binaries to it using something like WinSCP. In my examples I created and used a directory called /tmp/PHD.
Create 3 different text files. These text files contain the FQDNs of the hosts. The first, called hosts.txt, contains all the FQDNs of the PHD nodes, the Hawq master, and the GemFire XD nodes. Do not add the PCC server to this file. The second file, called hawq.txt, contains the FQDNs of the Hawq master and the Hawq segment servers. The third file, called gemxdnode.txt, contains the FQDNs of the GemFire XD nodes. Each file should have one FQDN per line. Make sure there are no spaces at the end of lines and no blank lines at the end of the file. An example of the hosts.txt file is below.
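A hosts.txt file for this environment might look something like the following (phdnode1 is the node referenced later in this post; the other hostnames are only illustrative, so substitute your own FQDNs):
phdnode1.cto.emc.local
phdnode2.cto.emc.local
phdnode3.cto.emc.local
hawqmaster.cto.emc.local
gemxdnode1.cto.emc.local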
These files will be used by the installer to perform tasks during the install. During the prep and install, at certain points I needed to run commands or copy data to multiple hosts. I created a simple script to do this that calls on these text files. In the example below I run ntpdate on all the hosts.
for dest in $(<hosts.txt); do
ssh root@${dest} ntpdate ntp.server.com
done
To make SSHing easier, first we set up passwordless SSH login. Run these commands on the PCC server:
ssh-keygen
Once the key is generated it needs to be copied to all the hosts in the cluster.
You can do this using the ssh-copy-id command for each individual host or use a script like the one below.
The command to copy the SSH key:
ssh-copy-id -i /root/.ssh/id_rsa.pub phdnode1.cto.emc.local
Do this for all the nodes listed in your hosts.txt file. If using a script to copy the SSH key, change the SSH setting on the PCC host:
vim /etc/ssh/ssh_config
Edit this setting so it reads: StrictHostKeyChecking no
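In /etc/ssh/ssh_config the entry may be commented out by default; uncomment it (or add it) so the file contains:
StrictHostKeyChecking no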
The script below (note: ssh-copy-id cannot take a password on the command line, so this version uses sshpass to supply it non-interactively; install sshpass on the PCC server first):
for dest in $(<hosts.txt); do
sshpass -p 'password' ssh-copy-id -i /root/.ssh/id_rsa.pub root@${dest}
done
Next, untar the install files:
tar -zxvf PCC-2.1.0-460.x86_64.tar.gz
tar zxf PHD-1.1.0.0-76.tar.gz
tar zxf PADS-1.1.x.x.tar.gz
Install the PCC server
cd PCC-2.1.0-460
./install
You can accept all the defaults for the install. After the install you can log in to the PCC server using the following URL and port number:
https://pcc.cto.emc.local:5443
The PCC install created a user called gpadmin with a password of Gpadmin1. This is the account used to log in to PCC and will be used across the cluster.
Install PHD and Hawq
From the command line, prepare the PCC server to install the PHD and Hawq nodes. You have to import the binaries to the PCC server and then prep the hosts.
su gpadmin
icm_client import -s PHD-1.1.1.0-82/
icm_client import -s PADS-1.1.4-34/
icm_client import -r jdk-6u26-linux-x64-rpm.bin
These 3 commands import the binaries to the PCC server to be used during cluster deployment.
Note: Download the Java file from Oracle and place it in the /tmp/PHD directory.
icm_client preparehosts --hostfile=/tmp/PHD/hosts.txt --java=/tmp/PHD/jdk-6u26-linux-x64-rpm.bin
This command preps the hosts by creating a gpadmin user and installing java.
icm_client scanhosts -f /tmp/PHD/hosts.txt
Resolve any errors that the scan shows. Re-run the command once any errors are corrected.
icm_client prepare-hawq-hosts -f /tmp/hawq.txt -g /usr/lib/gphd/gphdmgr/hawq_sys_config
This command prepares the Hawq servers.
From the web GUI, log in to the PCC server as gpadmin. Select the Create Cluster button. A wizard will launch.
Give the cluster a name and enter the FQDNs of the hosts used in the cluster. Do not include the GemFire or PCC host. Do include the Hawq server.
Enter the root password for the hosts and a password for the gpadmin user. The preparehosts command that was run at the command line has already created the gpadmin user. Enter the name of the JDK file that you downloaded and told the icm_client command to use. This version was installed during the preparehosts command.
During setup of my nodes I disabled SELinux and iptables and set up NTP. If you didn't do those things you can have the wizard do them for you.
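If you prefer to do that prep from the command line instead of the wizard, the same loop pattern from earlier works. This is just a sketch assuming RHEL/CentOS 6 style service commands; adjust for your environment:
for dest in $(<hosts.txt); do
ssh root@${dest} "setenforce 0; chkconfig iptables off; service iptables stop; ntpdate ntp.server.com"
done
Note that setenforce 0 only disables SELinux until the next reboot; to make it permanent, set SELINUX=disabled in /etc/selinux/config on each host.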
You can run one last scanhosts to verify the nodes are prepared. This is the same command run at the CLI. The hosts are scanned to verify they are prepared to have the components installed on them.
Next we assign roles. See part 2 for the role assignments used in this environment. Note that only hosts set up by the wizard can have roles assigned to them.
Note: we changed this after the original install and also installed the clients on our SpringXD hosts as well as node1.
Even though we are using Isilon for HDFS we must install HDFS. The installer will bomb if we don't. This does mean that by default any namenode entries in config files will be pointed to phdnode1. We can edit these at the end of the wizard. The next set of screenshots shows the service roles and the nodes we assigned to them.
After the roles are assigned, select the libraries to be installed. In the next section we edit some config files to use Isilon for HDFS instead of the local HDFS. On the left side are the names of the services and their corresponding config files.
Select the core-site.xml for HDFS and change the hdfs value to be the FQDN of your Isilon SmartConnect address with the port number 8020.
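The relevant property in core-site.xml would end up looking something like this (the SmartConnect name below is just a placeholder, and depending on the Hadoop version the property may appear as fs.defaultFS or the older fs.default.name):
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://isilon.cto.emc.local:8020</value>
</property>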
In the Hawq section, select the gpinitsystem_config file and note the location of the MASTER_DIRECTORY. Change this to /hawq_data.
In the gpinitsystem_config file, change the DFS_URL to the Isilon FQDN with the 8020 port number and the location of the Hawq master directory.
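Put together, those two gpinitsystem_config entries would look something like this (the Isilon hostname and path are placeholders for your own values):
MASTER_DIRECTORY=/hawq_data
DFS_URL=isilon.cto.emc.local:8020/hawq_data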
In the yarn-site.xml change the yarn.log-aggregation-enable value to false (shown in the snippet below). This is a recommended setting for Isilon. Finally, in the HBase config file change the DFS_URL to the Isilon FQDN. I will add a screenshot for this later as I forgot to take one during the initial install :) For this project we won't be using HBase so it's not necessary to configure it to use Isilon for HDFS.
After you have edited all the files, click Install. In the upper right corner is a link to save the XML config file for the install. You can save this file and use it if you ever have to do a reinstall. After the install completes, click the Start Cluster button and the cluster and all the services will start.
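For reference, the yarn-site.xml property mentioned above would look like this:
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>false</value>
</property>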
Next up will be initializing Hawq.