One of the benefits of VMware Big Data Extensions (BDE) is the ability to configure, deploy, and run multiple Hadoop distributions from different vendors. When you deploy the Big Data Extensions vApp, the Apache Hadoop 1.2.1 distribution is included in the OVA that you download and deploy. You can add and configure other Hadoop distributions, like PivotalHD, using Yellowdog Updater, Modified (YUM). YUM is an open-source command-line package-management utility that provides automatic updates and package and dependency management on RPM-based Linux distributions like CentOS. PivotalHD and Cloudera distributions require a YUM repository on the Serengeti management server to host the RPMs for the Hadoop distribution.
This blog shows how to set up PivotalHD for deployment by BDE. To use PivotalHD with VMware Big Data Extensions, you must first set up a YUM repo and create a CentOS 6 template (see my previous blog on setting up a custom template). The YUM repo holds the RPMs that are required to install PivotalHD. These RPMs can be found here:
http://gopivotal.com/pivotal-products/data/pivotal-hd#4
http://bitcast-a.v1.o1.sjc1.bitgravity.com/greenplum/pivotal-sw/phd_1.0.1.0-19_community.tar.gz
VMware Big Data Extensions supports PivotalHD version 1 on Red Hat Enterprise Linux 6 and its derivatives, such as CentOS 6.
After downloading the RPMs and creating a repo, a configuration script is used to configure the BDE automation. VMware Big Data Extensions uses a Ruby script called config-distro.rb, located in the /opt/serengeti/sbin directory on the Serengeti management server. This script sets up the Chef manifests that are used to automate Hadoop cluster deployments. We run this utility and give it the correct distro information for each distribution we want to deploy.
When the Serengeti vApp is deployed, a template VM is deployed along with the management server VM. The template VM runs CentOS 5 and is used to deploy all the nodes that make up a Hadoop cluster. The management VM uses Chef to deploy the packages to the template and configure it accordingly. Because PivotalHD is supported on CentOS 6, the default template has to be replaced with the custom CentOS 6 template mentioned above.
Below is how to set up the Serengeti management server for PivotalHD.
Log in to the management server using either PuTTY or the VMware console.

Change to the temp directory:
    cd /tmp

Download the RPMs from the Pivotal web site using the wget command:
    wget bitcast-a.v1.o1.sjc1.bitgravity.com/greenplum/pivotal-sw/phd_1.0.1.0-19_community.tar.gz

Extract the contents of the downloaded file:
    tar -zxvf phd_1.0.1.0-19_community.tar.gz

tar will report some errors as the content extracts. This is normal.

Once extraction is complete, change into the extracted directory:
    cd PHD_1.0.1_CE

There are three archives in this directory that need to be extracted:
    tar -xf PHD-1.0.1.0-19.tar.gz
    tar -xf PHDTools-1.0.1-19.tar.gz
    tar -xf PCC-2.0.1.84.121.163.x86_64.tar.g

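If you would rather not type each archive name, the extraction step can be sketched as a small loop. This is only an illustration: the function name extract_all is my own, and it assumes every archive in the directory ends in .tar.gz and should be unpacked in place.

```shell
# extract_all DIR: unpack every *.tar.gz archive found in DIR, in place.
# Hypothetical helper for illustration; not part of BDE or PivotalHD.
extract_all() {
  local dir="$1" archive
  for archive in "$dir"/*.tar.gz; do
    # If the glob matched nothing, the literal pattern is left behind; skip it.
    [ -e "$archive" ] || continue
    tar -xf "$archive" -C "$dir" || return 1
  done
}
```

From inside PHD_1.0.1_CE you would call it as: extract_all .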
Once the files are extracted, create a directory to hold the RPMs and move the extracted directories into it:
    mkdir -p /opt/serengeti/www/PHD/1
    mv PHD-1.0.1.0-19 /opt/serengeti/www/PHD/1/
    mv PCC-2.0.1.84 /opt/serengeti/www/PHD/1/
    mv PHDTools-1.0.1-19 /opt/serengeti/www/PHD/1
    cd /opt/serengeti/www/PHD/1/

Listing the directory should show the three directories moved in the last step.

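The same check can be scripted before building the repo. The sketch below is a hypothetical helper (check_phd_dirs is my own name, not a BDE tool) that simply tests for the three directory names used in the mv commands above.

```shell
# check_phd_dirs DIR: succeed only if the three extracted PivotalHD
# directories are present under DIR. Hypothetical helper for illustration.
check_phd_dirs() {
  local repo_root="$1" missing=0 d
  for d in PHD-1.0.1.0-19 PCC-2.0.1.84 PHDTools-1.0.1-19; do
    if [ ! -d "${repo_root}/${d}" ]; then
      echo "missing: ${d}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

On the management server you would call it as: check_phd_dirs /opt/serengeti/www/PHD/1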
Create a YUM repo by executing the following command in that directory:
    createrepo .

Create and edit the repo file:
    touch PHD.repo
    vim PHD.repo

Enter the following into the file:
    [PHD]
    name=Pivotal HD Version 1
    baseurl=https://10.10.81.36/PHD/1/
    enabled=1
    gpgcheck=0
    protect=1
NOTE: The address in baseurl should be the IP address of the management server. Running ifconfig from the command line will give you this address. Save the file.

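If you prefer not to edit the file by hand, the repo file can also be generated from the shell. This is a minimal sketch: write_phd_repo is a hypothetical helper name of my own, and the file contents are the same six lines shown in this step with the IP address substituted in.

```shell
# write_phd_repo DIR IP: write PHD.repo into DIR, pointing baseurl at the
# management server address IP. Hypothetical helper for illustration.
write_phd_repo() {
  local repo_dir="$1" server_ip="$2"
  cat > "${repo_dir}/PHD.repo" <<EOF
[PHD]
name=Pivotal HD Version 1
baseurl=https://${server_ip}/PHD/1/
enabled=1
gpgcheck=0
protect=1
EOF
}
```

With the address used in this post, the call would be: write_phd_repo /opt/serengeti/www/PHD/1 10.10.81.36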
Open a browser and enter the URL https://10.10.81.36/PHD/1/PHD.repo. You should see the contents of the repo file from the last step.

Use the config-distro.rb command to create the correct settings for the Chef manifest:
    config-distro.rb --name PivotalHD --vendor PHD --version 1.0.1 --repos http://10.10.81.36/PHD/1/PHD.repo

Change directory and run the cat command on the manifest file to check its contents:
    cd /opt/serengeti/www/distros
    cat manifest
The end of the file should contain an entry for the PivotalHD distribution you just configured. This directory also contains a file called manifest.sample, which shows a sample of how distributions should appear in the manifest file.

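Instead of reading the whole file, you can search it for the distribution name. The sketch below is a hypothetical helper (manifest_has_distro is my own name), and it assumes the name passed to --name appears in the manifest wrapped in double quotes; check manifest.sample if your file is laid out differently.

```shell
# manifest_has_distro FILE NAME: succeed if FILE contains NAME in double
# quotes. Assumption: distribution names are stored as quoted strings.
manifest_has_distro() {
  local manifest="$1" distro="$2"
  grep -qF -- "\"${distro}\"" "$manifest"
}
```

For this post the call would be: manifest_has_distro /opt/serengeti/www/distros/manifest PivotalHD && echo found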
Change directory and open the map file:
    cd /opt/serengeti/www/specs
    vim map

Scroll through the file until you find the “PHD” section. Verify that the version number matches the version you downloaded and set the repo up with. Close the file without saving.

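Since nothing in the map file is being changed, a read-only search works as well as opening it in an editor. A minimal sketch, where show_vendor_lines is a hypothetical helper name and the only assumption is that the vendor string PHD appears literally in the file:

```shell
# show_vendor_lines FILE VENDOR: print every line of FILE that mentions
# VENDOR, prefixed with its line number. Hypothetical helper for illustration.
show_vendor_lines() {
  local map_file="$1" vendor="$2"
  grep -nF -- "$vendor" "$map_file"
}
```

For this post the call would be: show_vendor_lines /opt/serengeti/www/specs/map PHD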
Restart the Tomcat service:
    service tomcat restart

In the VMware web client, go to the Big Data Extensions tab and click on Hadoop Distributions. You should see that the PivotalHD distribution, version 1.0.1 (the version passed to config-distro.rb), is now ready. This verifies the contents of the manifest file. You will always have the Apache distribution listed in addition to any other configured distributions.

Click on the Big Data Clusters tab and select Deploy Cluster. Under the Hadoop distribution drop-down, select PivotalHD. All deployment types should be available. This verifies the contents of the map file. Check out the EMC Hadoop starter kit for more information.