When deploying a Hadoop cluster with VMware Big Data Extensions (BDE), you have a few options for the type of cluster to deploy. The deployment types supported by BDE are Basic, HBase, Data/Compute separation, and Compute-only. To integrate EMC Isilon for HDFS, we use the Compute-only deployment type.
Deployment types are pre-defined in BDE. You can view these types by looking at the map file found in the /opt/serengeti/www/specs directory of the Serengeti vApp management server.
The diagram below shows the contents of the map file for PivotalHD. Each deployment type is defined with the vendor name, version number, deployment type, and path to the JSON file for that type.
To deploy a compute-only Hadoop cluster and map HDFS to Isilon, follow the steps below.
See my previous blog post on how to set up Isilon for HDFS.
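If you want to pull the deployment types for one vendor out of the map file programmatically, here is a minimal Python sketch. It assumes the map file is a JSON array of objects and that the keys are named `vendor`, `version`, `type`, and `path`. Those key names, the vendor string `PHD`, and every value in the sample below are my assumptions for illustration, not a copy of the real file, so check them against your own copy first.

```python
import json

# Hypothetical sample only: the real map file's key names, vendor strings,
# and paths may differ from what is shown here.
SAMPLE_MAP = """
[
  {"vendor": "PHD", "version": "1.0", "type": "Basic Hadoop Cluster",
   "path": "example/basic/spec.json"},
  {"vendor": "PHD", "version": "1.0", "type": "Compute Only Hadoop Cluster",
   "path": "example/compute_only/spec.json"},
  {"vendor": "OTHER", "version": "2.0", "type": "Basic Hadoop Cluster",
   "path": "example/other/spec.json"}
]
"""


def deployment_types(map_text, vendor):
    """Return {deployment type: spec path} for one vendor (case-insensitive)."""
    entries = json.loads(map_text)
    return {
        entry["type"]: entry["path"]
        for entry in entries
        if entry["vendor"].lower() == vendor.lower()
    }


if __name__ == "__main__":
    for name, path in deployment_types(SAMPLE_MAP, "PHD").items():
        print(f"{name}: {path}")
```

To run it against the real file, read the text of the map file from the /opt/serengeti/www/specs directory and pass it in place of `SAMPLE_MAP`.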
Log in to the vSphere Web Client. Select Big Data Extensions. Click Create a New Hadoop Cluster from the Basic Tasks list in the Getting Started tab. The Create New Hadoop Cluster dialog displays.
Give the cluster a name. Select a distribution, then select Compute-Only Hadoop Cluster as the deployment type.
Enter the HDFS RPC URL of the Isilon cluster. Format: hdfs://IsilonFQDN:8020
Select the resource size for the nodes. By default, the Compute Master and the Client Node are placed on shared storage. To change this, select Customize in the drop-down and choose local storage. Workers are placed on local storage by default.
Select a resource pool for the deployment.
Choose a network.
The cluster will begin deploying.
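A typo in the HDFS RPC URL is an easy mistake to make in the wizard, so it can be worth sanity-checking the value before you paste it in. Here is a minimal Python sketch that only checks the shape of the URL (an hdfs scheme, a host, and a numeric port). The function name `check_hdfs_url` and the host `isilon.example.com` are hypothetical, and this does not contact the Isilon cluster or confirm that the port is reachable.

```python
from urllib.parse import urlsplit


def check_hdfs_url(url):
    """Return (host, port) if url has the form hdfs://host:port.

    Raises ValueError otherwise. This checks syntax only; it makes no
    network connection.
    """
    parts = urlsplit(url)
    # urlsplit lowercases the scheme, so the comparison is case-insensitive.
    if parts.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URL, got scheme {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("no host name found in URL")
    port = parts.port  # raises ValueError itself if the port is not numeric
    if port is None:
        raise ValueError("no port found in URL (the example in this post uses 8020)")
    return parts.hostname, port


if __name__ == "__main__":
    # Hypothetical host name, used only for illustration.
    print(check_hdfs_url("hdfs://isilon.example.com:8020"))
```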
After deployment, you can check that HDFS is using Isilon by running the following command from any node. It lists the contents of the HDFS root directory on Isilon:
hadoop fs -ls /
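If you would rather script this check, here is a minimal Python sketch that runs the same listing command and pulls the paths out of its output. It assumes the usual eight-column listing format (permissions, replication, owner, group, size, date, time, path) and that the `hadoop` binary is on the PATH of the node. The function names `parse_ls` and `list_hdfs_root` are my own, not part of Hadoop or BDE.

```python
import shutil
import subprocess


def parse_ls(output):
    """Extract paths from `hadoop fs -ls` output.

    Assumes eight whitespace-separated columns per entry, with the path
    last. Header lines such as "Found N items" are skipped.
    """
    paths = []
    for line in output.splitlines():
        fields = line.split(None, 7)
        # Entry lines start with a permission string like "drwxr-xr-x".
        if len(fields) == 8 and fields[0][0] in "-d":
            paths.append(fields[7])
    return paths


def list_hdfs_root():
    """Run `hadoop fs -ls /` and return the listed paths."""
    result = subprocess.run(
        ["hadoop", "fs", "-ls", "/"],
        capture_output=True, text=True, check=True,
    )
    return parse_ls(result.stdout)


if __name__ == "__main__":
    if shutil.which("hadoop") is None:
        print("hadoop not found on PATH; run this on a cluster node")
    else:
        for path in list_hdfs_root():
            print(path)
```

Comparing the returned paths with the directories you created on the Isilon side is a quick way to confirm the cluster is reading from Isilon rather than from a local HDFS.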
See my previous blog post on how to set up Isilon for HDFS. The example uses PivotalHD, but the same steps work with any distribution that has been configured (future blog posts will cover configuring Hadoop distributions). For a fully detailed document on BDE deployments, download the EMC Hadoop Starter Kit: