This is a continuing blog series on how to build a data lake.
In part 2 I showed the architecture we are building for a data lake. In this post I will begin to show how to deploy and integrate it all together. We'll start with the base, Isilon for HDFS, and work our way up through Pivotal HD, HAWQ, and then GemFire to complete our analytics infrastructure.
Here is a list of the software versions we are deploying.
Software       | Version
PivotalHD      | 1.1.1.0-82
HAWQ (PADS)    | 1.1.4-34
GemFire (PRTS) | 1.0.0-9
OneFS          | 7.0.2.4
VMware         | 5.5.0
Prepare Isilon:
Check out this previous post on how to enable HDFS on Isilon. After enabling HDFS, we have to create the users, groups, and directory structure for the Pivotal suite. Create the users on the Isilon CLI using the pw command:
pw useradd mapred -G hadoop
Note that all users should be added to the hadoop group. Besides mapred, add the following users (see the scripted sketch after this list):
gpadmin
yarn
hadoop
hdfs
hbase
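Rather than running pw once per account, the whole set can be created in one pass. This is a minimal sketch, assuming the OneFS shell exposes the standard FreeBSD pw utility used above and that the hadoop group does not already exist; run it as root from the Isilon CLI.

# Create the hadoop group, then each service account as a member of it
pw groupadd hadoop
for u in mapred gpadmin yarn hadoop hdfs hbase; do
    pw useradd "$u" -G hadoop
done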
In the HDFS root, create the directory structure shown below with the listed owner, group, and permissions. This can be done from the Isilon CLI (a scripted sketch follows the list).
/user 1777 mapred:hadoop
/user/history 1777 mapred:hadoop
/user/hadoop 755 hadoop:hadoop
/user/mapred 755 mapred:hadoop
/user/hdfs 755 hdfs:hadoop
/tmp 1777 hdfs:hadoop
/yarn 1777 yarn:hadoop
/apps 755 hdfs:hadoop
/apps/hbase 755 hbase:hadoop
/hawq_data 1777 gpadmin:hadoop
/hive 1777 hdfs:hadoop
/hive/gphd 1777 hdfs:hadoop
/hive/gphd/warehouse 1777 hdfs:hadoop
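One way to script the whole tree from the Isilon CLI is sketched below. The HDFS_ROOT value is an assumption (OneFS defaults its HDFS root to /ifs); adjust it to match your cluster before running.

# Sketch: create each directory under the HDFS root with the
# permissions and ownership listed above
HDFS_ROOT=/ifs   # assumption: default OneFS HDFS root; change if yours differs
while read path mode owner; do
    mkdir -p "$HDFS_ROOT$path"
    chmod "$mode" "$HDFS_ROOT$path"
    chown "$owner" "$HDFS_ROOT$path"
done <<'EOF'
/user 1777 mapred:hadoop
/user/history 1777 mapred:hadoop
/user/hadoop 755 hadoop:hadoop
/user/mapred 755 mapred:hadoop
/user/hdfs 755 hdfs:hadoop
/tmp 1777 hdfs:hadoop
/yarn 1777 yarn:hadoop
/apps 755 hdfs:hadoop
/apps/hbase 755 hbase:hadoop
/hawq_data 1777 gpadmin:hadoop
/hive 1777 hdfs:hadoop
/hive/gphd 1777 hdfs:hadoop
/hive/gphd/warehouse 1777 hdfs:hadoop
EOF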
At this point Isilon has HDFS enabled, along with the directory structure and permissions for PHD, HAWQ, and GemFire to use.
Next up: installing PHD 1.1.