ViPR HDFS is a POSIX-like Hadoop Compatible File System (HCFS) that enables you to run Hadoop 2.0 applications on top of your ViPR storage infrastructure. You can configure your Hadoop distribution to run against the built-in Hadoop file system, against ViPR HDFS, or against any combination of HDFS, ViPR HDFS, and other HCFSs available in your environment. When you run Hadoop applications on top of ViPR, the ViPR HDFS software manages both the metadata and the data access, so the Hadoop NameNode and Hadoop DataNodes do not perform those functions.
The core-site.xml file contains configuration information about HDFS, and configuring Hadoop to use ViPR for HDFS requires changes to it. The latest release of ViPR integrates the data services API with Hadoop by deploying a Java client JAR file on each worker node as a Hadoop Compatible File System (HCFS). The JAR file is placed in the Hadoop classpath on each node.
Modifying the core-site.xml to add the ViPRFS Java classes and define the ViPRFS URI allows a Hadoop cluster to point to the ViPR HDFS implementation. MapReduce jobs can then read and write data directly on the ViPRFS file system.
By using EMC ViPR as an HCFS for Cloudera, the local HDFS services, including the NameNode and DataNode services, are no longer needed. Once the custom core-site.xml is configured, the local HDFS service will be disabled.
For the configuration example below, the fs.defaultFS is viprfs://ObjectBucket.ObjectTenant.HSK.
In a previous post I showed how to set up object data services within ViPR. Building on that example, we would create a tenant called "ObjectTenant" that has access to the object data store. We would then create a bucket from the ViPR service catalog called "ObjectBucket" that supplies both S3 and HDFS access to the object data store (example of bucket creation here). Finally, "HSK" in the defaultFS name identifies a particular ViPR instance, meaning that a single core-site.xml could be configured to access multiple ViPR data services instances.
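Putting those three pieces together, the authority section of the viprfs URI breaks down as follows (using the example names from this guide):

```
viprfs://<bucket>.<tenant>.<installation_name>
viprfs://ObjectBucket.ObjectTenant.HSK
```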
The table below shows the configuration that needs to be placed into the core-site.xml file.
| Property | Description |
| --- | --- |
| &lt;property&gt; &lt;name&gt;fs.defaultFS&lt;/name&gt; &lt;value&gt;viprfs://ObjectBucket.ObjectTenant.HSK&lt;/value&gt; &lt;/property&gt; | File system default name. ObjectBucket = the name of the bucket; ObjectTenant = the tenant name; HSK = the FS installation name defined in the next property. |
| &lt;property&gt; &lt;name&gt;fs.vipr.installations&lt;/name&gt; &lt;value&gt;HSK&lt;/value&gt; &lt;/property&gt; | Add this custom ViPR HDFS property as a comma-separated list of names. The names are further defined by the fs.vipr.installation.[installation_name].hosts property to uniquely identify sets of ViPR data nodes. The names are used as a component of the authority section of the ViPR HDFS file system URI. |
| &lt;property&gt; &lt;name&gt;fs.vipr.installation.HSK.hosts&lt;/name&gt; &lt;value&gt;10.10.81.160&lt;/value&gt; &lt;/property&gt; | Add this custom ViPR HDFS property specifying, for each name listed in fs.vipr.installations, the IP addresses of the ViPR cluster's data nodes or the load balancer, as a comma-separated list. The only other value that needs to change is "HSK", which is defined in the previous property. IP addresses can also be FQDNs. |
| &lt;property&gt; &lt;name&gt;fs.viprfs.impl&lt;/name&gt; &lt;value&gt;com.emc.hadoop.fs.vipr.ViPRFileSystem&lt;/value&gt; &lt;/property&gt; | Custom ViPR file system property. No changes necessary. |
| &lt;property&gt; &lt;name&gt;fs.AbstractFileSystem.viprfs.impl&lt;/name&gt; &lt;value&gt;com.emc.hadoop.fs.vipr.ViPRAbstractFileSystem&lt;/value&gt; &lt;/property&gt; | Custom ViPR file system property. No changes necessary. |
| &lt;property&gt; &lt;name&gt;fs.permissions.umask-mode&lt;/name&gt; &lt;value&gt;000&lt;/value&gt; &lt;/property&gt; | Custom ViPR file system property. No changes necessary. |
| &lt;property&gt; &lt;name&gt;fs.viprfs.auth.anonymous_translation&lt;/name&gt; &lt;value&gt;CURRENT_USER&lt;/value&gt; &lt;/property&gt; | Custom ViPR file system property. No changes necessary. |

ViPR custom core-site.xml properties and values
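Assembled into a single file, the custom section of core-site.xml for this guide's example values (ObjectBucket, ObjectTenant, HSK, and the 10.10.81.160 data services address; substitute your own) would look like this:

```xml
<configuration>
  <!-- File system default name: viprfs://<bucket>.<tenant>.<installation_name> -->
  <property>
    <name>fs.defaultFS</name>
    <value>viprfs://ObjectBucket.ObjectTenant.HSK</value>
  </property>
  <!-- Comma-separated list of ViPR installation names -->
  <property>
    <name>fs.vipr.installations</name>
    <value>HSK</value>
  </property>
  <!-- Comma-separated list of ViPR data node or load balancer addresses -->
  <property>
    <name>fs.vipr.installation.HSK.hosts</name>
    <value>10.10.81.160</value>
  </property>
  <!-- ViPR file system implementation classes: no changes necessary -->
  <property>
    <name>fs.viprfs.impl</name>
    <value>com.emc.hadoop.fs.vipr.ViPRFileSystem</value>
  </property>
  <property>
    <name>fs.AbstractFileSystem.viprfs.impl</name>
    <value>com.emc.hadoop.fs.vipr.ViPRAbstractFileSystem</value>
  </property>
  <property>
    <name>fs.permissions.umask-mode</name>
    <value>000</value>
  </property>
  <property>
    <name>fs.viprfs.auth.anonymous_translation</name>
    <value>CURRENT_USER</value>
  </property>
</configuration>
```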
| Hadoop distribution | ViPR HDFS JAR |
| --- | --- |
| Pivotal | hadoop-2.0.x-alpha-viprfs-1.0.1.jar |
| Cloudera | hadoop-2.0.x-alpha-viprfs-1.0.1.jar |
| Hortonworks | hadoop-2.2.viprfs-1.0.1.jar |

ViPR Java client version
This section shows you how to use the Hortonworks Ambari Server to push a new core-site.xml file to all the nodes that make up your HDP cluster. For Hortonworks, the best way to push this configuration to the nodes is the Ambari Server GUI.
1. On the dashboard screen of the Ambari server, click the HDFS service in the left panel.
2. When the service page opens, click Configs.
3. Scroll down the page until you see the "Custom core-site.xml" tab, and click it to expand.
4. Click Add Property.
5. Add a property name and value for each property found in the table above. Do not add the fs.defaultFS property here; that will be done in the next step. The screen shot above shows the ViPR custom properties and values for this guide.
6. Click the Advanced tab above the Custom core-site.xml tab.
7. For the fs.defaultFS property, enter the value for the bucket, tenant, and FS installation name.
8. Save your changes by clicking Save in the bottom right corner.
9. The configuration is saved and the new core-site.xml is pushed to all the managed clients.
10. On the dashboard, the HDFS, YARN, and MapReduce2 services all show as needing a restart. The ViPR HDFS software manages both the metadata and the data access, so the Hadoop NameNode and DataNodes no longer perform those functions. There is no restart button, so stop and then start the services.
11. From the dashboard, click the HDFS service, then click Stop. Repeat for the YARN and MapReduce2 services.
12. From the dashboard, click the HDFS service and click Start. The service start will fail, but it will push the client config to all nodes. Then start the MapReduce2 and YARN services.
13. From one of the HDP nodes, verify the core-site.xml file using the cat and grep commands: cat /etc/hadoop/conf/core-site.xml | grep HSK
14. Upload the ViPR Java client JAR to the HDP manager. Use SCP to copy the file to all HDP nodes, placing it in the Hadoop classpath directory /usr/lib/hadoop-hdfs/lib: scp hadoop-2.2-viprfs-1.0.1.jar [email protected]:/usr/lib/hadoop-hdfs/lib/ Repeat this for all nodes.
15. Verify connectivity by running a hadoop list command: hadoop fs -ls / and hadoop fs -ls viprfs://ObjectBucket.ObjectTenant.HSK:9040/
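The per-node JAR copy in the upload step can be scripted rather than repeated by hand. A minimal sketch, assuming a hypothetical space-separated list of node addresses (substitute your own HDP nodes); the echo prefix prints each scp command for review, so drop it to perform the actual copies:

```shell
# Hypothetical HDP node addresses -- replace with your own.
NODES="192.168.3.21 192.168.3.22 192.168.3.23"
JAR="hadoop-2.2-viprfs-1.0.1.jar"
DEST="/usr/lib/hadoop-hdfs/lib/"

for node in $NODES; do
  # echo prints the command instead of running it; remove echo to copy for real.
  echo scp "$JAR" "root@${node}:${DEST}"
done
```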