ViPR HDFS is a POSIX-like Hadoop Compatible File System (HCFS) that enables you to run Hadoop 2.x applications on top of your ViPR storage infrastructure. You can configure your Hadoop distribution to run against the built-in Hadoop file system, against ViPR HDFS, or against any combination of HDFS, ViPR HDFS, and other HCFSs available in your environment. When you run Hadoop applications on top of ViPR, the ViPR HDFS software manages both the metadata and the data access, so the Hadoop NameNode and Hadoop data nodes do not perform those functions.
The core-site.xml file contains configuration information about HDFS. For Hadoop to use ViPR for HDFS, the core-site.xml file must be modified. The latest release of ViPR integrates the data services API with Hadoop through a Java client JAR file that is loaded on each worker node as a Hadoop Compatible File System (HCFS). The JAR file is placed in the Hadoop classpath on each node.
Modifying core-site.xml to add the ViPRFS Java classes and define the ViPRFS URI allows Hadoop clusters to point to the ViPR HDFS implementation. MapReduce jobs can then read and write data directly on the ViPRFS file system.
When EMC ViPR is used as an HCFS for Cloudera, the local HDFS service, including the NameNode and DataNode services, does not need to be used. Once the custom core-site.xml is configured, the local HDFS service will be disabled.
For the configuration example below, fs.defaultFS is viprfs://ObjectBucket.ObjectTenant.HSK
In a previous post I showed how to set up object data services within ViPR. Building on that example, we would create a tenant called "ObjectTenant" that has access to the object data store. We would then create a bucket called "ObjectBucket" from the ViPR service catalog that supplies both S3 and HDFS access to the object data store (Example of bucket creation here). Finally, "HSK" in the defaultFS name identifies a ViPR instance, which means that a single core-site.xml could be configured to access multiple ViPR data services instances.
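To make the three parts of the URI concrete, here is a minimal Python sketch that splits a viprfs URI into bucket, tenant, and installation name. The function and class names are hypothetical and not part of the ViPR client, and the sketch assumes that none of the three names contains a dot.

```python
from typing import NamedTuple, Optional


class ViprfsAuthority(NamedTuple):
    bucket: str
    tenant: str
    installation: str
    port: Optional[int]


def parse_viprfs_uri(uri: str) -> ViprfsAuthority:
    """Split viprfs://bucket.tenant.installation[:port]/path into its parts."""
    scheme, sep, rest = uri.partition("://")
    if scheme != "viprfs" or not sep:
        raise ValueError(f"not a viprfs URI: {uri!r}")
    # The authority is everything before the first slash of the path.
    authority = rest.split("/", 1)[0]
    host, _, port = authority.partition(":")
    parts = host.split(".")
    # Assumption: exactly three dot-separated, non-empty names.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected bucket.tenant.installation, got {host!r}")
    return ViprfsAuthority(parts[0], parts[1], parts[2], int(port) if port else None)


if __name__ == "__main__":
    print(parse_viprfs_uri("viprfs://ObjectBucket.ObjectTenant.HSK"))
```

The third component is the one that must also appear in fs.vipr.installations, so a check like this can catch a mismatch before the configuration is pushed to the cluster.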
The table below shows the configuration that needs to be placed into the core-site.xml file.
Property | Description
<property> <name>fs.defaultFS</name> <value>viprfs://ObjectBucket.ObjectTenant.HSK</value> </property> | File system default name. ObjectBucket = the name of the bucket; ObjectTenant = the tenant name; HSK = the FS installation name defined in the next property.
<property> <name>fs.vipr.installations</name> <value>HSK</value> </property> | Add this custom ViPR HDFS property as a comma-separated list of names. The names are further defined by the fs.vipr.installation.[installation_name].hosts property to uniquely identify sets of ViPR data nodes. The names are used as a component of the authority section of the ViPR HDFS file system URI.
<property> <name>fs.vipr.installation.HSK.hosts</name> <value>10.10.81.160</value> </property> | Add this custom ViPR HDFS property to specify the IP addresses of the ViPR cluster's data nodes, or of the load balancers, for each name listed in the fs.vipr.installations property. Specify the value as a comma-separated list of IP addresses; an FQDN can be used in place of an IP address. The only values that need to be changed are "HSK" in the property name, which is defined in the previous property, and the ViPR data services IP address.
<property> <name>fs.viprfs.impl</name> <value>com.emc.hadoop.fs.vipr.ViPRFileSystem</value> </property> | Custom ViPR file system property. No changes necessary.
<property> <name>fs.AbstractFileSystem.viprfs.impl</name> <value>com.emc.hadoop.fs.vipr.ViPRAbstractFileSystem</value> </property> | Custom ViPR file system property. No changes necessary.
<property> <name>fs.permissions.umask-mode</name> <value>000</value> </property> | Standard Hadoop umask property, set to 000 for ViPR. No changes necessary.
<property> <name>fs.viprfs.auth.anonymous_translation</name> <value>CURRENT_USER</value> </property> | Custom ViPR file system property. No changes necessary.
Table 4. ViPR custom Core-site.xml properties and values
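Since only the bucket, tenant, installation name, and host list differ between environments, the properties in Table 4 could be generated instead of typed by hand. The following Python sketch renders them as text that can be pasted into core-site.xml. The function names are hypothetical; the property names and fixed values are taken from the table.

```python
from xml.sax.saxutils import escape


def vipr_core_site_properties(bucket, tenant, installation, hosts):
    """Return the (name, value) pairs from Table 4 for one ViPR installation."""
    return [
        ("fs.defaultFS", f"viprfs://{bucket}.{tenant}.{installation}"),
        ("fs.vipr.installations", installation),
        (f"fs.vipr.installation.{installation}.hosts", ",".join(hosts)),
        ("fs.viprfs.impl", "com.emc.hadoop.fs.vipr.ViPRFileSystem"),
        ("fs.AbstractFileSystem.viprfs.impl",
         "com.emc.hadoop.fs.vipr.ViPRAbstractFileSystem"),
        ("fs.permissions.umask-mode", "000"),
        ("fs.viprfs.auth.anonymous_translation", "CURRENT_USER"),
    ]


def render_properties(pairs):
    """Render (name, value) pairs as <property> elements, one per block."""
    blocks = []
    for name, value in pairs:
        blocks.append(
            "<property>\n"
            f"  <name>{escape(name)}</name>\n"
            f"  <value>{escape(value)}</value>\n"
            "</property>"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    pairs = vipr_core_site_properties(
        "ObjectBucket", "ObjectTenant", "HSK", ["10.10.81.160"]
    )
    print(render_properties(pairs))
```

This sketch handles a single installation name. For several ViPR instances, fs.vipr.installations would hold a comma-separated list and each name would need its own hosts property.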
Hadoop distribution | ViPR HDFS JAR
Pivotal | hadoop-2.0.x-alpha-viprfs-1.0.1.jar
Cloudera | hadoop-2.0.x-alpha-viprfs-1.0.1.jar
Hortonworks | hadoop-2.2.viprfs-1.0.1.jar
ViPR Java client version
For Cloudera, the best way to push this configuration to the nodes is to use the Cloudera Manager GUI.
From the Cloudera Manager web page, click the HDFS service under Status to open it.
Click Configuration and choose View and Edit.
In the left panel, expand Service-Wide and click Advanced.
Click Cluster-wide Configuration Safety Valve for core-site.xml and enter the settings from Table 4.
(Figure: settings entered into the core-site.xml safety valve)
At the top right of the page, click Save Changes.
Go back to the Cloudera Manager Home screen. Using the drop-down, stop the HDFS service.
Using the drop-down next to Status, deploy the client configuration.
Using the drop-down, restart the MapReduce service.
Check that the nodes have received the correct configuration. From the CLI on a node, grep core-site.xml for HSK:
    cat /etc/hadoop/conf.cloudera.hdfs1/core-site.xml | grep HSK
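The grep above only confirms that the string HSK appears somewhere in the file. A slightly stricter check can be sketched in Python by parsing the deployed core-site.xml and listing any ViPR property that is missing or empty. The function names are hypothetical, and the list of required properties is my assumption based on the table of properties.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return a {name: value} dict from the text of a Hadoop configuration file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            props[name] = value
    return props


def missing_vipr_properties(xml_text, installation):
    """List the ViPR properties that are absent or empty for one installation."""
    props = read_properties(xml_text)
    # Assumption: these five are the properties the ViPR client depends on.
    required = [
        "fs.defaultFS",
        "fs.vipr.installations",
        f"fs.vipr.installation.{installation}.hosts",
        "fs.viprfs.impl",
        "fs.AbstractFileSystem.viprfs.impl",
    ]
    return [name for name in required if not props.get(name)]


if __name__ == "__main__":
    # Inline sample with two properties only, to show what gets reported.
    sample = """<configuration>
      <property><name>fs.defaultFS</name>
        <value>viprfs://ObjectBucket.ObjectTenant.HSK</value></property>
      <property><name>fs.vipr.installations</name><value>HSK</value></property>
    </configuration>"""
    print(missing_vipr_properties(sample, "HSK"))
```

On a cluster node, the same function could be given the contents of /etc/hadoop/conf.cloudera.hdfs1/core-site.xml, read in binary mode; an empty list would mean all five properties are present.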
Next, upload the ViPR Java client JAR file to one of the nodes using WinSCP, then copy it to the Hadoop classpath on all Hadoop nodes.
On the first node where you have uploaded the ViPR client, copy it to the Cloudera classpath:
    cp hadoop-2.0.x-alpha-viprfs-1.0.1.jar /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/hadoop/lib
Copy it to all the other nodes in your cluster using the scp command, replacing <user>@<node> with the login and address of each host. Do this for all hosts:
    scp hadoop-2.0.x-alpha-viprfs-1.0.1.jar <user>@<node>:/opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/hadoop/lib
Validate that the connection is working with a Hadoop list command, first using the ViPR URL and then with a normal Hadoop list:
    hadoop fs -ls viprfs://ObjectBucket.ObjectTenant.HSK:9040/
    hadoop fs -ls /
Note that on the first connection, the /tmp and /user directories are created.
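On a larger cluster, repeating the scp step by hand for every host gets tedious. The Python sketch below builds one scp command per node, using the JAR name and parcel path from the steps above. The host names and the root login are placeholders I made up for illustration. The script only prints the commands; running them is left to the reader.

```python
import shlex

JAR = "hadoop-2.0.x-alpha-viprfs-1.0.1.jar"
LIB_DIR = "/opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/hadoop/lib"


def scp_commands(hosts, user, jar=JAR, lib_dir=LIB_DIR):
    """Build one scp argument list per host, without executing anything."""
    return [["scp", jar, f"{user}@{host}:{lib_dir}"] for host in hosts]


if __name__ == "__main__":
    # Hypothetical node names; replace with the hosts of your own cluster.
    nodes = ["node2.example.com", "node3.example.com"]
    for cmd in scp_commands(nodes, user="root"):
        # Print a shell-safe version of each command for review.
        print(shlex.join(cmd))
```

Each argument list can be passed to subprocess.run(cmd, check=True) once the printed commands look right, which stops at the first host where the copy fails.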