EMC Isilon is a scale-out NAS solution that supports CIFS, NFS, and native HDFS. There is a ton of information out there about why Isilon is cool, but for HDFS it really addresses some issues with the "commodity model". By having a central repository for our HDFS data, we don't need to keep multiple distributed copies of the data as a traditional HDFS deployment does. We don't have to use HDFS tools to import data into the cluster; we can just use NFS or CIFS, or perhaps the data is already on our Isilon cluster. We can do things like replicate data, dedupe it, or give other applications easy access to the data without having to import/export. Bigger Hadoop clusters spend much of their life just ingesting data, and Isilon can seriously trim that time down! Add in that the license is free and the setup is simple, and there seems to be no reason not to use it!
This blog will show you how to configure your EMC Isilon array for use by HDFS in Hadoop environments.

Log on to your Isilon cluster

To add the HDFS license, click the Help button in the top right corner and select "About This Cluster".
HDFS is a free license available from Isilon.

Click Activate License and enter the code. Afterwards you should see the HDFS module listed.
NOTE: HDFS is a free license. You can obtain your code from your Isilon sales team.
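If you prefer to check from an SSH session, the isi license command can confirm the module is active; the exact output and subcommands vary by OneFS version (newer releases use isi license list), so treat this as a rough pointer:
isi license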

Next, click File System Management and then File System Explorer. In the right panel, highlight the root /ifs.
By default, HDFS has access to the root directory; you can change this at the command prompt. To create a specific directory for HDFS, click the Add Directory button in the middle of the page. You can also share this directory as an NFS export or CIFS share for easy ingestion of data for Hadoop.

Enter the name “Hadoop”
Give a user rights to the directory
Click Submit
NOTE: You can create a specific user to access the HDFS directory or you can use root.
The command to create a user from the CLI is:
isi auth users create --name="user1"
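If you'd rather create the directory from the shell instead of File System Explorer, plain Unix commands over SSH do the same job (user1 here is just the example account created above):
mkdir /ifs/Hadoop
chown user1 /ifs/Hadoop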

You will now see the Hadoop directory in the root

Create an NFS export of this directory using Unix Sharing under the Protocols tab.
Click “Add Export”


Enter the information for the share
Enter the path to the directory you just created
At the bottom of the page, click Save

Verify that the export was created successfully
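With the export in place, any Linux client can mount it and start dropping data in; the cluster name, mount point, and source path below are placeholders, so substitute your SmartConnect name or an IP and your own data:
mkdir -p /mnt/hadoop
mount -t nfs isilon.example.com:/ifs/Hadoop /mnt/hadoop
cp /path/to/local/data/* /mnt/hadoop/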

SSH into the Isilon cluster. Run the command:
isi hdfs
This verifies that HDFS is running and shows the root directory that HDFS will use

Change the HDFS root to /ifs/Hadoop by running this command:
isi hdfs --root-path=/ifs/Hadoop
Next, run:
isi hdfs
Running the command again confirms the new root path.

Another option is to change the block size. Block sizes can be up to 1 GB, and the type of data you will be analyzing with Hadoop determines the block size to use. This example changes the block size to 1 GB:
isi hdfs --block-size=1GB
Next, run isi hdfs to verify.
Most distributions use the user mapred for the JobTracker to access HDFS. To create that user and add it to the wheel group, SSH into the Isilon cluster and create the account as shown below.
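A sketch of that step, reusing the isi auth users create command from earlier; the --primary-group flag is an assumption here, and the exact way to put the user in wheel can differ between OneFS versions:
isi auth users create --name="mapred" --primary-group="wheel"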

The mapred user needs temp space on HDFS when map jobs are run. When I tested Apache, Hortonworks, and Pivotal, the /tmp folder was created automatically the first time a job ran and the mapred user was given permissions. For Cloudera we need to set up the /tmp structure and give it permissions ourselves. The setup is done from the CLI of the Isilon cluster.
Note: /ifs/Hadoop is the HDFS root.
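It comes down to two commands (the same ones a commenter shares below): create the mapred staging and system directories under /tmp, then hand ownership to mapred:
mkdir -pv /ifs/Hadoop/tmp/hadoop-mapred/mapred/{staging,system}
chown -R mapred:wheel /ifs/Hadoop/tmp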

The last picture can be done with just two commands:
# mkdir -pv /ifs/Hadoop/tmp/hadoop-mapred/mapred/{staging,system}
# chown -R mapred:wheel /ifs/Hadoop/tmp
Posted by: higkoo | 03/02/2014 at 02:30 AM
How does Isilon address copying petabytes of data from storage to data nodes for processing? And how does it address doing this several times a day, or for multiple jobs at the same time? Does this data have to go through the network?
Posted by: Nesa | 11/07/2014 at 03:53 AM