Hadoop Nebraska

by admin last modified 2008-11-05 08:44

The configuration and file locations used by Hadoop at Nebraska

Hadoop locations:

  • Hadoop core (version 0.18.1): /opt/osg/osg-100/hadoop
  • Hadoop GridFTP server: /opt/osg/osg-100/gridftp_hdfs
    • Both are installed in the same pacman install as the OSG.
    • Hence, no additional installation steps are needed on a worker node.
  • Hadoop files (data files and the name information) are kept on the worker nodes in:
    • /scratch/hadoop-*
  • Hadoop logs:
    • /scratch/hadoop-brian/logs/

Hadoop Servers

Unless otherwise noted, HADOOP_HOME=/opt/osg/osg-100/hadoop
  • hadoop-name: Primary nameserver.
  • node003: Primary Map-Reduce job server (unused)
  • Contents of /opt/osg/osg-100/hadoop/conf/slaves: The data nodes currently being used
  • dcache07 and srm: SRM server.
    • To start: /opt/bestman/sbin/SXXbestman start
    • To stop: /opt/bestman/sbin/SXXbestman stop
    • Endpoint: srm://dcache07.unl.edu:8443/srm/v2/server
    • Example SURL: srm://dcache07.unl.edu:8443/srm/v2/server?SFN=/mnt/hadoop/user/brian/testfile
    • For now, the FUSE file system must be mounted by user uscms01 (but still with the option -o-oallow_other).
    • On this server, HADOOP_HOME is /opt/osg/osg-100-test/hadoop
  • GridFTP servers (port 5000, using xinetd):
    • dcache06
    • dcache08
    • dcache-s01
    • dcache-s07
    • dcache-s08
    • dcache-s05
    • dcache-s10
    • dcache-s09
    • dcache-fc01
    • dcache04
    • dcache05
    • dcache03
    • dcache01
  • Not GridFTP servers
    • dcache-s11
    • dcache-head
    • dcache-pnfs
    • srm
    • dcache07
    • dcache09
    • dcache10
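
The example SURL in the SRM entry above is the endpoint followed by ?SFN= and the file's path under the FUSE mount. A minimal sketch of that construction follows; the helper name make_surl is hypothetical and not part of the site's tooling.

```shell
#!/bin/sh
# Endpoint copied from the SRM server entry above.
SRM_ENDPOINT="srm://dcache07.unl.edu:8443/srm/v2/server"

# make_surl is a hypothetical helper: it appends the site file name (SFN),
# i.e. the path as seen under /mnt/hadoop, to the SRM endpoint.
make_surl() {
    printf '%s?SFN=%s\n' "$SRM_ENDPOINT" "$1"
}

# Reproduces the example SURL listed above.
make_surl /mnt/hadoop/user/brian/testfile
```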

Hadoop Control

Start/stop Hadoop:
  • Log in to hadoop-name as root
  • Source /opt/osg/osg-100/setup.sh
  • To start everything, run start-all.sh
  • To stop everything, run stop-all.sh
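
The start/stop steps above could be wrapped roughly as follows. This is only a sketch: the function name hadoop_ctl and the DRY_RUN switch are assumptions added for illustration, and it assumes that sourcing setup.sh puts start-all.sh and stop-all.sh on the PATH. By default it only prints what it would run.

```shell
#!/bin/sh
# Sketch only: intended to be run as root on hadoop-name.
OSG_SETUP=/opt/osg/osg-100/setup.sh

# hadoop_ctl is a hypothetical wrapper around the steps listed above.
hadoop_ctl() {
    case "$1" in
        start) script=start-all.sh ;;
        stop)  script=stop-all.sh ;;
        *)     echo "usage: hadoop_ctl start|stop" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-1}" = 1 ]; then
        # Default: print the commands without touching the cluster.
        echo ". $OSG_SETUP && $script"
    else
        # Source the OSG environment, then run the Hadoop script.
        . "$OSG_SETUP" && "$script"
    fi
}

hadoop_ctl start
```

Setting DRY_RUN=0 before calling hadoop_ctl would actually execute the scripts.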

Mount the file system (the FUSE kernel modules must be installed):
  • Switch to root.
    • Make /mnt/hadoop if it does not already exist.
  • source /opt/osg/osg-100/setup.sh
  • fuse_dfs --server=hadoop-name --port=9000 /mnt/hadoop -o-oallow_other
  • You will see the following output:
    didn't recognize /mnt/hadoop
    didn't recognize -oallow_other
    This is harmless, as long as there aren't any other messages.
  • Check the mount:
    • ls /mnt/hadoop/
    • cat /mnt/hadoop/hello_world
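
The mount steps above can be collected into one script, sketched below. The names run and mount_hadoop and the DRY_RUN switch are hypothetical; the fuse_dfs arguments are copied unchanged from the step above. The sketch assumes fuse_dfs returns once the mount is in place, so the checks can follow directly.

```shell
#!/bin/sh
# Sketch only: intended to be run as root on a node with the FUSE modules loaded.
MOUNT_POINT=/mnt/hadoop

# run: print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

mount_hadoop() {
    if [ "${DRY_RUN:-1}" != 1 ]; then
        # Pick up the OSG environment so fuse_dfs can be found.
        . /opt/osg/osg-100/setup.sh || return 1
    fi
    run mkdir -p "$MOUNT_POINT" &&
    run fuse_dfs --server=hadoop-name --port=9000 "$MOUNT_POINT" -o-oallow_other &&
    run ls "$MOUNT_POINT"/ &&
    run cat "$MOUNT_POINT"/hello_world
}

mount_hadoop
```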

Hadoop Web Interfaces

Via port forwarding, you can view the Hadoop cluster's webpage and job tracker webpage using the following URLs:
If you are connected directly to the cluster (i.e., running the browser from within the private network), you may additionally browse the filesystem and the logs of individual data nodes.

Hadoop User Commands

  • Most users will interact with HDFS through the "hadoop fs" set of commands.
  • Handy pre-installed aliases:
    • hls [hadoop path]: Lists the directory on HDFS pointed to by [hadoop path]
    • hget [hadoop path] [local path]: Copy file from Hadoop into the local filesystem
    • hput [local path] [hadoop path]: Copy file from local file system into Hadoop
    • hmkdir [hadoop path]: Make a directory in Hadoop
    • hdu [hadoop path]: Report disk usage (du) for a specific Hadoop path
    • hrm [hadoop path]: Remove [hadoop path] from the Hadoop FS.
    • hstat [hadoop path]: Perform a stat on [hadoop path]; returns last modification time
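
This page does not show how the aliases are defined. One plausible way to get the same behavior is a set of thin shell functions over the corresponding "hadoop fs" subcommands, as sketched below; the mapping is an assumption based on the descriptions above, not the site's actual definitions.

```shell
# Assumed definitions: each helper forwards its arguments to "hadoop fs".
hls()    { hadoop fs -ls "$@"; }      # list a directory on HDFS
hget()   { hadoop fs -get "$@"; }     # copy HDFS -> local file system
hput()   { hadoop fs -put "$@"; }     # copy local file system -> HDFS
hmkdir() { hadoop fs -mkdir "$@"; }   # make a directory in HDFS
hdu()    { hadoop fs -du "$@"; }      # disk usage of an HDFS path
hrm()    { hadoop fs -rm "$@"; }      # remove a path from HDFS
hstat()  { hadoop fs -stat "$@"; }    # stat a path (modification time)
```

With these loaded, a hypothetical session might be: hmkdir /user/brian/test, then hput testfile /user/brian/test/, then hls /user/brian/test.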

