Hadoop Nebraska
The configuration and file locations used by Hadoop at Nebraska
Hadoop locations:
- Hadoop core (version 0.18.1): /opt/osg/osg-100/hadoop
- Hadoop GridFTP server: /opt/osg/osg-100/gridftp_hdfs
- Both are installed in the same pacman install as the OSG.
- Hence, nothing additional needs to be done to install them on a worker node.
- Hadoop files (data files and the name information) are kept on the worker nodes in:
- /scratch/hadoop-*
- Hadoop logs:
- /scratch/hadoop-brian/logs/
Hadoop Servers
Unless otherwise noted, HADOOP_HOME=/opt/osg/osg-100
- hadoop-name: Primary nameserver.
- node003: Primary Map-Reduce job server (unused)
- Contents of /opt/osg/osg-100/hadoop/conf/slaves: The data nodes currently being used
- dcache07 and srm: SRM server.
- To start: /opt/bestman/sbin/SXXbestman start
- To stop: /opt/bestman/sbin/SXXbestman stop
- Endpoint: srm://dcache07.unl.edu:8443/srm/v2/server
- Example SURL: srm://dcache07.unl.edu:8443/srm/v2/server?SFN=/mnt/hadoop/user/brian/testfile
- For now, fuse must be mounted by user uscms01 (but still with the option -o-oallow_other).
- HADOOP_HOME is /opt/osg/osg-100-test/hadoop
- GridFTP servers (port 5000, using xinetd):
- dcache06
- dcache08
- dcache-s01
- dcache-s07
- dcache-s08
- dcache-s05
- dcache-s10
- dcache-s09
- dcache-fc01
- dcache04
- dcache05
- dcache03
- dcache01
- Not GridFTP servers
- dcache-s11
- dcache-head
- dcache-pnfs
- srm
- dcache07
- dcache09
- dcache10
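The example SURL listed for the SRM server is the endpoint followed by `?SFN=` and the path as seen through the FUSE mount. A minimal shell sketch of that composition follows, assuming every file under /mnt/hadoop maps the same way. The function name `hadoop_surl` and the variable `SRM_ENDPOINT` are hypothetical and not part of the installation.

```shell
# Hypothetical helper: build an SRM URL (SURL) for a path under the FUSE mount.
# The endpoint is the one listed above for dcache07.
SRM_ENDPOINT="srm://dcache07.unl.edu:8443/srm/v2/server"

hadoop_surl() {
    # $1: absolute path as seen through the FUSE mount
    case "$1" in
        /mnt/hadoop/*)
            printf '%s?SFN=%s\n' "$SRM_ENDPOINT" "$1"
            ;;
        *)
            echo "hadoop_surl: path must be under /mnt/hadoop" >&2
            return 1
            ;;
    esac
}

# Reproduces the example SURL from the list above.
hadoop_surl /mnt/hadoop/user/brian/testfile
```

Paths outside /mnt/hadoop are rejected, so a mistyped local path does not silently turn into a SURL.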
Hadoop Control
Start/stop Hadoop:
- Log in to hadoop-name as root
- Source /opt/osg/osg-100/setup.sh
- To start everything, run start-all.sh
- To stop everything, run stop-all.sh
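The start/stop steps above could be wrapped in one small function. This is only a sketch: the names `hadoop_ctl` and `hadoop_ctl_script` are hypothetical, and it assumes that sourcing setup.sh puts start-all.sh and stop-all.sh on the PATH, as the steps imply.

```shell
# Hypothetical wrapper around the start/stop steps; run as root on hadoop-name.

hadoop_ctl_script() {
    # Map an action to the control script named in the steps above.
    case "$1" in
        start) echo "start-all.sh" ;;
        stop)  echo "stop-all.sh" ;;
        *)     echo "usage: hadoop_ctl start|stop" >&2; return 1 ;;
    esac
}

hadoop_ctl() {
    # Refuse anything other than start or stop before touching the environment.
    script=$(hadoop_ctl_script "$1") || return 1
    # Assumption: setup.sh puts the Hadoop control scripts on PATH.
    . /opt/osg/osg-100/setup.sh
    "$script"
}
```

With this loaded in a root shell on hadoop-name, `hadoop_ctl start` and `hadoop_ctl stop` would run the two steps in order.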
Mount the file system (must have the FUSE kernel modules installed!)
- Switch to root.
- Make /mnt/hadoop if it does not already exist.
- source /opt/osg/osg-100/setup.sh
- fuse_dfs --server=hadoop-name --port=9000 /mnt/hadoop -o-oallow_other
- You will see the following output:
didn't recognize /mnt/hadoop
didn't recognize -oallow_other
This is harmless, as long as there aren't any other messages.
- Check the mount:
- ls /mnt/hadoop/
- cat /mnt/hadoop/hello_world
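The mount steps above can be collected into a single function. The sketch below copies the commands verbatim from the list. The names `run` and `mount_hadoop` and the `DRY_RUN` switch are hypothetical additions for illustration, and the function assumes it is called from a root shell.

```shell
# Hypothetical wrapper for the mount steps; run as root.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

mount_hadoop() {
    # Make the mount point if it does not already exist.
    run mkdir -p /mnt/hadoop || return 1

    # Source the OSG environment in the current shell
    # (presumably this is what makes fuse_dfs available).
    if [ -n "$DRY_RUN" ]; then
        echo ". /opt/osg/osg-100/setup.sh"
    else
        . /opt/osg/osg-100/setup.sh || return 1
    fi

    # Mount HDFS through FUSE, with the options exactly as given above.
    run fuse_dfs --server=hadoop-name --port=9000 /mnt/hadoop -o-oallow_other || return 1

    # Check the mount.
    run ls /mnt/hadoop/
}
```

Running `DRY_RUN=1 mount_hadoop` first prints the four commands without touching the system, which is a cheap way to review them before mounting for real.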
Hadoop Web Interfaces
Via port forwarding, you can view the Hadoop cluster's webpage and job tracker webpage using the following URLs:
If you are connected directly to the cluster (i.e., running the browser from within the private network), you may additionally browse the filesystem and individual data nodes' logs.
Hadoop User Commands
- By default, a user will want to interact with the "hadoop fs" set of commands.
- Handy pre-installed aliases:
- hls [hadoop path]: Lists the directory on HDFS pointed to by [hadoop path]
- hget [hadoop path] [local path]: Copy file from Hadoop into the local filesystem
- hput [local path] [hadoop path]: Copy file from local file system into Hadoop
- hmkdir [hadoop path]: Make a directory in Hadoop
- hdu [hadoop path]: Report disk usage (du) for a specific Hadoop path
- hrm [hadoop path]: Remove [hadoop path] from the Hadoop FS.
- hstat [hadoop path]: Perform a stat on [hadoop path]; returns last modification time
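The actual alias definitions are not shown on this page. As a rough guide, each alias listed above corresponds to a standard `hadoop fs` subcommand, so they could be defined along the following lines. This is an assumed reconstruction, written as shell functions so it also works in scripts. `HADOOP_CMD` is a hypothetical override hook and not part of the installation.

```shell
# Assumed definitions of the helper commands, one per "hadoop fs" subcommand.
# HADOOP_CMD is a hypothetical override; it defaults to the hadoop launcher.
hls()    { "${HADOOP_CMD:-hadoop}" fs -ls "$@"; }     # list a directory on HDFS
hget()   { "${HADOOP_CMD:-hadoop}" fs -get "$@"; }    # HDFS -> local file system
hput()   { "${HADOOP_CMD:-hadoop}" fs -put "$@"; }    # local file system -> HDFS
hmkdir() { "${HADOOP_CMD:-hadoop}" fs -mkdir "$@"; }  # make a directory on HDFS
hdu()    { "${HADOOP_CMD:-hadoop}" fs -du "$@"; }     # disk usage of a path
hrm()    { "${HADOOP_CMD:-hadoop}" fs -rm "$@"; }     # remove a path
hstat()  { "${HADOOP_CMD:-hadoop}" fs -stat "$@"; }   # stat; prints modification time by default
```

For example, `hput testfile /user/brian/testfile` followed by `hls /user/brian` would copy a local file into Hadoop and then list its directory.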