BestMan HDFS

by admin last modified 2009-03-17 05:32

Install the BeStMan SRM server on top of the HDFS file system using FUSE.

Install Pre-Req

We assume that you have already followed the instructions here for installing GridFTP on top of HDFS.  Let $VDT_LOCATION be the location of your VDT install.  Initialize your environment:

source $VDT_LOCATION/setup.sh
source /opt/pacman/pacman-3.26/setup.sh
Replace the paths in the lines above with those for your site.
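
Before continuing, it can help to confirm that the environment really is initialized. The following is a minimal sketch: check_vdt_env is a hypothetical helper, not part of the VDT or pacman, and it assumes only that setup.sh sits directly under $VDT_LOCATION, as in the lines above.

```shell
# Hypothetical helper (not shipped with the VDT): sanity-check the environment.
check_vdt_env() {
    # VDT_LOCATION must be set and non-empty.
    if [ -z "${VDT_LOCATION:-}" ]; then
        echo "VDT_LOCATION is not set" >&2
        return 1
    fi
    # The setup script sourced above must be readable.
    if [ ! -r "$VDT_LOCATION/setup.sh" ]; then
        echo "cannot read $VDT_LOCATION/setup.sh" >&2
        return 1
    fi
    echo "VDT environment looks usable: $VDT_LOCATION"
}
```

Running check_vdt_env after sourcing the setup scripts prints a confirmation line, or an error on stderr with a non-zero return status.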

Install Hadoop, FUSE, and a GridFTP server

We need a valid install of Hadoop, mounted through FUSE.  A GridFTP-HDFS server must also be installed, but it does not need to be on the same host as the BeStMan server.  The installation instructions are:
Larger installs will prefer to run their GridFTP and BeStMan servers on separate hosts.

Install BeStMan

Install the latest BeStMan from the OSG with pacman:
pacman -get http://t2.unl.edu/store/cache:Hadoop-Bestman
This installs the BeStMan server with the Ganglia load-balancing plugin.  If you are using GUMS, make sure you have set VDT_GUMS_HOST to the hostname of the GUMS server.

Configure BeStMan

The BeStMan configuration file is $VDT_LOCATION/bestman/conf/bestman.rc.  It is worth reading through that file.  The following settings are of particular interest:
  • supportedProtocolList: A semicolon-separated list of accessible servers; each entry should include the protocol, hostname, and port.  Example: gsiftp://red-gridftp1.unl.edu:5000;gsiftp://dcache-s01.unl.edu:5000
  • noSudoOnLs (default: True): Set to False if you want BeStMan to use sudo to perform ls.  Do this if the daemon user can't list a portion of your namespace.
  • staticTokenList: A list of statically configured space tokens.  See the in-file help text for details.
  • GUMSserviceURL: Check this to make sure it points to the correct GUMS server.
  • accessFileSysViaSudo: Set this to true; otherwise, all your srmMkdir commands will result in directories owned by root.
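
To review how these settings are currently configured, something like the following sketch may be useful. show_bestman_settings is a hypothetical helper, not a BeStMan tool; it assumes the key=value line format used elsewhere in bestman.rc and skips commented-out lines.

```shell
# Hypothetical helper: print the active lines for the settings discussed above.
# Assumes one key=value pair per line; lines starting with '#' do not match.
show_bestman_settings() {
    rc_file="$1"
    grep -E '^[[:space:]]*(supportedProtocolList|noSudoOnLs|staticTokenList|GUMSserviceURL|accessFileSysViaSudo)[[:space:]]*=' "$rc_file"
}

# Example call (path taken from this page):
#   show_bestman_settings "$VDT_LOCATION/bestman/conf/bestman.rc"
```

The function returns a non-zero status when none of the keys is set, which usually means the defaults are in effect.
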
Add the following lines to /etc/sudoers so that BeStMan can manipulate the filesystem namespace.
Cmnd_Alias SRM_CMD = /bin/rm, /bin/mkdir, /bin/rmdir, /bin/mv, /bin/cp, /bin/ls
Runas_Alias SRM_USR = ALL, !root
daemon ALL=(SRM_USR) NOPASSWD: SRM_CMD

# NOTE: on RHEL 5+ systems you need to make sure the following is commented out
#Defaults requiretty

Make sure you actually commented out the "requiretty" line if it is present on your system.
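
A quick way to confirm this is to search the file for an active requiretty line. The sketch below is illustrative only: has_active_requiretty is a hypothetical helper, and reading /etc/sudoers normally requires root.

```shell
# Hypothetical helper: succeed if the given sudoers file still contains an
# active (uncommented) "Defaults requiretty" line.
has_active_requiretty() {
    grep -Eq '^[[:space:]]*Defaults[[:space:]]+requiretty' "$1"
}

# Example call (run as root):
#   if has_active_requiretty /etc/sudoers; then
#       echo "requiretty is still active; comment it out" >&2
#   fi
```

After editing, running visudo -c as root checks the sudoers file for syntax errors without opening an editor.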

Customizations for BeStMan

At Nebraska, we have the following customizations:
  • JMX monitoring.  This allows us to closely monitor memory usage and thread activity in a much more Java-centric manner than Ganglia monitoring.
    • To enable this, we add the following command-line arguments to the call to java in $VDT_LOCATION/bestman/sbin/bestman.server:
      -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8004
      Note that, with authentication and SSL disabled, anyone who can reach port 8004 can read out your JVM's internal metrics.
  • Ganglia-based selection of GridFTP servers.  The default algorithm for GridFTP server selection is round-robin through a static list.  We have a protocol plugin for BeStMan that instead selects a server from the list at random, with a probability distribution function built from the load and memory usage as reported by Ganglia.  To enable this, add the following line to $VDT_LOCATION/bestman/conf/bestman.rc:
    protocolSelectionPolicy=class=edu.unl.rcf.BestmanGridftpSelector.BestmanGridftp&jarFile=UNLGangliaBestman.jar&name=gsiftp
    Then, add the following command-line arguments to the call to java in $VDT_LOCATION/bestman/sbin/bestman.server:
    -Dedu.unl.rcf.BestmanGridftpSelector.host=localhost -Dedu.unl.rcf.BestmanGridftpSelector.port=8649
    Add these arguments after the ${MAXHEAP} parameter.
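
After editing the startup script, it is easy to lose track of which options actually made it in. The following sketch reports any that are missing; missing_java_options is a hypothetical helper, and it only does a fixed-string search of the script, so it cannot tell whether an option sits in the right place on the java command line.

```shell
# Hypothetical helper: list the given options that do not appear in a script.
# Usage: missing_java_options SCRIPT OPTION...
missing_java_options() {
    script="$1"
    shift
    for opt in "$@"; do
        # -F: match as a fixed string; -e: needed because the pattern starts with "-"
        grep -Fq -e "$opt" "$script" || echo "missing: $opt"
    done
}

# Example call (paths and options taken from this page):
#   missing_java_options "$VDT_LOCATION/bestman/sbin/bestman.server" \
#       -Dcom.sun.management.jmxremote.port=8004 \
#       -Dedu.unl.rcf.BestmanGridftpSelector.host=localhost \
#       -Dedu.unl.rcf.BestmanGridftpSelector.port=8649
```

No output means every listed option was found somewhere in the script.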
