BeStMan HDFS
Install the BeStMan SRM server on top of the HDFS file system using FUSE.
Install Prerequisites
We assume that you have already followed the instructions for installing GridFTP on top of HDFS. Let $VDT_LOCATION be the location of your VDT install. Initialize your environment:
source $VDT_LOCATION/setup.sh
source /opt/pacman/pacman-3.26/setup.sh
Replace the above lines with those for your site.
Install Hadoop, FUSE, and a GridFTP server
We need a valid install of Hadoop, mounted through FUSE. A GridFTP-HDFS server must also be installed, but it does not need to be on the same host as the BeStMan server. Larger installations will prefer to run their GridFTP and BeStMan servers on separate hosts.
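Before installing BeStMan, it can help to confirm that HDFS is really mounted through FUSE. The following is a minimal sketch, not part of the original instructions: it scans text in the /proc/mounts format for FUSE filesystems. The function name fuse_mount_points, the mount point /mnt/hadoop, and the device name fuse_dfs in the sample are assumptions for illustration; the values reported on your host may differ.

```python
def fuse_mount_points(mounts_text):
    """Return mount points whose filesystem type is FUSE-based.

    Expects text in the /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        fs_type = fields[2]
        # "fuse" and "fuse.<subtype>" are FUSE mounts; "fusectl" is the
        # kernel control filesystem and is not a user mount.
        if fs_type == "fuse" or fs_type.startswith("fuse."):
            points.append(fields[1])
    return points


# Hypothetical sample; on a real host, read the text from /proc/mounts.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw 0 0
fusectl /sys/fs/fuse/connections fusectl rw 0 0
fuse_dfs /mnt/hadoop fuse rw,nosuid,nodev,allow_other 0 0
"""

if __name__ == "__main__":
    print(fuse_mount_points(SAMPLE_MOUNTS))
```

On a live system the input would come from open("/proc/mounts").read(). An empty result suggests the FUSE mount is missing.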
Install BeStMan
Install the latest BeStMan from the OSG with pacman:
pacman -get http://t2.unl.edu/store/cache:Hadoop-Bestman
This installs the BeStMan server with the Ganglia load-balancing plugin. If you are using GUMS, make sure you have set VDT_GUMS_HOST to the hostname of the GUMS server.
Configure BeStMan
The BeStMan configuration file is $VDT_LOCATION/bestman/conf/bestman.rc. It is worth reading through that file. The following settings are of particular interest:
- supportedProtocolList: A semicolon-separated list of the servers clients can use for transfers; each entry should include the protocol, hostname, and port. Example: gsiftp://red-gridftp1.unl.edu:5000;gsiftp://dcache-s01.unl.edu:5000
- noSudoOnLs (default: True): Set to False if you want BeStMan to use sudo to perform ls. Use this if the daemon user can't list a portion of your namespace.
- staticTokenList: A list of statically configured space tokens. See the in-file help text for details.
- GUMSserviceURL: Check this to make sure it points to the correct GUMS server.
- accessFileSysViaSudo: Set this to true or all your srmMkdir commands will result in directories owned by root.
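The supportedProtocolList format described above (protocol, hostname, and port, separated by semicolons) can be sanity-checked before restarting the server. This is a minimal sketch rather than anything BeStMan itself provides; the function name parse_protocol_list is hypothetical, and it assumes every entry is written as a URL of the form protocol://host:port.

```python
from urllib.parse import urlsplit


def parse_protocol_list(value):
    """Split a supportedProtocolList value into (protocol, host, port) tuples.

    Raises ValueError if an entry lacks a protocol, hostname, or port.
    """
    servers = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing semicolon
        parts = urlsplit(entry)
        # parts.port is None when no port is given and itself raises
        # ValueError when the port is not a valid number.
        if not (parts.scheme and parts.hostname and parts.port):
            raise ValueError("incomplete entry: %r" % entry)
        servers.append((parts.scheme, parts.hostname, parts.port))
    return servers


if __name__ == "__main__":
    example = "gsiftp://red-gridftp1.unl.edu:5000;gsiftp://dcache-s01.unl.edu:5000"
    for protocol, host, port in parse_protocol_list(example):
        print(protocol, host, port)
```

An entry that omits the port, such as gsiftp://red-gridftp1.unl.edu, is rejected with ValueError instead of being silently accepted.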
When BeStMan accesses the file system through sudo, /etc/sudoers needs entries such as the following (this example assumes BeStMan runs as the daemon user):
Cmnd_Alias SRM_CMD = /bin/rm, /bin/mkdir, /bin/rmdir, /bin/mv, /bin/cp, /bin/ls
Runas_Alias SRM_USR = ALL, !root
daemon ALL=(SRM_USR) NOPASSWD: SRM_CMD
# NOTE: on RHEL 5+ systems you need to make sure the following is commented out
#Defaults requiretty
Make sure you have actually commented out the "requiretty" line if it is present on your system.
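Because a leftover "requiretty" setting is easy to miss, a small check can help. The sketch below is a simplified illustration, not a full sudoers parser: the function name requiretty_enabled is hypothetical, it only looks at the text it is given (it does not follow include directives), and it treats everything after a "#" as a comment. Running visudo -c remains the way to validate sudoers syntax.

```python
import re

# "requiretty" that is not negated with a leading "!".
_REQUIRETTY = re.compile(r"(?<!!)\brequiretty\b")


def requiretty_enabled(sudoers_text):
    """Return True if an uncommented Defaults line turns on requiretty."""
    for line in sudoers_text.splitlines():
        # Drop comments, then surrounding whitespace.
        active = line.split("#", 1)[0].strip()
        if active.startswith("Defaults") and _REQUIRETTY.search(active):
            return True
    return False


# The entries from this page, with requiretty commented out.
SAMPLE_SUDOERS = """\
Cmnd_Alias SRM_CMD = /bin/rm, /bin/mkdir, /bin/rmdir, /bin/mv, /bin/cp, /bin/ls
Runas_Alias SRM_USR = ALL, !root
daemon ALL=(SRM_USR) NOPASSWD: SRM_CMD
#Defaults requiretty
"""

if __name__ == "__main__":
    print(requiretty_enabled(SAMPLE_SUDOERS))
```

On a real host the text would come from /etc/sudoers, which normally requires root to read.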
Customizations for BeStMan
At Nebraska, we have the following customizations:
- JMX monitoring. This allows us to closely monitor memory usage and thread activity in a much more Java-centric manner than Ganglia monitoring. To enable this, we add the following command-line arguments to the call to java in $VDT_LOCATION/bestman/sbin/bestman.server:
-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8004
Note that this allows anyone who can reach port 8004 to read your JVM's internal metrics, because authentication and SSL are both disabled.
- Ganglia-based selection of GridFTP servers. The default algorithm for GridFTP server selection is round-robin through a static list. We have a protocol plugin for BeStMan that selects servers from the list at random, using a probability distribution built from the load and memory usage reported by Ganglia. To enable this, add the following line to $VDT_LOCATION/bestman/conf/bestman.rc:
protocolSelectionPolicy=class=edu.unl.rcf.BestmanGridftpSelector.BestmanGridftp&jarFile=UNLGangliaBestman.jar&name=gsiftp
Then, add the following command-line arguments to the call to java in $VDT_LOCATION/bestman/sbin/bestman.server, after the ${MAXHEAP} parameter:
-Dedu.unl.rcf.BestmanGridftpSelector.host=localhost -Dedu.unl.rcf.BestmanGridftpSelector.port=8649
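To make the idea of load-weighted random selection concrete, here is a minimal sketch in Python. It is not the plugin's code, and the plugin's actual probability function is not described on this page. The weighting used here (free-memory fraction divided by one plus the one-minute load) is an assumption chosen for illustration, as are the function names and the example.org hosts. It assumes the Ganglia XML carries load_one, mem_free, and mem_total metrics for each host.

```python
import random
import xml.etree.ElementTree as ET


def read_metrics(gmond_xml):
    """Return {host: (load_one, free_memory_fraction)} from Ganglia XML."""
    metrics = {}
    for host in ET.fromstring(gmond_xml).iter("HOST"):
        values = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        try:
            load = float(values["load_one"])
            mem_free = float(values["mem_free"]) / float(values["mem_total"])
        except (KeyError, ValueError, ZeroDivisionError):
            continue  # skip hosts with missing or unusable metrics
        metrics[host.get("NAME")] = (load, mem_free)
    return metrics


def selection_weights(metrics):
    """Hypothetical weighting: favor hosts with low load and free memory."""
    return {
        host: max(mem_free, 0.01) / (1.0 + max(load, 0.0))
        for host, (load, mem_free) in metrics.items()
    }


def pick_server(metrics, rng=random):
    """Pick one host at random, with probability proportional to its weight."""
    if not metrics:
        raise ValueError("no servers with usable metrics")
    weights = selection_weights(metrics)
    hosts = sorted(weights)
    return rng.choices(hosts, weights=[weights[h] for h in hosts], k=1)[0]


# Hypothetical, trimmed-down sample of Ganglia XML for two GridFTP hosts.
SAMPLE_XML = """\
<GANGLIA_XML VERSION="3.0" SOURCE="gmond">
  <CLUSTER NAME="example">
    <HOST NAME="gridftp1.example.org">
      <METRIC NAME="load_one" VAL="0.5"/>
      <METRIC NAME="mem_free" VAL="800"/>
      <METRIC NAME="mem_total" VAL="1000"/>
    </HOST>
    <HOST NAME="gridftp2.example.org">
      <METRIC NAME="load_one" VAL="4.0"/>
      <METRIC NAME="mem_free" VAL="200"/>
      <METRIC NAME="mem_total" VAL="1000"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>
"""

if __name__ == "__main__":
    print(pick_server(read_metrics(SAMPLE_XML)))
```

With these sample values the lightly loaded host receives a much larger weight than the busy one, so it is chosen far more often, while the busy host still receives some traffic. In a live setup the XML would be read from the gmond host and port configured above (localhost:8649) instead of a sample string.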