
Building a Hadoop Release

by admin last modified 2009-03-18 12:45

How to build a Hadoop release so it can be installed at Nebraska

Note: You DO NOT need this page if you are only deploying Hadoop. It is only for building a new release for the Nebraska system.

At Nebraska, we usually build on SL4 on node001 in the directory /opt/osg/osg-100/hadoop; on SL5, build on dcache06 in /opt/hadoop-build.

  1. Download and unpack the Hadoop source code
  2. Source an existing VDT install that includes Ant and the JDK (run "ls $VDT_LOCATION" and make sure it lists "ant" and "jdk1.5" directories).
  3. Set the following variables:
    1. HADOOP_HOME: the path to the freshly unpacked source directory
    2. CLASSPATH variables:
      export CLASSPATH=$HADOOP_HOME/hadoop-0.19.0-core.jar:$HADOOP_HOME/lib/commons-logging-1.0.4.jar:$HADOOP_HOME/lib/commons-logging-api-1.0.4.jar:$HADOOP_HOME/lib/log4j-1.2.15.jar:$CLASSPATH
    3. Library variables:
      export LD_LIBRARY_PATH=$HADOOP_HOME/build/libhdfs:$VDT_LOCATION/jdk1.5/jre/lib/amd64/server:$LD_LIBRARY_PATH
    4. Path variables:
      export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/src/contrib/fuse-dfs/src:$PATH
  4. Patch Hadoop as necessary.  The patches we use are listed below.
  5. (Only on 64-bit nodes).  Edit $HADOOP_HOME/src/c++/libhdfs/Makefile; replace all occurrences of -m32 with -m64.
  6. Export miscellaneous build variables (without them, fuse-dfs will not build):
    export PERMS=1
    export FUSE_HOME=$VDT_LOCATION/fuse
  7. Build Hadoop:
    ant compile-contrib -Dlibhdfs=1 -Dfusedfs=1 jar
    This requires automake >= 1.9.5, which IS NOT AVAILABLE on RHEL4. (For Nebraska builders: it is installed in /usr on node001, so build there.)  I had to download and install it from source, then add /usr/local/bin to PATH and /usr/local/lib to LD_LIBRARY_PATH.  You can check your automake version with "automake --version".
  8. Fix the symlink build/libhdfs/libhdfs.so so that it is relative rather than absolute: cd to $HADOOP_HOME/build/libhdfs, rm the existing libhdfs.so, then run 'ln -s libhdfs.so.1 libhdfs.so'.
  9. cd $HADOOP_HOME/..   Then copy the entire directory hadoop-0.x.x/ to hadoop/.  Finally, create the tarball:
    tar zcf hadoop-0.x.x-RHELy-zzz.tar.gz hadoop/
    Replace x.x with the Hadoop version number; replace y with the RHEL release (4 or 5), and zzz with the platform (i686 or x86_64).
  10. Copy the resulting tarball into t2.unl.edu:/var/www/html/cache.
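
A few of the steps above (checking the VDT install in step 2, fixing the symlink in step 8, and naming the tarball in step 9) can be sketched as small shell helpers. This is only an illustration under stated assumptions: the function names check_vdt_tools, relink_libhdfs, and release_tarball_name are hypothetical and not part of Hadoop or the VDT, and the tarball helper assumes you pass the full version string (for example 0.19.0).

```shell
# Hypothetical helpers; none of these names come from Hadoop or the VDT.

# Step 2: confirm the VDT install contains "ant" and "jdk1.5" directories.
# Usage: check_vdt_tools "$VDT_LOCATION"
check_vdt_tools() {
    for _dir in ant jdk1.5; do
        if [ ! -d "$1/$_dir" ]; then
            echo "missing directory: $1/$_dir" >&2
            return 1
        fi
    done
}

# Step 8: replace libhdfs.so with a relative symlink to libhdfs.so.1.
# Usage: relink_libhdfs "$HADOOP_HOME/build/libhdfs"
relink_libhdfs() {
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$1" && rm -f libhdfs.so && ln -s libhdfs.so.1 libhdfs.so )
}

# Step 9: build the tarball name from version, RHEL release, and platform.
# Usage: release_tarball_name 0.19.0 5 x86_64
release_tarball_name() {
    echo "hadoop-$1-RHEL$2-$3.tar.gz"
}
```

For example, "release_tarball_name 0.19.0 5 x86_64" prints hadoop-0.19.0-RHEL5-x86_64.tar.gz, which can then be passed to the tar command from step 9.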

Patches we apply to Hadoop

  • Patch to add offset logging support in datanodes: https://issues.apache.org/jira/secure/attachment/12400028/clienttrace.patch.
  • Patch to fix a cache corruption in fuse-dfs, as referenced here: https://issues.apache.org/jira/browse/HADOOP-4298.  FIXED IN 0.19.0
  • Mutex lock on read patch for fuse-dfs.  No JIRA reference yet.  FIXED IN 0.19.0
  • Patch for a java error during fsck under certain conditions: https://issues.apache.org/jira/browse/HADOOP-4351  FIXED IN 0.19.0
  • Patch for averages in Ganglia metrics: https://issues.apache.org/jira/browse/HADOOP-4369.  Wrapped into the patch below
  • Patch for Ganglia NPE: https://issues.apache.org/jira/browse/HADOOP-3422.  Wrapped into the patch below
  • Patch for Ganglia 3.1 support: https://issues.apache.org/jira/browse/HADOOP-4675.
    • Quickest way to patch:
      cd $HADOOP_HOME
      cp src/core/org/apache/hadoop/metrics/ganglia/GangliaContext.java src/core/org/apache/hadoop/metrics/ganglia/GangliaContext31.java
      curl -k https://issues.apache.org/jira/secure/attachment/12394647/hadoop-4675-3.patch | patch -p 0
      Contact Brian if the patch does not apply cleanly.
  • Patch for FUSE-DFS "df": https://issues.apache.org/jira/browse/HADOOP-4368.  Patch available for Hadoop 0.19.0 and 0.20.0
    • Quickest way to patch 0.19.0 (will fail on 0.20.0):
      cd $HADOOP_HOME
      curl -k https://issues.apache.org/jira/secure/attachment/12395292/fuse_statfs.patch | patch -p 0
  • Patch for FUSE-DFS to prevent an infinite loop on read errors, tracked as HADOOP-4616.  Patch available for Hadoop 0.19.0 and 0.20.0
    • Quickest way to patch 0.19.0 (will fail on 0.20.0):
      cd $HADOOP_HOME
      curl -k http://issues.apache.org/jira/secure/attachment/12394123/HADOOP-4616_0.19.txt | patch -p 0
  • Patch for FUSE-DFS groups: https://issues.apache.org/jira/browse/HADOOP-4727. Hadoop 0.19.0 only
    • Quickest way to patch:
      cd $HADOOP_HOME
      curl -k https://issues.apache.org/jira/secure/attachment/12394700/hadoop-4727.patch | patch -F5 -p 0
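
The "quickest way" commands above pipe curl straight into patch, so a patch that only partly applies can leave the source tree half-modified. One possible safeguard, sketched here, is to save the patch to a file first and test it with a dry run before applying it. The helper name apply_patch_checked is hypothetical, the patch file must be given as an absolute path, and the sketch assumes GNU patch, which provides the --dry-run option.

```shell
# Hypothetical helper, not part of Hadoop. Assumes GNU patch (--dry-run).
# Usage: apply_patch_checked <source-dir> <absolute-patch-file> [patch options]
apply_patch_checked() {
    _src_dir="$1"
    _patch_file="$2"
    shift 2
    (
        cd "$_src_dir" || exit 1
        # The dry run reports failures without modifying any file.
        patch -p0 --dry-run "$@" < "$_patch_file" > /dev/null || exit 1
        # Only apply for real once the dry run has succeeded.
        patch -p0 "$@" < "$_patch_file"
    )
}
```

With a patch saved locally (for example via "curl -k -o /tmp/hadoop-4727.patch" followed by the attachment URL), the FUSE-DFS groups patch could then be applied with "apply_patch_checked $HADOOP_HOME /tmp/hadoop-4727.patch -F5". The /tmp path is only an example.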

Building GridFTP-HDFS

The build machines are similar to those used for the Hadoop build.  On dcache07, the build directory is /opt/gridftp-hdfs-build.

  1. Pre-requisites:
    1. A working Hadoop install, preferably installed via the UNL pacman cache.
    2. The subversion RPM package, which provides the standard svn client.
  2. Use pacman to pull in the Globus GridFTP SDK:
    pacman -get http://vdt.cs.wisc.edu/vdt_1101_cache:Globus-Base-Data-Server
    pacman -get http://vdt.cs.wisc.edu/vdt_1101_cache:Globus-Base-SDK
  3. Source the VDT's setup.sh.
    • Make sure $VDT_LOCATION exists in the following steps!
  4. Check out the GridFTP-HDFS sources:
    svn co svn://t2.unl.edu/brian/gridftp_hdfs
  5. Make a backup copy of the makefiles:
    cp makefile_header makefile_header.bkp
    cp Makefile Makefile.bkp
    This preserves the original makefiles, which the next step modifies in place.
  6. Replace MAGIC_VDT_LOCATION with the actual contents of $VDT_LOCATION.
    sed -i "s:MAGIC_VDT_LOCATION:$VDT_LOCATION:g" Makefile
    sed -i "s:MAGIC_VDT_LOCATION:$VDT_LOCATION:g" makefile_header
  7. Run make to build the GridFTP module.
    make
  8. Copy the original makefiles back:
    cp Makefile.bkp Makefile
    cp makefile_header.bkp makefile_header
  9. Create a tarball, and place it in the pacman cache.
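
Steps 5 through 8 above (back up, substitute, build, restore) can be wrapped in one function so that the original makefiles are restored even when the build fails. This is a minimal sketch rather than the procedure used at Nebraska: the name build_with_vdt_location is hypothetical, it must be run from the checkout directory containing Makefile and makefile_header, and it assumes $VDT_LOCATION contains no ":" character, since ":" is the sed delimiter.

```shell
# Hypothetical helper; run it from the gridftp_hdfs checkout directory.
# Usage: build_with_vdt_location "$VDT_LOCATION" make
build_with_vdt_location() {
    _vdt="$1"
    shift
    # Step 5: back up the original makefiles.
    cp Makefile Makefile.bkp || return 1
    cp makefile_header makefile_header.bkp || return 1
    _status=0
    # Step 6: substitute the placeholder, reading from the backup copies.
    for _f in Makefile makefile_header; do
        sed "s:MAGIC_VDT_LOCATION:$_vdt:g" "$_f.bkp" > "$_f" || _status=1
    done
    # Step 7: run the build command only if both substitutions succeeded.
    if [ "$_status" -eq 0 ]; then
        "$@" || _status=$?
    fi
    # Step 8: restore the originals whether or not the build succeeded.
    mv Makefile.bkp Makefile
    mv makefile_header.bkp makefile_header
    return "$_status"
}
```

The function returns the exit status of the build command, so a failed make is still visible to the caller.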

