Hadoop Upgrade

by admin last modified 2008-11-25 06:35

Instructions for performing a Hadoop upgrade

Instructions:

This page is adapted from revision 9 of http://wiki.apache.org/hadoop/Hadoop_Upgrade.

Cluster prep:

  1. Set the Condor clients to queueing.
  2. Turn off the Nagios mount checks and any other Hadoop-related checks.
  3. Turn off RSV.

Hadoop Upgrade

  1. Stop the MapReduce cluster(s):

    stop-mapred.sh

    Also stop all client applications running on the DFS cluster.

  2. Run the fsck command:

    hadoop fsck / -files -blocks -locations > dfs-v-old-fsck-1.log

    Fix the DFS until fsck reports no errors. The resulting file will contain the complete block map of the file system.

  3. Run lsr:

    hadoop dfs -lsr / > dfs-v-old-lsr-1.log

    The resulting file will contain the complete namespace of the file system.

  4. Run report to create a list of the data nodes participating in the cluster:

    hadoop dfsadmin -report > dfs-v-old-report-1.log

  5. Optionally, stop and restart the DFS cluster to create an up-to-date namespace checkpoint of the old version:

    stop-dfs.sh
    start-dfs.sh

  6. Optionally, repeat steps 3, 4, and 5, and compare the results with the previous run to ensure that the state of the file system has not changed.

  7. Copy the following checkpoint files into a backup directory:

    dfs.name.dir/edits
    dfs.name.dir/image/fsimage

  8. Stop the DFS cluster:

    stop-dfs.sh

    Verify that DFS has actually stopped and that no DataNode processes are running on any node.

  9. Back up the Pacman installation before doing anything else, then source the Pacman environment.

  10. Install the new version of the Hadoop software. For each software installation, run:

    pacman -update-check
    pacman -update http://t2.unl.edu/store/cache:Hadoop

  11. Steps 9 and 10 should be run on:
    • The NameNode.
    • osg-wn-source, for the worker nodes.
    • All GridFTP nodes, manually (see Hadoop Nebraska for the list of GridFTP servers). These nodes can be upgraded first to test the Pacman packaging. You should be able to turn off Ganglia on a node (which should cause the SRM servers to avoid that node), upgrade it, and then turn Ganglia back on.
    • The SRM servers, srm.unl.edu and dcache07.unl.edu.

  12. Start the NameNode only:

    hadoop-daemon.sh --config $HADOOP_HOME/conf start namenode -upgrade

    This should convert the checkpoint to the new version format.

  13. Optionally, run lsr:

    hadoop dfs -lsr / > dfs-v-new-lsr-0.log

    and compare with dfs-v-old-lsr-1.log.

  14. Start the DFS cluster:

    start-dfs.sh

  15. Run report:

    hadoop dfsadmin -report > dfs-v-new-report-1.log

    and compare with dfs-v-old-report-1.log to ensure all data nodes previously belonging to the cluster are up and running.

  16. Run lsr:

    hadoop dfs -lsr / > dfs-v-new-lsr-1.log

    and compare with dfs-v-old-lsr-1.log. These files should be identical unless the format of the lsr output or the underlying data structures have changed in the new version.

  17. Run fsck:

    hadoop fsck / -files -blocks -locations > dfs-v-new-fsck-1.log

    and compare with dfs-v-old-fsck-1.log. These files should be identical unless the fsck reporting format has changed in the new version.

  18. Start the MapReduce cluster:

    start-mapred.sh

  19. Done!
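
Steps 2 and 17 both depend on reading the fsck log. The shell function below is a minimal sketch of an automated check and is not part of the original procedure. The name fsck_log_is_healthy is a hypothetical helper. The function also assumes that fsck ends its report with a status line such as "The filesystem under path '/' is HEALTHY". Confirm that wording against the output of your Hadoop version before relying on the check.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): succeed only if a saved fsck log
# reports a healthy file system.
# ASSUMPTION: fsck ends its report with a line such as
#   The filesystem under path '/' is HEALTHY
# Check the exact wording produced by your Hadoop version.
fsck_log_is_healthy() {
    log="$1"
    # A missing or empty log means fsck did not run to completion.
    if [ ! -s "$log" ]; then
        echo "fsck log '$log' is missing or empty" >&2
        return 1
    fi
    # Look for the final status phrase; anything else is treated as unhealthy.
    if grep -q 'is HEALTHY' "$log"; then
        return 0
    fi
    echo "fsck log '$log' does not report a HEALTHY file system" >&2
    return 1
}
```

For example, after step 2, the following returns a non-zero status while the DFS still needs fixing:

    hadoop fsck / -files -blocks -locations > dfs-v-old-fsck-1.log
    fsck_log_is_healthy dfs-v-old-fsck-1.log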

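Step 7 can also be scripted. The sketch below rests on stated assumptions:

- backup_name_dir is a hypothetical helper.
- Its first argument is the local directory configured as dfs.name.dir on the NameNode host.
- Its second argument is any backup location with enough free space.

The helper copies the whole directory rather than only edits and image/fsimage, so the backup is a superset of the files step 7 lists.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): copy the NameNode's dfs.name.dir
# into a timestamped backup directory and print the backup path.
# ASSUMPTIONS: $1 is the local path configured as dfs.name.dir on the
# NameNode host; $2 is a backup location with enough free space.
backup_name_dir() {
    name_dir="$1"
    backup_root="$2"
    if [ ! -d "$name_dir" ]; then
        echo "name directory '$name_dir' not found" >&2
        return 1
    fi
    stamp=$(date +%Y%m%d-%H%M%S)
    dest="$backup_root/namedir-$stamp"
    mkdir -p "$dest" || return 1
    # -R copies recursively (so image/fsimage is included);
    # -p preserves permissions and modification times.
    cp -Rp "$name_dir"/. "$dest"/ || return 1
    echo "$dest"
}
```

Usage, with placeholder paths:

    backup_name_dir /path/to/dfs.name.dir /path/to/backups
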
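
The comparison in step 15 can be narrowed to the list of data nodes. The sketch below assumes that each DataNode entry in the dfsadmin -report output begins with a line of the form "Name: <address>:<port>". Check this against the report format of both the old and the new version. The names missing_datanodes and datanode_names are hypothetical helpers. If your version also lists dead nodes under the same prefix, this check alone will not reveal them, so still read the report itself.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helpers): print DataNodes that appear in the
# old dfsadmin report but not in the new one; return 1 if any are missing.
# ASSUMPTION: each DataNode entry in `hadoop dfsadmin -report` output starts
# with a "Name:" line (e.g. "Name: 10.0.0.1:50010"); verify this against
# the report format of both the old and the new Hadoop versions.
datanode_names() {
    # Extract the value after "Name:" and sort it, as comm(1) requires.
    sed -n 's/^[[:space:]]*Name:[[:space:]]*//p' "$1" | LC_ALL=C sort -u
}

missing_datanodes() {
    old_report="$1"
    new_report="$2"
    tmpdir=$(mktemp -d) || return 2
    datanode_names "$old_report" > "$tmpdir/old"
    datanode_names "$new_report" > "$tmpdir/new"
    # comm -23 keeps only lines that occur in the first (old) list.
    missing=$(LC_ALL=C comm -23 "$tmpdir/old" "$tmpdir/new")
    rm -rf "$tmpdir"
    if [ -n "$missing" ]; then
        printf '%s\n' "$missing"
        return 1
    fi
    return 0
}
```

Usage after step 15:

    missing_datanodes dfs-v-old-report-1.log dfs-v-new-report-1.log

For the lsr logs in steps 13 and 16, a plain diff is enough, for example: diff dfs-v-old-lsr-1.log dfs-v-new-lsr-1.log.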