Hadoop Upgrade
Instructions for performing a Hadoop upgrade
Instructions:
This page is adapted from rev 9 of http://wiki.apache.org/hadoop/Hadoop_Upgrade.
Cluster prep:
- Set the condor clients to queueing.
- Turn off the nagios mount checks and any other Hadoop-related checks
- Turn off RSV
Hadoop Upgrade
Stop map-reduce cluster(s)
stop-mapred.sh
and all client applications running on the DFS cluster.
Run fsck command:
hadoop fsck / -files -blocks -locations > dfs-v-old-fsck-1.log
Fix DFS to the point where there are no errors. The resulting file will contain a complete block map of the file system.
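A quick way to confirm that the fsck log is clean is to look for the status line in it. The following is a minimal sketch, not part of the original procedure: the function name fsck_healthy is hypothetical, and it assumes the log contains fsck's usual "is HEALTHY" status line, which may be worded differently in your Hadoop version.

```shell
# Return 0 if the fsck log reports a healthy file system, non-zero otherwise.
# Assumption: a healthy run writes a status line containing "is HEALTHY".
fsck_healthy() {
    grep -q 'is HEALTHY' "$1"
}

# Hypothetical usage with the log written above:
# fsck_healthy dfs-v-old-fsck-1.log || echo "fix DFS before upgrading"
```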
Run lsr:
hadoop dfs -lsr / > dfs-v-old-lsr-1.log
The resulting file will contain the complete namespace of the file system.
Run report to create a list of data nodes participating in the cluster.
hadoop dfsadmin -report > dfs-v-old-report-1.log
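Because the same three snapshots are taken again after the upgrade, it may help to wrap them in one function so the file names stay consistent. This is a sketch under assumptions: the function name snapshot_dfs and its two arguments are hypothetical, and the file names follow the dfs-v-<version>-<kind>-<run>.log pattern used on this page.

```shell
# Write the fsck, lsr, and report snapshots for one run.
# $1 = version label ("old" or "new"), $2 = run number.
snapshot_dfs() {
    label=$1
    run=$2
    hadoop fsck / -files -blocks -locations > "dfs-v-$label-fsck-$run.log"
    hadoop dfs -lsr / > "dfs-v-$label-lsr-$run.log"
    hadoop dfsadmin -report > "dfs-v-$label-report-$run.log"
}

# Hypothetical usage before the upgrade:
# snapshot_dfs old 1
```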
Optionally, stop and restart DFS cluster, in order to create an up-to-date namespace checkpoint of the old version.
stop-dfs.sh
start-dfs.sh
Optionally, repeat steps 3, 4, and 5, and compare the results with the previous run to ensure the state of the file system remained unchanged.
Copy the following checkpoint files into a backup directory:
dfs.name.dir/edits
dfs.name.dir/image/fsimage
Stop DFS cluster.
stop-dfs.sh
Verify that DFS has really stopped, and there are no DataNode processes running on any nodes.
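The checkpoint copy and the shutdown check described above can be sketched as shell functions. Everything named here is an assumption for illustration: the function names are hypothetical, the first argument to backup_checkpoint stands for whatever directory dfs.name.dir points to, the hosts file is any one-host-per-line list you maintain (conf/slaves may serve), and the DataNode is assumed to appear in ps output as a java command line naming DataNode.

```shell
# Copy the name node checkpoint files into a backup directory.
# $1 = directory configured as dfs.name.dir, $2 = backup directory.
backup_checkpoint() {
    name_dir=$1
    backup_dir=$2
    mkdir -p "$backup_dir/image" &&
    cp -p "$name_dir/edits" "$backup_dir/edits" &&
    cp -p "$name_dir/image/fsimage" "$backup_dir/image/fsimage"
}

# Return 0 if a process listing on stdin shows a DataNode JVM.
has_datanode() {
    grep -q 'java.*DataNode'
}

# Check every host listed (one per line) in the file $1 over ssh.
# Returns 1 if any host still runs a DataNode, 0 otherwise.
check_datanodes_stopped() {
    hosts_file=$1
    status=0
    while read -r host; do
        [ -n "$host" ] || continue
        # ssh -n keeps ssh from consuming the host list on stdin.
        if ssh -n "$host" ps -ef | has_datanode; then
            echo "DataNode still running on $host"
            status=1
        fi
    done < "$hosts_file"
    return "$status"
}
```

Run backup_checkpoint while the checkpoint files are not being written, and treat a non-zero return from check_datanodes_stopped as a reason to stop before installing the new version.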
- Back up the pacman install before doing anything. Source the pacman environment.
Install new version of Hadoop software. For each software install, perform:
pacman -update-check
pacman -update http://t2.unl.edu/store/cache:Hadoop
- Steps (9) and (10) should be run on:
- Namenode.
- osg-wn-source for the worker nodes.
- Manually on all the GridFTP nodes (see Hadoop Nebraska for the list of all GridFTP servers). These can be done first to test out the Pacman packaging. You should be able to turn off Ganglia (which should cause SRM servers to avoid the individual node), upgrade, then turn Ganglia back on.
- SRM servers srm.unl.edu and dcache07.unl.edu.
Start name node only:
hadoop-daemon.sh --config $HADOOP_HOME/conf start namenode -upgrade
This should convert the checkpoint to the new version format.
Optionally, run lsr:
hadoop dfs -lsr / > dfs-v-new-lsr-0.log
and compare with dfs-v-old-lsr-1.log.
Start DFS cluster.
start-dfs.sh
Run report:
hadoop dfsadmin -report > dfs-v-new-report-1.log
and compare with dfs-v-old-report-1.log to ensure all data nodes previously belonging to the cluster are up and running.
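Since the two reports differ in capacity and usage figures, a plain diff is noisy; comparing only the node names is usually enough for this check. The sketch below assumes each data node appears in the report as a line starting with "Name:", which may not hold for every Hadoop version. The function names report_nodes and missing_nodes are hypothetical.

```shell
# Print the "Name:" lines of a dfsadmin report, sorted.
report_nodes() {
    grep '^Name:' "$1" | sort
}

# Print data nodes present in the old report ($1) but absent from the new ($2).
missing_nodes() {
    old_list=$(mktemp) && new_list=$(mktemp) || return 2
    report_nodes "$1" > "$old_list"
    report_nodes "$2" > "$new_list"
    # comm -23 keeps only lines unique to the first (sorted) file.
    comm -23 "$old_list" "$new_list"
    rm -f "$old_list" "$new_list"
}

# Hypothetical usage:
# missing_nodes dfs-v-old-report-1.log dfs-v-new-report-1.log
```

Empty output means every data node from the old report is also listed in the new one.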
Run lsr:
hadoop dfs -lsr / > dfs-v-new-lsr-1.log
and compare with dfs-v-old-lsr-1.log. These files should be identical unless the format of lsr reporting or the data structures have changed in the new version.
Run fsck:
hadoop fsck / -files -blocks -locations > dfs-v-new-fsck-1.log
and compare with dfs-v-old-fsck-1.log. These files should be identical, unless the fsck reporting format has changed in the new version.
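The lsr and fsck comparisons above can be scripted with cmp. This is a minimal sketch; the function name compare_logs is hypothetical. If the reporting format changed between versions the files will differ even when the data is intact, so a reported difference is a prompt to inspect the files, not proof of a problem.

```shell
# Report whether two log files are byte-for-byte identical.
# Returns 0 if identical, 1 if they differ or cannot be read.
compare_logs() {
    if cmp -s "$1" "$2"; then
        echo "identical: $1 $2"
    else
        echo "differs: $1 $2 (inspect with diff)"
        return 1
    fi
}

# Hypothetical usage with the logs named on this page:
# compare_logs dfs-v-old-lsr-1.log dfs-v-new-lsr-1.log
# compare_logs dfs-v-old-fsck-1.log dfs-v-new-fsck-1.log
```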
Start map-reduce cluster
start-mapred.sh
- Done!