
dCacheNebraska Scripts

by admin last modified 2008-05-19 03:03

Nebraska uses several scripts to help analyze the state of its cluster. We provide them in a package named dCacheNebraska, which is available for wider usage.

Installation - Current Method

To install dCacheNebraska, follow the directions on this page.  This will guide you through adding the Nebraska YUM repository to your computer, then installing the commands referenced below via yum.

Installation - OLD

Before the YUM repository existed, dCacheNebraska was available solely through the Nebraska subversion repository.

Installation by this method requires only downloading the files from subversion, a modern version of python, and a working ssh.  To check out the files, run this command:
svn co svn://t2.unl.edu/brian/dCacheNebraska
If you do not have the svn binary and use SL4, you may install it with yum install subversion.  If you use SL3, you may have to install subversion from source.

You will find the desired scripts in the scripts/ subdirectory.

Configuration

The dCacheNebraska scripts utilize the PhEDEx DBParam format to connect to your site's dCache install.  Create a file called DBParam and add an entry which looks like this:
Section           dCache/NEBRASKA
Interface dCache
Username admin
Password ****
Port 22223
AdminHost dcache-head.unl.edu
SRMHost srm.unl.edu
Cipher blowfish
You may omit the Password line if ssh keys have been configured for your site.  If you have multiple dCache instances, you may add multiple sections to the same DBParam file.
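
To make the file layout concrete, here is a minimal sketch of how a DBParam file of this shape could be read.  It is not the parser shipped with dCacheNebraska; the function name parse_dbparam is hypothetical, and it assumes that each non-blank line is "Key value", that a Section line starts a new section, and that lines starting with # are comments.

```python
def parse_dbparam(text):
    """Parse DBParam-style text into ({section: {key: value}}, [sections in order]).

    Assumptions (not confirmed by the dCacheNebraska source): one "Key value"
    pair per line, "Section" opens a new section, "#" starts a comment line.
    """
    sections = {}
    order = []  # section names in the order they appear in the file
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "Section":
            current = value
            sections[current] = {}
            order.append(current)
        elif current is not None:
            sections[current][key] = value
    return sections, order


EXAMPLE = """\
Section           dCache/NEBRASKA
Interface dCache
Username admin
Port 22223
AdminHost dcache-head.unl.edu
SRMHost srm.unl.edu
Cipher blowfish
"""

sections, order = parse_dbparam(EXAMPLE)
# The Password line is omitted here, as it would be with ssh keys configured.
print(order[0], sections[order[0]]["AdminHost"])
```

Keeping the section order makes it easy to fall back to the first section when none is named.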

All dCacheNebraska scripts depend on being able to find this configuration file.  They look for a file named DBParam in the current working directory, then $HOME, then /etc.  When the file is found, they use its first section unless told otherwise.
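
The search order can be sketched as follows.  This is an illustration only; find_dbparam and its search_dirs parameter are hypothetical names, not part of the dCacheNebraska package.

```python
import os


def find_dbparam(filename="DBParam", search_dirs=None):
    """Return the first existing <dir>/<filename>, or None if none is found.

    By default the directories are tried in the documented order:
    current working directory, home directory, then /etc.
    """
    if search_dirs is None:
        search_dirs = [os.getcwd(), os.path.expanduser("~"), "/etc"]
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


path = find_dbparam()
print(path if path else "No DBParam file found")
```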

Alternately, they all can take a command line argument of this format:
-config <file>:<section>
For the DBParam example above, assuming it is located in the $HOME directory, the command line argument would be:
-config $HOME/DBParam:dCache/NEBRASKA
Note how <section> was replaced with the value of Section from the DBParam file.
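
As a sketch of how such an argument breaks apart, the snippet below splits the value at the first colon.  The helper name split_config_arg is hypothetical, and splitting at the first colon is an assumption; the real scripts may handle file names containing colons differently.

```python
def split_config_arg(arg):
    """Split "<file>:<section>" into (file, section).

    The section is None when no section is given, in which case a caller
    would presumably fall back to the first section in the file.
    """
    path, sep, section = arg.partition(":")
    if not sep or not section:
        return path, None
    return path, section


print(split_config_arg("/home/user/DBParam:dCache/NEBRASKA"))
```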

Pool Cleaner

The pool_cleaner.py script analyzes all the pools in the dCache system, looking for physical files on pools which are not in PNFS.  It does not look for logical files in PNFS which are not physically present in dCache.

The pool cleaner script makes all its queries through the dCache admin interface.  While this is beneficial in its simplicity, it can be very slow to verify the logical existence of files.  If you have a large dCache install, you might be interested in altering it to take advantage of a mounted /pnfs directory.  This will result in a large speedup.

Running the script is simple:
pool_cleaner.py -config DBParam:dCache/NEBRASKA
The script accepts the following options:
  • -quiet: This suppresses most of the output
  • -delete: This triggers the actual deletion of the files on the pool nodes.  Potentially dangerous for obvious reasons.  We are not responsible if this script eats your install!
  • -safefile: A file which will never be deleted.  If the script thinks this file is not present, it decides that the system is in a bad state and immediately aborts.
  • -maxdelete: A float between 0 and 1; the maximum fraction of files which may be deleted on any given run.  This protects the script from mistakenly deleting all files.
  • -checktrash: If the script is running on the PNFS server, it can check the trash files of the PNFS daemon to see if the file is already scheduled for deletion.
  • -badlist: Instead of deleting files, create a file with a list of all the "bad" files and their locations.
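
The interplay of -safefile and -maxdelete can be illustrated with a small sketch.  This is not the logic of pool_cleaner.py itself; plan_deletions is a hypothetical function, and it assumes that "not present" means the safe file is missing from the PNFS listing and that -maxdelete is measured against the number of files found on the pools.

```python
def plan_deletions(pool_files, pnfs_ids, safefile=None, maxdelete=1.0):
    """Return the sorted IDs found on pools but not in PNFS.

    Raises RuntimeError instead of returning a list when a safety check
    fails, so that nothing is deleted while the system looks unhealthy.
    """
    pool_files = set(pool_files)
    pnfs_ids = set(pnfs_ids)

    # Safety check 1: a file known to exist must show up in PNFS.
    if safefile is not None and safefile not in pnfs_ids:
        raise RuntimeError("safefile not found; system may be in a bad state")

    # Physical files with no logical entry are the deletion candidates.
    orphans = pool_files - pnfs_ids

    # Safety check 2: never plan to delete more than the allowed fraction.
    if len(orphans) > maxdelete * len(pool_files):
        raise RuntimeError("too many candidates; refusing to delete")

    return sorted(orphans)


print(plan_deletions(["a", "b", "c", "d"], ["a", "b", "c"],
                     safefile="a", maxdelete=0.5))
```

Both checks abort before any deletion is planned, which matches the intent of protecting against a bad PNFS response that makes every file look orphaned.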

SRM Transfer Rate Query

The query_rate.py script looks up the rate of data transfer for a single specific srmCopy transfer.  This script does not work for sites with pools behind NATs, nor for srmPut or srmGet transfers.

To run the script, use this command:
query_rate -config DBParam:dCache/NEBRASKA -url <dest URL>
Alternately, in lieu of passing SRM URLs one-by-one, you may also pass an entire copyjob (although this option has not been heavily tested). 

The query_rate.py script can be used programmatically to ensure that a single transfer is making sufficient progress.  Here are the relevant options:
  • -quiet: Suppress unnecessary output.
  • -timeout: If the rate does not meet the desired criteria, return a nonzero exit code.
  • -grace <minutes>: The grace period, in minutes, during which the transfer will not time out.
  • -max <minutes>: Maximum amount of time a transfer should take.
  • -rate <KB/s>: Minimum rate to be tolerated after grace period.
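
One plausible reading of how -grace, -max, and -rate combine is sketched below.  The function transfer_ok and its parameter names are hypothetical, and the precedence of the checks is an assumption; consult the script itself for the exact behavior.

```python
def transfer_ok(elapsed_min, rate_kbs, grace_min=0, max_min=None, min_rate_kbs=0):
    """Decide whether a transfer is making acceptable progress.

    elapsed_min   minutes since the transfer started
    rate_kbs      observed transfer rate in KB/s
    grace_min     period during which a slow rate is tolerated (-grace)
    max_min       hard limit on total transfer time (-max), or None
    min_rate_kbs  minimum rate tolerated after the grace period (-rate)
    """
    # A transfer that exceeds the hard limit fails regardless of its rate.
    if max_min is not None and elapsed_min > max_min:
        return False
    # Inside the grace period, any rate is acceptable.
    if elapsed_min <= grace_min:
        return True
    # After the grace period, the rate must meet the minimum.
    return rate_kbs >= min_rate_kbs


# A caller could map the result onto an exit code, as -timeout does.
exit_code = 0 if transfer_ok(15, 150, grace_min=10, min_rate_kbs=100) else 1
print(exit_code)
```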

Query All GridFTP Transfer Rates

The query_all_rates script looks up the rates of all GridFTP-based transfers it can find.  This should work for sites with pools behind NATs.

To run the script, use this command:
query_all_rates
from the scripts/ directory.

Restore Cleaner

The restore_cleaner script analyzes and potentially retries files which are stuck at the PoolManager.  There are many different reasons that this may happen; for tapeless sites, it is most often due to the fact that there is no on-disk replica for a logical file.

To run the script, use this command:
restore_cleaner -config DBParam:dCache/NEBRASKA -analyze
The -analyze flag prints out the results of various analyses of the error messages.  This organizes the problems at your site better than the raw list of problem files does.

In addition, you may pass the following options:
  • -retry_all: This retries all stuck transfers
  • (In the future, we will add the capability to retry transfers which fail specific analyses).

Space Usage Analyzer

The space_usage script is an external way to measure usage of pools in dCache.  It prints out the total space used, including replicas, and the distribution of disk space utilized throughout the namespace.

Here's an example usage:
space_usage -config DBParam:dCache/NEBRASKA -base /pnfs/unl.edu/data4 -count_replicas

Required Arguments:

  • -config <filename>:<section>: Config file listing parameters needed to connect to dCache.
  • -base <directory>: The base directory to start in.

Optional Arguments:

  • -threshold <%>: Minimum percentage of the total used disk space that a directory must account for in order to be displayed.  Defaults to 5%.
  • -count_replicas: Account for the space used by multiple replicas of the same file (uses the admin interface).
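
The effect of -base and -threshold can be illustrated with a short sketch.  It is not the space_usage implementation; usage_by_directory is a hypothetical function that takes a plain mapping of file path to size in bytes, whereas the real script gathers this information from dCache.

```python
import posixpath


def usage_by_directory(file_sizes, base, threshold_pct=5.0):
    """Return [(directory, bytes_used, percent)] for directories under base
    that hold at least threshold_pct of the total space used under base."""
    base = base.rstrip("/")
    totals = {}
    total = 0
    for path, size in file_sizes.items():
        if not path.startswith(base + "/"):
            continue  # ignore files outside the base directory
        total += size
        # Charge the file to its directory and every ancestor up to base.
        directory = posixpath.dirname(path)
        while True:
            totals[directory] = totals.get(directory, 0) + size
            if directory == base:
                break
            directory = posixpath.dirname(directory)
    if total == 0:
        return []
    report = []
    for directory, used in totals.items():
        pct = 100.0 * used / total
        if pct >= threshold_pct:
            report.append((directory, used, pct))
    # Largest consumers first; ties broken by name for stable output.
    report.sort(key=lambda row: (-row[1], row[0]))
    return report


sizes = {
    "/pnfs/unl.edu/data4/a/f1": 90,
    "/pnfs/unl.edu/data4/b/f2": 8,
    "/pnfs/unl.edu/data4/c/f3": 2,
}
for directory, used, pct in usage_by_directory(sizes, "/pnfs/unl.edu/data4"):
    print("%-30s %6d bytes %5.1f%%" % (directory, used, pct))
```

In this example the c directory holds only 2% of the space, so it falls below the default 5% threshold and is not listed.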
   

Pool Retire Script

The retire_pool script is an external way to safely remove a pool from dCache.  It takes an inventory of all the files in all pools and determines which files exist only on the pool to be retired.  It then starts P2P transfers of just those unique files.  If any in-process P2P transfers do not appear to be valid (too slow, or lacking a valid source or destination), they will be cancelled.  Valid in-process P2P transfers will be accounted for, and multiple P2P transfers for the same file will not be started.
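
The selection of files to copy amounts to a set difference, sketched below.  The names unique_files and transfers_to_start are hypothetical, and the inventory is shown as a plain dictionary of pool name to PNFS IDs rather than the data structure retire_pool actually builds.

```python
def unique_files(inventory, retiring_pool):
    """Return the IDs stored on retiring_pool that have no replica elsewhere.

    inventory maps each pool name to the set of PNFS IDs it holds.
    """
    elsewhere = set()
    for pool, ids in inventory.items():
        if pool != retiring_pool:
            elsewhere.update(ids)
    return set(inventory.get(retiring_pool, ())) - elsewhere


def transfers_to_start(inventory, retiring_pool, in_progress=()):
    """Unique files still needing a P2P transfer, skipping those already
    covered by a valid in-process transfer."""
    return sorted(unique_files(inventory, retiring_pool) - set(in_progress))


inventory = {
    "pool_a": {"id1", "id2", "id3"},
    "pool_b": {"id2"},
    "pool_c": {"id4"},
}
print(transfers_to_start(inventory, "pool_a", in_progress=["id3"]))
```

When the returned list is empty and no transfers remain in progress, every file on the pool has a replica elsewhere.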

dCache admin interface access and a PNFS mount are required.

One of the design goals of this script is that the last line of output will inform the admin whether or not it is safe to turn the pool off.  If the pool is not safe to turn off, the approximate number of in-progress plus started transfers will be printed out.

Here's an example usage:
retire_pool <poolname> -dryrun
With the -dryrun flag, no files will be copied.  By default, this script will use the pfm_config file (if it exists) as a configuration file; the one which comes with the dCacheNebraska distribution contains sane defaults.

PNFS ID Lookup

The dcache_pnfs_idfinder script uses the admin interface to look up the PNFS ID corresponding to a particular file.  This does not require PNFS to be mounted locally, but it does require dCache admin interface access.

Here is an example usage:
dcache_pnfs_idfinder /pnfs/unl.edu/data4/test/testfile.unl.3
The script must be able to find the DBParam config file or have it passed via command line.

Path Lookup

The dcache_pnfs_pathfinder script uses the admin interface to look up the filename associated with a particular PNFS ID.  This does not require PNFS to be mounted locally.

Here is an example usage:
dcache_pnfs_pathfinder 0004000000000000000AA5C8
The script must be able to find the DBParam config file or have it passed via command line.

Pool Lookup

The dcache_path2server script uses the admin interface to find pools containing replicas of a specified file.  This does not require PNFS to be mounted locally.  The PnfsManager will be queried for cache info, then the presence of the file will be confirmed with the individual pools.


Here are some example use cases:
dcache_path2server 0004000000000000000AA5C8
dcache_path2server /pnfs/unl.edu/data4/test/testfile.unl.3
dcache_path2server 0004000000000000000AA5C8 -n
Note that either the PNFS ID or the path is an acceptable input.
The -n flag instructs the script to not confirm with the pools.  Often the -n flag can be used to see where a file has disappeared from.
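
A script accepting both kinds of input has to tell them apart.  The sketch below shows one way to do so; classify_target is a hypothetical helper, and it assumes that a PNFS ID is exactly 24 hexadecimal characters, as in the example above, and that paths are absolute.

```python
import re

# Assumption: PNFS IDs are 24 hexadecimal characters, as in the example ID.
PNFS_ID_RE = re.compile(r"^[0-9A-Fa-f]{24}$")


def classify_target(arg):
    """Return "pnfsid" or "path" for a command-line target argument."""
    if PNFS_ID_RE.match(arg):
        return "pnfsid"
    if arg.startswith("/"):
        return "path"
    raise ValueError("neither a PNFS ID nor an absolute path: %r" % arg)


print(classify_target("0004000000000000000AA5C8"))
print(classify_target("/pnfs/unl.edu/data4/test/testfile.unl.3"))
```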

Authentication Checker

The auth_check utility is designed to let site admins check authentication at their site.  It requires dCache 1.8 and must be run on the SRM node.  auth_check will search through the credentials which have recently used the SRM server and try to use these credentials to determine if any GridFTP nodes fail authentication.  This allows the site admin to test the GridFTP server with the user's proxy certificate.

If auth_check is given an argument, it will only use certificates matching that argument; if it is not passed an argument, it will check all certificates.

Here is an example usage:
$ auth_check Xin
Checking authentication for /DC=org/DC=doegrids/OU=People/CN=Xin Zhao 102397/CN=1151626830/CN=1381624632
Door failed: srm.unl.edu:2811
If given the -v flag, additional information will be printed out.  The auth_check script requires access to the admin interface, meaning the DBParam file must be configured.

