
dCache PFM Explained

by admin last modified 2007-08-02 18:18

This document explains the algorithm that the dCache Physical File Manager uses.

Main Algorithm

Here is an outline of the execution steps of the dCache PFM:

  1. Determine the base of the file system.  The base is the top-level directory which will be operated on.  If you are just testing the PFM, you might want to set the base to be a test directory.
  2. Determine the optimal replica settings and any special settings.  These are controlled by the values of the attributes ignore, repmin, repmax, and replicate_X in the config file.
  3. Catalogue all in-progress transfers.  A transfer is considered in-progress if:
    • it has been active for under grace_time_s seconds OR
    • its total rate is at least min_rate_KB_s KB/s.
  4. Cancel all transfers which are one-sided.  That is, the destination pool is not attempting the transfer but the server pool is (or vice-versa).
  5. Cancel all transfers which are bad.  That is, any transfer which was not cancelled by (4) and does not meet the in-progress rules set out in (3).
  6. Walk through all the files within the base directory.  The following information is recorded for each file:
    • Filename
    • Parent directory
    • PNFS ID
    • Size as recorded by PNFS
  7. Iterate through each pool.  Each pool receives the pnfs register and rep ls commands.  For each file in the output of rep ls, the following logic is executed:
    1. Parse the status, PNFS ID, and replica size
    2. If the replica size does not match the file size recorded from PNFS in step (6), ignore this replica and move on to the next one.
    3. If the file is cached and at the correct size, make it precious.
    4. Record the pool in a PNFS ID -> pool list mapping.
    5. If all the other pools recorded for this PNFS ID in step (4) are on different hosts, then add the file size to a PNFS ID -> total replica size mapping.  This is done so the replica only "counts" once per host.
  8. For each file inside the base directory, execute the following logic:
    1. If the full filename matches one of the ignore regular expressions, then skip it.
    2. If the file is in the catalogue of in-progress replications, then skip it.
    3. If the file is size 0, then skip it.
    4. Determine the values of repmin and repmax for this file:
      • By default, any number of replicas between the global repmin and repmax is fine.
      • If the full file name matches one of the replicate_X regular expressions, set its repmin = repmax = X.
    5. Do one of the following:
      • If there are no replicas, print out a special error message and make a note of it.
      • If there are too few replicas, start a P2P copy using the function replicate, described later.
        • This is only done up to max_replications times per run of the dCache PFM.
      • If there are too many replicas, remove one using the function reduce, described later.
      • Otherwise, do nothing.
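
The per-file decision in step 8 can be sketched in Python as follows.  This is only an illustration, not the PFM's actual code: the Config class, the function names, the returned labels, and the use of re.search for matching are all assumptions.  Only the attribute names ignore, repmin, repmax, and replicate_X come from the config file described above.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Config:
    # Hypothetical container; field names mirror the config attributes.
    repmin: int
    repmax: int
    ignore: list = field(default_factory=list)     # regular expression strings
    replicate: dict = field(default_factory=dict)  # X -> list of regex strings


def replica_bounds(path, cfg):
    """Return (repmin, repmax) for one file (step 8.4)."""
    for count, patterns in cfg.replicate.items():
        # Assumption: a replicate_X pattern may match anywhere in the name.
        if any(re.search(p, path) for p in patterns):
            return count, count
    return cfg.repmin, cfg.repmax


def decide(path, size, replicas, in_progress, cfg):
    """Return 'skip', 'missing', 'replicate', 'reduce', or 'ok'."""
    if any(re.search(p, path) for p in cfg.ignore):
        return "skip"        # step 8.1
    if in_progress or size == 0:
        return "skip"        # steps 8.2 and 8.3
    lo, hi = replica_bounds(path, cfg)
    if replicas == 0:
        return "missing"     # report the error and make a note of it
    if replicas < lo:
        return "replicate"   # caller enforces the max_replications cap
    if replicas > hi:
        return "reduce"
    return "ok"
```

Here replicas is the per-host replica count gathered in step 7.  The max_replications limit is left to the caller, since it applies to a whole run rather than to a single file.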

Replicate Function

This is a description of the replicate algorithm:
  1. Determine the possible sources as recorded in step 7.4 above.
  2. For each source pool, determine the CPU cost.  Pick the lowest CPU cost as the source.
  3. Pick the destination pool:
    1. For each pool, determine the freespace and total space.
    2. Exclude any pools on the blacklist.
    3. Exclude any pool which is on a host that already contains a replica of the file.
    4. Create a list of the three pools with the highest freespace percentage.
    5. Double check this narrowed list and make sure none of its pools already contains a replica (this is done because pools sometimes contain a partial replica).
    6. Pick a pool at random from the filtered list.
  4. Start the P2P transfer using the "pp get file <pnfsid> <source pool>" command from the destination pool selected in (3).
If, at any point, one of the pools times out during a command, it is added to the blacklist for destinations.
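
Source and destination selection (steps 2 and 3 above) might look roughly like this in Python.  It is a sketch under stated assumptions: the function names, the shape of the pools dictionary, and the has_replica callback (standing in for whatever check the PFM does against the pool itself) are all hypothetical.

```python
import random


def pick_source(cpu_cost):
    """cpu_cost: dict of pool name -> CPU cost.  Return the cheapest pool."""
    return min(cpu_cost, key=cpu_cost.get)


def pick_destination(pools, replica_hosts, blacklist, has_replica, rng=random):
    """Pick a destination pool name, or None if no pool qualifies.

    pools:         dict of pool name -> (host, free_bytes, total_bytes)
    replica_hosts: set of hosts which already hold a replica of the file
    blacklist:     set of pool names which timed out earlier in the run
    has_replica:   hypothetical callable(pool) -> bool, the "double check"
    """
    candidates = [
        (free / total, name)
        for name, (host, free, total) in pools.items()
        if name not in blacklist
        and host not in replica_hosts
        and total > 0
    ]
    # Keep the three pools with the highest freespace percentage.
    top = sorted(candidates, reverse=True)[:3]
    # Drop any pool which turns out to hold a (possibly partial) replica.
    names = [name for _, name in top if not has_replica(name)]
    return rng.choice(names) if names else None
```

Note that the double check happens after narrowing to three pools, as in the description, so the final random choice can be between fewer than three pools.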

Reduce Function

This is a description of the reduce algorithm:
  1. For each possible source pool as determined by 7.4 above, determine the freespace percentage.  (Note to self: Although not necessary, should check for copies again!)
  2. Pick the two pools with the least amount of freespace.  Select one at random.
  3. Double check to make sure that there are at least two possible sources (Note to self: need to make sure this respects hosts, not just pools!  Doesn't appear to right now, although this will be treated correctly the next time the script is run...).  Always refuse to do a deletion which would reduce replica count to 0!
  4. If there is no problem, then finally do the deletion on the chosen pool.
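
As a rough Python sketch of the selection logic in reduce (again, not the PFM's real code): pick_removal and its arguments are made-up names, and ranking by freespace percentage rather than absolute free space is an assumption based on step 1.

```python
import random


def pick_removal(replica_pools, free_fraction, rng=random):
    """Choose the pool to delete a replica from, or None to refuse.

    replica_pools: list of pool names holding a replica of the file
    free_fraction: dict of pool name -> free space as a fraction of total
    """
    # Never do a deletion which would reduce the replica count to 0.
    if len(replica_pools) < 2:
        return None
    # Take the two fullest pools and pick one of them at random.
    fullest = sorted(replica_pools, key=lambda p: free_fraction[p])[:2]
    return rng.choice(fullest)
```

The safety check runs first here, whereas the description does it as step 3.  The outcome is the same, since no deletion happens unless the check passes.  Like the current script, this counts pools rather than hosts.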

Current list of concerns

There are a few concerns which I'd like to point out.  Any script which automates things must be watched closely, especially one which can remove files!  That said, there are currently no known bugs which could cause data loss.  We are running this script on our production site.
  • For dCache instances with lots of pools that randomly fail to respond, this can cause massive replication.
    • This can be avoided by loosening the requirements; i.e. repmin < repmax instead of repmin = repmax.
    • The better solution is to properly set up your dCache so pools don't randomly disappear!
  • Files are forced to be precious.
  • Invalid-looking P2P transfers may be cancelled even if they were started by something else!
  • There isn't enough variability/randomness in source and destination pool selection.
  • This is an iterative algorithm, so if you start sufficiently far away from the "optimal" state or change the state drastically between runs, it may take a long time to reach a fixed point.

Questions?  Comments?  Well, discussion is enabled on this page for logged-in users.  Leave a note or drop me an email!
