
Dataset Query Interface

by admin last modified 2008-05-13 15:07

The Dataset Query Interface lets site admins allow external parties to probe individual files and CMS datasets at their site. This is useful for central operations, which can use it to check the consistency of external catalogs such as DBS.

The Dataset Query Interface is bundled with the dCacheNebraskaWeb web application.  This must be run on a node with PNFS mounted.

Installation and Startup - Server

To install the Dataset Query Interface, follow the dCacheNebraska install instructions here.  The interface does not require any passwords or database logins to operate.  To start the dCacheNebraskaWeb application, do the following:
chkconfig --add dCacheNebraskaWeb
service dCacheNebraskaWeb start
This will start the dCacheNebraskaWeb interface on port 8098.  I recommend that you add a rewrite rule to the system's Apache server so the web applet "appears" to be on the normal HTTP port (this is especially useful for users at sites with restrictive firewalls).  Because the rule uses the [P] (proxy) flag, mod_proxy must also be loaded.  In the file /etc/httpd/conf/httpd.conf, inside the <VirtualHost *:80> block, add:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule ^/billing/(.*) http://localhost:8098/billing/$1 [L,P]
</IfModule>
Reload Apache with:
service httpd reload
You should now be able to view the billing graphs at http://localhost/billing/xml/.  If the billing graphs are loaded, then the query interface will also be exposed.
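
To make the effect of the rewrite rule concrete, here is a minimal Python sketch of the mapping it performs. It only mimics the pattern substitution; it is not how Apache implements the rule, and the function name proxied_target is made up for this illustration.

```python
import re

# Same pattern and target as the RewriteRule above.
REWRITE_PATTERN = re.compile(r"^/billing/(.*)")
BACKEND_PREFIX = "http://localhost:8098/billing/"


def proxied_target(url_path):
    """Return the backend URL a request path would be proxied to,
    or None if the rewrite rule does not match (hypothetical helper)."""
    match = REWRITE_PATTERN.match(url_path)
    if match is None:
        return None
    # $1 in the RewriteRule corresponds to the first capture group.
    return BACKEND_PREFIX + match.group(1)


if __name__ == "__main__":
    for path in ("/billing/xml/", "/billing/query/getSite", "/index.html"):
        print(path, "->", proxied_target(path))
```

Requests outside /billing/ are left alone, so the rule should not interfere with anything else served by the same virtual host.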

Installation and Setup - Client

There is no installation or setup for the client - everything is currently done through the web interface.

If there is interest, we plan to introduce a script that automates the synchronization of DBS and dCache.

Using the Query Interface

The query interface has a few major components.  It always responds in plain text to allow easy parsing by external tools (it is not really meant for non-experts).
  • /billing/query/check/<filename>: Check the status of a single file in the filesystem.  The filename should be the physical file name, URL-escaped; that is, every slash (/) must be replaced with its URL escape code, %2F.
  • /billing/query/getSite: Simply returns the best guess of this site's CMS SE.  Note that the current hostname may not be the CMS-recognized storage element endpoint; often, the endpoint is the SRM server and the applet is most likely running on a different admin server.  Make sure that it identifies your site correctly.
  • /billing/query/getCmsName: Returns the CMS site name; for Nebraska, this is T2_US_Nebraska.  Again, make sure that this identifies your site correctly.
  • /billing/query/checkCmsBlock/path/to/block%23<block UUID>: Perform a sanity check on all the files in this block which are supposed to be at your site.  The sanity check takes 0-10 seconds per file, so this might take a while to load.  Partial results are given as they are available.  For example, looking at the block /ttbar_inclusive_TopRex/CMSSW_1_3_1-Spring07-1122/GEN-SIM-DIGI-RECO%23fa31a80f-e6ca-47c0-b0c8-e278b9f7cd49 results in the following URL:
    http://localhost/billing/query/checkCmsBlock/ttbar_inclusive_TopRex/CMSSW_1_3_1-Spring07-1122/GEN-SIM-DIGI-RECO%23fa31a80f-e6ca-47c0-b0c8-e278b9f7cd49
    Note how the # in the block name is escaped with a %23.
  • /billing/query/checkCmsFile/path/to/LFN: Perform a sanity check on a single file; there is a large overhead in this function, so please do not automate queries on it.  For the file /store/mc/PreCSA08/MuonPT5/GEN-SIM-RAW/STARTUP_V2_v2/0006/92AD107C-431B-DD11-9215-00188B7AD141.root, the URL would be:
    http://localhost/billing/query/checkCmsFile/store/mc/PreCSA08/MuonPT5/GEN-SIM-RAW/STARTUP_V2_v2/0006/92AD107C-431B-DD11-9215-00188B7AD141.root
    Note we did not have to escape any components of the URL.
  • /billing/query/checkCmsDataset/path/to/dataset: Perform a sanity check on a full dataset.  Each file can take 0-10 seconds to check, so it will take a while to load.  Partial results are given as they are available.  For example, looking at the dataset /MuonPT5/PreCSA08_STARTUP_V2_v2/GEN-SIM-RAW will result in this URL:
    http://dcache-head.unl.edu/billing/query/checkCmsDataset/MuonPT5/PreCSA08_STARTUP_V2_v2/GEN-SIM-RAW
    Note we did not have to escape anything!
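
The escaping rules above can be sketched in Python as follows. This is only an illustration: the helper names and the base_url parameter are assumptions made here, not part of dCacheNebraskaWeb, and the /pnfs path in the usage example is a made-up physical file name. The sketch also assumes the server answers in a UTF-8-compatible encoding.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed default; replace with your own web server.
DEFAULT_BASE = "http://localhost/billing/query"


def check_url(pfn, base_url=DEFAULT_BASE):
    """URL for /check/<filename>: the physical file name is fully
    escaped, so every "/" becomes %2F."""
    return base_url + "/check/" + quote(pfn, safe="")


def check_cms_block_url(block, base_url=DEFAULT_BASE):
    """URL for /checkCmsBlock: slashes are kept, "#" becomes %23."""
    return base_url + "/checkCmsBlock/" + quote(block.lstrip("/"), safe="/")


def check_cms_file_url(lfn, base_url=DEFAULT_BASE):
    """URL for /checkCmsFile: an ordinary LFN needs no escaping."""
    return base_url + "/checkCmsFile/" + quote(lfn.lstrip("/"), safe="/")


def check_cms_dataset_url(dataset, base_url=DEFAULT_BASE):
    """URL for /checkCmsDataset: an ordinary dataset name needs no escaping."""
    return base_url + "/checkCmsDataset/" + quote(dataset.lstrip("/"), safe="/")


def stream_lines(url, timeout=60.0):
    """Yield the plain-text response line by line, so partial results
    can be handled as they arrive. The timeout applies to each socket
    operation, not to the whole (possibly long) response."""
    with urlopen(url, timeout=timeout) as response:
        for raw_line in response:
            yield raw_line.decode("utf-8", errors="replace").rstrip("\r\n")


if __name__ == "__main__":
    print(check_url("/pnfs/example.org/data/file.root"))  # hypothetical PFN
    print(check_cms_dataset_url("/MuonPT5/PreCSA08_STARTUP_V2_v2/GEN-SIM-RAW"))
```

The stream_lines helper makes no assumption about the content of each line; it only passes the text on, so any parsing of the results is left to the caller.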

Operational Notes

  • The sanity checks currently performed are a namespace check (make sure the filename is in PNFS), followed by a read via dCap.  So, make sure that you can read via dCap on the web server node.
    • Because a dCap read is performed, this might cause havoc at tape-based sites!  Think before you run this at a T1 site - unless you really want to prestage!
  • If a file is reported as "OK", then it is most likely OK.  If a file is reported as "FAIL", it will require debugging to determine its real status: some healthy files are still reported as FAIL (false negatives), but bad files are rarely reported as OK (false positives).
  • The web application is designed to run as a daemon.
  • The read via dCap will be forcibly killed after 10 seconds.
  • Sometimes the dCap server will refuse to read 10-15 consecutive files due to an internal error, not due to a problem with the files.  If you notice large consecutive chunks of files missing, it might indicate a larger site problem.
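
One way to spot such consecutive chunks is to look for long runs of failed checks. The Python sketch below assumes you have already turned the results into a sequence of booleans (True for OK, False for a failed check) in the order the files were checked; how to get there depends on the response text and is not shown. The function names and the threshold of 10, taken from the "10-15 consecutive files" figure above, are assumptions for illustration.

```python
def longest_failure_run(results):
    """Return the length of the longest run of consecutive failures.

    `results` is an iterable of booleans in check order:
    True means the file was OK, False means the check failed.
    """
    longest = 0
    current = 0
    for ok in results:
        # A success resets the run; a failure extends it.
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest


def looks_like_site_problem(results, threshold=10):
    """Heuristic only: a run of `threshold` or more consecutive failures
    hints at a server-side problem rather than bad files."""
    return longest_failure_run(results) >= threshold


if __name__ == "__main__":
    sample = [True] * 3 + [False] * 12 + [True] * 2
    print(longest_failure_run(sample), looks_like_site_problem(sample))
```

Isolated failures scattered through a dataset would not trip this check and still need to be debugged file by file.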
