Using Globus Web Services with the Open Science Grid
We have been testing the deployment of Web Services through the Open Science Grid (OSG). Specifically, we are seeing how an OSG Compute Element (CE) fares under high load with WS-GRAM versus pre-WS GRAM.
Currently, Globus-WS is slated for inclusion into the OSG-0.4.1 release, as well as any future releases of the OSG. Once OSG-0.4.1 comes out, there will be no separate installation procedure for Globus-WS, simplifying the process.
What remains on this site are validation procedures for Globus-WS:
Validation for Globus-WS
We have put together a set of scripts and this documentation page, which will walk you through the validation of Globus-WS at your site. Because some of the tests are somewhat qualitative (is the site responsive?) and involve many components, we have not automated them at this time. In the future we may be able to include some of these tests in the site_verify script for the OSG.
- Set up a grid proxy for yourself, and make sure authentication will work. Setting up authentication is outside the scope of this page, but instructions can be found elsewhere on this site.
- Download and unpack our validation scripts:
wget http://t2.unl.edu/cms/ws-osg-tests/validation_tests.tgz
tar zxf validation_tests.tgz
cd validation_tests
- Determine your hostname; this will be referred to as HOSTNAME in the rest of the tutorial. If unsure, type globus-hostname at the prompt. The name returned is what Globus thinks your hostname is.
- Simple fork job test: This tests the functionality of just the globus components by sending a test job. Type the following from your submit machine:
globus-job-run-ws -args "-c" "hostname" HOSTNAME:9443 /bin/sh
Note that the port the OSG uses for Globus-WS is NOT the default port; you must specify 9443.
- Simple batch queue test: This tests the functionality of Globus-WS submitting jobs to the remote batch queue. The most common remote batch queues are Condor and PBS (in pre-WS GRAM, these were referred to as jobmanager-condor and jobmanager-pbs, respectively). They are called Job Factories in the new nomenclature. Try the following:
globus-job-run-ws -args "-c" "hostname" -factory-type Condor HOSTNAME:9443 /bin/sh
The hostname that is returned as the output should be that of a worker node in the remote cluster, not the head node.
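The fork and batch queue checks above can be wrapped in a pair of small helpers. This is a minimal sketch and not part of the validation tarball: the function names ws_test_cmd and ran_on_worker are hypothetical, and the sketch assumes the job prints nothing but the remote hostname on standard output.

```shell
#!/bin/sh
# Hypothetical helpers around the two globus-job-run-ws tests described above.
# Port 9443 is the OSG port for Globus-WS named on this page.

# ws_test_cmd HOST [FACTORY]
# Prints the command line for a test job on HOST. When FACTORY is given
# (for example Condor), the job goes to that batch queue; otherwise it is
# the simple fork job.
ws_test_cmd() {
    host=$1
    factory=${2:-}
    if [ -n "$factory" ]; then
        printf 'globus-job-run-ws -args "-c" "hostname" -factory-type %s %s:9443 /bin/sh\n' "$factory" "$host"
    else
        printf 'globus-job-run-ws -args "-c" "hostname" %s:9443 /bin/sh\n' "$host"
    fi
}

# ran_on_worker REPORTED HEADNODE
# Succeeds if the hostname reported by a batch job is non-empty and is not
# the head node, which is the pass condition for the batch queue test.
ran_on_worker() {
    [ -n "$1" ] && [ "$1" != "$2" ]
}
```

Printing the command instead of running it lets you inspect it first; to run it, pipe the output to sh, for example: ws_test_cmd HOSTNAME Condor | sh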
Condor-G Tests
The next tests utilize the job submission capabilities of Condor-G, which will allow us to test how well Globus-WS will scale up against a large number of jobs.
Unfortunately, the default settings for Condor-G severely limit the number of jobs that can be sent to one resource at a time. Add the following two lines to your condor config file on your submitting machine:
GRIDMANAGER_MAX_SUBMITTED_JOBS_PER_RESOURCE=2000
GRIDMANAGER_MAX_PENDING_SUBMITS_PER_RESOURCE=50
These tests can be monitored using "condor_q -globus". The tests are considered successful when all of the condor jobs complete (they no longer show up in condor_q). Monitor the *.log files for more details.
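Before starting the large tests, it may help to confirm that both limits are actually present in the config file you edited. The following is a minimal sketch: has_limit is a hypothetical helper, the path to your condor config file is something you must supply, and the check only reads plain NAME=VALUE lines (exact case, integer value) without evaluating Condor macros.

```shell
#!/bin/sh
# has_limit FILE NAME MIN
# Succeeds if FILE contains a line "NAME = <integer>" whose value is at
# least MIN. If NAME appears more than once, the last occurrence is used.
has_limit() {
    value=$(sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*\([0-9][0-9]*\)[[:space:]]*\$/\1/p" "$1" | tail -n 1)
    [ -n "$value" ] && [ "$value" -ge "$3" ]
}
```

For example, has_limit /path/to/your/condor_config GRIDMANAGER_MAX_SUBMITTED_JOBS_PER_RESOURCE 2000 succeeds only if the first limit is set to 2000 or more. To see the value Condor itself has picked up, condor_config_val followed by the setting name is the more authoritative check.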
- Simple Condor-G test: From the validation_tests directory, edit the file condor-g-ws-test to reflect your HOSTNAME (not osg-test1.unl.edu). Then, simply type condor_submit condor-g-ws-test to submit a single job. The file job_test.output should return the hostname of the worker node the job was executed on.
- Large Condor-G test: Edit condor-g-ws-test-sleep to reflect your HOSTNAME. Then, submit the file to condor. This will send 500 jobs that will sleep for a random amount of time (30 seconds to 10 minutes).
- Large Condor-G test with I/O movement: Edit condor-g-ws-test-sleep-io to reflect your HOSTNAME. Then, submit the file to condor. This will send 200 jobs that will sleep for a random amount of time. These jobs are the same as above, except they also stage in and stage out a 4 MB file, which is similar to what a real application may do.
- Condor-G cancellation test: Edit condor-g-ws-testcancel to reflect your HOSTNAME. Then, submit the file to condor. This test will send 200 long sleep jobs to the remote machine. Once all of the jobs have been submitted, cancel all of them using "condor_rm <your username>". This test is successful when all of the jobs have been removed from the condor queue.
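Each of the Condor-G tests above starts with the same edit: replacing the example host with your own. A minimal sketch of that step follows. set_test_host is a hypothetical helper, and it assumes the example host appears literally as osg-test1.unl.edu in every submit file, which this page only states for condor-g-ws-test.

```shell
#!/bin/sh
# set_test_host FILE HOSTNAME
# Rewrites FILE in place, replacing every occurrence of the example host
# osg-test1.unl.edu with HOSTNAME. The original is kept as FILE.orig.
set_test_host() {
    cp "$1" "$1.orig" &&
    sed "s/osg-test1\.unl\.edu/$2/g" "$1.orig" > "$1"
}
```

From the validation_tests directory, this could be applied to all four files in one pass:

for f in condor-g-ws-test condor-g-ws-test-sleep condor-g-ws-test-sleep-io condor-g-ws-testcancel; do set_test_host "$f" "$(globus-hostname)"; done

Review the result against the .orig copies before running condor_submit.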
Additional Variations
There are several variants of the above Condor-G based tests. Here are a few additional ones to try:
- Response tests: While in the middle of any of the larger tests outlined above (while the jobs are still being submitted to the remote server), try the following command:
time globus-job-run-ws HOSTNAME:9443 "/bin/sh -c date"
This will return the time taken to complete the globus job. In order for GridCat to function, the job must complete in under 45 seconds; for the most part, the WS-GRAM version released with VDT 1.3.10 cannot achieve this under heavy load.
- Larger tests. Change "Queue 200" to "Queue 500" or "Queue 3500" for more intense tests. Combine this with the above variation.
- Back-to-back tests. Perform two of the above tests in quick succession; for example, as soon as the cancel test finishes, start the large Condor-G I/O test.
- Application tests. Try altering your actual application to use Globus-WS. WS-GRAM is the wave of the future, so your application will need to eventually work with it.
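For the response test described above, the 45-second requirement can be turned into a pass/fail check rather than read off the output of time. This is a minimal sketch: within_limit is a hypothetical helper, it measures whole seconds only, and it relies on date +%s, which is widely available but not guaranteed on every system.

```shell
#!/bin/sh
# within_limit LIMIT_SECONDS COMMAND [ARGS...]
# Runs COMMAND, prints the elapsed wall-clock time in whole seconds, and
# succeeds only if COMMAND exited successfully in fewer than LIMIT_SECONDS.
within_limit() {
    limit=$1
    shift
    start=$(date +%s)
    status=0
    "$@" || status=$?
    elapsed=$(( $(date +%s) - start ))
    echo "elapsed: ${elapsed}s"
    [ "$status" -eq 0 ] && [ "$elapsed" -lt "$limit" ]
}
```

While one of the large tests is still submitting, the check could be run as: within_limit 45 globus-job-run-ws HOSTNAME:9443 "/bin/sh -c date" && echo PASS || echo FAIL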