Using Nebraska Grid Interfaces
This documentation explains how to use the grid interfaces here at Nebraska.
The idea behind a grid is that you submit jobs to a cluster of
clusters. To do this, you must learn some new paradigms for
authentication and for moving applications and data. It is difficult at
first, but well worth it when you realize how many idle processors you
will have access to.
Authentication
For authentication, we use Public Key Infrastructure with
Globus. To use this, you must first have a user
certificate. The application process is explained here:
Applying for a UNL grid certificate
Once you have received your certificate, you will need to gain access to a
submission machine. Instead of submitting jobs from the headnode of a
cluster, grid users submit from a "User Interface" (UI), which has the
necessary grid software installed but is not attached to any
particular cluster on the grid. The local UI is
osg-test2.unl.edu. To use it, you must get a shell
account and install your certificate. Instructions are here:
Installing your certificate on the Grid UI
Once you have your grid certificate installed, you must be added to our
local Virtual Organization. Right now, this must be done manually
by a site administrator; at a later date it will be done through a
separate website.
Simple Jobs and Data Movement
Here are some simple Globus-related commands.
Initializing the Globus proxy:
[brian@red ~]$ grid-proxy-init
Your identity: /DC=org/DC=doegrids/OU=People/CN=Brian Bockelman 504307
Enter GRID pass phrase for this identity:
Creating proxy ........................................................... Done
Your proxy is valid until: Fri Feb 2 07:40:29 2007
[brian@red ~]$
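Since a proxy expires after a limited time (see the "valid until" line above), a script that submits jobs may want to check the existing proxy before asking for the pass phrase again. The following is a minimal sketch, assuming the Globus client tools are on your PATH. The names need_new_proxy, ensure_proxy, PROXY_INFO and MIN_HOURS are hypothetical and chosen only for illustration.

```shell
# Sketch: create a new proxy only when the current one is missing or close to expiry.
# PROXY_INFO and MIN_HOURS are illustrative names, not part of the Globus tools.
PROXY_INFO="${PROXY_INFO:-grid-proxy-info}"
MIN_HOURS=2

need_new_proxy() {
    # "grid-proxy-info -exists -valid H:M" exits 0 only if a proxy exists
    # and still has at least H hours and M minutes of lifetime left.
    ! "$PROXY_INFO" -exists -valid "${MIN_HOURS}:00" >/dev/null 2>&1
}

ensure_proxy() {
    if need_new_proxy; then
        # Prompts for the GRID pass phrase, as in the session above.
        grid-proxy-init
    else
        echo "Existing proxy is valid for at least ${MIN_HOURS} more hours."
    fi
}
```

Calling ensure_proxy at the top of a submission script avoids retyping the pass phrase while the current proxy still has enough lifetime left.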
Testing authentication:
[brian@osg-test1 ~]$ globusrun -a -r red.unl.edu
GRAM Authentication test successful
[brian@osg-test1 ~]$
Initializing the VOMS proxy:
First try, which fails:
[brian@gpn-husker ~]$ voms-proxy-init --voms gpn:/gpn
VOMS Server for gpn not known!
[brian@gpn-husker ~]$
Add the GPN line to the /opt/glite/etc/vomses file:
"gpn" "t2.unl.edu" "15002" "/DC=org/DC=doegrids/OU=Services/CN=voms/t2.unl.edu" "gpn"
Second try, which works:
[brian@gpn-husker ~]$ voms-proxy-init --voms gpn:/gpn
Your identity: /DC=org/DC=doegrids/OU=People/CN=Brian Bockelman 504307
Enter GRID pass phrase:
Your proxy is valid until Fri Feb 2 07:47:14 2007
Creating temporary proxy ................................................................................ Done
Contacting t2.unl.edu:15002 [/DC=org/DC=doegrids/OU=Services/CN=voms/t2.unl.edu] "gpn"
Done
Creating proxy ................................................ Done
Your proxy is valid until Fri Feb 2 07:47:14 2007
[brian@gpn-husker ~]$
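The vomses entry added above consists of five quoted fields. Judging from the GPN line, they appear to be an alias, the VOMS server host, its port, the server certificate DN, and the VO name; treat that reading as an assumption. The sketch below builds such a line and appends it only if it is not already in the file. The function names vomses_line and add_vomses_entry and the VOMSES_FILE variable are hypothetical.

```shell
# Sketch: append a vomses entry only if the identical line is not already there.
# VOMSES_FILE defaults to the path used above; point it at a scratch file to experiment.
VOMSES_FILE="${VOMSES_FILE:-/opt/glite/etc/vomses}"

vomses_line() {
    # Arguments (assumed meaning): alias, host, port, server DN, VO name
    printf '"%s" "%s" "%s" "%s" "%s"\n' "$1" "$2" "$3" "$4" "$5"
}

add_vomses_entry() {
    line=$(vomses_line "$@")
    # -x matches whole lines only, -F treats the entry as a fixed string
    if grep -qxF -e "$line" "$VOMSES_FILE" 2>/dev/null; then
        echo "Entry already present in $VOMSES_FILE"
    else
        printf '%s\n' "$line" >> "$VOMSES_FILE"
        echo "Added entry to $VOMSES_FILE"
    fi
}
```

For the GPN entry shown above, the call would be: add_vomses_entry gpn t2.unl.edu 15002 "/DC=org/DC=doegrids/OU=Services/CN=voms/t2.unl.edu" gpn. Writing to the system-wide file will most likely require administrator rights.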
Running a simple command on the host machine:
globus-job-run red.unl.edu:/jobmanager-fork /bin/hostname
Some things to note about globus-job-run:
- The full path of the command must be specified. Just putting 'hostname' will return an error.
- jobmanager-fork runs the command as a process on the headnode of the cluster, NOT in the batch queue. In most circumstances this is highly undesirable. To launch a job into the batch queue, use jobmanager-pbs.
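Both notes can be folded into a small wrapper. This is only a sketch: the function name run_on_grid and the variables GATEKEEPER, JOBMANAGER and GLOBUS_JOB_RUN are hypothetical, and the contact string simply mirrors the "host:/jobmanager" form used in the command above, with jobmanager-pbs swapped in for jobmanager-fork.

```shell
# Sketch: reject relative executables, then submit through the PBS job manager.
GATEKEEPER="red.unl.edu"       # host taken from the example above
JOBMANAGER="jobmanager-pbs"    # batch queue instead of the headnode

run_on_grid() {
    case "$1" in
        /*) ;;  # absolute path, accepted
        *)  echo "error: give the full path (e.g. /bin/hostname), not '$1'" >&2
            return 1 ;;
    esac
    # Set GLOBUS_JOB_RUN=echo for a dry run that only prints the arguments.
    ${GLOBUS_JOB_RUN:-globus-job-run} "${GATEKEEPER}:/${JOBMANAGER}" "$@"
}
```

With this in place, run_on_grid /bin/hostname goes to the batch queue, while run_on_grid hostname is stopped locally before anything is sent to the gatekeeper.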
Transferring files to the data area on red:
globus-url-copy file:////home/USERNAME/filename.txt gsiftp://red.unl.edu/opt/data/remote_file.txt
It's important to specify the filename on the remote machine. Just leaving a trailing directory will result in an error.
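One way to avoid the trailing-directory mistake is to always derive the remote file name explicitly. The following sketch assumes the data area shown above; the names remote_url_for, copy_to_red, REMOTE_BASE and GLOBUS_URL_COPY are hypothetical. It writes the local URL as file:// followed by the absolute path.

```shell
# Sketch: always build a destination URL that ends in a file name.
REMOTE_BASE="gsiftp://red.unl.edu/opt/data"   # data area from the example above

remote_url_for() {
    # Arguments: local file path, optional remote file name.
    # Falls back to the local file's basename so the URL never ends in a directory.
    name="${2:-$(basename "$1")}"
    printf '%s/%s\n' "$REMOTE_BASE" "$name"
}

copy_to_red() {
    src="$1"
    # globus-url-copy needs an absolute local path in the file:// URL
    case "$src" in
        /*) ;;
        *)  src="$PWD/$src" ;;
    esac
    # Set GLOBUS_URL_COPY=echo for a dry run that only prints both URLs.
    ${GLOBUS_URL_COPY:-globus-url-copy} "file://$src" "$(remote_url_for "$src" "$2")"
}
```

For example, copy_to_red /home/USERNAME/filename.txt remote_file.txt performs the same kind of transfer as the command above, and leaving out the second argument reuses the local file name on the remote side.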
Example:
A full example of running a grid job can be found here.
Manage Jobs with Condor-G
While the Globus tools are excellent for putting together and submitting one job at a time, managing large numbers of jobs with them can be a significant headache. To solve this problem, we turn back to an old friend: the batch scheduler. In this case Condor-G, an extension of Condor, handles most of the grunt work of moving data and executables for us. Condor-G is one of the most extensively used tools on the Open Science Grid.
We have adapted a tutorial that originated at UW-Madison and updated it for our use. Part I covers simple uses of Condor-G. Part II covers recovering from common errors.
This tutorial was taken from the UW-Madison website and updated for our use.