Author: Jerome Lauret, Modified: Alexandr Prozorov
```mermaid
flowchart TD
    subgraph SL7["📦SL7 Container"]
        Analysis["Your Analysis"]
    end
    subgraph Alma["🎯 New OS (AlmaLinux 9)"]
        SL7
    end
    Alma -. Job submission .-> HTCondor[/"📤 Job Scheduler (HTCondor)<br><i>submit from Alma9</i>"/]
    HTCondor -- Execution in container --> Analysis
    style Analysis stroke-width:1px,stroke-dasharray: 1
```
Summary - TL;DR
- Update `~/.cshrc` and `~/.login` for NFSv4:
  ```bash
  setenv USE_NFS4 1
  setenv GROUP_DIR /star/nfs4/AFS/star/group
  ```
- Use the new a9 `starsub` nodes when you `ssh`: `starsub01` - `starsub07`
- Use the SL7 container via Singularity for job submission:
  - Add to your scheduler XML, inside `<job> </job>`:
    ```xml
    <shell>singularity exec -e -B /direct -B /star -B /afs -B /gpfs -B /sdcc/lustre02 /cvmfs/star.sdcc.bnl.gov/containers/rhic_sl7.sif</shell>
    ```
  - Test your script interactively inside the container:
    ```bash
    singularity shell --shell /usr/bin/csh -B /direct -B /star -B /afs -B /gpfs -B /sdcc/lustre02 /cvmfs/star.sdcc.bnl.gov/containers/rhic_sl7.sif csh
    ```
    I suggest creating an alias for it.
    - In the container:
      ```bash
      cons                            # or your compile command
      root 'myMacro.C("test_file")'   # verify that it works
      exit                            # leave the container before submitting to HTCondor
      ```
  - Submit OUTSIDE the container, directly from Alma 9.
⚙️ Transition from SL7 to Alma 9
The facility CPUs are being upgraded from Scientific Linux 7 (SL7) to a newer operating system called Alma 9 (a9).
On a9, AFS is no longer supported, so we also need a replacement for it. That replacement is NFS version 4 (NFSv4). The instructions below bridge the gap while both the underlying OS and the file system are in transition.
🔧 Environment Setup
First, make sure your login files are modified as follows:
- Instead of having something like:
  ```bash
  setenv GROUP_DIR /afs/rhic.bnl.gov/star/group
  ```
  replace it with:
  ```bash
  setenv USE_NFS4 1
  setenv GROUP_DIR /star/nfs4/AFS/star/group
  ```

You need to modify BOTH your `$HOME/.cshrc` and `$HOME/.login`.

NB: For now, we are testing ONLY official STAR libraries - please do NOT use other libraries, private or otherwise.
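As a quick sanity check on the change above, the sketch below tests whether a login file carries both new settings and no longer points `GROUP_DIR` at AFS. It is only an illustration: the function name `check_login_file` is hypothetical and not part of any STAR tool, the script is written for bash even though the login files themselves are csh, and it does plain text matching on uncommented `setenv` lines.

```bash
#!/bin/bash
# Hypothetical helper (not a STAR tool): check that a csh login file
# has been switched to the NFSv4 settings described in this page.
# Returns 0 if the file looks migrated, 1 otherwise.
check_login_file() {
    local file="$1"
    local status=0

    [ -r "$file" ] || { echo "$file: not readable"; return 1; }

    # Both new settings must appear on an uncommented setenv line.
    grep -Eq '^[[:space:]]*setenv[[:space:]]+USE_NFS4[[:space:]]+1([[:space:]]|$)' "$file" \
        || { echo "$file: missing 'setenv USE_NFS4 1'"; status=1; }
    grep -Eq '^[[:space:]]*setenv[[:space:]]+GROUP_DIR[[:space:]]+/star/nfs4/AFS/star/group([[:space:]]|$)' "$file" \
        || { echo "$file: GROUP_DIR is not /star/nfs4/AFS/star/group"; status=1; }

    # The old AFS location must be gone.
    if grep -Eq '^[[:space:]]*setenv[[:space:]]+GROUP_DIR[[:space:]]+/afs/' "$file"; then
        echo "$file: GROUP_DIR still points to AFS"
        status=1
    fi
    return "$status"
}

# Usage: both files need the change.
# check_login_file "$HOME/.cshrc" && check_login_file "$HOME/.login" && echo "login files OK"
```

The check is deliberately strict about the exact `GROUP_DIR` value given above, so a file that sets it through another variable would be reported as not migrated.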
🖥️ Submit Nodes
The submit nodes for launching jobs on Alma 9 are named `starsub0X`, where X is a number from 1 to 7 (e.g. `starsub03`).
📤 Submitting from A9 (starsub0X)
If you are a STAR Scheduler user:
All you need to do is add the following line to your XML:
```xml
<shell>singularity exec -e -B /direct -B /star -B /afs -B /gpfs -B /sdcc/lustre02 /cvmfs/star.sdcc.bnl.gov/containers/rhic_sl7.sif</shell>
```
Example:
```xml
<?xml version="1.0" encoding="utf-8" ?>
<job>
  <command>root4star -q -b StHbtDiHadron.C\(1000000,100,-1,\"\",\"$FILELIST\"\)</command>
  <stdout URL="file:/star/u/carcassi/scheduler/out/$JOBID.out" />
  <input URL="file:/star/data21/reco/productionCentral/FullField/P02gc/2001/312/st_physics_2312011_raw_0017.MuDst.root" />
</job>
```
Change it to:
```xml
<?xml version="1.0" encoding="utf-8" ?>
<job>
  <shell>singularity exec -e -B /direct -B /star -B /afs -B /gpfs -B /sdcc/lustre02 /cvmfs/star.sdcc.bnl.gov/containers/rhic_sl7.sif</shell> <!-- highlight -->
  <command>root4star -q -b StHbtDiHadron.C\(1000000,100,-1,\"\",\"$FILELIST\"\)</command>
  <stdout URL="file:/star/u/carcassi/scheduler/out/$JOBID.out" />
  <input URL="file:/star/data21/reco/productionCentral/FullField/P02gc/2001/312/st_physics_2312011_raw_0017.MuDst.root" />
</job>
```
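Since a forgotten `<shell>` line is easy to miss, a scheduler XML file can be checked before submission. The sketch below is an assumption-laden illustration: `has_sl7_shell` is a hypothetical name, the check is a plain `grep` for a `<shell>` element that runs `singularity exec` on `rhic_sl7.sif` (it does not parse the XML), and the file name in the usage line is a placeholder.

```bash
#!/bin/bash
# Hypothetical pre-submission check (not a STAR tool): succeed if the
# scheduler XML file contains a <shell> element that runs the SL7
# container image through "singularity exec".
has_sl7_shell() {
    local xml="$1"
    grep -Eq '<shell>[[:space:]]*singularity[[:space:]]+exec[[:space:]].*rhic_sl7\.sif[[:space:]]*</shell>' "$xml"
}

# Usage (myJob.xml is a placeholder name):
# has_sl7_shell myJob.xml || echo "myJob.xml: no <shell> line for the SL7 container"
```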
If you are not a STAR scheduler user:
Make sure that, however you submit jobs, your shell script is executed inside the container.
In condor land, this may mean adjusting your JDL to read as follows:
```
Arguments = "singularity exec -e -B /direct -B /star -B /afs -B /gpfs -B /sdcc/lustre02 /cvmfs/star.sdcc.bnl.gov/containers/rhic_sl7.sif /blabla/where-my-csh-script-is.csh"
```
instead of:
```
Arguments = /blabla/where-my-csh-script-is.csh
```
Adapt this as needed.
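When writing submit files by hand, the long bind list is easy to mistype, so the command line can be assembled by a small helper instead. This is a sketch only: the function name `sl7_command` and the two variable names are hypothetical, the bind paths and image path are copied from the `singularity exec` line on this page, and the script is written for bash.

```bash
#!/bin/bash
# Image and bind mounts, copied from the singularity line on this page.
SL7_IMAGE=/cvmfs/star.sdcc.bnl.gov/containers/rhic_sl7.sif
SL7_BINDS="/direct /star /afs /gpfs /sdcc/lustre02"

# Hypothetical helper (not a STAR tool): print the singularity command
# line that runs the given script, plus any arguments, in the SL7 container.
sl7_command() {
    local cmd="singularity exec -e"
    local dir

    if [ "$#" -eq 0 ]; then
        echo "usage: sl7_command SCRIPT [ARGS...]" >&2
        return 1
    fi

    # SL7_BINDS is intentionally unquoted: one -B option per directory.
    for dir in $SL7_BINDS; do
        cmd="$cmd -B $dir"
    done
    echo "$cmd $SL7_IMAGE $*"
}

# Usage: print the line, then paste it into the Arguments entry of the JDL.
# sl7_command /blabla/where-my-csh-script-is.csh
```

Keeping the image and bind list in one place means a later change to either needs a single edit rather than one per submit file.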
🧪 Testing a Job Interactively
NOTE: Before submitting, you may want to test ONE job interactively to make sure it works.
Remember that on `starsub0X` you are on Alma 9, where our code is not yet supported, as indicated earlier.
- Therefore, you will need to start a shell like this:
  ```bash
  singularity exec -e -B /direct -B /star -B /afs -B /gpfs -B /sdcc/lustre02 /cvmfs/star.sdcc.bnl.gov/containers/rhic_sl7.sif csh
  ```
- This will start an SL7 login on an Alma 9 node.
- If you want to pass the `DISPLAY` environment variable when running interactively, for example to open a TBrowser or see plots in root/root4star, add `--env DISPLAY=$DISPLAY`:
  ```bash
  singularity shell --shell /usr/bin/csh --env DISPLAY=$DISPLAY -B /direct -B /star -B /afs -B /gpfs -B /sdcc/lustre02 /cvmfs/star.sdcc.bnl.gov/containers/rhic_sl7.sif
  ```
  However, for faster usage one could:
  - mount/copy the `.root` files locally
  - use a VSCode extension + `ssh`
  - use NoMachine (VNC server)
- From that shell, you can execute one of the generated `.csh` scripts and verify all goes according to plan:
  ```bash
  cons
  root my_macro.C
  exit
  ```
- If this runs, you are ready to submit outside the singularity shell. This means you should exit the container and send jobs to HTCondor from the plain Alma 9 node, not from inside the SL7 container.
⚠️ Possible Issues
- There have been reports of issues with the 32-bit version of ROOT/CINT. If you encounter one, please try the 64-bit environment:
  ```bash
  setup 64b
  ```
- When using SIMD instructions, you may need to restrict jobs to a specific CPU architecture. We currently do not have a flag in the STAR scheduler for this, but the following HTCondor requirement would limit jobs to one kind of node with specific SIMD instructions:
  ```
  requirements = (Microarch >= "x86_64-v4")
  ```
- From the production test, we have evidence of a slowdown when more jobs are running.
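For the SIMD restriction above, it can help to know whether a given node would qualify. The sketch below assumes a Linux node and checks only the AVX-512 subsets that the x86-64-v4 level adds on top of x86-64-v3 (`avx512f`, `avx512bw`, `avx512cd`, `avx512dq`, `avx512vl`); it does not verify the older feature levels. The function name `has_v4_flags` is hypothetical, and the script is written for bash.

```bash
#!/bin/bash
# AVX-512 subsets that the x86-64-v4 level adds on top of x86-64-v3.
V4_FLAGS="avx512f avx512bw avx512cd avx512dq avx512vl"

# Hypothetical helper (not a STAR or HTCondor tool): succeed if the given
# CPU flag list (the text of a "flags" line from /proc/cpuinfo) contains
# every flag in V4_FLAGS.
has_v4_flags() {
    local flags=" $1 "   # pad with spaces so each flag matches as a whole word
    local flag
    for flag in $V4_FLAGS; do
        case "$flags" in
            *" $flag "*) ;;       # present, keep checking
            *) return 1 ;;        # one missing flag is enough to fail
        esac
    done
    return 0
}

# Usage on a Linux node: take the first "flags" line and test it.
# has_v4_flags "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)" && echo "AVX-512 (v4) flags present"
```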