Author: Jerome Lauret, Modified: Alexandr Prozorov
flowchart TD
    subgraph Alma["🎯 New OS (AlmaLinux 9)"]
        subgraph SL7["📦 SL7 Container"]
            Analysis["Your Analysis"]
        end
    end
    Alma -. Job submission .-> HTCondor[/"📤 Job Scheduler (HTCondor)<br><i>submit from Alma9</i>"/]
    HTCondor -- Execution in container --> Analysis
    style Analysis stroke-width:1px,stroke-dasharray: 1
Summary
- Update ~/.cshrc and ~/.login for NFSv4:
  setenv GROUP_DIR /star/nfs4/AFS/star/group
- Use the new a9 starsub nodes when you ssh: rcas60xx → starsub03 - starsub07
- Use the SL7 container via Singularity for job submission. Add to your XML scheduler file, inside <job> </job>:
  <shell>singularity exec -e -B /direct -B /star -B /afs -B /gpfs -B /sdcc/lustre02 /cvmfs/star.sdcc.bnl.gov/containers/rhic_sl7.sif</shell>
- Test your script inside the container:
  singularity exec -e -B /direct -B /star -B /afs -B /gpfs -B /sdcc/lustre02 /cvmfs/star.sdcc.bnl.gov/containers/rhic_sl7.sif csh
- In the container:
  cons                          # or your compile command
  root myMacro.C("test_file")   # verify that it works
  exit                          # leave the container before submitting to HTCondor
- Submit OUTSIDE the container, directly from Alma 9.
- Instead of star-submit / star-submit-template → use star-submit-beta / star-submit-template-beta
⚙️ Transition from SL7 to Alma 9
The facility CPUs are being upgraded from Scientific Linux 7 (SL7) to a newer operating system called Alma 9 (a9).
At this stage, more than two-thirds of the farm has been converted to a9. To accommodate large numbers of jobs, you are encouraged to follow the instructions herein (this is also why jobs submitted from SL7 stay pending for so long).
On a9, AFS is no longer supported, which means we also need a replacement for AFS. That replacement is NFS version 4 (NFSv4).
Below are instructions to bridge this gap while we transition (both the underlying OS and the file system).
⚠️ Notes on Code Compatibility
In this transition period, STAR will continue to use code built for SL7 (we do NOT yet have native a9 support for our code). This means you will need to assemble and compile your code as usual, using the rcas nodes.
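For example, building is unchanged by the transition; a minimal sketch (the node number and directory are placeholders for whatever you normally use):
ssh rcas60xx              # replace xx with an actual rcas node number
cd ~/my_analysis          # hypothetical working directory containing your code
cons                      # standard STAR build command, same as before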
Here is a quick recipe for running on a9 in SL7 containers.
🔧 Environment Setup
First, make sure your login is modified as follows:
- Instead of having something like:
setenv GROUP_DIR /afs/rhic.bnl.gov/star/group
replace by:
setenv GROUP_DIR /star/nfs4/AFS/star/group
You need to modify BOTH your $HOME/.cshrc and $HOME/.login. This will take care of using NFSv4 instead of AFS. On the rcas nodes, things should then work as usual (if not, please do not revert to AFS but report any issues).
- NB: For now, we are testing ONLY official STAR libraries - please do NOT use other libraries, private or otherwise.
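After editing both files and logging in again, a quick sanity check can confirm the new path is picked up (a sketch, not required):
echo $GROUP_DIR           # should print /star/nfs4/AFS/star/group
ls $GROUP_DIR             # should list the group directory served over NFSv4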
🖥️ Submit Nodes
The submit nodes for launching jobs on Alma 9 are named starsub0X, where X is a number from 1 to 7 (ex: starsub03).
However, we ask that you use 03 or a higher number for now. Reasons:
- starsub03 and above are at Condor version 24.0.4 2025-02-02 BuildID: 784178, while 01/02 are still at 23.9.6 2024-08-08 BuildID: 748275 PackageID: 23.9.6-1 (adjustments were made with the latest version).
- The starsub0X nodes are Alma 9 nodes. The STAR software is not yet available on them, but you will be able to submit to the a9 farm from there.
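To check which Condor version a submit node runs, a quick sketch (the exact host name may need your site's domain appended):
ssh starsub03             # or any of starsub03 - starsub07
condor_version            # prints the HTCondor version installed on that node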
📤 Submitting from A9 (starsub0X)
If you are a STAR Scheduler user:
Please use star-submit-beta and/or star-submit-template-beta.
All you need to do is to add the following line in your XML:
<shell>singularity exec -e -B /direct -B /star -B /afs -B /gpfs -B /sdcc/lustre02 /cvmfs/star.sdcc.bnl.gov/containers/rhic_sl7.sif</shell>
Example:
<?xml version="1.0" encoding="utf-8" ?>
<job>
    <command>root4star -q -b StHbtDiHadron.C\(1000000,100,-1,\"\",\"$FILELIST\"\)</command>
    <stdout URL="file:/star/u/carcassi/scheduler/out/$JOBID.out" />
    <input URL="file:/star/data21/reco/productionCentral/FullField/P02gc/2001/312/st_physics_2312011_raw_0017.MuDst.root" />
</job>
Change it to:
<?xml version="1.0" encoding="utf-8" ?>
<job>
    <shell>singularity exec -e -B /direct -B /star -B /afs -B /gpfs -B /sdcc/lustre02 /cvmfs/star.sdcc.bnl.gov/containers/rhic_sl7.sif</shell> <!-- added line -->
    <command>root4star -q -b StHbtDiHadron.C\(1000000,100,-1,\"\",\"$FILELIST\"\)</command>
    <stdout URL="file:/star/u/carcassi/scheduler/out/$JOBID.out" />
    <input URL="file:/star/data21/reco/productionCentral/FullField/P02gc/2001/312/st_physics_2312011_raw_0017.MuDst.root" />
</job>
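Then submit from a starsub0X node (outside the container). Assuming star-submit-beta takes the same argument as star-submit and that myjob.xml is your job description file, a usage sketch:
star-submit-beta myjob.xml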
If you are not a STAR scheduler user:
Make sure that whatever you do to submit jobs, you execute a shell script in the container.
In Condor land, this may mean adjusting your JDL to read as follows:
Arguments = "singularity exec -e -B /direct -B /star -B /afs -B /gpfs -B /sdcc/lustre02 /cvmfs/star.sdcc.bnl.gov/containers/rhic_sl7.sif /blabla/where-my-csh-script-is.csh"
Instead of:
Arguments = /blabla/where-my-csh-script-is.csh
Adapt this as needed.
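For context, a complete submit description might look like the sketch below. Everything here is a hypothetical arrangement: run_wrapper.csh is an assumed one-line wrapper that simply exec's its arguments, and the paths are placeholders.
# hypothetical minimal HTCondor submit file (adapt names and paths to your setup)
Universe   = vanilla
Executable = /blabla/run_wrapper.csh
Arguments  = "singularity exec -e -B /direct -B /star -B /afs -B /gpfs -B /sdcc/lustre02 /cvmfs/star.sdcc.bnl.gov/containers/rhic_sl7.sif /blabla/where-my-csh-script-is.csh"
Output     = job_$(Cluster)_$(Process).out
Error      = job_$(Cluster)_$(Process).err
Log        = job_$(Cluster).log
Queue
Submit it from a starsub0X node with condor_submit as usual.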
🧪 Testing a Job Interactively
NOTE: Before submitting, you may want to test ONE job interactively to make sure it works.
Remember that on starsub0X you are on Alma 9; therefore, our code is not yet supported there, as indicated earlier.
- Therefore, you will need to start a shell like this:
singularity exec -e -B /direct -B /star -B /afs -B /gpfs -B /sdcc/lustre02 /cvmfs/star.sdcc.bnl.gov/containers/rhic_sl7.sif csh
This will start an SL7 login shell on an Alma 9 node.
- From that shell, you can execute one of the generated .csh scripts and verify that all goes according to plan:
  cons
  root my_macro.C
  exit
- If this runs, you are ready to submit outside the Singularity shell: exit the container and send the jobs to HTCondor from a pure Alma 9 node, not from inside the SL7 container.
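A quick way to confirm which environment you are currently in (a sketch; the file exists on both OS versions):
cat /etc/redhat-release   # reports Scientific Linux 7.x inside the container, AlmaLinux 9.x outside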
⚠️ Possible Issues
- There have been reports of issues with the 32-bit version of ROOT/CInt. If you encounter one, please try the 64-bit environment (see the sketch after this list):
  setup 64b
- When using SIMD instructions, there may be a need to restrict jobs to a specific CPU architecture. We currently do not have a flag in the STAR scheduler for this, but
  requirements = (Microarch >= "x86_64-v4")
  would limit jobs to the kind of nodes supporting that set of SIMD instructions.
- From the production test, we have evidence of a slowdown when more jobs are running.
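For the 32-bit issue above, a minimal retry inside the SL7 container might look like this (my_macro.C is a placeholder for whatever step failed):
setup 64b            # switch to the 64-bit STAR environment
root my_macro.C      # re-run the step that failed under 32-bit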