Author: Jerome Lauret, Modified: Alexandr Prozorov


```mermaid
flowchart TD
 subgraph SL7["📦SL7 Container"]
        Analysis["Your Analysis"]
  end
 subgraph Alma["🎯 New OS (AlmaLinux 9)"]
        SL7
  end
    Alma -. Job submission .-> HTCondor[/"📤 Job Scheduler (HTCondor)<br><i>submit from Alma9</i>"/]
    HTCondor --  Execution in container --> Analysis

    style Analysis stroke-width:1px,stroke-dasharray: 1
```

Summary - TL;DR

  • Update ~/.cshrc and ~/.login for NFSv4:
     setenv USE_NFS4 1
     setenv GROUP_DIR /star/nfs4/AFS/star/group
    
  • Use the new a9 starsub nodes when you ssh in: starsub01 - starsub07

  • Use SL7 container via Singularity for job submission:

    Add to your scheduler XML, inside <job> </job>:

    <shell>singularity exec -e -B /direct -B /star -B /afs -B /gpfs -B /sdcc/lustre02 /cvmfs/star.sdcc.bnl.gov/containers/rhic_sl7.sif</shell>
    
  • Test your script interactively inside the container:
      singularity shell --shell /usr/bin/csh -B /direct -B /star -B /afs -B /gpfs -B /sdcc/lustre02 /cvmfs/star.sdcc.bnl.gov/containers/rhic_sl7.sif
    

    I suggest creating an alias for it (see the example after this list).

    • In the container:
      cons # or your compile command
      root 'myMacro.C("test_file")' # verify that it works
      exit # leave the container before submitting to HTCondor
      
  • Submit OUTSIDE the container, directly from Alma 9.
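
    The alias mentioned above could look like this in your ~/.cshrc (the name "sl7" is only a suggestion; adjust the bind mounts if yours differ):

      alias sl7 'singularity shell --shell /usr/bin/csh -B /direct -B /star -B /afs -B /gpfs -B /sdcc/lustre02 /cvmfs/star.sdcc.bnl.gov/containers/rhic_sl7.sif'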

⚙️ Transition from SL7 to Alma 9

The facility CPUs are being upgraded from Scientific Linux 7 (SL7) to a newer operating system called Alma 9 (a9).
On a9, AFS is no longer supported, which means we also need a replacement for AFS; that replacement is NFS version 4. Below are instructions to bridge this gap while we transition both the underlying OS and the file system.


🔧 Environment Setup

First, make sure your login is modified as follows:

  • Instead of having something like:
      setenv GROUP_DIR /afs/rhic.bnl.gov/star/group
    replace it with:
      setenv USE_NFS4 1
      setenv GROUP_DIR /star/nfs4/AFS/star/group
    (A combined sketch that works on both SL7 and Alma 9 follows this list.)
  • You need to modify BOTH your $HOME/.cshrc and $HOME/.login.

  • NB: For now, we are testing ONLY official STAR libraries - please do NOT use other libraries, private or otherwise.
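
If you share the same ~/.cshrc and ~/.login between remaining SL7 hosts and the new Alma 9 hosts, a conditional block can pick the correct path automatically. This is only a sketch based on the paths above; adapt it to your own login files:

    # Prefer NFSv4 when it is mounted, fall back to AFS otherwise
    if ( -d /star/nfs4/AFS/star/group ) then
        setenv USE_NFS4 1
        setenv GROUP_DIR /star/nfs4/AFS/star/group
    else
        setenv GROUP_DIR /afs/rhic.bnl.gov/star/group
    endif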


🖥️ Submit Nodes

The submit nodes for launching jobs on Alma 9 are named starsub0X, where X is a number from 1 to 7 (e.g. starsub03).
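
For example, to log in to one of them (any of the seven works; use the fully qualified host name if the short one does not resolve from where you connect):

    ssh starsub03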


📤 Submitting from A9 (starsub0X)

If you are a STAR Scheduler user:

All you need to do is add the following line to your XML:

<shell>singularity exec -e -B /direct -B /star -B /afs -B /gpfs -B /sdcc/lustre02 /cvmfs/star.sdcc.bnl.gov/containers/rhic_sl7.sif</shell>

Example:

<?xml version="1.0" encoding="utf-8" ?> 
<job>
 <command>root4star -q -b StHbtDiHadron.C\(1000000,100,-1,\"\",\"$FILELIST\"\)</command>
 <stdout URL="file:/star/u/carcassi/scheduler/out/$JOBID.out" />
 <input URL="file:/star/data21/reco/productionCentral/FullField/P02gc/2001/312/st_physics_2312011_raw_0017.MuDst.root" />
</job>

Change it to:

<?xml version="1.0" encoding="utf-8" ?> 
<job>
 <shell>singularity exec -e -B /direct -B /star -B /afs -B /gpfs -B /sdcc/lustre02 /cvmfs/star.sdcc.bnl.gov/containers/rhic_sl7.sif</shell> <!-- highlight -->
 <command>root4star -q -b StHbtDiHadron.C\(1000000,100,-1,\"\",\"$FILELIST\"\)</command>
 <stdout URL="file:/star/u/carcassi/scheduler/out/$JOBID.out" />
 <input URL="file:/star/data21/reco/productionCentral/FullField/P02gc/2001/312/st_physics_2312011_raw_0017.MuDst.root" />
</job>

If you are not a STAR scheduler user:

Make sure that, however you submit your jobs, the shell script is executed inside the container.
In HTCondor terms, this may mean adjusting your JDL (submit description file) to read as follows:

Arguments = "singularity exec -e -B /direct -B /star -B /afs -B /gpfs -B /sdcc/lustre02 /cvmfs/star.sdcc.bnl.gov/containers/rhic_sl7.sif /blabla/where-my-csh-script-is.csh"

Instead of:

Arguments = /blabla/where-my-csh-script-is.csh

Adapt this as needed.
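
For context, a minimal submit description file following this pattern could look like the sketch below. It assumes singularity itself is used as the executable (your site may use apptainer or a wrapper instead), and the paths are placeholders:

    # Sketch of a minimal HTCondor submit file (all paths are placeholders)
    Universe   = vanilla
    Executable = /usr/bin/singularity
    Arguments  = "exec -e -B /direct -B /star -B /afs -B /gpfs -B /sdcc/lustre02 /cvmfs/star.sdcc.bnl.gov/containers/rhic_sl7.sif /blabla/where-my-csh-script-is.csh"
    Output     = job.$(Cluster).$(Process).out
    Error      = job.$(Cluster).$(Process).err
    Log        = job.$(Cluster).$(Process).log
    Queue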


🧪 Testing a Job Interactively

NOTE: Before submitting, you may want to test ONE job interactively to make sure it works.

  • Remember that on starsub0X you are on Alma 9, where our code is not yet supported, as indicated earlier.

  • Therefore, you will need to start a shell like this:
    singularity exec -e -B /direct -B /star -B /afs -B /gpfs -B /sdcc/lustre02 /cvmfs/star.sdcc.bnl.gov/containers/rhic_sl7.sif csh
    
  • This will start an SL7 shell on an Alma 9 node
  • If you want to pass the display environment variable when running interactively, for example to open a TBrowser or see plots in root/root4star, add --env DISPLAY=$DISPLAY:
      singularity shell --shell /usr/bin/csh --env DISPLAY=$DISPLAY -B /direct -B /star -B /afs -B /gpfs -B /sdcc/lustre02 /cvmfs/star.sdcc.bnl.gov/containers/rhic_sl7.sif
    

    However, for faster usage you could instead:

  • mount/copy .root files locally
  • use VSCode extension + ssh
  • use NoMachine (VNC server)

  • From that shell, you can execute one of the generated .csh scripts and verify all goes according to plan.
    cons
    root my_macro.C
    exit
    
  • If this runs, you are ready to submit outside the Singularity shell: exit the container and send the jobs to HTCondor from the plain Alma 9 node, not from inside the SL7 container (see the example below).
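
    Back at the Alma 9 prompt, submission proceeds as usual. As a rough sketch (the file names are placeholders, and star-submit applies only if you use the STAR Scheduler):

      star-submit myjob.xml      # STAR Scheduler users
      condor_submit myjob.jdl    # plain HTCondor users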

⚠️ Possible Issues

  • There have been reports of issues with the 32-bit version of ROOT/CInt - if you encounter one, please try the 64-bit environment:
    setup 64b
  • If your code uses SIMD instructions, you may need to restrict jobs to a specific CPU microarchitecture.
    We currently do not have a flag in the STAR scheduler for this, but adding:

    requirements = (Microarch >= "x86_64-v4")
    

    would limit jobs to nodes supporting that level of SIMD instructions.

  • From the production tests, we have evidence of a slowdown when many jobs are running concurrently.