Biowulf
helix and data transfer
- Data transfer and intensive file management tasks should not be performed on the Biowulf login node, biowulf.nih.gov. Instead such tasks should be performed on the dedicated interactive data transfer node (DTN), helix.nih.gov
- Examples of such tasks include:
- cp, mv, rm commands on large numbers of files or directories
- file compression/uncompression (tar, zip, etc.)
- file transfer (sftp, scp, rsync, etc.)
- https://helix.nih.gov/ Helix is an interactive system for short jobs. Moving large data transfers to Helix, which is now designated for interactive data transfers. For instance, use Helix when transferring hundreds of gigabytes of data or more using any of these commands: cp, scp, rsync, sftp, ascp, etc.
- https://hpc.nih.gov/docs/rhel7.html#helix Helix transitioned to becoming the dedicated interactive data transfer and file management node [1] and its hardware was later upgraded to support this role [2]. Running processes such as scp, rsync, tar, and gzip on the Biowulf login node has been discouraged ever since.
Linux distribution
$ ls /etc/*release # login mode $ cat /etc/redhat-release Red Hat Enterprise Linux Server release 6.8 (Santiago) $ sinteractive # switch to biowulf2 computing nodes $ cat /etc/redhat-release CentOS release 6.6 (Final) $ cat /etc/centos-release CentOS release 6.6 (Final)
Training notes
- https://hpc.nih.gov/training/ Slides, Videos and Handouts from Previous HPC Classes
- https://hpc.nih.gov/docs/ Biowulf User Guides
- https://hpc.nih.gov/docs/B2training.pdf
- https://hpc.nih.gov/docs/biowulf2-beta-handout.pdf
- Online class: Introduction to Biowulf
Storage
/scratch and /lscratch
The /scratch area on Biowulf is a large, low-performance shared area meant for the storage of temporary files.
- Each user can store up to a maximum of 10 TB in /scratch. However, 10 TB of space is not guaranteed to be available at any particular time.
- If the /scratch area is more than 80% full, the HPC staff will delete files as needed, even if they are less than 10 days old.
- Files in /scratch are automatically deleted 10 days after last access.
- Touching files to update their access times is inappropriate and the HPC staff will monitor for any such activity.
- Use /lscratch (not /scratch) when data is to be accessed from large numbers of compute nodes or large swarms.
- The central /scratch area should NEVER be used as a temporary directory for applications -- use /lscratch instead.
- fasterq-dump command in SRA-Toolkit where a temporary directory is needed. See HowTo page.
- Running RStudio interactively. It is generally recommended to allocate at least a small amount of lscratch for temporary storage for R.
- See the slides of Using the NIH HPC Storage Systems Effectively from NIH HPC classes
Transfer files
- https://hpc.nih.gov/docs/transfer.html
- Data Management: Best Practices for Groups
- Locally Mounting HPC System Directories using CIFS or SMB protocol.
- Globus
Alternatives:
- To transfer or share large amounts of data to specific individuals, located either at the NIH or at another institution, the HPC staff recommends Globus - https://hpc.nih.gov/storage/globus.html.
- To share files with other users within NIH, CIT provides OneDrive - see https://myitsm.nih.gov/sp?id=kb_article&sys_id=1e424bebdbbc2f80dcd074821f9619c4 for details.
- To share small amounts of data with external or NIH colleagues, NIH makes the Box service (https://nih.account.box.com/login) available. You will need to request a Box account, if you do not already have one. It is possible to transfer data directly to and from the HPC systems using Box - see https://hpc.nih.gov/docs/box.html for details.
User Dashboard
https://hpc.nih.gov/dashboard/
See Uppercase S in permissions of a folder and setGID. chmod -R 2770 ShareFolder.
Local disk and temporary files
See https://hpc.nih.gov/docs/b2-userguide.html#local and https://hpc.nih.gov/storage/
Dashboard
User dashboard Unlock account, disk usage, job info
Quota
checkquota
Environment modules
# What modules are available module avail module -d avail # default module avail STAR module spider bed # search by a case-insensitive keyword # Load a module module list # loaded modules module load STAR module load STAR/2.4.1a module load plinkseq macs bowtie # load multiple modules # if we try to load a module in a bash script, we can use the following module load STAR || exit 1 # Unload a module module unload STAR/2.4.1a # Switch to a different version of an application # If you load a module, then load another version of the same module, the first one will be unloaded. # Examine a modulefile $ module display STAR ----------------------------------------------------------------------------- /usr/local/lmod/modulefiles/STAR/2.5.1b.lua: ----------------------------------------------------------------------------- help([[This module sets up the environment for using STAR. Index files can be found in /fdb/STAR ]]) whatis("STAR: ultrafast universal RNA-seq aligner") whatis("Version: 2.5.1b") prepend_path("PATH","/usr/local/apps/STAR/2.5.1b/bin") # Set up personal modulefiles # Using modules in scripts # Shared Modules
Single file - sbatch
- sbatch
- Note sbatch command does not support --module option. In sbatch case, the module command has to be put in the script file.
- The log file is slurm-XXXXXXXX.out
- If we want to create a script file, we can use the echo command; see Create a simple text file with multiple lines; write data to a file in bash script
- Script file must be starting with a line #!/bin/bash
Don't use the swarm command on a single script file since swarm will treat each line of the script file as an independent command.
sbatch --cpus-per-task=2 --mem=4g --time 24:00:00 MYSCRIPT # Use --time 24:00:00 to increase the wall time from the default 2 hours to 24 hours
An example of the script file (Slurm environment variable $SLURM_CPUS_PER_TASK within your script was used to specify the number of threads to the program)
#!/bin/bash module load novocraft novoalign -c $SLURM_CPUS_PER_TASK -f s_1_sequence.txt -d celegans -o SAM > out.sam
rslurm
rslurm package. Functions that simplify submitting R scripts to a Slurm workload manager, in part by automating the division of embarrassingly parallel calculations across cluster nodes.
Multiple files - swarm
- swarm. Remember the -f and --time options. The default walltime is 2 hours.
- Source code and other information on github
swarm -f run.sh --time 24:00:00 swarm -t 3 -g 20 -f run_seqtools_vc.sh --module samtools,picard,bwa --verbose 1 --devel # 3 commands run in 3 subjobs, each command requiring 20 gb and 3 threads, allocating 6 cores and 12 cpus swarm -t 3 -g 20 -f run_seqtools_vc.sh --module samtools,picard,bwa --verbose 1 # To change the default walltime, use --time 24:00:00 swarm -t 8 -g 24 --module tophat,samtools,htseq -f run_master.sh cat sw3n17156.o
Environment variables $SLURM
Swarm on Biowulf. Some examples: fmriprep, qsiprep.
- $SLURM_MEM_PER_NODE: 32768
- $SLURM_JOB_ID
- $SLURM_NNODES: 1
- $SLURM_CPUS_ON_NODE: 2
- $SLURM_JOB_CPUS_PER_NODE: 2
- $SLURM_NODELIST: cn2448
Why a job is pending
Partition and freen
- https://hpc.nih.gov/docs/b2-userguide.html#cluster_status
- In the script file, we can use $SLURM_CPUS_PER_TASK to represent the number of cpus
- In the swarm command, we can use -t to specify the threads and -g to specify the memory (GB).
Biowulf nodes are grouped into partitions. A partition can be specified when submitting a job. The default partition is 'norm'. The freen command can be used to see free nodes and CPUs, and available types of nodes on each partition.
We may need to run swarm commands on non-default partitions. For example, not many free CPUs are available in 'norm' partition. Or Total time for bundled commands is greater than partition walltime limit. Or because the default partition norm has nodes with a maximum of 120GB memory.
We can run the swarm command on different partition (the default is 'norm'). For example, to run on b1 parition (the hardware in b1 looks inferior to norm)
swarm -f outputs/run_seqtools_dge_align -g 20 -t 16 \ --module tophat,samtools,htseq \ --time 6:00:00 --partition b1 --verbose 1
If we want to restrict the output of freen to the norm nodes, we can use freen | grep -E 'Partition|----|norm' ; see the User Guide.
Partition FreeNds FreeCPUs FreeGPUs Cores CPUs GPUs Mem Disk Features ------------------------------------------------------------------------------------------------------- norm* 8 / 216 3702 / 14748 36 72 373g 3200g cpu72,core36,g384,ssd3200,x6140,ibhdr100 norm* 101 / 522 20030 / 29232 28 56 247g 800g cpu56,core28,g256,ssd800,x2680,ibfdr norm* 497 / 529 16696 / 16928 16 32 121g 800g cpu32,core16,g128,ssd800,x2650,10g norm* 504 / 539 29192 / 30184 28 56 247g 400g cpu56,core28,g256,ssd400,x2695,ibfdr norm* 6 / 7 342 / 392 28 56 247g 2400g cpu56,core28,g256,ssd2400,x2680,ibfdr
jobhist and track the resource usage
- jobhist XXXXXXXX will show the resource usage. The Jobid Runtime, MemUsed columns can be used to identify the job that was using most resource
- To identify the command run by a job, check out the log file <swarm_XXXXXX_Y.o>
Running R scripts
https://hpc.nih.gov/apps/R.html
Running a swarm of R batch jobs on Biowulf
$ cat Rjobs R --vanilla < /data/username/R/R1 > /data/username/R/R1.out R --vanilla < /data/username/R/R2 > /data/username/R/R2.out # swarm -g 32 -t 10 -f /home/username/Rjobs --module R/3.6.0 --time 6:00:00
Pay attention to the default wall time (eg 2 hours) and various swarm options. See Swarm.
Parallelizing with parallel.
- The following is modified from biowulf R's webpage. I change it so it works on non-biowulf system (like local Linux, Mac, or Windows).
- The number of allocated CPUs and available memory are related.
- Here I assume the Windows and Mac only has a modest RAM (eg 16GB) and the local Linux has enough RAM.
- The RAM size on the local system can be obtained through the commented lines.
detectBatchCPUs <- function() { ncores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK")) if (is.na(ncores)) { ncores <- as.integer(Sys.getenv("SLURM_JOB_CPUS_PER_NODE")) } if (is.na(ncores)) { # the system is not biowulf if (grepl("linux", R.version$os)) { # if it is linux, we assume there are enough ram # mem <- system('grep MemTotal /proc/meminfo', intern = TRUE) # mem <- strsplit(mem, " ")[[1]] # mem <- as.integer(mem[length(mem) -1]) ncores <- future::availableCores() # } else if (grepl("darwin", R.version$os)) { # ncores <- 2 } else ncores <- 2 } return(ncores) } ncpus <- detectBatchCPUs() options(mc.cores = ncpus) mclapply(..., mc.cores = ncpus) makeCluster(ncpus)
Some experiences
jobhist show significant less memory use than jobload. It appears that novoalign uses shared memory, which we account for in jobload but not in jobhist. In this case, you will want to use the memory value reported by jobload to avoid running out of allocated memory and thus getting killed by Slurm.
What is shared memory? Each process in Linux gets its own private application memory space. Linux makes available a portion of memory where multiple processes can also share the same memory space so these multiple processes can work on the same datasets.
Is there a way I can tell "shared memory" is in use when I was running a job? At the moment, you would have to be on a node running one of your jobs and run the 'ipcs -m' command. This lists shared memory segments for any processes using shared memory. This can be cumbersome, especially for a large multimode job or swarm.
freen command shows the maximum threads is 56 and the memory is 246GB.
When I run an R script (foreach is employed to loop over simulation runs), I find
- Assign 56 threads can guarantee 56 simulations run at the same time (check by the jobload command).
- We need to worry about the RAM size. The larger the threads, the more memory we need. If we don't assign enough memory, weird error message will be spit out.
- Even assigning 56 threads can help to run 56 simulations at the same time, the actual execution time is longer than when I run fewer simulations.
allocated threads allocated memory number of runs memory used time (min) 56 64 10 30 10 56 64 20 36 13 56 64 56 58 27
Monitor jobs/Delete/Kill jobs
https://hpc.nih.gov/docs/userguide.html#monitor
sjobs watch -n 30 jobload watch -n 30 "jobload | tail" scancel -u XXXXX scancel NNNNN scancel --name=JobName scancel --state=PENDING scancel --state=RUNNING squeue -u XXXX jobhist 17500 # report the CPU and memory usage of completed jobs.
The other two commands are very useful too jobhist and swarmhist (temporary).
$ cat /usr/local/bin/swarmhist #!/bin/bash usage="usage: $0 jobid" jobid=$1 [[ -n $jobid ]] || { echo $usage; exit 1; } ret=$(grep "jobid=$jobid" /usr/local/logs/swarm_on_slurm.log) [[ -n $ret ]] || { echo "no swarm found for jobid = $jobid"; exit; } echo $ret | tr ';' '\n' $ jobhist 22038972 $ swarmhist 22038972 date=(SKIP) host=(SKIP) jobid=22038972 user=(SKIP) pwd=(SKIP) ncmds=1 soptions=--array=0-0 --job-name=swarm --output(SKIP) njobs=1 job-name=swarm command=/usr/local/bin/swarm -t 16 -g 20 -f outputs/run_seqtools_vc --module samtools,picard --verbose 1
Show how busy is one node: jobload -n cnXXXX
This will show many jobs running in a node. It'll show USER, JOBID, NODECPUS, TOTALCPUS and TOTALNODES (1).
Show properties of a node: freen -n
Use freen -n.
This is helpful if we want to know the node that is allocated from the output (Nodelist column) of the sjobs command.
$ freen -n ........Per-Node Resources........ Partition FreeNds FreeCPUs Cores CPUs Mem Disk Features Nodelist ---------------------------------------------------------------------------------------------------------------------------------- norm* 160/454 17562/25424 28 56 248g 400g cpu56,core28,g256,ssd400,x2695,ibfdr cn[1721-2203,2900-2955] norm* 0/476 5900/26656 28 56 250g 800g cpu56,core28,g256,ssd800,x2680,ibfdr cn[3092-3631] norm* 278/309 8928/9888 16 32 123g 800g cpu32,core16,g128,ssd800,x2650,10g cn[0001-0310] norm* 281/281 4496/4496 8 16 21g 200g cpu16,core8,g24,sata200,x5550,1g cn[2589-2782,2799-2899] norm* 10/10 160/160 8 16 68g 200g cpu16,core8,g72,sata200,x5550,1g cn[2783-2798] ... $ sjobs User JobId JobNa Part St Reason Runtime Walltime Nodes CPUs Memory Dependey Nodelist ============================================================================================================================= XXX 51944300 sinteracti interactive R 1:13:32 8:00:00 1 16 32 GB cn0862 XXX 51946396 myjob norm R 2:57 12:00:00 1 10 32 GB cn0925 =============================================================================================================================
Exit code
https://hpc.nih.gov/docs/b2-userguide.html#exitcodes
Walltime limits
$ batchlim Max jobs per user: 4000 Max array size: 1001 Partition MaxCPUsPerUser DefWalltime MaxWalltime --------------------------------------------------------------- norm 7360 02:00:00 10-00:00:00 multinode 7560 08:00:00 10-00:00:00 turbo qos 15064 08:00:00 interactive 64 08:00:00 1-12:00:00 (3 simultaneous jobs) quick 6144 02:00:00 04:00:00 largemem 512 02:00:00 10-00:00:00 gpu 728 02:00:00 10-00:00:00 (56 GPUs per user) unlimited 128 UNLIMITED UNLIMITED student 32 02:00:00 08:00:00 (2 GPUs per user) ccr 3072 04:00:00 10-00:00:00 ccrgpu 448 04:00:00 10-00:00:00 (32 GPUs per user) forgo 5760 1-00:00:00 3-00:00:00
Interactive debugging
Default is 2 CPUs, 4G memory (too small) and 8 hours walltime.
Increase them to 60 GB and more cores if we run something like STAR for rna-seq reads alignment.
sinteractive --mem=32g -c 16 --gres=lscratch:100 --time=24:00:00
The '--gres' option will allocate a local disk, 100GB in this case. The local disk directory will be /lscratch/$SLURM_JOBID.
For RStudio, the example shows to allocate 5GB of temporary space.
For R, it also recommends to allocate a minimal amount of lscratch of 1GB plus whatever lscratch storage is required by your code.
ssh cnXXX
If we have an interactive job on a certain node, we can directly ssh into that node to check for example the /lscratch/JobID usage. The JobID can be obtained from the jobload command.
We can also use freen -n to see some properties of the node.
Parallel jobs
Parallel (MPI) jobs that run on more than 1 node: Use the environment variable $SLURM_NTASKS within the script to specify the number of MPI processes.
Reproduciblity/Pipeline
Singularity
- https://hpc.nih.gov/apps/singularity.html
- NCI Containers and Workflows Interest Group
- R and Singularity
Snakemake (and Singularity)
- https://snakemake.readthedocs.io/en/stable/
- Building a reproducible workflow with Snakemake and Singularity
Apps
Scientific Applications on NIH HPC Systems
R program
https://hpc.nih.gov/apps/R.html
Find available R versions:
module -r avail '^R$'
where -r means to use regular expression match. This will match "R/3.5.2" or "R/3.5" but not "Rstudio/1.1.447".
(Self-installed) R package directory
On our systems, the default path to the library is ~/R/<ver>/library where where ver is the two digit version of R (e.g. 3.5). However, R won't automatically create that directory and in its absence will try to install to the central packge library which will fail. To install packages in your home directory manually create ~/R/<ver>/library first.
The directory ~/R/x86_64-pc-linux-gnu-library/ was not used anymore in Biowulf.
RStudio
- Connecting to Biowulf with NoMachine (NX)
- It works though the resolution is not great on a Mac screen. This assume we are in the NIH network.
- One thing is about adjusting the window size. Click Ctrl + Alt + 0 to bring up the setting if we accept all the default options when we connect to a remote machine. Click 'Resize remote display'. Now we can use mouse to drag and adjust the display size. This will keep the resolution as it showed up originally. It is easier when I choose the 'key-based authorization method with a key you provide'.
- Desktop environment is XFCE ($DESKTOP_SESSION)
- RStudio on Biowulf.
- We need to open a terminal and follow the instruction there.
- The desktop IDE program is installed in /usr/local/apps/rstudio/rstudio-1.4.1103/.
- Problems: The copy/paste keyboard shortcuts do not work even I checked the option "Grab the keyboard input". Current resolution: Use two ssh connections (one to run R, and another one to edit file). Open noMachine and use the desktop to view graph files (gv XXX.pdf OR change the default program to 'GNU GV Postscript/PDF Viewer' from the default 'LibreOffice Draw').
sinteractive --mem=6g --gres=lscratch:5 module load Rstudio R rstudio &
Visual Studio Code
- VS Code on Biowulf. Install the "Remote Development" VScode extension.
- Run Jupyter notebook on a compute node in VS Code.
- Remote Development using SSH. Basically once we connect to a remote host, a new VScode instance will be created and we can open any files from the remote host on VScode.
Exit an SSH session
Today after I issued a "cd /data/USERNAME" command, it just hung there even the Ctrl+c won't exit.
One solution is to close the current terminal. If we like to keep the current terminal, a solution is to open another terminal, give an SSH connection and run pkill -u USERNAME. This will also kill all the SSH connections including the current one.
SSH tunnel
https://hpc.nih.gov/docs/tunneling/
The use of interactive application servers (such as Jupyter notebooks) on Biowulf compute nodes requires establishing SSH tunnels to make the service accessible to your local workstation.
Jupyter
Terminal customization
- ssh add authorized_keys
- ~/.bashrc:
- change PS1
- add an alias for nano
- ~/.bash_profile: no change
- ~/.nanorc, ~/r.nanorc and ~/bin/nano/bin/nano (4.2)
- ~/.emacs: global-display-line-numbers-mode
- ~/.vimrc: set number
- .Rprofile: options(editor="emacs")
- .bash_logout: no change