Biowulf

  • Biowulf User Guide. Note that Biowulf2 runs CentOS (RedHat) 6.x. (Biowulf1 is at CentOS 5.x)

[Figure: Swarm fig 1.png]

helix vs biowulf

  • https://helix.nih.gov/ Helix is an interactive system for short jobs. Large data transfers should be moved to Helix, which is now designated for interactive data transfers. For instance, use Helix when transferring hundreds of gigabytes of data or more using any of these commands: cp, scp, rsync, sftp, ascp, etc.
  • https://hpc.nih.gov/docs/rhel7.html#helix Helix transitioned to becoming the dedicated interactive data transfer and file management node [1] and its hardware was later upgraded to support this role [2]. Running processes such as scp, rsync, tar, and gzip on the Biowulf login node has been discouraged ever since.

Linux distribution

$ ls /etc/*release  # login node
$ cat /etc/redhat-release  
Red Hat Enterprise Linux Server release 6.8 (Santiago)

$ sinteractive      # switch to biowulf2 computing nodes
$ cat /etc/redhat-release 
CentOS release 6.6 (Final)
$ cat /etc/centos-release 
CentOS release 6.6 (Final)

Training notes

Storage

https://hpc.nih.gov/storage/

/scratch and /lscratch

The /scratch area on Biowulf is a large, low-performance shared area meant for the storage of temporary files.

  • Each user can store up to a maximum of 10 TB in /scratch. However, 10 TB of space is not guaranteed to be available at any particular time.
  • If the /scratch area is more than 80% full, the HPC staff will delete files as needed, even if they are less than 10 days old.
  • Files in /scratch are automatically deleted 10 days after last access.
  • Touching files to update their access times is inappropriate and the HPC staff will monitor for any such activity.
  • Use /lscratch (not /scratch) when data is to be accessed from large numbers of compute nodes or large swarms.
  • The central /scratch area should NEVER be used as a temporary directory for applications -- use /lscratch instead.
  • Running RStudio interactively: it is generally recommended to allocate at least a small amount of lscratch for temporary storage for R.
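
A minimal sketch of pointing an application's temporary directory at /lscratch rather than /scratch. It assumes the job requested local scratch (for example with --gres=lscratch:10), so that /lscratch/$SLURM_JOBID exists on the compute node. The function name job_tmpdir and the mktemp fallback for running off the cluster are illustrative additions, not part of the Biowulf documentation.

```shell
# Print a job-local temporary directory.
# Assumes local scratch was requested, in which case
# /lscratch/$SLURM_JOBID exists on the compute node.
job_tmpdir() {
    if [ -n "${SLURM_JOBID:-}" ] && [ -d "/lscratch/$SLURM_JOBID" ]; then
        echo "/lscratch/$SLURM_JOBID"
    else
        # Not inside a job with lscratch: fall back to a fresh directory (assumption).
        mktemp -d
    fi
}

TMPDIR=$(job_tmpdir)
export TMPDIR
echo "temporary files go to $TMPDIR"
```

Many programs, including R, honor TMPDIR, so exporting it before starting the application is usually enough.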

Transfer files

User Dashboard

https://hpc.nih.gov/dashboard/

Dashboard

User dashboard Unlock account, disk usage, job info

Quota

checkquota

Environment modules

# What modules are available
module avail
module -d avail # default
module avail STAR
module spider bed # search by a case-insensitive keyword

# Load a module
module list # loaded modules
module load STAR
module load STAR/2.4.1a
module load plinkseq macs bowtie # load multiple modules
# if we try to load a module in a bash script, we can use the following
module load STAR || exit 1

# Unload a module
module unload STAR/2.4.1a

# Switch to a different version of an application
# If you load a module, then load another version of the same module, the first one will be unloaded.

# Examine a modulefile
$ module display STAR
-----------------------------------------------------------------------------
   /usr/local/lmod/modulefiles/STAR/2.5.1b.lua:
-----------------------------------------------------------------------------
help([[This module sets up the environment for using STAR.
Index files can be found in /fdb/STAR
]])
whatis("STAR: ultrafast universal RNA-seq aligner")
whatis("Version: 2.5.1b")
prepend_path("PATH","/usr/local/apps/STAR/2.5.1b/bin")

# Set up personal modulefiles

# Using modules in scripts

# Shared Modules
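
The `module load STAR || exit 1` idiom above can be extended to several modules. The following is a sketch only; load_modules is a hypothetical helper, and it assumes the `module` command has already been set up by the login environment.

```shell
# Load each named module, stopping at the first failure.
# Returns non-zero when the module command is unavailable or a load fails.
load_modules() {
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found (not on the cluster?)" >&2
        return 1
    fi
    for m in "$@"; do
        module load "$m" || { echo "failed to load $m" >&2; return 1; }
    done
}

# In a batch script one would typically abort on failure:
#   load_modules STAR samtools || exit 1
```

Stopping at the first failure keeps a job from running with a partly configured environment.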

Single file - sbatch

Don't use the swarm command on a single script file since swarm will treat each line of the script file as an independent command.

sbatch --cpus-per-task=2 --mem=4g --time=24:00:00 MYSCRIPT
# Use --time=24:00:00 to increase the wall time from the default 2 hours

An example script file. The Slurm environment variable $SLURM_CPUS_PER_TASK is used within the script to pass the number of threads to the program:

#!/bin/bash

module load novocraft
novoalign -c $SLURM_CPUS_PER_TASK  -f s_1_sequence.txt -d celegans -o SAM > out.sam
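
$SLURM_CPUS_PER_TASK is only set inside a job, so a script that is also tested outside Slurm may want a fallback. A small sketch, assuming 1 thread is an acceptable default; thread_count is a hypothetical helper name.

```shell
# Number of threads to hand to a multithreaded program.
# Uses the Slurm allocation when present, otherwise 1 (an assumed safe default).
thread_count() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

threads=$(thread_count)
echo "would run: novoalign -c $threads -f s_1_sequence.txt -d celegans -o SAM"
```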

rslurm

rslurm package. Functions that simplify submitting R scripts to a Slurm workload manager, in part by automating the division of embarrassingly parallel calculations across cluster nodes.

Multiple files - swarm

swarm -f run.sh --time=24:00:00

swarm -t 3 -g 20 -f run_seqtools_vc.sh --module samtools,picard,bwa --verbose 1 --devel
# 3 commands run in 3 subjobs, each command requiring 20 gb and 3 threads, allocating 6 cores and 12 cpus
swarm -t 3 -g 20 -f run_seqtools_vc.sh --module samtools,picard,bwa --verbose 1

# To change the default walltime, use --time=24:00:00

swarm -t 8 -g 24 --module tophat,samtools,htseq -f run_master.sh
cat sw3n17156.o
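
The --devel example above reports 6 cores and 12 cpus for 3 commands with 3 threads each. One way to reproduce that arithmetic is sketched below. It assumes one command per subjob, two hyperthreaded CPUs per core, and that whole cores are allocated so an odd thread count is rounded up. These assumptions are inferred from the numbers shown, not taken from the swarm documentation.

```shell
# Estimate cores and CPUs allocated for a swarm.
# Assumptions: one command per subjob, two CPUs (hyperthreads) per core,
# and whole cores allocated, so an odd thread count is rounded up.
swarm_alloc() {
    ncmds=$1
    threads=$2
    cores_per_job=$(( (threads + 1) / 2 ))
    total_cores=$(( ncmds * cores_per_job ))
    echo "$total_cores cores, $(( total_cores * 2 )) cpus"
}

swarm_alloc 3 3    # prints "6 cores, 12 cpus"
```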

Why the job is pending

Partition and freen

Biowulf nodes are grouped into partitions. A partition can be specified when submitting a job. The default partition is 'norm'. The freen command can be used to see free nodes and CPUs, and available types of nodes on each partition.

We may need to run swarm commands on a non-default partition, for example when few free CPUs are available in the 'norm' partition, when the total time for bundled commands exceeds the partition walltime limit, or when a job needs more memory than the nodes in 'norm' offer (a maximum of 120GB).

We can run the swarm command on a different partition (the default is 'norm'). For example, to run on the b1 partition (the hardware in b1 looks inferior to norm):

swarm -f outputs/run_seqtools_dge_align -g 20 -t 16 --module tophat,samtools,htseq --time=6:00:00 --partition b1 --verbose 1

Below is sample output from the freen command.

                                           ........Per-Node Resources........  
Partition    FreeNds       FreeCPUs       Cores CPUs   Mem   Disk    Features
--------------------------------------------------------------------------------
norm*       0/301        1624/9488         16    32    123g   800g   cpu32,core16,g128,ssd800,x2650,10g
unlimited   3/12         202/384           16    32    123g   800g   cpu32,core16,g128,ssd800,x2650,10g
niddk       1/82         228/2624          16    32    123g   800g   cpu32,core16,g128,ssd800,x2650,10g,niddk
ibfdr       0/184        0/5888            16    32     60g   800g   cpu32,core16,g64,ssd800,x2650,ibfdr
ibqdr       1/95         32/3040           16    32     29g   400g   cpu32,core16,g32,sata400,x2600,ibqdr
ibqdr       51/89        1632/2848         16    32     60g   400g   cpu32,core16,g64,sata400,x2600,ibqdr
gpu         1/4          116/128           16    32    123g   800g   cpu32,core16,g128,ssd800,x2650,gpuk20x,acemd
gpu         17/20        634/640           16    32    123g   800g   cpu32,core16,g128,ssd800,x2650,gpuk20x
largemem    3/4          254/256           32    64   1007g   800g   cpu64,core32,g1024,ssd800,x4620,10g
nimh        58/60        1856/1920         16    32    123g   800g   cpu32,core16,g128,ssd800,x2650,10g,nimh
ccr         0/85         1540/2720         16    32    123g   800g   cpu32,core16,g128,ssd800,x2650,10g,ccr
ccr         0/63         598/2016          16    32    123g   400g   cpu32,core16,g128,sata400,x2600,1g,ccr
ccr         0/60         974/1920          16    32     60g   400g   cpu32,core16,g64,sata400,x2600,1g,ccr
ccrclin     4/4          128/128           16    32     60g   400g   cpu32,core16,g64,sata400,x2600,1g,ccr
quick       0/85         1540/2720         16    32    123g   800g   cpu32,core16,g128,ssd800,x2650,10g,ccr
quick       0/63         598/2016          16    32    123g   400g   cpu32,core16,g128,sata400,x2600,1g,ccr
quick       0/60         974/1920          16    32     60g   400g   cpu32,core16,g64,sata400,x2600,1g,ccr
quick       1/82         228/2624          16    32    123g   800g   cpu32,core16,g128,ssd800,x2650,10g,niddk
quick       58/60        1856/1920         16    32    123g   800g   cpu32,core16,g128,ssd800,x2650,10g,nimh
b1          71/363       7584/8712         12    24     21g   100g   cpu24,core12,g24,sata100,x5660,1g
b1          3/287        3410/4592          8    16     21g   200g   cpu16,core8,g24,sata200,x5550,1g
b1          7/16         192/256            8    16     68g   200g   cpu16,core8,g72,sata200,x5550,1g
b1          5/20         300/640           16    32    250g   400g   cpu32,core16,g256,sata400,e2670,1g
b1          3/10         100/160            8    16     68g   100g   cpu16,core8,g72,sata100,x5550,1g
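
Because several rows can belong to the same partition, it can help to total the free CPUs per partition before choosing one. The following sketch parses freen-style text with awk. It assumes the third column always has the form free/total as in the table above; free_cpus_by_partition is a hypothetical helper.

```shell
# Sum the free CPUs per partition from freen-style output on stdin.
# Assumes the third column has the form free/total.
free_cpus_by_partition() {
    awk '$3 ~ /^[0-9]+\/[0-9]+$/ {
        part = $1
        sub(/\*$/, "", part)      # drop the "*" that marks the default partition
        split($3, cpu, "/")
        free[part] += cpu[1]
    }
    END { for (p in free) print p, free[p] }' | sort
}

# On the cluster: freen | free_cpus_by_partition
# Demonstration with three rows copied from the table:
free_cpus_by_partition <<'EOF'
norm*       0/301        1624/9488         16    32    123g   800g   cpu32,core16,g128,ssd800,x2650,10g
ibqdr       1/95         32/3040           16    32     29g   400g   cpu32,core16,g32,sata400,x2600,ibqdr
ibqdr       51/89        1632/2848         16    32     60g   400g   cpu32,core16,g64,sata400,x2600,ibqdr
EOF
```

Header and separator lines are skipped automatically because their third field does not match the free/total pattern.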

Running R scripts

https://hpc.nih.gov/apps/R.html

Running a swarm of R batch jobs on Biowulf

$ cat Rjobs
R --vanilla < /data/username/R/R1  > /data/username/R/R1.out
R --vanilla < /data/username/R/R2  > /data/username/R/R2.out

swarm -g 16 -f /home/username/Rjobs --module R

Pay attention to the default wall time (e.g. 2 hours) and various swarm options. See Swarm.
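
With many R scripts, the swarm command file can be generated rather than typed. A minimal sketch, following the line format of the Rjobs example; make_rjobs is a hypothetical helper and the paths are placeholders.

```shell
# Print one R batch command per script, in the format of the Rjobs file.
make_rjobs() {
    for script in "$@"; do
        echo "R --vanilla < $script > $script.out"
    done
}

make_rjobs /data/username/R/R1 /data/username/R/R2
# To submit on the cluster:
#   make_rjobs /data/username/R/R* > Rjobs && swarm -g 16 -f Rjobs --module R
```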

Parallelizing with parallel.

  • The following is modified from the Biowulf R webpage. I changed it so that it also works on non-Biowulf systems (such as local Linux, Mac, or Windows).
  • The number of allocated CPUs and available memory are related.
    • Here I assume that Windows and Mac machines have only modest RAM (e.g. 16GB) and that the local Linux machine has enough RAM.
    • The RAM size on the local system can be obtained through the commented lines.
detectBatchCPUs <- function() { 
    ncores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK")) 
    if (is.na(ncores)) { 
        ncores <- as.integer(Sys.getenv("SLURM_JOB_CPUS_PER_NODE")) 
    } 
    return(ncores) 
}
ncpus <- detectBatchCPUs() 
if (is.na(ncpus)) {
  # the system is not Biowulf
  if (R.version$os == "linux-gnu") {
    # on Linux, we assume there is enough RAM
    # mem <- system('grep MemTotal /proc/meminfo', intern = TRUE)
    # mem <- strsplit(mem, " ")[[1]]
    # mem <- as.integer(mem[length(mem) -1])
    ncpus <- future::availableCores()
  } else ncpus <- 2
}

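options(mc.cores = ncpus)   # default core count used by parallel::mclapply
# Then pass ncpus to whichever parallel backend is used, e.g.
# mclapply(..., mc.cores = ncpus)
# makeCluster(ncpus)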

Some experiences

The freen command shows that the maximum number of threads is 56 and the memory is 246GB.

When I run an R script (foreach is employed to loop over simulation runs), I find:

  • Assigning 56 threads guarantees that 56 simulations run at the same time (checked with the jobload command).
  • We need to worry about the RAM size. The more threads, the more memory we need. If we don't assign enough memory, weird error messages are produced.
  • Even though assigning 56 threads lets 56 simulations run at the same time, the actual execution time is longer than when I run fewer simulations.

allocated threads   allocated memory   number of runs   memory used   time (min)
--------------------------------------------------------------------------------
56                  64                 10               30            10
56                  64                 20               36            13
56                  64                 56               58            27
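
The timings above can be read two ways: total wall time grows with the number of simultaneous runs, while the time per run falls. A short sketch of that arithmetic, using only the "number of runs" and "time (min)" columns; minutes_per_run is a hypothetical helper.

```shell
# Read "runs minutes" pairs on stdin and print the wall time per run.
minutes_per_run() {
    awk '{ printf "%d runs: %.2f min per run\n", $1, $2 / $1 }'
}

printf '%s\n' "10 10" "20 13" "56 27" | minutes_per_run
# 10 runs: 1.00 min per run
# 20 runs: 0.65 min per run
# 56 runs: 0.48 min per run
```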

Monitor jobs/Delete jobs

https://hpc.nih.gov/docs/userguide.html#monitor

sjobs
watch -n 30 jobload
scancel -u XXXXX
scancel NNNNN
squeue -u XXXX

jobhist 17500  # report the CPU and memory usage of completed jobs.

Two other very useful commands are jobhist and swarmhist (temporary).

$ cat /usr/local/bin/swarmhist
#!/bin/bash
usage="usage: $0 jobid"
jobid=$1
[[ -n $jobid ]] || { echo $usage; exit 1; }
ret=$(grep "jobid=$jobid" /usr/local/logs/swarm_on_slurm.log)
[[ -n $ret ]] || { echo "no swarm found for jobid = $jobid"; exit; }
echo $ret | tr ';' '\n'

$ jobhist 22038972
$ swarmhist 22038972
date=(SKIP)
 host=(SKIP)
 jobid=22038972
 user=(SKIP)
 pwd=(SKIP)
 ncmds=1
 soptions=--array=0-0 --job-name=swarm --output(SKIP)
 njobs=1
 job-name=swarm
 command=/usr/local/bin/swarm -t 16 -g 20 -f outputs/run_seqtools_vc --module samtools,picard --verbose 1

Show properties of a node

Use freen -n.

This is helpful when we want to look up the properties of a node that appears in the Nodelist column of the sjobs output.

$ freen -n
                                           ........Per-Node Resources........  
Partition   FreeNds       FreeCPUs     Cores CPUs   Mem    Disk    Features                                            Nodelist
----------------------------------------------------------------------------------------------------------------------------------
norm*       160/454      17562/25424       28    56    248g   400g   cpu56,core28,g256,ssd400,x2695,ibfdr            cn[1721-2203,2900-2955]
norm*       0/476        5900/26656        28    56    250g   800g   cpu56,core28,g256,ssd800,x2680,ibfdr            cn[3092-3631]       
norm*       278/309      8928/9888         16    32    123g   800g   cpu32,core16,g128,ssd800,x2650,10g              cn[0001-0310]       
norm*       281/281      4496/4496          8    16     21g   200g   cpu16,core8,g24,sata200,x5550,1g                cn[2589-2782,2799-2899]
norm*       10/10        160/160            8    16     68g   200g   cpu16,core8,g72,sata200,x5550,1g                cn[2783-2798]       
...

Exit code

https://hpc.nih.gov/docs/b2-userguide.html#exitcodes

Local disk and temporary files

See https://hpc.nih.gov/docs/b2-userguide.html#local and https://hpc.nih.gov/storage/

Walltime limits

$ batchlim
Max jobs per user: 4000
Max array size:    1001

Partition        MaxCPUsPerUser     DefWalltime     MaxWalltime                
---------------------------------------------------------------
norm                     7360         02:00:00     10-00:00:00 
multinode                7560         08:00:00     10-00:00:00 
	turbo qos        15064                         08:00:00
interactive                64         08:00:00      1-12:00:00 (3 simultaneous jobs)
quick                    6144         02:00:00        04:00:00 
largemem                  512         02:00:00     10-00:00:00 
gpu                       728         02:00:00     10-00:00:00 (56 GPUs per user)
unlimited                 128        UNLIMITED       UNLIMITED 
student                    32         02:00:00        08:00:00 (2 GPUs per user)
ccr                      3072         04:00:00     10-00:00:00 
ccrgpu                    448         04:00:00     10-00:00:00 (32 GPUs per user)
forgo                    5760       1-00:00:00      3-00:00:00 
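
The limits above use Slurm's D-HH:MM:SS and HH:MM:SS notation. When comparing an estimated run time against them, converting to seconds can be convenient. This sketch handles only those two forms (not UNLIMITED or the shorter forms Slurm also accepts); walltime_to_seconds is a hypothetical helper.

```shell
# Convert D-HH:MM:SS or HH:MM:SS to seconds.
# Any other form (e.g. UNLIMITED) prints nothing and returns non-zero.
walltime_to_seconds() {
    echo "$1" | awk -F'[-:]' '
        NF == 4 { print $1 * 86400 + $2 * 3600 + $3 * 60 + $4; next }
        NF == 3 { print $1 * 3600 + $2 * 60 + $3; next }
        { exit 1 }'
}

walltime_to_seconds 10-00:00:00   # norm MaxWalltime: prints 864000
walltime_to_seconds 02:00:00      # norm DefWalltime: prints 7200
```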

Interactive debugging

The default is 2 CPUs, 4G memory (too small) and 8 hours of walltime.

Increase the memory to 60 GB and request more cores when running something like STAR for RNA-seq read alignment.

sinteractive --mem=32g -c 16 --gres=lscratch:100

The '--gres' option will allocate a local disk, 100GB in this case. The local disk directory will be /lscratch/$SLURM_JOBID.

Parallel jobs

Parallel (MPI) jobs that run on more than 1 node: Use the environment variable $SLURM_NTASKS within the script to specify the number of MPI processes.
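
A sketch of how $SLURM_NTASKS might appear in the launch line of a batch script. It assumes the MPI installation provides mpirun with the -np option; some sites launch with srun or mpiexec instead. mpi_launch_line and ./my_mpi_program are hypothetical names.

```shell
# Print the launch line for an MPI program using the Slurm task count.
# Falls back to 1 process when SLURM_NTASKS is not set (outside a job).
mpi_launch_line() {
    echo "mpirun -np ${SLURM_NTASKS:-1} $*"
}

mpi_launch_line ./my_mpi_program
```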

Reproducibility/Pipeline

Singularity

Snakemake (and Singularity)

R program

https://hpc.nih.gov/apps/R.html

Find available R versions:

module -r avail '^R$'

where -r means to use regular expression match. This will match "R/3.5.2" or "R/3.5" but not "Rstudio/1.1.447".

(Self-installed) R package directory

On our systems, the default path to the library is ~/R/<ver>/library, where ver is the two-digit version of R (e.g. 3.5). However, R won't automatically create that directory and in its absence will try to install to the central package library, which will fail. To install packages in your home directory, manually create ~/R/<ver>/library first.

The directory ~/R/x86_64-pc-linux-gnu-library/ is no longer used on Biowulf.
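
Creating the library directory can be scripted. A minimal sketch, assuming the version string is given as major.minor.patch (for example 3.5.2) and that only major.minor is used in the path; make_r_user_lib is a hypothetical helper.

```shell
# Create the per-user R library directory for a given R version.
# Takes a full version such as 3.5.2 and keeps only major.minor (3.5).
make_r_user_lib() {
    ver=$(echo "$1" | cut -d. -f1,2)
    dir="$HOME/R/$ver/library"
    mkdir -p "$dir" && echo "$dir"
}

# Example (creates ~/R/3.5/library and prints its path):
#   make_r_user_lib 3.5.2
```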

SSH tunnel

https://hpc.nih.gov/docs/tunneling/

The use of interactive application servers (such as Jupyter notebooks) on Biowulf compute nodes requires establishing SSH tunnels to make the service accessible to your local workstation.
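
As a rough illustration, a tunnel of this kind is usually opened from the local workstation with ssh -L, forwarding a local port to the same port on the remote side. The sketch below only prints the command. The host name default and the port number are assumptions for illustration, and the linked tunneling page remains the authoritative procedure.

```shell
# Print the ssh command that forwards a local port to the same port on the login host.
# Arguments: USER PORT [HOST]; the default host and the example port are assumptions.
tunnel_command() {
    echo "ssh -L $2:localhost:$2 $1@${3:-biowulf.nih.gov}"
}

tunnel_command username 33327
```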

Terminal customization

  • ssh add authorized_keys
  • ~/.bashrc:
    • change PS1
    • add an alias for nano
  • ~/.bash_profile: no change
  • ~/.nanorc, ~/r.nanorc and ~/bin/nano/bin/nano (4.2)
  • ~/.emacs: global-display-line-numbers-mode
  • ~/.vimrc: set number
  • .Rprofile: options(editor="emacs")
  • .bash_logout: no change

tmux for keeping SSH sessions

tmux