Linux Programming
Shell Programming
Some resource
Redirect
Redirecting output. File descriptor 1 is standard output and file descriptor 2 is standard error.
./myProgram > stdout.txt                 # redirect stdout to stdout.txt
./myProgram 2> stderr.txt                # redirect stderr to stderr.txt using the 2> operator
./myProgram > stdout.txt 2> stderr.txt   # combination of the two above
./myProgram > stdout.txt 2>&1            # redirect stderr to wherever stdout goes (stdout.txt)
./myProgram >& /dev/null                 # prevent writing stdout and stderr to the screen
ps >> output.txt                         # append
Redirecting input
./myProgram < input.txt
>&
&> file is not part of the official POSIX shell spec, but has been added to many Bourne shells as a convenience extension (it originally comes from csh). In a portable shell script (and if you don't need portability, why are you writing a shell script?), use > file 2>&1 only.
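The redirections above can be tried out with a small sketch. Here emit is a hypothetical stand-in for ./myProgram that writes one line to each stream, and the file names are made up for illustration.

```shell
#!/bin/sh
# emit is a hypothetical stand-in for ./myProgram: one line per stream.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

workdir=$(mktemp -d)

# Separate files for the two streams.
emit > "$workdir/out.txt" 2> "$workdir/err.txt"

# Portable way to send both streams to one file: redirect stdout first,
# then point stderr at wherever stdout currently goes.
emit > "$workdir/both.txt" 2>&1

out_content=$(cat "$workdir/out.txt")
err_content=$(cat "$workdir/err.txt")
both_lines=$(wc -l < "$workdir/both.txt" | tr -d ' ')
rm -rf "$workdir"
```

Redirections are processed left to right, so the order matters: 2>&1 > file would copy the original stdout (the screen) to stderr before stdout is redirected.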
Redirect Output and Errors To /dev/null
http://www.cyberciti.biz/faq/how-to-redirect-output-and-errors-to-devnull/
command > /dev/null 2>&1
# OR
command &>/dev/null
tee - redirect to both a file and the screen at the same time
To redirect to both a file and the screen at the same time, use the tee command. See
- http://www.cyberciti.biz/faq/linux-redirect-error-output-to-file/
- http://www.cyberciti.biz/faq/saving-stdout-stderr-into-separate-files/
- https://en.wikipedia.org/wiki/Tee_(command)
command1 |& tee log.txt
## or ##
command1 -arg |& tee log.txt
## or ##
command1 2>&1 | tee log.txt
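As a rough check that tee really writes the same data to both places, the sketch below uses the portable 2>&1 | tee form. The braces group two echo commands as a hypothetical replacement for command1, and the log file comes from mktemp.

```shell
#!/bin/sh
log=$(mktemp)

# The group writes one line to stdout and one to stderr; 2>&1 merges them
# into the pipe, and tee copies the pipe to both the log and its own stdout.
screen=$( { echo "normal output"; echo "error output" >&2; } 2>&1 | tee "$log" )

logged=$(cat "$log")
rm -f "$log"
```

Here the command substitution plays the role of the screen, so screen and logged should hold the same two lines.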
Pipe
The operator is |.
ps > psout.txt
sort psout.txt > pssort.out
can be simplified to
ps | sort > pssort.out
For example,
$ head /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync

$ cat /etc/passwd | cut -d: -f7 | sort | uniq -c | sort -nr
     18 /bin/sh
     13 /bin/false
      2 /bin/bash
      1 /bin/sync
where the cut command extracts the 7th field (fields are separated by the : character) and writes it to its output stream. The first sort command sorts the lines it reads alphabetically, so that identical lines become adjacent. The uniq -c command then collapses adjacent duplicate lines and prefixes each with its count. The final sort command sorts its input numerically in reverse order.
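To experiment with this pipeline without depending on the local /etc/passwd, a minimal sketch can feed it a few made-up passwd-style lines through printf. The user names below are invented for illustration, and the trailing awk only strips the padding that uniq -c puts in front of the counts.

```shell
#!/bin/sh
# Six hypothetical passwd entries: three use /bin/sh, two /bin/bash, one /bin/sync.
shell_counts=$(printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' \
    'bin:x:2:2:bin:/bin:/bin/sh' \
    'sys:x:3:3:sys:/dev:/bin/sh' \
    'sync:x:4:65534:sync:/bin:/bin/sync' \
    'alice:x:1000:1000:Alice:/home/alice:/bin/bash' |
    cut -d: -f7 | sort | uniq -c | sort -nr | awk '{print $1, $2}')

most_common=$(printf '%s\n' "$shell_counts" | head -n 1)
```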
Process substitution
https://en.wikipedia.org/wiki/Process_substitution
The power of pipes
Consider the following commands
samtools mpileup -go temp.bcf -uf genome.fa dedup.bam
bcftools call -vmO v -o sample1_raw.vcf temp.bcf
The disadvantage of this approach is that it creates a temporary file (temp.bcf in this case). If the temporary file is very large (several hundred GB), it wastes disk space, not to mention the time spent writing it and reading it back. With a pipe, we save both the time and the disk space of the temporary file.
samtools mpileup -uf genome.fa dedup.bam | bcftools call -vmO v -o sample1_raw.vcf
Pipe vs redirect
- Pipe is used to pass output to another program or utility.
- Redirect is used to pass output to either a file or stream.
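In other words, thing1 | thing2 produces the same result as thing1 > temp_file && thing2 < temp_file, except that the pipe needs no temporary file, runs both commands concurrently, and runs thing2 even if thing1 fails.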
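The equivalence can be checked with a small sketch in which printf plays the role of thing1 and sort plays the role of thing2 (both chosen only for illustration).

```shell
#!/bin/sh
tmp=$(mktemp)

# Pipe: the output of printf goes straight into sort.
via_pipe=$(printf 'pear\napple\nfig\n' | sort)

# Redirect: the same data takes a detour through a temporary file.
printf 'pear\napple\nfig\n' > "$tmp" && via_file=$(sort < "$tmp")

rm -f "$tmp"
```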
Shebang (#!)
A shebang is the character sequence consisting of the characters number sign and exclamation mark (that is, "#!") at the beginning of a script. See the Wikipedia page.
The syntax looks like
#! interpreter [optional-arg]
For example,
#!/bin/sh
Execute the file using sh, the Bourne shell, or a compatible shell.
#!/bin/csh -f
Execute the file using csh, the C shell, or a compatible shell, and suppress the execution of the user’s .cshrc file on startup.
#!/usr/bin/perl -T
Execute using Perl with the option for taint checks.
Comments
For a single line, we can use the '#' sign.
For a block of code, we use
#!/bin/bash
echo before comment
: <<'END'
bla bla
blurfl
END
echo after comment
Variables
food=Banana
echo $food
food="Apple"
echo $food
Environment variables
$HOME
$PATH
$0 -- name of the shell script
$# -- number of parameters passed
$$ -- process ID of the shell script, often used inside a script for generating unique temp filenames
$? -- the exit value of the last run command
For example,
#!/bin/sh
/usr/local/bin/my-command
if [ "$?" -ne "0" ]; then
    echo "Sorry, we had a problem there!"
fi
Parameter variables
$1, $2, ... -- parameters given to the script
$* -- list of all the parameters, in a single variable
$@ -- subtle variation on $*.
$! -- the process id of the last command run in the background.
For example,
$ touch /tmp/tmpfile_$$
$ set foo bar bam
$ echo $#
3
$ echo $@
foo bar bam
$ set foo bar bam &
[1] 28212
$ echo $!
28212
[1]+ Done set foo bar bam
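The "subtle variation" between $* and $@ only shows up inside double quotes. A minimal sketch, using a hypothetical helper count_args that reports how many arguments it received:

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints the number of arguments it got.
count_args() {
    echo $#
}

# Two parameters; the first one contains a space.
set -- "foo bar" bam

star=$(count_args "$*")   # "$*" joins all parameters into a single word
at=$(count_args "$@")     # "$@" keeps each parameter as its own word
```

Because "$@" preserves the original word boundaries, it is usually the safer choice when passing parameters on to another command.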
We can also use braces around the variable name.
QT_ARCH=x86_64
QT_SDK_BINARY=QtSDK-4.8.0-${QT_ARCH}.tar.gz
QT_SD_URL=https://xxx.com/$QT_SDK_BINARY
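Braces matter when the variable name is followed directly by characters that could be part of a name. A short sketch with made-up variable names:

```shell
#!/bin/sh
unset fruits            # make sure the longer name is really undefined
fruit=apple

plain="$fruits"         # read as the (unset) variable "fruits": empty
braced="${fruit}s"      # braces end the name after "fruit": apples
```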
Conditions
We can use the test command to check if a file exists. The command is test -f <filename>.
[ is the same as writing test; always leave a space after [ and before the closing ].
if test -f fred.c; then ...; fi

if [ -f fred.c ]
then
    ...
fi

if [ -f fred.c ]; then
    ...
fi
Arithmetic comparison
expr1 -eq expr2 ==> check equal
expr1 -ne expr2 ==> check not equal
expr1 -gt expr2 ==> expr1 > expr2
expr1 -ge expr2 ==> expr1 >= expr2
expr1 -lt expr2 ==> expr1 < expr2
expr1 -le expr2 ==> expr1 <= expr2
! expr ==> opposite of expr
File conditionals
-d file ==> True if the file is a directory
-e file ==> True if the file exists
-f file ==> True if the file is a regular file
-r file ==> True if the file is readable
-s file ==> True if the file has non-zero size
-w file ==> True if the file is writable
-x file ==> True if the file is executable
Example: Suppose we want to know if the first argument (if given) matches a specific string. We can use the following (note the spaces before and after '=='; quoting "$1" keeps the test valid when no argument is given)
#!/bin/bash
if [ "$1" == "console" ]; then
    echo 'Console'
else
    echo 'Non-console'
fi
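The file conditionals can be combined in an if/elif chain. The function below, describe_path, is a hypothetical helper written only to illustrate -d, -f and -e; it is a sketch rather than a complete classifier.

```shell
#!/bin/sh
# describe_path is a hypothetical helper that classifies its argument.
describe_path() {
    if [ -d "$1" ]; then
        echo "directory"
    elif [ -f "$1" ]; then
        echo "regular file"
    elif [ -e "$1" ]; then
        echo "other"
    else
        echo "missing"
    fi
}

scratch=$(mktemp)                       # a regular file to test against
kind_root=$(describe_path /)
kind_file=$(describe_path "$scratch")
rm -f "$scratch"
kind_gone=$(describe_path "$scratch")   # the file no longer exists
```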
Control Structures
if
if condition
then
    statements
elif [ condition ]; then
    statements
else
    statements
fi
For example, we can run a cp command if two files are different.
if ! cmp -s "$filesrc" "$filecur"
then
    cp "$filesrc" "$filecur"
fi
while
while condition
do
    statements
done
until
until condition
do
    statements
done
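As a concrete sketch of the two loop forms: while repeats as long as the condition succeeds, and until repeats as long as it fails. The variable names are arbitrary.

```shell
#!/bin/sh
# while: add up the numbers 1 to 5.
i=1
sum=0
while [ "$i" -le 5 ]
do
    sum=$((sum + i))
    i=$((i + 1))
done

# until: count down from 3 until n reaches 0.
n=3
countdown=""
until [ "$n" -eq 0 ]
do
    countdown="$countdown$n "
    n=$((n - 1))
done
```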
AND list
statement1 && statement2 && statement3 && ...
Each statement runs only if the previous one finished successfully: if statement1 succeeds then statement2 runs, and so on.
OR list
statement1 || statement2 || statement3 || ...
Each statement runs only if the previous one failed: if statement1 fails then statement2 runs, and so on.
For example,
codename=$(lsb_release -s -c)
if [ "$codename" == "rafaela" ] || [ "$codename" == "rosa" ]; then
    codename="trusty"
fi
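The short-circuit behaviour of AND and OR lists can be seen in a small sketch that uses true and false as stand-ins for commands that succeed or fail.

```shell
#!/bin/sh
log=""
true  && log="${log}A"   # left side succeeded, so the right side runs
false && log="${log}B"   # left side failed, so the right side is skipped
false || log="${log}C"   # left side failed, so the right side runs
true  || log="${log}D"   # left side succeeded, so the right side is skipped
```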
for
for variable in values
do
    statements
done
Example 1
To convert pdfs to tifs using ImageMagick (for looping over files, check cyberciti.biz)
outdir="../plosone"
indir="../fig"
if [[ ! -d $outdir ]]; then
    mkdir "$outdir"
fi
in=(file1.pdf file2.pdf file3.pdf)
for (( i=0; i<${#in[@]}; i++ ))
do
    convert -strip -units PixelsPerInch -density 300 -resample 300 \
        -alpha off -colorspace RGB -depth 8 -trim -bordercolor white \
        -border 1% -resize '2049x2758>' -resize '980x980<' +repage \
        -compress lzw "$indir/${in[$i]}" "$outdir/Figure$((i+1)).tiff"
done
Example 2
A second example is to download all the (Ontario gasoline price) data with wget and parsing and concatenating the data with other *nix tools like 'sed':
# Download data
for i in $(seq 1990 2014)
do
    wget http://www.energy.gov.on.ca/fuelupload/ONTREG$i.csv
done

# Retain the header
head -n 2 ONTREG1990.csv | sed 1d > ONTREG_merged.csv

# Loop over the files and use sed to extract the relevant lines
for i in $(seq 1990 2014)
do
    tail -n 15 ONTREG$i.csv | sed 13,15d | sed 's/./-01-'$i',/4' >> ONTREG_merged.csv
done
Example 3
Download all 20 sra files (60GB in total) from SRP032789.
for x in $(seq 1027175 1027180)
do
    wget ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP032/SRP032789/SRR$x/SRR$x.sra
done
Example 4
Convert all files from DOS to Unix format
for f in *.txt; do tr -d '\r' < "$f" > tmp.txt; mv tmp.txt "$f"; done
# Or, for files given as script arguments
for f in "$@"; do tr -d '\r' < "$f" > tmp.txt; mv tmp.txt "$f"; done
Example 5
Include all files in a directory
for f in /etc/*.conf
do
    echo "$f"
done
Functions
set -e and set -x
set -e: exit immediately if a command exits with a non-zero status. set -x: print each command, after expansion, before it is executed. Type help set at the command line for details. Very useful!
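Both options can be tried safely in a child shell, so that the current shell is not affected. The sketch below assumes sh is on the PATH and uses false as a stand-in for a failing command.

```shell
#!/bin/sh
# set -e: the child shell stops at the failing command, so "after" is never printed.
status=0
result=$(sh -c 'set -e; echo before; false; echo after') || status=$?

# set -x: the trace goes to stderr; stdout is discarded so only the trace is captured.
trace=$(sh -c 'set -x; echo hi' 2>&1 >/dev/null)
```

With the default PS4 prompt, the trace line is the command prefixed by a plus sign.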
Commands
break ==> escape from an enclosing for, while or until loop
: ==> null command
continue ==> make the enclosing for, while or until loop continue at the next iteration
. ==> executes the command in the current shell
eval ==> evaluate arguments
exec ==> replace the current shell with a different program
export ==> make the variable named as its parameter available in subshells
expr ==> evaluate its arguments as an expression
printf ==> similar to echo
set ==> sets the parameter variables for the shell. Useful for using fields in commands that output space-separated values
shift ==> moves all the parameter variables down by one.
trap ==> specify the actions to take on receipt of signals.
unset ==> remove variables or functions from the environment.
trap
- http://www.computerhope.com/unix/utrap.htm
- http://linuxcommand.org/wss0160.php
- http://www.tutorialspoint.com/unix/unix-signals-traps.htm
- http://www.ibm.com/developerworks/aix/library/au-usingtraps/
- http://bash.cyberciti.biz/guide/Trap_statement
- http://steve-parker.org/sh/trap.shtml (trap with a user-defined function)
- http://www.turnkeylinux.org/blog/shell-error-handling (set -e)
- http://unix.stackexchange.com/questions/17314/what-is-signal-0-in-a-trap-command (do something on EXIT)
- http://unix.stackexchange.com/questions/79648/how-to-trigger-error-using-trap-command
The syntax to use trap command is
trap command signal
For example,
$ cat traptest.sh
#!/bin/sh

trap 'rm -f /tmp/tmp_file_$$' INT
echo creating file /tmp/tmp_file_$$
date > /tmp/tmp_file_$$

echo 'press interrupt to interrupt ...'
while [ -f /tmp/tmp_file_$$ ]; do
    echo file exists
    sleep 1
done
echo the file no longer exists

trap - INT
echo creating file /tmp/tmp_file_$$
date > /tmp/tmp_file_$$

echo 'press interrupt to interrupt ...'
while [ -f /tmp/tmp_file_$$ ]; do
    echo file exists
    sleep 1
done

echo we never get here
exit 0
Running it produces output like
$ ./traptest.sh
creating file /tmp/tmp_file_21389
press interrupt to interrupt ...
file exists
file exists
^Cthe file no longer exists
creating file /tmp/tmp_file_21389
press interrupt to interrupt ...
file exists
file exists
^C
The first time we use trap, we register a command that deletes the file when we hit Ctrl+C, so the loop ends and the script carries on. The second time we use trap, we do not specify any command to be executed when an INT signal occurs, so the default behavior applies: the script is terminated, and the final echo and exit statements are never executed.
Note that the following two are different.
trap - INT
trap '' INT
The second command will IGNORE signals (Ctrl+C in this case) so if we apply this statement above, we will not be able to use Ctrl+C to kill the execution.
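A common variation, mentioned in the links above, is to trap EXIT instead of INT so that cleanup runs however the script ends. The sketch below runs the trap inside a subshell so that it can be observed from the outside; the temporary file name comes from mktemp.

```shell
#!/bin/sh
tmpfile=$(mktemp)

(
    # Runs when this subshell exits, whether normally or through an error.
    trap 'rm -f "$tmpfile"' EXIT
    echo "working" > "$tmpfile"
)

# By now the subshell has exited and its EXIT trap has removed the file.
if [ -e "$tmpfile" ]; then
    cleaned=no
else
    cleaned=yes
fi
```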
Command Execution
$(command)
`command`   # ` is a backquote/backtick, not a single quotation mark

# Example
sudo apt-get install linux-headers-$(uname -r)
Note all new scripts should use the $(...) form, which was introduced to avoid some rather complex rules.
Example
#!/bin/sh
echo The current directory is $PWD
echo The current users are $(who)
sudo chown `id -u` SomeDir   # change the ownership to the current user. Dangerous!
# Or
sudo chown `whoami` SomeDirOrSomeFile
exit 0
Note that $(your expression) is the better form because it allows you to nest expressions. For example,
cd $(dirname $(type -P touch))
will cd you into the directory containing the 'touch' command.
The concept of putting the result of a command into a script variable is very powerful, as it makes it easy to use existing commands in scripts and capture their output.
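A minimal sketch of capturing command output in variables, including one nested substitution. It assumes sh can be found on the PATH; the variable names are arbitrary.

```shell
#!/bin/sh
# Nested substitution: find where sh lives, then take the directory part.
sh_dir=$(dirname "$(command -v sh)")

# Capture the output of a pipeline: the number of entries in /.
root_entries=$(ls / | wc -l | tr -d ' ')
```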
Arithmetic Expansion
$((...))
is a better alternative to the expr command. More examples:
for i in $(seq 1 3)
do
    echo SRR$(( i + 1027170 ))'_1'.fastq
done
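Note that the single quotes around _1 are optional here, because $((...)) is self-delimiting; they would be required after a plain variable such as $i, since $i_1 is read as the variable i_1. The above will output SRR1027171_1.fastq, SRR1027172_1.fastq and SRR1027173_1.fastq.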
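For comparison, here is a short sketch that performs the same multiplication with expr and with $((...)), and builds one of the file names above with and without the single quotes. The variable names are arbitrary.

```shell
#!/bin/sh
a=7
b=3

with_expr=$(expr "$a" \* "$b")   # expr needs the * escaped and spawns a process
with_expansion=$((a * b))        # arithmetic expansion is done by the shell itself
quotient=$((a / b))              # integer division: the fraction is dropped

i=1
quoted=SRR$(( i + 1027170 ))'_1'.fastq
unquoted=SRR$(( i + 1027170 ))_1.fastq
```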
Parameter Expansion
${parameter}
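Beyond the plain ${parameter} form, POSIX shells offer a few variations that trim prefixes and suffixes or supply defaults. The path below is a made-up example, and the sketch covers only a handful of the available forms.

```shell
#!/bin/sh
path=/data/run1/sample.fastq.gz   # hypothetical file path

file=${path##*/}    # remove the longest prefix matching */  -> sample.fastq.gz
dir=${path%/*}      # remove the shortest suffix matching /* -> /data/run1
stem=${file%%.*}    # remove the longest suffix matching .*  -> sample

unset maybe_unset
fallback=${maybe_unset:-default}  # use "default" when the variable is unset or empty
```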
Bash shell find out if a command exists or not
http://www.cyberciti.biz/faq/unix-linux-shell-find-out-posixcommand-exists-or-not/
POSIX command
# command -v will return >0 when the command1 is not found
command -v command1 >/dev/null && echo "command1 Found In \$PATH" || echo "command1 Not Found in \$PATH"

$ help command
command: command [-pVv] command [arg ...]
    Execute a simple command or display information about commands.

    Runs COMMAND with ARGS suppressing shell function lookup, or display
    information about the specified COMMANDs. Can be used to invoke commands
    on disk when a function with the same name exists.

    Options:
      -p    use a default value for PATH that is guaranteed to find all of
            the standard utilities
      -v    print a description of COMMAND similar to the `type' builtin
      -V    print a more verbose description of each COMMAND

    Exit Status:
    Returns exit status of COMMAND, or failure if COMMAND is not found.
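In a script, the check is often wrapped in a small function. The name have below is a hypothetical helper, not a standard command, and the missing command name is invented for the example.

```shell
#!/bin/sh
# have is a hypothetical helper: it succeeds if its argument is a known command.
have() {
    command -v "$1" >/dev/null 2>&1
}

if have sh; then
    sh_found=yes
else
    sh_found=no
fi

if have definitely_not_a_command_xyz; then
    bogus_found=yes
else
    bogus_found=no
fi
```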
type -P
type -P command1 &>/dev/null && echo "Found" || echo "Not Found"

$ help type
type: type [-afptP] name [name ...]
    Display information about command type.

    For each NAME, indicate how it would be interpreted if used as a
    command name.

    Options:
      -a    display all locations containing an executable named NAME;
            includes aliases, builtins, and functions, if and only if
            the `-p' option is not also used
      -f    suppress shell function lookup
      -P    force a PATH search for each NAME, even if it is an alias,
            builtin, or function, and returns the name of the disk file
            that would be executed
      -p    returns either the name of the disk file that would be executed,
            or nothing if `type -t NAME' would not return `file'.
      -t    output a single word which is one of `alias', `keyword',
            `function', `builtin', `file' or `', if NAME is an alias, shell
            reserved word, shell function, shell builtin, disk file, or not
            found, respectively

    Arguments:
      NAME  Command name to be interpreted.

    Exit Status:
    Returns success if all of the NAMEs are found; fails if any are not found.
typeset: typeset [-aAfFgilrtux] [-p] name[=value] ...
    Set variable values and attributes.

    Obsolete. See `help declare'.
pause by read -p command
http://www.cyberciti.biz/tips/linux-unix-pause-command.html
read -p "Press [Enter] key to start backup..."
If we want to ask users about a yes/no question, we can use this method
while true; do
    read -p "Do you wish to install this program? " yn
    case $yn in
        [Yy]* ) make install; break;;
        [Nn]* ) exit;;
        * ) echo "Please answer yes or no.";;
    esac
done
OR
echo "Do you wish to install this program?"
select yn in "Yes" "No"; do
    case $yn in
        Yes ) make install; break;;
        No ) exit;;
    esac
done
Keyboard input and Arithmetic
http://linuxcommand.org/wss0110.php
read
#!/bin/bash
echo -n "Enter some text > "
read text
echo "You entered: $text"
Arithmetic
#!/bin/bash
# An application of the simple command
#     echo $((2+2))
# That is, when you surround an arithmetic expression with the double parentheses,
# the shell will perform arithmetic evaluation.
first_num=0
second_num=0

echo -n "Enter the first number --> "
read first_num
echo -n "Enter the second number -> "
read second_num

echo "first number + second number = $((first_num + second_num))"
echo "first number - second number = $((first_num - second_num))"
echo "first number * second number = $((first_num * second_num))"
echo "first number / second number = $((first_num / second_num))"
echo "first number % second number = $((first_num % second_num))"
echo "first number raised to the"
echo "power of the second number = $((first_num ** second_num))"
and a program that formats an arbitrary number of seconds into hours and minutes:
#!/bin/bash
seconds=0

echo -n "Enter number of seconds > "
read seconds

# use the division operator to get the quotient
hours=$((seconds / 3600))
# use the modulo operator to get the remainder
seconds=$((seconds % 3600))
minutes=$((seconds / 60))
seconds=$((seconds % 60))

echo "$hours hour(s) $minutes minute(s) $seconds second(s)"
Here documents
Debugging Scripts
http://www.cyberciti.biz/tips/debugging-shell-script.html
- Run a shell script with the -x option. Each line of the script is then printed as it is executed (the trace goes to stderr). We can see which line takes a long time or which line broke the code (the script still runs through to the end).
$ bash -x script-name
- Use of set builtin command
- Use of intelligent DEBUG function
To run a bash script line by line:
- Bash Debugger
- Use Geany. See the next section.
Geany
- (Ubuntu 12.04 only): By default, it does not have the terminal tab. Install virtual terminal emulator. Run
sudo apt-get install libvte-dev
- Step 1: Keyboard shortcut. Select a region of code, then choose Edit -> Commands -> Send Selection to Terminal. You can also assign a keybinding for this: go to Edit -> Preferences and pick the Keybindings tab. See a screenshot here. I assign F12 (without quotes) as the shortcut. This is a complete list of the keybindings.
- Step 2: Newline character. Another issue is that the last line of sent code does not have a newline character. So I need to switch to the Terminal and press Enter. The solution is to modify the <geany.conf> (find its location using locate geany.conf. On my ubuntu 14 (geany 1.26), it is under ~/.config/geany/geany.conf) and set send_selection_unsafe=true. See here.
- Step 3: PATH variable.
$ tmpname=$(basename $inputVCF)
Command 'basename' is available in '/usr/bin/basename'
The command could not be located because '/usr/bin' is not included in the PATH environment variable.
The solution is to run PATH=$PATH:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin in the Terminal window before running our script.
- Step 4 (optional): Change background color.
Another handy change to geany is to change its background to black. To do that, go to Edit -> Preferences -> Editor. Once on the Editor options level, select the Display tab to the far right of the dialog, and you will notice a checkbox marked invert syntax highlighting colors.
See this post about changing the default terminal in the Terminal window. The default is xterm (see the output of echo $TERM).
Examples
- <upgrade8.sh> file from BioLinux installation page
Text processing
Extract columns or fields from text files: cut
http://www.thegeekstuff.com/2013/06/cut-command-examples/
To extract fixed columns (say columns 5-7 of a file):
cut -c5-7 somefile
If the field delimiter is different from TAB you need to specify it using -d:
cut -d' ' -f100-105 myfile > outfile
#
cut -d: -f6 somefile   # colon-delimited file
#
grep "/bin/bash" /etc/passwd | cut -d':' -f1-4,6,7   # field 1 through 4, 6 and 7
Extract columns from text files: awk
awk -F : '{print $6}' somefile   # colon-delimited file
#
awk --field-separator="\\t" '{print $6}' filename   # tab-delimited
#
awk -F":" '{ print $1 " " $3 }' /etc/passwd
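For simple field extraction, cut and awk give the same answer, which a quick sketch on a made-up passwd-style line can confirm. The user alice is invented for the example.

```shell
#!/bin/sh
line='alice:x:1000:1000:Alice:/home/alice:/bin/bash'   # hypothetical entry

with_cut=$(echo "$line" | cut -d: -f6)
with_awk=$(echo "$line" | awk -F: '{print $6}')

# awk can also reorder and combine fields, which cut cannot do.
name_and_uid=$(echo "$line" | awk -F: '{print $1 " " $3}')
```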
Substitution of text: sed (stream editor)
Substitution of text: perl
- Add or remove 'chr' from vcf file https://www.biostars.org/p/18530/
How to wrap a long linux command
Use the backslash character, and make sure the backslash is the last character on the line. The first example below does not work because there is an extra space character after each \.
Example 1 (does not work: a space follows each backslash)
sudo apt-get install libcap-dev libbz2-dev libgcrypt11-dev libpci-dev libnss3-dev libxcursor-dev \ 
  libxcomposite-dev libxdamage-dev libxrandr-dev libdrm-dev libfontconfig1-dev libxtst-dev \ 
  libcups2-dev libpulse-dev libudev-dev
vs. Example 2 (works)
sudo apt-get install libcap-dev libbz2-dev libgcrypt11-dev libpci-dev libnss3-dev libxcursor-dev \
  libxcomposite-dev libxdamage-dev libxrandr-dev libdrm-dev libfontconfig1-dev libxtst-dev \
  libcups2-dev libpulse-dev libudev-dev
Working with Files
Low-level File Access
- file descriptors: 0 means standard input, 1 means standard output, 2 means standard error.
- ssize_t write(int fildes, const void *buf, size_t nbytes);
#include <unistd.h>
#include <stdlib.h>

int main()
{
    /* The string is 18 bytes long, so a complete write returns 18. */
    if ((write(1, "Here is some data\n", 18)) != 18)
        write(2, "A write error has occurred on file descriptor\n", 46);
    exit(0);
}
- ssize_t read(int fildes, void *buf, size_t nbytes); returns the number of data bytes actually read. If a read call returns 0, it had nothing to read; it reached the end of the file. An error on the call will cause it to return -1.
- To create a new file descriptor we use the open system call. int open(const char *path, int oflags, mode_t mode);
- The next program will do file copy.
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>

int main()
{
    char c;
    int in, out;

    in = open("file.in", O_RDONLY);
    out = open("file.out", O_WRONLY|O_CREAT, S_IRUSR|S_IWUSR);
    /* Copy one byte at a time until read() reports end of file or an error. */
    while (read(in, &c, 1) == 1)
        write(out, &c, 1);
    exit(0);
}
The Standard I/O Library
- fopen, fclose
- fread, fwrite
- fflush
- fseek
- fgetc, getc, getchar
- fputc, putc, putchar
- fgets, gets
- printf, fprintf and sprintf
- scanf, fscanf and sscanf
Formatted Input and Output
- printf, fprintf and sprintf
- scanf, fscanf and sscanf
Stream Errors
File and Directory Maintenance
Scanning Directories
- opendir, closedir
- readdir
- telldir
- seekdir
UNIX environment
Logging
Resources and Limits
Terminals
Reading from and Writing to the Terminal
The termios Structure
Terminal Output
Detecting Keystrokes
Curses
A technique between command line and full GUI.
Example: vi.
Data Management
Development Tools
make and Makefiles
- minimal make A minimal tutorial on make from Karl Broman.
- http://makefiletutorial.com/index.html