Linux Programming: Difference between revisions

From 太極
 
(179 intermediate revisions by the same user not shown)
* [https://hpc.nih.gov/training/handouts/BashScripting.pptx Bash shell scripting for Helix and Biowulf]
* [http://google.github.io/styleguide/shell.xml Shell Style Guide] from Google
* http://explainshell.com/
* http://learnshell.org/
* http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html
* http://tldp.org '''T'''he '''L'''inux '''D'''ocumentation '''P'''roject
** [http://tldp.org/LDP/Bash-Beginners-Guide/html/index.html Bash Guide for Beginners]
** [http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html BASH Programming] - Introduction HOW-TO
** [http://tldp.org/LDP/abs/html/index.html Advanced Bash-Scripting Guide]
* [https://bash.cyberciti.biz/guide/Main_Page Linux Shell Scripting Tutorial] from cyberciti.biz
* [http://www.tecmint.com/enable-shell-debug-mode-linux/ Shell debugging]
* [https://www.tecmint.com/useful-tips-for-writing-bash-scripts-in-linux/ 10 Useful Tips for Writing Effective Bash Scripts in Linux]
* [https://zwischenzugs.com/2018/01/06/ten-things-i-wish-id-known-about-bash/ Ten Things I Wish I’d Known About bash] & [https://leanpub.com/learnbashthehardway Learn Bash the Hard Way] $4.99
* [https://opensource.com/article/20/1/improve-bash-scripts 5 ways to improve your Bash scripts]
=== Understand shell command options ===
[http://explainshell.com/ explainshell.com]. For example, https://explainshell.com/explain?cmd=rsync+-avz+--progress+--partial+-e


=== Check shell scripts ===
[https://www.howtogeek.com/788955/how-to-validate-the-syntax-of-a-linux-bash-script-before-running-it/ How To Validate the Syntax of a Linux Bash Script Before Running It]
[http://www.shellcheck.net/ ShellCheck] & download the binary from [https://launchpad.net/ubuntu/+source/shellcheck Launchpad].


If a statement is missing a closing single quote, the shell may report the error on a different line than the one that caused it (though the error message is still useful). It is therefore worth verifying the syntax of a script before running it.
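A minimal sketch of such a check, assuming '''bash''' is installed: the '''-n''' option makes bash parse a script without executing any command in it. The file name broken.sh and the variable syntax_status are hypothetical.

<syntaxhighlight lang='bash'>
# Write a hypothetical script whose second line has an unterminated single quote.
tmpdir=$(mktemp -d)
printf '%s\n' '#!/bin/bash' "echo 'hello" 'echo done' > "$tmpdir/broken.sh"

# -n: read and parse only, run nothing. A non-zero exit status signals a syntax error.
if bash -n "$tmpdir/broken.sh" 2>/dev/null; then
    syntax_status=ok
else
    syntax_status=broken
fi
echo "$syntax_status"   # broken
rm -r "$tmpdir"
</syntaxhighlight>

Dropping the 2>/dev/null shows the parser's own message, which is useful for locating the problem.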
=== Writing Secure Shell Scripts ===
[https://www.linuxjournal.com/content/writing-secure-shell-scripts Writing Secure Shell Scripts]
=== Bioinformatics ===
[https://github.com/stephenturner/oneliners Bioinformatics one-liners]
=== Data science ===
[https://datascienceatthecommandline.com/2e/chapter-4-creating-command-line-tools.html Data Science at the Command Line] Obtain, Scrub, Explore, and Model Data with Unix Power Tools
== Special characters ==
[https://www.howtogeek.com/439199/15-special-characters-you-need-to-know-for-bash/ 15 Special Characters You Need to Know for Bash]
== Progress bar ==
[https://www.linuxjournal.com/content/how-add-simple-progress-bar-shell-script How to Add a Simple Progress Bar in Shell Script]


== Simple calculation ==
== Here documents ==
=== << ===
http://linux.die.net/abs-guide/here-docs.html
* http://linux.die.net/abs-guide/here-docs.html
* [https://www.cyberciti.biz/faq/using-heredoc-rediection-in-bash-shell-script-to-write-to-file/ How to use a here documents to write data to a file in bash script]
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
#!/bin/bash
#!/bin/bash
this is a here
this is a here
document
document
$var on line
!FUNKY!
!FUNKY!
</syntaxhighlight>
</syntaxhighlight>
To disable parameter expansion (such as $HOME), command substitution, and arithmetic expansion inside the here document, quote the delimiter: <<'EOF'.
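The difference can be sketched as follows. The variable names name, expanded and literal are made up for this illustration.

<syntaxhighlight lang='bash'>
name=world

# Unquoted delimiter: $name is expanded inside the here document.
expanded=$(cat <<EOF
hello $name
EOF
)

# Quoted delimiter: the text is passed through literally.
literal=$(cat <<'EOF'
hello $name
EOF
)

echo "$expanded"   # hello world
echo "$literal"    # hello $name
</syntaxhighlight>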


=== <<< here string ===
=== <<< here string ===


== Redirect ==
== Redirect ==
=== stdin, stdout, and stderr ===
[https://www.howtogeek.com/435903/what-are-stdin-stdout-and-stderr-on-linux/ What Are stdin, stdout, and stderr on Linux?]
Redirecting output. File descriptor 1 is standard output and file descriptor 2 is standard error.
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
</syntaxhighlight>
</syntaxhighlight>


=== Redirect to some location that needs sudo rights ===
=== Using cat or echo to create a new file that needs sudo right ===
The following command does not work
The following command does not work
<pre>
<syntaxhighlight lang='bash'>
sudo cat myFile > /opt/myFile
sudo cat myFile > /opt/myFile
</pre>
</syntaxhighlight>
We can use [http://stackoverflow.com/questions/82256/how-do-i-use-sudo-to-redirect-output-to-a-location-i-dont-have-permission-to-wr something] like
 
<pre>
Solution 1 ('''sudo sh -c'''). We can use [http://stackoverflow.com/questions/82256/how-do-i-use-sudo-to-redirect-output-to-a-location-i-dont-have-permission-to-wr something] like
<syntaxhighlight lang='bash'>
sudo sh -c 'cat myFile > /opt/myFile'
sudo sh -c 'cat myFile > /opt/myFile'
</pre>
</syntaxhighlight>
 
Solution 2 ('''sudo tee'''). See '[https://www.digitalocean.com/community/tutorials/how-to-configure-nginx-as-a-web-server-and-reverse-proxy-for-apache-on-one-ubuntu-16-04-server How To Configure Nginx as a Web Server and Reverse Proxy for Apache on One Ubuntu 16.04 Server]'
<syntaxhighlight lang='bash'>
echo "<?php phpinfo(); ?>" | sudo tee /var/www/html/info.php
</syntaxhighlight>
 
To append to an existing file instead of overwriting it, use the '''-a''' option of the '''tee''' command.
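A small sketch of the overwrite versus append behaviour. It uses a scratch file from '''mktemp''' instead of a root-owned file so that it can be tried without sudo; the variable names are hypothetical.

<syntaxhighlight lang='bash'>
log=$(mktemp)   # hypothetical scratch file standing in for the protected file

echo "first line"  | tee    "$log" > /dev/null   # without -a: the file is overwritten
echo "second line" | tee -a "$log" > /dev/null   # with -a: the line is appended

contents=$(cat "$log")
echo "$contents"
rm "$log"
</syntaxhighlight>

The redirection to /dev/null only hides the copy that '''tee''' prints on the screen.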


=== Create a simple text file with multiple lines ===
=== Create a simple text file with multiple lines; write data to a file in bash script ===
Each of the methods below can be used in a bash script.
Each of the methods below can be used in a bash script.
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
# Method 1: printf
# Method 1: printf. We can add \t for tab delimiter
printf '%s \n' 'Line 1' 'Line 2' 'Line 3' > out.txt
$ printf '%s \n' 'Line 1' 'Line 2' 'Line 3' > out.txt


# Method 2: echo
# Method 2: echo. We can add \t for tab delimiter
echo 'Line 1
$ echo -e 'Line 1\t12\t13
Line 2
$ Line 2\t22\t23
Line 3' > out.txt
$ Line 3\t32\t33' > out.txt


# Method 3: echo
# Method 3: echo
echo $'Line 1\nLine 2\nLine 3' > out.txt
$ echo $'Line 1\nLine 2\nLine 3' > out.txt


# Method 4: here document
# Method 4: here document, http://tldp.org/LDP/abs/html/here-docs.html
cat <<EOF >out.txt
# For the TAB character, use Ctrl-V, TAB.
Line 1
# Note that first line can be: cat <<EOF > out.txt
Line 2
# The filename can be a variable if this is used inside a bash file
Line 3
$ cat > out.txt <<EOF
EOF
> line1  Second
> lin2    abcd
> line3ss dkflaf
> EOF
$
</syntaxhighlight>
</syntaxhighlight>
See also [https://www.cyberciti.biz/faq/using-heredoc-rediection-in-bash-shell-script-to-write-to-file/ How to use a here documents to write data to a file in bash script]
To escape a single quote inside $'...', use a backslash. For example
{{Pre}}
echo $'#!/bin/bash\nmodule load R/3.6.0\nRscript --vanilla -e "rmarkdown::render(\'gse6532.Rmd\')"'
</pre>
will obtain
<pre>
#!/bin/bash
module load R/3.6.0
Rscript --vanilla -e "rmarkdown::render('gse6532.Rmd')"
</pre>


=== >& ===
=== >& ===
command &>/dev/null
command &>/dev/null
</syntaxhighlight>
</syntaxhighlight>
In addition we can put a process in the background by adding the '&' sign; see the [[Linux#dclock_.28digital.29|dclock]] example.


=== tee - redirect to both a file and the screen at the same time ===
* http://www.cyberciti.biz/faq/saving-stdout-stderr-into-separate-files/
* http://www.cyberciti.biz/faq/saving-stdout-stderr-into-separate-files/
* https://en.wikipedia.org/wiki/Tee_(command)
* https://en.wikipedia.org/wiki/Tee_(command)
* [https://www.howtoforge.com/linux-tee-command/ Linux tee Command Explained for Beginners (6 Examples)]
* [https://stackoverflow.com/a/6991563 Since bash version 4 you may use |& as an abbreviation for 2>&1 |]


<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
command1 2>&1 | tee log.txt
command1 2>&1 | tee log.txt


# '-a' for append, sudo
# use the option '-a' for *append*
echo "new line of text" | sudo tee -a /etc/apt/sources.list
echo "new line of text" | sudo tee -a /etc/apt/sources.list
# redirect output of one command to another
ls file* | tee output.txt | wc -l


# streaming file (e.g. running an arduino sketch on Udoo)
# streaming file (e.g. running an arduino sketch on Udoo)
command > >(tee stdout.log) 2> >(tee stderr.log >&2)
command > >(tee stdout.log) 2> >(tee stderr.log >&2)
</syntaxhighlight>
</syntaxhighlight>
=== Methods To Create A File In Linux ===
[https://www.2daygeek.com/linux-command-to-create-a-file/ 10 Methods To Create A File In Linux]
=== Prepend ===
[https://www.cyberciti.biz/faq/bash-prepend-text-lines-to-file/ BASH Prepend A Text / Lines To a File]


== Pipe ==
== Pipe ==


=== What does a dash (-) at the end of a command mean? ===
For example
* http://unix.stackexchange.com/questions/16357/usage-of-dash-in-place-of-a-filename. It means 'standard input' or anything that will be used (required or interpreted) by the software. The following example is from [https://opensource.com/article/18/7/how-use-dd-linux How to use dd command] <syntaxhighlight lang='bash'>
<pre>
# ssh [email protected] "dd if=/dev/sda | gzip -1 -" | dd of=backup.gz
gzip -dc /cdrom/cdrom0/file.tar.gz | tar xvf -
</syntaxhighlight>
</pre>
 
* http://unix.stackexchange.com/questions/41828/what-does-dash-at-the-end-of-a-command-mean
* http://unix.stackexchange.com/questions/41828/what-does-dash-at-the-end-of-a-command-mean
* http://unix.stackexchange.com/questions/16357/usage-of-dash-in-place-of-a-filename
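A lone dash used in place of a filename conventionally means standard input (or standard output, when it stands for an output file). This is a convention implemented by each program, not by the shell.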
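As a small illustration of this convention, '''cat''' accepts the dash between ordinary file arguments and reads standard input at that position. The scratch file and variable names below are hypothetical.

<syntaxhighlight lang='bash'>
header=$(mktemp)   # hypothetical scratch file
echo "header" > "$header"

# cat prints the file first, then whatever arrives on standard input.
combined=$(echo "body from stdin" | cat "$header" -)
echo "$combined"
rm "$header"
</syntaxhighlight>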


=== Process substitution ===
=== Process substitution ===
</syntaxhighlight>
</syntaxhighlight>
where '''-s''' means silent and '''-S''' means showing error messages if it fails. Note that '''curl''' will download the file to standard output. So using the pipe operator is a reasonable sequence after running the '''curl'''.
where '''-s''' means silent and '''-S''' means showing error messages if it fails. Note that '''curl''' will download the file to standard output. So using the pipe operator is a reasonable sequence after running the '''curl'''.
=== Use wget to download and decompress at one line ===
https://stackoverflow.com/questions/16262980/redirect-pipe-wget-download-directly-into-gunzip
<syntaxhighlight lang='bash'>
wget -O - ftp://ftp.direcory/file.gz | gunzip -c > file.out
</syntaxhighlight>
where "-O -" means to print to standard output (sort of like the default behavior of "curl"). See https://www.gnu.org/software/wget/manual/wget.html
=== Use pipe and while loop to process multiple files ===
See an example at [[#while|while]].


=== Pipe vs redirect ===
=== Pipe vs redirect ===


== Comments ==
== Comments ==
For a single line, we can use the '#' sign.
For a single line, we can use the '#' sign. [https://www.cyberciti.biz/faq/bash-comment-out-multiple-line-code/ Shell Script Put Multiple Line Comments under Bash/KSH].


For a block of code, we use
For a block of code, we use
</syntaxhighlight>
</syntaxhighlight>


=== '''export''' command ===
=== When do I need to use the '''export''' command ===
Consider the following
<pre>
MY_DIRECTORY=/path/to/my/directory
export MY_DIRECTORY
./my_script.sh
</pre>
If you don’t use the export command in the above example, the MY_DIRECTORY variable will not be available to the my_script.sh script. It will only be available within the '''current shell session''' as a local shell variable.
 
When you set a variable in a shell session without using the export command, it is only available within that shell session as a local shell variable. This means that the variable and its value are only accessible within the current shell session and '''are not passed to child processes (e.g. my_script.sh) or other programs that are started from the command line'''.
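The effect can be observed with a child shell standing in for my_script.sh. This is only a sketch; '''sh -c''' and the variables before and after are choices made for this illustration.

<syntaxhighlight lang='bash'>
unset MY_DIRECTORY
MY_DIRECTORY=/path/to/my/directory

# Not exported yet: the child process sees an empty value.
before=$(sh -c 'echo "child sees: [$MY_DIRECTORY]"')

# After export the variable is part of the environment passed to children.
export MY_DIRECTORY
after=$(sh -c 'echo "child sees: [$MY_DIRECTORY]"')

echo "$before"   # child sees: []
echo "$after"    # child sees: [/path/to/my/directory]
</syntaxhighlight>

The single quotes matter here: they keep the current shell from expanding $MY_DIRECTORY before the child shell starts.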
 
Cf. When I put '''LS_COLORS''' in the .bashrc file, I don't need to use the export command.
 
=== '''export -n''' command: remove from environment ===
https://linuxconfig.org/learning-linux-commands-export
https://linuxconfig.org/learning-linux-commands-export


declare -x VISUAL="nano"
declare -x VISUAL="nano"
</syntaxhighlight>
</syntaxhighlight>
=== echo command ===
* https://en.wikipedia.org/wiki/Echo_(command)
* [https://www.howtogeek.com/446071/how-to-use-the-echo-command-on-linux/ How to Use the Echo Command on Linux]
** Writing Text to the Terminal
** Using Variables With echo
** Using Commands With echo
** Formatting Text With echo
** Using echo With Files and Directories
** Writing to Files with echo


=== String manipulation ===
=== String manipulation ===
http://www.thegeekstuff.com/2010/07/bash-string-manipulation/
http://www.thegeekstuff.com/2010/07/bash-string-manipulation/


==== Concatenate string variables ====
==== '''dirname''' and '''basename''' commands ====
http://www.tldp.org/LDP/LG/issue18/bash.html
<syntaxhighlight lang='bash'>
# On directories
$ dirname ~/Downloads
/home/chronos/user
$ basename ~/Downloads
Downloads
 
# On files
$ dirname ~/Downloads/DNA_Helix.zip
/home/chronos/user/Downloads
 
$ basename ~/Downloads/DNA_Helix.zip
DNA_Helix.zip
$ basename ~/Downloads/DNA_Helix.zip .zip
DNA_Helix
$ basename ~/Downloads/annovar.latest.tar.gz
annovar.latest.tar.gz
$ basename ~/Downloads/annovar.latest.tar.gz .gz
annovar.latest.tar
$ basename ~/Downloads/annovar.latest.tar.gz .tar.gz
annovar.latest
$ basename ~/Downloads/annovar.latest.tar.gz .latest.tar.gz
annovar
</syntaxhighlight>
 
==== Escape characters and quotes ====
<pre>
echo $USER  # brb
 
echo My name is $USER
 
echo "My name is $USER"  # My name is brb
 
echo 'My name is $USER'  # 'My name is $USER'; single quote will not interpret the variable
          # we use the single quotes if we want to present the characters literally or
          # pass the characters to the shell.
grep '.*/udp' /etc/services  # normally . and * and slash characters have special meaning
 
echo \$USER  # we escape $ so $ lost its special meaning
 
echo '\'
 
echo \'text\'  # 'text'
</pre>
 
==== When to use double quotes with a variable ====
[https://unix.stackexchange.com/questions/78002/when-to-use-double-quotes-with-a-variable-in-shell-script when to use double quotes with a variable in shell script?]
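A short sketch of why the quotes matter: an unquoted variable is split into words at whitespace before the command sees it. The helper count_args and the file name are hypothetical.

<syntaxhighlight lang='bash'>
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

file="my report.txt"           # a name containing a space

unquoted=$(count_args $file)   # split into two words: my, report.txt
quoted=$(count_args "$file")   # kept as a single word

echo "$unquoted $quoted"       # 2 1
</syntaxhighlight>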
 
==== Concatenate string variables (not safe) ====
http://stackoverflow.com/questions/4181703/how-can-i-concatenate-string-variables-in-bash
http://stackoverflow.com/questions/4181703/how-can-i-concatenate-string-variables-in-bash
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
Note that the [http://tldp.org/LDP/abs/html/dblparens.html double parentheses construct] in ((a+=12)) permits arithmetic expansion and evaluation.
Note that the [http://tldp.org/LDP/abs/html/dblparens.html double parentheses construct] in ((a+=12)) permits arithmetic expansion and evaluation.


==== concatenate a string variable and a constant string ====
==== '''${parameter}''' - Concatenate a string variable and a constant string; variable substitution ====
Use braces around the variable name.
[http://tldp.org/LDP/abs/html/parameter-substitution.html#PARAMSUBREF Parameter substitution ${}]. Cf $() for command execution


<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
</syntaxhighlight>
</syntaxhighlight>


=== Environment variables ===
And
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
$HOME
your_id=${USER}-on-${HOSTNAME}
$PATH
echo "$your_id"
$0 -- name of the shell script
 
$# -- number of parameters passed (so it does not include the program name itself)
echo "Old \$PATH = $PATH"
$$ process ID of the shell script, often used inside a script for generating unique temp filenames
PATH=${PATH}:/opt/bin  # Add /opt/bin to $PATH for duration of script.
$? -- the exit status of the last command run; 0 means OK and non-zero means something went wrong
echo "New \$PATH = $PATH"
$_ -- previous command's last argument
</syntaxhighlight>
</syntaxhighlight>


Example 1 (check if a command ran successfully):
Braces are also needed when building a new string from an existing variable, so that the shell knows where the variable name ends:
<syntaxhighlight lang='bash'>
<pre>
some_command
pdir="/tmp/files/today"
if [ $? -eq 0 ]; then
fname="report"
    echo OK
mkdir -p $pdir
else
 
    echo FAIL
touch $pdir/$fname  # OK
fi
ls -l $pdir/$fname
# OR
if some_command; then
    printf 'some_command succeeded\n'
else
    printf 'some_command failed\n'
fi


$ tabix -f -p vcf ~/SeqTestdata/usefulvcf/hg19/CosmicCodingMuts.vcf.gz
touch $pdir/$fname_new  # No error, but no new file: this only touches $pdir/ itself
brb@brb-P45T-A:/tmp$ echo $?
                        # because this variable does not exist yet
0
ls $pdir/$fname_new
$ tabix -f -p vcf ~/Downloads/CosmicCodingMuts.vcf.gz
 
Not a BGZF file: /home/brb/Downloads/CosmicCodingMuts.vcf.gz
touch $pdir/${fname}_new
tbx_index_build failed: /home/brb/Downloads/CosmicCodingMuts.vcf.gz
ls $pdir/${fname}_new  # Works
$ echo $?
</pre>
1
 
</syntaxhighlight>
==== '''$(command)''' - Command Execution and Assign Output of Shell Command To a Variable; Command substitution ====
[https://www.cyberciti.biz/faq/unix-linux-bsd-appleosx-bash-assign-variable-command-output/ Bash Assign Output of Shell Command To Variable]


Example 2 (check whether a host is reachable)
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
ping DOMAIN -c2 &> /dev/null
$(command)
if [ $? -eq 0 ];
`command`    # ` is a backquote/backtick, not a single quotation sign
then
            # this is a legacy support; not recommended by https://www.shellcheck.net/
  echo Successful
else
  echo Failure
fi
</syntaxhighlight>
</syntaxhighlight>
where -c is used to limit the number of packets to be sent and &> /dev/null is used to redirect both ''stderr'' and ''stdout'' to /dev/null so that it won't be printed on the terminal.
Note that all new scripts should use the $(...) form: it nests without extra escaping and avoids the rather complex quoting rules of backquotes.
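The nesting difference can be sketched like this. The path /usr/local/bin/tool is a hypothetical example and does not need to exist, since '''dirname''' and '''basename''' only manipulate the string.

<syntaxhighlight lang='bash'>
# $(...) nests naturally.
parent=$(basename "$(dirname /usr/local/bin/tool)")

# The backquote form needs a backslash in front of each inner backquote.
parent_legacy=`basename \`dirname /usr/local/bin/tool\``

echo "$parent $parent_legacy"   # bin bin
</syntaxhighlight>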


Example 3 (check if users have supplied the correct number of parameters):
Example 1.
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
#!/bin/bash
sudo apt-get install linux-headers-$(uname -r)
if [ $# -ne 2 ]; then
  echo "Usage: $0 ProgramName filename"
  exit 1
fi
 
match_text=$1
filename=$2
</syntaxhighlight>
</syntaxhighlight>


Example 4 (make a new directory and cd to it)
Example 2.
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
mkdir -p "newDir/subDir"; cd "$_"
user=$(echo "$UID")
</syntaxhighlight>
</syntaxhighlight>


=== Parameter variables ===
Example 3.
* [https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html Shell Parameter Expansion] - Important !!
* http://tldp.org/LDP/abs/html/othertypesv.html
* https://bash.cyberciti.biz/guide/Pass_arguments_into_a_function
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
$1, $2, .... -- parameters given to the script
#!/bin/sh
$* -- list of all the parameters, in a single variable
echo The current directory is $PWD
$@ -- like $*, but "$@" (quoted) expands to one separate word per parameter
echo The current users are $(who)
$! -- the process id of the last command run in the background.
sudo chown `id -u` SomeDir  # change the ownership to the current user. Dangerous!
                            # Or sudo chown `whoami` SomeDirOrSomeFile
exit 0
</syntaxhighlight>
</syntaxhighlight>
For example,
 
Example 4. Create a new file with automatically generated filename
<pre>
touch file-$(date -I)
</pre>
 
Example 5. Use '''$(your expression)''' to nest commands. For example,
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
$ touch /tmp/tmpfile_$$
# cd into the directory containing the 'touch' command.
cd $(dirname $(type -P touch))


$ set foo bar bam
BACKUPDIR=/nas/backup
$ echo $#
LASTDAYPATH=${BACKUPDIR}/$(ls ${BACKUPDIR} | tail -n 1)
3
$ echo $@
foo bar bam
$ set foo bar bam &
[1] 28212
$ echo $!
28212
[1]+  Done                    set foo bar bam
</syntaxhighlight>
</syntaxhighlight>


We can also use braces around the variable name.
The concept of putting the result of a command into a script variable is very powerful, as it makes it easy to use existing commands in scripts and capture their output.
 
'''Arithmetic Expansion'''
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
QT_ARCH=x86_64
$((...))
QT_SDK_BINARY=QtSDK-4.8.0-${QT_ARCH}.tar.gz
QT_SD_URL=https://xxx.com/$QT_SDK_BINARY
</syntaxhighlight>
</syntaxhighlight>
 
is a better alternative to the '''expr''' command. More examples:
[http://stackoverflow.com/questions/1224766/how-do-i-rename-the-extension-for-a-batch-of-files How do I rename the extension for a batch of files?] See '''man bash''' [https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html Shell Parameter Expansion]
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
# Solution 1:
for i in $(seq 1 3)
for file in *.html; do
  do echo SRR$(( i + 1027170 ))'_1'.fastq
    mv "$file" "`basename "$file" .html`.txt"
done
done
</syntaxhighlight>
The single quotes around _1 are optional here, because the closing )) already ends the arithmetic expansion. The loop prints SRR1027171_1.fastq, SRR1027172_1.fastq and SRR1027173_1.fastq.


# Solution 2:
'''Parameter Expansion'''
for file in *.html
<syntaxhighlight lang='bash'>
do
${parameter}
mv "$file" "${file%.html}.txt"
done
</syntaxhighlight>
</syntaxhighlight>


==== Discard the extension name ====
==== Double Parentheses (()) ====
[https://fedoramagazine.org/bash-shell-scripting-for-beginners-part-1/ Bash Shell Scripting for beginners (Part 1)] from Fedora Magazine. Double parentheses are for arithmetic evaluation.
 
==== extract substring ====
https://www.cyberciti.biz/faq/how-to-extract-substring-in-bash/
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
$ vara=fillename.ext
${parameter:offset:length}
$ echo $vara
fillename.ext
$ echo ${vara::-4} # works on Bash 4.3, eg Ubuntu
fillename
$ echo ${vara::${#vara}-4} # works on Bash 4.1, e.g. Biowulf Red Hat
</syntaxhighlight>
</syntaxhighlight>
http://stackoverflow.com/questions/27658675/how-to-remove-last-n-characters-from-a-bash-variable-string


Or better with (See [https://stackoverflow.com/questions/965053/extract-filename-and-extension-in-bash Extract filename and extension in Bash] and [https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html Shell parameter expansion]).
Example:
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
$ UEFI_ZIP_FILE="UDOOX86_B02-UEFI_Update_rel102.zip"
## define var named u ##
$ UEFI_ZIP_DIR="${UEFI_ZIP_FILE%.*}"
u="this is a test"
$ echo $UEFI_ZIP_DIR
UDOOX86_B02-UEFI_Update_rel102


$ FILE="example.tar.gz"
var="${u:10:4}"
$ echo "${FILE%%.*}"
echo "${var}"
example
$ echo "${FILE%.*}"
example.tar
$ echo "${FILE#*.}"
tar.gz
$ echo "${FILE##*.}"
gz
</syntaxhighlight>
</syntaxhighlight>


==== Space in variable value====
Or use the '''cut''' command.
Suppose we have a script file called 'foo' that can remove spaces from a file name. Note: '''tr''' command is used to delete characters specified by the '-d' parameter.
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
#!/bin/sh
u="this is a test"
NAME=`ls $1 | tr -d ' '`
echo "$u" | cut -d' ' -f 4
echo $NAME
echo "$u" | cut --delimiter=' ' --fields=4
mv $1 $NAME
##########################################
## WHERE
##  -d' ' : Use a whitespace as delimiter
##  -f 4  : Select only 4th field
##########################################
var="$(cut -d' ' -f 4 <<< "$u")"
echo "${var}"
</syntaxhighlight>
</syntaxhighlight>
Now we try the program:
<syntaxhighlight lang='bash'>
$ touch 'file 1.txt'
$ ./foo 'file 1.txt'
ls: cannot access file: No such file or directory
ls: cannot access 1.txt: No such file or directory


mv: cannot stat ‘file’: No such file or directory
=== Environment variables ===
</syntaxhighlight>
[https://www.howtogeek.com/668503/how-to-set-environment-variables-in-bash-on-linux/ How to Set Environment Variables in Bash on Linux]
The way to fix the program is to use double quotes around $1
 
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
#!/bin/sh
$HOME
NAME=`ls "$1" | tr -d ' '`
$PATH
echo $NAME
$0 -- name of the shell script
mv "$1" $NAME
$# -- number of parameters passed (so it does not include the program name itself)
</syntaxhighlight>
$$ process ID of the shell script, often used inside a script for generating unique temp filenames
and test it
$? -- the exit status of the last command run; 0 means OK and non-zero means something went wrong
<syntaxhighlight lang='bash'>
$_ -- previous command's last argument
$ ./foo "file 1.txt"
file1.txt
</syntaxhighlight>
</syntaxhighlight>


When a path combines variables with a wildcard, put the double quotes around the variables only, not around the whole string: a quoted * is not expanded by the shell.
Example 1 (check if a command ran successfully):
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
$ rm "$outputDir/tmp/$tmpfd/tmpa"  # fine
some_command
if [ $? -eq 0 ]; then
    echo OK
else
    echo FAIL
fi
# OR
if some_command; then
    printf 'some_command succeeded\n'
else
    printf 'some_command failed\n'
fi


$ rm "$outputDir/tmp/$tmpfd/tmp*.txt"
$ tabix -f -p vcf ~/SeqTestdata/usefulvcf/hg19/CosmicCodingMuts.vcf.gz
rm: annovar6-12/tmp/tmp_bt20_raw/tmp*.txt: No such file or directory
brb@brb-P45T-A:/tmp$ echo $?
 
0
$ rm "$outputDir"/tmp/$tmpfd/tmp*.txt
$ tabix -f -p vcf ~/Downloads/CosmicCodingMuts.vcf.gz
Not a BGZF file: /home/brb/Downloads/CosmicCodingMuts.vcf.gz
tbx_index_build failed: /home/brb/Downloads/CosmicCodingMuts.vcf.gz
$ echo $?
1
</syntaxhighlight>
</syntaxhighlight>


See https://unix.stackexchange.com/questions/131766/why-does-my-shell-script-choke-on-whitespace-or-other-special-characters
Example 2 (check whether a host is reachable)
 
<syntaxhighlight lang='bash'>
== Conditions ==
ping DOMAIN -c2 &> /dev/null
We can use the '''test''' command to check if a file exists. The command is test -f <filename>.
if [ $? -eq 0 ];
 
[ is just another way of writing '''test'''; always leave a space after the [ and before the closing ].
<pre>
if test -f fred.c; then ...; fi
 
if [ -f fred.c ]
then
then
...
  echo Successful
else
  echo Failure
fi
fi
</syntaxhighlight>
where -c is used to limit the number of packets to be sent and &> /dev/null is used to redirect both ''stderr'' and ''stdout'' to /dev/null so that it won't be printed on the terminal.


if [ -f fred.c ]; then
Example 3 (check if users have supplied the correct number of parameters):
...
<syntaxhighlight lang='bash'>
#!/bin/bash
if [ $# -ne 2 ]; then
  echo "Usage: $0 ProgramName filename"
  exit 1
fi
fi
</pre>


=== What is the difference between test, [ and [[ ? ===
match_text=$1
http://mywiki.wooledge.org/BashFAQ/031
filename=$2
</syntaxhighlight>
 
Example 4 (make a new directory and cd to it)
<syntaxhighlight lang='bash'>
mkdir -p "newDir/subDir"; cd "$_"
</syntaxhighlight>


[ ("test" command) and [[ ("new test" command) are used to evaluate expressions. [[ works only in Bash, Zsh and the Korn shell, and is more powerful; [ and ''test'' are available in POSIX shells.
==== How to List Environment Variables ====
[https://www.howtogeek.com/842780/linux-list-environment-variables/ How to List Environment Variables on Linux]
<pre>
printenv
</pre>


''test'' implements the old, portable syntax of the command. In almost all shells (the oldest Bourne shells are the exception), [ is a synonym for ''test'' (but requires a final argument of ]).
==== Unset/Remove an environment variable ====
<syntaxhighlight lang='bash'>
$ export MSG="HELLO WORLD"
$ echo $MSG
HELLO WORLD
$ unset MSG
$ echo $MSG


[[ is a new improved version of it, and is a keyword, not a program.
$
</syntaxhighlight>


=== String comparison ===
==== Set an environment variable and run a command on the same line, env command ====
<ul>
<li>[https://stackoverflow.com/a/10856348 Setting an environment variable before a command in Bash is not working for the second command in a pipe]
<li>[https://stackoverflow.com/a/20858414 What does 'bash -c' do?]
<pre>
<pre>
==  ==> strings are equal (== is a synonym for =)
FOO=bar bash -c 'somecommand someargs | somecommand2'
=  ==> strings are equal
!=  ==> strings are not equal
-z  ==> string is null
-n  ==> string is not null
</pre>
</pre>
For example, the following script checks whether users have provided an argument to the script.
<li>env: run a program in a modified environment. [https://www.man7.org/linux/man-pages/man1/env.1.html man env], [https://www.geeksforgeeks.org/env-command-in-linux-with-examples/# env command in Linux with Examples]
<pre>
env RSTUDIO_WHICH_R=/opt/R/4.2.3/bin/R rstudio ~/Project/project.Rproj
</pre>
Note that the environment of the current shell is not changed: RSTUDIO_WHICH_R is set only for the rstudio process and is not exported.
<li>https://en.wikipedia.org/wiki/Env. ''Note that this use of env is often unnecessary since most shells support setting environment variables in front of a command''.
<pre>
env DISPLAY=foo.bar:1.0 xcalc
# OR
DISPLAY=foo.bar:1.0 xcalc
</pre>
</ul>
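A minimal sketch of the same idea with a throwaway variable, showing that an assignment placed in front of a command reaches only that command. FOO, child_value and parent_value are hypothetical names.

<syntaxhighlight lang='bash'>
unset FOO

# FOO exists only in the environment of this one child process.
child_value=$(FOO=bar sh -c 'echo "$FOO"')

# The current shell never received the variable.
parent_value=${FOO:-unset}

echo "$child_value $parent_value"   # bar unset
</syntaxhighlight>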
 
=== Parameter variables ===
* [https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html Shell Parameter Expansion] - Important !!
* http://tldp.org/LDP/abs/html/othertypesv.html
* https://bash.cyberciti.biz/guide/Pass_arguments_into_a_function
<syntaxhighlight lang='bash'>
$1, $2, .... -- parameters given to the script
$* -- list of all the parameters, in a single variable
$@ -- like $*, but "$@" (quoted) expands to one separate word per parameter
$! -- the process id of the last command run in the background.
</syntaxhighlight>
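The difference between $* and $@ only shows up once they are quoted. The following sketch uses a hypothetical helper count_args and two made-up parameters that contain spaces.

<syntaxhighlight lang='bash'>
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

set -- "songlist 1" "songlist 2"   # pretend the script got these two parameters

with_at=$(count_args "$@")     # each parameter stays one word
with_star=$(count_args "$*")   # all parameters are joined into one word
unquoted=$(count_args $@)      # word splitting breaks them apart

echo "$with_at $with_star $unquoted"   # 2 1 4
</syntaxhighlight>

This is why loops over the parameters are usually written with "$@".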
Example 1.
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
#!/bin/sh
#!/bin/bash
if [ -z "$1" ]; then
echo "$1 likes to eat $2 and $3 every day."
  echo "Provide a \"file name\", using quotes to nullify the space."
echo "bye:-)"
  exit 1
fi
mv -i "$1" `ls "$1" | tr -d ' '`
</syntaxhighlight>
</syntaxhighlight>
where the '''-i''' option makes '''mv''' ask for confirmation before overwriting an existing file.


To check whether Xcode (either full Xcode or command line developer tools only) has been installed or not on Mac
Example 2.
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
if [ -z "$(xcode-select -p 2>&1 | grep error)" ]
$ touch /tmp/tmpfile_$$
then
 
  echo "Xcode has been installed";
$ set foo bar bam
else
$ echo $#
  echo "Xcode has not been installed";
3
fi
$ echo $@
foo bar bam
$ set foo bar bam &
[1] 28212
$ echo $!
28212
[1]+  Done                    set foo bar bam
</syntaxhighlight>


# only print out message if xcode was not found
Example 3. [https://www.lifewire.com/pass-arguments-to-bash-script-2200571 $@] parameter for a variable number of parameters
if [ -n "$(xcode-select -p 2>&1 | grep error)" ]
<syntaxhighlight lang='bash'>
then
$ cat stats.sh
  echo "Xcode has not been installed";
for FILE1 in "$@"
fi
do
wc "$FILE1"
done
$ sh stats.sh songlist1 songlist2 songlist3
</syntaxhighlight>
</syntaxhighlight>
Note that the 'error' keyword comes from the macOS message printed when [[#Install_Xcode|Xcode has not been installed]]. Also, the double quotes around '''$( )''' are needed to avoid the [http://stackoverflow.com/questions/13781216/bash-meaning-of-too-many-arguments-error-from-if-square-brackets "[: too many arguments" error].


=== Arithmetic/Integer comparison ===
We can also use parentheses around the variable name.
<pre>
<syntaxhighlight lang='bash'>
expr1 -eq expr2  ==> check equal
QT_ARCH=x86_64
expr1 -ne expr2 ==> check not equal
QT_SDK_BINARY=QtSDK-4.8.0-${QT_ARCH}.tar.gz
expr1 -gt expr2  ==> expr1 > expr2
QT_SD_URL=https://xxx.com/$QT_SDK_BINARY
expr1 -ge expr2  ==> expr1 >= expr2
</syntaxhighlight>
expr1 -lt expr2  ==> expr1 < expr2
 
expr1 -le expr2  ==> expr1 <= expr2
[http://stackoverflow.com/questions/1224766/how-do-i-rename-the-extension-for-a-batch-of-files How do I rename the extension for a batch of/multiple files?] See '''man bash''' [https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html Shell Parameter Expansion]
! expr  ==> opposite of expr
<syntaxhighlight lang='bash'>
# Solution 1:
for file in *.html; do
    mv "$file" "`basename "$file" .html`.txt"
done
 
# Solution 2:
for file in *.html
do
  mv "$file" "${file%.html}.txt"
done
</syntaxhighlight>
 
==== Get filename without Path ====
[https://tecadmin.net/how-to-extract-filename-extension-in-shell-script/ How to Extract Filename & Extension in Shell Script]
<pre>
fullfilename="/var/log/mail.log"
filename=$(basename "$fullfilename")
echo $filename
</pre>
</pre>


=== File conditionals ===
==== Extension without filename ====
[https://tecadmin.net/how-to-extract-filename-extension-in-shell-script/ How to Extract Filename & Extension in Shell Script]
<pre>
<pre>
-d file  ==> True if the file is a directory
fullfilename="/var/log/mail.log"
-e file  ==> True if the file exists
filename=$(basename "$fullfilename")
-f file  ==> True if the file is a regular file
ext="${filename##*.}"
-r file  ==> True if the file is readable
echo $ext
-s file  ==> True if the file has non-zero size
-w file  ==> True if the file is writable
-x file  ==> True if the file is executable
</pre>
</pre>


Example: Suppose we want to know if the first argument (if given) matches a specific string. We can use (note the space before and after '==')
==== Discard the extension name and "%" symbol ====
<pre>
<syntaxhighlight lang='bash'>
#!/bin/bash
$ vara=fillename.ext
if [ "$1" == "console" ]; then
$ echo $vara
  echo 'Console'
fillename.ext
else
$ echo ${vara::-4} # works on Bash 4.3, eg Ubuntu
  echo 'Non-console'
fillename
fi
$ echo ${vara::${#vara}-4} # works on Bash 4.1, eg Biowulf readhat
</pre>
</syntaxhighlight>
http://stackoverflow.com/questions/27658675/how-to-remove-last-n-characters-from-a-bash-variable-string


=== Check if running as root ===
Another way (not assuming 3 letters for the suffix) https://www.cyberciti.biz/faq/unix-linux-extract-filename-and-extension-in-bash/
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
if [ $UID -ne 0 ];
dest="/nas100/backups/servers/z/zebra/mysql.tgz"
then
## get file name i.e. basename such as mysql.tgz
  echo "Run as root"
tempfile="${dest##*/}"
  exit 1;
fi
## display filename
echo "${tempfile%.*}"
</syntaxhighlight>
</syntaxhighlight>


== Control Structures ==
Or better with (See [https://stackoverflow.com/questions/965053/extract-filename-and-extension-in-bash Extract filename and extension in Bash] and [https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html Shell parameter expansion]). [https://tecadmin.net/how-to-extract-filename-extension-in-shell-script/ How to Extract Filename & Extension in Shell Script]
=== '''if''' ===
<syntaxhighlight lang='bash'>
<pre>
fullfilename="/var/log/mail.log"
if condition
filename=$(basename "$fullfilename")
then
fname="${filename%.*}"
  statements
echo $fname   # mail
elif [ condition ]; then
 
  statements
$ UEFI_ZIP_FILE="UDOOX86_B02-UEFI_Update_rel102.zip"
else
$ UEFI_ZIP_DIR="${UEFI_ZIP_FILE%.*}"
  statements
$ echo $UEFI_ZIP_DIR
fi
UDOOX86_B02-UEFI_Update_rel102
</pre>
For example, we can run a '''cp''' command if two files are different.
<pre>
if ! cmp -s "$filesrc" "$filecur"
then
    cp "$filesrc" "$filecur"
fi
</pre>
==== String Comparison ====
http://stackoverflow.com/questions/2237080/how-to-compare-strings-in-bash
<syntaxhighlight lang='bash'>
answer=no
if [ -f "genome.fa" ]; then
   echo -n 'Do you want to continue [yes/no]: '
  read answer
fi


if [ "$answer" == "no" ]; then
$ FILE="example.tar.gz"
echo AAA
$ echo "${FILE%%.*}"
fi
example
$ echo "${FILE%.*}"
example.tar
$ echo "${FILE#*.}"
tar.gz
$ echo "${FILE##*.}"
gz
</syntaxhighlight>


if [ "$answer"=="no" ]; then
==== Space in variable value====
# failed if condition
Suppose we have a script file called 'foo' that removes spaces from a file name. Note: the '''tr''' command with the '-d' option deletes the specified characters.
echo BBB
<syntaxhighlight lang='bash'>
fi
#!/bin/sh
NAME=`ls $1 | tr -d ' '`
echo $NAME
mv $1 $NAME
</syntaxhighlight>
</syntaxhighlight>
# You want the quotes around $answer because, if $answer is empty, the unquoted test has no left operand and fails with an error.
Now we try the program:
# Space in bash is important.
<syntaxhighlight lang='bash'>
#* Spaces between '''if''' and '''[''' and ''']''' are important
$ touch 'file 1.txt'
#* A space before and after the double equal signs is important. Without them, "$answer"=="no" is a single non-empty string, so the test is always true: even if we reply with 'yes', the code still runs the 'echo BBB' statement.
$ ./foo 'file 1.txt'
ls: cannot access file: No such file or directory
ls: cannot access 1.txt: No such file or directory


=== '''while''' ===
mv: cannot stat ‘file’: No such file or directory
<pre>
</syntaxhighlight>
while condition; do
The way to fix the program is to use double quotes around $1
  statements
<syntaxhighlight lang='bash'>
done
#!/bin/sh
</pre>
NAME=`ls "$1" | tr -d ' '`
echo $NAME
mv "$1" $NAME
</syntaxhighlight>
and test it
<syntaxhighlight lang='bash'>
$ ./foo "file 1.txt"
file1.txt
</syntaxhighlight>
 
If we concatenate variables into a path, put the double quotes around the variables, not around the whole string: a wildcard inside double quotes is not expanded.
<syntaxhighlight lang='bash'>
$ rm "$outputDir/tmp/$tmpfd/tmpa"  # fine
 
$ rm "$outputDir/tmp/$tmpfd/tmp*.txt"
rm: annovar6-12/tmp/tmp_bt20_raw/tmp*.txt: No such file or directory
 
$ rm "$outputDir"/tmp/$tmpfd/tmp*.txt
</syntaxhighlight>
 
See https://unix.stackexchange.com/questions/131766/why-does-my-shell-script-choke-on-whitespace-or-other-special-characters
 
==== getopts function - parse options from shell script command line ====
* https://www.lifewire.com/pass-arguments-to-bash-script-2200571
* https://www.computerhope.com/unix/bash/getopts.htm
* [https://www.howtogeek.com/778410/how-to-use-getopts-to-parse-linux-shell-script-options/ How to Use getopts to Parse Linux Shell Script Options]
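As a rough illustration of what the pages above describe, here is a minimal getopts sketch. The function name parse_opts and the options -v and -o FILE are made up for this example; they do not come from the linked articles.

```bash
# parse_opts is a hypothetical function; -v and -o FILE are made-up options.
parse_opts() {
  verbose=0
  outfile="out.txt"
  OPTIND=1                      # reset so the function can be called repeatedly
  while getopts "vo:" opt; do   # "o:" means -o takes an argument
    case "$opt" in
      v) verbose=1 ;;
      o) outfile="$OPTARG" ;;
      *) echo "Usage: parse_opts [-v] [-o FILE] [args...]" >&2; return 1 ;;
    esac
  done
  shift $((OPTIND - 1))         # drop the options that were parsed
  echo "verbose=$verbose outfile=$outfile rest=$*"
}

parse_opts -v -o result.txt input1 input2
# prints: verbose=1 outfile=result.txt rest=input1 input2
```

In a standalone script the same loop would normally sit at the top level and work on the script's own arguments.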


'''until'''
==== Check if command line argument is missing (? :) and specifying the default (:-) ====
Search for [https://stackoverflow.com/a/3953666 ternary (conditional) operator] and check out [https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html parameter Expansion] in Bash Reference Manual. [https://linuxhint.com/bash_operator_examples/ 74 Bash Operators Examples]
<pre>
<pre>
until condition
#!/usr/bin/env bash
do
 
  statements
NAME=${1?Error: no name given}
done
NAME2=${2:-friend}
 
echo "HELLO! $NAME and $NAME2"
</pre>
</pre>


=== '''AND list''' ===
=== Shell expansion ===
<pre>
https://www.gnu.org/software/bash/manual/html_node/Shell-Expansions.html#Shell-Expansions
statement1 && statement2 && statement3 && ...
==== Curly brace {} expansion and array ====
</pre>
* A [https://wizardzines.com/comics/parameter-expansion/?s=09 Comic] from Wizard zines.
If command1 finishes successfully then run command2.
* [https://www.cyberciti.biz/faq/explain-brace-expansion-in-cp-mv-bash-shell-commands/ Explain: {,} in cp or mv Bash Shell Commands]
* [https://unix.stackexchange.com/questions/157286/copying-files-with-multiple-extensions Copy multiple types of extensions]
: <syntaxhighlight lang='bash'>
cp -v *.{txt,jpg,png} destination/
</syntaxhighlight>
* [https://www.linux.com/blog/learn/2019/2/all-about-curly-braces-bash All about {Curly Braces} in Bash]
** Array Builder <syntaxhighlight lang='bash'>
echo {0..10}
 
echo {10..0..2}
echo {z..a..2}


=== '''OR list''' ===
mkdir test{10..12}  # test10, test11, test12 directories
<pre>
rm -rf test{10..12}
statement1 || statement2 || statement3 || ...
</syntaxhighlight>
</pre>
** Parameter expansion <syntaxhighlight lang='bash'>
If command1 fails then run command2.
# convert jpg to png
for i in *.jpg; do convert $i ${i%jpg}png; done


For example,
a="Hello World!"
<syntaxhighlight lang='bash'>
echo Goodbye${a#Hello}
codename=$(lsb_release -s -c)
# Goodbye World!
if [ $codename == "rafaela" ] || [ $codename == "rosa" ]; then
  codename="trusty"
fi
</syntaxhighlight>
</syntaxhighlight>
** Output Grouping
* [https://www.makeuseof.com/bash-script-array-usage/ How to Use Arrays in a Bash Script]


=== for + do + done ===
==== Square brackets ====
<pre>
[https://www.linux.com/blog/2019/3/using-square-brackets-bash-part-1 Using Square Brackets in Bash: Part 1]
for variable in values
do
  statements
done
</pre>


The values can be an explicit list
Globbing means using wildcards to get all the results that fit a certain pattern.
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
i=1
ls *.jpg # the asterisk means "zero or more characters"
for day in Mon Tue Wed Thu Fri
ls d*k?  # ?, which means "exactly one character"
do
 
  echo "Weekday $((i++)) : $day"
touch file0{0..9}{0..9} # This will create files file000, file001, file002, etc., through file097, file098 and file099.
done
ls file0[78]?          # list the files in the 70s and 80s
</syntaxhighlight>
ls file0[259][278]      # list file022, file027, file028, file052, file057, file058, file092, file097, and file098
or a variable
<syntaxhighlight lang='bash'>
i=1
weekdays="Mon Tue Wed Thu Fri"
for day in $weekdays
do
echo "Weekday $((i++)) : $day"
done
# Output
# Weekday 1 : Mon
# Weekday 2 : Tue
# Weekday 3 : Wed
# Weekday 4 : Thu
# Weekday 5 : Fri
</syntaxhighlight>
</syntaxhighlight>


Note that we should not put double quotes around the $weekdays variable: the quotes prevent word splitting, so the loop runs only once with the whole string. See the [http://www.thegeekstuff.com/2011/07/bash-for-loop-examples/ thegeekstuff] article.
== Conditions ==
<syntaxhighlight lang='bash'>
We can use the '''test''' command to check if a file exists. The command is test -f <filename>.
i=1
 
weekdays="Mon Tue Wed Thu Fri"
[] is just the same as writing test, and would always leave a space after the test
for day in "$weekdays"
word.
do
<pre>
echo "Weekday $((i++)) : $day"
if test -f fred.c; then ...; fi
done
# Output
# Weekday 1 : Mon Tue Wed Thu Fri
</syntaxhighlight>


if [ -f fred.c ]
then
...
fi
if [ -f fred.c ]; then
...
fi
</pre>
=== Boolean variables ===
[https://www.cyberciti.biz/faq/how-to-declare-boolean-variables-in-bash-and-use-them-in-a-shell-script/ How to declare Boolean variables in bash and use them in a shell script]
<pre>
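failed=0 # False
jobdone=1 # True
if [ "$failed" -eq 1 ]
then
    echo "Job failed"
else
    echo "Job done"
fi
## more readable syntax: store the words and compare them as strings ##
failed=false
jobdone=true
if [ "$failed" = true ]; then echo "Job failed"; else echo "Job done"; fi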
</pre>
We can define them as a string and make our code more readable.
=== What is the difference between test, [ and [[ ? ===
http://mywiki.wooledge.org/BashFAQ/031


[ ("test" command) and [[ ("new test" command) are used to evaluate expressions. [[ works only in Bash, Zsh and the Korn shell, and is more powerful; [ and ''test'' are available in POSIX shells.


To loop over all script files in a directory
''test'' implements the old, portable syntax of the command. In almost all shells (the oldest Bourne shells are the exception), [ is a synonym for ''test'' (but requires a final argument of ]).
<syntaxhighlight lang='bash'>
 
FILES=/path/to/PATTERN*.sh
[[ is a new improved version of it, and is a keyword, not a program.
for f in $FILES;
 
do
=== String comparison ===
(
<pre>
  "$f"
==  ==> strings are equal (== is a synonym for =)
)&
=  ==> strings are equal
done
!=  ==> strings are not equal
wait
-z  ==> string is null
</syntaxhighlight>
-n  ==> string is not null
OR
</pre>
For example, the following script checks whether the user has provided an argument to the script.
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
FILES="
#!/bin/sh
file1
if [ -z "$1" ]; then
/path/to/file2
  echo "Provide a \"file name\", using quotes to nullify the space."
/path/to/file3
  exit 1
"
fi
for f in $FILES;
mv -i "$1" `ls "$1" | tr -d ' '`
do
(
  "$f"
)&
done
wait
</syntaxhighlight>
</syntaxhighlight>
Here we run the script in the background and wait to exit until all are finished.
where the '''-i''' parameter is to reconfirm the overwrite by the '''mv''' command.


See [http://www.cyberciti.biz/faq/bash-loop-over-file/ loop over files] from cyberciti.biz.
To check whether Xcode (either full Xcode or command line developer tools only) has been installed or not on Mac
<syntaxhighlight lang='bash'>
if [ -z "$(xcode-select -p 2>&1 | grep error)" ]
then
  echo "Xcode has been installed";
else
  echo "Xcode has not been installed";
fi


==== Example 1 ====
# only print out message if xcode was not found
To convert pdfs to tifs using ImageMagick (for looping over files, check [http://www.cyberciti.biz/faq/bash-loop-over-file/ cyberciti.biz])
if [ -n "$(xcode-select -p 2>&1 | grep error)" ]
<syntaxhighlight lang="bash">
then
outdir="../plosone"
  echo "Xcode has not been installed";
indir="../fig"
fi
</syntaxhighlight>
Note that the 'error' keyword comes from macOS when [[#Install_Xcode|Xcode has not been installed]]. Also, the double quotes around '''$( )''' are needed to avoid the [http://stackoverflow.com/questions/13781216/bash-meaning-of-too-many-arguments-error-from-if-square-brackets "[: too many arguments" error].


if [[ ! -d $outdir ]];
[https://www.cyberciti.biz/faq/bash-check-if-string-starts-with-character-such-as/ Check if string starts with such as "#"]. <syntaxhighlight lang='bash'>
then
if [[ "$var" =~ ^#.* ]]; then
  mkdir $outdir
    echo "yes"
fi
fi
in=(file1.pdf file2.pdf file3.pdf)
for (( i=0; i<${#in[@]} ; i++ ))
do
  convert -strip -units PixelsPerInch -density 300 -resample 300 \
          -alpha off -colorspace RGB -depth 8 -trim -bordercolor white \
          -border 1% -resize '2049x2758>' -resize '980x980<' +repage \
          -compress lzw "$indir/${in[$i]}" "$outdir/Figure$((i+1)).tiff"
done
</syntaxhighlight>
</syntaxhighlight>


==== Example 2 ====
=== Arithmetic/Integer comparison ===
A second [http://www.everydayanalytics.ca/2015/01/WTIandOntarioGasPrices.html example] downloads all the (Ontario gasoline price) data with wget, then parses and concatenates it with other *nix tools such as 'sed':
<pre>
<syntaxhighlight lang="bash">
expr1 -eq expr2  ==> check equal
# Download data
expr1 -ne expr2  ==> check not equal
for i in $(seq 1990 2014)
expr1 -gt expr2  ==> expr1 > expr2
        do wget http://www.energy.gov.on.ca/fuelupload/ONTREG$i.csv
expr1 -ge expr2  ==> expr1 >= expr2
done
expr1 -lt expr2  ==> expr1 < expr2
expr1 -le expr2  ==> expr1 <= expr2
! expr  ==> opposite of expr
</pre>


# Retain the header
=== File conditionals ===
head -n 2 ONTREG1990.csv | sed 1d > ONTREG_merged.csv
<pre>
-d file  ==> True if the file is a directory
-e file  ==> True if the file exists
-f file  ==> True if the file is a regular file
-r file  ==> True if the file is readable
-s file  ==> True if the file has non-zero size
-w file  ==> True if the file is writable
-x file  ==> True if the file is executable
</pre>


# Loop over the files and use sed to extract the relevant lines
Example 1: Suppose we want to know if the first argument (if given) matches a specific string. We can use (note the space before and after '==')
for i in $(seq 1990 2014)
<syntaxhighlight lang='bash'>
        do
#!/bin/bash
        tail -n 15 ONTREG$i.csv | sed 13,15d | sed 's/./-01-'$i',/4' >> ONTREG_merged.csv
if [ "$1" == "console" ]; then
        done
  echo 'Console'
else
  echo 'Non-console'
fi
</syntaxhighlight>
</syntaxhighlight>


==== Example 3 ====
Example 2: [https://www.cyberciti.biz/faq/linux-unix-script-check-if-file-empty-or-not/ Check If File Is Empty Or Not Using Shell Script]
Download all 20 sra files (60GB in total) from [ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP032/SRP032789 SRP032789].
<syntaxhighlight lang='bash'>
<syntaxhighlight lang="bash">
#!/bin/bash
for x in $(seq 1027175 1027180)
_file="$1"
  do wget ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP032/SRP032789/SRR$x/SRR$x.sra
[ $# -eq 0 ] && { echo "Usage: $0 filename"; exit 1; }
done
[ ! -f "$_file" ] && { echo "Error: $0 file not found."; exit 2; }
if [ -s "$_file" ]
then
echo "$_file has some data."
        # do something as file has data
else
echo "$_file is empty."
        # do something as file is empty
fi
</syntaxhighlight>
</syntaxhighlight>


==== Example 4 ====
=== Check if running as root ===
Convert all files from DOS to Unix format
<syntaxhighlight lang='bash'>
<syntaxhighlight lang="bash">
if [ $UID -ne 0 ];
for f in *.txt; do  tr -d '\r' < $f > tmp.txt;  mv tmp.txt $f  ; done
then
# Or
   echo "Run as root"
for f in "$@"; do tr -d '\r' < "$f" > tmp.txt; mv tmp.txt "$f"; done
   exit 1;
fi
</syntaxhighlight>
</syntaxhighlight>


==== Example 5 ====
== Control Structures ==
Include all files in a directory
=== '''if''' ===
<syntaxhighlight lang="bash">
<pre>
for f in /etc/*.conf
if condition
do
then
  echo "$f"
  statements
done
elif [ condition ]; then
</syntaxhighlight>
  statements
else
  statements
fi
</pre>
For example, we can run a '''cp''' command if two files are different.
<pre>
if ! cmp -s "$filesrc" "$filecur"
then
    cp "$filesrc" "$filecur"
fi
</pre>
==== String Comparison ====
http://stackoverflow.com/questions/2237080/how-to-compare-strings-in-bash
<syntaxhighlight lang='bash'>
answer=no
if [ -f "genome.fa" ]; then
  echo -n 'Do you want to continue [yes/no]: '
  read answer
fi


==== Example 6: use ping to find all the live machines on the network ====
if [ "$answer" == "no" ]; then
<syntaxhighlight lang="bash">
echo AAA
for ip in 192.168.0.{1..255} ;
fi
do
 
  ping $ip -c 2 &> /dev/null ;
if [ "$answer"=="no" ]; then
 
# failed if condition
  if [ $? -eq 0 ];
echo BBB
  then
fi
    echo $ip is alive
</syntaxhighlight>
  fi
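# You want the quotes around $answer because, if $answer is empty, the unquoted test has no left operand and fails with an error.
# Spaces in bash are important.
#* Spaces between '''if''' and '''[''' and ''']''' are important
#* A space before and after the double equal signs is important. Without them, "$answer"=="no" is a single non-empty string, so the test is always true: even if we reply with 'yes', the code still runs the 'echo BBB' statement.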


=== '''while''' ===
<pre>
while condition; do
  statements
done
done
</pre>
* https://www.cyberciti.biz/faq/bash-while-loop/, https://bash.cyberciti.biz/guide/While_loop
* http://www.tldp.org/LDP/Bash-Beginners-Guide/html/sect_09_02.html, http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO-7.html
* Pipe and while <syntaxhighlight lang='bash'>
$ function mylist() {
  ls *.r
}
$ mylist | while read -r file; do wc -l "${file}"; done
</syntaxhighlight>
</syntaxhighlight>


==== Example 7: run in parallel ====
'''until'''
<syntaxhighlight lang="bash">
<pre>
for ip in 192.168.0.{1..255} ;
until condition
do
do  
  (
   statements
      ping $ip -c2 &> /dev/null ;
done
    
</pre>
      if [ $? -eq 0 ];
 
      then
=== case ===
      echo $ip is alive
[https://www.howtogeek.com/766978/how-to-use-case-statements-in-bash-scripts/ How to Use Case Statements in Bash Scripts]
      fi
  )&
  done
wait
</syntaxhighlight>
where we enclose the loop body in ()&. () encloses a block of commands to run as a subshell and & sends it to the background. '''wait''' waits for all background jobs to complete.


'''Good technique !!!'''
=== Semicolon ===
Command1; command2; command3; command4


* [[#GNU_Parallel|GNU '''parallel''' command]]
Every command is executed, whether or not the previous one succeeded.
* http://unix.stackexchange.com/questions/103920/parallelize-a-bash-for-loop
* http://stackoverflow.com/questions/27934784/shell-script-to-loop-and-start-processes-in-parallel
* http://superuser.com/questions/158165/parallel-shell-loops


== Functions ==
=== '''AND list &&''' ===
== List of commands ==
[https://www.linuxuprising.com/2021/11/how-to-run-command-after-previous-one.html How To Run A Command After The Previous One Has Finished On Linux]
<pre>
<pre>
break  ==> escaping from an enclosing for, while or until loop
statement1 && statement2 && statement3 && ...
:      ==> null command
</pre>
continue ==> make the enclosing for, while or until loop continue at the next iteration
If command1 finishes successfully then run command2.
.     ==> executes the command in the current shell
eval  ==> evaluate arguments
exec  ==> replacing the current shell with a different program
export ==> make the variable named as its parameter available in subshells
expr  ==> evaluate its arguments as an expression
printf ==> similar to echo
set    ==> sets the parameter variables for the shell. Useful for using fields in commands that output space-separated values
shift  ==> moves all the parameter variables down by one.
trap  ==> specify the actions to take on receipt of signals.
unset  ==> remove variables or functions from the environment.
mktemp ==> create a temporary file
</pre>


== '''set -e''', '''set -x''' and '''trap''' ==
<syntaxhighlight lang='bash'>
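'''set -e''' makes the shell exit immediately if a command exits with a non-zero status; '''set -x''' prints each command before it is executed. Type '''help set''' on the command line for details. Very useful!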
touch /tmp/f1
echo "data" >/tmp/f2
[ -s /tmp/f1 ]
echo $?    # 1
[ -s /tmp/f2 ]
echo $?    # 0


See also the [[#trap|trap]] command that is related to non-zero exit.
[ -s /tmp/f1 ] && echo "not empty" || echo "empty"  # empty
[ -s /tmp/f2 ] && echo "not empty" || echo "empty"  # not empty
</syntaxhighlight>


See
=== '''OR list ||''' ===
* [http://stackoverflow.com/questions/19622198/what-means-the-set-e-operation-in-a-bash-script-and-some-other-information-abo stackoverflow.com]
<pre>
* [http://www.peterbe.com/plog/set-ex set -ex]
statement1 || statement2 || statement3 || ...
* [https://hpc.nih.gov/docs/b2-userguide.html#monitor NIH/Biowulf]
</pre>
If command1 fails then run command2.


=== trap and error handler ===
For example,
* http://www.computerhope.com/unix/utrap.htm
<syntaxhighlight lang='bash'>
* http://linuxcommand.org/wss0160.php
codename=$(lsb_release -s -c)
* http://www.tutorialspoint.com/unix/unix-signals-traps.htm
if [ $codename == "rafaela" ] || [ $codename == "rosa" ]; then
* http://www.ibm.com/developerworks/aix/library/au-usingtraps/
  codename="trusty"
* http://bash.cyberciti.biz/guide/Trap_statement
fi
* http://steve-parker.org/sh/trap.shtml (trap with a user-defined function)
</syntaxhighlight>
* http://www.turnkeylinux.org/blog/shell-error-handling (set -e)
 
* http://unix.stackexchange.com/questions/17314/what-is-signal-0-in-a-trap-command (do something on EXIT)
=== Chaining rule (command1 && command2 || command3) ===
* http://unix.stackexchange.com/questions/79648/how-to-trigger-error-using-trap-command
[https://opensource.com/article/18/11/control-operators-bash-shell Coupled commands with control operators in Bash]
 
[https://www.tecmint.com/chaining-operators-in-linux-with-practical-examples/ 10 Useful Chaining Operators in Linux with Practical Examples].  
* Ampersand Operator (&),
* semi-colon Operator (;),
* AND Operator (&&),
* OR Operator (||),
* NOT Operator (!),
* AND – OR operator (&& – ||),
* PIPE Operator (|),
* Command Combination Operator {},
* Precedence Operator (),
* Concatenation Operator (\).  
 
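A combination of the 'AND' and 'OR' operators is much like an 'if-else' statement. The one difference is that in command1 && command2 || command3, command3 also runs when command1 succeeds but command2 fails.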
<syntaxhighlight lang='bash'>
$ ping -c3 www.google.com && echo "Verified" || echo "Host Down"
</syntaxhighlight>
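The if-else analogy holds only as long as the middle command succeeds. The sketch below makes that visible; the function name check is hypothetical, and '''true''' stands in for command1.

```bash
# check is a hypothetical helper; $1 is the command standing in for command2.
check() {
  true && "$1" || echo "command3 ran"
}

check true    # prints nothing: command1 and command2 both succeeded
check false   # prints "command3 ran" although command1 (true) succeeded
```

When the fallback must depend on command1 alone, an explicit if/else is the safer choice.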


The syntax to use '''trap''' command is
[https://opensource.com/article/19/10/programming-bash-syntax-tools How to program with Bash: Syntax and tools]
<pre>
<pre>
trap command signal
# command1 && command2
$ Dir=/root/testdir ; mkdir $Dir/ && cd $Dir
 
# command1 || command2
$ Dir=/root/testdir ; mkdir $Dir || echo "$Dir was not created."
 
# preceding commands ; command1 && command2 || command3 ; following commands
# "If command1 exits with a return code of 0, then execute command2, otherwise execute command3."
$ Dir=/root/testdir ; mkdir $Dir && cd $Dir || echo "$Dir was not created."
$ Dir=~/testdir ; mkdir $Dir && cd $Dir || echo "$Dir was not created."
</pre>
</pre>
For example,
 
=== for + do + done ===
<pre>
<pre>
$ cat traptest.sh
for variable in values
#!/bin/sh
do
  statements
done
</pre>


trap 'rm -f /tmp/tmp_file_$$' INT
The values can be an explicit list
echo creating file /tmp/tmp_file_$$
<syntaxhighlight lang='bash'>
date > /tmp/tmp_file_$$
i=1
 
for day in Mon Tue Wed Thu Fri
echo 'press interrupt to interrupt ...'
do
while [ -f /tmp/tmp_file_$$ ]; do
echo "Weekday $((i++)) : $day"
  echo file exists
done
  sleep 1
</syntaxhighlight>
or a variable
<syntaxhighlight lang='bash'>
i=1
weekdays="Mon Tue Wed Thu Fri"
for day in $weekdays
do
echo "Weekday $((i++)) : $day"
done
done
echo the file no longer exists
# Output
# Weekday 1 : Mon
# Weekday 2 : Tue
# Weekday 3 : Wed
# Weekday 4 : Thu
# Weekday 5 : Fri
</syntaxhighlight>


trap - INT
Note that we should not put double quotes around the $weekdays variable: the quotes prevent word splitting, so the loop runs only once with the whole string. See the [http://www.thegeekstuff.com/2011/07/bash-for-loop-examples/ thegeekstuff] article.
echo creaing file /tmp/tmp_file_$$
<syntaxhighlight lang='bash'>
date > /tmp/tmp_file_$$
i=1
echo 'press interrupt to interrupt ...'
weekdays="Mon Tue Wed Thu Fri"
while [ -f /tmp/tmp_file_$$ ]; do
for day in "$weekdays"
  echo file exists
do
  sleep 1
echo "Weekday $((i++)) : $day"
done
done
echo we never get here
# Output
exit 0
# Weekday 1 : Mon Tue Wed Thu Fri
</pre>
</syntaxhighlight>
will get an output like
 
<pre>
$ ./traptest.sh
creating file /tmp/tmp_file_21389
press interrupt to interrupt ...
file exists
file exists
^Cthe file no longer exists
creaing file /tmp/tmp_file_21389
press interrupt to interrupt ...
file exists
file exists
^C
</pre>
The first time we use trap, the handler deletes the file when we hit Ctrl+C, so the while loop ends and the script continues. The second time, '''trap - INT''' resets the INT signal to its default action, so no handler command is executed when we hit Ctrl+C. The default behavior occurs: the script is terminated immediately and the final echo and exit statements are never executed.


Note that the following two are different.
<pre>
trap - INT
trap '' INT
</pre>
The second command will IGNORE signals (Ctrl+C in this case) so if we apply this statement above, we will not be able to use Ctrl+C to kill the execution.
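The same cleanup idea is often attached to EXIT instead of INT, so that the temporary file is removed however the shell ends. This is a minimal sketch under that assumption; the function name with_tempfile is made up, and it relies on the '''mktemp''' command being available.

```bash
# with_tempfile is a hypothetical function. Its body is a subshell ( ... ),
# so the EXIT trap fires as soon as the function body finishes.
with_tempfile() (
  tmpfile=$(mktemp) || exit 1
  trap 'rm -f "$tmpfile"' EXIT   # cleanup runs whenever this subshell exits
  date > "$tmpfile"
  echo "$tmpfile"                # report the name so the caller can check it
)

name=$(with_tempfile)
[ -e "$name" ] || echo "temporary file was removed on exit"
```

Using mktemp rather than a name built from $$ avoids clashes with files that already exist.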


== Command Execution - $(command) ==
To loop over all script files in a directory
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
$(command)
FILES=/path/to/PATTERN*.sh
`command`    # ` is a backquote/backtick, not a single quotation sign
for f in $FILES;
            # this is a legacy support; not recommended by https://www.shellcheck.net/
do
(
  "$f"
)&
done
wait
</syntaxhighlight>
</syntaxhighlight>
Note that all new scripts should use the $(...) form, which was introduced to avoid the rather complex rules for using $, ` and \ inside backquoted commands.
OR
 
Example 1.
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
sudo apt-get install linux-headers-$(uname -r)
FILES="
</syntaxhighlight>
file1
 
/path/to/file2
Example 2.
/path/to/file3
<syntaxhighlight lang='bash'>
"
user=$(echo "$UID")
for f in $FILES;
do
(
  "$f"
)&
done
wait
</syntaxhighlight>
</syntaxhighlight>
Here we run the script in the background and wait to exit until all are finished.


Example 3.
See [http://www.cyberciti.biz/faq/bash-loop-over-file/ loop over files] from cyberciti.biz.
<syntaxhighlight lang='bash'>
#!/bin/sh
echo The current directory is $PWD
echo The current users are $(who)
sudo chown `id -u` SomeDir  # change the ownership to the current user. Dangerous!
                            # Or sudo chown `whoami` SomeDirOrSomeFile
exit 0
</syntaxhighlight>


Note that '''$(your expression)''' is a better way as it allows you to nest expressions. For example,
==== Example 1: convert pdfs to tifs using ImageMagick ====
<syntaxhighlight lang='bash'>
"for" looping over files, check [http://www.cyberciti.biz/faq/bash-loop-over-file/ cyberciti.biz])
cd $(dirname $(type -P touch))
<syntaxhighlight lang="bash">
</syntaxhighlight>
outdir="../plosone"
will cd you into the directory containing the 'touch' command.  
indir="../fig"


The concept of putting the result of a command into a script variable is very powerful, as it makes it easy to use existing commands in scripts and capture their output.
if [[ ! -d  $outdir ]];
then
  mkdir $outdir
fi


'''Arithmetic Expansion'''
in=(file1.pdf file2.pdf file3.pdf)
<syntaxhighlight lang='bash'>
 
$((...))
for (( i=0; i<${#in[@]} ; i++ ))
</syntaxhighlight>
do
is a better alternative to the '''expr''' command. More examples:
  convert -strip -units PixelsPerInch -density 300 -resample 300 \
<syntaxhighlight lang='bash'>
          -alpha off -colorspace RGB -depth 8 -trim -bordercolor white \
for i in $(seq 1 3)
          -border 1% -resize '2049x2758>' -resize '980x980<' +repage \
  do echo SRR$(( i + 1027170 ))'_1'.fastq
          -compress lzw "$indir/${in[$i]}" "$outdir/Figure$((i+1)).tiff"
done
done
</syntaxhighlight>
</syntaxhighlight>
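Note that the single quotes around _1 keep the suffix visually separate from the arithmetic expansion; they are optional here, since the closing )) already ends the expansion. The above will output SRR1027171_1.fastq, SRR1027172_1.fastq and SRR1027173_1.fastq.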


'''Parameter Expansion'''
==== Example 2: download with wget and parsing with 'sed' ====
<syntaxhighlight lang='bash'>
A second [http://www.everydayanalytics.ca/2015/01/WTIandOntarioGasPrices.html example] downloads all the (Ontario gasoline price) data with wget, then parses and concatenates it with other *nix tools such as 'sed':
${parameter}
<syntaxhighlight lang="bash">
</syntaxhighlight>
# Download data
for i in $(seq 1990 2014)
        do wget http://www.energy.gov.on.ca/fuelupload/ONTREG$i.csv
done


== Bash shell find out if a command exists or not ==
# Retain the header
http://www.cyberciti.biz/faq/unix-linux-shell-find-out-posixcommand-exists-or-not/
head -n 2 ONTREG1990.csv | sed 1d > ONTREG_merged.csv


=== POSIX built-in commands ===
# Loop over the files and use sed to extract the relevant lines
* '''command''' is one of bash built-in commands (alias, bind, command, declare, echo, help, let, printf, read, source, type, typeset, ulimit and unalias).
for i in $(seq 1990 2014)
* [https://www.gnu.org/software/bash/manual/html_node/Bash-Builtins.html Bash Builtin Commands] and  [https://www.gnu.org/software/bash/manual/bashref.html#Shell-Builtin-Commands Shell Builtin Commands]
        do
* [http://ftp.gnu.org/gnu/bash/ Bash source code]
        tail -n 15 ONTREG$i.csv | sed 13,15d | sed 's/./-01-'$i',/4' >> ONTREG_merged.csv
* [https://unix.stackexchange.com/questions/319667/what-is-command-on-bash What is '''command''' on bash?]
        done
* [https://unix.stackexchange.com/questions/11454/what-is-the-difference-between-a-builtin-command-and-one-that-is-not What is the difference between a builtin command and one that is not?]
</syntaxhighlight>
* Use '''command''' command to tell if a command can be found.
 
* Use '''type''' command to tell if a command is built-in.
==== Example 3: download ====
Download all 20 sra files (60GB in total) from [ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP032/SRP032789 SRP032789].
<syntaxhighlight lang="bash">
for x in $(seq 1027175 1027180)
  do wget ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP032/SRP032789/SRR$x/SRR$x.sra
done
</syntaxhighlight>


<syntaxhighlight lang='bash'>
https://github.com/MarioniLab/EmptyDrops2017/blob/master/data/download_10x.sh
# command -v will return >0 when the command1 is not found
<pre>
command -v command1 >/dev/null && echo "command1 Found In \$PATH" || echo "command1 Not Found in \$PATH"
for x in \
    http://cf.10xgenomics.com/samples/cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz \
    http://cf.10xgenomics.com/samples/cell-exp/2.1.0/neurons_900/neurons_900_raw_gene_bc_matrices.tar.gz \
    http://cf.10xgenomics.com/samples/cell-exp/1.1.0/293t/293t_raw_gene_bc_matrices.tar.gz \
    http://cf.10xgenomics.com/samples/cell-exp/1.1.0/jurkat/jurkat_raw_gene_bc_matrices.tar.gz \
    http://cf.10xgenomics.com/samples/cell-exp/2.1.0/t_4k/t_4k_raw_gene_bc_matrices.tar.gz \
    http://cf.10xgenomics.com/samples/cell-exp/2.1.0/neuron_9k/neuron_9k_raw_gene_bc_matrices.tar.gz
do
    wget $x
    destname=$(basename $x)
    stub=$(echo $destname | sed "s/_raw_.*//")
    mkdir -p $stub
    tar -xvf $destname -C $stub
    rm $destname
done
</pre>


$ help command
==== Example 4: convert files from DOS to Unix ====
command: command [-pVv] command [arg ...]
Convert all files from DOS to Unix format
    Execute a simple command or display information about commands.
<syntaxhighlight lang="bash">
   
for f in *.txt; do  tr -d '\r' < $f > tmp.txt;  mv tmp.txt $f ; done
    Runs COMMAND with ARGS suppressing  shell function lookup, or display
# Or
    information about the specified COMMANDs.  Can be used to invoke commands
for f in "$@"; do  tr -d '\r' < "$f" > tmp.txt;  mv tmp.txt "$f"  ; done
    on disk when a function with the same name exists.
</syntaxhighlight>
   
    Options:
      -p use a default value for PATH that is guaranteed to find all of
    the standard utilities
      -v print a description of COMMAND similar to the `type' builtin
      -V print a more verbose description of each COMMAND
   
    Exit Status:
    Returns exit status of COMMAND, or failure if COMMAND is not found.


$ type command   
==== Example 5: print all files in a directory ====
command is a shell builtin
<syntaxhighlight lang="bash">
$ type export
for f in /etc/*.conf
export is a shell builtin
do
$ type wget
  echo "$f"
wget is /usr/bin/wget
done
$ type tophat
</syntaxhighlight>
-bash: type: tophat: not found
$ type sleep
sleep is /bin/sleep


$ command -v tophat
==== Example 6: use ping to find all the live machines on the network ====
$ command -v wget
<syntaxhighlight lang="bash">
/usr/bin/wget
for ip in 192.168.0.{1..255} ;
</syntaxhighlight>
do
On macOS,
  ping $ip -c 2 &> /dev/null ;
<syntaxhighlight lang='bash'>
 
$ help command
  if [ $? -eq 0 ];
command: command [-pVv] command [arg ...]
  then
    Runs COMMAND with ARGS ignoring shell functions.  If you have a shell
     echo $ip is alive
     function called `ls', and you wish to call the command `ls', you can
  fi
    say "command ls".  If the -p option is given, a default value is used
 
    for PATH that is guaranteed to find all of the standard utilities.  If
done
    the -V or -v option is given, a string is printed describing COMMAND.
    The -V option produces a more verbose description.
</syntaxhighlight>
</syntaxhighlight>


=== type -P ===
==== Example 7: sed on multiple files ====
<pre>
<pre>
type -P command1 &>/dev/null && echo "Found" || echo "Not Found"
for i in *.htm*; do sed -i 's/String1/String2/' "$i"; done
</pre>
Note if the string contains special characters like forward slashes (eg https://www.google.com), we need to escape them by using the backslash sign.
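A minimal sketch of the escaping, using a made-up sample line (the variable names and the replacement URL are only for illustration). Besides escaping each slash, sed also accepts another delimiter such as | right after the s, which avoids the backslashes altogether.

<syntaxhighlight lang='bash'>
# Hypothetical sample text containing forward slashes
url_line='visit https://www.google.com today'

# Option 1: keep / as the delimiter and escape every slash in the strings
escaped=$(echo "$url_line" | sed 's/https:\/\/www.google.com/https:\/\/example.org/')

# Option 2: use | as the delimiter, so the slashes need no escaping
alt=$(echo "$url_line" | sed 's|https://www.google.com|https://example.org|')

echo "$escaped"
echo "$alt"
</syntaxhighlight>

Both commands should print the same line. The second form is usually easier to read when the strings are URLs or file paths.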


$ help type
==== Example 8: run in parallel ====
type: type [-afptP] name [name ...]
<syntaxhighlight lang="bash">
    Display information about command type.
for ip in 192.168.0.{1..255} ;
   
do
    For each NAME, indicate how it would be interpreted if used as a
  (
    command name.
      ping $ip -c2 &> /dev/null ;
   
 
    Options:
       if [ $? -eq 0 ];
       -a display all locations containing an executable named NAME;
      then
    includes aliases, builtins, and functions, if and only if
      echo $ip is alive
    the `-p' option is not also used
       fi
       -f suppress shell function lookup
  )&
      -P force a PATH search for each NAME, even if it is an alias,
  done
    builtin, or function, and returns the name of the disk file
wait
    that would be executed
</syntaxhighlight>
      -p returns either the name of the disk file that would be executed,
where we enclose the loop body in ()&. () encloses a block of commands to run as a subshell and & sends it to the background. '''wait''' waits for all background jobs to complete.
    or nothing if `type -t NAME' would not return `file'.
 
      -t output a single word which is one of `alias', `keyword',
'''Good technique !!!'''
    `function', `builtin', `file' or `', if NAME is an alias, shell
 
    reserved word, shell function, shell builtin, disk file, or not
* [[#GNU_Parallel|GNU '''parallel''' command]]
    found, respectively
* http://unix.stackexchange.com/questions/103920/parallelize-a-bash-for-loop
   
* http://stackoverflow.com/questions/27934784/shell-script-to-loop-and-start-processes-in-parallel
    Arguments:
* http://superuser.com/questions/158165/parallel-shell-loops
      NAME Command name to be interpreted.
 
   
=== wait command ===
    Exit Status:
<ul>
    Returns success if all of the NAMEs are found; fails if any are not found.
<li>An example where we shall wait until files are deleted before continuing the script.
typeset: typeset [-aAfFgilrtux] [-p] name[=value] ...
<syntaxhighlight lang='sh'>
     Set variable values and attributes.
cd /home/ubuntu
      
 
     Obsolete. See `help declare'.
if [ -d "R-devel" ]; then
</pre>
     rm -rf "R-devel" &
     wait # Wait for the deletion to complete
     echo "R-devel folder deleted successfully."
else
    echo "R-devel folder does not exist."
fi


== pause by '''read -p''' command ==
wget -O - https://stat.ethz.ch/R/daily/R-devel.tar.gz | tar -xzk
http://www.cyberciti.biz/tips/linux-unix-pause-command.html
<pre>
read -p "Press [Enter] key to start backup..."
</pre>


To ask the user a yes/no question, we can use [http://stackoverflow.com/questions/226703/how-do-i-prompt-for-input-in-a-linux-shell-script this method]
cd R-devel
<pre>
./configure --prefix=/opt/R/devel --enable-R-shlib
while true; do
make
    read -p "Do you wish to install this program? " yn
</syntaxhighlight>
    case $yn in
</ul>
        [Yy]* ) make install; break;;
        [Nn]* ) exit;;
        * ) echo "Please answer yes or no.";;
    esac
done
</pre>
OR
<pre>
echo "Do you wish to install this program?"
select yn in "Yes" "No"; do
    case $yn in
        Yes ) make install; break;;
        No ) exit;;
    esac
done
</pre>


=== Keyboard input and Arithmetic ===
== Functions ==
http://linuxcommand.org/wss0110.php
* http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO-8.html, http://tldp.org/LDP/abs/html/functions.html
* http://www.thegeekstuff.com/2010/04/unix-bash-function-examples/
* https://www.howtoforge.com/tutorial/linux-shell-scripting-lessons-5/
* [https://wizardzines.com/comics/bash-functions/ Cartoon] from wizardzines.com


read
<syntaxhighlight lang='bash'>
<pre>
#!/bin/bash
#!/bin/bash


echo -n "Enter some text > "
fun () { echo "This is a function"; echo; }
read text
echo "You entered: $text"
</pre>


Arithmetic
fun () { echo "This is a function"; echo } # Error!
<pre>
#!/bin/bash
function quit {
  exit
}


# An application of the simple command
function hello {
# echo $((2+2))
  echo Hello!
# That is, when you surround an arithmetic expression with the double parentheses,
}
# the shell will perform arithmetic evaluation.
first_num=0
second_num=0


echo -n "Enter the first number --> "
function e {
read first_num
  echo $1
echo -n "Enter the second number -> "
read second_num
$ ./e World
</syntaxhighlight>


echo "first number + second number = $((first_num + second_num))"
=== [https://www.cyberciti.biz/faq/how-to-find-bash-shell-function-source-code-on-linuxunix/ How to find bash shell function source code on Linux/Unix] ===
echo "first number - second number = $((first_num - second_num))"
<syntaxhighlight lang='bash'>
echo "first number * second number = $((first_num * second_num))"
$ type -a function_name
echo "first number / second number = $((first_num / second_num))"
echo "first number % second number = $((first_num % second_num))"
echo "first number raised to the"
echo "power of the second number  = $((first_num ** second_num))"
</pre>
and a program that formats an arbitrary number of seconds into hours and minutes:
<pre>
#!/bin/bash


seconds=0
# To list all function names
$ declare -F
$ declare -F | grep function_name
$ declare -F | grep foo
</syntaxhighlight>


echo -n "Enter number of seconds > "
How do I find the file where a bash function is defined?
read seconds
<syntaxhighlight lang='bash'>
shopt -s extdebug   # needed so that declare -F also prints the line number and source file
declare -F function_name
</syntaxhighlight>


# use the division operator to get the quotient
=== Function arguments ===
hours=$((seconds / 3600))
<syntaxhighlight lang='bash'>
# use the modulo operator to get the remainder
source ~/bin/setpath # add bgzip & tabix directories to $PATH
seconds=$((seconds % 3600))
minutes=$((seconds / 60))
seconds=$((seconds % 60))


echo "$hours hour(s) $minutes minute(s) $seconds second(s)"
function raw2exon {
</pre>
  # put your comments here
 
  inputvcf=$1
== xargs ==
  outputvcf=$2
xargs reads items from the standard input, delimited by blanks (which can be protected with double or single quotes or a backslash) or newlines, and executes the command (the default command is echo, located at /bin/echo) one or more times with any initial-arguments followed by items read from standard input.
  inputbed=$3
  if [[ $4 ]]; then
    oldpath=$PWD
    cd $4
  fi
 
  bgzip -c $inputvcf > $inputvcf.gz
  tabix -p vcf $inputvcf.gz
 
  head -$(grep '#' $inputvcf | wc -l) $inputvcf > $outputvcf # header
  tabix -R $inputbed $inputvcf.gz >> $outputvcf
  wc -l $inputvcf
  wc -l $outputvcf
  rm $inputvcf.gz $inputvcf.gz.tbi
  if [[ $4 ]]; then
    cd $oldpath
  fi
}         


* [https://en.wikipedia.org/wiki/Xargs Wikipedia]
inputbed=S04380110_Regions.bed
* [https://www.howtoforge.com/tutorial/linux-xargs-command/ 8 Practical Examples of Linux Xargs Command for Beginners]
* [http://www.computerhope.com/unix/xargs.htm man] page


=== Example1 - Find files named core in or below the directory /tmp and delete them ===
raw2exon 'mu0001_raw.vcf' 'mu0001_exon.vcf' $inputbed ~/Downloads/
<syntaxhighlight lang='bash'>
find /tmp -name core -type f -print0 | xargs -0 /bin/rm -f
</syntaxhighlight>
</syntaxhighlight>


=== Example2 - Find files from the grep command and sort them by date ===
==== Exit function ====
<syntaxhighlight lang='bash'>
[https://bash.cyberciti.biz/guide/Exit_command  exit command and the exit statuses]
grep -l "Polyphen" tmp/*.* | xargs ls -lt
<pre>
</syntaxhighlight>
$ cat testfun.sh
#!/bin/bash
ping -q -c 1 $1 >/dev/null 2>&1
if [ $? -ne 0 ]
then
  echo "An error occurred while checking the server status".
  exit 3
fi


=== Example3 - [http://stackoverflow.com/questions/4341442/gzip-with-all-cores Gzip with multiple jobs] ===
exit 0
<syntaxhighlight lang='bash'>
$ chmod +x testfun.sh
CORES=$(grep -c '^processor' /proc/cpuinfo)
$ ./testfun.sh www.cyberciti.biz999
find /source -type f -print0 | xargs -0 -n 1 -P $CORES gzip -9
An error occurred while checking the server status.
</syntaxhighlight>
$ echo $?
where
3
* find -print0 / xargs -0 protects you from whitespace in filenames
</pre>
* xargs -n 1 means one gzip process per file
* xargs -P specifies the number of jobs
* gzip -9 means maximum compression
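The effect of -0 and -n 1 can be seen without touching any real files. This is only a sketch: the file names are made up, and echo stands in for gzip.

<syntaxhighlight lang='bash'>
# Without -0, xargs splits on blanks: one file name becomes two items.
split=$(echo 'a b.txt' | xargs -n 1 echo "item:")
echo "$split"

# With NUL-terminated input and -0, the space stays inside the item.
# -n 1 runs one echo per item, just like one gzip process per file.
kept=$(printf '%s\0' 'a b.txt' 'c.txt' | xargs -0 -n 1 echo "item:")
echo "$kept"
</syntaxhighlight>

The first command reports "a" and "b.txt" as separate items, while the second keeps "a b.txt" together.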


== [https://en.wikipedia.org/wiki/GNU_parallel GNU Parallel] ==
== List of commands ==
* http://www.gnu.org/software/parallel/
<pre>
* https://www.gnu.org/software/parallel/parallel_tutorial.html
break  ==> escaping from an enclosing for, while or until loop
* https://www.biostars.org/p/63816/
:     ==> null command
* https://biowize.wordpress.com/2015/03/23/task-automation-with-bash-and-parallel/
continue ==> make the enclosing for, while or until loop continue at the next iteration
* http://www.shakthimaan.com/posts/2014/11/27/gnu-parallel/news.html
.     ==> executes the command in the current shell
* https://www.msi.umn.edu/support/faq/how-can-i-use-gnu-parallel-run-lot-commands-parallel
eval  ==> evaluate arguments
* http://deepdish.io/2014/09/15/gnu-parallel/
exec  ==> replacing the current shell with a different program
* http://davetang.org/muse/2013/11/18/using-gnu-parallel/
export ==> make the variable named as its parameter available in subshells
* https://vimeo.com/20838834, https://youtu.be/OpaiGYxkSuQ
expr  ==> evaluate its arguments as an expression
 
printf ==> similar to echo
A simple trick without using GNU Parallel is [[#Example_7:_run_in_parallel|run the commands in background]].
set    ==> sets the parameter variables for the shell. Useful for using fields in commands that output space-separated values
shift  ==> moves all the parameter variables down by one.
trap  ==> specify the actions to take on receipt of signals.
unset  ==> remove variables or functions from the environment.
mktemp ==> create a temporary file
</pre>
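As a small illustration of '''set''' and '''shift''' from the list above, here is a sketch that splits a space-separated record into positional parameters. The record and its contents are made up for the example.

<syntaxhighlight lang='bash'>
record="alice 42 staff"   # hypothetical space-separated record
set -- $record            # left unquoted on purpose so the shell splits it into words
echo "number of fields: $#"
echo "first field: $1"
shift                     # discard $1; the old $2 becomes the new $1
echo "after shift: $1 ($# fields left)"
</syntaxhighlight>

Note that '''set --''' replaces any positional parameters the script already had, so in a real script it is safer to do this inside a function.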
 
== Run the previous command ==
[https://unix.stackexchange.com/a/3748 Understanding the exclamation mark (!) in bash]
<pre>
$ apt update  # Permission denied
$ sudo !!     # Equivalent to: sudo apt update
</pre>
 
''' "!" ''' invokes history expansion. To run the most recent command ''beginning'' with “foo”:
<pre>
!foo
# Run the most recent command beginning with "service" as root
sudo !service
</pre>
 
== Cache console output on the CLI? ==
Try the '''script''' command-line utility to create a typescript of everything printed on your terminal.
 
To end the script session, type '''exit''' or logout, or press Ctrl-D.
 
== '''set -e''', '''set -x''' and '''trap''' ==
'''set -e''' makes the shell exit immediately if a command exits with a non-zero status. Type '''help set''' at the command line for details. Very useful!
 
See also the [[#trap|trap]] command that is related to non-zero exit.
 
See
* [http://stackoverflow.com/questions/19622198/what-means-the-set-e-operation-in-a-bash-script-and-some-other-information-abo stackoverflow.com]
* [http://www.peterbe.com/plog/set-ex set -ex]
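A minimal sketch of what '''set -e''' does. The demo runs inside a command substitution so that the exit only ends that subshell and not the calling shell; the variable names are only for illustration.

<syntaxhighlight lang='bash'>
# Run the demo in a subshell so that 'set -e' does not end the calling shell.
output=$(
  set -e
  echo "step 1"
  false            # non-zero status: the subshell exits here
  echo "step 2"    # never reached
)
status=$?          # exit status of the subshell

echo "$output"
echo "exit status: $status"
</syntaxhighlight>

Only "step 1" is printed, and the exit status is that of the failing command.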


=== Example: same command, different command line argument ===
=== '''bash -x''' ===
Input from the command line:
Call your script with something like
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
parallel echo ::: A B C
bash -x -v hello_world.sh
</syntaxhighlight>
</syntaxhighlight>
 
OR
Input from a file:
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
parallel -a abc-file echo
#!/bin/bash -xv
echo Hello World!
</syntaxhighlight>
</syntaxhighlight>
where
* '''-x''' displays commands and their results
* '''-v''' displays everything, even comments and spaces


Input is a STDIN:
This is the same as using '''set -x''' in your bash script.
 
=== '''set -x''' example ===
Bash script
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
cat abc-file | parallel echo
set -ex
</syntaxhighlight>
export DEBIAN_FRONTEND=noninteractive
 
codename=$(lsb_release -s -c)
if [ $codename == "rafaela" ] || [ $codename == "rosa" ]; then
  codename="trusty"
fi
 
echo $codename
echo step 1
echo step 2


Another similar example is to gzip each individual files
exit 0
<syntaxhighlight lang='bash'>
parallel gzip --best ::: *.html
</syntaxhighlight>
</syntaxhighlight>


=== Example: each command containing an index ===
Without '''-x''' option:
Instead of
<pre>
<syntaxhighlight lang='bash'>
trusty
for i in $(seq 1 100)
step 1
do
step 2
  someCommand data$i.fastq > output$i.txt &
</pre>
done
</syntaxhighlight>
, we can use
<syntaxhighlight lang='bash'>
parallel --jobs 16 someCommand data{}.fastq '>' output{}.txt ::: {1..100}
</syntaxhighlight>


=== Example: each command containing an index ===
With '''-x''' option:
<syntaxhighlight lang='bash'>
<pre>
for i in *gz; do
+ export DEBIAN_FRONTEND=noninteractive
  zcat $i > $(basename $i .gz).unpacked
+ DEBIAN_FRONTEND=noninteractive
done
++ lsb_release -s -c
</syntaxhighlight>
+ codename=rafaela
can be written as
+ '[' rafaela == rafaela ']'
<syntaxhighlight lang='bash'>
+ codename=trusty
parallel 'zcat {} > {.}.unpacked' ::: *.gz
+ echo trusty
</syntaxhighlight>
trusty
+ echo step 1
step 1
+ echo step 2
step 2
+ exit 0
</pre>


=== Example: run several subscripts from a master script ===
=== trap and error handler ===
Suppose I have a bunch of script files: script1.sh, script2.sh, ... And an optional master script (file ext does not end with .sh).
* http://www.computerhope.com/unix/utrap.htm
My goal is to run them using GNU Parallel.
* http://linuxcommand.org/wss0160.php
* http://www.tutorialspoint.com/unix/unix-signals-traps.htm
* http://www.ibm.com/developerworks/aix/library/au-usingtraps/
* http://bash.cyberciti.biz/guide/Trap_statement
* http://steve-parker.org/sh/trap.shtml (trap with a user-defined function)
* http://www.turnkeylinux.org/blog/shell-error-handling (set -e)
* http://unix.stackexchange.com/questions/17314/what-is-signal-0-in-a-trap-command (do something on EXIT)
* http://unix.stackexchange.com/questions/79648/how-to-trigger-error-using-trap-command
* [https://opensource.com/article/20/6/bash-trap Using Bash traps in your scripts]
* [http://redsymbol.net/articles/bash-exit-traps/ How "Exit Traps" Can Make Your Bash Scripts Way More Robust And Reliable]


I can just run them using
The syntax to use '''trap''' command is
<syntaxhighlight lang='bash'>
<pre>
parallel './{}' ::: *.sh
trap command signal
</syntaxhighlight>
</pre>
where "./" means the .sh files are located in the current directory and {} denotes each individual .sh file.
For example,
 
<pre>
More detail:
$ cat traptest.sh
<syntaxhighlight lang='bash'>
#!/bin/sh
$ mkdir test-par; cd test-par
$ echo echo A > script1.sh
$ echo echo B > script2.sh
$ echo echo C > script3.sh
$ echo echo D > script4.sh
$ chmod +x *.sh


$ cat > script    # master script (not needed for GNU parallel method)
trap 'rm -f /tmp/tmp_file_$$' INT
./script1.sh
echo creating file /tmp/tmp_file_$$
./script2.sh
date > /tmp/tmp_file_$$
./script3.sh
./script4.sh


$ time bash script
echo 'press interrupt to interrupt ...'
A
while [ -f /tmp/tmp_file_$$ ]; do
B
  echo file exists
C
  sleep 1
D
done
echo the file no longer exists


real 0m0.025s
trap - INT
user 0m0.004s
echo creating file /tmp/tmp_file_$$
sys 0m0.004s
date > /tmp/tmp_file_$$
echo 'press interrupt to interrupt ...'
while [ -f /tmp/tmp_file_$$ ]; do
  echo file exists
  sleep 1
done
echo we never get here
exit 0
</pre>
will get an output like
<pre>
$ ./traptest.sh
creating file /tmp/tmp_file_21389
press interrupt to interrupt ...
file exists
file exists
^Cthe file no longer exists
creating file /tmp/tmp_file_21389
press interrupt to interrupt ...
file exists
file exists
^C
</pre>
The first time we use trap, the handler deletes the file when we hit Ctrl+C, so the loop ends and the script continues. The second time, '''trap - INT''' resets INT to its default action instead of specifying a command to be executed, so Ctrl+C terminates the script and the final echo and exit statements are never executed.
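A related pattern, sketched here under the assumption that the temporary file should be removed on every exit path and not only on Ctrl+C, is to trap EXIT instead of INT. The variable names are hypothetical, and '''mktemp''' is used in place of the hand-built /tmp/tmp_file_$$ name.

<syntaxhighlight lang='bash'>
# The work happens in a subshell; its EXIT trap fires when the subshell ends.
result=$(
  tmp_file=$(mktemp)                 # unique scratch file
  trap 'rm -f "$tmp_file"' EXIT      # cleanup runs however the subshell exits
  date > "$tmp_file"
  echo "$tmp_file"                   # report the name to the caller
)

# By now the subshell has exited and its trap has removed the file.
[ -e "$result" ] || echo "temp file removed on exit"
</syntaxhighlight>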


$ time parallel './{}' ::: *.sh    # No need of a master script
Note that the following two are different.
                                  # may need to add --gnu option if asked.
<pre>
A
trap - INT
B
trap '' INT
C
</pre>
D
The first command restores the default action for INT. The second makes the shell IGNORE the signal (Ctrl+C in this case), so if we use it in the script above we will not be able to stop the execution with Ctrl+C.


real 0m0.778s
=== DEBUG trap to step through line by line ===
user 0m0.588s
[https://twitter.com/b0rk/status/1312413117436104705 You can use the "DEBUG" trap to step through a bash script line by line]
sys 0m0.144s    # longer time because of the parallel overhead
</syntaxhighlight>


=== Note ===
== Bash shell find out if a command exists or not ==
* When I run scripts (seqtools_vc) sequentially I can get the standard output on screen. However, I may not get this output when I use GNU parallel.
http://www.cyberciti.biz/faq/unix-linux-shell-find-out-posixcommand-exists-or-not/
* There is a risk/problem if all scripts are trying to generate required/missing files when they detect the required files are absent.


== Debugging Scripts ==
=== POSIX ===
[http://www.tecmint.com/trace-shell-script-execution-in-linux/ How to Trace Execution of Commands in Shell Script with Shell Tracing]
* [https://en.wikipedia.org/wiki/POSIX Portable Operating System Interface]
* [https://statisticsglobe.com/as-posixlt-function-r as.POSIXlt Function in R (2 Examples)]


http://www.cyberciti.biz/tips/debugging-shell-script.html
=== POSIX built-in commands ===
* '''command''' is one of bash built-in commands (alias, bind, command, declare, echo, help, let, printf, read, source, type, typeset, ulimit and unalias).
* [https://www.gnu.org/software/bash/manual/html_node/Bash-Builtins.html Bash Builtin Commands] and  [https://www.gnu.org/software/bash/manual/bashref.html#Shell-Builtin-Commands Shell Builtin Commands]
* [http://ftp.gnu.org/gnu/bash/ Bash source code]
* [https://unix.stackexchange.com/questions/319667/what-is-command-on-bash What is '''command''' on bash?]
* [https://unix.stackexchange.com/questions/11454/what-is-the-difference-between-a-builtin-command-and-one-that-is-not What is the difference between a builtin command and one that is not?]
* Use '''command''' command to tell if a command can be found.
* Use '''type''' command to tell if a command is built-in.


* Run a shell script with the -x option. Each command is then printed (to stderr) before it is executed, so we can see which line takes a long time or which line broke the code (''it still runs through the script'').
<syntaxhighlight lang='bash'>
<pre>
# command -v will return >0 when the command1 is not found
$ bash -x script-name
command -v command1 >/dev/null && echo "command1 Found In \$PATH" || echo "command1 Not Found in \$PATH"
</pre>
* Use of set builtin command
* Use of intelligent DEBUG function


To run a bash script line by line:
$ help command
* [http://bashdb.sourceforge.net/ Bash Debugger]
command: command [-pVv] command [arg ...]
* Use '''Geany'''. See the next section.
    Execute a simple command or display information about commands.
   
    Runs COMMAND with ARGS suppressing  shell function lookup, or display
    information about the specified COMMANDs.  Can be used to invoke commands
    on disk when a function with the same name exists.
   
    Options:
      -p use a default value for PATH that is guaranteed to find all of
    the standard utilities
      -v print a description of COMMAND similar to the `type' builtin
      -V print a more verbose description of each COMMAND
   
    Exit Status:
    Returns exit status of COMMAND, or failure if COMMAND is not found.


=== Geany ===
$ type command   
* (Ubuntu 12.04 only): By default, it does not have the terminal tab. Install virtual terminal emulator. Run
command is a shell builtin
<syntaxhighlight lang='bash'>
$ type export
sudo apt-get install libvte-dev
export is a shell builtin
</syntaxhighlight>
$ type wget
* Step 1: Keyboard shortcut. Select a region of code, then choose Edit -> Commands -> Send Selection to Terminal. You can also assign a keybinding for this: go to Edit -> Preferences and pick the Keybindings tab. See a screenshot [http://askubuntu.com/questions/528367/shortcut-to-send-selection-to-terminal-in-geany here]. I assigned F12 (without quotes) as the shortcut. [http://www.geany.org/manual/current/#keybindings This is a complete list of the keybindings].
wget is /usr/bin/wget
$ type tophat
-bash: type: tophat: not found
$ type sleep
sleep is /bin/sleep
 
$ command -v tophat
$ command -v wget
/usr/bin/wget
</syntaxhighlight>
On macOS,
<syntaxhighlight lang='bash'>
$ help command
command: command [-pVv] command [arg ...]
    Runs COMMAND with ARGS ignoring shell functions. If you have a shell
    function called `ls', and you wish to call the command `ls', you can
    say "command ls". If the -p option is given, a default value is used
    for PATH that is guaranteed to find all of the standard utilities. If
    the -V or -v option is given, a string is printed describing COMMAND.
    The -V option produces a more verbose description.
</syntaxhighlight>


* Step 2: Newline character. The last line of the sent code does not end with a newline character, so I need to switch to the Terminal and press Enter. The solution is to edit geany.conf (find its location with '''locate geany.conf'''; on my Ubuntu 14 with geany 1.26 it is '''~/.config/geany/geany.conf''') and set send_selection_unsafe=true. See [http://www.r-bloggers.com/using-geany-for-programming-in-r/ here].
=== type -P ===
* Step 3: PATH variable.
<pre>
<pre>
$ tmpname=$(basename $inputVCF)
type -P command1 &>/dev/null && echo "Found" || echo "Not Found"
Command 'basename' is available in '/usr/bin/basename'
The command could not be located because '/usr/bin' is not included in the PATH environment variable.
</pre>
The solution is to run '''PATH=$PATH:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin''' in the Terminal window before running our script.
* Step 4 (optional): Change background color.
Another handy change to geany is to change its background to black. To do that, go to Edit -> Preferences -> Editor. Once on the Editor options level, select the Display tab to the far right of the dialog, and you will notice a checkbox marked ''invert syntax highlighting colors''.


See [https://ask.fedoraproject.org/en/question/25734/how-to-set-gnome-terminal-in-geany-instead-of-xterm/ this post] about changing the default terminal in the ''Terminal'' window. The default is xterm (see the output of '''echo $TERM''').
$ help type
type: type [-afptP] name [name ...]
    Display information about command type.
   
    For each NAME, indicate how it would be interpreted if used as a
    command name.
   
    Options:
      -a display all locations containing an executable named NAME;
    includes aliases, builtins, and functions, if and only if
    the `-p' option is not also used
      -f suppress shell function lookup
      -P force a PATH search for each NAME, even if it is an alias,
    builtin, or function, and returns the name of the disk file
    that would be executed
      -p returns either the name of the disk file that would be executed,
    or nothing if `type -t NAME' would not return `file'.
      -t output a single word which is one of `alias', `keyword',
    `function', `builtin', `file' or `', if NAME is an alias, shell
    reserved word, shell function, shell builtin, disk file, or not
    found, respectively
   
    Arguments:
      NAME Command name to be interpreted.
   
    Exit Status:
    Returns success if all of the NAMEs are found; fails if any are not found.
typeset: typeset [-aAfFgilrtux] [-p] name[=value] ...
    Set variable values and attributes.
   
    Obsolete.  See `help declare'.
</pre>


== Examples ==
=== Find all bash builtin commands ===
* <[http://nebc.nerc.ac.uk/downloads/bl8_only/upgrade8.sh upgrade8.sh]> file from [http://environmentalomics.org/bio-linux-installation/ BioLinux installation] page
https://www.cyberciti.biz/faq/linux-unix-bash-shell-list-all-builtin-commands/
* [http://padamson.github.io/r/shiny/2016/03/13/install-required-r-packages.html Install required R packages] using a mixture of bash and R.
<pre>
$ help
$ help | less
$ help | grep read
</pre>


== How to wrap a long linux command ==
=== Find if a command is internal or external ===
Use a backslash character, and make sure the backslash is the last character on the line. For example, the first example below does not work because there is an extra space character after \.
<pre>
$ type -a COMMAND-NAME-HERE
$ type -a cd
$ type -a uname
$ type -a :


Example 1 (does not work)
$ command -V ls
<pre>
$ command -V cd
sudo apt-get install libcap-dev libbz2-dev libgcrypt11-dev libpci-dev libnss3-dev libxcursor-dev \
$ command -V food
  libxcomposite-dev libxdamage-dev libxrandr-dev libdrm-dev libfontconfig1-dev libxtst-dev \
  libcups2-dev libpulse-dev libudev-dev
</pre>
</pre>
vs example 2 (works)
 
== pause by '''read -p''' command ==
http://www.cyberciti.biz/tips/linux-unix-pause-command.html
<pre>
<pre>
sudo apt-get install libcap-dev libbz2-dev libgcrypt11-dev libpci-dev libnss3-dev libxcursor-dev \
read -p "Press [Enter] key to start backup..."
  libxcomposite-dev libxdamage-dev libxrandr-dev libdrm-dev libfontconfig1-dev libxtst-dev \
  libcups2-dev libpulse-dev libudev-dev
</pre>
</pre>


== Command line path navigation ==
To ask the user a yes/no question, we can use [http://stackoverflow.com/questions/226703/how-do-i-prompt-for-input-in-a-linux-shell-script this method]
'''pushd''' and '''popd''' are used to switch between multiple directories without copying and pasting directory paths. They operate on a stack, a last-in, first-out ('''LIFO''') data structure.
<pre>
<syntaxhighlight lang='bash'>
while true; do
pushd /var/www
    read -p "Do you wish to install this program? " yn
pushd /usr/src
    case $yn in
dirs
        [Yy]* ) make install; break;;
pushd +2
        [Nn]* ) exit;;
popd
        * ) echo "Please answer yes or no.";;
</syntaxhighlight>
    esac
done
</pre>
OR
<pre>
echo "Do you wish to install this program?"
select yn in "Yes" "No"; do
    case $yn in
        Yes ) make install; break;;
        No ) exit;;
    esac
done
</pre>


When we have only two locations, an alternative and easier way is '''cd -'''.
=== Keyboard input and Arithmetic ===
<syntaxhighlight lang='bash'>
http://linuxcommand.org/wss0110.php
cd /usr/src
# Do something
cd /var/www
cd -    # /usr/src
</syntaxhighlight>


== bd – Quickly Go Back to a Parent Directory ==
read
* https://www.tecmint.com/bd-quickly-go-back-to-a-linux-parent-directory/
<pre>
* https://raw.github.com/vigneshwaranr/bd/master/bd
#!/bin/bash


== Create log file  ==
echo -n "Enter some text > "
* Create a log file with date
read text
<syntaxhighlight lang='bash'>
echo "You entered: $text"
logfile="output_$(date +"%Y%m%d%H%M").log"
</pre>
</syntaxhighlight>
* Redirect the error to a log file
<syntaxhighlight lang='bash'>
logfile="output_$(date +"%Y%m%d%H%M").log"


module load XXX || exit 1
Arithmetic
<pre>
#!/bin/bash


echo "All output redirected to '$logfile'"
# An application of the simple command
set -ex
# echo $((2+2))
# That is, when you surround an arithmetic expression with the double parentheses,
# the shell will perform arithmetic evaluation.
first_num=0
second_num=0


exec 2>$logfile
echo -n "Enter the first number --> "
read first_num
echo -n "Enter the second number -> "
read second_num


# Task 1
echo "first number + second number = $((first_num + second_num))"
start_time=$(date +%s)
echo "first number - second number = $((first_num - second_num))"
# Do something with possible error output
echo "first number * second number = $((first_num * second_num))"
end_time=$(date +%s)
echo "first number / second number = $((first_num / second_num))"
echo "Task 1 Started: $start_time; Ended: $end_time; Elapsed time: $((end_time - start_time)) sec" >> "$logfile"
echo "first number % second number = $((first_num % second_num))"
echo "first number raised to the"
echo "power of the second number  = $((first_num ** second_num))"
</pre>
and a program that formats an arbitrary number of seconds into hours and minutes:
<pre>
#!/bin/bash


# Task 2
seconds=0
start_time=$(date +%s)
# Do something with possible error output
end_time=$(date +%s)
echo "Task 2 Started: $start_time; Ended: $end_time; Elapsed time: $((end_time - start_time)) sec" >> "$logfile"
</syntaxhighlight>


= Text processing =
echo -n "Enter number of seconds > "
== tr command ==
read seconds
''tr works on sets of characters (including classes such as [:lower:]); it does not accept general regular expressions.''


The '''tr''' utility copies its input to its output, substituting or deleting selected characters. The name '''tr''' is short for translate or transliterate.
# use the division operator to get the quotient
hours=$((seconds / 3600))
# use the modulo operator to get the remainder
seconds=$((seconds % 3600))
minutes=$((seconds / 60))
seconds=$((seconds % 60))


* http://www.thegeekstuff.com/2012/12/linux-tr-command/
echo "$hours hour(s) $minutes minute(s) $seconds second(s)"
* http://www.cyberciti.biz/faq/how-to-use-linux-unix-tr-command/
</pre>
 
== xargs ==
xargs reads items from the standard input, delimited by blanks (which can be protected with double or single quotes or a backslash) or newlines, and executes the command (the default command is echo, located at /bin/echo) one or more times with any initial-arguments followed by items read from standard input.


It will read from STDIN and write to STDOUT. The syntax is
* [https://en.wikipedia.org/wiki/Xargs Wikipedia]
<ul>
<li>[https://www.howtogeek.com/435164/how-to-use-the-xargs-command-on-linux/ How to Use the xargs Command on Linux]. Need to string some Linux commands together, but one of them doesn’t accept piped input.
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
tr [OPTION] SET1 [SET2]
$ touch a.txt b.txt
$ ls -1 ./*.txt
./a.txt
./b.txt
$ ls -1 ./*.txt | xargs
./a.txt ./b.txt
</syntaxhighlight>
</syntaxhighlight>
</li>
<li>[https://www.cloudsavvyit.com/7984/using-xargs-in-combination-with-bash-c-to-create-complex-commands/ Using xargs in Combination With bash -c to Create Complex Commands]
</li>
</ul>
* [https://www.howtoforge.com/tutorial/linux-xargs-command/ 8 Practical Examples of Linux Xargs Command for Beginners]
* [http://www.computerhope.com/unix/xargs.htm man] page


If both SET1 and SET2 are specified and the ‘-d’ OPTION is not, tr replaces each character in SET1 with the character in the same position in SET2. For example,
=== Example1 - Find files named core in or below the directory /tmp and delete them ===
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
# translate to uppercase
find /tmp -name core -type f -print0 | xargs -0 /bin/rm -f
$ echo 'linux' | tr "[:lower:]" "[:upper:]"
</syntaxhighlight>
where '''-0''' tells xargs to expect NUL-separated input (as produced by find -print0). Without it, file names containing blanks, quotes or newlines are split or misread and many commands will not work; with it, such file names are handled correctly.


# Translate braces into parenthesis
Another case: suppose I have a file with filename ''-sT''. It seems not possible to delete it directly with the ''rm'' command.
$ tr '{}' '()' < inputfile > outputfile
<syntaxhighlight lang='bash'>
$ rm "-sT"
rm: invalid option -- 's'
Try 'rm ./-sT' to remove the file ‘-sT’.
Try 'rm --help' for more information.
$ ls *T
ls: option requires an argument -- 'T'
Try 'ls --help' for more information.
$ ls "*T"
ls: cannot access *T: No such file or directory
$ ls "*s*"
ls: cannot access *s*: No such file or directory


# Replace comma with line break
$ find . -maxdepth 1 -iname '*-sT'
$ tr ',' '\n' < inputfile
./-sT
$ find . -maxdepth 1 -iname '*-sT' | xargs -0 /bin/rm -f  # does NOT work: -0 expects NUL-separated input (find -print0)
$ find . -maxdepth 1 -iname '*-sT' | xargs /bin/rm -f  # WORKS
</syntaxhighlight>


# Translate white-space to tabs
Similarly, suppose I have a file of zero size. The file name is "-f3". I cannot delete it.
$ echo "This is for testing" | tr '[:space:]' '\t'
<syntaxhighlight lang='bash'>
$ ls -lt
total 448
-rw-r--r-- 1 mingc mingc      0 Jan 16 11:35 -f3
$ rm -f3
rm: invalid option -- '3'
Try `rm ./-f3' to remove the file `-f3'.
Try `rm --help' for more information.
$ find . -size  0 -print0 |xargs -0 rm
</syntaxhighlight>


# Join/merge all the lines in a file into a single line
=== Example2 - Find files from the grep command and sort them by date ===
$ tr -s '\n' ' ' < file.txt 
<syntaxhighlight lang='bash'>
# note: sed cannot match \n as easily as tr can.
grep -l "Polyphen" tmp/*.* | xargs ls -lt
# See
# http://stackoverflow.com/questions/1251999/how-can-i-replace-a-newline-n-using-sed
# https://unix.stackexchange.com/questions/26788/using-sed-to-convert-newlines-into-spaces
</syntaxhighlight>
</syntaxhighlight>


tr can also be used to remove particular characters using -d option. For example,
=== Example3 - [http://stackoverflow.com/questions/4341442/gzip-with-all-cores Gzip with multiple jobs] ===
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
$ echo "the geek stuff" | tr -d 't'
CORES=$(grep -c '^processor' /proc/cpuinfo)
he geek suff
find /source -type f -print0 | xargs -0 -n 1 -P $CORES gzip -9
</syntaxhighlight>
</syntaxhighlight>
where
* find -print0 / xargs -0 protects you from whitespace in filenames
* xargs -n 1 means one gzip process per file
* xargs -P specifies the number of jobs
* gzip -9 means maximum compression
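
The effect of -print0 / -0 is easiest to see with a file name that contains a space. The following is a minimal sketch, assuming GNU find and xargs; the scratch directory and file names are made up for illustration and are not part of the examples above.

```bash
#!/bin/bash
# Hypothetical scratch directory with one awkward file name.
workdir=$(mktemp -d)
touch "$workdir/plain.txt" "$workdir/with space.txt"

# Blank/newline-delimited input: "with space.txt" is split into two arguments.
naive_count=$(find "$workdir" -type f | xargs -n 1 echo | wc -l)

# NUL-delimited input: every file name arrives as exactly one argument.
safe_count=$(find "$workdir" -type f -print0 | xargs -0 -n 1 echo | wc -l)

echo "naive: $naive_count arguments, safe: $safe_count arguments"
rm -rf "$workdir"
```

With two files on disk, the naive pipeline hands xargs three arguments, which is why commands such as rm then fail on names with blanks.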
== [https://en.wikipedia.org/wiki/GNU_parallel GNU Parallel] ==
* http://www.gnu.org/software/parallel/
* https://www.gnu.org/software/parallel/parallel_tutorial.html
* https://www.biostars.org/p/63816/
* https://biowize.wordpress.com/2015/03/23/task-automation-with-bash-and-parallel/
* http://www.shakthimaan.com/posts/2014/11/27/gnu-parallel/news.html
* https://www.msi.umn.edu/support/faq/how-can-i-use-gnu-parallel-run-lot-commands-parallel
* http://deepdish.io/2014/09/15/gnu-parallel/
* http://davetang.org/muse/2013/11/18/using-gnu-parallel/
* https://vimeo.com/20838834, https://youtu.be/OpaiGYxkSuQ
A simple trick without using GNU Parallel is [[#Example_7:_run_in_parallel|run the commands in background]].
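
As a sketch of that background-job trick: start a batch of jobs with &, then call wait before starting the next batch. The worker function, the batch size and the file names below are placeholders chosen for this illustration, not taken from the examples on this page.

```bash
#!/bin/bash
outdir=$(mktemp -d)      # hypothetical scratch directory
max_jobs=4               # assumed batch size; tune to the number of cores

# Placeholder worker: stands in for a real command such as gzip.
work() {
  echo "processed item $1" > "$outdir/output$1.txt"
}

running=0
for i in $(seq 1 10); do
  work "$i" &                      # run in the background
  running=$((running + 1))
  if [ "$running" -ge "$max_jobs" ]; then
    wait                           # block until the whole batch has finished
    running=0
  fi
done
wait                               # collect the final, possibly partial, batch

n_outputs=$(ls "$outdir" | wc -l)
echo "$n_outputs output files written to $outdir"
```

Batching is cruder than GNU Parallel's scheduling, since a slow job holds up its whole batch, but it needs nothing beyond the shell.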


A practical example
=== Example: same command, different command line argument ===
Input from the command line ([https://www.gnu.org/software/parallel/man.html#SYNOPSIS Synopsis] about the triple colon ":::"):
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
#!/bin/bash
parallel echo ::: A B C
echo -n "Enter file name : "
parallel gzip --best ::: *.html # '--best' means best compression
read myfile
parallel gunzip ::: *.CEL.gz
echo -n "Are you sure ( yes or no ) ? "
read confirmation
confirmation="$(echo ${confirmation} | tr 'A-Z' 'a-z')"
if [ "$confirmation" == "yes" ]; then
  [ -f "$myfile" ] && /bin/rm "$myfile" || echo "Error - file $myfile not found"
else
  : # do nothing
fi
</syntaxhighlight>
</syntaxhighlight>


Second example
Input from a file:
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
$ ifconfig | cut -c-10 | tr -d ' ' | tr -s '\n'
parallel -a abc-file echo
eth0
eth1
ip6tnl0
lo
sit0
</syntaxhighlight>
</syntaxhighlight>
where tr -d ' ' deletes every space character in each line. The \n newline character is squeezed using tr -s '\n' to produce a list of interface names. We use cut to extract the first 10 characters of each line.


== Regular Expression ==
Input is a STDIN:
* [http://opensourceforu.efytimes.com/2011/04/sed-explained-part-1  A summary table]
* https://regexper.com/ You can type for example '[a-z]*.[0-9]' to see what it is doing.
** ( ?[a-zA-Z]+ ?) match all words in a given text
** [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3} match an IP address
* [http://www.thegeekstuff.com/2009/03/15-practical-unix-grep-command-examples/ 15 Practical Grep Command Examples In Linux]
* Period means a single character. [https://www.digitalocean.com/community/tutorials/using-grep-regular-expressions-to-search-for-text-patterns-in-linux Using Grep & Regular Expressions to Search for Text Patterns in Linux]
* Linux command line: '''grep PATTERN FILENAME''' or '''grep -E PATTERN FILENAME''' (extended regular expression)
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
echo -e "today is Monday\nHow are you" | grep Monday
cat abc-file | parallel echo
 
find . -iname "*after*" | parallel wc -l
</syntaxhighlight>


grep -E "[a-z]+" filename
Another similar example is to gzip each individual file
# or
<syntaxhighlight lang='bash'>
egrep "[a-z]+" filename


grep -i PATTERN FILENAME # ignore case
</syntaxhighlight>


grep -v PATTERN FILENAME # inverse match
=== Example: each command containing an index ===
Instead of
<syntaxhighlight lang='bash'>
for i in $(seq 1 100)
do
  someCommand data$i.fastq > output$i.txt &
done
</syntaxhighlight>
, we can use
<syntaxhighlight lang='bash'>
parallel --jobs 16 someCommand data{}.fastq '>' output{}.txt ::: {1..100}
</syntaxhighlight>


grep -c PATTERN FILENAME # count the number of lines in which a matching string appears
=== Example: each command not containing an index ===
<syntaxhighlight lang='bash'>
for i in *gz; do
  zcat "$i" > "$(basename "$i" .gz).unpacked"
done
</syntaxhighlight>
can be written as
<syntaxhighlight lang='bash'>
parallel 'zcat {} > {.}.unpacked' ::: *.gz
</syntaxhighlight>


grep -n PATTERN FILENAME # print the line number
=== Example: run several subscripts from a master script ===
Suppose I have a bunch of script files: script1.sh, script2.sh, ... And an optional master script (file ext does not end with .sh).
My goal is to run them using GNU Parallel.


grep -R PATTERN DIR      # recursively search many files
I can just run them using
grep -r PATTERN DIR      # recursively search many files
<syntaxhighlight lang='bash'>
parallel './{}' ::: *.sh
</syntaxhighlight>
where "./" means the .sh files are located in the current directory and {} denotes each individual .sh file.


grep -e "pattern1" -e "pattern2" FILENAME # multiple patterns
More detail:
grep -f PATTERNFILE FILENAME # PATTERNFILE contains patterns line-by-line
<syntaxhighlight lang='bash'>
$ mkdir test-par; cd test-par
$ echo echo A > script1.sh
$ echo echo B > script2.sh
$ echo echo C > script3.sh
$ echo echo D > script4.sh
$ chmod +x *.sh


grep -F PATTERN FILENAME # Interpret PATTERN as a  list  of  fixed  strings,  separated  by
$ cat > script    # master script (not needed for GNU parallel method)
                        # newlines,  any  of  which is to be matched.
./script1.sh
./script2.sh
./script3.sh
./script4.sh


grep -r --include=\*.{c,cpp} PATTERN DIR # including files in which to search (expands to --include=*.c --include=*.cpp)
$ time bash script
grep -r --exclude "README" PATTERN DIR  # excluding files in which to search
A
B
C
D


grep -o '<dt>.*</dt>' FILENAME # print only the matched string (<dt> .... </dt>)
real 0m0.025s
user 0m0.004s
sys 0m0.004s
 
$ time parallel './{}' ::: *.sh    # No need of a master script
                                  # may need to add --gnu option if asked.
A
B
C
D


grep -w                  # checking for full words, not for sub-strings
real 0m0.778s
grep -E -w "SRR2923335.1|SRR2923335.1999" # match in words (either SRR2923335.1 or SRR2923335.1999)
user 0m0.588s
sys 0m0.144s    # longer time because of the parallel overhead
</syntaxhighlight>
</syntaxhighlight>
* Extract the IP address from ifconfig command
<syntaxhighlight lang='bash'>
$ ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:14:d1:b0:df:9f 
          inet addr:192.168.1.172  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::214:d1ff:feb0:df9f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:29113 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:28561660 (28.5 MB)  TX bytes:3516957 (3.5 MB)


$ ifconfig eth1 | egrep -o "inet addr:[^ ]*" | grep -o "[0-9.]*"
=== Note ===
192.168.1.172
* When I run scripts (seqtools_vc) sequentially, I get the standard output on screen. However, I may not get this output when I use GNU parallel.
</syntaxhighlight>
* There is a risk/problem if all scripts are trying to generate required/missing files when they detect the required files are absent.
where egrep -o "inet addr:[^ ]*" matches the text that starts with inet addr: and continues up to the next space ([^ ]* means any run of non-space characters). The second grep in the pipe then keeps only the run of digits and '.' characters, i.e. the IP address.


== Extract columns or fields from text files: cut ==
== [https://github.com/shenwei356/rush rush] - cross-platform tool for executing jobs in parallel ==
http://www.thegeekstuff.com/2013/06/cut-command-examples/


To extract fixed columns (say columns 5-7 of a file):
== Debugging Scripts ==
<syntaxhighlight lang='bash'>
* [https://www.tecmint.com/enable-shell-debug-mode-linux/ How To Enable Shell Script Debugging Mode in Linux] (very good) Some options (note options can be used in 1. the '''set''' command 2. the first line of the shell file or 3. the terminal where the shell is invoked)
cut -c5-7 somefile
** -e: exit if a command yields a nonzero exit status
</syntaxhighlight>
** -v: short for verbose
If the field delimiter is different from TAB you need to specify it using -d:
** -n: short for noexec or no execution
<syntaxhighlight lang='bash'>
** -x: short for xtrace or execution trace
cut -d' ' -f100-105 myfile > outfile
* [http://www.tecmint.com/trace-shell-script-execution-in-linux/ How to Trace Execution of Commands in Shell Script with Shell Tracing]
#
* [https://www.tecmint.com/check-syntax-in-shell-script/ How to Perform Syntax Checking Debugging Mode in Shell Scripts]
cut -d: -f6 somefile  # colon-delimited file
* http://www.cyberciti.biz/tips/debugging-shell-script.html
#
grep "/bin/bash" /etc/passwd | cut -d':' -f1-4,6,7    # field 1 through 4, 6 and 7


cut -f3 --complement somefile # print all the columns except the third column
Run a shell script with the -x option. Each line of the script is then printed as it is executed, so we can see which line takes a long time or which line broke the code (''it still runs through the whole script'').
</syntaxhighlight>
<pre>
$ bash -x script-name
</pre>
* Use of set builtin command
* Use of intelligent DEBUG function
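
A minimal sketch of the set builtin approach: tracing is switched on only around the lines under suspicion instead of for the whole script. The function and variable names are illustrative.

```bash
#!/bin/bash
# Trace only the interesting part of a script.
demo() {
  local a=2 b=3
  set -x                   # start tracing: commands are echoed to stderr with a '+' prefix
  local sum=$((a + b))
  set +x                   # stop tracing
  echo "sum=$sum"
}

demo
```

The trace goes to stderr, so the normal output of the script can still be piped or redirected separately.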


To specify the output delimiter, we shall use --output-delimiter. NOTE that to specify the Tab delimiter in '''cut''', we shall use $'\t'. See http://www.computerhope.com/unix/ucut.htm. For example,
To run a bash script line by line:
<syntaxhighlight lang='bash'>
* [http://bashdb.sourceforge.net/ Bash Debugger]
cut -f 1,3 -d ':' --output-delimiter=$'\t' somefile
* Use '''Geany'''. See the next section.
</syntaxhighlight>
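
A small worked example of --output-delimiter and --complement, assuming GNU cut. The sample record is made up.

```bash
#!/bin/bash
record='alice:x:1000:1000:Alice:/home/alice:/bin/bash'   # made-up passwd-style line

# Fields 1 and 3, re-joined with a tab instead of the input delimiter ':'.
tabbed=$(printf '%s\n' "$record" | cut -d ':' -f 1,3 --output-delimiter=$'\t')

# Everything except field 2 (--complement is a GNU extension).
rest=$(printf '%s\n' "$record" | cut -d ':' -f 2 --complement)

printf '%s\n' "$tabbed" "$rest"
```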


If I am not sure about the number of the final field, I can leave the number off.
=== Geany ===
* (Ubuntu 12.04 only): By default, it does not have the terminal tab. Install virtual terminal emulator. Run
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
cut -f 1- -d ':' --output-delimiter=$'\t' somefile
sudo apt-get install libvte-dev
</syntaxhighlight>
</syntaxhighlight>
* Step 1: Keyboard shortcut. Select a region of code, then choose Edit -> Commands -> Send Selection to Terminal. You can also assign a keybinding for this: go to Edit -> Preferences and pick the Keybindings tab. See a screenshot [http://askubuntu.com/questions/528367/shortcut-to-send-selection-to-terminal-in-geany here]. I assign F12 (without quotes) as the shortcut. [http://www.geany.org/manual/current/#keybindings This is a complete list of the keybindings].


== Substitution of text: sed (stream editor) ==
* Step 2: Newline character. Another issue is that the last line of sent code does not have a newline character. So I need to switch to the Terminal and press Enter. The solution is to modify the <geany.conf> (find its location using locate geany.conf. On my ubuntu 14 (geany 1.26), it is under '''~/.config/geany/geany.conf''') and set send_selection_unsafe=true. See [http://www.r-bloggers.com/using-geany-for-programming-in-r/ here].
* https://en.wikipedia.org/wiki/Sed
* Step 3: PATH variable.
* [http://www.thegeekstuff.com/2009/11/unix-sed-tutorial-append-insert-replace-and-count-file-lines/ Append, Insert, Replace, and Count File Lines]
<pre>
$ tmpname=$(basename $inputVCF)
Command 'basename' is available in '/usr/bin/basename'
The command could not be located because '/usr/bin' is not included in the PATH environment variable.
</pre>
The solution is to run '''PATH=$PATH:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin''' in the Terminal window before running our script.
* Step 4 (optional): Change background color.
Another handy change to geany is to change its background to black. To do that, go to Edit -> Preferences -> Editor. Once on the Editor options level, select the Display tab to the far right of the dialog, and you will notice a checkbox marked ''invert syntax highlighting colors''.


By default, ''sed'' writes the result to standard output and leaves the input file unchanged. To apply the substitutions to the file itself, use the '''-i''' option.
See [https://ask.fedoraproject.org/en/question/25734/how-to-set-gnome-terminal-in-geany-instead-of-xterm/ this post] about changing the default terminal in the ''Terminal'' window. The default is xterm (see the output of '''echo $TERM''').
<syntaxhighlight lang='bash'>
sed 's/text/replace/' file > newfile
mv newfile file
# OR better
sed -i 's/text/replace/' file
</syntaxhighlight>
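
The difference between printing the result and editing in place can be checked on a throwaway file. This is a sketch assuming GNU sed (the -i syntax differs on BSD/macOS); the file contents are invented.

```bash
#!/bin/bash
file=$(mktemp)                      # hypothetical scratch file
printf 'text one\ntext two text\n' > "$file"

# Without -i the result goes to stdout and the file is untouched.
preview=$(sed 's/text/replace/' "$file")
unchanged=$(cat "$file")

# With -i (GNU sed) the file itself is rewritten.
sed -i 's/text/replace/' "$file"
changed=$(cat "$file")

printf '%s\n' "$changed"
```

Note that the second "text" on the second line survives in both cases, because no g flag is given.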


The '''sed''' command will replace the first occurrence of the pattern in each line. If we want to replace every occurrence, we need to add the '''g''' parameter at the end, as follows:
== Examples ==
<syntaxhighlight lang='bash'>
* <[http://nebc.nerc.ac.uk/downloads/bl8_only/upgrade8.sh upgrade8.sh]> file from [http://environmentalomics.org/bio-linux-installation/ BioLinux installation] page
sed 's/pattern/replace/g' file
* [http://padamson.github.io/r/shiny/2016/03/13/install-required-r-packages.html Install required R packages] using a mixture of bash and R.
</syntaxhighlight>


To remove blank lines
== How to wrap a long linux command ==
<syntaxhighlight lang='bash'>
Use the backslash character, and make sure it is the last character on the line. For example, the first example below does not work because there is an extra space character after \.
sed '/^$/d' filename
</syntaxhighlight>


To [http://serverfault.com/questions/466118/using-sed-to-remove-both-an-opening-and-closing-square-bracket-around-a-string remove square brackets]
Example 1 (does not work)
<syntaxhighlight lang='bash'>
<pre>
# method 1. replace ] & [ by the empty string
sudo apt-get install libcap-dev libbz2-dev libgcrypt11-dev libpci-dev libnss3-dev libxcursor-dev \
$ echo '00[123]44' | sed 's/[][]//g'
  libxcomposite-dev libxdamage-dev libxrandr-dev libdrm-dev libfontconfig1-dev libxtst-dev \
0012344
  libcups2-dev libpulse-dev libudev-dev
# method 2 - use tr
</pre>
$ echo '00[123]00' | tr -d '[]'
vs. Example 2 (works)
0012300
<pre>
</syntaxhighlight>
sudo apt-get install libcap-dev libbz2-dev libgcrypt11-dev libpci-dev libnss3-dev libxcursor-dev \
  libxcomposite-dev libxdamage-dev libxrandr-dev libdrm-dev libfontconfig1-dev libxtst-dev \
  libcups2-dev libpulse-dev libudev-dev
</pre>


To replace all three-digit numbers with another specified word in a file
== Command line path navigation ==
'''pushd''' and '''popd''' are used to switch between multiple directories without copying and pasting directory paths. They operate on a stack, a last-in, first-out ('''LIFO''') data structure.
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
sed -i 's/\b[0-9]\{3\}\b/NUMBER/g' filename
pushd /var/www
 
pushd /usr/src
echo -e "I love 111 but not 1111." | sed 's/\b[0-9]\{3\}\b/NUMBER/g'
dirs
pushd +2
popd
</syntaxhighlight>
</syntaxhighlight>
where {3} matches the preceding element exactly three times. The backslashes in \{3\} give { and } their special meaning in basic regular expressions. \b is the word boundary marker.
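
The escaped braces are only needed in basic regular expressions. As a sketch, assuming GNU sed (\b is a GNU extension), the same substitution can be written with -E, where plain {3} works:

```bash
#!/bin/bash
line='I love 111 but not 1111.'

# Basic regular expressions: braces must be escaped as \{3\}.
bre=$(echo "$line" | sed 's/\b[0-9]\{3\}\b/NUMBER/g')

# Extended regular expressions (-E): plain {3} has the same meaning.
ere=$(echo "$line" | sed -E 's/\b[0-9]{3}\b/NUMBER/g')

echo "$bre"
echo "$ere"
```

Both commands replace 111 and leave 1111 alone, because the word boundaries rule out a three-digit match inside a longer number.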


Variable string and quoting
When we have only two locations, an alternative and easier way is '''cd -'''.
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
text=hello
cd /usr/src
echo hello world | sed "s/$text/HELLO/"
# Do something
cd /var/www
cd -    # /usr/src
</syntaxhighlight>
</syntaxhighlight>
With double quotes, the shell expands the variable before sed sees the expression.
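
A short side-by-side sketch of the two quoting styles, reusing the text variable from the example above:

```bash
#!/bin/bash
text=hello

# Double quotes: the shell substitutes $text before sed runs.
expanded=$(echo 'hello world' | sed "s/$text/HELLO/")

# Single quotes: sed receives the literal characters $text, so nothing matches.
literal=$(echo 'hello world' | sed 's/$text/HELLO/')

echo "$expanded"
echo "$literal"
```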


== Application of sed: Get the top directory name of a tarball or zip file without extract it ==
== bd – Quickly Go Back to a Parent Directory ==
* https://www.tecmint.com/bd-quickly-go-back-to-a-linux-parent-directory/
* https://raw.github.com/vigneshwaranr/bd/master/bd
 
== Create log file  ==
* Create a log file with date
<syntaxhighlight lang='bash'>
logfile="output_$(date +"%Y%m%d%H%M").log"
</syntaxhighlight>
* Redirect the error to a log file
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
dn=`unzip -vl filename.zip | sed -n '5p' | awk '{print $8}'` # 5 is the line number to print
logfile="output_$(date +"%Y%m%d%H%M").log"
echo -e "$(basename $dn)"


dn=`tar -tf filename.tar.bz2 | grep -o '^[^/]\+' | sort -u`
module load XXX || exit 1
echo -e $dn


dn=`tar -tf filename.tar.gz | grep -o '^[^/]\+' | sort -u`
echo "All output redirected to '$logfile'"
echo -e $dn
set -ex


# Assume there is a sub-directory called htslibXXXX
exec 2>$logfile
dn=$(basename `find -maxdepth 1 -name 'htslib*'`)
echo -e $dn
</syntaxhighlight>


== Application of sed: Grab the line number from the 'grep -n' command output ==
# Task 1
Follow [http://stackoverflow.com/questions/10589929/find-the-line-number-where-a-specific-word-appears-with-grep here]
start_time=$(date +%s)
<syntaxhighlight lang='bash'>
# Do something with possible error output
grep -n 'regex' filename | sed 's/^\([0-9]\+\):.*$/\1/'  # return line numbers for each matches
end_time=$(date +%s)
# OR
echo "Task 1 Started: $start_time; Ended: $end_time; Elapsed time: $((end_time - start_time)) sec" >> "$logfile"
grep -n 'regex' filename | awk -F: '{print $1}'


echo 123:ABCD | sed 's/^\([0-9]\+\):.*$/\1/'            # 123
# Task 2
start_time=$(date +%s)
# Do something with possible error output
end_time=$(date +%s)
echo "Task 2 Started: $start_time; Ended: $end_time; Elapsed time: $((end_time - start_time)) sec" >> "$logfile"
</syntaxhighlight>
</syntaxhighlight>
where '''\(''' & '''\)''' mark the part of the pattern to capture and '''\1''' refers back to that captured substring. See http://www.grymoire.com/Unix/Sed.html for more examples, e.g. search repeating words or special patterns.
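
To try the capture group on known input, here is a sketch with a small invented file (GNU sed is assumed because of \+):

```bash
#!/bin/bash
sample=$(mktemp)     # hypothetical scratch file
printf 'alpha\nbeta regex\ngamma\nregex again\n' > "$sample"

# grep -n prints "LINE:TEXT"; \(...\) captures the digits and \1 keeps only them.
with_sed=$(grep -n 'regex' "$sample" | sed 's/^\([0-9]\+\):.*$/\1/')

# Same result by splitting on ':' with awk.
with_awk=$(grep -n 'regex' "$sample" | awk -F: '{print $1}')

echo "$with_sed"
rm -f "$sample"
```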


If we want to find the top directory of a zipped file (see [https://en.wikipedia.org/wiki/Zip_(file_format) wikipedia] for the zip format), we can use
= Text processing =
== tr (similar to sed) ==
''It seems tr does not take general regular expression.''
 
The '''tr''' utility copies its input to the output, substituting or deleting selected characters. The name '''tr''' is short for translate or transliterate.
 
* http://www.thegeekstuff.com/2012/12/linux-tr-command/
* http://www.cyberciti.biz/faq/how-to-use-linux-unix-tr-command/
* https://www.howtoforge.com/linux-tr-command/
 
It will read from STDIN and write to STDOUT. The syntax is
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
unzip -vl snpEff.zip | head | grep -n 'CRC-32' | awk -F: '{print $1}'
tr [OPTION] SET1 [SET2]
</syntaxhighlight>
</syntaxhighlight>


== Substitution of text: perl ==
If both SET1 and SET2 are specified and the ‘-d’ OPTION is not, tr replaces each character in SET1 with the character in the same position in SET2. For example,
* Add or remove 'chr' from vcf file https://www.biostars.org/p/18530/
 
== awk: operate on rows and/or columns ==
'''awk''' is a tool designed to work with data streams. It can operate on columns and rows. It supports many built-in features, such as arrays and functions, with a syntax similar to the C programming language. Its biggest advantage is its flexibility.
 
* https://en.wikipedia.org/wiki/AWK
* https://www.tutorialspoint.com/awk/awk_workflow.htm
* http://www.thegeekstuff.com/2010/01/awk-introduction-tutorial-7-awk-print-examples
* http://www.theunixschool.com/p/awk-sed.html
* http://www.grymoire.com/Unix/Awk.html
 
Structure of an awk script
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
awk pattern { action }
# translate to uppercase
awk ' BEGIN{ print "start" } pattern { AWK commands } END { print "end" } ' file
$ echo 'linux' | tr "[:lower:]" "[:upper:]"
</syntaxhighlight>
The three components ('''BEGIN''', '''END''' and the main statement block with its '''pattern''' match option) are all optional and any of them can be absent from the script. The pattern is also called a '''condition'''.


The default field delimiter is whitespace (one or more spaces or tabs).
# Translate braces into parenthesis
$ tr '{}' '()' < inputfile > outputfile


Some examples:
# Replace comma with line break
<syntaxhighlight lang='bash'>
$ tr ',' '\n' < inputfile
awk 'BEGIN { i=0 } { i++ } END { print i}' filename
echo -e "line1\nline2" | awk 'BEGIN { print "start" } { print } END { print  "End" }'


seq 5 | awk 'BEGIN { sum=0; print "Summation:" } { print $1"+"; sum+=$1 } END { print "=="; print sum }'
# Split a long line using the space
$ echo $line | tr ' ' '\n'  


awk -F : '{print $6}' somefile  # colon-delimited file, print the 6th field (cut can do it)
# Translate white-space to tabs
#
$ echo "This is for testing" | tr '[:space:]' '\t'
awk --field-separator="\\t" '{print $6}' filename    # tab-delimited (cut can do it)
awk -F":" '{ print $1 " " $3 }' /etc/passwd  # (cut can do it)


awk -F "\t" '{OFS="\t"} {$1="mouse"$1; print $0}' genes.gtf > genescb.gtf
# Join/merge all the lines in a file into a single line
# or
$ tr -s '\n' ' ' < file.txt 
awk -F "\t" 'BEGIN {OFS="\t"} {$1="mouse"$1; print $0}' genes.gtf > genescb.gtf
# note: sed cannot match \n as easily as tr can.
# replace ELEMENT with mouseELEMENT for data on the 1st column; tab separator was used for input (-F) and output (OFS)
# See
# http://stackoverflow.com/questions/1251999/how-can-i-replace-a-newline-n-using-sed
# https://unix.stackexchange.com/questions/26788/using-sed-to-convert-newlines-into-spaces
</syntaxhighlight>


awk 'NR % 4 == 1 {print ">" $0 } NR % 4 == 2 {print $0}' input > output
tr can also be used to remove particular characters using -d option. For example,
# extract rows 1,2,5,6,9,10,13,14,.... from input
<syntaxhighlight lang='bash'>
 
$ echo "the geek stuff" | tr -d 't'
awk 'NR % 4 == 0 {print ">" $0 } NR % 4 == 3 {print $0}' input > output
he geek suff
# extract rows 3,4,7,8,11,12,15,16,.... from input
$ tr -d "\15" < input > output # octal 15 is the carriage return character (\r)
 
awk '(NR==2),(NR==4) {print $0}' input
# print rows 2-4.
 
awk '{ print ($1-32)*(5/9) }'
# fahrenheit-to-celsius calculator, http://www.hcs.harvard.edu/~dholland/computers/awk.html
 
# http://stackoverflow.com/questions/3700957/printing-lines-from-a-file-where-a-specific-field-does-not-start-with-something
awk '$7 !~ /^mouse/ { print $0 }' input # column 7 not starting with 'mouse'
awk '$7 ~ /^mouse/ { print $0 }' input  # column 7 starting with 'mouse'
awk '$7 ~ /mouse/ { print $0 }' input   # column 7 containing 'mouse'
</syntaxhighlight>
</syntaxhighlight>


AWK is most useful for finding or counting a subset of rows or columns. It is less commonly used for string substitution.
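
As a sketch of that row/column use: count the rows that satisfy a condition and sum one of their columns. The three-column sample data is invented for the example.

```bash
#!/bin/bash
# Invented tab-separated sample: gene, chromosome, count.
data=$(printf 'geneA\tchr1\t10\ngeneB\tchr2\t5\ngeneC\tchr1\t7\n')

# Count the rows whose 2nd column is chr1 and sum their 3rd column.
summary=$(printf '%s\n' "$data" |
  awk -F '\t' '$2 == "chr1" { n++; total += $3 } END { print n, total }')

echo "$summary"
```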
A practical example
<syntaxhighlight lang='bash'>
#!/bin/bash
echo -n "Enter file name : "
read myfile
echo -n "Are you sure ( yes or no ) ? "
read confirmation
confirmation="$(echo ${confirmation} | tr 'A-Z' 'a-z')"
if [ "$confirmation" == "yes" ]; then
  [ -f "$myfile" ] && /bin/rm "$myfile" || echo "Error - file $myfile not found"
else
  : # do nothing
fi
</syntaxhighlight>
 
Second example
<syntaxhighlight lang='bash'>
$ ifconfig | cut -c-10 | tr -d ' ' | tr -s '\n'
eth0
eth1
ip6tnl0
lo
sit0


== How to delete the first few rows of a text file ==
# without tr -s '\n'
https://unix.stackexchange.com/questions/37790/how-do-i-delete-the-first-n-lines-of-an-ascii-file-using-shell-commands
eth0


Suppose we want to remove the first 3 rows of a text file


* sed
eth1
<syntaxhighlight lang='bash'>
 
$ sed -e '1,3d' < t.txt    # output to screen
 
ip6tnl0


$ sed -i -e 1,3d yourfile  # directly change the file
</syntaxhighlight>
* tail
<syntaxhighlight lang='bash'>
$ tail -n +4 t.txt    # output to screen
</syntaxhighlight>
* awk
<syntaxhighlight lang='bash'>
$ awk 'NR > 3 { print }' < t.txt    # output to screen
</syntaxhighlight>


== Delete first few characters on each row ==
lo
http://www.theunixschool.com/2014/08/sed-examples-remove-delete-chars-from-line-file.html


* To remove 1st n characters of every line:
<syntaxhighlight lang='bash'>
# delete the first 4 characters from each line
$ sed -r 's/.{4}//' file
</syntaxhighlight>


== Show the first few characters from a text file ==
sit0
<syntaxhighlight lang='bash'>
head -c 50 file  # return the first 50 bytes
</syntaxhighlight>


= Web =
Reference: [http://www.amazon.com/Linux-Scripting-Cookbook-Second-Edition/dp/1782162747 Linux Shell Scripting Cookbook]


== Copy a complete website ==
<syntaxhighlight lang='bash'>
wget --mirror --convert-links URL
# OR
wget -r -N -k -l DEPTH URL
</syntaxhighlight>
</syntaxhighlight>
where tr -d ' ' deletes every space character in each line. The \n newline character is squeezed using tr -s '\n' to produce a list of interface names. We use cut to extract the first 10 characters of each line.
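
ifconfig output differs between systems, so here is the same delete-and-squeeze idea on a fixed input. The sample text is an invented stand-in for the first ten columns of ifconfig output.

```bash
#!/bin/bash
# Invented stand-in: interface names padded with spaces,
# separated by lines that contain only spaces.
sample=$(printf 'eth0      \n          \n          \nlo        \n')

# tr -d ' ' removes the padding; tr -s '\n' collapses the resulting empty lines.
names=$(printf '%s\n' "$sample" | tr -d ' ' | tr -s '\n')

echo "$names"
```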


== HTTP or FTP authentication ==
== Regular Expression and grep ==
* https://regexper.com/ You can type for example '[a-z]*.[0-9]' to see what it is doing.
** ( ?[a-zA-Z]+ ?) match all words in a given text
** [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3} match an IP address
* [http://www.thegeekstuff.com/2009/03/15-practical-unix-grep-command-examples/ 15 Practical Grep Command Examples In Linux]
* [https://www.cyberciti.biz/faq/sed-remove-last-character-from-each-line/ Sed bracket expressions]. sed remove last character from each line.
* Period means a single character. [https://www.digitalocean.com/community/tutorials/using-grep-regular-expressions-to-search-for-text-patterns-in-linux Using Grep & Regular Expressions to Search for Text Patterns in Linux]
* Linux command line: '''grep PATTERN FILENAME''' or '''grep -E 'PATTERN1|PATTERN2' FILENAME''' (extended regular expression)
<syntaxhighlight lang='bash'>
<syntaxhighlight lang='bash'>
wget --user username --password pass URL
echo -e "today is Monday\nHow are you" | grep Monday
</syntaxhighlight>
 
grep -E "[a-z]+" filename
# or
egrep "[a-z]+" filename


== Download a web page as plain text (instead of HTML text) ==
grep -i PATTERN FILENAME # ignore case
<syntaxhighlight lang='bash'>
lynx URL -dump > TextWebPage.txt
</syntaxhighlight>


== cURL ==
grep -v PATTERN FILENAME # inverse match
<syntaxhighlight lang='bash'>
curl http://google.com -o index.html --progress-bar
curl http://google.com --silent -o index.html


# Cookies
grep -c PATTERN FILENAME # count the number of lines in which a matching string appears
curl http://example.com --cookie "user=ABCD;pass=EFGH"
curl URL --cookie-jar cookie_file


# Setting a user agent string
grep -n PATTERN FILENAME # print the line number
# http://www.useragentstring.com/pages/useragentstring.php
curl URL --user-agent "Mozilla/5.0"


# Authenticating
grep -R PATTERN DIR      # recursively search many files and follow symbolic links
curl -u user:pass http://test_auth.com
grep -r PATTERN DIR      # recursively search many files
curl -u user http://test_auth.com


# Printing response headers excluding the data
grep -e "pattern1" -e "pattern2" FILENAME # multiple patterns OR operation (older Linux)
# For example, to check whether a page is reachable or not
egrep 'pattern1|pattern2' FILENAME        # multiple patterns (newer Linux)
# by checking the 'Content-length' parameter.
grep -f PATTERNFILE FILENAME # PATTERNFILE contains patterns line-by-line
curl -I URL
</syntaxhighlight>


== Image crawler and downloader ==
grep -F PATTERN FILENAME # Interpret PATTERN as a  list  of  fixed  strings,  separated  by
<syntaxhighlight lang='bash'>
                        # newlines,  any  of  which is to be matched.
#!/bin/bash
#Desc: Images downloader
#Filename: img_downloader.sh


if [ $# -ne 3 ];
grep -r --include \*.Rmd --include \*.R "file\.csv" ./   # search with only Rmd & R files
then
  echo "Usage: $0 URL -d DIRECTORY"
   exit -1
fi


for i in {1..4}
grep -r --exclude "README" PATTERN DIR              # excluding files in which to search
do
  case $1 in
  -d) shift; directory=$1; shift ;;
  *) url=${url:-$1}; shift;;
  esac
done


mkdir -p $directory;
grep -o '<dt>.*</dt>' FILENAME # print only the matched string (<dt> .... </dt>)
baseurl=$(echo $url | egrep -o "https?://[a-z.]+")


echo Downloading $url
grep -w                  # checking for full words, not for sub-strings
curl -s $url | egrep -o "<img src=[^>]*>" |
grep -E -w "SRR2923335.1|SRR2923335.1999" # match in words (either SRR2923335.1 or SRR2923335.1999)
sed 's/<img src=\"\([^"]*\).*/\1/g' > /tmp/$$.list
</syntaxhighlight>
* Extract the IP address from ifconfig command
<syntaxhighlight lang='bash'>
$ ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:14:d1:b0:df:9f 
          inet addr:192.168.1.172  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::214:d1ff:feb0:df9f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:29113 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:28561660 (28.5 MB)  TX bytes:3516957 (3.5 MB)


sed -i "s|^/|$baseurl/|" /tmp/$$.list
$ ifconfig eth1 | egrep -o "inet addr:[^ ]*" | grep -o "[0-9.]*"
192.168.1.172
</syntaxhighlight>
where egrep -o "inet addr:[^ ]*" will match the pattern starting with inet addr: and ends with some non-space character sequence (specified by [^ ]*). Now in the next pipe, it prints the character combination of digits and '.'.
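
The two-stage extraction can be tried without a live network interface by feeding it a saved line of output. This is a minimal sketch: the function name extract_ipv4 is hypothetical, and it assumes the old net-tools format shown above (the "inet addr:" prefix), which newer tools may not print.

```shell
# Hypothetical helper: print the IPv4 address found in old-style ifconfig output on stdin.
extract_ipv4() {
  grep -E -o 'inet addr:[^ ]*' | grep -o '[0-9.]*'
}

# A saved line of ifconfig output (copied from the example above).
sample='          inet addr:192.168.1.172  Bcast:192.168.1.255  Mask:255.255.255.0'
printf '%s\n' "$sample" | extract_ipv4
```

On a real machine the same function would be used as: ifconfig eth1 | extract_ipv4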


=== --include option ===
<ul>
<li>[https://stackoverflow.com/a/10628271 how do I use the grep --include option for multiple file types?] You can use multiple --include flags; brace expansion produces one flag per extension. '''grep -r --include=*.{html,php,htm} "pattern" /some/path/'''
<syntaxhighlight lang='bash'>
grep -r --include=*.{c,cpp} PATTERN DIR # search only the .c and .cpp files
</syntaxhighlight>
<li>[https://stackoverflow.com/a/24197797 grep --include command doesn't work in OSX Zsh]. The trick is to use '''quotes'''.
<syntaxhighlight lang='bash'>
grep -rl --include='*.Rmd' "pattern" ./

grep --include='*.rb' --include='*.h*' -rnw . -e "pattern"
</syntaxhighlight>
</ul>


=== Bash Find Out IF a Variable Contains a Substring ===
* [https://www.cyberciti.biz/faq/bash-find-out-if-variable-contains-substring/ Bash Find Out IF a Variable Contains a Substring]
* [https://www.howtogeek.com/825503/how-to-tell-if-a-bash-string-contains-a-substring-on-linux/ How to Tell If a Bash String Contains a Substring on Linux]

=== grep returns TRUE or FALSE ===
[https://unix.stackexchange.com/a/48536 Can grep return true/false or are there alternative methods]

== less -S: print long lines ==
Causes lines longer than the screen width to be chopped rather than folded. [https://www.man7.org/linux/man-pages/man1/less.1.html man less].

== cut: extract columns or character positions from text files ==
http://www.thegeekstuff.com/2013/06/cut-command-examples/

<syntaxhighlight lang='bash'>
cut -f 5-7 somefile  # columns 5-7.
cut -c 5-7 somefile  # character positions 5-7
</syntaxhighlight>
'''The default delimiter is TAB'''. If the field delimiter is different from TAB you need to specify it using -d:
<syntaxhighlight lang='bash'>
cut -d' ' -f100-105 myfile > outfile
#
cut -d: -f6 somefile  # colon-delimited file
#
grep "/bin/bash" /etc/passwd | cut -d':' -f1-4,6,7    # field 1 through 4, 6 and 7

cut -f3 --complement somefile # print all the columns except the third column
</syntaxhighlight>

To specify the output delimiter, use --output-delimiter. Note that to pass a Tab as the delimiter to '''cut''', write it as $'\t'. See http://www.computerhope.com/unix/ucut.htm. For example,
<syntaxhighlight lang='bash'>
cut -f 1,3 -d ':' --output-delimiter=$'\t' somefile
</syntaxhighlight>


If I am not sure about the number of the final field, I can leave the number off.
<syntaxhighlight lang='bash'>
cut -f 1- -d ':' --output-delimiter=$'\t' somefile
</syntaxhighlight>

=== A simple shell function to show the first 3 columns and 3 rows of the matrix ===
<syntaxhighlight lang='sh'>
show_matrix() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "Usage: show_matrix <filename> <delimiter>"
        return 1
    fi

    if [ "$2" != "tab" ] && [ "$2" != "comma" ]; then
        echo "Delimiter must be 'tab' or 'comma'"
        return 1
    fi

    if [ "$2" = "tab" ]; then
        cut -f1-3 "$1" | head -n 3
    elif [ "$2" = "comma" ]; then
        cut -d',' -f1-3 "$1" | head -n 3
    fi
}
# show_matrix data.txt tab
# show_matrix data.txt comma
</syntaxhighlight>

== awk: operate on rows and/or columns ==
'''awk''' is a tool designed to work with data streams. It can operate on columns and rows. It supports many built-in features, such as arrays and functions, much like the C programming language. Its biggest advantage is its flexibility.

* https://en.wikipedia.org/wiki/AWK
* https://www.tutorialspoint.com/awk/awk_workflow.htm
* http://www.thegeekstuff.com/2010/01/awk-introduction-tutorial-7-awk-print-examples
* http://www.theunixschool.com/p/awk-sed.html
* http://www.grymoire.com/Unix/Awk.html
* https://www.howtogeek.com/562941/how-to-use-the-awk-command-on-linux/
* [https://www.networkworld.com/article/3454979/the-many-faces-of-awk.html The many faces of awk]
** Plucking out columns of data
** Printing simple text
** Doing math with awk

Structure of an awk script
<syntaxhighlight lang='bash'>
awk pattern { action }
awk ' BEGIN{ print "start" } pattern { AWK commands } END { print "end" } ' file
</syntaxhighlight>
The three components ('''BEGIN''', '''END''' and a common statements block with the '''pattern''' match option) are optional and any of them can be absent in the script. The pattern can also be called a '''condition'''.


The default field delimiter is whitespace (any run of spaces or tabs).

Some examples:
<syntaxhighlight lang='bash'>
awk 'BEGIN { i=0 } { i++ } END { print i}' filename
echo -e "line1\nline2" | awk 'BEGIN { print "start" } { print } END { print  "End" }'

seq 5 | awk 'BEGIN { sum=0; print "Summation:" } { print $1"+"; sum+=$1 } END { print "=="; print sum }'

awk -F : '{print $6}' somefile  # colon-delimited file, print the 6th field (cut can do it)
#
awk --field-separator="\\t" '{print $6}' filename    # tab-delimited, gawk long option (cut can do it)
awk -F":" '{ print $1 " " $3 }' /etc/passwd  # (cut can do it)

awk -F "\t" '{OFS="\t"} {$1="mouse"$1; print $0}' genes.gtf > genescb.gtf
# or
awk -F "\t" 'BEGIN {OFS="\t"} {$1="mouse"$1; print $0}' genes.gtf > genescb.gtf
# replace ELEMENT with mouseELEMENT for data on the 1st column; tab separator was used for input (-F) and output (OFS)

awk 'NR % 4 == 1 {print ">" $0 } NR % 4 == 2 {print $0}' input > output
# extract rows 1,2,5,6,9,10,13,14,.... from input (rows 1,5,9,... get a '>' prefix)

awk 'NR % 4 == 0 {print ">" $0 } NR % 4 == 3 {print $0}' input > output
# extract rows 3,4,7,8,11,12,15,16,.... from input (rows 4,8,12,... get a '>' prefix)

awk '(NR==2),(NR==4) {print $0}' input
# print rows 2-4.

awk '{ print ($1-32)*(5/9) }'
# fahrenheit-to-celsius calculator, http://www.hcs.harvard.edu/~dholland/computers/awk.html

# http://stackoverflow.com/questions/3700957/printing-lines-from-a-file-where-a-specific-field-does-not-start-with-something
awk '$7 !~ /^mouse/ { print $0 }' input # column 7 not starting with 'mouse'
awk '$7 ~ /^mouse/ { print $0 }' input  # column 7 starting with 'mouse'
awk '$7 ~ /mouse/ { print $0 }' input  # column 7 containing 'mouse'
</syntaxhighlight>


AWK seems most useful for selecting or counting a subset of rows or columns. It is less commonly used for string substitution.

=== Print the string between two parentheses ===
https://unix.stackexchange.com/questions/108250/print-the-string-between-two-parentheses
<syntaxhighlight lang='bash'>
$ awk -F"[()]" '{print $2}' file

$ echo ">gi|52546690|ref|NM_001005239.1| subfamily H, member 1 (OR11H1), mRNA" | awk -F"[()]" '{print $2}'
OR11H1

$ echo ">gi|284172348|ref|NM_002668.2| proteolipid protein 2 (colonic epithelium-enriched) (PLP2), mRNA" | awk -F"[()]" '{print $2}'
colonic epithelium-enriched  # WRONG: this is the first parenthesized group, not the gene symbol (PLP2)
</syntaxhighlight>
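
One possible way around the WRONG case above is to print the last parenthesized group instead of the first. This is only a sketch: the helper name last_paren is hypothetical, and it assumes the wanted symbol is always the final group and that parentheses are not nested.

```shell
# Hypothetical helper: print the text inside the last pair of parentheses on each line.
# With "(" and ")" as separators, the last group is always the next-to-last field.
last_paren() {
  awk -F'[()]' 'NF >= 3 { print $(NF-1) }'
}

printf '%s\n' '>gi|52546690|ref|NM_001005239.1| subfamily H, member 1 (OR11H1), mRNA' | last_paren
printf '%s\n' '>gi|284172348|ref|NM_002668.2| proteolipid protein 2 (colonic epithelium-enriched) (PLP2), mRNA' | last_paren
```

Lines without any parentheses have fewer than three fields and are skipped.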


=== Insert a line ===
https://stackoverflow.com/a/18276534
<pre>
awk '/KEYWORDS/ { print; print "new line"; next }1' foo.input
</pre>

=== Count number of columns in file ===
https://stackoverflow.com/a/8629351
<pre>
awk -F'|' '{print NF; exit}' stores.dat  # Change '|' as needed
</pre>
 
== sed (stream editor): substitution of text ==
* https://en.wikipedia.org/wiki/Sed

By default, ''sed'' writes the edited text to standard output and leaves the input file unchanged. To save the substitutions back to the same file, use the '''-i''' option.
<syntaxhighlight lang='bash'>
sed 's/text/replace/' file > newfile
mv newfile file
# OR better
sed -i 's/text/replace/' file
</syntaxhighlight>

The '''sed''' command will replace the first occurrence of the pattern in each line. If we want to replace every occurrence, we need to add the '''g''' parameter at the end, as follows:
<syntaxhighlight lang='bash'>
sed -i 's/pattern/replace/g' file
</syntaxhighlight>

To remove blank lines
<syntaxhighlight lang='bash'>
sed '/^$/d' filename
</syntaxhighlight>

To [http://serverfault.com/questions/466118/using-sed-to-remove-both-an-opening-and-closing-square-bracket-around-a-string remove square brackets]
<syntaxhighlight lang='bash'>
# method 1. replace ] & [ by the empty string
$ echo '00[123]44' | sed 's/[][]//g'
0012344
# method 2 - use tr
$ echo '00[123]00' | tr -d '[]'
0012300
</syntaxhighlight>

To replace all three-digit numbers with another specified word in a file
<syntaxhighlight lang='bash'>
sed -i 's/\b[0-9]\{3\}\b/NUMBER/g' filename

echo -e "I love 111 but not 1111." | sed 's/\b[0-9]\{3\}\b/NUMBER/g'
</syntaxhighlight>
where {3} matches the preceding character class exactly three times, the backslashes in \{3\} give { and } their special meaning, and \b is the word boundary marker.

Variable string and quoting
<syntaxhighlight lang='bash'>
text=hello
echo hello world | sed "s/$text/HELLO/"
</syntaxhighlight>
Double quotes let the shell expand $text before sed sees the expression.
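
The difference between the two quoting styles can be seen side by side. A small sketch reusing the variable from the example above:

```shell
text=hello

# Double quotes: the shell substitutes $text, so sed receives s/hello/HELLO/.
echo 'hello world' | sed "s/$text/HELLO/"

# Single quotes: no shell expansion; sed gets the pattern $text verbatim,
# which does not match "hello", so the line comes out unchanged.
echo 'hello world' | sed 's/$text/HELLO/'
```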
 
=== sed takes whatever follows the "s" as the separator ===
* [http://backreference.org/2010/02/20/using-different-delimiters-in-sed/ Using different delimiters in sed]
* http://www.grymoire.com/Unix/Sed.html#uh-2 ,
* https://en.wikipedia.org/wiki/Sed#Substitution_command
 
Suppose I like to replace "../jquery-ui.min.js" with "jquery-ui.js", I can use
{{Pre}}
echo '<script src="../jquery-ui.min.js"></script>' | sed 's|../jquery-ui.min.js|jquery-ui.js|g'
# <script src="jquery-ui.js"></script>
</pre>
 
<syntaxhighlight lang='bash'>
$ cat tmp
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@RG ID:NEAT
$ sed 's,^@RG.*,@RG\tID:None\tSM:None\tLB:None\tPL:Illumina,g' tmp
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@RG ID:None SM:None LB:None PL:Illumina
$ sed 's/^@RG.*/@RG\tID:None\tSM:None\tLB:None\tPL:Illumina/g' tmp
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@RG ID:None SM:None LB:None PL:Illumina
</syntaxhighlight>
 
=== Case insensitive ===
https://www.cyberciti.biz/faq/unixlinux-sed-case-insensitive-search-replace-matching/
<pre>
# Newer version - add 'i' or 'I' after 'g'
sed 's/find-word/replace-word/gI' input.txt > output.txt
sed -i 's/find-word/replace-word/gI' input.txt
 
# Older version/macOS
sed 's/[wW][oO][rR][dD]/replace-word/g' input.txt > output.txt
sed 's/[Ll]inux/Unix/g' input.txt > output.txt
</pre>
 
=== macOS ===
[https://www.mkyong.com/mac/sed-command-hits-undefined-label-error-on-mac-os-x/ "undefined label" error on Mac OS X]
<pre>
$ sed -i 's/mkyong/google/g' testing.txt
sed: 1: "testing.txt": undefined label 'esting.txt'
 
# Solution
$ sed -i '.bak' 's/mkyong/google/g' testing.txt
</pre>
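
BSD sed (macOS) expects a backup suffix after -i, while GNU sed treats it as optional. Writing the suffix attached to the option, with no space, is accepted by both, so the sketch below should behave the same on Linux and macOS. The file name is the one from the example above; the temporary directory is only for the demonstration.

```shell
tmpdir=$(mktemp -d)
printf 'mkyong\n' > "$tmpdir/testing.txt"

# -i.bak (suffix attached, no space) edits in place and keeps a backup copy.
sed -i.bak 's/mkyong/google/g' "$tmpdir/testing.txt"

cat "$tmpdir/testing.txt"       # the edited file
cat "$tmpdir/testing.txt.bak"   # the untouched backup
```

If the backup is not wanted it can be removed afterwards with rm.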
 
=== Application: Get the top directory name of a tarball or zip file without extracting it ===
<syntaxhighlight lang='bash'>
dn=`unzip -vl filename.zip | sed -n '5p' | awk '{print $8}'` # 5 is the line number to print
echo -e "$(basename $dn)"
 
dn=`tar -tf filename.tar.bz2 | grep -o '^[^/]\+' | sort -u`  # '-u' means unique
echo -e $dn
 
dn=`tar -tf filename.tar.gz | grep -o '^[^/]\+' | sort -u`
echo -e $dn


# Assume there is a sub-directory called htslibXXXX
dn=$(basename `find -maxdepth 1 -name 'htslib*'`)
echo -e $dn
</syntaxhighlight>
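
The tar variant can be tried end to end on a throwaway archive. This is a sketch under assumptions: the helper name tar_top_dir and the directory name htslib-demo are made up for the demonstration, and cut is used in place of the grep -o pattern above to take everything before the first slash.

```shell
# Hypothetical helper: list the distinct top-level entries of a tar archive.
tar_top_dir() {
  tar -tf "$1" | cut -d/ -f1 | sort -u
}

# Build a small archive with a single top-level directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/htslib-demo/src"
echo 'hello' > "$workdir/htslib-demo/src/a.txt"
tar -czf "$workdir/demo.tar.gz" -C "$workdir" htslib-demo

tar_top_dir "$workdir/demo.tar.gz"
```

An archive with several top-level entries prints one line per entry, which is a quick way to spot tarballs that unpack into the current directory.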
 
=== Application: Grab the line number from the 'grep -n' command output ===
Follow [http://stackoverflow.com/questions/10589929/find-the-line-number-where-a-specific-word-appears-with-grep here]
<syntaxhighlight lang='bash'>
grep -n 'regex' filename | sed 's/^\([0-9]\+\):.*$/\1/'  # return line numbers for each matches
# OR
grep -n 'regex' filename | awk -F: '{print $1}'
 
echo 123:ABCD | sed 's/^\([0-9]\+\):.*$/\1/'            # 123
</syntaxhighlight>
where '''\(''' and '''\)''' mark the part of the pattern to capture and '''\1''' in the replacement stands for the captured substring. See http://www.grymoire.com/Unix/Sed.html for more examples, e.g. searching for repeated words or special patterns.
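
A short sketch of capture groups with more than one group, plus a back-reference used inside the pattern itself. The inputs are made up for illustration, and [0-9]* is used instead of \+ so the commands do not depend on GNU extensions.

```shell
# Two groups: \1 is the number before the colon, \2 is the rest of the line.
echo '123:ABCD' | sed 's/^\([0-9]*\):\(.*\)$/\2 is on line \1/'

# A back-reference inside the pattern: print only lines where some run of
# lowercase letters is followed by a space and the same run again.
printf 'the the cat\nthe cat\n' | sed -n '/\([a-z][a-z]*\) \1/p'
```

The second pattern is not anchored to word boundaries, so it can also match inside longer words.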
 
If we want to find the top directory of a zipped file (see [https://en.wikipedia.org/wiki/Zip_(file_format) wikipedia] for the zip format), we can use
<syntaxhighlight lang='bash'>
unzip -vl snpEff.zip | head | grep -n 'CRC-32' | awk -F: '{print $1}'
</syntaxhighlight>
 
=== Application: Delete first few characters on each row ===
http://www.theunixschool.com/2014/08/sed-examples-remove-delete-chars-from-line-file.html
 
* To remove 1st n characters of every line:
<syntaxhighlight lang='bash'>
# delete the first 4 characters from each line
$ sed -r 's/.{4}//' file
</syntaxhighlight>
 
=== Application: delete lines ===
[https://linuxhint.com/sed-command-to-delete-a-line/ Sed Command to Delete a Line]
* Delete a single line
* Delete a range of lines
* Delete multiple lines
* Delete all lines except specified range
* Delete empty lines
* Delete lines based on pattern
* Delete lines starting with a specific character
* Delete lines ending with specific character
* Deleting lines that match the pattern and the next line
* Deleting line from the pattern match to the end
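
The cases listed above could look like the following. These commands are illustrations written for this page, not copied from the linked article; the sample file and its contents are made up, and the addr,+N form is a GNU sed extension.

```shell
demo=$(mktemp)
printf 'alpha\n#note\n\nbeta\ngamma;\ndelta\n' > "$demo"

sed '2d' "$demo"            # delete a single line (line 2)
sed '2,4d' "$demo"          # delete a range of lines (2 through 4)
sed '1d;3d;5d' "$demo"      # delete several individual lines
sed '2,4!d' "$demo"         # delete everything except lines 2 through 4
sed '/^$/d' "$demo"         # delete empty lines
sed '/beta/d' "$demo"       # delete lines matching a pattern
sed '/^#/d' "$demo"         # delete lines starting with '#'
sed '/;$/d' "$demo"         # delete lines ending with ';'
sed '/beta/,+1d' "$demo"    # delete the matching line and the next one (GNU sed)
sed '/gamma/,$d' "$demo"    # delete from the first match to the end of the file
```

None of these change the file; add -i to edit it in place.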
 
=== Application: comment out certain lines ===
https://unix.stackexchange.com/a/128595. To comment lines 2 through 4 of bla.conf:
<pre>
sed -i '2,4 s/^/#/' bla.conf
</pre>
This is useful when I need to comment out line 240 & 242 on shell scripts (related to pdf file) generated from BRB-SeqTools.
 
== Substitution of text: perl ==
* Add or remove 'chr' from vcf file https://www.biostars.org/p/18530/
 
== How to delete the first few rows of a text file ==
https://unix.stackexchange.com/questions/37790/how-do-i-delete-the-first-n-lines-of-an-ascii-file-using-shell-commands
 
Suppose we want to remove the first 3 rows of a text file
 
* sed
: <syntaxhighlight lang='bash'>
$ sed -e '1,3d' < t.txt    # output to screen
 
$ sed -i -e 1,3d yourfile  # directly change the file
</syntaxhighlight>
* tail
: <syntaxhighlight lang='bash'>
$ tail -n +4 t.txt    # output to screen
</syntaxhighlight>
* awk
: <syntaxhighlight lang='bash'>
$ awk 'NR > 3 { print }' < t.txt    # output to screen
</syntaxhighlight>
 
== Delete the last row of a file ==
<syntaxhighlight lang='bash'>
sed -i '$d' FILE
</syntaxhighlight>
 
== Show the first few characters from a text file ==
<syntaxhighlight lang='bash'>
head -c 50 file  # return the first 50 bytes
</syntaxhighlight>
 
== Remove/Delete The Empty Lines In A File ==
https://www.2daygeek.com/remove-delete-empty-lines-in-a-file-in-linux/
<pre>
sed -i '/^$/d' File        # delete empty lines
sed -i '/KEYWORD/d' File   # delete lines containing KEYWORD
</pre>
 
== cat: merge by rows ==
<syntaxhighlight lang='bash'>
cat file1 file2 > output
</syntaxhighlight>
 
== paste: merge by columns ==
 
<syntaxhighlight lang='bash'>
paste -d"\t" file1 file2 file3 > output
 
paste file1 file2 file3 | column -t -s $'\t' > output   # -t aligns the columns; -s only takes effect together with -t
</syntaxhighlight>
 
= Web =
Reference: [http://www.amazon.com/Linux-Scripting-Cookbook-Second-Edition/dp/1782162747 Linux Shell Scripting Cookbook]
 
== Copy a complete website ==
<syntaxhighlight lang='bash'>
wget --mirror --convert-links URL
# OR
wget -r -N -k -l DEPTH URL
</syntaxhighlight>
 
== HTTP or FTP authentication ==
<syntaxhighlight lang='bash'>
wget --user username --password pass URL
</syntaxhighlight>
 
== Download a web page as plain text (instead of HTML text) ==
<syntaxhighlight lang='bash'>
lynx URL -dump > TextWebPage.txt
</syntaxhighlight>
 
== cURL ==
<syntaxhighlight lang='bash'>
curl http://google.com -o index.html --progress-bar
curl http://google.com --silent -o index.html
 
# Cookies
curl http://example.com --cookie "user=ABCD;pass=EFGH"
curl URL --cookie-jar cookie_file
 
# Setting a user agent string
# http://www.useragentstring.com/pages/useragentstring.php
curl URL --user-agent "Mozilla/5.0"
 
# Authenticating
curl -u user:pass http://test_auth.com
curl -u user http://test_auth.com
 
# Printing response headers excluding the data
# For example, to check whether a page is reachable or not
# by checking the 'Content-length' parameter.
curl -I URL
</syntaxhighlight>
 
== Image crawler and downloader ==
<syntaxhighlight lang='bash'>
#!/bin/bash
#Desc: Images downloader
#Filename: img_downloader.sh
 
if [ $# -ne 3 ];
then
  echo "Usage: $0 URL -d DIRECTORY"
  exit 1
fi
 
for i in {1..4}
do
  case $1 in
  -d) shift; directory=$1; shift ;;
  *) url=${url:-$1}; shift;;
  esac
done
 
mkdir -p $directory;
baseurl=$(echo $url | egrep -o "https?://[a-z.]+")
 
echo Downloading $url
curl -s $url | egrep -o "<img src=[^>]*>" |
sed 's/<img src=\"\([^"]*\).*/\1/g' > /tmp/$$.list
 
sed -i "s|^/|$baseurl/|" /tmp/$$.list
 
cd $directory;
 
while read filename;
do
  echo Downloading $filename
  curl -s -O "$filename"
 
done < /tmp/$$.list
</syntaxhighlight>
 
== Find broken links in a website by '''lynx -traversal''' ==
<syntaxhighlight lang='bash'>
#!/bin/bash
#Desc: Find broken links in a website
 
if [ $# -ne 1 ];
then
  echo -e "Usage: $0 URL\n"
  exit 1;
fi
 
echo Broken links:
 
mkdir /tmp/$$.lynx
cd /tmp/$$.lynx
 
lynx -traversal $1 > /dev/null
count=0;
 
sort -u reject.dat > links.txt
 
while read link;
do
  output=`curl -I $link -s | grep "HTTP/.*OK"`;
  if [[ -z $output ]];
  then
    echo $link;
    let count++
  fi
done < links.txt
 
[ $count -eq 0 ] && echo No broken links found.
</syntaxhighlight>
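
The test grep "HTTP/.*OK" in the script above depends on the reason phrase "OK". HTTP/2 responses carry no reason phrase (curl -I prints a status line such as "HTTP/2 200"), so such pages would all be reported as broken. A sketch of a check on the numeric status code instead; the helper name status_is_alive is hypothetical, and treating every 2xx and 3xx code as "not broken" is an assumption.

```shell
# Hypothetical helper: succeed if the first line on stdin is a status line
# with a 2xx or 3xx code, fail otherwise (including on empty input).
status_is_alive() {
  head -n 1 | tr -d '\r' | awk '{ code = $2 } END { exit !(code ~ /^[23][0-9][0-9]$/) }'
}

# Tried on canned status lines, so no network access is needed.
printf 'HTTP/1.1 200 OK\r\n'        | status_is_alive && echo alive
printf 'HTTP/2 200 \r\n'            | status_is_alive && echo alive
printf 'HTTP/1.1 404 Not Found\r\n' | status_is_alive || echo broken
```

In the loop above, the condition could then be written as: if ! curl -I -s "$link" | status_is_alive; then echo "$link"; fi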
 
== Track changes to a website ==
<syntaxhighlight lang='bash'>
#!/bin/bash
#Desc: Script to track changes to webpage
 
if [ $# -ne 1 ];
then
  echo -e "Usage: $0 URL\n"
  exit 1;
fi
 
first_time=0
# Not first time
 
if [ ! -e "last.html" ];
then
  first_time=1
  # Set it is first time run
fi
 
curl --silent $1 -o recent.html
 
if [ $first_time -ne 1 ];
then
  changes=$(diff -u last.html recent.html)
  if [ -n "$changes" ];
  then
    echo -e "Changes:\n"
    echo "$changes"
  else
    echo -e "\nWebsite has no changes"
  fi
else
  echo "[First run] Archiving.."
 
fi
 
cp recent.html last.html
</syntaxhighlight>
 
== POST/GET ==
Look at the web page source and find the 'name' attribute of each <input> tag; these are the variable names to post.
 
http://www.w3schools.com/html/html_forms.asp
<syntaxhighlight lang='bash'>
# -d is used for posting in curl
curl URL -d "postvar1=var1&postvar2=var2"
# OR the 'wget' command with the 'post-data' option
wget URL --post-data "postvar1=var1&postvar2=var2" -O out.html
</syntaxhighlight>
 
== Change detection of a website ==
* http://bhfsteve.blogspot.com/2013/03/monitoring-web-page-for-changes-using.html
* https://www.reddit.com/r/commandline/comments/2e2bkj/linux_software_to_monitor_website_changes/
* http://specto.sourceforge.net/ and https://www.linux.com/news/monitor-web-page-changes-specto
* http://www.mostlymaths.net/2010/01/cron-diff-wget-watch-changes-in-webpage.html
 
= Working with Files =
== '''iconv''' command ==
* [https://www.howtogeek.com/iconv-command-linux/ How To Use the iconv Command on Linux]
* [http://www.tecmint.com/convert-files-to-utf-8-encoding-in-linux/ How to Convert Files to UTF-8 Encoding in Linux]
* https://stackoverflow.com/questions/11316986/how-to-convert-iso8859-15-to-utf8
 
<pre>
$ file test.R
test.R: ISO-8859 text, with CRLF line terminators
$ iconv -f ISO-8859 -t UTF-8 test.R  # 'ISO-8859' is not supported
$ iconv -t UTF-8 test.R              # partial conversion??
$ iconv -f ISO-8859-1 -t UTF-8 test.R # Works
</pre>
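
To see what the conversion actually does to the bytes, a one-line Latin-1 file can be built by hand and inspected with od. A minimal sketch; the sample text is made up.

```shell
# Build a one-line ISO-8859-1 file: the single byte 0xE9 (octal 351) is "é" in Latin-1.
latin1=$(mktemp)
printf 'caf\351\n' > "$latin1"

# Convert to UTF-8 and dump the result in hex: "é" becomes the two bytes c3 a9.
iconv -f ISO-8859-1 -t UTF-8 "$latin1" | od -An -tx1
```

Redirect the iconv output to a new file to keep the converted copy; iconv does not modify its input.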
 
== '''nl''' command ==
Add line numbers to a text file
<syntaxhighlight lang='bash'>
$ cat demo_file
THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.
this line is the 1st lower case line in this file.
This Line Has All Its First Character Of The Word With Upper Case.
 
Two lines above this line is empty.
And this is the last line.
$ nl demo_file
    1 THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.
    2 this line is the 1st lower case line in this file.
    3 This Line Has All Its First Character Of The Word With Upper Case.
     
    4 Two lines above this line is empty.
    5 And this is the last line.
</syntaxhighlight>
 
== '''file''' command ==
<pre style="white-space: pre-wrap; /* CSS 3 */ white-space: -moz-pre-wrap; /* Mozilla, since 1999 */ white-space: -pre-wrap; /* Opera 4-6 */ white-space: -o-pre-wrap; /* Opera 7 */ word-wrap: break-word; /* IE 5.5+ */ " >
$ file thumbs/g7.jpg
thumbs/g7.jpg: JPEG image data, JFIF standard 1.01, resolution (DPI), density 72x72, segment length 16, Exif Standard: [TIFF image data, little-endian, direntries=10, orientation=upper-left, xresolution=134, yresolution=142, resolutionunit=2, software=Adobe Photoshop CS Windows, datetime=2004:03:31 22:28:58], baseline, precision 8, 100x75, frames 3
 
$ file index.html
index.html: HTML document, ASCII text
 
$ file 2742OS_5_01.sh
2742OS_5_01.sh: Bourne-Again shell script, ASCII text executable
 
$ file R-3.2.3.tar.gz
R-3.2.3.tar.gz: gzip compressed data, last modified: Thu Dec 10 03:12:50 2015, from Unix
</pre>
 
== date ==
[https://www.networkworld.com/article/3481602/displaying-dates-and-times-your-way-in-linux.html Displaying dates and times your way in Linux]
 
== print by skipping rows ==
http://stackoverflow.com/questions/604864/print-a-file-skipping-x-lines-in-bash
<syntaxhighlight lang='bash'>
$ tail -n +<N+1> <filename>  # excluding first N lines
                            # print by starting at line N+1.
$ tail -n +11 /tmp/myfile    # starting at line 11, or skipping the first 10 lines
</syntaxhighlight>
 
== '''tail -f''' (follow) ==
When we use the '-f' (follow) option, we can monitor a growing file. For example, create a new file called tmp.txt and run 'tail -f tmp.txt'. Then open another terminal in the same directory and run 'for i in {0..100}; do sleep 2; echo $i >> tmp.txt ; done'. The first terminal shows each new line as it is appended to tmp.txt.
 
A practical example is
* Monitor system change
<syntaxhighlight lang='bash'>
sudo tail -f /var/log/syslog
</syntaxhighlight>
* Monitor a file and stop automatically when a given process dies
<syntaxhighlight lang='bash'>
PID=$(pidof Foo)
tail -f textfile --pid $PID
</syntaxhighlight>
Here a process Foo (e.g. gedit) is appending data to a file, and tail -f keeps running until Foo dies.
 
== Low-level File Access ==
* file descriptors: 0 means standard input, 1 means standard output, 2 means standard error.
* '''ssize_t write(int fildes, const void *buf, size_t nbytes);''' returns the number of bytes actually written, or -1 on error.
<pre>
#include <unistd.h>
#include <stdlib.h>
int main()
{
  if ((write(1, "Here is some data\n", 18)) != 18)
    write(2, "A write error has occurred on file descriptor\n", 46);
  exit(0);
}
</pre>
* '''ssize_t read(int fildes, void *buf, size_t nbytes);''' returns the number of data bytes actually read. If a read call returns 0, it had nothing to read; it reached the end of the file. An error on the call will cause it to return -1.
* To create a new file descriptor we use the open system call. '''int open(const char *path, int oflags, mode_t mode);'''
 
* The next program will do file copy.
<pre>
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
int main()
{
  char c;
  int in, out;
  in = open("file.in", O_RDONLY);
  out = open("file.out", O_WRONLY|O_CREAT, S_IRUSR|S_IWUSR);
  while(read(in,&c,1) == 1)
    write(out,&c,1);
  exit(0);
}
</pre>
 
== The Standard I/O Library ==
* fopen, fclose
* fread, fwrite
* fflush
* fseek
* fgetc, getc, getchar
* fputc, putc, putchar
* fgets, gets
* printf, fprintf and sprintf
* scanf, fscanf and sscanf
 
== Formatted Input and Output ==
* printf, fprintf and sprintf
* scanf, fscanf and sscanf
 
== Stream Errors ==
* [http://www.howtogeek.com/223850/how-do-you-run-a-command-in-the-background-with-no-output-unless-there-is-an-error/ How do You Run a Command in the Background with No Output Unless There is an Error?]
 
== File and Directory Maintenance ==


== Scanning Directories ==
* opendir, closedir
* readdir
* telldir
* seekdir
 
= UNIX environment =
== Logging ==
== Resources and Limits ==
 
= Terminals =
== Fun command line utilities ==
[https://ostechnix.com/fun-linux-command-line-tools/ Turn Your Terminal Into A Playground: 20+ Funny Linux Command Line Tools]: cowsay, fortune, figlet, sl, ASCIIquarium, cmatrix, lolcat, ponysay, charasay, party parrot, ternimal, paclear, lavat, pond, cbonsai, dotacat, finger, pinky, no more secrets, hollywood, bucklespring, bb, toilet, sl-alt, fetch utilities, telehack, display star wars episode.
 
== Reading from and Writing to the Terminal ==
 
== The termios Structure ==
== Terminal Output ==
 
== Detecting Keystrokes ==
 
= Curses =
A technique between command line and full GUI.
 
Example: vi.
 
= Data Management =
 
= Development Tools =
== Books ==
[https://www.hpe.com/us/en/insights/articles/top-linux-developers-recommended-programming-books-1808.html Top Linux developers' recommended programming books]
 
== GNU Make and Makefiles ==
* [http://kbroman.org/minimal_make/ minimal make] A minimal tutorial on make from Karl Broman.
* http://makefiletutorial.com/index.html
* [http://gromnitsky.users.sourceforge.net/articles/notes-for-new-make-users/ Notes for new Make users]
 
== Writing a Manual Page ==
 
== Distributing Software ==
=== The patch Program ===
 
= Debugging =
== debug a bash shell ==
[https://www.cyberciti.biz/tips/debugging-shell-script.html How To Debug a Bash Shell Script Under Linux or UNIX]

== gdb ==

Latest revision as of 16:38, 21 November 2024

Shell Programming

Some Resources

Understand shell command options

explainshell.com. For example, https://explainshell.com/explain?cmd=rsync+-avz+--progress+--partial+-e

Check shell scripts

How To Validate the Syntax of a Linux Bash Script Before Running It

ShellCheck & download the binary from Launchpad.

If a statement is missing a single quote, the shell may report the error on a different line (though the error message is still useful). It is therefore worth verifying the syntax of the script before running it.

Writing Secure Shell Scripts

Writing Secure Shell Scripts

Bioinformatics

Bioinformatics one-liners

Data science

Data Science at the Command Line Obtain, Scrub, Explore, and Model Data with Unix Power Tools

Special characters

15 Special Characters You Need to Know for Bash

Progress bar

How to Add a Simple Progress Bar in Shell Script

Simple calculation

echo

echo $(( 11/5 ))
# or
echo $((11/5))

Note: shell arithmetic is integer-only, so the result is truncated (11/5 gives 2).

bc: an arbitrary precision calculator language

bc -l <<< "11/5"
# Without '-l' we only get the integer part
# Or interactive
bc -i
scale=5
11/5
quit

where -l means to use the predefined math routines and <<< is a here string. Note bc returns a real number.
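
The truncation can be checked directly in the shell. A minimal sketch (the variable names are illustrative only): it shows the quotient and remainder, plus one way to get two decimal places by scaling the numerator when bc is not available.

```shell
#!/usr/bin/env bash
# Integer division inside $(( )) drops the fractional part.
quotient=$(( 11 / 5 ))       # 2
remainder=$(( 11 % 5 ))      # 1

# Fixed-point workaround: multiply by 100 first, then split off two digits.
scaled=$(( 11 * 100 / 5 ))   # 220
printf -v fixed '%d.%02d' $(( scaled / 100 )) $(( scaled % 100 ))

echo "quotient=$quotient remainder=$remainder fixed=$fixed"
```

For anything beyond a couple of digits, bc (or awk) is the more natural tool.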

Here documents

<<

#!/bin/bash

cat <<!FUNKY!
hello
this is a here
document
$var on line
!FUNKY!

To disable parameter expansion (such as $HOME), command substitution and arithmetic expansion inside the here document, quote the delimiter, for example <<'EOF' (or <<'!FUNKY!' in the example above).
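
A small sketch of the difference between an unquoted and a quoted delimiter (the variable names are made up for the illustration):

```shell
#!/usr/bin/env bash
var="expanded"

# Unquoted delimiter: $var is expanded inside the here document.
unquoted=$(cat <<EOF
value is $var
EOF
)

# Quoted delimiter: the body is taken literally.
quoted=$(cat <<'EOF'
value is $var
EOF
)

echo "$unquoted"   # value is expanded
echo "$quoted"     # value is $var
```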

<<< here string

http://linux.die.net/abs-guide/x15683.html

Redirect

stdin, stdout, and stderr

What Are stdin, stdout, and stderr on Linux?

Redirecting output. File descriptor 1 is standard output and file descriptor 2 is standard error.

./myProgram > stdout.txt        # redirect std out to <stdout.txt>
./myProgram 2> stderr.txt       # redirect std err to <stderr.txt> by using the 2> operator
./myProgram > stdout.txt 2> stderr.txt # combination of above two
./myProgram > stdout.txt 2>&1   # redirect std err to std out <stdout.txt>
./myProgram >& /dev/null        # prevent writing std out and std err to the screen
ps >> output.txt                # append
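
To try these redirections without a real program, here is a sketch in which a hypothetical function emit stands in for ./myProgram and writes one line to each stream. The file names and the temporary directory are assumptions of the example.

```shell
#!/usr/bin/env bash
# emit is a made-up stand-in for ./myProgram.
emit() {
  echo "to stdout"
  echo "to stderr" >&2
}

workdir=$(mktemp -d)

emit > "$workdir/out.txt" 2> "$workdir/err.txt"   # one file per stream
emit > "$workdir/both.txt" 2>&1                   # both streams, one file
emit >> "$workdir/out.txt" 2> /dev/null           # append stdout, drop stderr

err_text=$(cat "$workdir/err.txt")
out_lines=$(( $(wc -l < "$workdir/out.txt") ))    # 2: original line + appended line
both_lines=$(( $(wc -l < "$workdir/both.txt") ))  # 2: stdout line + stderr line
echo "err.txt: $err_text; out.txt: $out_lines lines; both.txt: $both_lines lines"

rm -rf "$workdir"
```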

Redirecting input

./myProgram < input.txt

Using cat or echo to create a new file that needs sudo right

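The following command does not work, because the redirection is performed by the current (unprivileged) shell; only cat itself runs as root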

sudo cat myFile > /opt/myFile

Solution 1 (sudo sh -c). We can use something like

sudo sh -c 'cat myFile > /opt/myFile'

Solution 2 (sudo tee). See 'How To Configure Nginx as a Web Server and Reverse Proxy for Apache on One Ubuntu 16.04 Server'

echo "<?php phpinfo(); ?>" | sudo tee /var/www/html/info.php

If we want to append something to an existing file, use -a option in the tee command.

Create a simple text file with multiple lines; write data to a file in bash script

Each of the methods below can be used in a bash script.

# Method 1: printf. We can add \t for tab delimiter
$ printf '%s\n' 'Line 1' 'Line 2' 'Line 3' > out.txt

# Method 2: echo. We can add \t for tab delimiter
$ echo -e 'Line 1\t12\t13
> Line 2\t22\t23
> Line 3\t32\t33' > out.txt

# Method 3: echo
$ echo $'Line 1\nLine 2\nLine 3' > out.txt

# Method 4: here document, http://tldp.org/LDP/abs/html/here-docs.html
# For the TAB character, use Ctrl-V, TAB.
# Note that first line can be: cat <<EOF > out.txt
# The filename can be a variable if this is used inside a bash file
$ cat > out.txt <<EOF
> line1   Second
> lin2    abcd
> line3ss dkflaf
> EOF
$

See also How to use a here documents to write data to a file in bash script

To escape the quotes, use a back slash. For example

echo $'#!/bin/bash\nmodule load R/3.6.0\nRscript --vanilla -e "rmarkdown::render(\'gse6532.Rmd\')"'

prints

#!/bin/bash
module load R/3.6.0
Rscript --vanilla -e "rmarkdown::render('gse6532.Rmd')"

>&

&> file is not part of the official POSIX shell spec, but has been added to many Bourne shells as a convenience extension (it originally comes from csh). In a portable shell script (and if you don't need portability, why are you writing a shell script?), use > file 2>&1 only.

Redirect Output and Errors To /dev/null

http://www.cyberciti.biz/faq/how-to-redirect-output-and-errors-to-devnull/

command > /dev/null 2>&1
# OR
command &>/dev/null

In addition we can put a process in the background by adding the '&' sign; see the dclock example.

tee: redirect to both a file and the screen at the same time

To send output to both a file and the screen at the same time, use the tee command. In Bash, |& is shorthand for 2>&1 |.

command1 |& tee log.txt
## or ##
command1 -arg |& tee log.txt
## or ##
command1 2>&1 | tee log.txt

# use the option '-a' for *append*
echo "new line of text" | sudo tee -a /etc/apt/sources.list

# redirect output of one command to another
ls file* | tee output.txt | wc -l

# streaming file (e.g. running an arduino sketch on Udoo)
# for streaming files, cp command (still need Ctrl + c) will not 
# show anything on screen though copying is executed.
cat /dev/ttymxc3 | tee out.txt      # Ctrl + c
command > >(tee stdout.log) 2> >(tee stderr.log >&2)

Methods To Create A File In Linux

10 Methods To Create A File In Linux

Prepend

BASH Prepend A Text / Lines To a File

Pipe

The operator is |.

ps > psout.txt
sort psout.txt > pssort.out

can be simplified to

ps | sort > pssort.out

For example,

$ head /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync

$ cat /etc/passwd | cut -d: -f7 | sort | uniq -c | sort -nr
     18 /bin/sh
     13 /bin/false
      2 /bin/bash
      1 /bin/sync

where the cut command extracts the 7th field, using : as the delimiter, and writes it to the output stream. The first sort command sorts its input lines alphabetically, so that identical lines end up next to each other. The uniq -c command collapses adjacent duplicate lines and prefixes each remaining line with its count. The final sort -nr command sorts its input numerically in reverse order.

Dash (-) at the end of a command mean?

Process substitution

https://en.wikipedia.org/wiki/Process_substitution

Power of pipes

Consider the following commands (samtools gives its output on stdout which is a good opportunity to use pipes)

samtools mpileup -go temp.bcf -uf genome.fa  dedup.bam
bcftools call -vmO v -o sample1_raw.vcf temp.bcf

The disadvantage of this approach is that it creates a temporary file (temp.bcf in this case). If the temporary file is very large (several hundred GB), it wastes disk space, not to mention the time spent writing it. With a pipe, we save both the time and the disk space.

samtools mpileup -uf genome.fa  dedup.bam | bcftools call -vmO v -o sample1_raw.vcf

Send a stdout to a remote computer

See here (bypass SSH password) for a case (utilize cat, ssh and >> commands).

Execute a bash script downloaded (without saving first) from the internet

See the example of install Gitlab

sudo curl -sS https://packages.gitlab.com/install/repositories/gitlab/raspberry-pi2/script.deb.sh | sudo bash

where -s means silent and -S means show an error message if it fails. Because curl writes the downloaded file to standard output, piping it into bash runs the script without saving it first.

Use wget to download and decompress at one line

https://stackoverflow.com/questions/16262980/redirect-pipe-wget-download-directly-into-gunzip

wget -O - ftp://ftp.direcory/file.gz | gunzip -c > file.out

where "-O -" means to print to standard output (sort of like the default behavior of "curl"). See https://www.gnu.org/software/wget/manual/wget.html

Use pipe and while loop to process multiple files

See an example at while.

Pipe vs redirect

  • Pipe is used to pass output to another program or utility.
  • Redirect is used to pass output to either a file or stream.

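In other words, thing1 | thing2 gives roughly the same result as thing1 > temp_file && thing2 < temp_file, except that the pipe runs both commands concurrently, needs no temporary file, and runs thing2 even if thing1 fails.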

Shebang (#!)

A shebang is the character sequence consisting of the characters number sign and exclamation mark (that is, "#!") at the beginning of a script. See the Wikipedia page.

The syntax looks like

#! interpreter [optional-arg]

For example,

  • #!/bin/sh — Execute the file using sh, the Bourne shell, or a compatible shell
  • #!/bin/csh -f — Execute the file using csh, the C shell, or a compatible shell, and suppress the execution of the user’s .cshrc file on startup
  • #!/usr/bin/perl -T — Execute using Perl with the option for taint checks

When Is It Better to Use #!/bin/bash Instead of #!/bin/sh in a Shell Script?

http://www.howtogeek.com/276607/when-is-it-better-to-use-bin-bash-instead-of-bin-sh-in-a-shell-script/

Howto Make Script More Portable With #!/usr/bin/env As a Shebang

https://www.cyberciti.biz/tips/finding-bash-perl-python-portably-using-env.html

This is useful if the interpreter location is different on Linux and macOS.

# Linux
$ which Rscript
/usr/bin/Rscript
# Mac
$ which Rscript
/usr/local/bin/Rscript

We can use the following on the first line of the shell script.

#!/usr/bin/env Rscript

Comments

For a single line, we can use the '#' sign. Shell Script Put Multiple Line Comments under Bash/KSH.

For a block of code, we use

#!/bin/bash
echo before comment
: <<'END'
bla bla
blurfl
END
echo after comment

Variables

food=Banana
echo $food
food="Apple"
echo $food

When do I need to use the export command

Consider the following

MY_DIRECTORY=/path/to/my/directory
export MY_DIRECTORY
./my_script.sh

If you don’t use the export command in the above example, the MY_DIRECTORY variable will not be available to the my_script.sh script. It will only be available within the current shell session as a local shell variable.

When you set a variable in a shell session without using the export command, it is only available within that shell session as a local shell variable. This means that the variable and its value are only accessible within the current shell session and are not passed to child processes (e.g. my_script.sh) or other programs that are started from the command line.
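
A quick way to see the effect, assuming bash is on the PATH. The variable names PLAIN_VAR and EXPORTED_VAR are made up for this sketch, and bash -c plays the role of my_script.sh.

```shell
#!/usr/bin/env bash
unset PLAIN_VAR EXPORTED_VAR       # start from a clean slate

PLAIN_VAR="shell only"             # ordinary shell variable
export EXPORTED_VAR="inherited"    # marked for export to child processes

# bash -c starts a child process; the single quotes defer expansion to the child.
child_plain=$(bash -c 'echo "${PLAIN_VAR:-<unset>}"')
child_exported=$(bash -c 'echo "${EXPORTED_VAR:-<unset>}"')

echo "child sees PLAIN_VAR=$child_plain"         # <unset>
echo "child sees EXPORTED_VAR=$child_exported"   # inherited
```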

Cf. When I put LS_COLORS in the .bashrc file, I don't need to use the export command.

export -n command: remove from environment

https://linuxconfig.org/learning-linux-commands-export

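export marks a variable so that it is passed to child processes; export -n removes that mark again (the variable itself stays set in the current shell). For example

$ export MYVAR=10      # export a variable
$ export -n MYVAR      # stop exporting it; MYVAR is still set in this shell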

To see the current process ID, use

echo $$

To start a child shell (a new process), for example to check whether a variable was inherited, use

bash

When used without any options or arguments, export simply prints all names marked for export to child processes.

$ export
declare -x EDITOR="nano"
declare -x HISTTIMEFORMAT="%d/%m/%y %T "
declare -x HOME="/home/brb"
declare -x LANG="en_US.UTF-8"
declare -x LESSCLOSE="/usr/bin/lesspipe %s %s"
declare -x LESSOPEN="| /usr/bin/lesspipe %s"
declare -x LOGNAME="brb"
...
declare -x PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games"
declare -x PWD="/home/brb"
declare -x SHELL="/bin/bash"
...
declare -x USER="brb"
declare -x VISUAL="nano"

echo command

String manipulation

http://www.thegeekstuff.com/2010/07/bash-string-manipulation/

dirname and basename commands

http://www.tldp.org/LDP/LG/issue18/bash.html

# On directories
$ dirname ~/Downloads
/home/chronos/user
$ basename ~/Downloads
Downloads

# On files
$ dirname ~/Downloads/DNA_Helix.zip
/home/chronos/user/Downloads

$ basename ~/Downloads/DNA_Helix.zip
DNA_Helix.zip
$ basename ~/Downloads/DNA_Helix.zip .zip
DNA_Helix
$ basename ~/Downloads/annovar.latest.tar.gz
annovar.latest.tar.gz
$ basename ~/Downloads/annovar.latest.tar.gz .gz
annovar.latest.tar
$ basename ~/Downloads/annovar.latest.tar.gz .tar.gz
annovar.latest
$ basename ~/Downloads/annovar.latest.tar.gz .latest.tar.gz
annovar

Escape characters and quotes

echo $USER  # brb

echo My name is $USER

echo "My name is $USER"  # My name is brb

echo 'My name is $USER'  # My name is $USER; single quotes prevent variable expansion
          # we use the single quotes if we want to present the characters literally
          # instead of letting the shell interpret them.
grep '.*/udp' /etc/services  # the quotes stop the shell from expanding * as a glob, so grep receives the regex unchanged
   
echo \$USER   # we escape $ so $ lost its special meaning

echo '\'

echo \'text\'  # 'text'

When to use double quotes with a variable

when to use double quotes with a variable in shell script?

Concatenate string variables (not safe)

http://stackoverflow.com/questions/4181703/how-can-i-concatenate-string-variables-in-bash

a='hello'
b='world'
c=$a$b
echo $c

# Bash also supports a += operator 
$ A="X Y"
$ A+="Z"
$ echo "$A"

Often we need to use "double quotes" around the string variables if the string variables represent some directories.

mkdir "tmp 1"
touch "tmp 1/tmpfile"

tmpvar="tmp 1"
echo "$tmpvar"
# tmp 1

ls $tmpvar
ls: cannot access tmp: No such file or directory
ls: cannot access 1: No such file or directory
ls "$tmpvar"
# tmpfile
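
The same effect can be observed without touching the file system. In this sketch count_args is a hypothetical helper that only reports how many arguments it received; the default IFS is assumed.

```shell
#!/usr/bin/env bash
# count_args is a made-up helper: it prints the number of arguments it got.
count_args() { echo "$#"; }

dir="tmp 1"
unquoted=$(count_args $dir)     # word splitting: "tmp" and "1"
quoted=$(count_args "$dir")     # a single argument: "tmp 1"

echo "unquoted=$unquoted quoted=$quoted"   # unquoted=2 quoted=1
```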

However, for integers, += inside (( )) adds instead of concatenating

a=24
echo $a
24
((a+=12))
echo $a
36

Note that the double parentheses construct in ((a+=12)) permits arithmetic expansion and evaluation.

${parameter} - Concatenate a string variable and a constant string; variable substitution

Parameter substitution ${}. Cf $() for command execution

x=foo
y=bar
z=$x$y        # $z is now "foobar"
z="$x$y"      # $z is still "foobar"
z="$xand$y"   # does not work: $xand is read as a variable named xand (unset), so z is "bar"
z="${x}and$y" # does work, "fooandbar"

And

your_id=${USER}-on-${HOSTNAME}
echo "$your_id"

echo "Old \$PATH = $PATH"
PATH=${PATH}:/opt/bin  # Add /opt/bin to $PATH for duration of script.
echo "New \$PATH = $PATH"

And using "{" in order to create a new string based on an existing variable

pdir="/tmp/files/today"
fname="report"
mkdir -p $pdir

touch $pdir/$fname  # OK
ls -l $pdir/$fname

touch $pdir/$fname_new  # No error, but no new file is created: the shell looks up
                        # a variable named fname_new, which does not exist yet
ls $pdir/$fname_new

touch $pdir/${fname}_new
ls $pdir/${fname}_new   # Works

$(command) - Command Execution and Assign Output of Shell Command To a Variable; Command substitution

Bash Assign Output of Shell Command To Variable

$(command)
`command`    # ` is a backquote/backtick, not a single quotation sign
             # this is a legacy support; not recommended by https://www.shellcheck.net/

Note all new scripts should use the $(...) form, which was introduced to avoid some rather complex rules.

Example 1.

sudo apt-get install linux-headers-$(uname -r)

Example 2.

user=$(echo "$UID")

Example 3.

#!/bin/sh
echo The current directory is $PWD
echo The current users are $(who)
sudo chown `id -u` SomeDir  # change the ownership to the current user. Dangerous!
                            # Or sudo chown `whoami` SomeDirOrSomeFile
exit 0

Example 4. Create a new file with automatically generated filename

touch file-$(date -I)

Example 5. Use $(your expression) to run nested expressions. For example,

# cd into the directory containing the 'touch' command. 
cd $(dirname $(type -P touch))

BACKUPDIR=/nas/backup
LASTDAYPATH=${BACKUPDIR}/$(ls ${BACKUPDIR} | tail -n 1)

The concept of putting the result of a command into a script variable is very powerful, as it makes it easy to use existing commands in scripts and capture their output.

Arithmetic Expansion

$((...))

is a better alternative to the expr command. More examples:

for i in $(seq 1 3)
  do echo SRR$(( i + 1027170 ))'_1'.fastq 
done

Note that the single quotes around _1 are optional here, since the arithmetic expansion already ends at )). The above will output SRR1027171_1.fastq, SRR1027172_1.fastq and SRR1027173_1.fastq.

Parameter Expansion

${parameter}

Double Parentheses (())

Bash Shell Scripting for beginners (Part 1) fedoramagazine. Double parentheses are simple: they are for arithmetic evaluation.

extract substring

https://www.cyberciti.biz/faq/how-to-extract-substring-in-bash/

${parameter:offset:length}

Example:

## define var named u ##
u="this is a test"

var="${u:10:4}"
echo "${var}"

Or use the cut command.

u="this is a test"
echo "$u" | cut -d' ' -f 4
echo "$u" | cut --delimiter=' ' --fields=4
##########################################
## WHERE
##   -d' ' : Use a whitespace as delimiter
##   -f 4  : Select only 4th field
##########################################
var="$(cut -d' ' -f 4 <<< $u)"
echo "${var}"

Environment variables

How to Set Environment Variables in Bash on Linux

$HOME
$PATH
$0 -- name of the shell script
$# -- number of parameters passed (it does not include the script name itself)
$$ -- process ID of the shell script, often used inside a script for generating unique temp filenames
$? -- the exit value of the last run command; 0 means OK and non-zero means something went wrong
$_ -- previous command's last argument

Example 1 (check if a command run successfully):

some_command
if [ $? -eq 0 ]; then
    echo OK
else
    echo FAIL
fi
# OR
if some_command; then
    printf 'some_command succeeded\n'
else
    printf 'some_command failed\n'
fi

$ tabix -f -p vcf ~/SeqTestdata/usefulvcf/hg19/CosmicCodingMuts.vcf.gz
$ echo $?
0
$ tabix -f -p vcf ~/Downloads/CosmicCodingMuts.vcf.gz
Not a BGZF file: /home/brb/Downloads/CosmicCodingMuts.vcf.gz
tbx_index_build failed: /home/brb/Downloads/CosmicCodingMuts.vcf.gz
$ echo $?
1
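
One pitfall worth a sketch: $? is overwritten by every command, so it should be saved immediately if it is needed later. The helper describe below is hypothetical, and true/false stand in for real commands.

```shell
#!/usr/bin/env bash
false
saved=$?        # 1: exit status of false, captured right away
true
later=$?        # 0: $? now reflects true, the earlier value is gone

# describe is a made-up helper that turns a status into a message.
describe() {
  if [ "$1" -eq 0 ]; then echo "OK"; else echo "FAIL ($1)"; fi
}

first_msg=$(describe "$saved")
second_msg=$(describe "$later")
echo "$first_msg / $second_msg"   # FAIL (1) / OK
```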

Example 2 (check whether a host is reachable)

ping DOMAIN -c2 &> /dev/null
if [ $? -eq 0 ];
then
  echo Successful
else
  echo Failure
fi

where -c is used to limit the number of packets to be sent and &> /dev/null is used to redirect both stderr and stdout to /dev/null so that it won't be printed on the terminal.

Example 3 (check whether the user has supplied the correct number of parameters):

#!/bin/bash
if [ $# -ne 2 ]; then
  echo "Usage: $0 match_text filename"
  exit 1
fi

match_text=$1
filename=$2

Example 4 (make a new directory and cd to it)

mkdir -p "newDir/subDir"; cd "$_"

How to List Environment Variables

How to List Environment Variables on Linux

printenv

Unset/Remove an environment variable

$ export MSG="HELLO WORLD"
$ echo $MSG
HELLO WORLD
$ unset MSG
$ echo $MSG

$

Set an environment variable and run a command on the same line, env command

Parameter variables

$1, $2, .... -- parameters given to the script
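$* -- all the parameters; "$*" expands to a single word, joined by the first character of IFS (a space by default)
$@ -- all the parameters; "$@" expands to one word per parameter, which is usually what you want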
$! -- the process id of the last command run in the background.
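
The difference between "$*" and "$@" shows up as soon as a parameter contains a space. A minimal sketch, with the hypothetical helpers count_args and demo and the default IFS assumed:

```shell
#!/usr/bin/env bash
# count_args is a made-up helper: it prints the number of arguments it got.
count_args() { echo "$#"; }

demo() {
  star=$(count_args "$*")   # all parameters joined into one word
  at=$(count_args "$@")     # one word per parameter, spaces preserved
}

demo "first song" "second song"
echo "\"\$*\" gives $star argument(s), \"\$@\" gives $at argument(s)"
```

Here star is 1 and at is 2.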

Example 1.

#!/bin/bash
echo "$1 likes to eat $2 and $3 every day."
echo "bye:-)"

Example 2.

$ touch /tmp/tmpfile_$$

$ set foo bar bam
$ echo $#
3
$ echo $@
foo bar bam
$ set foo bar bam &
[1] 28212
$ echo $!
28212
[1]+  Done                    set foo bar bam

Example 3. $@ parameter for a variable number of parameters

$ cat stats.sh
for FILE1 in "$@"
do
wc "$FILE1"
done
$ sh stats.sh songlist1 songlist2 songlist3

We can also use curly braces around the variable name.

QT_ARCH=x86_64
QT_SDK_BINARY=QtSDK-4.8.0-${QT_ARCH}.tar.gz
QT_SD_URL=https://xxx.com/$QT_SDK_BINARY

How do I rename the extension for a batch of/multiple files? See man bash Shell Parameter Expansion

# Solution 1:
for file in *.html; do
    mv "$file" "`basename "$file" .html`.txt"
done

# Solution 2:
for file in *.html
do
 mv "$file" "${file%.html}.txt"
done

Get filename without Path

How to Extract Filename & Extension in Shell Script

fullfilename="/var/log/mail.log"
filename=$(basename "$fullfilename")
echo $filename

Extension without filename

How to Extract Filename & Extension in Shell Script

fullfilename="/var/log/mail.log"
filename=$(basename "$fullfilename")
ext="${filename##*.}"
echo $ext

Discard the extension name and "%" symbol

$ vara=fillename.ext
$ echo $vara
fillename.ext
$ echo ${vara::-4} # works on Bash 4.3, eg Ubuntu
fillename
$ echo ${vara::${#vara}-4} # works on Bash 4.1, eg Biowulf Red Hat

http://stackoverflow.com/questions/27658675/how-to-remove-last-n-characters-from-a-bash-variable-string

Another way (not assuming 3 letters for the suffix) https://www.cyberciti.biz/faq/unix-linux-extract-filename-and-extension-in-bash/

dest="/nas100/backups/servers/z/zebra/mysql.tgz"
## get file name i.e. basename such as mysql.tgz
tempfile="${dest##*/}"
 
## display filename 
echo "${tempfile%.*}"

Or better with (See Extract filename and extension in Bash and Shell parameter expansion). How to Extract Filename & Extension in Shell Script

fullfilename="/var/log/mail.log"
filename=$(basename "$fullfilename")
fname="${filename%.*}"
echo $fname   # mail

$ UEFI_ZIP_FILE="UDOOX86_B02-UEFI_Update_rel102.zip"
$ UEFI_ZIP_DIR="${UEFI_ZIP_FILE%.*}"
$ echo $UEFI_ZIP_DIR
UDOOX86_B02-UEFI_Update_rel102

$ FILE="example.tar.gz"
$ echo "${FILE%%.*}"
example
$ echo "${FILE%.*}"
example.tar
$ echo "${FILE#*.}"
tar.gz
$ echo "${FILE##*.}"
gz
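
Putting the four operators together, a path can be split into its parts without calling dirname or basename. This is only a sketch; the path is illustrative (adapted from the example above) and the variable names are made up.

```shell
#!/usr/bin/env bash
path="/nas100/backups/servers/z/zebra/mysql.tar.gz"

dir=${path%/*}         # remove shortest match of /* from the end   -> directory
file=${path##*/}       # remove longest match of */ from the front  -> file name
stem=${file%%.*}       # remove longest match of .* from the end    -> mysql
ext=${file#*.}         # remove shortest match of *. from the front -> tar.gz
last_ext=${file##*.}   # remove longest match of *. from the front  -> gz

echo "$dir | $file | $stem | $ext | $last_ext"
```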

Space in variable value

Suppose we have a script file called 'foo' that can remove spaces from a file name. Note: tr command is used to delete characters specified by the '-d' parameter.

#!/bin/sh
NAME=`ls $1 | tr -d ' '`
echo $NAME
mv $1 $NAME

Now we try the program:

$ touch 'file 1.txt'
$ ./foo 'file 1.txt'
ls: cannot access file: No such file or directory
ls: cannot access 1.txt: No such file or directory

mv: cannot stat ‘file’: No such file or directory

The way to fix the program is to use double quotes around $1

#!/bin/sh
NAME=`ls "$1" | tr -d ' '`
echo $NAME
mv "$1" $NAME

and test it

$ ./foo "file 1.txt"
file1.txt

When a path mixes variables and wildcards, put the double quotes around the variables only, not around the whole string; a wildcard inside quotes is not expanded.

$ rm "$outputDir/tmp/$tmpfd/tmpa"  # fine

$ rm "$outputDir/tmp/$tmpfd/tmp*.txt"
rm: annovar6-12/tmp/tmp_bt20_raw/tmp*.txt: No such file or directory

$ rm "$outputDir"/tmp/"$tmpfd"/tmp*.txt

See https://unix.stackexchange.com/questions/131766/why-does-my-shell-script-choke-on-whitespace-or-other-special-characters

getopts function - parse options from shell script command line

Check if command line argument is missing (? :) and specifying the default (:-)

Search for ternary (conditional) operator and check out parameter Expansion in Bash Reference Manual. 74 Bash Operators Examples

#!/usr/bin/env bash

NAME=${1?Error: no name given}
NAME2=${2:-friend}

echo "HELLO! $NAME and $NAME2"
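
The same two expansions also work inside a function. In this sketch greet is a hypothetical function; the :? form is used (instead of ?) so that an empty argument is rejected as well. The failing call is made inside a command substitution, so only that subshell exits.

```shell
#!/usr/bin/env bash
greet() {
  local name=${1:?no name given}   # abort if $1 is unset or empty
  local other=${2:-friend}         # fall back to "friend" if $2 is unset or empty
  echo "HELLO! $name and $other"
}

one=$(greet Alice)
two=$(greet Alice Bob)

# Without arguments the expansion fails and the subshell exits non-zero.
if out=$(greet 2>&1); then missing_status=0; else missing_status=$?; fi

echo "$one"   # HELLO! Alice and friend
echo "$two"   # HELLO! Alice and Bob
echo "status without a name: $missing_status"
```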

Shell expansion

https://www.gnu.org/software/bash/manual/html_node/Shell-Expansions.html#Shell-Expansions

Curly brace {} expansion and array

cp -v *.{txt,jpg,png} destination/
  • All about {Curly Braces} in Bash
    • Array Builder
      echo {0..10}
      
      echo {10..0..2}
      echo {z..a..2}
      
      mkdir test{10..12}  # test10, test11, test12 directories
      rm -rf test{10..12}
    • Parameter expansion
      # convert jpg to png
      for i in *.jpg; do convert $i ${i%jpg}png; done
      
      a="Hello World!"
      echo Goodbye${a#Hello}
      # Goodbye World!
    • Output Grouping
  • How to Use Arrays in a Bash Script

Square brackets

Using Square Brackets in Bash: Part 1

Globbing means using wildcards to get all the results that fit a certain pattern.

ls *.jpg  # the asterisk means "zero or more characters"
ls d*k?   # ?, which means "exactly one character"

touch file0{0..9}{0..9} # This will create files file000, file001, file002, etc., through file097, file098 and file099.
ls file0[78]?           #  list the files in the 70s and 80s
ls file0[259][278]      #  list file022, file027, file028, file052, file057, file058, file092, file097, and file098
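
These patterns can be tried safely in a scratch directory. The sketch below assumes mktemp is available; the variable names are made up.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
touch "$workdir"/file0{0..9}{0..9}      # file000 ... file099

# Count the matches of file0[78]? (the 70s and 80s).
count=$(cd "$workdir" && set -- file0[78]? && echo "$#")

# List the matches of file0[259][278]; globs expand in sorted order.
picked=$(cd "$workdir" && echo file0[259][278])

echo "count=$count"
echo "picked=$picked"

rm -rf "$workdir"
```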

Conditions

We can use the test command to check if a file exists. The command is test -f <filename>.

[ is just another name for test; it requires a space after [ and another before the closing ].

if test -f fred.c; then ...; fi

if [ -f fred.c ]
then
...
fi

if [ -f fred.c ]; then
...
fi

Boolean variables

How to declare Boolean variables in bash and use them in a shell script

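failed=0 # False
jobdone=1 # True

if [ "$failed" -eq 1 ]
then
    echo "Job failed"
else
    echo "Job done"
fi

## more readable syntax: store the words and compare them as strings ##
failed=false 
jobdone=true

if [ "$failed" = true ]
then
    echo "Job failed"
else
    echo "Job done"
fi

We can define them as strings to make our code more readable; in that case they must be compared with = instead of -eq.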

What is the difference between test, [ and [[ ?

http://mywiki.wooledge.org/BashFAQ/031

[ ("test" command) and [[ ("new test" command) are used to evaluate expressions. [[ works only in Bash, Zsh and the Korn shell, and is more powerful; [ and test are available in POSIX shells.

test implements the old, portable syntax of the command. In almost all shells (the oldest Bourne shells are the exception), [ is a synonym for test (but requires a final argument of ]).

[[ is a new improved version of it, and is a keyword, not a program.
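
A short Bash-only sketch of what "more powerful" means in practice (the file name is made up): [[ does not word-split its operands, == matches glob patterns, and =~ matches regular expressions.

```shell
#!/usr/bin/env bash
name="report final.txt"

# [[ ]]: no word splitting, and the right side of == is a glob pattern.
if [[ $name == *.txt ]]; then pattern_match=yes; else pattern_match=no; fi

# [ ]: needs the quotes, and = compares the strings literally.
if [ "$name" = "*.txt" ]; then literal_match=yes; else literal_match=no; fi

# =~ (regular expression match) exists only in [[ ]].
if [[ $name =~ ^report ]]; then regex_match=yes; else regex_match=no; fi

echo "pattern=$pattern_match literal=$literal_match regex=$regex_match"
```

This prints pattern=yes literal=no regex=yes.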

String comparison

==  ==> strings are equal (== is a synonym for =)
=   ==> strings are equal 
!=  ==> strings are not equal
-z  ==> string is null
-n  ==> string is not null

For example, the following script checks whether the user has provided an argument to the script.

#!/bin/sh
if [ -z "$1" ]; then
  echo "Provide a \"file name\", using quotes to nullify the space."
  exit 1
fi
mv -i "$1" `ls "$1" | tr -d ' '`

where the -i option makes mv ask for confirmation before overwriting an existing file.

To check whether Xcode (either full Xcode or command line developer tools only) has been installed or not on Mac

if [ -z "$(xcode-select -p 2>&1 | grep error)" ]
then 
   echo "Xcode has been installed";
else
   echo "Xcode has not been installed";
fi

# only print out message if xcode was not found
if [ -n "$(xcode-select -p 2>&1 | grep error)" ]
then 
   echo "Xcode has not been installed";
fi

Note that the 'error' keyword comes from macOS when Xcode has not been installed. Also, the double quotes around $( ) are needed to avoid the "[: too many arguments" error.

Check if string starts with such as "#".

if [[ "$var" =~ ^#.*  ]]; then
    echo "yes"
fi

Arithmetic/Integer comparison

expr1 -eq expr2  ==> check equal
expr1 -ne expr2  ==> check not equal
expr1 -gt expr2  ==> expr1 > expr2
expr1 -ge expr2  ==> expr1 >= expr2
expr1 -lt expr2  ==> expr1 < expr2
expr1 -le expr2  ==> expr1 <= expr2
! expr  ==> opposite of expr

File conditionals

-d file  ==> True if the file is a directory
-e file  ==> True if the file exists
-f file  ==> True if the file is a regular file
-r file  ==> True if the file is readable
-s file  ==> True if the file has non-zero size
-w file  ==> True if the file is writable
-x file  ==> True if the file is executable
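
Several of these tests can be combined into one helper. The function describe_path and the scratch files are assumptions of this sketch, not part of the original notes.

```shell
#!/usr/bin/env bash
# describe_path is a made-up helper that classifies its argument.
describe_path() {
  if [ ! -e "$1" ]; then echo "missing"
  elif [ -d "$1" ]; then echo "directory"
  elif [ -f "$1" ] && [ -s "$1" ]; then echo "non-empty file"
  elif [ -f "$1" ]; then echo "empty file"
  else echo "other"
  fi
}

workdir=$(mktemp -d)
touch "$workdir/empty"
echo data > "$workdir/full"

r_dir=$(describe_path "$workdir")
r_empty=$(describe_path "$workdir/empty")
r_full=$(describe_path "$workdir/full")
r_missing=$(describe_path "$workdir/nope")
echo "$r_dir, $r_empty, $r_full, $r_missing"

rm -rf "$workdir"
```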

Example 1: Suppose we want to know if the first argument (if given) matches a specific string. We can use (note the space before and after '==', and the quotes around $1 so that the test still works when no argument is given)

#!/bin/bash
if [ "$1" == "console" ]; then
  echo 'Console'
else
  echo 'Non-console'
fi

Example 2: Check If File Is Empty Or Not Using Shell Script

#!/bin/bash
_file="$1"
[ $# -eq 0 ] && { echo "Usage: $0 filename"; exit 1; }
[ ! -f "$_file" ] && { echo "Error: $0 file not found."; exit 2; }
 
if [ -s "$_file" ] 
then
	echo "$_file has some data."
        # do something as file has data
else
	echo "$_file is empty."
        # do something as file is empty 
fi

Check if running as root

if [ $UID -ne 0 ];
then
  echo "Run as root"
  exit 1;
fi

Control Structures

if

if condition
then
  statements
elif [ condition ]; then
  statements
else 
  statements
fi

For example, we can run a cp command if two files are different.

if ! cmp -s "$filesrc" "$filecur"
then
     cp "$filesrc" "$filecur"
fi

String Comparison

http://stackoverflow.com/questions/2237080/how-to-compare-strings-in-bash

answer=no
if [ -f "genome.fa" ]; then
  echo -n 'Do you want to continue [yes/no]: '
  read answer
fi

if [ "$answer" == "no" ]; then
echo AAA
fi

if [ "$answer"=="no" ]; then
# without spaces, "$answer"=="no" is a single non-empty string, so the test is always true
echo BBB
fi
  1. You want the quotes around $answer, because if $answer is empty, the unquoted test becomes [ == "no" ], which is a syntax error.
  2. Space in bash is important.
    • Spaces between if and [ and ] are important
    • A space before and after the double equal signs is important too. Without them, even if we reply with 'yes', the code still runs the 'echo BBB' statement.
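
The spacing pitfall can be reproduced in isolation. A minimal sketch with made-up variable names:

```shell
#!/usr/bin/env bash
answer=yes

# With spaces: a real comparison of two strings.
if [ "$answer" == "no" ]; then spaced=true; else spaced=false; fi

# Without spaces: [ sees one non-empty word (yes==no), which is always true.
if [ "$answer"=="no" ]; then unspaced=true; else unspaced=false; fi

echo "spaced=$spaced unspaced=$unspaced"   # spaced=false unspaced=true
```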

while

while condition
do
  statements
done

until

until condition
do 
  statements
done
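
A concrete sketch of both loops, using only integer tests (the variable names are illustrative): while runs as long as its condition succeeds, until runs as long as it fails.

```shell
#!/usr/bin/env bash
# while: add up 1..5
i=1
sum=0
while [ "$i" -le 5 ]
do
  sum=$(( sum + i ))
  i=$(( i + 1 ))
done

# until: count down from 3 until n reaches 0
n=3
countdown=""
until [ "$n" -eq 0 ]
do
  countdown="$countdown$n "
  n=$(( n - 1 ))
done

echo "sum=$sum countdown=$countdown"   # sum=15 countdown=3 2 1
```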

case

How to Use Case Statements in Bash Scripts
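
A minimal sketch of a case statement; the function classify and its patterns are made up for illustration. Patterns are tried from top to bottom, | separates alternatives, and * is the catch-all.

```shell
#!/usr/bin/env bash
# classify is a hypothetical helper that matches a name against glob patterns.
classify() {
  case "$1" in
    *.tar.gz|*.tgz) echo "gzip tarball" ;;
    *.zip)          echo "zip archive" ;;
    [0-9]*)         echo "starts with a digit" ;;
    *)              echo "unknown" ;;
  esac
}

c1=$(classify annovar.latest.tar.gz)   # gzip tarball
c2=$(classify DNA_Helix.zip)           # zip archive
c3=$(classify 2024-report)             # starts with a digit
c4=$(classify notes)                   # unknown
echo "$c1, $c2, $c3, $c4"
```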

Semicolon

Command1; command2; command3; command4

Every command will be executed, whether or not the previous one succeeded.

AND list &&

How To Run A Command After The Previous One Has Finished On Linux

statement1 && statement2 && statement3 && ...

Each statement runs only if the previous one finished successfully (exit status 0).

touch /tmp/f1
echo "data" >/tmp/f2
[ -s /tmp/f1 ] 
echo $?    # 1
[ -s /tmp/f2 ]
echo $?    # 0

[ -s /tmp/f1 ] && echo "not empty" || echo "empty"  # empty
[ -s /tmp/f2 ] && echo "not empty" || echo "empty"  # not empty

OR list ||

statement1 || statement2 || statement3 || ...

Each statement runs only if the previous one failed; the list stops at the first success.

For example,

codename=$(lsb_release -s -c)
if [ "$codename" == "rafaela" ] || [ "$codename" == "rosa" ]; then
  codename="trusty"
fi

Chaining rule (command1 && command2 || command3)

Coupled commands with control operators in Bash

10 Useful Chaining Operators in Linux with Practical Examples.

  • Ampersand Operator (&),
  • semi-colon Operator (;),
  • AND Operator (&&),
  • OR Operator (||),
  • NOT Operator (!),
  • AND – OR operator (&& – ||),
  • PIPE Operator (|),
  • Command Combination Operator {},
  • Precedence Operator (),
  • Concatenation Operator (\).

A combination of the 'AND' and 'OR' operators is much like an 'if-else' statement, with one difference: the command after || also runs if the command after && fails.

$ ping -c3 www.google.com && echo "Verified" || echo "Host Down"

How to program with Bash: Syntax and tools

# command1 && command2
$ Dir=/root/testdir ; mkdir $Dir/ && cd $Dir

# command1 || command2
$ Dir=/root/testdir ; mkdir $Dir || echo "$Dir was not created."

# preceding commands ; command1 && command2 || command3 ; following commands
# "If command1 exits with a return code of 0, then execute command2, otherwise execute command3." 
$ Dir=/root/testdir ; mkdir $Dir && cd $Dir || echo "$Dir was not created."
$ Dir=~/testdir ; mkdir $Dir && cd $Dir || echo "$Dir was not created."
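
One caveat of the command1 && command2 || command3 pattern, sketched with true and false standing in for real commands (the variable names are made up): command3 also runs when command1 succeeds but command2 fails, which a real if-else would not do.

```shell
#!/usr/bin/env bash
chain_log=""
# command1 (true) succeeds, command2 (false) fails, so command3 still runs.
true && false || chain_log="command3 ran"

if_log=""
if true; then
  false                 # a failure here does not trigger the else branch
else
  if_log="else ran"
fi

echo "chain: '$chain_log'  if-else: '$if_log'"
```

Here chain_log ends up as "command3 ran" while if_log stays empty.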

for + do + done

for variable in values
do 
  statements
done

The values can be an explicit list

i=1
for day in Mon Tue Wed Thu Fri
do
 echo "Weekday $((i++)) : $day"
done

or a variable

i=1
weekdays="Mon Tue Wed Thu Fri"
for day in $weekdays
do
 echo "Weekday $((i++)) : $day"
done
# Output
# Weekday 1 : Mon
# Weekday 2 : Tue
# Weekday 3 : Wed
# Weekday 4 : Thu
# Weekday 5 : Fri

Note that we should not put double quotes around the $weekdays variable here. Double quotes prevent word splitting, so the loop would run only once with the whole string. See thegeekstuff article.

i=1
weekdays="Mon Tue Wed Thu Fri"
for day in "$weekdays"
do
 echo "Weekday $((i++)) : $day"
done
# Output
# Weekday 1 : Mon Tue Wed Thu Fri
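
When the items themselves may contain spaces, a Bash array avoids relying on word splitting. A sketch with made-up item names:

```shell
#!/usr/bin/env bash
playlists=("song list 1" "song list 2")

# Quoted "${array[@]}": one iteration per element, spaces preserved.
quoted_count=0
for item in "${playlists[@]}"; do
  quoted_count=$(( quoted_count + 1 ))
done

# Unquoted: each element is split again on whitespace.
unquoted_count=0
for item in ${playlists[@]}; do
  unquoted_count=$(( unquoted_count + 1 ))
done

echo "quoted=$quoted_count unquoted=$unquoted_count"   # quoted=2 unquoted=6
```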


To loop over all script files in a directory

FILES=/path/to/PATTERN*.sh
for f in $FILES;
do
(
   "$f"
)&
done
wait

OR

FILES="
file1
/path/to/file2
/path/to/file3
"
for f in $FILES;
do
(
   "$f"
)&
done
wait

Here we run each script in the background and use wait to block until all of them have finished.
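
A plain wait does not tell us which job failed. One possible refinement, sketched with a hypothetical worker function in place of the scripts: remember each PID from $! and wait for them one by one to collect the exit statuses.

```shell
#!/usr/bin/env bash
# worker is a made-up stand-in for a script; it exits with the status it is given.
worker() { sleep 0.1; return "$1"; }

pids=()
for code in 0 3 0; do
  worker "$code" &
  pids+=("$!")          # PID of the job just started
done

failures=0
for pid in "${pids[@]}"; do
  wait "$pid" || failures=$(( failures + 1 ))   # wait returns that job's status
done

echo "failed jobs: $failures"   # failed jobs: 1
```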

See loop over files from cyberciti.biz.

Example 1: convert pdfs to tifs using ImageMagick

For "for" loops over files, check cyberciti.biz.

outdir="../plosone"
indir="../fig"

if [[ ! -d  $outdir ]];
then
   mkdir $outdir
fi

in=(file1.pdf file2.pdf file3.pdf)

for (( i=0; i<${#in[@]} ; i++ ))
do
  convert -strip -units PixelsPerInch -density 300 -resample 300 \
          -alpha off -colorspace RGB -depth 8 -trim -bordercolor white \
          -border 1% -resize '2049x2758>' -resize '980x980<' +repage \
          -compress lzw $indir/${in[$i]} $outdir/Figure$((i+1)).tiff
done

Example 2: download with wget and parsing with 'sed'

A second example is to download all the (Ontario gasoline price) data with wget and parsing and concatenating the data with other *nix tools like 'sed':

# Download data
for i in $(seq 1990 2014)
        do wget http://www.energy.gov.on.ca/fuelupload/ONTREG$i.csv
done

# Retain the header
head -n 2 ONTREG1990.csv | sed 1d > ONTREG_merged.csv

# Loop over the files and use sed to extract the relevant lines
for i in $(seq 1990 2014)
        do
        tail -n 15 ONTREG$i.csv | sed 13,15d | sed 's/./-01-'$i',/4' >> ONTREG_merged.csv
        done

Example 3: download

Download all 20 sra files (60GB in total) from SRP032789.

for x in $(seq 1027175 1027180) 
   do wget ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP032/SRP032789/SRR$x/SRR$x.sra
done

https://github.com/MarioniLab/EmptyDrops2017/blob/master/data/download_10x.sh

for x in \
    http://cf.10xgenomics.com/samples/cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz \
    http://cf.10xgenomics.com/samples/cell-exp/2.1.0/neurons_900/neurons_900_raw_gene_bc_matrices.tar.gz \
    http://cf.10xgenomics.com/samples/cell-exp/1.1.0/293t/293t_raw_gene_bc_matrices.tar.gz \
    http://cf.10xgenomics.com/samples/cell-exp/1.1.0/jurkat/jurkat_raw_gene_bc_matrices.tar.gz \
    http://cf.10xgenomics.com/samples/cell-exp/2.1.0/t_4k/t_4k_raw_gene_bc_matrices.tar.gz \
    http://cf.10xgenomics.com/samples/cell-exp/2.1.0/neuron_9k/neuron_9k_raw_gene_bc_matrices.tar.gz
do
    wget $x
    destname=$(basename $x) 
    stub=$(echo $destname | sed "s/_raw_.*//")
    mkdir -p $stub
    tar -xvf $destname -C $stub
    rm $destname
done

Example 4: convert files from DOS to Unix

Convert all files from DOS to Unix format

for f in *.txt; do   tr -d '\r' < $f > tmp.txt;   mv tmp.txt $f  ; done
# Or
for f in "$@"; do   tr -d '\r' < "$f" > tmp.txt;   mv tmp.txt "$f"  ; done

Example 5: print all files in a directory

for f in /etc/*.conf
do
   echo "$f"
done

Example 6: use ping to find all the live machines on the network

for ip in 192.168.0.{1..255} ;
do
  ping $ip -c 2 &> /dev/null ;
  
  if [ $? -eq 0 ];
  then
    echo $ip is alive
  fi

done

Example 7: sed on multiple files

for i in *.htm*; do sed -i 's/String1/String2/' "$i"; done

Note that if the string contains the delimiter character, such as the forward slashes in https://www.google.com, we need to escape each one with a backslash.

Example 8: run in parallel

for ip in 192.168.0.{1..255} ;
do
   (
      ping $ip -c2 &> /dev/null ;
  
      if [ $? -eq 0 ];
      then
       echo $ip is alive
      fi
   )&
  done
wait

where we enclose the loop body in ()&. () encloses a block of commands to run as a subshell and & sends it to the background. wait waits for all background jobs to complete.

Good technique !!!
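
A plain wait discards the exit status of each job. If we need to know which background jobs failed, one option is to record each PID from $! and wait on them one by one. This is only a sketch: job is a hypothetical task that fails for odd arguments.

```shell
#!/bin/bash
# Hypothetical job: succeeds for even numbers, fails for odd ones.
job() { sleep 0.1; [ $(( $1 % 2 )) -eq 0 ]; }

run_jobs() {
  local pids=() n pid failed=0
  for n in "$@"; do
    job "$n" &          # start the job in the background
    pids+=("$!")        # remember its process ID
  done
  for pid in "${pids[@]}"; do
    # wait PID returns the exit status of that particular job
    wait "$pid" || failed=$((failed + 1))
  done
  echo "$failed"
}

run_jobs 1 2 3 4   # prints the number of failed jobs
```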

wait command

  • An example where we shall wait until files are deleted before continuing the script.
    cd /home/ubuntu
    
    if [ -d "R-devel" ]; then
        rm -rf "R-devel" &
        wait # Wait for the deletion to complete
        echo "R-devel folder deleted successfully."
    else
        echo "R-devel folder does not exist."
    fi
    
    wget -O - https://stat.ethz.ch/R/daily/R-devel.tar.gz | tar -xzk
    
    cd R-devel
    ./configure --prefix=/opt/R/devel --enable-R-shlib
    make

Functions

#!/bin/bash

fun () { echo "This is a function"; echo; }

fun () { echo "This is a function"; echo } # Error!
 
function quit {
   exit
}

function hello {
   echo Hello!
}

function e {
   echo $1 
}  
$ e World

How to find bash shell function source code on Linux/Unix

$ type -a function_name

# To list all function names
$ declare -F
$ declare -F | grep function_name
$ declare -F | grep foo

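How do I find the file where a bash function is defined?

# With extdebug enabled, declare -F also prints the line number and the source file
shopt -s extdebug
declare -F function_name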

Function arguments

source ~/bin/setpath # add bgzip & tabix directories to $PATH

function raw2exon {
  # put your comments here
  inputvcf=$1
  outputvcf=$2
  inputbed=$3
  if [[ $4 ]]; then
    oldpath=$PWD
    cd $4
  fi
  
  bgzip -c $inputvcf > $inputvcf.gz
  tabix -p vcf $inputvcf.gz
  
  head -$(grep '#' $inputvcf | wc -l) $inputvcf > $outputvcf # header
  tabix -R $inputbed $inputvcf.gz >> $outputvcf
  wc -l $inputvcf
  wc -l $outputvcf
  rm $inputvcf.gz $inputvcf.gz.tbi
  if [[ $4 ]]; then
    cd $oldpath
  fi
}           

inputbed=S04380110_Regions.bed

raw2exon 'mu0001_raw.vcf' 'mu0001_exon.vcf' $inputbed ~/Downloads/
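
The optional fourth argument and the cd back to $oldpath can be written more compactly. The sketch below is not a replacement for raw2exon; it uses a hypothetical helper, count_lines, to show a default value for an omitted argument and a subshell body that keeps the cd local to the function.

```shell
#!/bin/bash
# Hypothetical helper: count the lines of FILE, optionally after changing into DIR.
count_lines() (
  # The function body is a ( ) subshell, so the cd below does not affect the caller.
  local file=$1
  local dir=${2:-.}        # default to the current directory when $2 is omitted
  cd "$dir" || exit 1
  wc -l < "$file" | tr -d ' '
)

demo_dir=$(mktemp -d)
printf 'a\nb\nc\n' > "$demo_dir/sample.txt"
count_lines sample.txt "$demo_dir"   # prints 3; the caller's directory is unchanged
```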

Exit function

exit command and the exit statuses

$ cat testfun.sh
#!/bin/bash
ping -q -c 1 $1 >/dev/null 2>&1
if [ $? -ne 0 ]
then
   echo "An error occurred while checking the server status".
   exit 3
fi

exit 0
$ chmod +x testfun.sh
$ ./testfun.sh www.cyberciti.biz999
An error occurred while checking the server status.
$ echo $?
3

List of commands

break  ==> escaping from an enclosing for, while or until loop
:      ==> null command
continue ==> make the enclosing for, while or until loop continue at the next iteration
.      ==> executes the command in the current shell
eval   ==> evaluate arguments
exec   ==> replacing the current shell with a different program
export ==> make the variable named as its parameter available in subshells
expr   ==> evaluate its arguments as an expression
printf ==> similar to echo
set    ==> sets the parameter variables for the shell. Useful for using fields in commands that output space-separated values
shift  ==> moves all the parameter variables down by one.
trap   ==> specify the actions to take on receipt of signals.
unset  ==> remove variables or functions from the environment.
mktemp ==> create a temporary file
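
As a small illustration of set and shift from the list above, the following sketch splits the space-separated output of a command into positional parameters. The function name fields_demo and the sample text are invented for the example.

```shell
#!/bin/bash
fields_demo() {
  # set -- replaces the positional parameters with the words of the command's output.
  set -- $(echo "2024 01 15")
  echo "year=$1 month=$2 day=$3 count=$#"
  # shift drops $1 and moves the remaining parameters down by one.
  shift
  echo "after shift: first=$1 count=$#"
}

fields_demo
```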

Run the previous command

Understanding the exclamation mark (!) in bash

$ apt update  # Permission denied
$ sudo !!     # Equivalent to sudo apt update

"!" invokes history expansion. To run the most recent command beginning with “foo”:

!foo
# Run the most recent command beginning with "service" as root
sudo !service

Cache console output on the CLI?

Try the script command line utility to create a typescript of everything printed on your terminal.

To exit (to end the script session), type exit or logout, or press Control-D.

set -e, set -x and trap

set -e makes the script exit immediately if a command exits with a non-zero status. Type help set on the command line for details. Very useful!

See also the trap command that is related to non-zero exit.

bash -x

Call your script with something like

bash -x -v hello_world.sh

OR

#!/bin/bash -xv
echo Hello World!

where

  • -x displays commands and their results
  • -v displays everything, even comments and spaces

This is the same as using set -x in your bash script.

set -x example

Bash script

set -ex
export DEBIAN_FRONTEND=noninteractive

codename=$(lsb_release -s -c)
if [ $codename == "rafaela" ] || [ $codename == "rosa" ]; then
  codename="trusty"
fi

echo $codename
echo step 1
echo step 2

exit 0

Without -x option:

trusty
step 1
step 2

With -x option:

+ export DEBIAN_FRONTEND=noninteractive
+ DEBIAN_FRONTEND=noninteractive
++ lsb_release -s -c
+ codename=rafaela
+ '[' rafaela == rafaela ']'
+ codename=trusty
+ echo trusty
trusty
+ echo step 1
step 1
+ echo step 2
step 2
+ exit 0

trap and error handler

The syntax to use trap command is

trap command signal

For example,

$ cat traptest.sh
#!/bin/sh

trap 'rm -f /tmp/tmp_file_$$' INT
echo creating file /tmp/tmp_file_$$
date > /tmp/tmp_file_$$

echo 'press interrupt to interrupt ...'
while [ -f /tmp/tmp_file_$$ ]; do
  echo file exists
  sleep 1
done
echo the file no longer exists

trap - INT
echo creating file /tmp/tmp_file_$$
date > /tmp/tmp_file_$$
echo 'press interrupt to interrupt ...'
while [ -f /tmp/tmp_file_$$ ]; do
  echo file exists
  sleep 1
done
echo we never get here
exit 0

will get an output like

$ ./traptest.sh
creating file /tmp/tmp_file_21389
press interrupt to interrupt ...
file exists
file exists
^Cthe file no longer exists
creating file /tmp/tmp_file_21389
press interrupt to interrupt ...
file exists
file exists
^C

The first time we use trap, it arranges for the file to be deleted when we hit Ctrl+C. The second time, trap - INT resets the handler, so no command is executed when an INT signal occurs and the default behavior applies: the script is terminated, and the final echo and exit statements are never executed.

Note that the following two are different.

trap - INT
trap '' INT

The second command will IGNORE signals (Ctrl+C in this case) so if we apply this statement above, we will not be able to use Ctrl+C to kill the execution.
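
Besides INT, bash also accepts the pseudo-signal EXIT, which is commonly used to clean up temporary files however the script ends. A minimal sketch follows; the function with_scratch_file is hypothetical and is written as a subshell so that its EXIT trap fires when the function finishes.

```shell
#!/bin/bash
# Hypothetical worker, written as a ( ) subshell so the EXIT trap fires when it ends.
with_scratch_file() (
  scratch=$(mktemp)
  trap 'rm -f "$scratch"' EXIT    # runs when the subshell exits, normally or on error
  date > "$scratch"
  echo "$scratch"                 # report the path so the caller can check it
)

path=$(with_scratch_file)
if [ -e "$path" ]; then echo "still there"; else echo "cleaned up"; fi
```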

DEBUG trap to step through line by line

You can use the "DEBUG" trap to step through a bash script line by line
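
The DEBUG trap runs its action before each simple command, and $BASH_COMMAND holds the command about to be executed. The sketch below only records the commands in an array (step_log is a made-up name) so that it can run unattended; for interactive stepping, the trap action would typically be a read that waits for Enter.

```shell
#!/bin/bash
step_log=()
# Record every command just before bash executes it.
trap 'step_log+=("$BASH_COMMAND")' DEBUG
a=1
b=$((a + 1))
trap - DEBUG    # remove the DEBUG trap again
printf 'about to run: %s\n' "${step_log[@]}"
```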

Bash shell find out if a command exists or not

http://www.cyberciti.biz/faq/unix-linux-shell-find-out-posixcommand-exists-or-not/

POSIX

POSIX built-in commands

# command -v will return >0 when the command1 is not found
command -v command1 >/dev/null && echo "command1 Found In \$PATH" || echo "command1 Not Found in \$PATH"

$ help command
command: command [-pVv] command [arg ...]
    Execute a simple command or display information about commands.
    
    Runs COMMAND with ARGS suppressing  shell function lookup, or display
    information about the specified COMMANDs.  Can be used to invoke commands
    on disk when a function with the same name exists.
    
    Options:
      -p	use a default value for PATH that is guaranteed to find all of
    	the standard utilities
      -v	print a description of COMMAND similar to the `type' builtin
      -V	print a more verbose description of each COMMAND
    
    Exit Status:
    Returns exit status of COMMAND, or failure if COMMAND is not found.

$ type command     
command is a shell builtin
$ type export
export is a shell builtin
$ type wget
wget is /usr/bin/wget
$ type tophat
-bash: type: tophat: not found
$ type sleep
sleep is /bin/sleep

$ command -v tophat
$ command -v wget
/usr/bin/wget
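
In a script, the command -v test is often wrapped in a small function so that several prerequisites can be checked up front. A sketch under that assumption; the names have and require are not standard commands, just helpers defined here.

```shell
#!/bin/bash
# Succeeds if the named command can be found (builtin, function, alias or file in PATH).
have() { command -v "$1" >/dev/null 2>&1; }

# Report every missing command on stderr; return non-zero if any is missing.
require() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! have "$cmd"; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

require sh ls && echo "all required commands found"
```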

On macOS,

$ help command
command: command [-pVv] command [arg ...]
    Runs COMMAND with ARGS ignoring shell functions.  If you have a shell
    function called `ls', and you wish to call the command `ls', you can
    say "command ls".  If the -p option is given, a default value is used
    for PATH that is guaranteed to find all of the standard utilities.  If
    the -V or -v option is given, a string is printed describing COMMAND.
    The -V option produces a more verbose description.

type -P

type -P command1 &>/dev/null && echo "Found" || echo "Not Found"

$ help type
type: type [-afptP] name [name ...]
    Display information about command type.
    
    For each NAME, indicate how it would be interpreted if used as a
    command name.
    
    Options:
      -a	display all locations containing an executable named NAME;
    	includes aliases, builtins, and functions, if and only if
    	the `-p' option is not also used
      -f	suppress shell function lookup
      -P	force a PATH search for each NAME, even if it is an alias,
    	builtin, or function, and returns the name of the disk file
    	that would be executed
      -p	returns either the name of the disk file that would be executed,
    	or nothing if `type -t NAME' would not return `file'.
      -t	output a single word which is one of `alias', `keyword',
    	`function', `builtin', `file' or `', if NAME is an alias, shell
    	reserved word, shell function, shell builtin, disk file, or not
    	found, respectively
    
    Arguments:
      NAME	Command name to be interpreted.
    
    Exit Status:
    Returns success if all of the NAMEs are found; fails if any are not found.
typeset: typeset [-aAfFgilrtux] [-p] name[=value] ...
    Set variable values and attributes.
    
    Obsolete.  See `help declare'.

Find all bash builtin commands

https://www.cyberciti.biz/faq/linux-unix-bash-shell-list-all-builtin-commands/

$ help
$ help | less
$ help | grep read

Find if a command is internal or external

$ type -a COMMAND-NAME-HERE
$ type -a cd
$ type -a uname
$ type -a :

$ command -V ls
$ command -V cd
$ command -V food

pause by read -p command

http://www.cyberciti.biz/tips/linux-unix-pause-command.html

read -p "Press [Enter] key to start backup..."

If we want to ask users about a yes/no question, we can use this method

while true; do
    read -p "Do you wish to install this program? " yn
    case $yn in
        [Yy]* ) make install; break;;
        [Nn]* ) exit;;
        * ) echo "Please answer yes or no.";;
    esac
done

OR

echo "Do you wish to install this program?"
select yn in "Yes" "No"; do
    case $yn in
        Yes ) make install; break;;
        No ) exit;;
    esac
done
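
The first variant can be turned into a reusable function that returns success for yes and failure for no, so the caller decides what to do. This is a sketch; the function name confirm is an assumption, and the answer is fed through a here-string only so the example runs without a terminal.

```shell
#!/bin/bash
confirm() {
  local reply
  while true; do
    # read fails at end of input; treat that as "no"
    read -r -p "$1 [y/n] " reply || return 1
    case $reply in
      [Yy]*) return 0 ;;
      [Nn]*) return 1 ;;
      *) echo "Please answer yes or no." ;;
    esac
  done
}

if confirm "Do you wish to install this program?" <<< "yes"; then
  echo "installing"
else
  echo "skipped"
fi
```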

Keyboard input and Arithmetic

http://linuxcommand.org/wss0110.php

read

#!/bin/bash

echo -n "Enter some text > "
read text
echo "You entered: $text"

Arithmetic

#!/bin/bash

# An application of the simple command
# echo $((2+2))
# That is, when you surround an arithmetic expression with the double parentheses, 
# the shell will perform arithmetic evaluation.
first_num=0
second_num=0

echo -n "Enter the first number --> "
read first_num
echo -n "Enter the second number -> "
read second_num

echo "first number + second number = $((first_num + second_num))"
echo "first number - second number = $((first_num - second_num))"
echo "first number * second number = $((first_num * second_num))"
echo "first number / second number = $((first_num / second_num))"
echo "first number % second number = $((first_num % second_num))"
echo "first number raised to the"
echo "power of the second number   = $((first_num ** second_num))"

and a program that formats an arbitrary number of seconds into hours and minutes:

#!/bin/bash

seconds=0

echo -n "Enter number of seconds > "
read seconds

# use the division operator to get the quotient
hours=$((seconds / 3600))
# use the modulo operator to get the remainder
seconds=$((seconds % 3600))
minutes=$((seconds / 60))
seconds=$((seconds % 60))

echo "$hours hour(s) $minutes minute(s) $seconds second(s)"
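
The same integer arithmetic can be combined with printf to produce a zero-padded HH:MM:SS string. A small sketch, with format_duration as a made-up function name:

```shell
#!/bin/bash
format_duration() {
  local total=$1
  # / is integer division and % is the remainder, as in the script above
  printf '%02d:%02d:%02d\n' $((total / 3600)) $((total % 3600 / 60)) $((total % 60))
}

format_duration 3725   # 1 hour, 2 minutes, 5 seconds -> 01:02:05
```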

xargs

xargs reads items from the standard input, delimited by blanks (which can be protected with double or single quotes or a backslash) or newlines, and executes the command (the default command is echo, located at /bin/echo) one or more times with any initial-arguments followed by items read from standard input.

Example1 - Find files named core in or below the directory /tmp and delete them

find /tmp -name core -type f -print0 | xargs -0 /bin/rm -f

where -0 tells xargs to expect null-terminated items, as produced by find -print0. Many commands break on file names that contain blank spaces or other special characters (single quotes, newlines, etc.); this option takes care of such names.
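
The effect of -print0 and -0 can be checked on a throwaway directory. In this sketch the directory comes from mktemp -d and the file names are invented; it counts how many arguments xargs sees with and without null termination.

```shell
#!/bin/bash
demo_dir=$(mktemp -d)
touch "$demo_dir/a file.txt" "$demo_dir/b.txt"

# Null-terminated: each file name stays one argument.
count_safe=$(find "$demo_dir" -type f -print0 | xargs -0 -n 1 echo | wc -l | tr -d ' ')
# Default splitting: "a file.txt" is broken at the space into two arguments.
count_naive=$(find "$demo_dir" -type f | xargs -n 1 echo | wc -l | tr -d ' ')

echo "with -print0/-0: $count_safe arguments; without: $count_naive arguments"
rm -rf "$demo_dir"
```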

Another case: suppose I have a file named -sT. rm treats the name as a set of options, so the file cannot be deleted by its bare name (rm ./-sT, as the error message suggests, or rm -- -sT would work).

$ rm "-sT"
rm: invalid option -- 's'
Try 'rm ./-sT' to remove the file ‘-sT’.
Try 'rm --help' for more information.
$ ls *T
ls: option requires an argument -- 'T'
Try 'ls --help' for more information.
$ ls "*T"
ls: cannot access *T: No such file or directory
$ ls "*s*"
ls: cannot access *s*: No such file or directory

$ find . -maxdepth 1 -iname '*-sT'
./-sT
$ find . -maxdepth 1 -iname '*-sT' | xargs -0 /bin/rm -f   # does NOT work: the input is newline-terminated, not null-terminated
$ find . -maxdepth 1 -iname '*-sT' | xargs /bin/rm -f   # WORKS

Similarly, suppose I have a file of zero size. The file name is "-f3". I cannot delete it.

$ ls -lt
total 448
-rw-r--r-- 1 mingc mingc      0 Jan 16 11:35 -f3
$ rm -f3
rm: invalid option -- '3'
Try `rm ./-f3' to remove the file `-f3'.
Try `rm --help' for more information.
$ find . -size  0 -print0 |xargs -0 rm

Example2 - Find files from the grep command and sort them by date

grep -l "Polyphen" tmp/*.* | xargs ls -lt

Example3 - Gzip with multiple jobs

CORES=$(grep -c '^processor' /proc/cpuinfo)
find /source -type f -print0 | xargs -0 -n 1 -P $CORES gzip -9

where

  • find -print0 / xargs -0 protects you from whitespace in filenames
  • xargs -n 1 means one gzip process per file
  • xargs -P specifies the number of jobs
  • gzip -9 means maximum compression

GNU Parallel

A simple trick that does not need GNU Parallel is to run the commands in the background.

Example: same command, different command line argument

Input from the command line (Synopsis about the triple colon ":::"):

parallel echo ::: A B C
parallel gzip --best ::: *.html # '--best' means best compression
parallel gunzip ::: *.CEL.gz

Input from a file:

parallel -a abc-file echo

Input from STDIN:

cat abc-file | parallel echo

find . -iname "*after*" | parallel wc -l

Another similar example is to gzip each individual file.


Example: each command containing an index

Instead of

for i in $(seq 1 100)
do
  someCommand data$i.fastq > output$i.txt &
done

, we can use

parallel --jobs 16 someCommand data{}.fastq '>' output{}.txt ::: {1..100}
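
Where GNU Parallel is not installed, xargs -P (used in the gzip example above) gives a rough equivalent with a cap on the number of simultaneous jobs. This is a sketch, not the GNU Parallel behavior: the echo inside sh -c is a stand-in for someCommand, and the output order is not guaranteed, hence the sort.

```shell
#!/bin/bash
# Run at most 4 jobs at a time; {} is replaced by each index read from seq.
results=$(seq 1 8 | xargs -P 4 -I{} sh -c 'echo "processed data$1.fastq"' _ {} | sort)
echo "$results"
```

Passing the index as $1 to sh -c, rather than pasting {} into the command string, avoids quoting problems if the items ever contain special characters.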

Example: each command not containing an index

for i in *gz; do 
  zcat $i > $(basename $i .gz).unpacked
done

can be written as

parallel 'zcat {} > {.}.unpacked' ::: *.gz

Example: run several subscripts from a master script

Suppose I have a bunch of script files: script1.sh, script2.sh, ... And an optional master script (file ext does not end with .sh). My goal is to run them using GNU Parallel.

I can just run them using

parallel './{}' ::: *.sh

where "./" means the .sh files are located in the current directory and {} denotes each individual .sh file.

More detail:

$ mkdir test-par; cd test-par
$ echo echo A > script1.sh
$ echo echo B > script2.sh
$ echo echo C > script3.sh
$ echo echo D > script4.sh
$ chmod +x *.sh

$ cat > script    # master script (not needed for GNU parallel method)
./script1.sh
./script2.sh
./script3.sh
./script4.sh

$ time bash script
A
B
C
D

real	0m0.025s
user	0m0.004s
sys	0m0.004s

$ time parallel './{}' ::: *.sh    # No need of a master script
                                   # may need to add --gnu option if asked.
A
B
C
D

real	0m0.778s
user	0m0.588s
sys	0m0.144s     # longer time because of the parallel overhead

Note

  • When I run scripts (seqtools_vc) sequentially I can get the standard output on screen. However, I may not get these output when I use GNU parallel.
  • There is a risk/problem if all scripts are trying to generate required/missing files when they detect the required files are absent.

rush - cross-platform tool for executing jobs in parallel

Debugging Scripts

Run a shell script with the -x option. Each line of the script is then printed (to stderr) as it is executed, so we can see which line takes a long time or which line broke the code (the script still runs through to the end).

$ bash -x script-name
  • Use of set builtin command
  • Use of intelligent DEBUG function

To run a bash script line by line:

Geany

  • (Ubuntu 12.04 only): By default, it does not have the terminal tab. Install virtual terminal emulator. Run
sudo apt-get install libvte-dev
  • Step 1: Keyboard shortcut. Select a region of code, then choose Edit -> Commands -> Send Selection to Terminal. You can also assign a keybinding for this: go to Edit -> Preferences and pick the Keybindings tab. See a screenshot here. I assigned F12 (without quotes) as the shortcut. This is a complete list of the keybindings.
  • Step 2: Newline character. Another issue is that the last line of sent code does not have a newline character. So I need to switch to the Terminal and press Enter. The solution is to modify the <geany.conf> (find its location using locate geany.conf. On my ubuntu 14 (geany 1.26), it is under ~/.config/geany/geany.conf) and set send_selection_unsafe=true. See here.
  • Step 3: PATH variable.
$ tmpname=$(basename $inputVCF)
Command 'basename' is available in '/usr/bin/basename'
The command could not be located because '/usr/bin' is not included in the PATH environment variable.

The solution is to run PATH=$PATH:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin in the Terminal window before running our script.

  • Step 4 (optional): Change background color.

Another handy change to geany is to change its background to black. To do that, go to Edit -> Preferences -> Editor. Once on the Editor options level, select the Display tab to the far right of the dialog, and you will notice a checkbox marked invert syntax highlighting colors.

See this post about changing the default terminal in the Terminal window. The default is xterm (see the output of echo $TERM).

Examples

How to wrap a long linux command

Use the backslash character, and make sure the backslash is the last character on the line. For example, the first example below does not work because there is an extra space character after the \ on its second line.

Example 1 (does not work)

sudo apt-get install libcap-dev libbz2-dev libgcrypt11-dev libpci-dev libnss3-dev libxcursor-dev \
   libxcomposite-dev libxdamage-dev libxrandr-dev libdrm-dev libfontconfig1-dev libxtst-dev \ 
   libcups2-dev libpulse-dev libudev-dev

vs example 2 (works)

sudo apt-get install libcap-dev libbz2-dev libgcrypt11-dev libpci-dev libnss3-dev libxcursor-dev \
   libxcomposite-dev libxdamage-dev libxrandr-dev libdrm-dev libfontconfig1-dev libxtst-dev \
   libcups2-dev libpulse-dev libudev-dev

Command line path navigation

pushd and popd are used to switch between multiple directories without copying and pasting directory paths. They operate on a stack, a last-in, first-out (LIFO) data structure.

pushd /var/www
pushd /usr/src
dirs
pushd +2
popd

When we have only two locations, an alternative and easier way is cd -.

cd /usr/src
# Do something
cd /var/www
cd -     # /usr/src

bd – Quickly Go Back to a Parent Directory

Create log file

  • Create a log file with date
logfile="output_$(date +"%Y%m%d%H%M").log"
  • Redirect the error to a log file
logfile="output_$(date +"%Y%m%d%H%M").log"

module load XXX || exit 1

echo "All output redirected to '$logfile'"
set -ex

exec 2>$logfile

# Task 1
start_time=$(date +%s)
# Do something with possible error output
end_time=$(date +%s)
echo "Task 1 Started: $start_time; Ended: $end_time; Elapsed time: $((end_time - start_time)) sec" >> "$logfile"

# Task 2
start_time=$(date +%s)
# Do something with possible error output
end_time=$(date +%s)
echo "Task 2 Started: $start_time; Ended: $end_time; Elapsed time: $((end_time - start_time)) sec" >> "$logfile"
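
The repeated start/end bookkeeping can be factored into a helper. The following is one possible sketch rather than the layout used above: log and timed are hypothetical function names, and the log file is created with mktemp so the example does not touch any real log.

```shell
#!/bin/bash
logfile=$(mktemp)    # stand-in for output_<date>.log

# Append one timestamped line to the log.
log() { printf '%s %s\n' "$(date +%Y-%m-%dT%H:%M:%S)" "$*" >> "$logfile"; }

# Run a command, then log its exit status and elapsed time in whole seconds.
timed() {
  local start end status
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  log "'$*' finished with status $status after $((end - start)) sec"
  return "$status"
}

timed sleep 1
cat "$logfile"
```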

Text processing

tr (similar to sed)

It seems tr does not take general regular expression.

The tr utility copies the given input to the output, substituting or deleting selected characters. tr is short for translate or transliterate.

It will read from STDIN and write to STDOUT. The syntax is

tr [OPTION] SET1 [SET2]

If both SET1 and SET2 are specified and the ‘-d’ OPTION is not, then tr replaces each character in SET1 with the character in the same position in SET2. For example,

# translate to uppercase
$ echo 'linux' | tr "[:lower:]" "[:upper:]"

# Translate braces into parenthesis
$ tr '{}' '()' < inputfile > outputfile

# Replace comma with line break
$ tr ',' '\n' < inputfile

# Split a long line using the space 
$ echo $line | tr ' ' '\n' 

# Translate white-space to tabs
$ echo "This is for testing" | tr '[:space:]' '\t'

# Join/merge all the lines in a file into a single line
$ tr -s '\n' ' ' < file.txt  
# note sed cannot match \n easily as tr command. 
# See 
# http://stackoverflow.com/questions/1251999/how-can-i-replace-a-newline-n-using-sed 
# https://unix.stackexchange.com/questions/26788/using-sed-to-convert-newlines-into-spaces

tr can also be used to remove particular characters using -d option. For example,

$ echo "the geek stuff" | tr -d 't'
he geek suff
$ tr -d "\15" < input > output # octal digit 15

A practical example

#!/bin/bash
echo -n "Enter file name : "
read myfile
echo -n "Are you sure ( yes or no ) ? "
read confirmation
confirmation="$(echo ${confirmation} | tr 'A-Z' 'a-z')"
if [ "$confirmation" == "yes" ]; then
   [ -f $myfile ] &&  /bin/rm $myfile || echo "Error - file $myfile not found"
else
   : # do nothing
fi

Second example

$ ifconfig | cut -c-10 | tr -d ' ' | tr -s '\n'
eth0
eth1
ip6tnl0
lo
sit0

# without tr -s '\n'
eth0


eth1


ip6tnl0


lo


sit0

where tr -d ' ' deletes every space character in each line. The \n newline character is squeezed using tr -s '\n' to produce a list of interface names. We use cut to extract the first 10 characters of each line.

Regular Expression and grep

echo -e "today is Monday\nHow are you" | grep Monday

grep -E "[a-z]+" filename
# or
egrep "[a-z]+" filename

grep -i PATTERN FILENAME # ignore case

grep -v PATTERN FILENAME # inverse match

grep -c PATTERN FILENAME # count the number of lines in which a matching string appears

grep -n PATTERN FILENAME # print the line number

grep -R PATTERN DIR      # recursively search many files and follow symbolic links
grep -r PATTERN DIR      # recursively search many files

grep -e "pattern1" -e "pattern2" FILENAME # multiple patterns, OR operation
grep -E 'pattern1|pattern2' FILENAME      # same, using extended regex alternation (egrep is the older spelling)
grep -f PATTERNFILE FILENAME # PATTERNFILE contains patterns line-by-line

grep -F PATTERN FILENAME # Interpret PATTERN as a  list  of  fixed  strings,  separated  by
                         # newlines,  any  of  which is to be matched.

grep -r --include \*.Rmd --include \*.R "file\.csv" ./   # search with only Rmd & R files

grep -r --exclude "README" PATTERN DIR               # excluding files in which to search

grep -o '<dt>.*</dt>' FILENAME # print only the matched string (<dt> .... </dt>)

grep -w                  # checking for full words, not for sub-strings
grep -E -w "SRR2923335.1|SRR2923335.1999" # match in words (either SRR2923335.1 or SRR2923335.1999)
  • Extract the IP address from ifconfig command
$ ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:14:d1:b0:df:9f  
          inet addr:192.168.1.172  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::214:d1ff:feb0:df9f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:29113 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:28561660 (28.5 MB)  TX bytes:3516957 (3.5 MB)

$ ifconfig eth1 | egrep -o "inet addr:[^ ]*" | grep -o "[0-9.]*"
192.168.1.172

where egrep -o "inet addr:[^ ]*" matches the text that starts with inet addr: and continues up to the next space (the run of non-space characters is specified by [^ ]*). The next command in the pipe then prints only the run of digits and '.' characters.

--include option

Bash Find Out IF a Variable Contains a Substring

grep returns TRUE or FALSE

Can grep return true/false or are there alternative methods
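
Both questions come down to exit statuses. A short sketch: grep -q prints nothing and only sets the exit status, while bash's [[ ... == *sub* ]] tests for a substring without starting another process. The function names contains and line_matches are made up for the example.

```shell
#!/bin/bash
# True if string $1 contains the literal substring $2.
contains() { [[ $1 == *"$2"* ]]; }

# True if string $1 has a line matching the regular expression $2.
line_matches() { grep -q -- "$2" <<< "$1"; }

if contains "hello world" "lo wo"; then echo "substring found"; fi
if line_matches "hello world" '^hello'; then echo "regex matched"; fi
```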

less -S: print long lines

Causes lines longer than the screen width to be chopped rather than folded. man less.

cut: extract columns or character positions from text files

http://www.thegeekstuff.com/2013/06/cut-command-examples/

cut -f 5-7 somefile  # columns 5-7. 
cut -c 5-7 somefile  # character positions 5-7

The default delimiter is TAB. If the field delimiter is different from TAB you need to specify it using -d:

cut -d' ' -f100-105 myfile > outfile
#
cut -d: -f6 somefile   # colon-delimited file
# 
grep "/bin/bash" /etc/passwd | cut -d':' -f1-4,6,7    # field 1 through 4, 6 and 7

cut -f3 --complement somefile # print all the columns except the third column

To specify the output delimiter, we shall use --output-delimiter. NOTE that to specify the Tab delimiter in cut, we shall use $'\t'. See http://www.computerhope.com/unix/ucut.htm. For example,

cut -f 1,3 -d ':' --output-delimiter=$'\t' somefile

If I am not sure about the number of the final field, I can leave the number off.

cut -f 1- -d ':' --output-delimiter=$'\t' somefile

A simple shell function to show the first 3 columns and 3 rows of the matrix

function show_matrix() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "Usage: show_matrix <filename> <delimiter>"
        return 1
    fi

    if [ "$2" != "tab" ] && [ "$2" != "comma" ]; then
        echo "Delimiter must be 'tab' or 'comma'"
        return 1
    fi

    if [ "$2" == "tab" ]; then
        cut -f1-3 "$1" | head -n 3
    elif [ "$2" == "comma" ]; then
        cut -d',' -f1-3 "$1" | head -n 3
    fi
}
# show_matrix data.txt tab
# show_matrix data.txt comma

awk: operate on rows and/or columns

awk is a tool designed to work with data streams. It can operate on columns and rows. It supports many built-in features, such as arrays and functions, with a syntax similar to the C programming language. Its biggest advantage is its flexibility.

Structure of an awk script

awk pattern { action }
awk ' BEGIN{ print "start" } pattern { AWK commands } END { print "end" } ' file

The three components (BEGIN, END, and the main statement block with its optional pattern) are all optional, and any of them can be absent from the script. The pattern can also be called a condition.

The default field delimiter is whitespace (any run of spaces or tabs).

Some examples:

awk 'BEGIN { i=0 } { i++ } END { print i}' filename
echo -e "line1\nline2" | awk 'BEGIN { print "start" } { print } END { print  "End" }'

seq 5 | awk 'BEGIN { sum=0; print "Summation:" } { print $1"+"; sum+=$1 } END { print "=="; print sum }'

awk -F : '{print $6}' somefile   # colon-delimited file, print the 6th field (cut can do it)
#
awk --field-separator="\\t" '{print $6}' filename    # tab-delimited (cut can do it)
 
awk -F":" '{ print $1 " " $3 }' /etc/passwd  # (cut can do it)

awk -F "\t" '{OFS="\t"} {$1="mouse"$1; print $0}' genes.gtf > genescb.gtf 
# or
awk -F "\t" 'BEGIN {OFS="\t"} {$1="mouse"$1; print $0}' genes.gtf > genescb.gtf 
# replace ELEMENT with mouseELEMENT for data on the 1st column; tab separator was used for input (-F) and output (OFS)

awk 'NR % 4 == 1 {print ">" $0 } NR % 4 == 2 {print $0}' input > output
# extract rows 1,2,5,6,9,10,13,14,.... from input

awk 'NR % 4 == 0 {print ">" $0 } NR % 4 == 3 {print $0}' input > output
# extract rows 3,4,7,8,11,12,15,16,.... from input 

awk '(NR==2),(NR==4) {print $0}' input
# print rows 2-4.

awk '{ print ($1-32)*(5/9) }'
# fahrenheit-to-celsius calculator, http://www.hcs.harvard.edu/~dholland/computers/awk.html

# http://stackoverflow.com/questions/3700957/printing-lines-from-a-file-where-a-specific-field-does-not-start-with-something
awk '$7 !~ /^mouse/ { print $0 }' input # column 7 not starting with 'mouse'
awk '$7 ~ /^mouse/ { print $0 }' input  # column 7 starting with 'mouse'
awk '$7 ~ /mouse/ { print $0 }' input   # column 7 containing 'mouse'

AWK seems most useful for finding or counting a subset of rows or columns. It is less commonly used for string substitution.

Print the string between two parentheses

https://unix.stackexchange.com/questions/108250/print-the-string-between-two-parentheses

$ awk -F"[()]" '{print $2}' file 

$ echo ">gi|52546690|ref|NM_001005239.1| subfamily H, member 1 (OR11H1), mRNA" | awk -F"[()]" '{print $2}'
OR11H1

$ echo ">gi|284172348|ref|NM_002668.2| proteolipid protein 2 (colonic epithelium-enriched) (PLP2), mRNA" | awk -F"[()]" '{print $2}'
colonic epithelium-enriched  # WRONG
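
When the wanted text is always in the last pair of parentheses, one possible fix is to print the second-to-last field instead of the second. This is a sketch under that assumption; last_paren is a made-up function name.

```shell
#!/bin/bash
# With ( and ) as field separators, the text inside the last pair of
# parentheses is the second-to-last field, $(NF-1).
last_paren() { awk -F'[()]' '{ print $(NF-1) }'; }

echo ">gi|52546690|ref|NM_001005239.1| subfamily H, member 1 (OR11H1), mRNA" | last_paren
echo ">gi|284172348|ref|NM_002668.2| proteolipid protein 2 (colonic epithelium-enriched) (PLP2), mRNA" | last_paren
```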

Insert a line

https://stackoverflow.com/a/18276534

awk '/KEYWORDS/ { print; print "new line"; next }1' foo.input

Count number of columns in file

https://stackoverflow.com/a/8629351

awk -F'|' '{print NF; exit}' stores.dat  # Change '|' as needed

sed (stream editor): substitution of text

By default, sed writes the result to standard output and leaves the input file unchanged. To save the substitutions back to the same file, use the -i option.

sed 's/text/replace/' file > newfile
mv newfile file
# OR better
sed -i 's/text/replace/' file

The sed command will replace the first occurrence of the pattern in each line. If we want to replace every occurrence, we need to add the g parameter at the end, as follows:

sed -i 's/pattern/replace/g' file

To remove blank lines

sed '/^$/d' filename

To remove square brackets

# method 1. replace ] & [ by the empty string
$ echo '00[123]44' | sed 's/[][]//g'
0012344
# method 2 - use tr
$ echo '00[123]00' | tr -d '[]'
0012300

To replace all three-digit numbers with another specified word in a file

sed -i 's/\b[0-9]\{3\}\b/NUMBER/g' filename

echo -e "I love 111 but not 1111." | sed 's/\b[0-9]\{3\}\b/NUMBER/g'

where {3} is used for matching the preceding character thrice. \ in \{3\} is used to give a special meaning for { and }. \b is the word boundary marker.

Variable string and quoting

text=hello
echo hello world | sed "s/$text/HELLO/"

Double quotes let the shell expand $text before sed sees the expression.
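
A small sketch contrasting the two quoting styles, using the same variable as above:

```shell
text=hello
# Double quotes: the shell expands $text first, so sed receives s/hello/HELLO/
echo "hello world" | sed "s/$text/HELLO/"    # HELLO world
# Single quotes: sed receives the literal pattern $text, which does not occur in the input
echo "hello world" | sed 's/$text/HELLO/'    # hello world
```

If the variable may contain the separator character or &, it needs escaping before it is placed inside the expression.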

sed takes whatever follows the "s" as the separator

Suppose I want to replace "../jquery-ui.min.js" with "jquery-ui.js". Using | as the separator avoids escaping the slashes:

echo '<script src="../jquery-ui.min.js"></script>' | sed 's|../jquery-ui.min.js|jquery-ui.js|g'
# <script src="jquery-ui.js"></script>
$ cat tmp
@SQ	SN:chrX	LN:155270560
@SQ	SN:chrY	LN:59373566
@RG	ID:NEAT
$ sed 's,^@RG.*,@RG\tID:None\tSM:None\tLB:None\tPL:Illumina,g' tmp
@SQ	SN:chrX	LN:155270560
@SQ	SN:chrY	LN:59373566
@RG	ID:None	SM:None	LB:None	PL:Illumina
$ sed 's/^@RG.*/@RG\tID:None\tSM:None\tLB:None\tPL:Illumina/g' tmp
@SQ	SN:chrX	LN:155270560
@SQ	SN:chrY	LN:59373566
@RG	ID:None	SM:None	LB:None	PL:Illumina

Case insensitive

https://www.cyberciti.biz/faq/unixlinux-sed-case-insensitive-search-replace-matching/

# Newer version - add 'i' or 'I' after 'g'
sed 's/find-word/replace-word/gI' input.txt > output.txt
sed -i 's/find-word/replace-word/gI' input.txt

# Older version/macOS
sed 's/[wW][oO][rR][dD]/replace-word/g' input.txt > output.txt
sed 's/[Ll]inux/Unix/g' input.txt > output.txt

macOS

"undefined label" error on Mac OS X

$ sed -i 's/mkyong/google/g' testing.txt 
sed: 1: "testing.txt": undefined label 'esting.txt'

# Solution: BSD/macOS sed requires a backup suffix after -i (use '' for no backup file)
$ sed -i '.bak' 's/mkyong/google/g' testing.txt 

Application: Get the top directory name of a tarball or zip file without extracting it

dn=`unzip -vl filename.zip | sed -n '5p' | awk '{print $8}'` # 5 is the line number to print
echo -e "$(basename $dn)"

dn=`tar -tf filename.tar.bz2 | grep -o '^[^/]\+' | sort -u`  # '-u' means unique
echo -e $dn

dn=`tar -tf filename.tar.gz | grep -o '^[^/]\+' | sort -u`
echo -e $dn

# Assume there is a sub-directory called htslibXXXX
dn=$(basename `find -maxdepth 1 -name 'htslib*'`)
echo -e $dn

Application: Grab the line number from the 'grep -n' command output

Follow here

grep -n 'regex' filename | sed 's/^\([0-9]\+\):.*$/\1/'  # return the line number of each match
# OR
grep -n 'regex' filename | awk -F: '{print $1}'

echo 123:ABCD | sed 's/^\([0-9]\+\):.*$/\1/'             # 123

where \( and \) mark a group in the pattern and \1 in the replacement stands for the text matched by the first group. See http://www.grymoire.com/Unix/Sed.html for more examples, e.g. searching for repeated words or special patterns.
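
A short sketch of using more than one group, written with -E (extended regular expressions) so that the groups need no backslashes; the input string is the same toy example as above:

```shell
# Two groups: \1 is the number before the colon, \2 is everything after it
echo 123:ABCD | sed -E 's/^([0-9]+):(.*)$/\2:\1/'   # ABCD:123
# Keep only the second group
echo 123:ABCD | sed -E 's/^([0-9]+):(.*)$/\2/'      # ABCD
```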

If we want to find the top directory for a zipped file (see wikipedia for the zip format), we can use

unzip -vl snpEff.zip | head | grep -n 'CRC-32' | awk -F: '{print $1}'

Application: Delete first few characters on each row

http://www.theunixschool.com/2014/08/sed-examples-remove-delete-chars-from-line-file.html

  • To remove the first n characters of every line:
# delete the first 4 characters from each line
$ sed -r 's/.{4}//' file

Application: delete lines

Sed Command to Delete a Line

  • Delete a single line
  • Delete a range of lines
  • Delete multiple lines
  • Delete all lines except specified range
  • Delete empty lines
  • Delete lines based on pattern
  • Delete lines starting with a specific character
  • Delete lines ending with specific character
  • Deleting lines that match the pattern and the next line
  • Deleting line from the pattern match to the end
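
The cases listed above can be sketched as follows. This is an illustration on a made-up six-line sample file, not the commands from the linked article; the ",+1" address form is a GNU sed extension.

```shell
# Sample input: six lines, including one empty line and one comment line
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'line1\nline2\nline3\n\n#note\nline6\n' > "$sample"

sed '3d' "$sample"            # delete a single line (line 3)
sed '2,4d' "$sample"          # delete a range of lines
sed '1d;6d' "$sample"         # delete multiple separate lines
sed '2,4!d' "$sample"         # delete all lines except lines 2-4
sed '/^$/d' "$sample"         # delete empty lines
sed '/line2/d' "$sample"      # delete lines matching a pattern
sed '/^#/d' "$sample"         # delete lines starting with '#'
sed '/6$/d' "$sample"         # delete lines ending with '6'
sed '/line2/,+1d' "$sample"   # delete the matching line and the next line (GNU sed)
sed '/line3/,$d' "$sample"    # delete from the first match to the end of the file
```

None of these commands changes the sample file; add -i to edit a file in place.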

Application: comment out certain lines

https://unix.stackexchange.com/a/128595. To comment lines 2 through 4 of bla.conf:

sed -i '2,4 s/^/#/' bla.conf

This is useful when I need to comment out lines 240 and 242 of the shell scripts (related to the PDF file) generated by BRB-SeqTools.

Substitution of text: perl

How to delete the first few rows of a text file

https://unix.stackexchange.com/questions/37790/how-do-i-delete-the-first-n-lines-of-an-ascii-file-using-shell-commands

Suppose we want to remove the first 3 rows of a text file

  • sed
$ sed -e '1,3d' < t.txt    # output to screen

$ sed -i -e 1,3d yourfile  # directly change the file
  • tail
$ tail -n +4 t.txt    # output to screen
  • awk
$ awk 'NR > 3 { print }' < t.txt    # output to screen

Delete the last row of a file

sed -i '$d' FILE

Show the first few characters from a text file

head -c 50 file   # return the first 50 bytes

Remove/Delete The Empty Lines In A File

https://www.2daygeek.com/remove-delete-empty-lines-in-a-file-in-linux/

sed -i '/^$/d' File       # delete empty lines
sed -i '/KEYWORD/d' File  # delete lines containing KEYWORD

cat: merge by rows

cat file1 file2 > output

paste: merge by columns

paste -d"\t" file1 file2 file3 > output

paste file1 file2 file3 | column -t -s $'\t' > output  # -t aligns the tab-separated fields as a table

Web

Reference: Linux Shell Scripting Cookbook

Copy a complete website

wget --mirror --convert-links URL
# OR
wget -r -N -k -l DEPTH URL

HTTP or FTP authentication

wget --user username --password pass URL

Download a web page as plain text (instead of HTML text)

lynx URL -dump > TextWebPage.txt

cURL

curl http://google.com -o index.html --progress-bar
curl http://google.com --silent -o index.html

# Cookies
curl http://example.com --cookie "user=ABCD;pass=EFGH"
curl URL --cookie-jar cookie_file

# Setting a user agent string
# http://www.useragentstring.com/pages/useragentstring.php
curl URL --user-agent "Mozilla/5.0"

# Authenticating 
curl -u user:pass http://test_auth.com
curl -u user http://test_auth.com

# Printing response headers excluding the data
# For example, to check whether a page is reachable or not
# by checking the 'Content-length' parameter.
curl -I URL

Image crawler and downloader

#!/bin/bash
#Desc: Images downloader
#Filename: img_downloader.sh

if [ $# -ne 3 ];
then
  echo "Usage: $0 URL -d DIRECTORY"
  exit 1
fi

# Parse the arguments in any order: "-d DIRECTORY" and the URL
while [ $# -gt 0 ]
do
  case $1 in
  -d) shift; directory=$1; shift ;;
   *) url=${url:-$1}; shift ;;
  esac
done

mkdir -p "$directory"
# Scheme and host of the URL, used to complete root-relative image paths
baseurl=$(echo "$url" | grep -Eo "https?://[a-z.]+")

echo "Downloading $url"
# Extract the src attribute of every <img> tag
curl -s "$url" | grep -Eo "<img src=[^>]*>" |
sed 's/<img src=\"\([^"]*\).*/\1/g' > /tmp/$$.list

# Prepend the base URL to paths that start with /
sed -i "s|^/|$baseurl/|" /tmp/$$.list

cd "$directory" || exit 1

while read -r filename
do
  echo "Downloading $filename"
  curl -s -O "$filename"
done < /tmp/$$.list

Find broken links in a website by lynx -traversal

#!/bin/bash 
#Desc: Find broken links in a website

if [ $# -ne 1 ]; 
then 
  echo -e "Usage: $0 URL\n" 
  exit 1; 
fi 

echo Broken links: 

mkdir /tmp/$$.lynx 
cd /tmp/$$.lynx 

lynx -traversal $1 > /dev/null 
count=0; 

sort -u reject.dat > links.txt 

while read link; 
do 
  # HTTP/2 status lines carry no "OK" text, so match the 200 status code instead
  output=$(curl -I "$link" -s | grep -E "^HTTP/[0-9.]+ 200");
  if [[ -z $output ]]; 
  then 
    echo $link; 
    let count++ 
  fi 
done < links.txt 

[ $count -eq 0 ] && echo No broken links found.

Track changes to a website

#!/bin/bash
#Desc: Script to track changes to webpage

if [ $# -ne 1 ];
then 
  echo -e "Usage: $0 URL\n"
  exit 1;
fi

first_time=0
# Not first time

if [ ! -e "last.html" ];
then
  first_time=1
  # Set it is first time run
fi

curl --silent $1 -o recent.html

if [ $first_time -ne 1 ];
then
  changes=$(diff -u last.html recent.html)
  if [ -n "$changes" ];
  then
    echo -e "Changes:\n"
    echo "$changes"
  else
    echo -e "\nWebsite has no changes"
  fi
else
  echo "[First run] Archiving.."

fi
  
cp recent.html last.html

POST/GET

Look at the source of the web page and find the 'name' attribute of each <input> tag.

http://www.w3schools.com/html/html_forms.asp

# -d is used for posting in curl
curl URL -d "postvar1=var1&postvar2=var2"
# OR the 'wget' command with the '--post-data' option
wget URL --post-data "postvar1=var1&postvar2=var2" -O out.html

Change detection of a website

Working with Files

iconv command

$ file test.R
test.R: ISO-8859 text, with CRLF line terminators
$ iconv -f ISO-8859 -t UTF-8 test.R  # 'ISO-8859' is not supported
$ iconv -t UTF-8 test.R              # partial conversion??
$ iconv -f ISO-8859-1 -t UTF-8 test.R # Works
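
A minimal sketch for checking what the conversion does at the byte level, assuming the input really is ISO-8859-1 (file only reports the family name "ISO-8859"; iconv -l lists the names iconv accepts). latin1_to_utf8 is a hypothetical helper name.

```shell
# Assumption: the input is ISO-8859-1 (Latin-1)
latin1_to_utf8() {
  iconv -f ISO-8859-1 -t UTF-8
}

# 'caf\351' is "café" in ISO-8859-1 (octal 351 = byte 0xE9)
printf 'caf\351\n' | latin1_to_utf8 | od -An -tx1
# the single byte e9 becomes the two-byte UTF-8 sequence c3 a9
```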

nl command

Add line numbers to a text file

$ cat demo_file
THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.
this line is the 1st lower case line in this file.
This Line Has All Its First Character Of The Word With Upper Case.

Two lines above this line is empty.
And this is the last line.
$ nl demo_file
     1	THIS LINE IS THE 1ST UPPER CASE LINE IN THIS FILE.
     2	this line is the 1st lower case line in this file.
     3	This Line Has All Its First Character Of The Word With Upper Case.
       
     4	Two lines above this line is empty.
     5	And this is the last line.

file command

 
$ file thumbs/g7.jpg 
thumbs/g7.jpg: JPEG image data, JFIF standard 1.01, resolution (DPI), density 72x72, segment length 16, Exif Standard: [TIFF image data, little-endian, direntries=10, orientation=upper-left, xresolution=134, yresolution=142, resolutionunit=2, software=Adobe Photoshop CS Windows, datetime=2004:03:31 22:28:58], baseline, precision 8, 100x75, frames 3

$ file index.html
index.html: HTML document, ASCII text

$ file 2742OS_5_01.sh 
2742OS_5_01.sh: Bourne-Again shell script, ASCII text executable

$ file R-3.2.3.tar.gz 
R-3.2.3.tar.gz: gzip compressed data, last modified: Thu Dec 10 03:12:50 2015, from Unix

date

Displaying dates and times your way in Linux

print by skipping rows

http://stackoverflow.com/questions/604864/print-a-file-skipping-x-lines-in-bash

$ tail -n +<N+1> <filename>  # excluding first N lines
                             # print by starting at line N+1.
$ tail -n +11 /tmp/myfile    # starting at line 11, or skipping the first 10 lines

tail -f (follow)

When we use the '-f' (follow) option, we can monitor a growing file. For example, we can create a new file called tmp.txt and run 'tail -f tmp.txt'. Now we open another terminal in the same directory and run 'for i in {0..100}; do sleep 2; echo $i >> tmp.txt ; done'. We will see in the 1st terminal that new lines appear as tmp.txt grows.

Practical examples:

  • Monitor system change
sudo tail -f /var/log/syslog
  • Follow a file and exit automatically when a given process dies
PID=$(pidof Foo)
tail -f textfile --pid $PID

Here a process Foo (e.g. gedit) is appending data to a file, and tail -f keeps running until Foo dies.
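
The --pid behaviour can be tried without a real Foo. In this sketch a background loop stands in for the writer process; follow_until_exit is a hypothetical helper name, and --pid is a GNU tail option that BSD/macOS tail does not provide.

```shell
# Hypothetical helper: follow FILE from its first line and stop when process PID exits
follow_until_exit() {
  tail -n +1 -f --pid="$1" "$2"
}

log=$(mktemp)
# Stand-in for Foo: append three lines, one per second, then exit
( for i in 1 2 3; do echo "tick $i" >> "$log"; sleep 1; done ) &
follow_until_exit $! "$log"   # prints the ticks as they arrive, returns after the writer exits
rm -f "$log"
```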

Low-level File Access

  • file descriptors: 0 means standard input, 1 means standard output, 2 means standard error.
  • ssize_t write(int fildes, const void *buf, size_t nbytes);
#include <unistd.h>
#include <stdlib.h>
int main()
{
  if ((write(1, "Here is some data\n", 18)) != 18)
    write(2, "A write error has occurred on file descriptor\n", 46);
  exit(0);
}
  • ssize_t read(int fildes, void *buf, size_t nbytes); returns the number of data bytes actually read. If a read call returns 0, there was nothing to read because it reached the end of the file. An error on the call will cause it to return -1.
  • To create a new file descriptor we use the open system call. int open(const char *path, int oflags, mode_t mode);
  • The next program will do file copy.
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
int main()
{
  char c;
  int in, out;
  in = open("file.in", O_RDONLY);
  out = open("file.out", O_WRONLY|O_CREAT, S_IRUSR|S_IWUSR);
  while(read(in,&c,1) == 1)
    write(out,&c,1);
  exit(0);
}

The Standard I/O Library

  • fopen, fclose
  • fread, fwrite
  • fflush
  • fseek
  • fgetc, getc, getchar
  • fputc, putc, putchar
  • fgets, gets
  • printf, fprintf and sprintf
  • scanf, fscanf and sscanf

Formatted Input and Output

  • printf, fprintf and sprintf
  • scanf, fscanf and sscanf

Stream Errors

File and Directory Maintenance

Scanning Directories

  • opendir, closedir
  • readdir
  • telldir
  • seekdir

UNIX environment

Logging

Resources and Limits

Terminals

Fun command line utilities

Turn Your Terminal Into A Playground: 20+ Funny Linux Command Line Tools: cowsay, fortune, figlet, sl, ASCIIquarium, cmatrix, lolcat, ponysay, charasay, party parrot, ternimal, paclear, lavat, pond, cbonsai, dotacat, finger, pinky, no more secrets, hollywood, bucklespring, bb, toilet, sl-alt, fetch utilities, telehack, display star wars episode.

Reading from and Writing to the Terminal

The termios Structure

Terminal Output

Detecting Keystrokes

Curses

A technique between command line and full GUI.

Example: vi.

Data Management

Development Tools

Books

Top Linux developers' recommended programming books

GNU Make and Makefiles

Writing a Manual Page

Distributing Software

The patch Program

Debugging

debug a bash shell

How To Debug a Bash Shell Script Under Linux or UNIX

gdb

Processes and Signals

Search a process ID by its name

Use pgrep https://askubuntu.com/questions/612315/how-do-i-search-for-a-process-by-name-without-using-grep. For example (tested on Linux and macOS),

$ pgrep RStudio  # assume RStudio is running
27043
$ pgrep geany     # geany is not running.     
$
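
In a script, the exit status of pgrep is usually more convenient than its output: it is 0 when at least one process matches and 1 when none does. A minimal sketch (is_running is a hypothetical helper name; -x asks for an exact name match):

```shell
# Hypothetical helper: succeed if a process with exactly this name exists
is_running() {
  pgrep -x "$1" > /dev/null
}

if is_running geany; then
  echo "geany is running"
else
  echo "geany is not running"
fi
```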

POSIX threads

Inter-process Communication: Pipes

Sockets