Python
Basic
Resources
- https://www.tutorialspoint.com/python/
- https://docs.python.org/3/tutorial/index.html
- https://www.learnpython.org/ (contains a Run button for each example)
- https://www.w3schools.com/python/ (contains a Run button for each example)
- Think Python (Free Ebook)
- Python Tutorial (Python教程, in Chinese)
- Python for R users
- Python Data Science Handbook by Jake VanderPlas
- The Hitchhiker’s Guide to Python!
- Python 3 for Scientists
- Another Book on Data Science Learn R and Python in Parallel
- Fluent Python. github for source code.
- A dozen ways to learn Python from opensource.com
- Introduction to Programming (with Python) - a webinar from NIAID
- Learn Python - Full Course for Beginners by freeCodeCamp.org
- Real Python Tutorials
- Primer on Python Decorators. A decorator takes a function, extends its behavior, and returns the extended function.
- Top articles for learning Python in 2020
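The decorator idea mentioned in the resource list above can be shown with a minimal sketch. The names log_calls and add are hypothetical and used only for illustration.

```python
import functools

def log_calls(func):
    """Decorator: print each call before running the wrapped function."""
    @functools.wraps(func)          # keep func's name and docstring
    def wrapper(*args, **kwargs):
        print("calling %s with %s %s" % (func.__name__, args, kwargs))
        return func(*args, **kwargs)
    return wrapper

@log_calls                          # same as: add = log_calls(add)
def add(a, b):
    return a + b

result = add(2, 3)
print(result)
```

The wrapper receives whatever arguments the caller passes, so the same decorator can be reused on any function.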
Python end of life
https://endoflife.date/python or https://devguide.python.org/. By default, the end-of-life is scheduled 5 years after the first release, but can be adjusted by the release manager of each branch.
Install, setup
Multiple python versions
How to use pyenv to run multiple versions of Python on a Mac
Mac
- Installing Python 3 on Mac OS X, How to Fix Permissions on Home-brew on MacOS High Sierra. I use a soft link to point python to python3 (in /usr/local/bin). I need to quit and restart iTerm2 afterwards.
- How to set up virtual environments for Python on MacOS
- pip installation
# check pip version
python -m pip --version
# install
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py
# Upgrading pip
python -m pip install -U pip
IDE
- PyCharm
- Pycharm was used by Learn Python - Full Course for Beginners (freeCodeCamp.org)
- Run a code line by line by changing the keyboard shortcuts from Settings -> Keymap -> Other.
- To run the current file, right click the tab and select Run XXX. (Frustrated)
- Thonny
- Spyder
- RStudio
- Create a file (xxx.py)
- Click the terminal tab. Type 'python' (or ipython3).
- Use Ctrl/CMD + Alt + Enter to run your python code line by line or a chunk.
IPython shell
- Why I love using the IPython shell and Jupyter notebooks
ipython # Shell
jupyter notebook # auto open the browser
- Why switch to JupyterLab from jupyter-notebook?
- What is the difference between Jupyter Notebook and JupyterLab?
Cloud Jupyter
- https://jupyter.org/, What is the difference between Jupyter Notebook and JupyterLab? (Install R kernel for Jupyter Notebook)
pip3 install jupyterlab
jupyter-lab # http://localhost:8888/lab
# The current directory will be available on the file browser panel in JupyterLab.
- Six easy ways to run your Jupyter Notebook in the cloud
- Using R on Jupyter Notebook
- Cross-Methods are a Leak/Variance Trade-Off
- Journal five minutes a day with Jupyter
- How to Use Jupyter Notebook in 2020: A Beginner’s Tutorial
- Get Started With Jupyter Notebook: A Tutorial
- Jupyter Notebook Command Mode Keyboard Shortcuts
- Enter: edit mode
- Esc: command mode
- Ctrl-Enter: run cell
- Shift-Enter: run current cell, and select cell below
- Alt-Enter: run cell, insert a cell below
- Y: to code
- M: to markdown
- 1: to insert heading 1
- 2,3,4,5,6: to insert heading 2,3,4,5,6
Visual Studio Code
The ipynb file can contain figures.
This (Harmony Manuscript) has several notebook files where the code in the ipynb files was written in R, not Python.
I can use VS Code to open an ipynb file.
Conversion
- Rmd to ipynb
- rmd2jupyter package
- How to convert Rmd to ipynb notebook: Jupytext and notedown.
- Script of Scripts (SoS)
- ipynb to Rmd
nbdev
- nbdev
- Jupyter is now a full-fledged IDE Literate programming is now a reality through nbdev and the new visual debugger for Jupyter.
Emacs
Emacs Shell mode: how to send region to shell?
Cheat sheet
- http://datasciencefree.com/python.pdf
- https://s3.amazonaws.com/assets.datacamp.com/blog_assets/PythonForDataScience.pdf
The Most Frequently Asked Questions About Python Programming
https://www.makeuseof.com/tag/python-programming-faq/
Running
Interactively
Use Ctrl+d to quit.
How to run a python script file
python mypython.py
Run python statements from a command line
Use -c (command) option.
python -c "import psutil"
run python source code line by line
python -m pdb <script.py>
Install a new module
See an example of installing HTSeq.
pip
The Python Package Index (PyPI) is the definitive list of packages (or modules)
For example, motionEye can be installed by pip install or pip2 install; see its wiki and source code on Github.
sudo apt-get install python-pip
pip --version
pip install SomePackage
pip show --files SomePackage
pip install --upgrade SomePackage
pip install --upgrade https://github.com/Lasagne/Lasagne/archive/master.zip
pip install --upgrade pip # Upgrade itself
pip uninstall SomePackage

sudo apt install python3-pip
pip3 --version
How to install pip to manage PyPI packages easily
List installed packages and their versions, location
pip3 list -v
On my Ubuntu 20.04, the packages installed by pip3 are located in ~/.local/lib/python3.8/site-packages/. It does not matter where I issue the pip3 install command.
The danger of upgrading pip
- Error after upgrading pip: cannot import name 'main'
- You should consider upgrading via the 'pip install --upgrade pip' command
Don't use sudo + pip
https://askubuntu.com/questions/802544/is-sudo-pip-install-still-a-broken-practice
"--user" option in pip
https://github.com/pypa/pip/issues/4186
$ pip install Pygments
...
OSError: [Errno 13] Permission denied: '/usr/local/lib/python2.7/dist-packages/Pygments-2.2.0.dist-info'
/usr/local/lib/python2.7/dist-packages/pip-9.0.1-py2.7.egg/pip/_vendor/requests/packages/urllib3/util/ssl_.py:122: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning

$ pip install --user Pygments
Collecting Pygments
  Using cached Pygments-2.2.0-py2.py3-none-any.whl
Installing collected packages: Pygments
Successfully installed Pygments-2.2.0
virtualenv
Python “Virtual Environments” allow us to install a Python package in an isolated location, rather than installing it globally.
- How To Manage Python Packages Using Pip
# Python 2
$ sudo pip install virtualenv
$ virtualenv <DIR_NAME>
$ source <DIR_NAME>/bin/activate
(<DIR_NAME>) ~$ which python
....
$ deactivate

# Python 3, https://docs.python.org/3/tutorial/venv.html
$ python3 -m venv <DIR_NAME>
$ source <DIR_NAME>/bin/activate
(<DIR_NAME>) ~$ which python
....
$ deactivate
- Python Tutorial: virtualenv and why you should use virtual environments. pip freeze.
pip list
pip freeze --local > requirements.txt
...
pip install -r requirements.txt
pip list
- Learn Python by creating a video game
- How to use Python virtualenv
- A non-magical introduction to Pip and Virtualenv for Python beginners
- As an alternative to virtualenv, we can add "--user" to the pip command. See the installation guide of lasagne or easy_install or pip as a limited user?
pipx
Pipx – Install And Run Python Applications In Isolated Environments
python setup.py
If a package has been bundled by its creator using the standard approach to bundling modules (with Python’s distutils tool), all you need to do is download the package, uncompress it and type:
python setup.py build
sudo python setup.py install
For Python 2, the packages are installed under /usr/local/lib/python2.7/dist-packages/.
$ ls -l /usr/local/lib/python2.7/dist-packages/
total 12
-rw-r--r-- 1 root staff  273 Jan 12 13:45 easy-install.pth
drwxr-sr-x 4 root staff 4096 Jan 12 13:45 HTSeq-0.6.1p1-py2.7-linux-x86_64.egg
drwxr-sr-x 4 root staff 4096 Jan 12 13:42 pysam-0.9.1.4-py2.7-linux-x86_64.egg
Get a list of installed modules
http://stackoverflow.com/questions/739993/how-can-i-get-a-list-of-locally-installed-python-modules
pydoc modules
Not helpful. See the pip list command.
Check installed packages' versions
If you install packages through pip, use
$ pip list
...
pyOpenSSL (0.13.1)
pyparsing (2.0.1)
pysam (0.10.0)
python-dateutil (1.5)
pytz (2013.7)
rudix (2016.12.13)
scipy (0.13.0b1)
setuptools (1.1.6)
singledispatch (3.4.0.3)
six (1.4.1)
tornado (4.4.2)
vboxapi (1.0)
xattr (0.6.4)
zope.interface (4.1.1)
And more information about a package by using pip show PACKAGE.
$ pip show pysam
Name: pysam
Version: 0.10.0
Summary: pysam
Home-page: https://github.com/pysam-developers/pysam
Author: Andreas Heger
Author-email: [email protected]
License: MIT
Location: /Users/XXX/Library/Python/2.7/lib/python/site-packages
Requires:
The following method works whether the package is installed by source or binary package
>>> import pysam
>>> print(pysam.__version__)   # works in Python 2 and 3
0.10.0
>>> print pysam.__version__    # Python 2 only
0.10.0
See http://hammelllab.labsites.cshl.edu/tetoolkit-faq/
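On Python 3.8 or later, the standard library's importlib.metadata can also report the version of an installed distribution without importing it. The sketch below assumes that setting; the helper name installed_version is hypothetical.

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(dist_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

# Prints a version string if pysam is installed, otherwise None.
print(installed_version("pysam"))

# A made-up distribution name to show the "not installed" branch.
missing = installed_version("surely-not-a-real-package-xyz")
print(missing)
```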
Install a specific version of package through pip
https://stackoverflow.com/questions/5226311/installing-specific-package-versions-with-pip
For example, the pysam package is actively developed, but a new release (0.11.2.2) may introduce bugs. So I had to install an older version (0.10.0 works for me on Mac El Capitan and Sierra).
$ sudo -H pip uninstall pysam
Uninstalling pysam-0.11.2.2:
......
$ sudo -H pip install pysam==0.10.0
Collecting pysam==0.10.0
  Downloading pysam-0.10.0.tar.gz (2.3MB)
    100% |████████████████████████████████| 2.3MB 418kB/s
Installing collected packages: pysam
  Running setup.py install for pysam ... done
Successfully installed pysam-0.10.0
warning: Please check the permissions and owner of that directory
I got this message when I ran the 'sudo pip install PACKAGE' command.
See
- http://stackoverflow.com/questions/27870003/pip-install-please-check-the-permissions-and-owner-of-that-directory
- http://askubuntu.com/questions/578869/python-pip-permissions
python3-pip installed but pip3 command not found?
sudo apt-get remove python3-pip; sudo apt-get install python3-pip
DeepSurv example
https://github.com/jaredleekatzman/DeepSurv
git clone https://github.com/jaredleekatzman/DeepSurv.git
sudo cp /usr/bin/pip /usr/bin/pip.bak
sudo nano /usr/bin/pip # See https://stackoverflow.com/a/50187211 for more detail
# Method 1 for Theano
sudo pip install theano
# Method 2 for Theano
pip install --user --upgrade https://github.com/Theano/Theano/archive/master.zip
pip install --user --upgrade https://github.com/Lasagne/Lasagne/archive/master.zip
cd DeepSurv/
pip install . --user
sudo apt install python-pytest
pip install h5py --user
sudo pip uninstall protobuf # https://stackoverflow.com/a/33623372
pip install protobuf --user
sudo apt install python-tk
py.test
============ test session starts ===========
platform linux2 -- Python 2.7.12, pytest-2.8.7, py-1.4.31, pluggy-0.3.1
rootdir: /home/brb/github/DeepSurv, inifile:
collected 7 items

tests/test_deepsurv.py .......

========== 7 passed in 5.77 seconds ========
How to list all installed modules
help('modules') # the output is not pretty
Comment
- Use the comment symbol # for a single line
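- Use a triple-quote delimiter """ on each end of a multi-line comment. Attention: a triple-quoted block is really a string literal, not a true comment, so prefer # for ordinary comments and keep triple quotes for docstrings.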
Python Comments from zentut.com.
Docstring
- https://en.wikipedia.org/wiki/Docstring
- Python Developer's Guide Docstring Conventions
Try / Except
try:
    number = int(input("Enter a number: "))
    print(number)
except ValueError:  # int() raises ValueError for non-numeric input
    print("Invalid Input")
if __name__ == "__main__":
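The guard lets one file work both as an importable module and as a script. A minimal sketch follows; the function names gc_content and main are hypothetical.

```python
def gc_content(dna):
    """Return the GC percentage of a DNA string."""
    dna = dna.upper()
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)

def main():
    print(gc_content("gatagc"))

# __name__ is "__main__" only when the file is run as a script,
# so main() is skipped when the file is imported from another module.
if __name__ == "__main__":
    main()
```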
How to Get the Current Directory in Python
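A short sketch of two standard-library ways to get the current working directory. Both calls are part of the standard library; the variable names are illustrative.

```python
import os
from pathlib import Path

cwd = os.getcwd()        # current working directory, as a string
cwd_path = Path.cwd()    # the same directory, as a pathlib.Path object

print(cwd)
print(cwd_path)
```

Note that this is the directory the interpreter was started from, which is not necessarily the directory containing the script file.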
Import a compiled C module
- An example based on SWIG compiler.
string and string operators
Reference:
- Python for Genomic Data Science from coursera.
- Python Hello World and String Manipulation
- Use double quote instead of single quote to define a string
- Use triple double quotes """ to write a long string spanning multiple lines or comments in a python script
- if dna="gatagc", then
dna[0]     # 'g'
dna[-1]    # 'c' (start counting from the right)
dna[-2]    # 'g'
dna[0:3]   # 'gat' (the end is always excluded)
dna[:3]    # 'gat'
dna[2:]    # 'tagc'
len(dna)   # 6
type(dna)  # <class 'str'>
print(dna)
dna.count('c')         # 1
dna.upper()            # 'GATAGC'
dna.find('ag')         # 3 (only the first occurrence of 'ag' is reported)
dna.find('ag', 2)      # 3 (start looking from position 2)
dna.rfind('ag')        # 3 (search backwards in the string)
dna.islower()          # True
dna.isupper()          # False
dna.replace('a', 'A')  # 'gAtAgc'
print(dna.upper().isupper())  # True
Format
Format Specification Mini-Language
Regular expression
The Beginner’s Guide to Regular Expressions With Python
User's input
dna=raw_input("Enter a DNA sequence: ") # python 2
dna=input("Enter a DNA sequence: ") # python 3
To convert a user's input (a string) to others
int(x[, base])
float(x)
str(x)   # converts x to a string
str(65)  # '65'
chr(x)   # converts an integer to a character
chr(65)  # 'A'
Why is parenthesis in print voluntary in Python 2.7?
Fancy Output
print("THE DNA's GC content is ", gc, "%") # gives too many digits following the dot
print("THE DNA's GC content is %5.3f %%" % gc)
# the percent operator separates the formatting string and the value that
# replaces the format placeholder
print("%d" % 10.6) # 10
print("%e" % 10.6) # 1.060000e+01
print("%s" % dna) # gatagc
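The same format specifications also work with str.format and, from Python 3.6 on, with f-strings. A minimal sketch, where the values of gc and dna are made up for illustration:

```python
gc = 42.857142      # made-up GC percentage
dna = "gatagc"

# str.format: width 5, three digits after the decimal point
line1 = "The DNA's GC content is {:5.3f} %".format(gc)
# f-string (Python 3.6+) with the same format specification
line2 = f"The DNA's GC content is {gc:5.3f} %"
# !r inserts the repr() of the value, so the string is shown with quotes
line3 = f"{dna!r} has {len(dna)} bases"

print(line1)
print(line2)
print(line3)
```

Unlike the % operator, a literal percent sign does not need to be doubled in these two forms.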
List
A list is an ordered set of values
gene_expr=['gene', 5.16e-08, 0.001385, 7.33e-08]
print(gene_expr[2])
gene_expr[0]='Lif'
Slice a list (it will create a new list)
gene_expr[-3:] # [5.16e-08, 0.001385, 7.33e-08]
gene_expr[1:3] = [6.09e-07]
Clear the list
gene_expr[:] = []
List functions
Size of the list
len(gene_expr)
Delete an element
del gene_expr[1]
Extend/append to a list
gene_expr.extend([5.16e-08, 0.00123])
Count the number of times an element appears in a list
print(gene_expr.count('Lif'), gene_expr.count('gene'))
Reverse all elements in a list
gene_expr.reverse()
print(gene_expr)
help(list)
Lists as Stacks
stack=['a', 'b', 'c', 'd']
stack.append('e')   # push onto the top of the stack
stack.pop()         # 'e' (last in, first out)
Sorting lists
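mylist=[3, 31, 123, 1, 5]
sorted(mylist)  # returns a new sorted list
mylist          # not changed
mylist.sort()   # sorts in place
mylist=['c', 'g', 'T', 'a', 'A']
mylist.sort()   # ['A', 'T', 'a', 'c', 'g'] (uppercase sorts before lowercase)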
Don't change an element in a string!
motif = 'nacggggtc'
motif[0] = 'a' # ERROR
Tuples
A tuple consists of a number of values separated by commas, and is another standard sequence data type, like strings and lists.
t=1,2,3
t
t=(1,2,3) # we may input tuples with or without surrounding parentheses
Sets
A set is an unordered collection with no duplicate elements.
brca1={'DNA repair', 'zinc ion binding'}
brca2={'protein binding', 'H4 histone'}
brca1 | brca2  # union
brca1 & brca2  # intersection
brca1 - brca2  # difference
Dictionaries
A dictionary is a set of key and value pairs, with the requirement that the keys are unique (within one dictionary). Since Python 3.7, dictionaries keep their insertion order.
TF_motif = {'SP1' : 'gggcgg',
            'C/EBP' : 'attgcgcaat',
            'ATF' : 'tgacgtca',
            'c-Myc' : 'cacgtg',
            'Oct-1' : 'atgcaaat'}
# Access
print("The recognition sequence for the ATF transcription factor is %s." % TF_motif['ATF'])
# Update
TF_motif['AP-1'] = 'tgagtca'
# Delete
del TF_motif['SP1']
# Size of a dictionary
len(TF_motif)
# Get a list of all the 'keys' in a dictionary
list(TF_motif.keys())
# Get a list of all the 'values'
list(TF_motif.values())
# sort
sorted(TF_motif.keys())
sorted(TF_motif.values())
We can retrieve data from dictionaries using the items() method.
for name, seq in TF_motif.items():
    print(name, seq)
In summary, strings, lists and dictionaries are the most useful data types for bioinformatics.
if statement
dna=input('Enter DNA sequence: ')
if 'n' in dna :
    nbases=dna.count('n')
    print("dna sequence has %d undefined bases " % nbases)

# General form (pseudocode)
if condition 1:
    do action 1
elif condition 2:
    do action 2
else:
    do action 3
Logical operators
Use and, or, not.
dna=input('Enter DNA sequence: ')
if 'n' in dna or 'N' in dna:
    nbases=dna.count('n')+dna.count('N')
    print("dna sequence has %d undefined bases " % nbases)
else:
    print("dna sequence has no undefined bases")
Loops
while
dna=input('Enter DNA sequence:')
pos=dna.find('gt', 0)
while pos>-1 :
    print("Donor splice site candidate at position %d" %pos)
    pos=dna.find('gt', pos+1)
for
motifs=["attccgt", "aggggggttttttcg", "gtagc"]
for m in motifs:
    print(m, len(m))
range
for i in range(4):
    print(i)
for i in range(1,10,2):
    print(i)
Problem: check whether all characters in a given protein sequence are valid amino acids.
protein='SDVIHRYKUUPAKSHGWYVCJRSRFTWMVWWRFRSCRA'
for i in range(len(protein)):
    if protein[i] not in 'ABCDEFGHIKLMNPQRSTVWXYZ':
        print("this is not a valid protein sequence!")
        break
continue
protein='SDVIHRYKUUPAKSHGWYVCJRSRFTWMVWWRFRSCRA'
corrected_protein=''
for i in range(len(protein)):
    if protein[i] not in 'ABCDEFGHIKLMNPQRSTVWXYZ':
        continue
    corrected_protein=corrected_protein+protein[i]
print("Corrected protein seq is %s" % corrected_protein)
else Statement used with loops
- If used with a for loop, the else statement is executed when the loop has exhausted iterating the list
- If used with a while loop, the else statement is executed when the condition becomes false
# Find all prime numbers smaller than a given integer
N=10
for y in range(2, N):
    for x in range(2, y):
        if y % x == 0:
            print(y, 'equals', x, '*', y//x)
            break
    else:
        # loop fell through without finding a factor
        print(y, 'is a prime number')
The pass statement is a placeholder
if motif not in dna:
    pass
else:
    some_function_here()
Functions
Get modular with Python function
def function_name(arguments) :
    function_code_block
    return output
For example,
def gc(dna) :
    "this function computes the gc perc of a dna seq"
    nbases=dna.count('n')+dna.count('N')
    gcpercent=float(dna.count('c')+dna.count('C')+dna.count('g')+dna.count('G'))*100.0/(len(dna)-nbases)
    return gcpercent

gc('AAAAGTNNAGTCC')
help(gc)
SyntaxError: invalid syntax
https://stackoverflow.com/a/11890194
In the Python shell, add an empty line at the end of a function definition. E.g.
>>> def fun(a):
...     return a+1
...
>>> fun(9)
10
>>> exit()
On a python script
def fun(a):
    return a+1

print(fun(9))
Debug functions
https://stackoverflow.com/a/4929267
You can launch a Python program through pdb by using pdb myscript.py or python -m pdb myscript.py
$ cat debug.py
def fun(a):
  a= a*2
  a= a*3
  return a+1

print fun(5)

$ python -m pdb debug.py
> /home/pi/Downloads/debug.py(1)<module>()
-> def fun(a):
(Pdb) b fun
Breakpoint 1 at /home/pi/Downloads/debug.py:1
(Pdb) c
> /home/pi/Downloads/debug.py(2)fun()
-> a= a*2
(Pdb) n
> /home/pi/Downloads/debug.py(3)fun()
-> a= a*3
(Pdb)
> /home/pi/Downloads/debug.py(4)fun()
-> return a+1
(Pdb) p a
30
(Pdb) n
--Return--
> /home/pi/Downloads/debug.py(4)fun()->31
-> return a+1
(Pdb) exit
Boolean functions
Problem: check whether a given dna seq contains an in-frame stop codon
def has_stop_codon(dna) :
    "This function checks if given dna seq has in frame stop codons."
    stop_codon_found=False
    stop_codons=['tga', 'tag', 'taa']
    for i in range(0, len(dna), 3) :
        codon=dna[i:i+3].lower()
        if codon in stop_codons :
            stop_codon_found=True
            break
    return stop_codon_found

# The function must be defined before it is called.
dna=input("Enter a dna seq: ")
if has_stop_codon(dna) :
    print("input seq has an in frame stop codon.")
else :
    print("input seq has no in frame stop codon.")
Function default parameter values
Suppose the has_stop_codon function also accepts a frame argument (equal to 0, 1, or 2) which specifies in what frame we want to look for stop codons.
def has_stop_codon(dna, frame=0) :
    "This function checks if given dna seq has in frame stop codons."
    stop_codon_found=False
    stop_codons=['tga', 'tag', 'taa']
    for i in range(frame, len(dna), 3) :
        codon=dna[i:i+3].lower()
        if codon in stop_codons :
            stop_codon_found=True
            break
    return stop_codon_found

dna="atgagcggccggct"
has_stop_codon(dna)     # False
has_stop_codon(dna, 0)  # False
has_stop_codon(dna, 1)  # True
has_stop_codon(frame=0, dna=dna)
More examples
Reverse complement of a dna sequence
def reversecomplement(seq):
    """Return the reverse complement of the dna string."""
    seq = reverse_string(seq)
    seq = complement(seq)
    return seq

reversecomplement('CCGGAAGAGCTTACTTAG')
To reverse a string
def reverse_string(seq):
    return seq[::-1]

reverse_string(dna)
Complement a DNA Sequence
def complement(dna):
    """Return the complementary sequence string."""
    # dictionary
    basecomplement = {'A':'T', 'C':'G', 'G':'C', 'T':'A', 'N':'N',
                      'a':'t', 'c':'g', 'g':'c', 't':'a', 'n':'n'}
    letters = list(dna)
    # list comprehensions
    letters = [basecomplement[base] for base in letters]
    return ''.join(letters)
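As an alternative sketch, the dictionary lookup can be replaced by a translation table built with str.maketrans. This is a swapped-in technique rather than the approach used above, and the names COMPLEMENT and reverse_complement are hypothetical.

```python
# Translation table mapping each base to its complement (both cases).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    # Complement every base, then reverse the result with a slice.
    return seq.translate(COMPLEMENT)[::-1]

rc = reverse_complement("CCGGAAGAGCTTACTTAG")
print(rc)
```

Characters that are not in the table are left unchanged by translate, whereas the dictionary version raises a KeyError for them.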
Split and Join functions
sentence="enzymes and other proteins come in many shapes"
sentence.split()       # split on all whitespaces
sentence.split('and')  # use 'and' as the separator
'-'.join(['enzymes', 'and', 'other', 'proteins', 'come', 'in', 'many', 'shapes'])
Variable number of function arguments
def newfunction(fi, se, th, *rest):
    print("First: %s" % fi)
    print("Second: %s" % se)
    print("Third: %s" % th)
    # rest is a tuple; convert it to a list so that % treats it as one value
    print("Rest... %s" % list(rest))
    return
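A short usage sketch of variable arguments: extra positional arguments are collected in a tuple, extra keyword arguments in a dictionary, and a list can be unpacked at the call site with *. The function name describe is hypothetical.

```python
def describe(first, *rest, **options):
    """Return the arguments as received, to show how they are collected."""
    return first, rest, options

result = describe("a", "b", "c", sep="-")
print(result)

# A list or tuple can be unpacked into separate positional arguments with *.
bases = ["g", "a", "t"]
unpacked = describe(*bases)
print(unpacked)
```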
Modules and packages
- Python Modules
- Python Modules from w3schools
- Python import: Advanced Techniques and Tips
- Python Module Index
Packages group multiple modules under one name, by using "dotted module names". For example, the module name A.B designates a submodule named B in a package named A. See What's the difference between a Python module and a Python package?
<dnautil.py>
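#!/usr/bin/python
"""
dnautil module contains a few useful functions for dna seq
"""
def gc(dna) :
    "this function computes the gc perc of a dna seq"
    nbases=dna.count('n')+dna.count('N')
    gcpercent=float(dna.count('c')+dna.count('C')+dna.count('g')+dna.count('G'))*100.0/(len(dna)-nbases)
    return gcpercent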
When a module is imported, Python first searches for a built-in module with that name.
If a built-in module is not found, Python then searches for a file obtained by adding the extension .py to the name of the module being imported:
- in your current directory,
- the directory where Python has been installed
- in a path, i.e., a colon(':') separated list of file paths, stored in the environment variable PYTHONPATH.
You can use the sys.path variable from the sys built-in module to check the list of all directories where Python looks for files
import sys
sys.path
If the sys.path variable does not contain the directory where you put your module, you can extend it:
import os
sys.path.append(os.path.expanduser("~/python")) # "$USER" is not expanded inside a Python string
sys.path
Using modules (from PACKAGE/DIRNAME/FILENAME import CLASS)
from math import *
print(floor(3.7))

import dnautil
dna="atgagggctaggt"
gc(dna)          # gc is not defined
dnautil.gc(dna)  # Good
Import Names from a Module
from dnautil import *
gc(dna) # OK

from dnautil import gc, has_stop_codon
Get modular with Python functions & Learn object-oriented programming with Python from opensource.com.
help
from AAA import BBB
help(BBB)
help(BBB.FunctionName)

import BBB as CCC
help(CCC)
Packages & __init__.py
Each regular package in Python is a directory that contains a special file __init__.py (required in Python 2; since Python 3.3 a directory without it can still be imported as a namespace package). This file can be empty, and it indicates that the directory it contains is a Python package, so it can be imported the same way a module can be imported. https://docs.python.org/2/tutorial/modules.html
Example: suppose you have several modules dnautil.py, rnautil.py , and proteinutil.py. You want to group them in a package called "bioseq" which processes all types of biological sequences. The structure of the package:
bioseq/
    __init__.py
    dnautil.py
    rnautil.py
    proteinutil.py
    fasta/
        __init__.py
        fastautil.py
    fastq/
        __init__.py
        fastqutil.py
Loading from packages:
import bioseq.dnautil
bioseq.dnautil.gc(dna)

from bioseq import dnautil
dnautil.gc(dna)

from bioseq.fasta.fastautil import fastqseqread
Example
Building a Multiple Choice Quiz by freeCodeCamp.org
QuestionFile.py
class Question:
    def __init__(self, prompt, answer):
        self.prompt = prompt
        self.answer = answer
app.py
from QuestionFile import Question

question_prompts = [
    "What color are apples?\n(a) Red/Green\n(b) Purple\n(c) Orange\n\n",
    "What color are Bananas?\n(a) Teal\n(b) Magenta\n(c) Yellow\n\n",
    "What color are strawberries?\n(a) Yellow\n(b) Red\n(c) Blue\n\n"
]

questions = [
    Question(question_prompts[0], "a"),
    Question(question_prompts[1], "c"),
    Question(question_prompts[2], "b")
]

def run_test(questions):
    score = 0
    for question in questions:
        answer = input(question.prompt)
        if answer == question.answer:
            score += 1
    print("You got " + str(score) + " /" + str(len(questions)) + " correct")

run_test(questions)
Run the program by python3 app.py
Files - Communicate with the outside
f=open('myfile', 'r') # read
f=open('myfile')      # read is the default mode
f=open('myfile', 'w') # write
f=open('myfile', 'a') # append
Take care if a file does not exist
try:
    f = open('myfile')
except IOError:
    print("the file myfile does not exist!!")
Reading
for line in f:
    print(line)
Change positions within a file object
f.seek(0) # go to the beginning of the file
f.read()
Read a single line
f.seek(0)
f.readline()
Write into a file
import os
f=open(os.path.expanduser("~/myfile"), 'a') # "$USER" is not expanded inside a Python string
f.write("this is a new line")
f.close()
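The open/close pairs above can also be written with a with statement, which closes the file automatically even if an error occurs. A minimal sketch, using a hypothetical file inside a temporary directory so that no real file is touched:

```python
import os
import tempfile

# Hypothetical file name inside a temporary directory, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "myfile")

with open(path, "w") as f:          # the file is closed when the block exits
    f.write("first line\n")
with open(path, "a") as f:          # append mode adds to the end
    f.write("second line\n")
with open(path) as f:               # read mode is the default
    lines = [line.rstrip("\n") for line in f]

print(lines)
```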
Importing large tab-delimited .txt file into Python
# R
write.table(iris[1:10,], file="iris.txt", sep="\t", quote=F, row.names=F)

# Python
import csv
with open('iris.txt') as f:
    reader = csv.reader(f, delimiter="\t")
    d = list(reader)
print(d[0][2])
print(d[1][2])

# Shell
$ python test_csv.py
Petal.Length
1.4
If the data are all numerical, we can use the numpy package.
# R
write.table(iris[1:10, 1:4], file="~/Downloads/iris2.txt", sep="\t", quote=F, row.names=F, col.names=F)

# Python
import numpy as np
d = np.loadtxt('iris2.txt', delimiter="\t")
print(d[0][2])
print(d[1][2])

# Shell
$ python test_csv2.py
1.4
1.4
Read text file from a URL
import urllib.request
url = "http://textfiles.com/adventure/aencounter.txt"
file = urllib.request.urlopen(url)
for line in file:
    print(line.decode('utf-8'))
- urllib.request — extensible library for opening URLs
- Python Internet Access using Urllib.Request and urlopen()
Command line arguments
Suppose we run 'python processfasta.py myfile.fa' in the command line, then
import sys
print(sys.argv) # ['processfasta.py', 'myfile.fa']
More completely
#!/usr/bin/python
"""
processfasta.py builds a dictionary with all sequences from a FASTA file.
"""
import sys
filename=sys.argv[1]
try:
    f = open(filename)
except IOError:
    print("File %s does not exist!" % filename)
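The script above stops after opening the file. One possible way to build the dictionary of sequences is sketched below. The function name read_fasta and the choice of using the first word after '>' as the key are assumptions, not part of the original script.

```python
def read_fasta(lines):
    """Build {name: sequence} from an iterable of FASTA-formatted lines."""
    seqs = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""   # first word after '>' is the key
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line            # sequence may span several lines
    return seqs

# Made-up records standing in for the contents of a FASTA file.
example = [">seq1 test", "atgc", "ggta", "", ">seq2", "ttaa"]
seqs = read_fasta(example)
print(seqs)
```

Because a file object is an iterable of lines, the same function could be called as read_fasta(f) on an opened file.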
Parsing command line arguments with getopt. Suppose we want to store in the dictionary the sequences bigger than a given length provided in the command line: 'processfasta.py -l 250 myfile.fa'
#!/usr/bin/python
import sys
import getopt

def usage():
    print("""
processfasta.py: reads a FASTA file and builds a dictionary with all
sequences bigger than a given length

processfasta.py [-h] [-l <length>] <filename>

  -h            print this message
  -l <length>   filter all sequences with a length smaller than <length>
                (default <length>=0)
  <filename>    the file has to be in FASTA format
""")

o, a = getopt.getopt(sys.argv[1:], 'l:h')
opts = {}   # empty dictionary
seqlen = 0
for k, v in o:
    opts[k] = v
if '-h' in opts.keys():   # -h means the user wants help
    usage(); sys.exit()
if len(a) < 1:
    usage(); sys.exit("input fasta file is missing")
if '-l' in opts.keys():
    # option values are strings, so convert before comparing
    if int(opts['-l']) < 0:
        print("length of seq should be positive!"); sys.exit(0)
    seqlen = int(opts['-l'])
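The same command line can also be parsed with the standard-library argparse module, which generates the -h help text and converts types. This is a sketch of an alternative to getopt, not the original script; the function name parse_args and the long option --length are assumptions.

```python
import argparse

def parse_args(argv=None):
    """Parse 'processfasta.py [-h] [-l <length>] <filename>'."""
    parser = argparse.ArgumentParser(
        description="Read a FASTA file and keep sequences above a given length.")
    parser.add_argument("-l", "--length", type=int, default=0,
                        help="minimum sequence length (default: 0)")
    parser.add_argument("filename", help="input file in FASTA format")
    args = parser.parse_args(argv)      # argv=None means use sys.argv[1:]
    if args.length < 0:
        parser.error("length of seq should be positive!")
    return args

# Passing a list makes the example independent of the real command line.
args = parse_args(["-l", "250", "myfile.fa"])
print(args.length, args.filename)
```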
stdin and stdout
sys.stdin.read()
sys.stdout.write("Some useful output.\n")
sys.stderr.write("Warning: input file was not found\n")
Call external programs
import subprocess
subprocess.call(["ls", "-l"]) # return code indicates the success or failure of the execution
subprocess.call(["tophat", "genome_mouse_idx", "PE_reads_1.fq.gz", "PE_reads_2.fq.gz"])
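On Python 3.7 or later, subprocess.run can also capture the program's output and raise an error on failure. In the sketch below the external program is the current Python interpreter, chosen only so that the example does not depend on which tools are installed.

```python
import subprocess
import sys

completed = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,   # collect stdout and stderr (Python 3.7+)
    text=True,             # decode the output bytes to str
    check=True,            # raise CalledProcessError on a non-zero exit code
)

print(completed.returncode)
print(completed.stdout.strip())
```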
Exceptions
5 Python Examples to Handle Exceptions using try, except and finally
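A minimal sketch of how try, except, else and finally fit together. The function name parse_count is hypothetical.

```python
def parse_count(text):
    """Convert text to an int, returning None when it is not a number."""
    try:
        value = int(text)
    except ValueError as err:
        print("Invalid input:", err)
        return None
    else:
        return value                          # runs only if no exception was raised
    finally:
        print("finished parsing %r" % text)   # always runs, even after a return

good = parse_count("42")
bad = parse_count("4x2")
print(good, bad)
```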
Biopython & Pubmed
- Parsers for various bioinformatics file formats (FASTA, Genbank)
- Access to online services like NCBI Entrez or Pubmed databases
- Interfaces to common bioinformatics programs such as BLAST, Clustalw and others.
import Bio print(Bio.__version__)
Running BLAST over the internet
from Bio.Blast import NCBIWWW
fasta_string = open("myseq.fa").read()
result_handle = NCBIWWW.qblast("blastn", "nt", fasta_string)
# blastn is the program to use
# nt is the database to search against
# default output is xml
help(NCBIWWW.qblast)
The BLAST record
from Bio.Blast import NCBIXML
blast_record = NCBIXML.read(result_handle)
Parse BLAST output
len(blast_record.alignments)
E_VALUE_THRESH = 0.01
for alignment in blast_record.alignments:
    for hsp in alignment.hsps:
        if hsp.expect < E_VALUE_THRESH:
            print('***Alignment***')
            print('sequence:', alignment.title)
            print('length:', alignment.length)
            print('e value:', hsp.expect)
            print(hsp.query)
            print(hsp.match)
            print(hsp.sbjct)
More help with Biopython
- Biopython tutorial and cookbook: http://biopython.org/DIST/docs/tutorial/Tutorial.html
- Biopython FAQ: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc5
pubmed_parser
Parser for Pubmed Open-Access XML Subset and MEDLINE XML Dataset
pyTest
pyc file
What is the difference between .py and .pyc files? [duplicate]. I observed that it can cause a problem when I modify a python file but Python keeps using the old pyc file, so my change is not used (Raspberry Pi e-ink example).
Shutdown or restart OS
Below is tested on Raspbian
import os
os.system('sudo shutdown -h now')
Popular python libraries
20 Python libraries you can’t live without
psutil
- psutil.cpu_percent() examples. Inspired by the e-ink example from Raspberry Pi.
- https://github.com/arvydas/blinkstick-python/wiki/Example%3A-Display-CPU-usage
- https://www.liaoxuefeng.com/wiki/1016959663602400/1183565811281984
# pip install psutil --user
import psutil
for x in range(10):
    print(psutil.cpu_percent(interval=1))
numpy
- An introduction to Numpy and Scipy
- https://docs.scipy.org/doc/numpy-dev/user/quickstart.html
- Cheat sheets
- Program to find the Sum of each Row and each Column of a Matrix
scipy
seaborn
- https://seaborn.pydata.org/
- Examples of performing Exploratory Data Analysis for a few public clinical data sets
matplotlib
https://matplotlib.org/users/installing.html
Installation.
python -m pip install -U pip
python -m pip install -U matplotlib
# https://stackoverflow.com/a/50328517
sudo apt-get install python3.5-tk
Example.
from sklearn import datasets
import matplotlib.pyplot as plt

iris = datasets.load_iris()
iris = iris.data
# Scatterplot
plt.scatter(iris[:,1], iris[:,2])
plt.show()
# Boxplot
plt.boxplot(iris[:,1])
plt.show()
# Histogram
plt.hist(iris[:,1])
plt.show()
scikit-learn
scikit-learn: Machine Learning in Python
Installation.
pip install -U scikit-learn
Example.
$ python
>>> from sklearn import datasets
>>> iris = datasets.load_iris()
>>> digits = datasets.load_digits()
feedparser
Never miss a Magazine article — build your own RSS notification system
Boto
A Python interface to Amazon Web Services
- http://docs.pythonboto.org/en/latest/
- https://hpc.nih.gov/training/handouts/object_storage_class_2018_oct.pdf
PIL, Pillow
- Installation
sudo apt install python-imaging
- How I can load a font file with PIL.ImageFont.truetype without specifying the absolute path?
plotnine
Python and R – Part 2: Visualizing Data with Plotnine
nltk: Natural Language Toolkit
pygame
Learn Python by creating a video game
scanpy
- scanpy and the installation instruction
- mnnpy
Troubleshooting
ImportError: cannot import name main when running pip
https://stackoverflow.com/a/50187211
TypeError: ‘module’ object is not callable
I was trying to run "bbknn.py" from here.
Solve “TypeError: ‘module’ object is not callable” in Python, TypeError: 'module' object is not callable
The problem is that I have a file called "bbknn.py" and "import bbknn" in the code, which confuses Python. The solution is to rename my script file "bbknn.py" (avoid MODULE.py) to another name such as "bbknnDemo.py".
Illegal instruction
I got this error after I called python3 -c 'import scanpy'. Python on Biowulf.
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
TMPDIR=/tmp bash Miniconda3-latest-Linux-x86_64.sh -p ~/conda -b
source ~/conda/etc/profile.d/conda.sh # ~/conda/condabin is added to PATH
conda activate base
python -V # Python 3.9.4
conda create -n project1 pandas numpy scipy -y
conda activate project1
pip3 install scanpy bbknn
ls ~/conda/envs/project1/lib/python3.9/site-packages # bbknn and scanpy are there
python3 -c 'import scanpy' # Illegal instruction
conda info --env
conda deactivate
conda remove --all -n project1 -y
conda deactivate
No matching distribution found for XXX
Got an error No matching distribution found for lasagne==0.2.dev1 when I ran 'pip install .' on DeepSurv.
https://github.com/imatge-upc/saliency-salgan-2017/issues/29
Python AttributeError: 'module' object has no attribute 'SSL_ST_INIT'
See https://stackoverflow.com/a/52398193. I got this message after I ran sudo pip install --upgrade cryptography and pip show cryptography. I tried to upgrade cryptography because of the following warning:
$ pip show protobuf
/home/brb/.local/lib/python2.7/site-packages/pip/_vendor/requests/__init__.py:83: RequestsDependencyWarning: Old version of cryptography ([1, 2, 3]) may cause slowdown.
  warnings.warn(warning, RequestsDependencyWarning)
Name: protobuf
...
OpenSSL and pyOpenSSL-0.15.1.egg-info are under the /usr/lib/python2.7/dist-packages directory on my Ubuntu 16.04.
Note the following solutions do not work
$ sudo pip uninstall pyopenssl
$ sudo pip install pyOpenSSL==16.2.0
I always get an error message
...
  File "/usr/lib/python2.7/dist-packages/OpenSSL/SSL.py", line 118, in <module>
    SSL_ST_INIT = _lib.SSL_ST_INIT
AttributeError: 'module' object has no attribute 'SSL_ST_INIT'
A quick solution is to run sudo rm -r /usr/local/lib/python2.7/dist-packages/OpenSSL. I also ran sudo pip install pyopenssl, but I did not follow this answer (sudo apt install --reinstall python-openssl).
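When the same package exists in more than one dist-packages or site-packages directory, it helps to check which copy the interpreter would actually import. A small sketch using only the standard library; the helper name module_origin is made up here, and the package names in the loop are just the ones from this note.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None if absent."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # OpenSSL and cryptography may be missing; json ships with Python.
    for name in ("OpenSSL", "cryptography", "json"):
        print(name, "->", module_origin(name))
```

Run it with the same interpreter (and the same sudo or non-sudo context) that shows the error, since each one can have a different search path.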
/usr/bin/env: ‘python’: No such file or directory
On Ubuntu 20.04,
sudo apt-get install python-is-python3
This solved an error when I used youtube-dl.
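To see whether a plain python command exists at all (the root cause of the /usr/bin/env error), the PATH lookup can be reproduced from Python. This is only a diagnostic sketch; find_commands is a made-up helper name.

```python
import shutil
import sys


def find_commands(names=("python", "python3")):
    """Map each command name to its resolved path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_commands().items():
        print(f"{name}: {path or 'not found on PATH'}")
    print("running interpreter:", sys.executable)
```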
Projects based on python
- pithos Pandora on linux
- Many Raspberry Pi GPIO projects
- GeneScissors It also requires pip and scikit-learn packages.
- KeepNote It depends on Python 2.X, sqlite and PyGTK.
- Zim It depends on Python, Gtk and the python-gtk bindings.
- Cherrytree It depends on Python2, Python-gtk2, Python-gtksourceview2, p7zip-full, python-enchant and python-dbus.
Send emails
How to Send Automated Email Messages in Python 3
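A minimal sketch of the usual standard-library approach (not taken from the linked article): build the message with email.message.EmailMessage and hand it to smtplib. The host, port, and credentials are placeholders that depend on the mail provider, so send_message() is defined but not called here.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Create a plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_message(msg, host, port, user, password):
    """Send over SMTP with implicit TLS. Needs a real server, so it is not run here."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(user, password)
        server.send_message(msg)


if __name__ == "__main__":
    # example.com addresses are placeholders.
    demo = build_message("me@example.com", "you@example.com", "Test", "Hello from Python 3")
    print(demo)
```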
GUI programming
New book: Create Graphical User Interfaces with Python
Qt for GUI development
- http://zetcode.com/gui/pyqt4/
- http://wiki.wildsong.biz/index.php/PyQt Create GUI in Qt Designer and convert/use it in PyQt.
Python 3
- Python 2.7 reached end of life on January 1, 2020 and is no longer maintained. See https://pythonclock.org/.
- Migrating to Python 3 with pleasure
- Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney.
pip3
Use pip3 instead of pip for Python 3. For example,
pip3 install --upgrade pip
pip3 install -U scikit-learn
pip3 install -U matplotlib
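Since pip3 and python3 can point to different installations, a quick check from inside the interpreter shows whether a package is visible to it. A sketch using importlib.metadata (Python 3.8+); installed_version is a made-up helper name. Running python3 -m pip install ... is a common way to make sure pip targets the interpreter you actually use.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for package in ("pip", "scikit-learn", "matplotlib"):
        print(package, "->", installed_version(package) or "not installed")
```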
http.server
Edit Files With Workspaces. The 'http.server' module is part of the Python 3 standard library.
cd ~/website
python3 -m http.server
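The command above serves the current directory on port 8000 by default. The same module can also be used from a script. The sketch below (helper names are made up) serves a temporary directory on a free local port in a background thread and fetches one file from it, assuming Python 3.7+ for ThreadingHTTPServer and the directory argument.

```python
import functools
import http.server
import tempfile
import threading
import urllib.request
from pathlib import Path


def serve_directory(directory, port=0):
    """Start a background HTTP server for `directory`; port 0 picks a free port."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_demo():
    """Serve a temporary index.html and return its content as fetched over HTTP."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "index.html").write_text("<h1>hello</h1>")
        server = serve_directory(tmp)
        try:
            port = server.server_address[1]
            # Ignore any proxy settings, since the request is to localhost.
            opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
            with opener.open(f"http://127.0.0.1:{port}/index.html") as resp:
                return resp.read().decode()
        finally:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    print(fetch_demo())
```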
R and Python: reticulate package
- https://cran.r-project.org/web/packages/reticulate/index.html, Github
- Using Python in R markdown
- Importing Python modules and calling their functions directly from R: the import() function
- Sourcing Python scripts: the source_python() function
- Python REPL: the repl_python() function creates an interactive Python console within R.
- Python Version Configuration
- On my macOS, even though I have python3 installed, reticulate still asks to install miniconda (/Users/$USER/Library/r-miniconda). So I end up with another copy of Python 3 in /Users/$USER/Library/r-miniconda/envs/r-reticulate/bin/python.
- I found RStudio IDE is better than PyCharm and Thonny editors.
- Install Python packages https://rstudio.github.io/reticulate/articles/python_packages.html
- Better to have anaconda3 installed. 2.26G space is required on macOS.
- Running py_install("pandas") directly would ask me to upgrade virtualenv
- Running virtualenv_create("r-reticulate") and then py_install("pandas") works
- reticulate: R interface to Python JJ Allaire
- Cheat sheet
- R or Python? Why not both? Using Anaconda Python within R with {reticulate}
- Run Python from R
- R and Python: Using reticulate to get the best of both worlds. Note
- RStudio v1.2 preview release includes support for using reticulate to execute Python chunks within R Notebooks
- Error from my execution: ValueError: 'RBF' is not in list
- The reticulate package solves the hardest problem in data science: people
- reticulate, virtualenv, and Python in Linux
- Bugs
- Pass Python objects to R: Works. Or use py_run_string()
- Cannot pass R variables to Python: use source_python()
- R vs Python for data science by Norm Matloff.
- RvsPython #5.1: Making the Game even with Python’s Best Practices
- RvsPython #5: Using Monte Carlo To Simulate π
- How to Run Python's Scikit-Learn in R in 5 minutes
- Test python and markdown files
def add_three(x):
    z = x + 3
    return z
---
title: "R Notebook"
output: html_notebook
---

```{r}
library(reticulate)
py_discover_config()

x <- 5
source_python("test.py")
y <- add_three(x)
print(y)
```

Pass R variables to Python. Works

```{python}
a = 7
print(r.x)
```

Pass python variables to R. Works.

```{r}
py$a
py_run_string("y = 10"); py$y
```
How to quit python
Type exit and hit Enter. See https://rstudio.github.io/reticulate/.
Conda, Anaconda, miniconda
- Docker
- Python on Biowulf. Users who need stable, reproducible environments are encouraged to install miniconda in their data directory and create their own private environments.
- The Definitive Guide to Conda Environments
- Introduction to Anaconda. Simplifies installation of Python packages
- Platform-independent package manager
- Doesn’t require administrative privileges
- Installs non-Python library dependencies (MKL, HDF5, Boost)
- Provides ”virtual environment” capabilities
- Many channels exist that support additional packages
- Install Anaconda on macOS. Better to use the command line method in order to install it to the user's directory. The new python can be manually loaded into the shell by using source ~/.bash_profile. As on Ubuntu, anaconda3 is installed under the home directory (~/). In addition, Anaconda-Navigator is available under Finder -> Applications.
- How To Install the Anaconda Python Distribution on Ubuntu 16.04. As we can see Anaconda3 will be installed under /home/$USER/anaconda3.
- Download Anaconda3-2018.12-Linux-x86_64.sh from https://www.anaconda.com/distribution/#download-section
- bash Anaconda3-2018.12-Linux-x86_64.sh
- The installer asks: Do you wish the installer to initialize Anaconda3. If you answer yes, it will modify the ~/.bashrc file so that Anaconda's Python takes precedence over the system's Python on PATH. The default python/python3 will then be the one in /home/$USER/anaconda3/bin/.
Do you wish the installer to initialize Anaconda3
by running conda init? [yes|no]
[no] >>> yes
no change     /home/brb/anaconda3/condabin/conda
no change     /home/brb/anaconda3/bin/conda
no change     /home/brb/anaconda3/bin/conda-env
no change     /home/brb/anaconda3/bin/activate
no change     /home/brb/anaconda3/bin/deactivate
no change     /home/brb/anaconda3/etc/profile.d/conda.sh
no change     /home/brb/anaconda3/etc/fish/conf.d/conda.fish
no change     /home/brb/anaconda3/shell/condabin/Conda.psm1
no change     /home/brb/anaconda3/shell/condabin/conda-hook.ps1
no change     /home/brb/anaconda3/lib/python3.8/site-packages/xontrib/conda.xsh
no change     /home/brb/anaconda3/etc/profile.d/conda.csh
modified      /home/brb/.bashrc

==> For changes to take effect, close and re-open your current shell. <==

If you'd prefer that conda's base environment not be activated on startup,
   set the auto_activate_base parameter to false:

conda config --set auto_activate_base false
If I choose not to modify .bashrc file,
Do you wish the installer to initialize Anaconda3
by running conda init? [yes|no]
[no] >>> no

You have chosen to not have conda modify your shell scripts at all.
To activate conda's base environment in your current shell session:

eval "$(/home/brb/anaconda3/bin/conda shell.YOUR_SHELL_NAME hook)"

To install conda's shell functions for easier access, first activate, then:

conda init

If you'd prefer that conda's base environment not be activated on startup,
   set the auto_activate_base parameter to false:

conda config --set auto_activate_base false

Thank you for installing Anaconda3!
- Anaconda-Navigator (including jupyter notebook, Spyder IDE, ...) can be launched by typing anaconda-navigator in a terminal
- Getting started with Anaconda Python for data science
- Differences:
- Comparisons:
- Conda: an open source package management system and environment management system
- Miniconda, which is a smaller alternative to Anaconda that is just conda and its dependencies. Once you have Miniconda, you can easily install Anaconda into it with conda install anaconda.
- Anaconda: Anaconda is a set of about a hundred packages including conda, numpy, scipy, ipython notebook, and so on.
- Getting started with conda. More details are in Tasks.
conda --version

# Manage environment
conda info --envs   # see a list of environments.
                    # The active environment is the one with an asterisk (*)
# create a new environment
conda create --name myenv
# remove an environment
conda remove --name myenv --all

# Manage Python
conda create --name snakes python=3.5
conda activate snakes   # activate
conda info --envs
python --version
conda activate          # Change your current environment back to the default (base)
conda deactivate        # leave the current conda environment

# Managing packages
conda search beautifulsoup4
conda install beautifulsoup4
conda list

# Updating Anaconda or Miniconda
conda update conda
- Used in pdxBlacklist
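After conda activate, it is sometimes unclear which environment a script really runs in. A small sketch that reads the CONDA_DEFAULT_ENV variable (set by conda activate) and compares it with the interpreter's own prefix; active_conda_env is a made-up helper name, and the environment mapping is passed in so the function can be tried without conda.

```python
import os
import sys
from typing import Mapping, Optional


def active_conda_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the active conda environment name, or None when no env is active."""
    return environ.get("CONDA_DEFAULT_ENV") or None


if __name__ == "__main__":
    print("conda env  :", active_conda_env() or "none active")
    print("CONDA_PREFIX:", os.environ.get("CONDA_PREFIX", "(unset)"))
    # sys.prefix is where the running interpreter lives; it should match
    # CONDA_PREFIX when the script runs with the environment's own python.
    print("sys.prefix :", sys.prefix)
```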
Miniconda
https://docs.conda.io/en/latest/miniconda.html. The Miniconda installers are listed separately for each Python version.
How To Install Miniconda In Linux 2021. It covers interactive installation, unattended installation, updating Miniconda, and uninstalling Miniconda. If you've chosen the default location, the installer will display “PREFIX=/var/home/<user>/miniconda3”. To manually activate conda's base environment, run source /home/<user>/miniconda3/etc/profile.d/conda.sh and then conda activate base, assuming miniconda is installed under the /home/<user>/miniconda3 directory.
Miniconda Installation for macOS users 2019. At the end of the installation, the installer notes that if we don't want conda's base environment to be activated on startup, we can run conda config --set auto_activate_base false
See also Python on Biowulf about how to specify prefix.
We can add a module to an existing environment. See Miniconda: Python(s) in a convenient setup.
conda install -n <env_name> <package>
Uninstall miniconda
- rm -rf ~/miniconda3
- nano ~/.bash_profile and delete conda initialize block
What's the purpose of the “base” (for best practices) in Anaconda?
https://stackoverflow.com/a/56504279
Does Conda replace the need for virtualenv?
Yes. Conda is not limited to Python but can be used for other languages too.
Using R language with Anaconda
- The Definitive Guide to Conda Environments, Using R language with Anaconda. Environments created with conda create live by default in the envs/ folder of your Conda directory, whose path will look something like /Users/user-name/miniconda3/envs or /Users/user-name/anaconda3/envs.
Step                  Command                            Prompt afterwards
Activate conda base   eval "$(conda shell.bash hook)"    (base)
Create a new env      conda create -n r-env r-base       (base)
Activate a new env    conda activate r-env               (r-env)
Deactivate an env     conda deactivate                   (base)
$ eval "$(/home/brb/anaconda3/bin/conda shell.bash hook)"
(base) $ mkdir mypythonproj; cd mypythonproj   # This step seems not necessary
(base) $ conda create -n r-env r-base
...
#
# To activate this environment, use
#
#     $ conda activate r-env
#
# To deactivate an active environment, use
#
#     $ conda deactivate

(base) $ conda activate r-env
(r-env) $ ls anaconda3/envs
r-env
(r-env) $ conda install r-essentials
(r-env) $ which R
/home/brb/anaconda3/envs/r-env/bin/R
(r-env) $ ls -la   # Still Empty
(r-env) $ R --version
R version 3.4.3 (2017-11-30) -- "Kite-Eating Tree"
# Note that the current R version should be 4.0.3
(r-env) $ conda env list
base                     /home/brb/anaconda3
r-env                 *  /home/brb/anaconda3/envs/r-env
(r-env) $ conda deactivate
(base) $
It seems better to save the environment inside the project directory, so the python -m venv /path/to/new/environment method is preferred. You can also use conda create --prefix /path/to/new/environment. Placing environments outside of the default envs/ folder comes with some drawbacks; see 'The Definitive Guide to Conda Environments'.
- conda-forge channel, A brief introduction, https://anaconda.org/conda-forge/r-base. Following the instructions seems to mess things up, though conda-forge says the latest version is 4.0.3 (3 years late).
$ eval "$(/home/brb/anaconda3/bin/conda shell.bash hook)"
(base) $ conda install -c conda-forge r-base
...
## Package Plan ##

  environment location: /home/brb/anaconda3

  added / updated specs:
    - r-base
...
Downloading and Extracting Packages
r-base-3.2.2 ...
(base) $ R --version
/home/brb/anaconda3/lib/R/bin/exec/R: error while loading shared libraries: libreadline.so.6: cannot open shared object file: No such file or directory
(base) $ which R
/home/brb/anaconda3/bin/R
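For the python -m venv option mentioned above, the same thing can be done from a script with the standard venv module. A minimal sketch, assuming the environment should live in a .venv folder inside the project directory (the folder name and the helper create_project_env are my own choices). It applies to Python-only environments; conda is still needed for R packages such as r-base.

```python
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir, env_name=".venv"):
    """Create a virtual environment inside the project directory and return its path."""
    env_path = Path(project_dir) / env_name
    # with_pip=False keeps the demo fast and offline;
    # pass with_pip=True to bootstrap pip into the new environment.
    venv.EnvBuilder(with_pip=False).create(env_path)
    return env_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as project:
        env = create_project_env(project)
        print("created:", env)
        print("config exists:", (env / "pyvenv.cfg").is_file())
```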
Run R with Jupyter notebook
- How To Set Up Jupyter Notebook with Python 3 on Ubuntu 18.04
- Using the R programming language in Jupyter Notebook
- Setup Jupyter Notebook for R (Windows OS, no conda)
- How to Add R to Jupyter Notebook (full steps) using Anaconda
- How to install R on a Jupyter notebook using homebrew
- ggplot2: Mastering the basics & Jupyter Notebook. To set up the Jupyter environment, see the Docker method.
docker run --rm -p 8888:8888 \
  -e JUPYTER_ENABLE_LAB=yes \
  -v "$PWD":/home/jovyan \
  jupyter/datascience-notebook:r-4.0.3
We first have to run "git clone https://github.com/rlbarter/ggplot2-thw.git" to download the repo and then "cd ggplot2-thw". After opening http://IP:8888/?token=XXXXXXX we will see "ggplot2.ipynb" in the left panel. Double-clicking the file opens it in the notebook.
Example 1: GEO2RNAseq
Web framework
Flask
- Flask (web framework) Flask is a micro web framework written in Python. It is classified as a microframework because it does not require particular tools or libraries.
- https://palletsprojects.com/p/flask/
- How to Install Flask with Python 3 on Ubuntu 18.04
- Data Science for Startups: Containers Building reproducible setups for machine learning
- Raspberry Pi
- Raspberry Pi System Stats. To access the web page, use for example http://192.168.1.104:5000/cpu or http://192.168.1.104:5000/disk or http://192.168.1.104:5000/memory.
- Build a Python Web Server with Flask
- Python WebServer With Flask and Raspberry Pi