Python: Difference between revisions

From 太極
print("%s" % dna)  # gatagc
</syntaxhighlight>
== List ==
A list is an ordered, mutable sequence of values.
<syntaxhighlight lang='python'>
gene_expr=['gene', 5.16e-08, 0.001385, 7.33e-08]
print(gene_expr[2])  # 0.001385
gene_expr[0]='Lif'
</syntaxhighlight>
Slice a list (it will create a new list)
<syntaxhighlight lang='python'>
gene_expr[-3:]  # [5.16e-08, 0.001385, 7.33e-08]
gene_expr[1:3] = [6.09e-07]  # slice assignment changes the original list in place
</syntaxhighlight>
Clear the list
<syntaxhighlight lang='python'>
gene_expr[:] = []
</syntaxhighlight>
Size of the list
<syntaxhighlight lang='python'>
len(gene_expr)
</syntaxhighlight>
Delete an element
<syntaxhighlight lang='python'>
del gene_expr[1]
</syntaxhighlight>
Extend/append to a list
<syntaxhighlight lang='python'>
gene_expr.extend([5.16e-08, 0.00123])
</syntaxhighlight>
Count the number of times an element appears in a list
<syntaxhighlight lang='python'>
print(gene_expr.count('Lif'), gene_expr.count('gene'))
</syntaxhighlight>
Reverse all elements in a list
<syntaxhighlight lang='python'>
gene_expr.reverse()
print(gene_expr)
help(list)
</syntaxhighlight>
Lists as Stacks
<syntaxhighlight lang='python'>
stack=['a', 'b', 'c', 'd']
stack.append('e')  # push 'e' onto the top of the stack
stack.pop()        # remove and return the top element: 'e'
</syntaxhighlight>
Sorting lists
<syntaxhighlight lang='python'>
mylist=[3, 31, 123, 1, 5]
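sorted(mylist)  # returns a new sorted list: [1, 3, 5, 31, 123]
mylist  # not changed
mylist.sort()   # sorts in place; mylist is now [1, 3, 5, 31, 123]
mylist=['c', 'g', 'T', 'a', 'A']
mylist.sort()   # uppercase letters sort before lowercase: ['A', 'T', 'a', 'c', 'g']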
</syntaxhighlight>
Strings are immutable: you cannot change a character in place.
<syntaxhighlight lang='python'>
motif = 'nacggggtc'
motif[0] = 'a'    # ERROR (TypeError: strings do not support item assignment)
</syntaxhighlight>
== Tuples ==
A tuple consists of a number of values separated by commas, and is another standard sequence data type, like strings and lists.
<syntaxhighlight lang='python'>
t=1,2,3
t
t=(1,2,3)  # we may input tuples with or without surrounding parentheses
</syntaxhighlight>
== Sets ==
A set is an unordered collection with no duplicate elements.
<syntaxhighlight lang='python'>
brca1={'DNA repair', 'zinc ion binding'}
brca2={'protein binding', 'H4 histone'}
brca1 | brca2
brca1 & brca2
brca1 - brca2
</syntaxhighlight>
== Dictionaries ==
A '''dictionary''' is a set of ''key'' and ''value'' pairs, with the requirement that the keys are unique (within one dictionary). Dictionaries preserve insertion order only from Python 3.7 onward.
<syntaxhighlight lang='python'>
TF_motif = {'SP1' : 'gggcgg',
            'C/EBP' : 'attgcgcaat',
            'ATF' : 'tgacgtca',
            'c-Myc' : 'cacgtg',
            'Oct-1' : 'atgcaaat'}
# Access
print("The recognition sequence for the ATF transcription factor is %s." % TF_motif['ATF'])
# Update
TF_motif['AP-1'] = 'tgagtca'
# Delete
del TF_motif['SP1']
# Size of a dictionary
len(TF_motif)
# Get a list of all the 'keys' in a dictionary
list(TF_motif.keys())
# Get a list of all the 'values'
list(TF_motif.values())
# sort
sorted(TF_motif.keys())
sorted(TF_motif.values())
</syntaxhighlight>
In summary, '''strings''', '''lists''' and '''dictionaries''' are the most useful data types for bioinformatics.


= Projects based on python =

Revision as of 11:23, 26 February 2016

Basic

Think Python (Free Ebook)

http://www.greenteapress.com/thinkpython/

How to run Python code

python mypython.py

Install a new module

The Python Package Index (PyPI) is the definitive list of packages (or modules).

sudo apt-get install python-pip
pip install SomePackage
pip show --files SomePackage
pip install --upgrade SomePackage
pip uninstall SomePackage

If a package has been bundled by its creator using the standard approach to bundling modules (with Python’s distutils tool), all you need to do is download the package, uncompress it and type:

python setup.py install

How to list all installed modules

help('modules')

How to find the location of installed modules

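There are several ways. For example:

  1. python -v (verbose mode; reports the file each imported module is loaded from)
  2. import MODULENAME
  3. help('MODULENAME') (the FILE section shows the module's path)

Using this approach, I found that the 'RPi' module is installed under /usr/lib/python2.7/dist-packages.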
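One further option, shown here as a minimal sketch rather than as part of the original notes: a module loaded from a file usually exposes its location in the `__file__` attribute. The standard-library `json` module stands in for MODULENAME.

```python
import json  # stand-in for MODULENAME; any module loaded from a file works

# __file__ holds the path of the file the module was loaded from.
# Modules compiled into the interpreter itself (such as sys) may lack it.
module_path = json.__file__
print(module_path)
```

The printed path depends on the installation, so no expected output is shown.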

if __name__ == "__main__":

http://stackoverflow.com/questions/419163/what-does-if-name-main-do
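A minimal sketch of the idiom, assuming a hypothetical script with a helper called `gc_count` (the function names are illustrative, not from the original notes). The code under the `if` runs only when the file is executed directly, for example with `python mypython.py`, and is skipped when the file is imported as a module.

```python
def gc_count(seq):
    """Count the g and c characters in a lowercase DNA string."""
    return seq.count('g') + seq.count('c')


def main():
    # Reached only when this file is run as a script, not when it is imported.
    print(gc_count('gatagc'))


if __name__ == "__main__":
    main()
```

Keeping the script logic in `main()` lets other files import `gc_count` without triggering the print.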

Import a compiled C module

String and string operators

Reference: Python for Genomic Data Science from coursera.

dna = 'gatagc'
dna[0]              # 'g'
dna[-1]             # 'c' (negative indices count from the right)
dna[-2]             # 'g'
dna[0:3]            # 'gat' (the end position is always excluded)
dna[:3]             # 'gat'
dna[2:]             # 'tagc'
len(dna)            # 6
type(dna)           # str
print(dna)          # gatagc
dna.count('c')      # 1
dna.upper()         # 'GATAGC'
dna.find('ag')      # 3 (only the first occurrence of 'ag' is reported)
dna.find('ag', 17)  # start looking from position 17; returns -1 if not found
dna.rfind('ag')     # 3 (searches backwards from the end of the string)
dna.islower()       # True
dna.isupper()       # False
dna.replace('a', 'A')  # 'gAtAgc'
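Because `find` reports only the first match, locating every match takes a loop that restarts the search just past the previous hit. The sketch below is one way to do this; the helper name `find_all` and the sample sequence are illustrative assumptions.

```python
def find_all(seq, sub):
    """Return the start index of every (possibly overlapping) match of sub in seq."""
    positions = []
    start = seq.find(sub)
    while start != -1:
        positions.append(start)
        # Resume one position after the last hit so overlapping matches are kept.
        start = seq.find(sub, start + 1)
    return positions


dna = 'gatagcagag'  # hypothetical sequence
print(find_all(dna, 'ag'))  # [3, 6, 8]
```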

User's input

dna=raw_input("Enter a DNA sequence: ")  # python 2
dna=input("Enter a DNA sequence: ")      # python 3

To convert a user's input (a string) to other types

int(x[, base])
float(x)
str(x) # converts x to a string
str(65) # '65'

chr(x)  # converts an integer to a character
chr(65) # 'A'
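A short sketch of these conversions applied to text such as `input()` would return. The fixed strings stand in for real user input, and the helper `to_int` is a hypothetical name introduced here to show one way of handling text that is not a valid integer.

```python
user_text = "42"         # stands in for a value returned by input()
n = int(user_text)       # 42
b = int("101", 2)        # interpret the string as base 2 -> 5
x = float("0.001385")    # 0.001385


def to_int(text, default=None):
    """Convert text to int, returning default if it is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return default


print(n, b, x, to_int("abc"))
```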

Fancy Output

print("The DNA's GC content is ", gc, "%") # gives too many digits following the dot
print("The DNA's GC content is %5.3f %%" % gc)
# the percent operator separates the format string from the value that
# replaces the format placeholder; %% prints a literal percent sign
print("%d" % 10.6)  # 10
print("%e" % 10.6)  # 1.060000e+01
print("%s" % dna)   # gatagc
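The variable `gc` above is not defined in these notes. One plausible way to compute it, shown as a sketch for the sample sequence 'gatagc', is the percentage of g and c characters. The `str.format` line is an equivalent alternative to the percent operator.

```python
dna = 'gatagc'

# Percentage of positions that are g or c: 3 of 6 here.
gc = 100.0 * (dna.count('g') + dna.count('c')) / len(dna)

msg_percent = "The DNA's GC content is %5.3f %%" % gc
msg_format = "The DNA's GC content is {:5.3f} %".format(gc)

print(msg_percent)  # The DNA's GC content is 50.000 %
```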

List

A list is an ordered, mutable sequence of values.

gene_expr=['gene', 5.16e-08, 0.001385, 7.33e-08]
print(gene_expr[2])  # 0.001385
gene_expr[0]='Lif'

Slice a list (it will create a new list)

gene_expr[-3:]  # [5.16e-08, 0.001385, 7.33e-08]
gene_expr[1:3] = [6.09e-07]  # slice assignment changes the original list in place
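The two slice forms behave differently, which the sketch below illustrates: reading a slice produces an independent new list, while assigning to a slice modifies the original list and can change its length. The variable `tail` is an illustrative name.

```python
gene_expr = ['Lif', 5.16e-08, 0.001385, 7.33e-08]

tail = gene_expr[-3:]        # reading a slice builds a new list
tail[0] = 0.0                # changing the copy leaves gene_expr untouched

gene_expr[1:3] = [6.09e-07]  # replaces two elements with one, in place

print(gene_expr)  # ['Lif', 6.09e-07, 7.33e-08]
print(tail)       # [0.0, 0.001385, 7.33e-08]
```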

Clear the list

gene_expr[:] = []

Size of the list

len(gene_expr)

Delete an element

del gene_expr[1]

Extend/append to a list

gene_expr.extend([5.16e-08, 0.00123])

Count the number of times an element appears in a list

print(gene_expr.count('Lif'), gene_expr.count('gene'))

Reverse all elements in a list

gene_expr.reverse()
print(gene_expr)
help(list)

Lists as Stacks

stack=['a', 'b', 'c', 'd']
stack.append('e')  # push 'e' onto the top of the stack
stack.pop()        # remove and return the top element: 'e'
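A stack is last-in, first-out: `append` pushes onto the end of the list and `pop` removes from the same end. A minimal sketch, with `top` and `next_top` as illustrative names:

```python
stack = ['a', 'b', 'c', 'd']
stack.append('e')       # push: stack is now ['a', 'b', 'c', 'd', 'e']

top = stack.pop()       # pop returns the most recently added item: 'e'
next_top = stack.pop()  # then the one added before it: 'd'

print(top, next_top, stack)  # e d ['a', 'b', 'c']
```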

Sorting lists

mylist=[3, 31, 123, 1, 5]
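sorted(mylist)  # returns a new sorted list: [1, 3, 5, 31, 123]
mylist  # not changed
mylist.sort()   # sorts in place; mylist is now [1, 3, 5, 31, 123]

mylist=['c', 'g', 'T', 'a', 'A']
mylist.sort()   # uppercase letters sort before lowercase: ['A', 'T', 'a', 'c', 'g']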
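By default strings are compared by character code, so uppercase letters come before lowercase ones. As a sketch of how to change that, the `key` and `reverse` arguments of `sorted` (also accepted by `list.sort`) can be used; the variable names are illustrative.

```python
mylist = ['c', 'g', 'T', 'a', 'A']

default_order = sorted(mylist)               # ['A', 'T', 'a', 'c', 'g']
ignore_case = sorted(mylist, key=str.lower)  # ['a', 'A', 'c', 'g', 'T']
descending = sorted([3, 31, 123, 1, 5], reverse=True)  # [123, 31, 5, 3, 1]

print(default_order, ignore_case, descending)
```

The sort is stable, so 'a' stays ahead of 'A' in `ignore_case` because it came first in the input.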

Strings are immutable: you cannot change a character in place.

motif = 'nacggggtc'
motif[0] = 'a'    # ERROR (TypeError: strings do not support item assignment)
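The usual workaround is to build a new string instead. The sketch below catches the error to show what it is, then creates the corrected motif by concatenation; `fixed` and `error_message` are illustrative names.

```python
motif = 'nacggggtc'

try:
    motif[0] = 'a'            # strings do not support item assignment
except TypeError as err:
    error_message = str(err)

fixed = 'a' + motif[1:]       # new string; motif itself is unchanged

print(fixed)  # aacggggtc
```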

Tuples

A tuple consists of a number of values separated by commas, and is another standard sequence data type, like strings and lists.

t=1,2,3
t
t=(1,2,3)  # we may input tuples with or without surrounding parentheses
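Two details that follow from the definition, sketched with illustrative names: a tuple can be unpacked into separate variables, and, like a string, it cannot be modified after creation. A one-element tuple needs a trailing comma.

```python
t = 1, 2, 3                # parentheses are optional
first, second, third = t   # unpacking assigns each element to a name
single = (5,)              # the trailing comma makes this a tuple, not an int

try:
    t[0] = 10              # tuples are immutable
except TypeError:
    changed = False
else:
    changed = True

print(first, second, third, single, changed)  # 1 2 3 (5,) False
```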

Sets

A set is an unordered collection with no duplicate elements.

brca1={'DNA repair', 'zinc ion binding'}
brca2={'protein binding', 'H4 histone'}
brca1 | brca2
brca1 & brca2
brca1 - brca2
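To make the three operators concrete, the sketch below uses hypothetical annotation sets that share one term (the sets above share none, so their intersection is empty). `sorted` is applied only to get a repeatable print order, since sets are unordered.

```python
# Hypothetical annotation sets, chosen so that they overlap in one term.
brca1 = {'DNA repair', 'zinc ion binding', 'protein binding'}
brca2 = {'protein binding', 'H4 histone'}

union = brca1 | brca2          # terms in either set
intersection = brca1 & brca2   # terms in both sets
difference = brca1 - brca2     # terms in brca1 but not in brca2

print(sorted(union))
print(sorted(intersection))    # ['protein binding']
print(sorted(difference))      # ['DNA repair', 'zinc ion binding']
```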

Dictionaries

A dictionary is a set of key and value pairs, with the requirement that the keys are unique (within one dictionary). Dictionaries preserve insertion order only from Python 3.7 onward.

TF_motif = {'SP1' : 'gggcgg', 
            'C/EBP' : 'attgcgcaat',
            'ATF' : 'tgacgtca',
            'c-Myc' : 'cacgtg',
            'Oct-1' : 'atgcaaat'}
# Access
print("The recognition sequence for the ATF transcription factor is %s." % TF_motif['ATF'])
# Update
TF_motif['AP-1'] = 'tgagtca'
# Delete
del TF_motif['SP1']
# Size of a dictionary
len(TF_motif)
# Get a list of all the 'keys' in a dictionary
list(TF_motif.keys())
# Get a list of all the 'values'
list(TF_motif.values())
# sort
sorted(TF_motif.keys())
sorted(TF_motif.values())
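A few more common dictionary operations, sketched on a shortened copy of the table above: membership tests with `in`, lookups with a fallback via `get`, and looping over sorted keys. The key 'NF-1' and the variable names are illustrative assumptions.

```python
TF_motif = {'SP1': 'gggcgg', 'ATF': 'tgacgtca', 'c-Myc': 'cacgtg'}

has_sp1 = 'SP1' in TF_motif                # True: membership tests look at keys
missing = TF_motif.get('NF-1', 'unknown')  # get() avoids a KeyError for absent keys

# Loop over the keys in sorted order and print each motif.
for name in sorted(TF_motif):
    print("%-6s %s" % (name, TF_motif[name]))

# Key whose recognition sequence is longest.
longest = max(TF_motif, key=lambda name: len(TF_motif[name]))  # 'ATF'
```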

In summary, strings, lists and dictionaries are the most useful data types for bioinformatics.

Projects based on python

  • pithos: Pandora on Linux.
  • Many Raspberry Pi GPIO projects.
  • GeneScissors: it also requires the pip and scikit-learn packages.
  • KeepNote: it depends on Python 2.X, sqlite and PyGTK.
  • Zim: it depends on Python, Gtk and the python-gtk bindings.
  • Cherrytree: it depends on Python2, Python-gtk2, Python-gtksourceview2, p7zip-full, python-enchant and python-dbus.

Qt for GUI development