GEO: Difference between revisions

Revision as of 10:28, 10 June 2015

The Gene Expression Omnibus (GEO) website is located at http://www.ncbi.nlm.nih.gov/geo/. GEO is a public functional genomics data repository that supports MIAME-compliant data submissions and accepts both array- and sequence-based data. Tools are provided to help users query and download experiments and curated gene expression profiles.

Browse Content

Repository Browser/Summary

Click 'Browse Content' > 'Repository Browser' to open the summary page. It has four tabs: Series, Platforms, Samples, and Organisms.

Series/Series Type

Series type	Count
Expression profiling by array	40,319
Expression profiling by genome tiling array	639
Expression profiling by high throughput sequencing	4,772
Expression profiling by SAGE	242
Expression profiling by MPSS	20
Expression profiling by RT-PCR	329
Expression profiling by SNP array	13
Genome variation profiling by array	596
Genome variation profiling by genome tiling array	1,068
Genome variation profiling by high throughput sequencing	63
Genome variation profiling by SNP array	826
Genome binding/occupancy profiling by array	174
Genome binding/occupancy profiling by genome tiling array	2,114
Genome binding/occupancy profiling by high throughput sequencing	3,940
Genome binding/occupancy profiling by SNP array	12
Methylation profiling by array	556
Methylation profiling by genome tiling array	718
Methylation profiling by high throughput sequencing	764
Methylation profiling by SNP array	9
Protein profiling by protein array	167
Protein profiling by Mass Spec	6
SNP genotyping by SNP array	514
Other	1,147
Non-coding RNA profiling by array	2,166
Non-coding RNA profiling by genome tiling array	104
Non-coding RNA profiling by high throughput sequencing	1,478
Third-party reanalysis	135

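The R code to query this information (from http://rpubs.com/seandavi/GEOMetadbSurvey2014) is:

library(dplyr)   # select, group_by, summarize, arrange, %>%
library(tidyr)   # unnest
library(pander)  # pander
# tgse is assumed to be a data frame holding the 'gse' table from GEOmetadb
# Split the multi-valued 'type' field so each series type gets its own row
gse_type = select(tgse,gse,type) %>%
  transform(type = strsplit(type,';\\t')) %>%
  unnest(type)
# Count series per type, most frequent first
type_count = select(gse_type,type) %>%
  group_by(type) %>%
  summarize(count=n()) %>%
  arrange(desc(count))
pander(type_count,justify=c('left','right'))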
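To see what the split-and-count step does without downloading the GEOmetadb database, here is a minimal base R sketch on a made-up three-row table. The toy `tgse` data frame and its accession labels are assumptions for illustration only; the one detail taken from the code above is that multiple series types are separated by a semicolon followed by a tab.

```r
# Toy stand-in for the GEOmetadb 'gse' table (made-up values, not real GEO data)
tgse <- data.frame(
  gse  = c("GSE_A", "GSE_B", "GSE_C"),
  type = c("Expression profiling by array",
           "Expression profiling by array;\tMethylation profiling by array",
           "Methylation profiling by array;\tOther"),
  stringsAsFactors = FALSE
)

# Split each multi-valued 'type' entry on the literal ";<tab>" separator
types <- unlist(strsplit(tgse$type, ";\t", fixed = TRUE))

# Count how many series carry each type, most frequent first
type_count <- sort(table(types), decreasing = TRUE)
print(type_count)
```

Because a series can carry several types, the counts add up to more than the number of series. This is also why the per-type totals in the table above should not be summed to estimate the total number of series.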

Platform/Technology

Technology	Count
in situ oligonucleotide	5,657
spotted oligonucleotide	2,852
spotted DNA/cDNA	2,869
antibody	24
MS	17
SAGE NlaIII	67
SAGE Sau3A	4
SAGE RsaI	1
SARST	2
MPSS	18
RT-PCR	277
other	174
oligonucleotide beads	227
mixed spotted oligonucleotide/cDNA	16
spotted peptide or protein	110
high-throughput sequencing	2,073

Samples/Samples Type

Sample type	Count
RNA	1,017,959
genomic	244,511
protein	12,860
SAGE	1,763
mixed	3,976
other	7,509
SARST	9
MPSS	207
SRA	135,247

Organism

A partial list:

Organism	Series	Platforms	Samples
Homo sapiens	22,477	4,590	792,844
Mus musculus	15,758	1,959	240,935
Rattus norvegicus	2,358	475	68,583
Saccharomyces cerevisiae	1,790	550	37,435
Arabidopsis thaliana	2,416	331	30,709
Drosophila melanogaster	2,422	317	23,601
Sus scrofa	405	107	9,809
Caenorhabditis elegans	1,154	183	8,898
Zea mays	265	91	8,667
Bos taurus	462	147	7,780
Oryza sativa	493	173	5,616
Glycine max	179	41	5,863
Gallus gallus	375	105	5,509
Escherichia coli	508	127	5,056
Macaca mulatta	245	40	4,504
Xenopus laevis	111	25	1,054

Series, Samples, Platforms, DataSets

[Figures: Geo series.png, Geo samples.png, Geo platform.png, Geo datasets.png]

R packages

GEOmetadb

GEOquery

SRAdb