ULYSSES - Project Outline
The elucidation of gene function on a whole-genome scale is the central
objective of functional genomics. Our focus is on human genes, in
particular to understand the biology of our species and to identify
candidate genes as potential drug targets. In practice, it is difficult
to carry out complex experiments in human cells, for both technical and
ethical reasons. Diverse model organisms have played a central role in
shedding light on gene function by circumventing many of the obstacles
associated with research in human model systems. The vital assumption is
that genes in different species that descend from a single ancestral
gene through speciation (orthologs) typically occupy the same functional
niche in those species. Most of the data from experiments in organisms
as diverse as yeast, worm, and fly have been collected in large data
sources and are available to the public. Here we propose an integrated
bioinformatics platform that suggests potential functions for human
genes based on data observed for orthologous genes in model organisms.
As a first step, orthologs must be identified for each human gene in a
defined set of organisms by sequence comparison. The National Center for
Biotechnology Information (NCBI) developed a system, HomoloGene, for the
automated detection of orthologs in 12 completely sequenced eukaryotes.
This evolutionary classification of genes based on homologous
relationships is the core of our integrated annotation system.
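
The role of the ortholog classification as a lookup core can be
illustrated with a minimal sketch. It assumes the ortholog groups are
available as simple (group, species, gene) rows; the row layout, the
identifiers, and the function names are hypothetical and do not reflect
the actual HomoloGene file format or the platform's implementation.

```python
from collections import defaultdict

# Hypothetical rows: (group_id, species, gene). The identifiers are
# placeholders for illustration, not real HomoloGene records.
ROWS = [
    ("G1", "H. sapiens", "humanA"),
    ("G1", "S. cerevisiae", "yeastA"),
    ("G1", "D. melanogaster", "flyA"),
    ("G2", "H. sapiens", "humanB"),
    ("G2", "C. elegans", "wormB"),
]


def build_ortholog_index(rows):
    """Index ortholog groups in both directions.

    Returns a (species, gene) -> group_id map and a
    group_id -> {species: [genes]} map.
    """
    gene_to_group = {}
    group_to_genes = defaultdict(dict)
    for group_id, species, gene in rows:
        gene_to_group[(species, gene)] = group_id
        group_to_genes[group_id].setdefault(species, []).append(gene)
    return gene_to_group, group_to_genes


def orthologs_of(species, gene, gene_to_group, group_to_genes):
    """Return the orthologs of a gene in all other species, keyed by species."""
    group_id = gene_to_group.get((species, gene))
    if group_id is None:
        return {}  # gene is not assigned to any ortholog group
    return {
        other: genes
        for other, genes in group_to_genes[group_id].items()
        if other != species
    }


if __name__ == "__main__":
    g2g, g2genes = build_ortholog_index(ROWS)
    print(orthologs_of("H. sapiens", "humanA", g2g, g2genes))
```

Keying every gene by its group identifier means that any annotation
attached to a model-organism gene can later be looked up from the human
member of the same group.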

Figure (panels A and B). Orthologous genes are defined across several
species (planes, panel A). Interactions are identified within each
species (lines, panel A), and associations are correlated (panel B).

The next stage is the identification of protein interaction data for
genes from the core organisms (C. elegans, D. melanogaster,
S. cerevisiae). Here, we distinguish between direct (yeast two-hybrid)
and indirect (complex purification) protein interactions. Our aim is to
attach these labels to sets of orthologous genes in order to increase
confidence in results from high-throughput experiments through complex
data integration across species. Many of the datasets amenable to
integration are captured in individual databases and are accessible to
the public.
Our goal was to identify major data repositories and to link their
content in a central platform. We coordinated this project with UBiC
(UBC Bioinformatics Centre) to fully capitalize on the ongoing
Integrated Database Project (Atlas, Shah et al., submitted), an effort
to integrate multiple forms of biological, publication, and ontological
data under one query space for data mining. The challenge of our
approach, compared to other efforts, lies in comparing gene networks on
the basis of orthologous relationships between individual genes from
different species, including multiple techniques and datasets, and
statistically evaluating the significance of individual interactions.
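
The cross-species comparison described above can be sketched as follows.
This is only an illustration under stated assumptions: the gene-to-group
map, the interaction records, and the function names are hypothetical,
and counting the supporting species is a simple stand-in for the
statistical evaluation, not the project's actual method.

```python
from collections import defaultdict

# Hypothetical (species, gene) -> ortholog group assignments.
GENE_TO_GROUP = {
    ("S. cerevisiae", "yeastA"): "G1",
    ("S. cerevisiae", "yeastB"): "G2",
    ("D. melanogaster", "flyA"): "G1",
    ("D. melanogaster", "flyB"): "G2",
    ("C. elegans", "wormB"): "G2",
    ("C. elegans", "wormC"): "G3",
}

# Hypothetical interaction records: (species, gene_a, gene_b, evidence),
# where evidence is "direct" (e.g. yeast two-hybrid) or "indirect"
# (e.g. complex purification).
INTERACTIONS = [
    ("S. cerevisiae", "yeastA", "yeastB", "direct"),
    ("D. melanogaster", "flyA", "flyB", "indirect"),
    ("C. elegans", "wormB", "wormC", "direct"),
]


def project_interactions(interactions, gene_to_group):
    """Map gene-level interactions onto pairs of ortholog groups.

    For each group pair, collect the species in which an interaction was
    observed and the evidence labels that support it.
    """
    support = defaultdict(lambda: {"species": set(), "evidence": set()})
    for species, gene_a, gene_b, evidence in interactions:
        group_a = gene_to_group.get((species, gene_a))
        group_b = gene_to_group.get((species, gene_b))
        if group_a is None or group_b is None:
            continue  # skip genes without an ortholog group
        pair = tuple(sorted((group_a, group_b)))  # undirected pair
        support[pair]["species"].add(species)
        support[pair]["evidence"].add(evidence)
    return dict(support)


def rank_by_support(support):
    """Order group pairs by the number of species supporting them."""
    return sorted(
        support.items(),
        key=lambda item: (-len(item[1]["species"]), item[0]),
    )


if __name__ == "__main__":
    for pair, info in rank_by_support(
        project_interactions(INTERACTIONS, GENE_TO_GROUP)
    ):
        print(pair, sorted(info["species"]), sorted(info["evidence"]))
```

A group pair observed in several species, or by both direct and indirect
techniques, would be a stronger candidate for transfer to the human
orthologs than a pair seen only once.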

References
[1] H. Yu et al., Annotation transfer between genomes: protein-protein
interologs and protein-DNA regulogs. Genome Res. 2004;14(6):1107-18.
[2] P. Bork et al., Protein interaction networks from yeast to human.
Curr Opin Struct Biol. 2004;14(3):292-9.
[3] L.R. Matthews et al., Identification of potential interaction
networks using sequence-based searches for conserved protein-protein
interactions or "interologs". Genome Res. 2001;11:2120-6.

Project Members:
Wyeth W. Wasserman, CMMT (PI, TFs)
B.F. Francis Ouellette, UBiC (PI, Atlas)
Danielle Kemmer, CMMT, Karolinska Institute (Graduate student, project
leader, data selection)
Sohrab P. Shah, UBiC (Chief, high-throughput bioinformatics, database
design, data integration)
Jochen Brumm, CMMT (Graduate student, statistical advice, pathways)
Jonathan Lim, CMMT (Software developer, software for interfacing)
Raf Podowski, Karolinska Institute (Graduate student, text analysis and
literature)
Yong Huang, UBiC (Database administrator, Atlas, database integration)
John Ling, UBiC (Software developer, Atlas, data retrieval)
Team e-mail: ilg@cmmt.ubc.ca