Gene Characterization Index Estimation Form

Random gene sample set for raf@home.

Welcome back raf@home. Thanks for returning.

Please assign a single score between 1 (poorly characterized) and 10 (fully characterized) for each gene in the set. For some rough guidelines, please read the information at the bottom of the page. In short, we hope you will assign a score based on your assessment of the information accessible through the Entrez Gene system. There are too many resources to look at everything, but we hope you can take a few minutes to explore each gene. WE APPRECIATE YOUR EFFORT.

GeneID GCI Score Gene Description
388 RHOB; ras homolog gene family, member B
8498 RANBP3; RAN binding protein 3
5581 PRKCE; protein kinase C, epsilon
3212 HOXB2; homeo box B2
10782 ZNF274; zinc finger protein 274
55635 DEPDC1; DEP domain containing 1
57154 SMURF1; SMAD specific E3 ubiquitin protein ligase 1
51300 C3orf1; chromosome 3 open reading frame 1
84331 MGC15416; hypothetical protein MGC15416
4519 CYTB; cytochrome b

I will come back later. Please store the values I have specified so far.
Reset all GCI scores to values retrieved at the beginning of this session.
I am all done and submit my final estimates of GCI scores.

I quit or do not wish to participate. I wish to stop evaluating this test set and return it to the unassigned pool. Note: This test set will be assigned to the next registering evaluator.


Gene Characterization Index Scale

Feature Suggested Score Range
<= UNKNOWN... ...CHARACTERIZED =>
1 2 3 4 5 6 7 8 9 10
Motif Matches                    
Expression                    
Structure                    
Genetic Information                    
Sequence Similarity                    
Pathway Defined                    
Biochemical Function                    

Novelty: Degree of completeness of understanding the functional aspects of a gene.

GUIDE TO ASSIGNING A SCORE

To describe the level of characterization fully, we need to look at a variety of gene descriptions and data. The Entrez Gene system allows users to find most of the pertinent links to information about specific genes. These links will take you to diverse kinds of data. We have provided a general guideline for how different types of data may influence a score. These are not hard rules, and we want you to express your own opinions in assigning scores. Within the guide, the suggestions are based on each kind of data being the "best" or "most convincing" available. The existence of functional information in one category may automatically imply the existence of facts in another category. Furthermore, there will often be cases where there is strong evidence in multiple categories. In such cases, please use your own judgement as to whether the combined data are sufficient to convince you that the score should exceed the suggested ranges.

Brief descriptions of the terms used in the guidelines follow.

Motif Matches: A motif or domain match for some portion of the protein. The number and characteristics of each motif or domain should affect the impact of this feature on the characterization index.

Expression: The breadth of understanding of expression characteristics of a gene. The source may range from counts of ESTs to advanced information from microarray studies. Information may reveal a narrow tissue specificity or that the gene is induced by certain stimuli (e.g. cell stress or growth factors). Protein expression studies may suggest sub-cellular localizations.

Structure: While protein structure does not always reveal much about the function of a gene, such information is often highly informative. Structural information may be obtained from a solved structure or indirectly from strong sequence similarity to a protein with known structure.

Genetic Information: Genes can be associated with specific phenotypes at the cellular or organism level. Strong genetic evidence of such an effect can suggest function. You are likely to find such data via OMIM or model organism links.

Sequence Similarity: A level of similarity to other sequences with varying levels of characterization. A broad range of similarity is possible with different implications - from a limited similarity to weakly characterized genes, to a strong similarity to well characterized genes indicating functional paralogs or orthologs. The value can be affected by presence of motif, structural, expression and genetic characteristics. The HomoloGene resource can provide such data.

Pathway: Association with a specific biochemical process or pathway is highly indicative of a well-defined gene. Such evidence may be found in a pathway database (e.g. KEGG) or in the assignment of an enzyme classification code (EC number).

Biochemical Function: For many genes, we have a strong understanding of their specific biochemical function. In some cases functional annotation can be partially informative (e.g. tyrosine kinase), while in others it can be decisive (e.g. citrate synthase). Such annotations may be found as Gene Ontology-associated labels or in the description lines from sequence databases.

REMINDER: We want a raw response from you, so please do not dwell for long on the suggested ranges.