

Supplemental Tables Referenced in A Comprehensive Transcript Index of the Human Genome Generated Using Microarrays and Computational Approaches

Table S1. Complete list of the 48,614 transcripts in the Primary Transcript Index (PTI) described in the main text that were represented on the set of predicted transcript arrays (PTA), also described in the main text. The columns in this table are: 1) Rosetta Locus Projection (RLP) id specific to the custom annotations associated with the PTI described in the main text, 2) category or class assigned to the custom annotations, as described in the main text, 3) RefSeq or Unigene (Build 138) accession number associated with the RLP (transcript and EST sequences from RefSeq and/or Unigene supporting the RLP), 4) official gene symbol, if available, and 5) chromosome on which the RLP is located.

Excel format
Text format

Table S2. Complete list of 60 tissues and cell lines hybridized to the predicted transcript arrays described in the main text.

  Organism Sample Type Sample Description
1. Human Tissue Adrenal Cortex
2. Human Tissue Adrenal Medulla
3. Human Tissue Bladder
4. Human Tissue Fetal Liver
5. Human Tissue Kidney
6. Human Tissue Prostate
7. Human Tissue Skeletal Muscle
8. Human Tissue Adrenal Gland
9. Human Tissue Bone Marrow
10. Human Tissue Brain Amygdala
11. Human Tissue Brain Caudate Nucleus
12. Human Tissue Brain Cerebellum
13. Human Tissue Brain Corpus Callosum
14. Human Tissue Brain
15. Human Tissue Brain Thalamus
16. Human Tissue Brain Cerebral Cortex
17. Human Tissue Brain Hippocampus
18. Human Tissue Brain Postcentral Gyrus
19. Human Cell Line Colorectal Adenocarcinoma (SW480)
20. Human Tissue Descending Colon
21. Human Tissue Duodenum
22. Human Tissue Epididymis
23. Human Tissue Fetal Brain
24. Human Tissue Fetal Kidney
25. Human Tissue Fetal Lung
26. Human Tissue Heart
27. Human Cell Line HeLa
28. Human Tissue Ileocecum
29. Human Tissue Ileum
30. Human Tissue Interventricular Septum
31. Human Tissue Jejunum
32. Human Cell Line Leukemia Chronic Myelogenous (K562)
33. Human Cell Line Leukemia Lymphoblastic (MOLT-4)
34. Human Cell Line Leukemia Promyelocytic (HL-60)
35. Human Tissue Liver
36. Human Tissue Liver Left Lobe
37. Human Cell Line Lung Carcinoma (A549)
38. Human Tissue Lung
39. Human Tissue Lymph Node
40. Human Cell Line Lymphoma Burkitt's (Daudi)
41. Human Cell Line Lymphoma Burkitt's (Raji)
42. Human Cell Line Melanoma (G361)
43. Human Tissue Pancreas
44. Human Tissue Placenta
45. Human Tissue Rectum
46. Human Tissue Retina
47. Human Tissue Salivary Gland
48. Human Tissue Small Intestine
49. Human Tissue Spinal Cord
50. Human Tissue Spleen
51. Human Tissue Stomach
52. Human Tissue Testis
53. Human Tissue Thyroid
54. Human Tissue Tongue
55. Human Tissue Tonsil
56. Human Tissue Trachea
57. Human Tissue Transverse Colon
58. Human Tissue Uterus
59. Human Tissue Uterus Corpus
60. Human Tissue Thymus

Table S3. List of 6 tissues and cell lines hybridized to the chromosome 20 genomic tiling arrays described in the main text.

  Organism Sample Type Sample Description
1. Human Tissue Brain Thalamus
2. Human Cell Line Jurkat
3. Human Cell Line Leukemia Chronic Myelogenous (K562)
4. Human Tissue Testes
5. Human Tissue Thymus
6. Human Tissue Uterus


Table S4. List of 8 tissues and cell lines hybridized to the chromosome 22 genomic tiling arrays described in the main text.

  Organism Sample Type Sample Description
1. Human Tissue Brain Thalamus
2. Human Cell Line Jurkat
3. Human Cell Line Leukemia Chronic Myelogenous (K562)
4. Human Tissue Testes
5. Human Tissue Thymus
6. Human Tissue Uterus
7. Human Tissue  
8. Human Tissue  

 

Comparing the EVG Set with the Current Set of RefSeq Genes

To further validate the expression verified genes (EVGs) identified in the analysis presented in the main text, all probes from the Primary Transcript Index were mapped to the most recent set of RefSeq genes. For a probe to be assigned to a RefSeq sequence, at least 56 of its 60 bases had to match the positive strand of the RefSeq sequence with no gaps. Probes with hits to multiple RefSeqs from different LocusLink records were discarded. All locus projections containing probes that mapped to the current RefSeq set were then summarized by EVG detection status and by the original locus projection category presented in the main text. The results of this summary are given in Supplemental Table S5.
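The mapping rule described above can be illustrated with a minimal sketch. It is not the pipeline used for the analysis: it assumes a brute-force ungapped scan instead of an alignment tool, and the function names, the `locus_of` lookup, and the input dictionaries are all hypothetical. Only the 56-of-60 threshold, the same-strand ungapped requirement, and the discard rule for probes hitting more than one LocusLink record come from the text.

```python
PROBE_LENGTH = 60   # probe length stated in the text
MIN_MATCHES = 56    # minimum identical bases required for a hit


def has_ungapped_hit(probe, target, min_matches=MIN_MATCHES):
    """Return True if the probe aligns anywhere on the target's given
    (positive) strand, without gaps, with at least min_matches identities."""
    n = len(probe)
    for start in range(len(target) - n + 1):
        window = target[start:start + n]
        matches = sum(p == t for p, t in zip(probe, window))
        if matches >= min_matches:
            return True
    return False


def assign_probes(probes, refseqs, locus_of):
    """Assign each probe to a locus, discarding ambiguous probes.

    probes   : dict of probe id -> probe sequence (hypothetical input)
    refseqs  : dict of RefSeq accession -> transcript sequence
    locus_of : dict of RefSeq accession -> LocusLink record id

    A probe that hits several RefSeqs is kept only if all of them
    belong to the same LocusLink record.
    """
    assignments = {}
    for probe_id, probe_seq in probes.items():
        loci = {
            locus_of[accession]
            for accession, sequence in refseqs.items()
            if has_ungapped_hit(probe_seq, sequence)
        }
        if len(loci) == 1:
            assignments[probe_id] = loci.pop()
    return assignments
```

A scan like this is quadratic in sequence length and is only practical for small examples; at the scale of a full probe set one would presumably use an indexed aligner configured for ungapped, single-strand matching.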

Slightly more than 85% of the locus projections that mapped to the latest RefSeq gene set were detected as expression verified genes. This percentage is higher than expected from the false-negative predictions in the main manuscript, which suggests that the estimates provided there were somewhat conservative. However, transcripts that are highly expressed across a broad range of tissues are overrepresented in RefSeq and are therefore the easiest to detect with the microarray-based approach described in the main text, so the conservative estimate provided in the main text is still warranted.

We also note that the percentage of EVGs in the known category (based on RefSeq alignments to the genome) is higher for locus projections matching current RefSeqs (87%) than for all locus projections referenced in the paper (75%). This is likely due to the removal of incorrect provisional RefSeq sequences during the review process, and it again highlights the value of microarray validation pending full characterization of the human transcriptome. The percentage of EVGs drops as the locus projection relies more heavily on gene model predictions. This is likely due to cases where the structure of the gene model was incorrect. Since one of the major criteria for determining an EVG is co-regulation of probes across conditions, probes designed against incorrect portions of a partially correct gene model will reduce the power to detect that gene.

Table S5. Comparison of Expression Verified Gene (EVG) predictions with RefSeq sequences (March 2004). The first column gives the predicted gene categories as described in the main text. The second column gives the counts, by category, of PTI genes mapping to the current RefSeq set that were detected as expression verified genes (EVGs). The third column gives the counts of PTI genes mapping to the current RefSeq set that were not detected as EVGs. The fourth and fifth columns give the percentages corresponding to the second and third columns, respectively.

Support Category                            EVG     Non-EVG   EVG Percentage   Non-EVG Percentage
Known (originally contained RefSeq gene)    9672    1480      86.7             13.3
cDNA, Protein, and Gene Model Supported     2862    520       84.6             15.4
Protein and EST Supported                   82      10        89.1             10.9
cDNA Supported                              703     102       87.3             12.7
Protein Supported                           39      4         90.7             9.3
cDNA and Gene Model Supported               319     102       75.8             24.2
Protein and Gene Model Supported            113     77        59.5             40.5
Gene Model Supported by 2 Predictions       48      53        47.5             52.5
Gene Model Supported by 1 Prediction        46      20        69.7             30.3
Total                                       13884   2368      85.4             14.6
Total Excluding Known Category              4212    888       82.6             17.4
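
The percentage columns and the two total rows of Table S5 follow directly from the EVG and non-EVG counts. The short sketch below reproduces that arithmetic from the counts given in the table; the variable and function names are illustrative only and are not part of the original analysis.

```python
# EVG and non-EVG counts per support category, copied from Table S5.
COUNTS = {
    "Known (originally contained RefSeq gene)": (9672, 1480),
    "cDNA, Protein, and Gene Model Supported": (2862, 520),
    "Protein and EST Supported": (82, 10),
    "cDNA Supported": (703, 102),
    "Protein Supported": (39, 4),
    "cDNA and Gene Model Supported": (319, 102),
    "Protein and Gene Model Supported": (113, 77),
    "Gene Model Supported by 2 Predictions": (48, 53),
    "Gene Model Supported by 1 Prediction": (46, 20),
}

KNOWN = "Known (originally contained RefSeq gene)"


def percentages(evg, non_evg):
    """Return (EVG %, non-EVG %) rounded to one decimal place."""
    total = evg + non_evg
    return round(100 * evg / total, 1), round(100 * non_evg / total, 1)


def column_totals(counts, exclude=()):
    """Sum the EVG and non-EVG columns, skipping any excluded categories."""
    rows = [pair for name, pair in counts.items() if name not in exclude]
    return sum(e for e, _ in rows), sum(n for _, n in rows)


if __name__ == "__main__":
    for name, (evg, non_evg) in COUNTS.items():
        print(name, evg, non_evg, *percentages(evg, non_evg))
    total = column_totals(COUNTS)
    print("Total", *total, *percentages(*total))
    no_known = column_totals(COUNTS, exclude=(KNOWN,))
    print("Total Excluding Known Category", *no_known, *percentages(*no_known))
```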

Acknowledgements for Supplemental Material
The authors thank Julja Burchard for mapping the PTI probes to the current RefSeq sequences.


Copyright 2003 Rosetta Inpharmatics LLC, Merck & Co., Inc. (USA)