Figure 4. Quality of the transcriptome datasets.
(A) To assess the correlation between expression level and confidence, we compared the transcriptome datasets to a gold standard, namely UniProtKB. We quantified the quality of each dataset in terms of its fold enrichment for correct gene–tissue associations compared to random chance. The comparison shows that higher expression values correspond to higher quality and that the three confidence cutoffs used (vertical dotted lines) correspond to equivalent quality in all datasets. (B) The distribution of expression breadth for UniProtKB is strongly skewed towards tissue-specific proteins, contrary to what was seen for the transcriptome datasets. (C) We thus constructed a consensus mRNA reference set; its expression breadth distribution is in line with that of the individual mRNA datasets. (D) The mRNA reference set is highly complementary to the UniProtKB gold standard, providing 7,384 gene–tissue associations that are not in the latter.
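
The fold enrichment in panel (A) can be illustrated with a minimal sketch. It assumes that "random chance" means the fraction of all possible gene–tissue pairs that appear in the gold standard; the caption does not define the background model, so this is one plausible reading and not necessarily the calculation used for the figure. The function name `fold_enrichment` and the toy gene and tissue labels are hypothetical.

```python
from itertools import product


def fold_enrichment(predicted, gold, genes, tissues):
    """Fold enrichment of predicted gene-tissue pairs for gold-standard pairs.

    Assumption (not from the source): the random expectation is the share of
    all possible (gene, tissue) pairs that are present in the gold standard.
    """
    universe = set(product(genes, tissues))
    # Restrict both sets to pairs that can be evaluated at all.
    predicted = set(predicted) & universe
    gold = set(gold) & universe
    if not predicted or not gold:
        raise ValueError("need at least one predicted and one gold-standard pair")

    observed = len(predicted & gold) / len(predicted)  # precision of the dataset
    expected = len(gold) / len(universe)               # precision of random pairs
    return observed / expected


# Toy example: pairs passing some expression cutoff versus a small gold standard.
genes = ["g1", "g2", "g3", "g4"]
tissues = ["liver", "brain"]
gold = {("g1", "liver"), ("g2", "brain")}
predicted = {("g1", "liver"), ("g3", "liver")}

print(fold_enrichment(predicted, gold, genes, tissues))
```

Repeating the calculation for the pairs that pass successively higher expression cutoffs would give a curve of the kind described for panel (A), under the same assumption about the background.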