Are you positive it’s positive?
As genomes have been sequenced over the past few decades, scientists have looked for new ways to analyze and interpret the wealth of information they contain. They’ve developed numerous algorithms with goals ranging from organizing evolutionary family trees (inspired by plagiarism-detection software) to aligning genetic sequences, all to answer the many questions that sequence databases now make it possible to ask. One of the many things scientists have attempted to study is positive selection in protein-coding genes.
Positive selection of advantageous gene mutations is particularly interesting to scientists because it can provide insight into the function of new genes. However, positive selection is difficult to detect and analyze, since neutral and deleterious mutations far outnumber advantageous ones. Initially, scientists looked for positive selection simply by comparing, between homologous protein-coding gene sequences, the ratio ω = dN/dS of nonsynonymous nucleotide substitutions per nonsynonymous site (dN) to synonymous substitutions per synonymous site (dS), and used Fisher’s exact test to decide whether to reject a null hypothesis of neutral evolution [1]. A ratio significantly greater than 1 is taken as a sign of positive selection, a ratio near 1 is consistent with neutral evolution, and a ratio below 1 points to purifying selection.
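To make ω a bit more concrete, here is a minimal sketch of how it might be computed for two aligned, in-frame coding sequences, using Nei-Gojobori-style counting of synonymous and nonsynonymous sites followed by a Jukes-Cantor correction. This is one common textbook approach, not necessarily the exact procedure used in [1], and the Fisher exact test step is not shown. The function names (`synonymous_sites`, `codon_differences`, `omega`) are my own, and so is the convention of counting a mutation to a stop codon as nonsynonymous (implementations differ on this).

```python
import math
from itertools import permutations

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon):
    """Number of synonymous sites (0 to 3) in one codon.

    Assumption: a single-base change to a stop codon counts as nonsynonymous.
    """
    aa = CODE[codon]
    syn = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa:
                    syn += 1
    return syn / 3.0


def codon_differences(c1, c2):
    """Average (synonymous, nonsynonymous) differences between two codons,
    taken over every stepwise mutational pathway that avoids stop codons."""
    diff = [p for p in range(3) if c1[p] != c2[p]]
    if not diff:
        return 0.0, 0.0
    totals, n_paths = [0.0, 0.0], 0
    for order in permutations(diff):
        current, syn, non, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                valid = False  # pathway passes through a stop codon
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        if valid:
            totals[0] += syn
            totals[1] += non
            n_paths += 1
    if n_paths == 0:  # defensive fallback: treat every difference as nonsynonymous
        return 0.0, float(len(diff))
    return totals[0] / n_paths, totals[1] / n_paths


def omega(seq1, seq2):
    """dN/dS for two aligned coding sequences (Jukes-Cantor corrected)."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue  # skip gaps, ambiguous bases and stop codons
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2.0
        S += s
        N += 3.0 - s
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    pS, pN = Sd / S, Nd / N
    if not (0.0 < pS < 0.75 and 0.0 <= pN < 0.75):
        raise ValueError("omega is undefined or saturated for these sequences")

    def jukes_cantor(p):
        return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

    return jukes_cantor(pN) / jukes_cantor(pS)


# Toy sequences: one synonymous change (CCC -> CCA) and one
# nonsynonymous change (AAA -> GAA, Lys -> Glu).
print(omega("CCCAAAATGCCCCCCCCC", "CCAGAAATGCCCCCCCCC"))
```

On this toy pair ω comes out well below 1, which shows why a single nonsynonymous change is weak evidence on its own; real analyses use full gene sequences and a statistical test.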
Over the years, scientists developed additional statistical analyses to infer positive selection. Two of the most popular are the branch-site method (BSM) and the site-specific (site-prediction) method. The BSM uses a likelihood ratio test to detect positive selection along a given phylogenetic branch. The site-specific method, on the other hand, uses ω to identify specific amino acid sites that have been positively selected. Both methods have been used in hundreds of papers and have seemingly provided a great deal of insight into potential targets of positive selection within various genomes. What would you say, then, if told that both methods contain significant flaws that produce an inordinate number of false positives?
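The final step of a likelihood ratio test like the BSM’s is simple arithmetic: twice the difference in maximized log-likelihood between an alternative model (ω allowed to exceed 1 on the chosen branch) and a null model (ω fixed at 1) is compared with a chi-square distribution. Below is a minimal sketch of just that step, assuming you already have both log-likelihoods from a program such as PAML’s codeml. The log-likelihood values are invented for illustration, and the function name `branch_site_lrt` is hypothetical. The optional 50:50 mixture of a point mass at 0 and χ² with 1 degree of freedom is one null distribution discussed for this test; plain χ² with 1 degree of freedom is the more conservative choice.

```python
import math


def branch_site_lrt(lnL_null, lnL_alt, use_mixture=False):
    """Return (2 * delta lnL, p-value) for a likelihood ratio test.

    The statistic is compared with chi-square(1); with use_mixture=True it is
    compared with a 50:50 mixture of a point mass at 0 and chi-square(1).
    """
    # Clamp small negative values that can arise from optimizer noise.
    stat = max(0.0, 2.0 * (lnL_alt - lnL_null))
    if stat == 0.0:
        return 0.0, 1.0
    # For 1 degree of freedom: P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2.0))
    if use_mixture:
        p *= 0.5
    return stat, p


# Hypothetical log-likelihoods for the null and alternative models.
stat, p = branch_site_lrt(lnL_null=-1000.0, lnL_alt=-997.0)
print(f"2*dlnL = {stat:.2f}, p = {p:.4f}")
```

A small p-value here only says the alternative model fits better; as the rest of this post argues, it does not by itself show that the selected sites changed protein function.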

Bovine rhodopsin protein, with predicted sites in red and experimentally determined sites in blue. (Adapted from Yokoyama et al. 2008 PNAS)
That’s exactly what Masatoshi Nei and his group believe they have shown in a recent paper evaluating the reliability of the branch-site and site-specific methods. Nei’s group used several controlled computer simulations, as well as data collected by Shozo Yokoyama at Emory University on dim-light vision opsins in vertebrates [2], and concluded that both the branch-site and site-specific methods yield far too many false positives. Nei and his group contend:
This low rate of predictability occurs because most of the current statistical methods are designed to identify codon sites with high ω values, which may not have anything to do with functional changes. The codon sites showing functional changes generally do not show a high ω value. To understand adaptive evolution, some form of experimental confirmation is necessary.
From this paper, it looks as though scientists searching for high ω values may have been chasing ghosts by assuming that the amino acid changes at such sites are functional and therefore evidence of positive selection. The potential impact on hundreds of published papers is stunning. In the end, the take-home message is that statistical analyses, no matter how elegant, have their limits and ought to be used in conjunction with experimental data as much as possible.
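One simple way to put a number on the “low rate of predictability” described above is to treat experimentally confirmed sites as ground truth and ask what fraction of the statistically predicted sites were confirmed (precision) and what fraction of the confirmed sites were predicted (recall). The sketch below uses made-up site numbers, not the opsin data from the papers cited here, and these metrics are not necessarily the ones used in [1]; the function name `compare_predictions` is hypothetical.

```python
def compare_predictions(predicted, confirmed):
    """Compare statistically predicted sites with experimentally confirmed ones."""
    predicted, confirmed = set(predicted), set(confirmed)
    true_pos = predicted & confirmed
    return {
        "true_positives": sorted(true_pos),
        "false_positives": sorted(predicted - confirmed),
        "missed": sorted(confirmed - predicted),
        # Fraction of predictions that experiments confirmed.
        "precision": len(true_pos) / len(predicted) if predicted else 0.0,
        # Fraction of confirmed sites that the method found.
        "recall": len(true_pos) / len(confirmed) if confirmed else 0.0,
    }


# Hypothetical site numbers for illustration only.
summary = compare_predictions(predicted=[10, 20, 30, 40, 50], confirmed=[30, 60, 70])
print(summary)
```

A method can look productive (many predicted sites) while having poor precision, which is exactly the false-positive problem at issue here.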
(Sources: 1 – Reliabilities of identifying positive selection by the branch-site and the site-prediction methods; 2 – Elucidation of phenotypic adaptations: Molecular analyses of dim-light vision proteins in vertebrates)