Bench Press

The Crossroads of Science and Tech

To Stimulate Open Science

A lot of scientific circles are talking about how best to spur collaboration, and that’s spawned a number of movements, such as “open access” and “open science” — both inspired by the “open source” movement in programming — that fight to end the fencing of science into proprietary, commercial enclaves that require fees to access. Clearly, in terms of fostering the trade of knowledge, an open, free highway is better than a highway with a large toll.

Although much of this movement towards open science has focused on journals and their large subscription fees, there’s another area of open science that’s drawn my attention: Gene Ontology (GO) annotations, a set of standardized annotations that classify genes according to their biological function, such as “amino acid metabolism.” These annotations are, as of now, curated by experts. What I’ve noticed in particular is that GO has thrived in one community and withered in another, and I’m curious as to why.

The yeast community is famous amongst all the molecular biology communities as being open and collaborative, to the extent that almost all gene names have been systematized, annotations for genes are very extensive and well-structured, a strain is available for the deletion of every gene, many genes are available fused to a fluorescent marker for easy microscopy, and so on. Just go to the Saccharomyces Genome Database, and there’s a wealth of all this sort of information at your fingertips, centralized, standardized, interconnected, and easy to use. In particular, the Gene Ontology annotations are considered superb and accurate, allowing for easy computational interpretation of large-scale experiments involving hundreds and thousands of genes and their interactions. Yeast genomicists use GO all the time, and contribute to its development very often.

In contrast, the human Gene Ontology annotations are considered sparse and relatively uninformative, and they generally aren’t as useful for interpreting things like gene expression microarrays. Instead, one of the most successful and popular sets of biological function annotations is Ingenuity, a commercial software package that is well developed thanks to the large amount of money poured into it by pharmaceutical companies and other health science R&D groups.

Why did the two communities end up going in two directions, one towards a more collaborative, “open science”-friendly annotation system, and the other towards a proprietary, commercial annotation platform? Undoubtedly, part of the reason is the structure of financial incentives; human biology has unique opportunities for direct commercialization via drug or health research, and so people would naturally focus their efforts on things that can win them fortune. But the first yeast biology research done by Louis Pasteur was probably related to budding (pun intended) commercial R&D on reproducible bread/wine/beer recipes, so what prevented the yeast community from, say, balkanizing yeast research because of incentives from the beer brewing and bread-making industries?

Perhaps it is because the yeast community arrived at common standards and nomenclature for information sharing long before it got very large. After all, yeast doesn’t have nearly the same problem that humans do of multiple names for the same gene (just look at the gene RANKL, which is also known as OPGL, ODF, CD254, TNFSF11, TRANCE, and hRANKL2). Yeast also doesn’t have nearly as much of a problem with the explosion of gene database IDs (humans have, as a small sample: RefSeq, HGNC, Ensembl, EMBL/GenBank, Entrez, MIM, Unigene, UniProt/SwissProt, and UCSC). Perhaps having a common, universal standards-making institution is the answer, to make sure all the railroad tracks are the same width, to use an analogy.
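To make the alias problem concrete, here is a minimal Python sketch of the bookkeeping that anyone combining human gene lists ends up doing. The alias names are the ones listed above. Treating TNFSF11 as the canonical symbol is my assumption for the example, and the table and function names are hypothetical, not taken from any real database or library.

```python
# Hypothetical alias table. The alias names come from the post; choosing
# TNFSF11 as the canonical symbol is an assumption made for this sketch.
ALIASES = {
    "TNFSF11": ["RANKL", "OPGL", "ODF", "CD254", "TRANCE", "hRANKL2"],
}


def build_lookup(aliases):
    """Invert {canonical: [alias, ...]} into {UPPERCASED_NAME: canonical}."""
    lookup = {}
    for canonical, names in aliases.items():
        # The canonical symbol should also resolve to itself.
        lookup[canonical.upper()] = canonical
        for name in names:
            lookup[name.upper()] = canonical
    return lookup


LOOKUP = build_lookup(ALIASES)


def normalize(symbol):
    """Return the canonical symbol for a gene name, or None if unknown."""
    return LOOKUP.get(symbol.strip().upper())


def normalize_all(symbols):
    """Normalize a list of gene names, keeping None for unknown ones."""
    return [normalize(s) for s in symbols]
```

With a table like this, normalize("RANKL") and normalize("opgl") both resolve to the same symbol, so two labs' gene lists can be compared directly. The catch is that someone has to build and maintain that table, which is exactly the standards work the yeast community did early.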

Or perhaps it’s the size of the community. There are many, many more labs studying human biology than yeast biology, not only because of the financial incentives, but also because of the sheer size of the human genome (roughly 250 times bigger than the yeast genome, at about 3 billion base pairs versus about 12 million). Maybe it’s just easier to coordinate fewer people into one community.

I think as the scientific community moves forward, especially in embracing new collaborative methods on the internet, we should closely examine what’s worked so far and what hasn’t, so that we don’t end up wading through endless patents, fees, and proprietary, non-interoperable data structures to get what we need.