Archive for the ‘fold.it’ tag
Phylo
A few years ago, I blogged about an ingenious crowdsourced game called Fold.It. The concept was pretty simple:
- Use human intuition to help solve complicated three-dimensional protein folding challenges, an approach that is oftentimes as effective as computational algorithms but significantly faster and cheaper
- Pool together lots of human volunteers
- Turn the whole experience into a game to get more volunteers to spend more time
The result was a nifty little game whose findings have made it, to date, into a number of peer-reviewed publications (see PNAS paper here and Nature Structural & Molecular Biology paper here)!
Well, some researchers at McGill University in Canada want to take a page out of this playbook with a game they built called Phylo (HT: MedGadget) to help deal with another challenging issue in bioinformatics: multiple sequence alignment. In a nutshell, to better understand DNA and how it impacts life, we need to see how stretches of DNA line up with one another. Now, computers are extremely good at taking care of this problem for short stretches of DNA and for “roughly” aligning longer stretches of DNA – but it’s fairly difficult and costly to do it accurately for long stretches using computer algorithms.
People, however, are curiously intuitive about patterns and shapes. So, the researchers turned the multiple sequence alignment problem into a puzzle game they’ve called Phylo (see image below) where the goal is to line up multiple colored blocks. Players tackle the individual puzzles (in a browser or even on their mobile phone) and the researchers aggregate all of this into improved sequence alignments which help them better understand the underlying genetics of disease.
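To make the "line up the colored blocks" idea concrete, here is a minimal sketch of how an alignment can be scored. It uses a simple sum-of-pairs score, which is a common textbook approach and not necessarily what Phylo itself uses; the `MATCH`, `MISMATCH`, and `GAP` values and the function names are assumptions chosen for illustration.

```python
from itertools import combinations

# Assumed scoring values, for illustration only (not Phylo's actual scoring).
MATCH, MISMATCH, GAP = 1, -1, -2


def column_score(column):
    """Score one alignment column by comparing every pair of symbols in it."""
    score = 0
    for a, b in combinations(column, 2):
        if a == "-" and b == "-":
            continue  # two gaps facing each other cost nothing
        if a == "-" or b == "-":
            score += GAP
        elif a == b:
            score += MATCH
        else:
            score += MISMATCH
    return score


def alignment_score(rows):
    """Sum the column scores of an alignment whose rows all have equal length."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows must have the same length")
    return sum(column_score(col) for col in zip(*rows))


# The same three sequences, with the gap in the middle row placed two ways.
before = ["ACGT", "AGT-", "ACGT"]
after = ["ACGT", "A-GT", "ACGT"]

print(alignment_score(before), alignment_score(after))
```

Sliding the gap in the middle row so that its G and T line up with the other rows raises the score, which is the kind of move a player makes by dragging blocks around.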
And how has it been doing? According to the McGill University press release:
So far, it has been working very well. Since the game was launched in November 2010, the researchers have received more than 350,000 solutions to alignment sequence problems. “Phylo has contributed to improving our understanding of the regulation of 521 genes involved in a variety of diseases. It also confirms that difficult computational problems can be embedded in a casual game that can easily be played by people without any scientific training,” Waldispuhl said. “What we’re doing here is different from classical citizen science approaches. We aren’t substituting humans for computers or asking them to compete with the machines. They are working together. It’s a synergy of humans and machines that helps to solve one of the most fundamental biological problems.”
With the new games and platforms, the researchers are hoping to encourage even more gamers to join the fun and contribute to a better understanding of genetically-based diseases at the same time.
Try it out – I have to admit I’m not especially good with puzzle games, so I haven’t been doing particularly well, but the researchers have done a pretty good job with the design of the game (esp. relative to many other academic-inspired gaming programs that I’ve seen) – and who knows, you might be a key contributor to the next big drug treatment!
What to Do as Science Gets Older and More Crowded
A recent NBER paper (gated) by Benjamin Jones from Northwestern conducts a systematic review of trends in scientific research and draws a couple of conclusions that won’t come as a surprise to anyone in science (HT: Inside Higher Ed):
As science advances and knowledge accumulates, ensuing generations of innovators spend longer in training and become more narrowly expert, shifting key innovations (i) later in the life cycle and (ii) from solo researchers toward teams
As evidence of this, the average age at which a scientist made a discovery which later qualified for a Nobel prize has increased by 6 years over the course of the 20th century. When looking at publications, the paper found that the average author list grew by 15-20% per decade!
We’ve discussed before the “decline of the Lone Ranger model of science”, but Jones’ paper focuses on the policy implications of such a change. He concludes that the government (and, probably, the academic and private institutions which support researchers) needs to adapt policy to reflect this new reality by:
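To get a feel for what 15-20% per decade means once it compounds, here is a quick back-of-the-envelope sketch. The ten-decade horizon is an assumption made purely for illustration (the period over which the rate was measured is not stated here), and `growth_factor` is a hypothetical helper name.

```python
def growth_factor(rate_per_decade, decades):
    """Compound a per-decade growth rate over a number of decades."""
    return (1 + rate_per_decade) ** decades


# Assumption: the rate holds steady for ten decades.
low = growth_factor(0.15, 10)
high = growth_factor(0.20, 10)

print(f"Author lists would grow roughly {low:.1f}x to {high:.1f}x")
```

Under that assumption, author lists end up roughly four to six times longer than where they started.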
- Tailoring funding and messaging to help keep young researchers interested despite the longer and more difficult training period
- Finding new ways to evaluate the worthiness of proposals as scientists’ expertise becomes more and more specialized
- Altering incentive structures as the team of collaborators replaces the Lone Ranger scientist model of discovery
These policy suggestions are definitely good ones, and are certainly necessary to adapt to a new scientific environment, but one dimension of this which Jones doesn’t discuss as much is the set of technological innovations (the focus of this blog!) which can help further research in this brave new world:
- Improving science communication with the public. We’ve made multiple mentions of this in the past, but the point is no less true here. Active public communications management not only helps secure funding and raise public awareness of the good scientists can do, but it also helps attract the interest of future generations of researchers and policymakers.
- Embracing new collaboration tools. To really kick-start collaboration between scientists across geographies and specialties, we need tools that go beyond just email and fax machines. Tools like Google Wave, wikis, distributed version control, and social media forums like Friendfeed are an early taste of the sort of live collaboration that new web technology can bring about.
- Leveraging cloud computing and heterogeneous compute. One of the reasons discoveries are taking longer and are more expensive is that there is so much more data to collect and to analyze than before. One technological innovation which we’ve talked about at length here is the ability of graphics cards/GPUs to make supercomputer-level processing power more readily accessible to research labs. Another is the use of new cloud computing services like Amazon’s to rapidly increase the computational resources that a lab/company has access to. Neither is a panacea for all the data analysis issues which scientists face, but they are definitely ways to make things easier for research groups who have stringent IT budgets.
- Using crowdsourcing to speed innovation. Who says research has to take longer and be more expensive? Perhaps it’s time to pull on new technological levers which let scientists draw on the resources and brains of a wider group of people. While new platforms like ChemBioConnect, distributed computing systems like Folding@Home, and volunteer crowdsourcing initiatives like Fold.IT are far from perfect, they hint at a future where researchers can call on resources beyond what their personal computers and brains are capable of.
- Building new research attribution models. When I say new attribution models, I’m referring to two things. The first is embodied by new standards like ORCID which make it easier to understand which person is the author/researcher in question (something which will become more and more important as more people with the same initials/names enter the sciences). The second, and more substantive, is finding new ways to understand who contributed what to a particular study. In today’s digital age, I find it laughable that we still rely on simple author list order to determine the relative roles and positions of the researchers listed on a publication. Employing metadata and other graphical cues can help scientists achieve the recognition they deserve, as well as provide appropriate incentives for teams of researchers to contribute.
- Contributing negative and after-publication results to open repositories. While I can understand the hesitation for most research groups to pursue a pure open access strategy, those concerns should not hold with negative or post-publication experimental data. While opening up access to data from failed/negative experiments does little to hurt a lab’s ability to publish first, it can be a dramatic boon for other research groups (especially new labs or labs with interdisciplinary focuses) who can not only use the data for their own analyses and experimental designs, but also avoid committing resources to experiments which have already been conducted. If it can work for biotechs and pharma companies, then there’s no reason it should be any different for non-commercial groups.
These suggestions only scratch the surface of what new technologies and policies can do to help scientists in a world where scientific training takes longer and where scientific discoveries need to be more collaborative. If anyone else has any other suggestions, feel free to leave them in the comments!
Playing the crowd
We’ve written before about the ability of scientists to use distributed computing to pool the computing power of millions of users over the internet to solve sophisticated mathematical problems. But imagine if we could actually pool the brainpower of volunteers — but in a way which doesn’t involve jacking our brains into the Matrix.
Now, imagine if it could be fun for the volunteers.
Imagine no longer. Fold.It was created less than a year ago at the University of Washington to do just that. Instead of pooling the computational power of millions of machines, it seeks to pool the “human intuition” of volunteers to solve challenging protein folding problems.
The basic scientific concept behind Fold.It is that nature will “push” chains of amino acids to adopt a folded structure which minimizes free energy. But, while free energy calculations can be done relatively easily, finding the structure that minimizes free energy is not so easy to do and requires immense computational power (which is why Folding@Home uses distributed computing).
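The gap between "scoring one fold is easy" and "finding the best fold is hard" can be seen in a toy model. The sketch below uses the two-dimensional HP lattice model, a textbook simplification in which each residue is either hydrophobic (`H`) or polar (`P`) and the "energy" just counts hydrophobic contacts. It is not the energy function Fold.It or Folding@Home actually use, and all function names here are made up for illustration.

```python
from itertools import product

# Unit steps on a 2D square lattice.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold(moves):
    """Turn a sequence of moves into lattice coordinates.

    Returns None if the chain would pass through itself.
    """
    pos = (0, 0)
    coords = [pos]
    for move in moves:
        dx, dy = MOVES[move]
        pos = (pos[0] + dx, pos[1] + dy)
        if pos in coords:
            return None
        coords.append(pos)
    return coords


def energy(sequence, coords):
    """Cheap to compute: -1 for each pair of H residues that touch on the
    lattice without being neighbours along the chain."""
    total = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):
            if sequence[i] == "H" and sequence[j] == "H":
                (x1, y1), (x2, y2) = coords[i], coords[j]
                if abs(x1 - x2) + abs(y1 - y2) == 1:
                    total -= 1
    return total


def best_fold(sequence):
    """Expensive: brute force tries 4 ** (len(sequence) - 1) move strings."""
    best_energy, best_moves, tried = None, None, 0
    for moves in product("UDLR", repeat=len(sequence) - 1):
        tried += 1
        coords = fold(moves)
        if coords is None:
            continue
        e = energy(sequence, coords)
        if best_energy is None or e < best_energy:
            best_energy, best_moves = e, "".join(moves)
    return best_energy, best_moves, tried


print(best_fold("HPPH"))
```

Scoring a single fold takes a handful of comparisons, but the number of candidate folds quadruples with every residue added to the chain. Even this toy model becomes impractical to brute-force well before it reaches the length of a real protein.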
But, humans have a gift which computers do not: the gift of intuition. While we may not be able to compute the free energies in our head, we have the ability to make logical jumps and do complex reasoning. While we might not necessarily understand how to calculate the strength of a hydrophobic interaction, we know enough that we should place two hydrophobic (non-polar) leucine amino acids near one another. While we may not be able to write a mathematical equation to describe the arc of a polypeptide chain, we can conceptualize and visualize that a chain should be more “scrunched up” or “stretched out”.
And that type of “soft reasoning” is the processing power Fold.It seeks to capture. Fold.It created a game which literally depicts a “raw” protein chain in all its unfolded glory and asks human players to fold it. And, by deploying another unique characteristic of human beings, our competitiveness, the game encourages users to try to aim for the protein structure with the lowest free energy. The current aim is to see if the gift of human logic and competition is enough to solve complicated protein folding problems which currently require massive brute force calculations by supercomputers/distributed systems, and, if so, whether human 3D intuition can be “taught” to computers.
The novelty of this approach is striking. Interestingly, if Fold.It is successful, it will have done three very impressive (and very difficult) things:
- Successfully used crowdsourcing by pooling the wisdom of volunteers to solve problems which traditional brute-force computation finds nearly intractable
- Successfully used machine learning to copy the pooled wisdom of the volunteers to create smarter machines capable of solving the important protein folding questions which may underlie disease processes like cancer and Alzheimer’s
- Developed a new avenue with which to mobilize the public – by giving the public a tangible way to actively connect with and help an important scientific endeavor in a fun and easy-to-understand way
Check it out!