Bench Press

The Crossroads of Science and Tech

Archive for the ‘crowdsourcing’ tag

Phylo


A few years ago, I blogged about an ingenious crowdsourced game called Fold.It. The concept was pretty simple:

  • Use human intuition to help solve complicated three-dimensional protein folding challenges, an approach that is often as effective as, but significantly faster and cheaper than, computational algorithms
  • Pool together lots of human volunteers
  • Turn the whole experience into a game to get more volunteers to spend more time

The result was a nifty little game whose findings have, to date, made it into a number of peer-reviewed publications (see the PNAS paper here and the Nature Structural & Molecular Biology paper here)!

Well, some researchers at McGill University in Canada want to take a page out of this playbook with a game they built called Phylo (HT: MedGadget) to help tackle another challenging problem in bioinformatics: multiple sequence alignment. In a nutshell, to better understand DNA and how it impacts life, we need to see how stretches of DNA line up with one another. Now, computers are extremely good at handling this problem for short stretches of DNA and at “roughly” aligning longer stretches, but it’s fairly difficult and costly to do it accurately for long stretches using computer algorithms.

People, however, are curiously intuitive about patterns and shapes. So the researchers turned the multiple sequence alignment problem into a puzzle game, Phylo, where the goal is to line up multiple colored blocks. Players tackle the individual puzzles (in a browser or even on their mobile phone), and the researchers aggregate all of this into improved sequence alignments which help them better understand the underlying genetics of disease.
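Phylo’s puzzles are, at heart, a human search over an alignment score: sliding blocks inserts gaps, and a better-lined-up grid earns more points. Below is a minimal sketch of one common objective, sum-of-pairs scoring. The weights `MATCH`, `MISMATCH`, and `GAP` are illustrative assumptions, not Phylo’s actual scoring scheme. Exact alignment of N sequences by dynamic programming needs a table whose size grows exponentially with N, which is one reason accurate multiple alignment of long stretches is so costly for computers.

```python
from itertools import combinations

# Illustrative weights only (assumption); Phylo's real scheme may differ.
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a: str, b: str) -> int:
    """Score two aligned rows column by column; '-' marks a gap."""
    score = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue                  # two gaps in one column cost nothing
        elif x == "-" or y == "-":
            score += GAP
        elif x == y:
            score += MATCH
        else:
            score += MISMATCH
    return score


def sum_of_pairs(alignment: list[str]) -> int:
    """Sum the pairwise scores over every pair of rows in the alignment."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows must have the same length")
    return sum(pair_score(a, b) for a, b in combinations(alignment, 2))


naive = ["ACGT", "AGT-", "ACT-"]     # sequences just left-justified
gapped = ["ACGT", "A-GT", "AC-T"]    # gaps placed so shared letters line up
print(sum_of_pairs(naive), sum_of_pairs(gapped))  # -3 0
```

Placing the gaps well lifts the score from -3 to 0, the kind of improvement a player is rewarded for spotting.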

And how has it been doing? According to the McGill University press release:

So far, it has been working very well. Since the game was launched in November 2010, the researchers have received more than 350,000 solutions to alignment sequence problems. “Phylo has contributed to improving our understanding of the regulation of 521 genes involved in a variety of diseases. It also confirms that difficult computational problems can be embedded in a casual game that can easily be played by people without any scientific training,” Waldispuhl said. “What we’re doing here is different from classical citizen science approaches. We aren’t substituting humans for computers or asking them to compete with the machines. They are working together. It’s a synergy of humans and machines that helps to solve one of the most fundamental biological problems.”

With the new games and platforms, the researchers are hoping to encourage even more gamers to join the fun and contribute to a better understanding of genetically-based diseases at the same time.

Try it out – I have to admit I’m not especially good with puzzle games, so I haven’t been doing particularly well, but the researchers have done a pretty good job with the design of the game (esp. relative to many other academic-inspired gaming programs that I’ve seen) – and who knows, you might be a key contributor to the next big drug treatment!

Written by ben

December 24th, 2011 at 3:46 pm

What to Do as Science Gets Older and More Crowded


A recent NBER paper (gated) by Benjamin Jones from Northwestern conducts a systematic review of trends in scientific research and reaches a couple of conclusions that won’t come as a surprise to anyone in science (HT: Inside Higher Ed):

As science advances and knowledge accumulates, ensuing generations of innovators spend longer in training and become more narrowly expert, shifting key innovations (i) later in the life cycle and (ii) from solo researchers toward teams

As evidence of this, the average age at which a scientist made a discovery that later earned a Nobel prize increased by 6 years over the course of the 20th century. Looking at publications, the paper found that the average author list grew by 15-20% per decade!
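To put “15-20% per decade” in perspective: assuming, purely for illustration, that the rate held steady, it compounds like interest, so the average author list roughly doubles every 50 years even at the low end. A quick check (the helper name is ours, not from the paper):

```python
def compound_growth(rate_per_decade: float, decades: int) -> float:
    """Multiplier on author-list size after `decades` of constant per-decade growth."""
    return (1 + rate_per_decade) ** decades


for rate in (0.15, 0.20):
    for decades in (5, 10):
        factor = compound_growth(rate, decades)
        print(f"{rate:.0%}/decade over {decades * 10} years: x{factor:.2f}")
```

This prints multipliers of about 2.01 and 4.05 at 15%, and 2.49 and 6.19 at 20%, over 50 and 100 years respectively.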

We’ve discussed the “decline of the Lone Ranger model of science” before, but Jones’ paper focuses on the policy implications of such a change. He concludes that the government (and, probably, the academic and private institutions which support researchers) needs to adapt policy to reflect this new reality by:

  • Tailoring funding and messaging to help keep young researchers interested despite the longer and more difficult training period
  • Finding new ways to evaluate the worthiness of proposals as scientists’ expertise becomes more and more specialized
  • Altering incentive structures as the team of collaborators replaces the Lone Ranger scientist model of discovery

These policy suggestions are definitely good ones, and are certainly necessary to adapt to a new scientific environment, but one dimension Jones doesn’t discuss as much is how technological innovations (the focus of this blog!) can help further research in this brave new world.

  • Improving science communication with the public. We’ve mentioned this multiple times in the past, but it is no less true here. Active public communications management not only helps secure funding and raise public awareness of the good scientists can do, but also helps attract the interest of future generations of researchers and policymakers.
  • Embracing new collaboration tools. To really kick-start collaboration between scientists across geographies and specialties, we need tools that go beyond just email and fax machines. Tools like Google Wave, wikis, distributed version control, and social media forums like FriendFeed are an early taste of the sort of live collaboration that new web technology can bring about.
  • Leveraging cloud computing and heterogeneous compute. One of the reasons discoveries are taking longer and becoming more expensive is that there is so much more data to collect and analyze than before. One technological innovation which we’ve talked about at length here is the ability of graphics cards/GPUs to make supercomputer-level processing power more readily accessible to research labs. Another is the use of new cloud computing services like Amazon’s to rapidly increase the computational resources that a lab/company has access to. Neither is a panacea for all the data analysis issues which scientists face, but both are definitely ways to make things easier for research groups with tight IT budgets.
  • Using crowdsourcing to speed innovation. Who says research has to take longer and be more expensive? Perhaps it’s time to pull on new technological levers which let scientists draw on the resources and brains of a wider group of people. While new platforms like ChemBioConnect, distributed computing systems like Folding@Home, and volunteer crowdsourcing initiatives like Fold.It are far from perfect, they hint at a future where researchers can call on resources beyond what their own computers and brains are capable of.
  • Building new research attribution models. When I say new attribution models, I’m referring to two things. The first is embodied by new standards like ORCID which make it easier to identify which person is the author/researcher in question (something which will become more and more important as more people with the same initials/names enter the sciences). The second, and more substantive, is finding new ways to understand who contributed what to a particular study. In today’s digital age, I find it laughable that we still rely on simple author list order to determine the relative roles and positions of the researchers listed on a publication. Employing metadata and other graphical cues can help scientists get the recognition they deserve, as well as give teams of researchers appropriate incentives to contribute.
  • Contributing negative and post-publication results to open repositories. While I can understand why most research groups hesitate to pursue a pure open access strategy, those concerns should not apply to negative or post-publication experimental data. Opening up access to data from failed/negative experiments does little to hurt a lab’s ability to publish first, but it can be a dramatic boon for other research groups (especially new labs or labs with interdisciplinary focuses), which can not only use the data for their own analyses and experimental designs but also avoid committing resources to experiments which have already been conducted. If it can work for biotechs and pharma companies, then there’s no reason it should be any different for non-commercial groups.
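The attribution idea in the list above (unambiguous identifiers plus metadata about who did what) can start very small. A minimal sketch, assuming a hypothetical `Paper`/`Contribution` record with free-text role labels: the only real standard used here is the ORCID iD check character, which ORCID computes with the ISO 7064 MOD 11-2 algorithm. Everything else (class names, role strings) is illustrative.

```python
from dataclasses import dataclass, field


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character, as used for the last digit of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check for 16 characters (hyphens ignored) and a correct trailing check character."""
    digits = orcid.replace("-", "")
    if len(digits) != 16 or not digits[:15].isdigit():
        return False
    return orcid_check_digit(digits[:15]) == digits[15].upper()


@dataclass
class Contribution:
    orcid: str
    name: str
    roles: list[str] = field(default_factory=list)  # hypothetical free-text role labels


@dataclass
class Paper:
    title: str
    contributions: list[Contribution] = field(default_factory=list)

    def add(self, contribution: Contribution) -> None:
        """Reject contributors whose identifier fails the checksum."""
        if not is_valid_orcid(contribution.orcid):
            raise ValueError(f"bad ORCID iD: {contribution.orcid}")
        self.contributions.append(contribution)

    def who_did(self, role: str) -> list[str]:
        """List everyone credited with a given role, independent of author order."""
        return [c.name for c in self.contributions if role in c.roles]


paper = Paper("Example study")
paper.add(Contribution("0000-0002-1825-0097", "A. Researcher", ["data analysis", "writing"]))
print(paper.who_did("writing"))  # ['A. Researcher']
```

The point of the design is that credit lives in queryable roles rather than in the position of a name on the author list.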

These suggestions only scratch the surface of what new technologies and policies can do to help scientists in a world where scientific training takes longer and where scientific discoveries need to be more collaborative. If anyone else has any other suggestions, feel free to leave them in the comments!

A Grand Experiment


Here at Bench Press we’re always interested in new initiatives that harness the advantages of the internet. We’ve covered various powerful distributed computing initiatives as well as breakthrough collaborative endeavors in scientific research. So I was intrigued when I saw buzz on Twitter about the Obama administration’s attempt to crowdsource suggestions for scientific policy.

Through the American Association for the Advancement of Science (AAAS) and associated non-profit Expert Labs, the Obama administration wants to hear what grand challenges scientists envision taking on.

Expert Labs has a nice video explaining the reasoning behind this grand experiment in policy crowdsourcing.

After a quick search on Twitter, I’m a bit curious as to how Expert Labs plans to parse all the data they’re going to get from this call to arms, but I’m optimistic that some interesting insights can be gleaned about where Americans think science should be headed. More data never hurt, right? If you’re interested in submitting an idea, follow the directions here; you’ve got until April 15th!

Written by Anthony

April 14th, 2010 at 3:15 am

ChemBioDrawCrowdsource


One challenge with getting scientists to collaborate over the internet is the difficulty of representing scientific data in a way that can be readily manipulated and analyzed. Take chemistry, for example. How does one share information about pathways and chemical structures in a way which allows an entire group to collaborate on particular problems (e.g. synthesis pathways)?

Imaginatik, a startup specializing in software to help companies and institutions use crowdsourcing, and partially funded by Pfizer, has one such idea (HT: VentureBeat). Most scientists who are working or have worked in chemistry or biology are familiar with software company CambridgeSoft’s scientific software products like ChemDraw or BioDraw. What Imaginatik did was combine CambridgeSoft’s software with the collaborative features of Imaginatik’s Idea Central software to create ChemBioConnect, a crowdsourcing platform for a company or institution to deploy.

The idea is pretty simple. Imaginatik’s Idea Central platform creates a web portal where scientists and management can list topics that can benefit from a multi-person collaborative approach, organize responses, suggestions, and workflow, and rate individual ideas and contributions. But what differentiates ChemBioConnect from other life sciences collaboration solutions or more generic crowdsourcing platforms is its integration with the ChemBioDraw interface, which provides more features than a standard collaboration platform (which will only let you share pictures/text) and a more familiar and robust user interface than other life sciences-targeted solutions. Interestingly, Imaginatik’s platform also allows the creation of personality profiles (e.g. “creative” or “inquisitive”) to better help scientists network and target the right set of people to solve these problems. Not surprisingly, Imaginatik’s funder Pfizer has been rolling out this solution since Spring 2009!
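To make the workflow described above concrete (topics, ideas posted against them, ratings, and profile-based matching of people), here is a minimal sketch of how such an idea board might be modeled. This is a hypothetical illustration, not Idea Central’s or ChemBioConnect’s actual data model or API; all class and function names are invented, and the 1-5 rating scale is an assumption.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Idea:
    author: str
    text: str
    ratings: list[int] = field(default_factory=list)  # assumed 1-5 star ratings

    def score(self) -> float:
        """Average rating; unrated ideas score 0."""
        return mean(self.ratings) if self.ratings else 0.0


@dataclass
class Topic:
    question: str
    ideas: list[Idea] = field(default_factory=list)

    def ranked(self) -> list[Idea]:
        """Ideas ordered from best-rated to worst-rated."""
        return sorted(self.ideas, key=lambda idea: idea.score(), reverse=True)


def suggest_people(profiles: dict[str, set[str]], wanted: set[str]) -> list[str]:
    """Rank people by how many wanted profile traits (e.g. 'creative') they share."""
    overlap = {name: len(traits & wanted) for name, traits in profiles.items()}
    ordered = sorted(overlap, key=lambda name: (-overlap[name], name))
    return [name for name in ordered if overlap[name] > 0]


topic = Topic("Shorter synthesis route to compound X?")
topic.ideas += [Idea("bob", "Protect the amine first", [3]),
                Idea("alice", "Reorder the last two steps", [5, 4])]
print([idea.author for idea in topic.ranked()])  # ['alice', 'bob']
```

Ranking by mean rating is the simplest possible choice; a real platform would likely weight by number of ratings or rater expertise.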

There is also a poorly scripted demo video (for my taste, the speaker spends too much time on basic ChemDraw functionality and too little on how it ties together with the collaborative features).

I unfortunately haven’t had the chance to actually try out the software (while $50,000 to $500,000 is reasonable pricing for enterprise software, I don’t have that kind of money to shell out for an evaluation), but I think this is a great look at what a prototype for scientific collaborative software should have:

  • Web-based: The need for ease of access across many machines and locations and the need for a central repository with which to organize a group’s information generally means that collaborative platforms should be web-based or, if not, sufficiently web-like as to not be an issue.
  • Social networking features: It doesn’t have to be a full-fledged version of Facebook or MySpace, but a collaborative tool should encourage its users to network with one another and allow people to show off what projects they’ve contributed to. Not doing this fails to create the sense of community and personal attachment that crowdsourcing/community collaboration needs.
  • Integration with existing tools: It’s a sad fact of life that inertia is a big factor when people are deciding whether or not to use something, but it’s a fact nevertheless. The best way to encourage quality adoption is to make sure that tools commonly used by the target user base tie in nicely, for two reasons. First, new users won’t have to learn a new set of techniques, interfaces, and processes to adopt the platform. And second, the tools that currently exist often support features that are harder to develop and more useful than developers of new platforms would like to admit. Sure, lots of people (including this humble commentator) have bashed ChemDraw as clunky and awkward, but someone developing a chemistry crowdsourcing platform from scratch is likely to skimp on things like NMR simulation or smooth rotation of a structure.
  • Managed workflow: Collaboration, even face-to-face, can be very difficult when information, suggestions, and ideas are not organized effectively. It’s not enough to let people share their information and insights; you have to organize them and create tools with which to evaluate and encourage action on them.

As I haven’t actually put my hands on the software, I’m not sure if ChemBioConnect already supports these, but there are two additional features that I’d strongly suggest every collaboration platform have:

  • Easy way to export work: Too often, developers of a platform or tool forget that there is a world beyond their innovations. This is especially true when people are testing out a piece of software for the first time; it’s important that they can quickly move a piece of work out of the tool to integrate it with the rest of their workflow, whether in printed form, in a presentation, as a PDF, as a web page/HTML, or even just in an industry-standard file format to share with someone else. Going the extra mile here makes it easier for someone to try out your software and provides a valuable service that just may win an adopter over.
  • Semantics: This is harder to describe, but many web-based tools are very rigid, requiring users to identify exactly what they want to do and figure out which part of the website is best suited for that particular type of work. Better, instead, to apply semantics/language processing to figure this out for the user. One product that has done this is Google Calendar. Instead of requiring users to figure out which fields correspond to which data when creating a calendar entry, it lets a user simply enter “Lunch with Jenny at Chez Carla on Sept 9, 2009 from 9 PM to 11 PM”, and Google will decode the string and fill in the appropriate fields. This feature is especially powerful for a collaborative tool, where a user doesn’t want to have to figure out whether something is a “task”, an “event”, or an “idea”, and doesn’t want to memorize the tool’s special quirks and vocabulary.
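The Google Calendar example above boils down to turning a free-text sentence into structured fields. Below is a minimal sketch of that idea for one fixed sentence shape, using a regular expression. It is a toy: the grammar (“title [at location] on Month day, year from time to time”) and all names are assumptions for illustration, and Google’s actual parser is far more flexible and is not shown here.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical grammar: "<title> [at <location>] on <Month> <day>, <year> from <time> to <time>"
EVENT_RE = re.compile(
    r"^(?P<title>.+?)"
    r"(?: at (?P<location>.+?))?"
    r" on (?P<month>[A-Za-z]+)\.? (?P<day>\d{1,2}), (?P<year>\d{4})"
    r" from (?P<start>\d{1,2}(?::\d{2})? ?[AaPp][Mm])"
    r" to (?P<end>\d{1,2}(?::\d{2})? ?[AaPp][Mm])$"
)
TIME_RE = re.compile(r"(\d{1,2})(?::(\d{2}))? ?([AaPp])[Mm]")
MONTHS = {name: i for i, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


@dataclass
class Event:
    title: str
    location: Optional[str]
    start: datetime
    end: datetime


def _clock(text: str) -> tuple[int, int]:
    """Convert '9 PM' or '10:30 am' to a 24-hour (hour, minute) pair."""
    m = TIME_RE.fullmatch(text)
    hour, minute = int(m.group(1)), int(m.group(2) or 0)
    hour = hour % 12 + (12 if m.group(3).lower() == "p" else 0)
    return hour, minute


def parse_event(sentence: str) -> Event:
    """Fill in calendar fields from one sentence, or raise ValueError."""
    m = EVENT_RE.match(sentence.strip())
    if m is None:
        raise ValueError(f"could not parse: {sentence!r}")
    key = m.group("month")[:3].lower()           # "Sept" -> "sep"
    if key not in MONTHS:
        raise ValueError(f"unknown month: {m.group('month')!r}")
    day, year = int(m.group("day")), int(m.group("year"))
    sh, sm = _clock(m.group("start"))
    eh, em = _clock(m.group("end"))
    return Event(m.group("title"), m.group("location"),
                 datetime(year, MONTHS[key], day, sh, sm),
                 datetime(year, MONTHS[key], day, eh, em))


print(parse_event("Lunch with Jenny at Chez Carla on Sept 9, 2009 from 9 PM to 11 PM"))
```

A real tool would layer many such patterns (or a proper language model) and fall back to asking the user only when nothing matches.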

Does anyone else have any thoughts on ChemBioConnect or on other principles of good collaborative tool design?

Written by ben

August 10th, 2009 at 7:00 am

Playing the crowd


We’ve written before about the ability of scientists to use distributed computing to pool the computing power of millions of users over the internet to solve sophisticated mathematical problems. But imagine if we could pool the brainpower of volunteers instead, in a way which doesn’t involve jacking our brains into the Matrix.

Now, imagine if it could be fun for the volunteers.

Imagine no longer. Fold.It was created less than a year ago at the University of Washington to do just that. Instead of pooling the computational power of millions of machines, it seeks to pool the “human intuition” of volunteers to solve challenging protein folding problems.

The basic scientific concept behind Fold.It is that nature will “push” chains of amino acids to adopt a folded structure which minimizes free energy. But, while free energy calculations can be done relatively easily, finding the structure that minimizes free energy is not so easy and requires immense computational power (which is why Folding@Home uses distributed computing).
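To make the “easy to score, hard to search” point concrete, here is a toy sketch using the 2D hydrophobic-polar (HP) lattice model, a standard teaching simplification that is far cruder than the energy functions real folding software uses. Residues are H (hydrophobic) or P (polar), a fold is a self-avoiding walk on a square grid, and each contact between two H residues that are not chain neighbors lowers the energy by 1. Scoring one fold is a quick loop; finding the best fold by brute force means trying all 4^(n-1) move strings for n residues. The function names are illustrative.

```python
from itertools import product

# Toy 2D HP lattice model: H = hydrophobic, P = polar.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def coordinates(moves):
    """Place residues on a square lattice; return None if the walk hits itself."""
    x, y = 0, 0
    coords, seen = [(0, 0)], {(0, 0)}
    for m in moves:
        dx, dy = MOVES[m]
        x, y = x + dx, y + dy
        if (x, y) in seen:
            return None
        seen.add((x, y))
        coords.append((x, y))
    return coords


def energy(sequence, coords):
    """Cheap part: -1 for each touching H-H pair that is not adjacent along the chain."""
    index = {c: i for i, c in enumerate(coords)}
    e = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES.values():
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1 and sequence[j] == "H":
                e -= 1  # each pair counted once, since only j > i is considered
    return e


def brute_force_fold(sequence):
    """Expensive part: try every move string, keep the lowest-energy valid fold."""
    best_e, best_moves, n_valid = 0, None, 0
    for moves in product("UDLR", repeat=len(sequence) - 1):
        coords = coordinates(moves)
        if coords is None:
            continue
        n_valid += 1
        e = energy(sequence, coords)
        if best_moves is None or e < best_e:
            best_e, best_moves = e, "".join(moves)
    return best_e, best_moves, n_valid


best_energy, best_moves, n_walks = brute_force_fold("HPPH")
print(best_energy, n_walks)  # -1 36
```

Even this tiny 4-residue chain has 36 valid folds out of 64 move strings, and the count explodes with length; that explosion is exactly what Fold.It hopes human intuition (for instance, “put the hydrophobic residues together”) can shortcut.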

But humans have a gift which computers do not: intuition. While we may not be able to compute free energies in our heads, we can make logical jumps and do complex reasoning. While we might not know how to calculate the strength of a hydrophobic interaction, we know enough to place two hydrophobic (non-polar) leucine amino acids near one another. And while we may not be able to write a mathematical equation describing the arc of a polypeptide chain, we can conceptualize and visualize whether a chain should be more “scrunched up” or “stretched out”.

And that type of “soft reasoning” is the processing power Fold.It seeks to capture. Fold.It created a game which literally depicts a “raw” protein chain in all its unfolded glory and asks human players to fold it. And, by deploying another unique characteristic of human beings, our competitiveness, the game encourages users to aim for the protein structure with the lowest free energy. The current aim is to see whether the gift of human logic and competition is enough to solve complicated protein folding problems which currently require massive brute-force calculations by supercomputers/distributed systems and, if so, whether human 3D intuition can be “taught” to computers.


The novelty of this approach is striking. Interestingly, if Fold.It is successful, it will have done three very impressive (and very difficult) things:

  • Successfully used crowdsourcing by pooling the wisdom of volunteers to solve problems which traditional brute-force computation finds nearly intractable
  • Successfully used machine learning to copy the pooled wisdom of the volunteers to create smarter machines capable of solving the important protein folding questions which may underlie disease processes like cancer and Alzheimer’s
  • Developed a new avenue with which to mobilize the public – by giving the public a tangible way to actively connect with and help an important scientific endeavor in a fun and easy-to-understand way

Check it out!

Written by ben

February 10th, 2009 at 5:00 am