Bench Press

The Crossroads of Science and Tech

Distribute compute


As the problems scientists solve become more and more complex, so do their demands for computational power. One approach to meeting those demands has been to build faster, more powerful computers, potentially with chips better suited to advanced calculations (like graphics cards or IBM’s Cell processor). But this approach has serious limitations, mainly that these supercomputers are expensive to build and to maintain.

Some researchers, however, have turned to a radically different approach. Instead of building a bigger, better mousetrap to deal with more mice, distributed computing sets out many small, cheap mousetraps. The result is a cheap “supercomputer” that pools the computing power of many computers connected over a network.

This approach has been used by projects like Folding@Home and SETI@Home, which combine computing power donated by volunteers over the internet to do the number-crunching needed to simulate protein folding or to scan deep space for signs of extraterrestrial life. SETI@Home was one of the first large-scale distributed computing projects. Its platform, now the Berkeley Open Infrastructure for Network Computing (BOINC), is used today by many other distributed computing projects, such as searches for gravitational waves, climate modeling, and simulations of particle collisions in the Large Hadron Collider.
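The “many small mousetraps” idea can be sketched in a few lines: split one large job into independent work units, let each unit be processed separately, then pool the partial results. In the toy sketch below, threads stand in for volunteers’ machines and a Monte Carlo estimate of π stands in for the real science; the function names and the π workload are hypothetical illustrations, not how BOINC or Folding@Home are actually implemented (real platforms add networking, scheduling, and result validation on top).

```python
import random
from concurrent.futures import ThreadPoolExecutor


def make_work_units(total_samples: int, unit_size: int) -> list[tuple[int, int]]:
    """Split one big job into independent (unit_id, n_samples) work units."""
    units = []
    unit_id = 0
    remaining = total_samples
    while remaining > 0:
        n = min(unit_size, remaining)
        units.append((unit_id, n))
        remaining -= n
        unit_id += 1
    return units


def run_work_unit(unit: tuple[int, int]) -> tuple[int, int, int]:
    """What one volunteer machine would compute: how many random points
    in the unit square fall inside the quarter circle of radius 1."""
    unit_id, n = unit
    rng = random.Random(unit_id)  # per-unit seed keeps each unit reproducible
    hits = sum(1 for _ in range(n) if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return unit_id, n, hits


def pool_results(results: list[tuple[int, int, int]]) -> float:
    """Server side: combine the partial counts into a single estimate of pi."""
    total_n = sum(n for _, n, _ in results)
    total_hits = sum(hits for _, _, hits in results)
    return 4.0 * total_hits / total_n


units = make_work_units(total_samples=400_000, unit_size=50_000)
# Threads only simulate independent machines here; they do not give real
# parallel speedup for CPU-bound Python code.
with ThreadPoolExecutor(max_workers=4) as volunteers:
    results = list(volunteers.map(run_work_unit, units))
estimate = pool_results(results)
print(f"{len(results)} work units -> pi ~ {estimate:.4f}")
```

The key design property is that work units share no state, so any unit can be handed to any machine, in any order, and the server only needs to add up the returned counts.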


Folding@Home, a project started by the Pande group at Stanford to study protein folding with distributed computing, uses a similar approach, albeit with different underlying software (is it any wonder that a Stanford group doesn’t use Berkeley’s distributed computing platform?! :-D). It has probably been the most successful distributed computing project to date and, as a testament to the power of distributed computing, is known as the first computing system to break the petaFLOPS barrier, i.e. to perform one quadrillion (10^15) floating-point operations per second! This has enabled the team to run protein-folding simulations covering timescales of ~10 microseconds.
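To make the petaFLOPS arithmetic concrete: 1 petaFLOPS is 10^15 floating-point operations per second, and a pooled system’s throughput is, to a first approximation, the sum of what each participating device delivers. The sketch below uses an entirely hypothetical mix of devices and per-device rates (not Folding@Home’s real figures) to show how many modest machines can add up to that mark.

```python
PETAFLOPS = 10**15  # one quadrillion floating-point operations per second
GFLOPS = 10**9


def aggregate_flops(device_counts: dict[str, int], flops_per_device: dict[str, int]) -> int:
    """Total throughput of a pooled system: sum of every device's throughput."""
    return sum(count * flops_per_device[kind] for kind, count in device_counts.items())


# Hypothetical device counts and per-device rates, chosen only to keep the arithmetic easy.
counts = {"cpu": 200_000, "gpu": 8_000}
rates = {"cpu": 1 * GFLOPS, "gpu": 100 * GFLOPS}

total = aggregate_flops(counts, rates)  # 2e14 from CPUs + 8e14 from GPUs
print(total / PETAFLOPS, "petaFLOPS")
```

In practice, sustained throughput also depends on how many clients are online at once and how efficiently the science code runs on each device, so a sum like this is an optimistic, back-of-the-envelope estimate.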

But, as impressive as the science achieved by distributed computing projects is, what impresses me the most is that projects like Folding@Home and SETI@Home have defined some brilliant new ways to do science:

  • Use the internet – It’s a common theme on Bench Press, but as more people gain faster access to the internet, the potential for distributed computing keeps growing. As Folding@Home demonstrated, such approaches can produce computing systems as powerful as (or potentially more powerful than) leading supercomputers at a fraction of the cost.
  • Mobilize the public – We’ve discussed ways for the scientific community to reach out to the public, like using social media and creating interactive applications/tools for the public to use, but efforts like Folding@Home illustrate a way not only to reach out to the public but to get them invested in science. In a world where high school science teachers find it difficult to get teens interested in science, initiatives like Folding@Home have created a system where teams of individuals compete to see who can contribute the most to the effort! Instead of simply hoping that the public will continue to fund science and listen, why not borrow a page from the many existing cancer walk-a-thons and make it easy for the public to get involved?
  • Leverage new technology – It may not come as a surprise to our readers that a significant amount of Folding@Home’s computational power comes from graphics cards and PlayStation 3s. But while many “mainstream” supercomputers ignored the power afforded by these new chip types, Folding@Home developed software so that volunteers could quickly and easily use these powerful chips to boost their Folding@Home scores. The Folding@Home initiative also developed software to take advantage of innovations AMD and Intel built into their chips (new multi-core architectures and special instructions that speed up calculations). Is it any wonder, then, that Sony, NVIDIA, and AMD have all publicly announced support for the initiative with their products?
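The team competitions mentioned above rest on a simple bookkeeping step: credit each finished work unit to the volunteer who returned it, then roll volunteers up into teams and rank them. Below is a minimal sketch of that idea; the credit values, user names, and team names are hypothetical, and real projects such as Folding@Home use their own scoring rules.

```python
from collections import defaultdict

# Hypothetical credit per completed work unit, by the kind of hardware that ran it.
CREDIT_PER_UNIT = {"cpu": 10, "gpu": 40, "ps3": 25}


def team_leaderboard(completed: list[tuple[str, str]], user_team: dict[str, str]) -> list[tuple[str, int]]:
    """Sum the credit earned by each team's members and rank teams, highest first.

    completed: one (user, unit_type) record per finished work unit.
    user_team: mapping from user name to team name.
    """
    totals: dict[str, int] = defaultdict(int)
    for user, unit_type in completed:
        totals[user_team[user]] += CREDIT_PER_UNIT[unit_type]
    # Sort by credit (descending), breaking ties by team name so the order is stable.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


user_team = {"alice": "lab-a", "bob": "lab-a", "carol": "lab-b"}
completed = [("alice", "cpu"), ("bob", "gpu"), ("carol", "ps3"), ("carol", "ps3"), ("carol", "ps3")]
print(team_leaderboard(completed, user_team))  # [('lab-b', 75), ('lab-a', 50)]
```

The same per-user totals, before they are rolled up into teams, are what a volunteer would want to see on a personal contribution page.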


I don’t pretend that every scientific problem is amenable to a distributed computing initiative, but I believe that, to some extent, every scientific endeavor has something valuable to learn from the success of Folding@Home, SETI@Home, and their brethren. To that end, I sincerely hope to see an open-source distributed computing architecture like BOINC, but with:

  • Support for new chip technologies – To provide greater value to the scientific effort, the architecture should support new chip technologies like Intel’s SSE instructions, SMP, or stream processing.
  • Client contribution tracking – To make it easier for volunteers to see how much they’ve contributed, and to hold contests based on those contributions, the architecture needs a simple system that lets users and administrators track the effort.
  • Better security – Medical initiatives and volunteer privacy concerns demand very fine-grained, specialized security controls. Support for sophisticated encryption and authentication is a must.
  • Linkage to social media – This probably seems extraneous, but since distributed computing efforts depend on motivated volunteers actively recruiting new volunteers, a successful architecture needs to make it easy for volunteers to share their progress with friends, whether via a blog, a social network, Twitter, or anything else.
  • Tie-in with new cloud computing systems – Along the theme of cutting costs, it is reasonable to assume that as offerings like Google’s App Engine and Amazon’s EC2, and technologies like MapReduce, mature, cash-strapped research groups will use the power of “clouds” to host their computing power. After all, what is distributed/grid computing other than a specific variant of cloud computing (de-localized, pooled computing)? It’s probably necessary, then, for the new distributed computing architecture to link easily with EC2, MapReduce, or App Engine.
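As one concrete piece of the “better security” wish, a server can check that a returned result really came from a registered client and was not altered in transit if each client attaches an HMAC computed with a per-client secret key. The sketch below uses Python’s standard `hmac` and `hashlib` modules; the key, the result fields, and the helper names are hypothetical. It covers authentication and integrity only, not encryption, and it cannot tell whether a key holder computed the result honestly; volunteer platforms typically handle that separately, for example by sending the same work unit to more than one client and comparing the answers.

```python
import hashlib
import hmac
import json


def sign_result(key: bytes, result: dict) -> str:
    """Client side: compute an HMAC-SHA256 tag over a canonical encoding of the result."""
    # Sorted keys and fixed separators so client and server hash identical bytes.
    payload = json.dumps(result, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_result(key: bytes, result: dict, signature: str) -> bool:
    """Server side: recompute the tag and compare it with the one the client sent."""
    expected = sign_result(key, result)
    # compare_digest avoids leaking how many characters matched via timing.
    return hmac.compare_digest(expected, signature)


key = b"per-client-secret"  # hypothetical key issued to one volunteer machine
result = {"unit_id": 42, "hits": 39270, "samples": 50000}  # hypothetical result fields
sig = sign_result(key, result)
print(verify_result(key, result, sig))  # True

tampered = dict(result, hits=50000)
print(verify_result(key, tampered, sig))  # False
```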

Anyone else have any thoughts?

(Image Credit – picture of the internet) (Image Credit – Folding@Home computing power)

  • Brandon Sos
    Don't remember how I stumbled on this blog, but that's pretty interesting. I recently started learning visualization techniques for gene expression data/gene networks/pathways etc., and was thinking, oh sweet! Giant gene network! Though I realized after reading the article that these examples need a lot more computational power. I remember using a distributed computing program from SETI in '03... or '04. It was a screen saver that used CPU power whenever the screen saver came on. I uninstalled it, though; it seemed to slow down my computer too much after running for a bit. Other than that, I've seen a distributed computing program for brute-force password recovery, haha.

    If you're curious, here are a couple of good programs for network visualization:
    http://cytoscape.org/
    http://www.ingenuity.com/
  • I would like to comment on the BOINC portion. To keep the average person from feeling daunted and make the process as easy as checking your email, I would recommend GridRepublic (www.gridrepublic.org). GridRepublic is a nonprofit that works in collaboration with BOINC and is known as an account manager. It makes using BOINC very simple: a single login controls all the projects you contribute to, it helps you find more projects, and it lets you manage any number of computers you own from a single website.

    Also, as of December 17, 2008, BOINC has added GPU computing for CUDA-enabled graphics cards.
  • Ben
    Jonathan, thank you for commenting. That is very interesting and a good resource to share. And I was not aware that BOINC had just begun supporting CUDA. Will they be extending this to cover ATI's CTM/Brook/OpenCL (whatever standard it is they're using these days?) soon?
  • As far as I know, BOINC is discussing how to accommodate support for other processor types in the future, and my guess is that they would add support for AMD (ATI) GPUs and standards at the request of scientists using (or wanting to use) BOINC and/or as more programmers contribute.

    They are always looking for more people to help code in new features, enhancements, and testing/debugging. See here for development projects: http://boinc.berkeley.edu/trac/wiki/DevProjects
  • Yeah I definitely agree that there are a lot of wasted CPU cycles out there. I think Folding@Home does some cool stuff to utilize this power, but there should be ways to leverage this effect even more.

    One interesting, albeit kind of ridiculous, startup idea I read about a couple of months ago was a company that leverages Flash to do distributed computing while users play its games or watch its videos. Stuff like this is pretty exciting and interesting; someone will definitely build a killer business around a smart idea like this in the next couple of years.
  • If only we could harness the power of people utilizing social networking sites like Facebook/Myspace.
  • Ben
    That's a pretty awesome idea actually...
  • Lena
    There's the other very important factor:
    If you're going to be optimizing your calculations for PS3s, you need to have a bunch of them set up in the lab. Which means your grad students can use them for videogame tournaments. Good research comes from happy grad students. Happy grad students come from the Pande group...
  • Ben
    Do they really have PS3s in-house at the Pande group!?
  • Lena
    Yup. They're primarily running / displaying actual simulations. But there at least used to be a tradition of Friday evening gaming on them as well :-)