Frank Würthwein, physicist

When an upgraded proton smasher gushes data, this physicist will help colleagues find new particles in the torrent. Interview by Chris Cesare

April 07, 2015

Physicist Frank Würthwein stands in front of a San Diego supercomputer, part of the shared network that makes up the Open Science Grid. Credit: Ben Tolo/SDSC
Violent collisions at the world’s largest particle accelerator will soon produce as much as 10 petabytes of data every second—the equivalent of about 10 million hours of high-definition video. Scientists keep only a fraction of the deluge, crunching what survives on powerful supercomputers in search of hints of new particles.

Frank Würthwein, a physicist at the University of California, San Diego, is among the thousands of particle hunters who scrutinize the output of the Large Hadron Collider (LHC) in Geneva, Switzerland. He is also executive director of the Open Science Grid, a consortium of universities and national labs that share their computers—a paradigm known as distributed computing. Scientists from the LHC collaborations are the grid's biggest customers, but biologists who study protein structure and economists who forecast markets also bring mountains of data.

The LHC has been offline since 2013 to receive an upgrade. Soon, it will hurl protons together harder and more frequently. Würthwein will scour the new data for signs of dark matter, the mysterious and invisible gravitational glue that binds together clusters of galaxies. He’ll also continue to build the networks of trust that let scientists share clusters of computers.

Würthwein spoke about the challenges and opportunities for distributed computing at the February 2015 meeting of the American Association for the Advancement of Science in San Jose. He sat down with SciCom’s Chris Cesare after his talk to discuss what’s in store for the LHC restart.

Does the massive amount of data or the complexity of the LHC give you a sense of wonderment?

Yes, definitely! The detectors are technological jewels of gigantic proportions. The volume of data is awe-inspiring. The last comparable step-up in energy and beam brightness was when I was still in high school. Participating in the LHC is literally a once-in-a-lifetime opportunity. Not all people are born at the right time to make the most of such an opportunity.

Will all that data be stored in Switzerland?

Let's start at the collision. I have a 100-million-pixel camera. If you do the numbers, you get something like 10 petabytes per second of information. It's insane, right? When the genomics people tell me about their explosion of data volume, I always chuckle because we're so far post-explosion it's not funny.

But that's sort of a nonsense number. It gets reduced to something like a gigabyte per second, or a little less. Then it magically arrives at my computing center in San Diego, or close to 100 other places like it worldwide, and scientists can go to town analyzing the data. At that level, everybody can do whatever they please in terms of their scientific objectives. The full creativity of the global collaboration gets let loose.
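Würthwein's back-of-the-envelope numbers can be checked in a few lines. The pixel depth and collision rate below are illustrative assumptions, not actual detector specifications:

```python
# Rough sketch of the data-rate arithmetic Würthwein describes.
# Assumed values (illustrative, not real CMS specs):
pixels = 100_000_000                # "100-million-pixel camera"
bytes_per_pixel = 2.5               # assumed average readout per pixel
collisions_per_second = 40_000_000  # assumed ~40 MHz bunch-crossing rate

raw_rate = pixels * bytes_per_pixel * collisions_per_second  # bytes/s
print(f"Raw rate: {raw_rate / 1e15:.0f} PB/s")

# Trigger hardware and filtering software discard almost everything,
# leaving roughly "a gigabyte per second, or a little less" to store.
stored_rate = 1e9  # bytes/s
print(f"Reduction factor: {raw_rate / stored_rate:.0e}")
```

With these assumed numbers the raw rate comes out to 10 petabytes per second, and the stored stream is smaller by about seven orders of magnitude.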

Could you walk me through what you mean by distributed computing?

Think about three layers of distributed computing. When you open your laptop, it has multiple cores in the CPU [central processing unit]. That’s the first layer—the fact that you can run multiple applications at the same time without having them interfere with one another.

Now, imagine putting a whole bunch of these laptops in a room. They’re still just laptops, but they’re packaged differently, like pizza boxes. These boxes are the height of your thumb or so. In my computer room there are 16 racks, and each rack has about 42 pizza boxes. You have to use software to make them all work in unison on some problem. That’s what we call a cluster.

The third layer of distributed computing is clusters all over the planet. My vision is that every single cluster at every university in the country allows sharing via a national cyber infrastructure. If we could ever accomplish this, it would be an enormously powerful resource for science.
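The first layer he describes—independent tasks running on the cores of one machine—can be sketched with Python's standard library. The `analyze` function here is a made-up stand-in for a real analysis task:

```python
from multiprocessing import Pool

def analyze(event_id):
    """Stand-in for an independent analysis task (e.g., one collision event)."""
    return event_id * event_id  # trivial placeholder computation

if __name__ == "__main__":
    # Layer 1: spread independent tasks across the cores of one machine.
    with Pool(processes=4) as pool:
        results = pool.map(analyze, range(8))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
    # Layers 2 and 3 extend the same idea: a cluster scheduler spreads
    # tasks across machines in a room, and a grid like the Open Science
    # Grid spreads them across clusters at different institutions.
```

The sharing problem Würthwein describes next is what changes between layers: cores in one laptop trust each other implicitly, while clusters owned by different universities need the trust infrastructure the grid provides.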

Where does the Open Science Grid fit in?

The main bread and butter is creating the sharing infrastructure so that different universities can offer each other access to the clusters they operate. If I own a cluster, I should have full control over who is allowed to use my resources, and when. At the same time I should have learned in kindergarten or in preschool that sharing is a powerful thing. I should be allowed to share, and it should be easy for me to share. Then, the moment I want it back, I should get it back. That's the business of the Open Science Grid.

"I should have learned in kindergarten or in preschool that sharing is a powerful thing. That's the business of the Open Science Grid."

What percentage of university clusters are connected?

I would guess we are somewhere between 1 and 10 percent today.

Are there obstructions? Do the funding agencies say you can’t share?

No, no. Not so much. The obstruction is that some people haven't learned in kindergarten how to share. And there are lots of technical issues that make it hard. Sharing is ultimately all about trust, but if I had to vet every single person I want to share with, that would be unmanageable. I don't want to share with Joe Blow or Jane Doe. I want to share with friends. I want to have a few entities that I trust and that trust Jane and Joe. Then Jane and Joe can share my resources. That’s the kind of structure we’ve created in the Open Science Grid. It is our contribution to science.

How are those networks of trust built?

Science splits into different domains of inquiry. There’s the domain of genomics, the domain of protein structure, the domain of astrophysics, the domain of particle physics. Different scientists who do similar things go to the federal agencies and say, “We need an organization that supports us in distributed computing.” The agency then employs a small team that takes care of all the heavy lifting. That small team works with the Open Science Grid to provide transparent access to the shared national infrastructure.

Your talk at AAAS ranged from physics to viruses. Besides LHC scientists, who else uses the Open Science Grid?

The Protein Data Bank has a repository in San Diego, and they are a customer. They took all of their protein structures and compared each protein to every other protein.

What do those comparisons tell them?

It allows them to say, “If you’re interested in this protein, maybe you want to look at that protein because they differ only by this much.” I’m told this is a crucial thing in understanding structure and function.

Are you more passionate about your research or your position at the Open Science Grid?

I’ve oscillated back and forth. I like the idea that I can have both. When you're an experimentalist, you don’t just do experiments. You develop tools. I love the idea that the tools I develop for my science are actually useful and relevant for others. I ask people to cite us in the acknowledgments whenever the infrastructure that my group helped create has been useful. It gives me a great thrill to see the diversity of journals that I’m cited in.

That’s cool.

That’s just cool. I see I’m a toolmaker and my tools are useful. My hammer is actually useful to hammer many nails.

You got your Ph.D. at Cornell studying high-energy physics. What are you working on now?

It’s all been LHC physics. My group was part of the Higgs [boson] discovery. Most of the other things we’ve done revolve around the idea of creating dark matter in the laboratory. We know that dark matter exists, and it’s copiously around us. If we can find a way to produce it in the laboratory, we can reproduce it and figure out how it’s related to other things.

So you’re interested in creating dark matter in the lab. What theory allows for that?

Supersymmetry.

What is supersymmetry?

Quantum mechanics distinguishes two types of particles, bosons and fermions. For some reason nature uses bosons as mediators of forces, and fermions as constituents of matter. The force of electricity and magnetism acts via the exchange of bosons, while the stuff it interacts with—atoms, molecules—is all made out of fermions. Supersymmetry postulates that for every boson there is a fermionic partner. It symmetrizes the concepts of force and matter in some sense.

You can think of this as an analogy to the famous E = mc² of Einstein that relates matter and energy. With supersymmetry we relate matter and force at some very fundamental level.

What’s your take on supersymmetry as a theory?

The predictions are so rich. It allows me to express any search that's experimentally viable in terms of a search for supersymmetry.

Can you give me an example?

We’re colliding protons on protons, and we have this detector. At the collision all hell breaks loose and junk flies in all directions. When you sum it all up, you should have momentum conservation in the transverse plane [directions at right angles to the protons’ paths]. Any imbalance tells you something weird has happened. Supersymmetry gives you predictions for these kinds of weird phenomena.

And I can turn that around. When I look for this and I don't find it, I can express it as a constraint on supersymmetry. It's this process that makes supersymmetry attractive.
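The bookkeeping he describes is a vector sum in the plane perpendicular to the beam. This sketch uses made-up particle momenta to show how an imbalance would be computed:

```python
import math

# Transverse momenta (px, py) of the visible particles from one
# collision, in GeV. These values are made up for illustration.
particles = [(30.0, 10.0), (-25.0, 5.0), (-5.0, -35.0)]

# Momentum conservation in the transverse plane: the visible momenta
# should sum to roughly zero. Any imbalance ("missing transverse
# momentum") may signal an invisible particle -- such as a dark matter
# candidate -- carrying momentum out of the detector unseen.
sum_px = sum(px for px, _ in particles)
sum_py = sum(py for _, py in particles)
missing_pt = math.hypot(sum_px, sum_py)
print(f"Missing transverse momentum: {missing_pt:.1f} GeV")
```

Here the visible particles leave a 20 GeV imbalance, the kind of "weird phenomenon" that supersymmetric models make predictions for.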

A lot of physicists like supersymmetry because they say the math is beautiful.

Yeah, and I couldn't care less.

You see it more as a big tent.

Exactly. It's a huge tent that provides me a legitimacy to search for anything I want. As an experimentalist, I’m ultimately in the business of trying to find new things at the LHC. If I find something, I'm made. [Brushes hands clean.] I'm done. But if I don't find something, I still have to make a career out of it. I have to have a legitimacy that it was worth looking in this direction, and supersymmetry can provide it.

Say the LHC comes on in March, and it’s working well but it doesn’t produce any new particles. What happens to supersymmetry? When do you give it up?

Probably never. Some fraction of the theoretical community will never give up. They are driven by the beauty on some level. But they will give up on supersymmetry as the explanatory framework in the regime that we can probe today.

The LHC ran at lower energies from 2008 to 2013. What should we be looking for over the next few years?

There is an expectation that the LHC will show us radical new features. As of right now, it looks like we’ve seen a phase transition—we’ve seen the Higgs—but we have not seen the complete elucidation of the physics that makes it all hang together. There is a little bit of puzzlement. If the LHC doesn’t find anything other than the Higgs, then that puzzlement will turn into a theoretical crisis. People will be saying “Huh? How can this be?”

© 2015 Chris Cesare. Sift through data about SciCom graduate student Chris Cesare at