UC Santa Cruz helps address massive data demands from Large Hadron Collider as part of $25 million NSF project

Computer scientist Carlos Maltzahn will work with Princeton University on an NSF-funded project to establish the Institute for Research and Innovation in Software for High Energy Physics

LHC data visualization

A data visualization from a simulation of a collision between two protons that will occur at the High-Luminosity Large Hadron Collider (HL-LHC). On average, up to 200 collisions will be visible in the collider's detectors at the same time. Shown here is a design for the Inner Tracker of the ATLAS detector, one of the hardware upgrades planned for the HL-LHC. (Credit: ATLAS Experiment © 2018 CERN)

Carlos Maltzahn, adjunct professor of computer science and engineering in the Baskin School of Engineering, will collaborate on a $25 million NSF-funded project led by Princeton University to establish the Institute for Research and Innovation in Software for High Energy Physics (IRIS-HEP).

IRIS-HEP will address the unprecedented amount of data that will come from the High-Luminosity Large Hadron Collider (HL-LHC), the world's most powerful particle accelerator. When the HL-LHC reaches full capability in 2026, it will produce more than 1 billion particle collisions every second, of which only a small number will reveal new discoveries. The collider's increased luminosity will require more data processing and storage, along with tools to capture, identify, and record relevant events and to let scientists analyze the results efficiently.

“To fully explore this data, we need much more powerful software tools and algorithms. We also need to maximally exploit the evolving high-performance computing landscape and new tools like machine learning, in which computers study existing data sets to learn rules that they can apply to new data and new situations,” said Princeton University computational physicist and IRIS-HEP principal investigator Peter Elmer.

“Even now, physicists just can't store everything that the LHC produces,” said Bogdan Mihaila, the NSF program officer overseeing the IRIS-HEP award. “Sophisticated processing helps us decide what information to keep and analyze, but even those tools won't be able to process all of the data we will see in 2026. We have to get smarter and step up our game. That is what the new software institute is about.”

To address the challenges of HL-LHC data storage and processing, Maltzahn will explore new declarative data access interfaces that allow physicists to express what they want while hiding the mechanics of how this is accomplished.

“By expressing data needs in a language that makes sense to physicists, they can retrieve the specific information they want without wasting compute and storage resources scanning through unnecessary data to find what they’re looking for,” Maltzahn said.

This approach would relieve physicists of the burden of spending much of their time learning how to store and retrieve data efficiently, freeing them to spend more time on physics research.
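As a rough illustration of the declarative idea described above, the following Python sketch separates what a physicist asks for from how the data is fetched. It is a toy under stated assumptions, not the IRIS-HEP interface or Maltzahn's design: the `Query` class, the `run_query` function, the in-memory `STORE`, and the column names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Toy columnar "store": one list per column, one entry per collision event.
# Column names and values are invented for illustration only.
STORE: Dict[str, List[float]] = {
    "muon_pt": [12.0, 48.5, 3.2, 75.1],
    "muon_eta": [0.4, -1.2, 2.9, 0.1],
    "jet_count": [2, 5, 1, 3],
}


@dataclass(frozen=True)
class Query:
    """What the physicist wants: which columns, and which events."""
    select: Sequence[str]
    where: Dict[str, Callable[[float], bool]]  # column name -> predicate


def run_query(store: Dict[str, List[float]], query: Query):
    """How the request is served: hidden from the caller and free to change."""
    # Only the columns named in the query are ever touched.
    columns_read = set(query.select) | set(query.where)
    n_events = len(next(iter(store.values())))
    rows = []
    for i in range(n_events):
        # Keep an event only if every predicate accepts its column value.
        if all(pred(store[col][i]) for col, pred in query.where.items()):
            rows.append({col: store[col][i] for col in query.select})
    return rows, sorted(columns_read)


# The request states the need (high-momentum muons), not the access path.
query = Query(select=["muon_pt"], where={"muon_pt": lambda pt: pt > 40.0})
rows, columns_read = run_query(STORE, query)
print(rows)
print(columns_read)
```

Because the caller never says how to scan the data, the storage layer in this sketch reads only the `muon_pt` column and could later change its layout or push the filter closer to the disks without any change to the query.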

The IRIS-HEP project will be co-funded within NSF by the Office of Advanced Cyberinfrastructure in the Directorate for Computer and Information Science and Engineering, and by the Division of Physics in the Mathematical and Physical Sciences Directorate.

In addition to UC Santa Cruz, the IRIS-HEP project will include participants from Cornell University, Indiana University, Massachusetts Institute of Technology, New York University, Princeton University, Stanford University, UC Berkeley, UC San Diego, University of Chicago, University of Cincinnati, University of Illinois at Urbana-Champaign, University of Michigan-Ann Arbor, University of Nebraska-Lincoln, University of Puerto Rico-Mayaguez, University of Washington, and University of Wisconsin-Madison.

Prior to being selected by Princeton to work on IRIS-HEP, Maltzahn pioneered Programmable Storage (with funding from NSF, DOE, and industry), which provides the foundation for declarative data access. Shortly after Maltzahn joined UC Santa Cruz, he became a key mentor of Sage Weil, who created the successful open-source storage system Ceph as his Ph.D. project. In 2015, Maltzahn established the Center for Research in Open Source Software (CROSS) at UC Santa Cruz with the support of a one-time donation from Sage Weil and yearly memberships from Toshiba, Micron, Seagate, Western Digital, Huawei, and Samsung.

As technology continues to advance at increasing speed and scientific communities rush to keep up, Maltzahn’s declarative data access research will become increasingly relevant. “The future will have many more architectural disruptions, and we need to have a much more declarative way of accessing data if we’re going to keep up,” he said.