Complex computational work with huge sets of data is now common practice in fields such as genomics, economics, and astrophysics. Researchers in these and similarly data-intensive fields depend on their computers to access and move data around while also storing backup copies elsewhere in case one device crashes. Subfields of computer science and engineering are devoted to making this possible by networking multiple devices together. But when something goes wrong on one of these networked devices, it can create problems too complex for researchers without a specialty in programming distributed systems.
One solution to this problem is distributed shared memory (DSM), in which a program locates any data it needs by following a “pointer,” regardless of whether that pointer refers to data in the local machine’s memory or on a different computer.
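For readers who want a concrete picture, the following is a minimal sketch, written in Rust, of the kind of location-independent pointer DSM relies on. The names (GlobalPointer, Node, Cluster, the directory lookup) and the way remote access is simulated are illustrative assumptions for this article, not the actual MaSON or Twizzler design.

    // Minimal sketch only: GlobalPointer, Node, Cluster, and the directory are
    // hypothetical names for illustration, not the MaSON or Twizzler design.
    use std::collections::HashMap;

    // A "global pointer" names a data object plus an offset within it,
    // independent of which machine currently holds the object's bytes.
    #[derive(Clone, Copy)]
    struct GlobalPointer {
        object_id: u64,
        offset: usize,
    }

    // Stand-in for one machine's local memory.
    struct Node {
        local_objects: HashMap<u64, Vec<u8>>,
    }

    // Stand-in for a small cluster; the directory records which node owns each object.
    struct Cluster {
        nodes: Vec<Node>,
        directory: HashMap<u64, usize>,
    }

    impl Cluster {
        // Dereference a global pointer: use the local copy if this node has one,
        // otherwise "fetch" the byte from whichever node the directory names.
        fn read(&self, from_node: usize, ptr: GlobalPointer) -> Option<u8> {
            if let Some(bytes) = self.nodes[from_node].local_objects.get(&ptr.object_id) {
                return bytes.get(ptr.offset).copied();
            }
            let owner = *self.directory.get(&ptr.object_id)?;
            self.nodes[owner]
                .local_objects
                .get(&ptr.object_id)?
                .get(ptr.offset)
                .copied()
        }
    }

    fn main() {
        // Object 42 lives on node 1; node 0 can still follow a pointer to it.
        let mut node0 = Node { local_objects: HashMap::new() };
        let mut node1 = Node { local_objects: HashMap::new() };
        node0.local_objects.insert(7, b"local notes".to_vec());
        node1.local_objects.insert(42, b"genome data".to_vec());

        let cluster = Cluster {
            nodes: vec![node0, node1],
            directory: HashMap::from([(7, 0), (42, 1)]),
        };

        let remote_ptr = GlobalPointer { object_id: 42, offset: 0 };
        let local_ptr = GlobalPointer { object_id: 7, offset: 0 };
        println!("{:?} {:?}", cluster.read(0, remote_ptr), cluster.read(0, local_ptr));
    }

The point of the abstraction is that the code dereferencing a pointer does not need to know, or care, which machine currently holds the bytes.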
Past efforts to create this “holy grail” solution, a robust and efficient DSM system, have failed, but UC Santa Cruz Associate Professor of Computer Science and Engineering Peter Alvaro believes that advances in technology mean the time is right to revisit the idea. With the support of a new NSF grant, Alvaro is embarking on a project called Memory at Scale On Networks (MaSON) to achieve a bold vision: a new operating system and network, each designed to best serve the other, supporting an overall model for programming big-data systems as if all of the data fit in one computer’s memory. Such a system would make running programs faster and more robust for scientific researchers.
“There's all these ways in which the idea of what memory is is expanding,” Alvaro said. “There's this opportunity to say, ‘Is now the right time to achieve that holy grail?’ The goal is allowing experts in a domain to focus on their data and use their tools, without having to manage the supporting technology when a research problem arises that far outstrips what their one computer could do.”
The new operating system, called Twizzler, is already well underway, spearheaded by Alvaro’s Ph.D. student Daniel Bittman. Alvaro’s group will further develop the hardware and software needed to make this vision a reality.
Importantly, in the new system, the context for interpreting the data is paired with the data itself, rather than with the relatively short-lived computational process that manipulates it. Alvaro says this is vastly different from current programming models. Pairing context with data would eliminate some of the current complications of distributed systems, such as deciding when and how much data to cache (keep as a local copy for faster access) and when and how much data to prefetch (gather preemptively in anticipation that it will be needed).
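As a loose illustration of what “context paired with data” could mean, the sketch below (again in Rust) attaches a simple schema tag to an object’s bytes; the Schema enum, field names, and describe function are hypothetical and are not drawn from MaSON’s actual representation.

    // Loose illustration only: the Schema enum and field names are hypothetical,
    // not MaSON's actual representation of data plus its interpretive context.
    enum Schema {
        Utf8Text,
        LittleEndianU32s,
    }

    // The context needed to interpret the bytes travels with the bytes themselves,
    // so it can outlive any particular process that touches the object.
    struct SelfDescribingObject {
        schema: Schema,
        bytes: Vec<u8>,
    }

    // Any process handed the object can interpret it from its embedded context alone,
    // with no knowledge carried over from the program that wrote it.
    fn describe(obj: &SelfDescribingObject) -> String {
        match obj.schema {
            Schema::Utf8Text => String::from_utf8_lossy(&obj.bytes).into_owned(),
            Schema::LittleEndianU32s => obj
                .bytes
                .chunks_exact(4)
                .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]).to_string())
                .collect::<Vec<_>>()
                .join(", "),
        }
    }

    fn main() {
        let text = SelfDescribingObject {
            schema: Schema::Utf8Text,
            bytes: b"spectral catalog".to_vec(),
        };
        let numbers = SelfDescribingObject {
            schema: Schema::LittleEndianU32s,
            bytes: vec![1, 0, 0, 0, 2, 0, 0, 0],
        };
        println!("{}", describe(&text));
        println!("{}", describe(&numbers));
    }

Because the interpretive context travels with the bytes, any process or machine that later obtains the object can make sense of it without relying on state held by the program that created it.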
“Up until now, we've lived in what you might call a compute-centric world, dominated by processes, and in that world data is very much a second-class citizen,” Alvaro said. “The MaSON project completely turns that idea on its head. In the data-centric world that we envision, information is the primary citizen, outlasting any computation that may interact with it. Processes (and for that matter, computers) are just temporary places where data may live and change. If successful, this project will change the way we think about programming giant-scale systems.”
Initially, Alvaro and his students hope to create an “appliance” that will allow a few dozen computers to be connected and operated as one. Eventually, he hopes to employ MaSON in a larger data center setting.
Alvaro will collaborate on this project with Robert Soulé, an associate professor of computer science and electrical engineering at Yale University who works on robust memory on programmable networks. Alvaro plans for the project to take his lab group, which has grown significantly in recent years, in a new direction, one focused more on this longer-term effort and on operating systems research more broadly. Over the past two years, Alvaro has raised more than $2.5 million in combined funding from the NSF, the Defense Advanced Research Projects Agency (DARPA), and Intel to support this research.