Hybrid Cloud Enables CERN’s Breakthrough Research

Rob Johnson
Jun 10, 2019 · 6 min read


Innovative, scalable technology infrastructure unlocks the mysteries of our universe

CERN’s Large Hadron Collider (LHC) — Photo credit: CERN Virtual Tour of the LHC

In the distant past, scientists identified protons, neutrons, and electrons as the basic building blocks of atoms. Progress in modern particle physics, however, has unveiled a far more complicated story. For many years, the European Organization for Nuclear Research, commonly known as CERN, has conducted groundbreaking scientific work to better understand the composition of matter and energy. Innovative approaches at CERN have uncovered the existence of far smaller components, including the landmark confirmation of the Higgs boson, and its work continues to lead modern science toward deeper insights into the nature of our universe.

Ironically, CERN’s quest to understand these exceedingly tiny particles requires breathtakingly enormous hardware. The aptly named Large Hadron Collider (LHC), near Geneva, is the largest and most powerful particle accelerator on the planet. The LHC’s 27-kilometer ring, lined with superconducting electromagnets held at a temperature near absolute zero, steers two particle beams in opposite directions at nearly the speed of light. When particles within the beams collide, the impacts break them apart into their elementary components. Highly sensitive detectors track these collisions and generate data for further evaluation by CERN researchers.

To provide CERN scientists with the insights they need from these collisions, a massive compute infrastructure must capture tremendous volumes of real-time data. That job falls to experts like Tim Bell, Compute and Monitoring Group Leader at CERN. “CERN is a publicly funded physics lab,” noted Bell. “My job is ensuring our scientists have the compute resources they need for breakthrough research, and implementing those resources within a finite budget.”

Years ago, the effort behind CERN’s first collision experiments was dubbed Run One. Between Run One and Run Two lay an interim period Bell refers to as the Long Shutdown. “Ironically, my team’s busiest time is between the data-taking phases. During the Long Shutdowns between LHC runs, we can make major transformations to our IT infrastructure, so we are prepared for data taking during the accelerator’s next, more complex run,” Bell continued. “Experiments using the accelerator’s detectors can create a petabyte of data each second. While filtering decreases that data volume before storage in our data center, we still require a system capable of managing and storing tens of gigabytes each second.”

Gathering and storing all that data creates unique challenges for the CERN team. While storage speed is a persistent concern, they must also manage their budget carefully. CERN’s on-site data centers employ solid-state drives (SSDs) alongside cheaper hard disk drives, deriving the greatest value from each technology and striking a delicate balance between storage rate and cost per gigabyte.

In preparation for Run Two, Bell’s team implemented several system enhancements. CERN’s compute infrastructure ties together 15,000 servers with a total of 230,000 Intel Xeon processor cores from different generations — including the latest Intel® Xeon® Scalable processors — across CERN’s data centers in Geneva and Budapest. According to Bell, this major rollout benefitted from OpenStack APIs, which replaced extensive hands-on work with large-scale automation.
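The scale of that filtering step can be sketched with quick arithmetic from the figures Bell cites. The exact post-filter rate below (50 GB/s) is an illustrative assumption within the “tens of gigabytes each second” the article mentions, not an official CERN number:

```python
# Back-of-the-envelope data-rate sketch using the figures quoted above.
# The detector rate (1 PB/s) and post-filter rate (50 GB/s) are
# illustrative assumptions, not official CERN values.

PB = 10**15  # bytes in a petabyte
GB = 10**9   # bytes in a gigabyte

detector_rate = 1 * PB   # bytes/second produced at the detectors
stored_rate = 50 * GB    # bytes/second written to the data center

# Fraction of raw detector data that survives filtering
reduction_factor = detector_rate / stored_rate

# Data accumulated per day of continuous data taking at the stored rate
seconds_per_day = 86_400
stored_per_day_pb = stored_rate * seconds_per_day / PB

print(f"filtering keeps roughly 1 in {reduction_factor:,.0f} bytes")
print(f"~{stored_per_day_pb:.2f} PB stored per day of data taking")
```

Even after discarding all but one in twenty thousand bytes, the system still accumulates petabytes per day, which is why storage cost per gigabyte matters as much as storage speed.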

As the volume of research data increases, so does the demand for analysis, modeling, and simulation, and CERN’s compute and storage capacity must expand along with it. While the on-premise data center infrastructure is needed to capture real-time data from particle collisions, the CERN team increasingly leans on hybrid cloud infrastructure to supplement it. After data is filtered using on-premise compute power, cloud solutions prove highly adept at reconstructing the collected data and performing detailed simulations. In doing so, the hybrid cloud alleviates on-site compute demands and offers extended scale: the team no longer needs to re-provision on-premise hardware for additional compute power. With cloud-based virtual machines, distributing workloads is more straightforward and far more elastic.
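The elasticity described above can be illustrated with a toy scaling rule. The function, thresholds, and per-VM throughput here are entirely hypothetical, a sketch of the cloud-bursting idea rather than CERN’s actual scheduler:

```python
import math

def extra_vms_needed(backlog_jobs: int, onprem_capacity_jobs: int,
                     jobs_per_cloud_vm: int) -> int:
    """Toy cloud-bursting rule: how many cloud VMs to request so that
    the simulation backlog exceeding on-premise capacity can be drained.
    All parameter values are hypothetical illustration numbers."""
    overflow = backlog_jobs - onprem_capacity_jobs
    if overflow <= 0:
        return 0  # on-premise capacity suffices; no cloud burst needed
    return math.ceil(overflow / jobs_per_cloud_vm)

# Example: 12,000 queued simulation jobs, room for 8,000 on-premise,
# each cloud VM handling 250 jobs.
print(extra_vms_needed(12_000, 8_000, 250))  # -> 16
```

The point of the sketch is the asymmetry it captures: scaling the on-premise side of this rule means procuring and racking hardware, while scaling the cloud side is a single API request for more VMs.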

Bob Jones, Leader of the Science Cloud Initiative for CERN, has focused extensive effort on selecting a robust hybrid solution to meet CERN’s ever-expanding needs for computing power. His team initiated the search for an ideal cloud service provider (CSP) through an extensive request for proposal (RFP) process. Each responding CSP needed to offer a cloud infrastructure design accommodating criteria like scale, flexibility, security, and virtual machine (VM) capability. Another important criterion was what Jones refers to as “transparent data access,” meaning that researchers using the cloud interface can access the same data they could with an on-premise solution. Doing so also necessitates federated identity and single sign-on support, simplifying researchers’ access to cloud-hosted data through a common web interface. While the hybrid cloud is specified to serve CERN’s needs, the same solution will support nine other European groups focused on other scientific disciplines, such as the life sciences and astronomy. As Jones puts it, “We are focused on the ‘long tail’ of science. While CERN’s research will surely benefit from the hybrid cloud, we want to help other research groups experience its benefits as well.”

The RFP’s cloud service provider (CSP) finalists are now prototyping, testing, and refining their respective hybrid cloud solutions for use in coming research. While each finalist offers large-scale compute infrastructure, bandwidth between the CSPs and CERN’s facilities has practical limits. For this reason, data from real-time LHC runs is captured on-premise first and then transferred to the cloud for further modeling and analysis.

For their on-premise infrastructure, the CERN team chose OpenStack software for two key reasons. First, OpenStack provides the proven flexibility and scale CERN needs to run its massive data centers. Second, the research community’s embrace of OpenStack yields valuable shared insights across multiple scientific disciplines. Bell described how collaboration with others furthers CERN’s work: “It is wonderful working with the supportive, open source community and talking to others who lead the journey of IT transformation. We regularly discuss our work with other scientific institutions that have implemented large-scale hybrid cloud solutions. In doing so, we can share our best practices, and we can also learn from others using OpenStack to improve our processes.” Bell’s work with the cloud-focused IT team supporting the Square Kilometre Array telescope project provides an example: “While our teams support different types of research, we face many of the same challenges. We work together to find common open source solutions supporting cloud infrastructure for science.”

For institutions undertaking large-scale IT deployments, Bell offers additional perspective. “Working through cultural changes alongside the technology changes is a factor we advise other organizations to consider. That cultural change is as big a challenge as the technology side, if not bigger. While cloud-based, software-defined infrastructure offers usage flexibility on its own, the IT staff must be trained for just-in-time deployments and development approaches like Agile and Scrum.”

While the innovative use of software, hardware, and hybrid cloud solutions meets the needs of CERN scientists today, Bell, Jones, and their teams can never rest on their laurels. New challenges lie ahead as they prepare to support the forthcoming LHC Run Three and Run Four. The CERN IT team plans to double compute capacity for Run Three by 2021, assisted on-site by the latest generation of Intel Xeon processors, Intel storage technologies, and additional software optimizations. Run Four in 2025 — for which the CERN team hopes to boost compute capacity by a full order of magnitude — will require another leap in technology. Combining this on-premise technology with cloud infrastructure lets the team tap the strengths of each for a holistic, optimal, and cost-effective solution.

Ultimately, CERN’s hybrid solution will help unlock even more secrets of our universe. Looking toward the future, Bell speaks with optimism. “Our team continually seeks ways to improve on our current infrastructure at CERN, whether that means implementing the latest technologies for greater efficiency, optimizing applications, or learning from others’ successes. Every day, it is our job to make sure that IT does not limit physics!”

# # #

This article was produced as part of Intel’s HPC editorial program, highlighting cutting-edge science, research, and innovation driven by the HPC community through advanced technology.


Rob Johnson

Marketing & public relations consultant, freelance writer, technology & gadget geek, TONEAudio gear reviewer, hopeful artist, foodie, and fan of the outdoors.