As the educational landscape rapidly shifts toward data-driven and computational learning, one expert’s unconventional journey into distributed computing is helping shape the future of STEM education. Frank Wuerthwein, Ph.D., Director of the San Diego Supercomputer Center (SDSC), Executive Director of the Open Science Grid (OSG), and Professor of Physics and Data Science at UC San Diego, describes himself as an experimental particle physicist by training who unexpectedly found his way into large-scale distributed computing more than 25 years ago.

This shift in focus has been both deliberate and deeply practical. Around three years ago, Wuerthwein noticed a growing and largely unmet need, not just in research institutions, but across the wider landscape of higher education. “Among the nearly 4,000 accredited institutions of higher learning in the U.S., fewer than 200 are classified as research-dominant (R1),” says Wuerthwein. “The 3,800 that are not R1s have education as a much stronger focus than research.”

Wuerthwein also observed how STEM education was evolving from chalkboards to cloud infrastructure. “More and more STEM education requires the students to at least have a laptop, and sometimes even that is not sufficient,” shares Wuerthwein. “They use their laptop as an entry point to Jupyter through institutional compute infrastructure. And in the age of AI and large language models (LLMs), this dependency has only grown. AI is fundamentally an experimental science, and educating the next generation in this field requires not just theory, but hands-on experience with both data and computing infrastructure.”

Wuerthwein says many institutions are undergoing a radical shift in their needs, particularly those that have traditionally not required large-scale computing systems, but now find such infrastructure essential to modern STEM education. “My work in distributed computing is following this shift, which presents both challenges and opportunities. It creates an innovation space for people like me—one that serves the country in ways nothing else I’ve done in my career ever has. It’s an exciting time to be in this field.”

Supporting Social Mobility

When it comes to the future of education and the national workforce, Wuerthwein notes that the spotlight shouldn’t just be on elite research universities. “There are 20 million students in post-secondary education in the U.S. alone. Looking at California specifically, the state’s community colleges enroll more than 2 million students, more than twice as many as the California State University (CSU) and University of California (UC) systems combined. When you add up private institutions like Caltech, Stanford, and others, they barely register in comparison to these numbers.”

This reality, Wuerthwein says, is crucial for understanding where educational investments in data and computing infrastructure should be made. “The majority of higher education students and the future of the workforce are in community colleges. And if they all need to support data and compute infrastructure to teach generative AI, that presents a significant and urgent challenge.”

In California, a well-designed educational ecosystem connects public high schools, community colleges, and the state’s public university systems. “The state has a very strong organizational principle that provides clear, intentional pathways for student advancement,” says Wuerthwein. “Many high schools offer community college courses on-site or allow students to enroll directly, and these courses are designed to augment AP coursework.”

California community colleges also feed directly into CSU and UC campuses through a structured and long-standing transfer system. “Half of the incoming students at SDSU come from community colleges, and roughly 30% of UC San Diego’s student body enters through the community college system,” shares Wuerthwein. “This structure creates real social mobility. Students can complete two years at a community college and then transfer to CSU or UC to finish their bachelor’s degree. It’s a very well-established program.”

However, this structure isn’t without its challenges, particularly when it comes to academic continuity. “When you explore how the CSUs or UCs integrate these students, it requires an impedance matching between what they learn in the first two years and what they need to know for the final two years,” explains Wuerthwein. “That impedance matching has been a persistent challenge. For the system to fully serve students, especially in emerging fields like data science and AI, it must not only enable access, but ensure alignment in curriculum, skills, and technological infrastructure across all tiers.”

Developing Shared AI Infrastructure

By developing shared AI infrastructure, Wuerthwein says they can significantly ease the transition for students moving through the education pipeline. “The structured nature of California’s public education system creates a strong incentive for faculty at institutions like UC San Diego to actively engage with community colleges. It’s essential that transfer students arrive prepared for the advanced coursework they will encounter in their third and fourth years. San Diego is building a unified system that allows students from high school through community college, CSU, and UC to access and learn on the same computing infrastructure. This shared environment supports education in AI, data-intensive computing, programming, and related fields, creating consistency and continuity across institutions. We’re just at the beginning of this journey, but the goal is to develop infrastructure that supports a common curriculum, smooths academic transitions, and ultimately advances social mobility across the state.”

For Wuerthwein, one of the key lessons learned over more than two decades working in distributed computing is that the most significant challenges are often not technical, but social. “In computer science, there’s a saying that every problem can be solved with another layer of indirection. Technical problems can usually be addressed. Social problems are much harder. Creating a truly unified system, one that spans high schools, community colleges, CSUs, and UCs, requires more than just technology. It demands collaboration, alignment, and shared purpose among institutions that have traditionally operated in silos. To make this work, you have to build around something common that all these institutions can align with.”

“To help advance education and research statewide, we have the Corporation for Education Network Initiatives in California (CENIC),” continues Wuerthwein. “CENIC serves educational institutions as well as public libraries. In the long term, we want to integrate those libraries into this shared infrastructure, because of their potential to be AI makerspaces. Many libraries already host physical makerspaces for hands-on learning, but I think they could evolve into digital hubs for AI education and experimentation. Ten years from now, I think we’ll see AI makerspaces in public libraries, in high schools, in community colleges, and in state universities. These shared environments would offer consistent tools and infrastructure and enable learners to move seamlessly through various stages of education and into careers.”

In California, where educational systems are structurally divided, such as the separation between the UC and CSU systems, a unified AI infrastructure could help create a continuum that supports not just degree programs, but also career training, lifelong learning, and workforce development. “A student could begin in high school, transition into a job, come back for certificates, or pursue ongoing learning,” Wuerthwein explains. “The whole pipeline would be built on a common, cost-effective infrastructure, and by scaling these systems efficiently and borrowing strategies from hyperscalers to optimize resources, there is an opportunity to drive down costs while expanding access. That’s the kind of system we’re building right now. One that supports education, equity, and innovation at scale.”

Wuerthwein says as they roll out the infrastructure, its broader social and educational implications have come into sharp focus. “At first, I thought of this purely as a distributed computing problem, but the more we roll out the infrastructure, the more we realize we’re building something bigger and creating social cohesion among educators. We’re building a community, and that community is becoming an interesting target for people who want to develop curriculum for different things.”

At a recent annual meeting, Wuerthwein collaborated with General Atomics and its partners in the fusion energy sector. “Their industry is preparing for a dramatic transformation: moving from research-focused efforts to commercial-scale fusion within the next 20 years, with hopes of becoming a trillion-dollar industry,” shares Wuerthwein. “That kind of scale-up creates a massive workforce challenge. Suddenly we had a platform where educators from community colleges, CSUs, and UCs were all in one place and could engage with fusion experts who are interested in education. This new ecosystem makes it possible to collaboratively develop curricula that respond directly to the needs of emerging industries.”

Looking ahead, Wuerthwein plans to bring agricultural technology into the fold. With California’s agricultural sector facing growing challenges, there’s a pressing need to incorporate ag-tech themes into STEM education across the state. “We’ve created a social network of educators through this infrastructure,” Wuerthwein says. “Why not use that network to connect with experts in agriculture, align priorities, and figure out how to bring real-world problems into the classroom? In this way, the AI infrastructure is more than a technical backbone, it becomes a facilitator for curriculum development, workforce alignment, and cross-sector collaboration. That’s what makes it so exciting. We’re not just moving bytes around anymore. We’re building an ecosystem—one that’s socially and economically meaningful.”

Solving Scalability Challenges

As director of the SDSC, Wuerthwein leads initiatives that advance high-performance computing, distributed cyberinfrastructure, and collaborative research across scientific disciplines, and he sees today’s challenges through the lens of scalability. “Any kind of scalability problem is exciting to the people at SDSC; it speaks to our native skillset and is in our DNA,” says Wuerthwein. “From scaling user support and training to expanding systems across thousands of institutions, SDSC staff view these demands not just as logistical hurdles, but as compelling research questions in their own right. The educational challenge of serving 20 million students across nearly 4,000 institutions nationwide offers a fertile testing ground for innovation. This mission aligns with our core expertise: our team specializes in solving scalability challenges, and this is exactly the kind of problem we’re built to tackle. For SDSC, the intersection of societal need and technical ambition creates a rare opportunity to make meaningful contributions to education and workforce development while advancing the science of distributed computing itself.”

As Principal Investigator (PI) of the National Research Platform (NRP), Wuerthwein sees the initiative as central to enabling the large-scale transformation of education and research that he’s championing. “At its core, the NRP is designed to support scalable, distributed cyberinfrastructure across a wide range of institutions and use cases,” explains Wuerthwein. “The platform operates with three core goals: enabling educational access at scale, reducing institutional costs, and fostering innovation in heterogeneous computing.”

One of the NRP’s foundational contributions is the development of a conceptual software stack, from networking and console layers to higher-level services such as JupyterHub and AI development tools. “The platform is being designed with both vertical and horizontal openness,” states Wuerthwein. “Horizontally, it aims to reach over a thousand institutions of higher education across the country. Currently, about 70 institutions participate. Vertically, the platform is built as an open environment where both academic and commercial developers can build tools and services, especially those focused on affordable, scalable AI education. Commercial entities are interested in building on NRP, because it provides a more cost-effective alternative to commercial cloud services, especially for educational institutions.”

NRP also addresses a pressing technical reality: the end of Moore’s Law, which predicted that computers would become more powerful and cheaper at a steady pace, roughly doubling in processing capability every couple of years. “The traditional computing model where all intelligence resides in the central processing unit (CPU) is being replaced by a new paradigm where peripherals themselves are becoming programmable,” explains Wuerthwein. “As the regular doubling of CPU performance slows, hardware architectures are becoming increasingly diverse and bringing new challenges in system design, integration, and programming. We’re building a garden of heterogeneous architecture. Using Kubernetes rather than traditional batch systems, NRP can support a wide variety of computing devices, including Field-Programmable Gate Arrays (FPGAs), programmable Network Interface Cards (NICs), and even programmable network switches. This flexibility turns NRP into a playground for experimentation, where computer scientists and domain researchers can collaborate.”
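To make the Kubernetes approach concrete, a pod specification can target specialized hardware through node labels and extended resources advertised by device plugins. The sketch below is purely illustrative: the resource name, node label, and container image are assumptions, not actual NRP configuration.

```yaml
# Hypothetical pod spec showing how Kubernetes schedules work onto
# heterogeneous hardware. All names below are illustrative assumptions,
# not actual NRP values.
apiVersion: v1
kind: Pod
metadata:
  name: fpga-experiment
spec:
  nodeSelector:
    example.org/accelerator: fpga        # run only on nodes labeled as FPGA-equipped (assumed label)
  containers:
  - name: workload
    image: example.org/fpga-tools:latest  # placeholder image
    resources:
      limits:
        example.org/fpga: 1               # assumed extended-resource name exposed by a device plugin
```

Because the scheduler, not a site-specific batch system, matches resource requests to advertised hardware, the same mechanism extends uniformly from GPUs to FPGAs to programmable NICs as new device plugins are added.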

“By bringing together technologists and domain scientists on the same platform, we’re creating the opportunity for serendipity,” adds Wuerthwein. “The goal is to accelerate the adoption of emerging architectures while also helping researchers in fields like biology, physics, and astronomy make sense of, and take advantage of, these new tools. Part of the NRP’s mission is to scale out infrastructure for education affordably and inclusively and enable institutions of all sizes to access the tools they need to teach and research AI. Secondly, we want to foster innovation at the infrastructure level and create a collaborative space where new computing paradigms can be tested, adapted, and adopted.”

Dr. Forough Ghahramani, Assistant Vice President for Research, Innovation, and Sponsored Programs at Edge shares, “We are deeply grateful for the partnership with the National Research Platform under Dr. Frank Wuerthwein’s leadership. Through the National Science Foundation (NSF)-funded CRISPIE project—Connectivity through Regional Infrastructure for Scientific Partnerships, Innovation, and Education—we are working together to improve equitable access to advanced research networks and innovation.”

Growing Importance of Data-Intensive Science

Looking at scientific advancement over the past two decades, Wuerthwein says this growth has been powered significantly by progress in computing and data capabilities. “Much of this progress stems from how advances in computing have enabled both the collection and consumption of larger, more complex datasets. As instrumentation has improved, added more sensors, and sampled at faster rates, these tools have generated exponentially more data. But the ability to make sense of that data also hinges on computational advances. Moore’s Law has allowed for exponential growth in hardware performance at constant cost, but that exponential gain is slowing. The only way capability can keep growing is either through radical changes in hardware architecture or through radical advances in algorithms.”

This is where Wuerthwein says AI becomes a pivotal factor: “AI is not just a trend, but a strategic lever for future scientific breakthroughs. It has the potential to give you exponential growth in algorithmic capability for the same amount of money and may be the most prominent path forward for maintaining the pace of scientific progress, particularly as hardware improvements plateau.”

With the growing importance of AI and data-intensive science, researchers are increasingly relying on scalable computing power. “Traditionally, researchers needed to rely on physical allocations from university computing centers or federally funded supercomputing facilities,” says Wuerthwein. “Access was limited, required planning, and was often restricted to select institutions. In contrast, cloud providers like AWS, Google Cloud, and Azure now offer on-demand scalability, but managing those resources in a federally accountable way posed a new challenge. CloudBank 2 is a program funded by the NSF and provides commercial cloud resources to the nation’s science and engineering researchers and educators.”

CloudBank allows the NSF to allocate cloud credits to researchers in a way that maintains full visibility and accountability. “This system is the cloud-computing equivalent of what the NSF has long done through its supercomputing allocation program, ACCESS,” explains Wuerthwein. “A researcher can be awarded, for example, $10,000 in cloud credits to run experiments on commercial platforms. CloudBank ensures the NSF can track that usage, including who accessed it, how it was used, and what results it supported, and offers a transparent structure for reporting back to Congress and the public. Ultimately, CloudBank is the interface between the cloud providers, the community, and the NSF, and helps democratize access to advanced computing, especially for data-intensive research and AI development.”

Creating a Seamless Compute Ecosystem

As a longtime advocate for distributed computing, Wuerthwein sees the Open Science Grid and the NRP not as separate entities, but as a unified way to build global distributed systems. “In my mind, OSG is part of a continuum of distributed computing,” Wuerthwein explains. “NRP actually shows up in OSG as a single cluster, so we’re effectively the cluster provider for institutions that don’t want or can’t afford to run their own. For large research universities, which often maintain their own high-performance computing clusters, there’s a need to retain tight control over access, security, and identity management. These institutions integrate with national infrastructure like OSG at higher layers, contributing resources while maintaining autonomy.”

“Smaller institutions, such as community colleges, typically lack the resources or need to manage their own clusters,” Wuerthwein continues. “For them, NRP acts as a turnkey solution and provides compute and data services run by experts at places like SDSC, without requiring in-house infrastructure or staff. If you want full control, you need your own people to manage systems, networks, storage, and that requires scale and money. If you don’t have that scale, outsourcing it to NRP makes more sense.”

Wuerthwein says the key difference between OSG and the NRP is where each integrates into the stack and how much control institutions prefer. “Research universities tend to run their own infrastructure and want to retain more control, so they connect at higher layers. But for community colleges or institutions without large-scale research needs, it may make more sense to rely on a platform like NRP to handle computing for them. What we’re doing is essentially providing a cluster on their behalf, where NRP appears inside OSG as a single cluster. That means we take care of the heavy lifting while giving these institutions access to national-scale resources.”

One challenge in connecting these systems is ensuring that they communicate and account consistently. “Making this ecosystem work requires solving a range of technical coordination issues, like agreeing on naming conventions, tracking usage, and aligning systems across layers,” explains Wuerthwein. “In the end, this layered model allows us to scale access to advanced computing in a cost-effective and equitable way, support institutions of all sizes, and enable scientific collaboration and education at scale.”

Advancing Science through Partnership

As institutions across the country look to scale data infrastructure for AI, science, and education, research and education networks (RENs) such as CENIC and Edge are proving to be indispensable collaborators in that effort. “Research and education networks play an incredibly crucial role in data-driven and computational learning,” says Wuerthwein. “They have created a social network of all institutions in their region that provide valuable collaboration opportunities. In the layered cake of technology, we lay on top of each other, we’re not competitors. Together, we can provide services that neither of us could offer alone, and I view the entire REN community as a natural partner in helping us achieve our mission and drive national progress in education and research.”

As AI and data-driven discovery continue to shape science, education, and industry, Wuerthwein remains energized by the opportunities that lie ahead. His motivation is rooted not only in solving complex technical challenges, but in connecting people, domains, and ideas. “What I am most excited about personally is the exposure my job provides me to so many different, exciting opportunities,” he says. “I love to learn about new things and put structures together that serve different domains and problems. Solving problems that are intellectually interesting and impactful inspires me to get up in the morning.”

“Academia has an opportunity to provide value to industry in ways that go beyond just educating people,” continues Wuerthwein. “If we align workforce development with real-world challenges, then the people we educate will be that much more valuable and effective in industry roles. There’s a partnership model waiting to be built—one where research, education, and industry innovation all feed each other. In the years ahead, the financial footing of academia can’t rely solely on tuition, federal and state funding, or philanthropy. We need a new model where industry directly funds collaborative problem-solving and, in turn, derives real value. That’s how we ensure that academic research and education remain not only relevant, but essential to society’s future.”