Foundation models are changing the way artificial intelligence (AI) and machine learning (ML) can be used. However, all that power comes at a cost, as building foundation models is a labor-intensive task.
IBM announced today that it has built its own AI supercomputer to serve as the literal foundation for its foundation model training research and development initiatives. Called Vela, it is designed as a cloud-native system using industry-standard hardware, including x86 silicon, Nvidia GPUs, and Ethernet-based networking.
The software stack that enables foundation model training uses a range of open-source technologies, including Kubernetes, PyTorch, and Ray. Although IBM is only now officially revealing the existence of the Vela system, it has been online in various capacities since May 2022.
“We really think this technology concept around foundation models has huge, huge disruptive potential,” Talia Gershon, director of hybrid cloud infrastructure research at IBM, told VentureBeat. “So as a division and as a company, we are investing heavily in this technology.”
The AI-optimized, budget-friendly foundation within Vela
IBM is no stranger to the world of high-performance computing (HPC) and supercomputers. One of the fastest supercomputers in the world is the Summit supercomputer, built by IBM and currently in use at Oak Ridge National Laboratory.
However, the Vela system is not like other supercomputer systems IBM has built to date. For starters, the Vela system is optimized for AI and uses standard x86 hardware, as opposed to the more exotic (and expensive) equipment typically found in HPC systems.
Unlike Summit, which uses the IBM Power processor, each Vela node has a pair of Intel Xeon Scalable processors. IBM is also relying on Nvidia GPUs, with each node in the supercomputer crammed with eight 80GB A100 GPUs. In terms of connectivity, each of the compute nodes is connected through multiple 100 gigabit-per-second Ethernet network interfaces.
Vela is also purpose-built to be cloud-native, meaning it runs Kubernetes and containers to power application workloads. More specifically, Vela relies on Red Hat OpenShift, Red Hat’s Kubernetes platform. Vela is also optimized to run PyTorch for ML training and uses Ray to help scale workloads.
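The core idea behind scaling training across a cluster like this is data parallelism: each worker (GPU) computes gradients on its own shard of the data, the gradients are averaged across workers, and all workers apply the same update. The toy sketch below illustrates that pattern in plain Python; the real stack would use PyTorch with Ray or a similar framework to distribute the work, and the model and data here are purely hypothetical.

```python
# Toy illustration of the data-parallel training pattern that frameworks
# like PyTorch (scaled out with Ray) implement across GPUs on a cluster.
# A simple 1-D linear model y = w * x is fit by minimizing squared error.

def local_gradient(weight, shard):
    # Gradient of mean squared error on one worker's data shard
    return sum(2 * x * (weight * x - y) for x, y in shard) / len(shard)

def data_parallel_step(weight, shards, lr=0.01):
    grads = [local_gradient(weight, s) for s in shards]  # per-worker compute
    avg = sum(grads) / len(grads)                        # "all-reduce" averaging
    return weight - lr * avg                             # synchronized update

# Two data shards drawn from y = 3x, starting from w = 0
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.02)
print(round(w, 2))  # → 3.0
```

Every worker ends each step with identical weights, which is why the averaged-gradient update is equivalent to training on the full dataset at once; the distributed frameworks handle the gradient exchange over the network.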
IBM has also developed a new workload scheduling system for its new cloud-native supercomputer. IBM has long used its own Spectrum LSF (Load Sharing Facility) for scheduling on many of its HPC systems, but that is not what the new Vela supercomputer uses. Instead, IBM has developed a new scheduler called MCAD (multicluster app dispatcher) to handle cloud-native job scheduling for foundation model training.
IBM’s growing foundation model portfolio
All that hardware and software that IBM put together for Vela is already being used to support IBM’s foundation model efforts.
“All research and development on all of our foundation models is done cloud-native on that stack on the Vela system and IBM Cloud,” said Gershon.
Last week, IBM announced a partnership with NASA to help develop foundation models for climate science. IBM is also working on a foundation model for life sciences called Molformer XL that could help create new molecules in the future.
The foundation model work also extends to enterprise IT with the Project Wisdom effort announced in October 2022. Project Wisdom is being developed to support Red Hat’s Ansible IT configuration technology. Typically, configuring IT systems can be a complicated exercise that requires domain knowledge to do right. Project Wisdom aims to give Ansible a natural language interface, where users simply type what they want, and the foundation model understands the request and then helps perform the desired task.
Gershon also hinted at a new IBM foundation model for cybersecurity that has not yet been made public and is being developed using the Vela supercomputer.
“We haven’t said much about it externally, I think on purpose,” Gershon said of the cybersecurity foundation model. “We truly believe this technology will be transformational in terms of detecting threats.”
While IBM is building out a portfolio of foundation models, it doesn’t intend to compete directly with some of the well-known general-purpose foundation models, such as OpenAI’s GPT-3.
“We’re not necessarily focused on building general AI, while some other players may be more focused on that,” Gershon said. “We’re interested in foundation models because we think they have tremendous business value for enterprise use cases.”