Machine Learning on VMware Platform – Part 3 – Training versus Inference

Part 1 of Machine Learning on VMware Cloud Platform covered the three distinct phases: concept, training, and deployment. Part 2 explored the data streams, the infrastructure components needed, and how vSphere can help increase the resource utilization efficiency of ML platforms. In this part, I want to go a little bit … [Read more...]

Unexplored Territory Podcast Episode 18 – Not just artificially intelligent featuring Mazhar Memon

In this week’s Unexplored Territory Podcast, we have Mazhar Memon as our guest. Mazhar is one of the founders of VMware Bitfusion and the principal inventor of Project Radium. In this episode, we talk to him about the start of Bitfusion, what challenges Project Radium solves, and what role the CPU has in an ML world. … [Read more...]

Machine Learning on VMware Cloud Platform – Part 2

Resource Utilization Efficiency: Machine learning, especially deep learning, is notorious for consuming large amounts of GPU resources during training. However, as the last part already highlighted, machine learning is more than just training a model. And these components within the machine learning workflow require … [Read more...]

Machine Learning on VMware Cloud Platform – Part 1

Machine Learning is reshaping modern business. Most VMware customers look at machine learning to increase revenue or decrease cost. When talking to customers, we mainly discuss the (vertical) training and inference stack details. The stack runs a machine learning model inside a container or a VM, preferably onto an … [Read more...]

Stop designing your server platform with solely the CPU roadmap in mind

Over the last 20 years, we designed our core data center platform following the CPU roadmap. But in today’s world, the devices attached to the processor make radical and revolutionary improvements, catering to the needs of the new workloads. I’m talking about devices like the GPU, the network adapter, and … [Read more...]

Unexplored Territory #005: AI Enterprise, DPUs, and NVIDIA Launchpad with Luke Wignall

Episode 004 is out! This time we talk to Luke Wignall, who is the Director of Technical Product Marketing at NVIDIA. We talk about some of the announcements made during the NVIDIA GTC Conference. Luke discusses NVIDIA Launchpad, AI Enterprise, and of course, we touch on DPUs aka SmartNICs. A great conversation with A … [Read more...]

Exciting Sessions from NVIDIA GTC Fall 2021

Over the last few weeks, I watched many sessions from the Fall edition of NVIDIA GTC. I created a list of interesting sessions for a group of people internally at VMware, but I thought the list might interest some outside VMware. It’s primarily focused on understanding NVIDIA’s product and services suite and … [Read more...]

Project Monterey and the need for Network Cycles Offload for ML Workloads.

VMworld has started, and that means a lot of new announcements. One of the most significant projects VMware is working on is Project Monterey. Project Monterey allows the use of SmartNICs, also known as Data Processing Units, of various VMware partners within the vSphere platform. Today we use the CPU inside the ESXi … [Read more...]

Machine Learning Infrastructure from a vSphere Infrastructure Perspective

For the last 18 months, I’ve been focusing on machine learning, especially how customers can successfully deploy machine learning infrastructure on vSphere infrastructure. This space is exciting as it has so many great angles to explore. Besides the model training, a lot of stuff happens with the data. Data is … [Read more...]

Deep Learning Technology Stack Overview for the vAdmin – Part 1

Introduction: We are amid the AI “gold rush.” More organizations are looking to incorporate any form of machine learning (ML) or deep learning in their services to enhance customer experience, drive efficiencies in their processes or improve quality of life (healthcare, transportation, smart … [Read more...]