Microsoft Adds DRA-Backed NVIDIA vGPU Support to AKS
AI Search | Author: industry_watcher | Published 6 hours ago | Reads: 66
Original title: Microsoft Adds DRA-Backed NVIDIA vGPU Support to AKS
Source: InfoQ | Category: news
📰 Summary
Mar 19, 2026
The Azure Kubernetes Service team shared a [detailed guide](https://blog.aks.azure.com/2026/03/06/dra-with-vGPUs-on-aks) on how to use [Dynamic Resource Allocation (DRA)](https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/) with NVIDIA vGPU technology on AKS. This update improves control and efficiency for shared GPU use in AI and media tasks.
🔍 Content Analysis
Paragraph 3
The Azure Kubernetes Service team shared a detailed guide on how to use Dynamic Resource Allocation (DRA) with NVIDIA vGPU technology on AKS. This update improves control and efficiency for shared GPU use in AI and media tasks.
🤖 AI Insight: This shows AI technology being applied deeply in a vertical domain, illustrating how AI is changing the way traditional industries work.
Paragraph 4
Dynamic Resource Allocation (DRA) is now the standard for GPU resource use in Kubernetes. Instead of static resources like nvidia.com/gpu, GPUs are allocated dynamically using DeviceClasses and ResourceClaims. This change enhances scheduling and improves integration with virtualization technologies like NVIDIA vGPU.
Paragraph 5
The reason for combining these technologies is clear: virtual accelerators like NVIDIA vGPU often handle smaller tasks. They allow one physical GPU to be split among many users or applications. This setup is helpful for enterprise AI/ML development, fine-tuning, and audio/visual processing. vGPU offers predictable performance while still providing CUDA capabilities to containerized workloads.
Paragraph 6
On the infrastructure side, this feature relies on Azure's NVadsA10_v5 virtual machine series. Instead of assigning the whole GPU to one VM, vGPU technology partitions it into multiple fixed-size slices at the hypervisor layer. From Kubernetes' view, each VM shows one clear GPU device. The hypervisor sets capacity and memory limits, not the software.
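The fixed-size slices divide the card's memory evenly. A small illustration of the arithmetic behind the profiles this article mentions, assuming the A10's 24 GB total from NVIDIA's spec sheet (the VM size names other than Standard_NV6ads_A10_v5 are assumptions, not taken from the post):

```python
# Fractional vGPU profiles on a 24 GB NVIDIA A10, as exposed by the
# NVadsA10_v5 series: each VM sees a single GPU device whose memory
# cap is fixed at the hypervisor layer.
A10_MEMORY_GB = 24

# VM size -> how many ways the physical card is split.
PROFILES = {
    "Standard_NV6ads_A10_v5": 6,   # baseline one-sixth slice
    "Standard_NV12ads_A10_v5": 3,  # one-third profile
    "Standard_NV18ads_A10_v5": 2,  # one-half profile
}

def vgpu_memory_gb(denominator: int) -> int:
    """Memory of one fixed-size slice when the card is split `denominator` ways."""
    return A10_MEMORY_GB // denominator

for size, denom in PROFILES.items():
    print(f"{size}: 1/{denom} GPU, {vgpu_memory_gb(denom)} GB")
```

The one-third profile works out to 8 GB and the one-half profile to 12 GB, matching the capacities the article lists.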
🚀 Product Insight: The new capability fills a gap in the market and gives its target users a more convenient solution.
Paragraph 7
The setup requires Kubernetes 1.34 or newer, where DRA primitives such as deviceclasses and resourceslices are available. Teams provision a node pool with NVadsA10_v5 instances and apply the label nvidia.com/gpu.present=true, which the NVIDIA DRA kubelet plugin uses as its node selector. They then deploy the NVIDIA DRA driver via Helm. The post highlights three important Helm flags for vGPU scenarios. The gpuResourcesEnabledOverride=true flag skips a check that would otherwise block the NVIDIA DRA driver from installing alongside the legacy device plugin, because the two report different GPU names. FeatureGates.IMEXDaemonsWithDNSNames=false disables an IMEX feature that requires a newer GRID driver version than what's supported on the A10 series in Azure.
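The provisioning flow above can be sketched as shell commands. The resource group, cluster name, Helm repo URL, chart name, and namespace are assumptions based on NVIDIA's public DRA driver Helm chart, not taken from the post; the two `--set` overrides are the ones the post names.

```shell
# Provision an NVadsA10_v5 node pool and label it so the NVIDIA DRA
# kubelet plugin lands on it (names are placeholders).
az aks nodepool add \
  --resource-group my-rg --cluster-name my-aks \
  --name a10pool --node-vm-size Standard_NV6ads_A10_v5 \
  --labels nvidia.com/gpu.present=true

# Install the NVIDIA DRA driver with the overrides called out in the post.
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \
  --namespace nvidia-dra-driver-gpu --create-namespace \
  --set gpuResourcesEnabledOverride=true \
  --set FeatureGates.IMEXDaemonsWithDNSNames=false
```

Once the driver is running, `kubectl get resourceslices` should show the A10 vGPU devices that each labeled node advertises.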
🤖 AI技术解读:这体现了人工智能技术在垂直领域的深入应用,展示了AI如何改变传统行业的工作方式。
📄 Full Original Text
Microsoft Adds DRA-Backed NVIDIA vGPU Support to AKS
[DevOps](/Devops/) | Mar 19, 2026 | Claudio Masolo

The Azure Kubernetes Service team shared a [detailed guide](https://blog.aks.azure.com/2026/03/06/dra-with-vGPUs-on-aks) on how to use [Dynamic Resource Allocation (DRA)](https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/) with NVIDIA vGPU technology on AKS. This update improves control and efficiency for shared GPU use in AI and media tasks.

[Dynamic Resource Allocation (DRA)](https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/) is now the standard for GPU resource use in Kubernetes. Instead of static resources like nvidia.com/gpu, GPUs are allocated dynamically using DeviceClasses and ResourceClaims. This change enhances scheduling and improves integration with virtualization technologies like NVIDIA vGPU.

The reason for combining these technologies is clear: virtual accelerators like NVIDIA vGPU often handle smaller tasks. They allow one physical GPU to be split among many users or applications. This setup is helpful for enterprise AI/ML development, fine-tuning, and audio/visual processing. vGPU offers predictable performance while still providing CUDA capabilities to containerized workloads.

On the infrastructure side, this feature relies on Azure's [NVadsA10_v5](https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/gpu-accelerated/nvadsa10v5-series?tabs=sizebasic) virtual machine series. Instead of assigning the whole GPU to one VM, vGPU technology partitions it into multiple fixed-size slices at the hypervisor layer. From Kubernetes' view, each VM shows one clear GPU device. The hypervisor sets capacity and memory limits, not the software.

The setup requires [Kubernetes 1.34](https://kubernetes.io/releases/1.34/) or newer. At this point, DRA primitives like deviceclasses and resourceslices are available. Teams provision a node pool with NVadsA10_v5 instances and apply a label (nvidia.com/gpu.present=true) for the NVIDIA DRA kubelet plugin as its node selector. They then deploy the NVIDIA DRA driver via Helm. The post highlights three important Helm flags for vGPU scenarios. The gpuResourcesEnabledOverride=true flag skips a check that prevents the NVIDIA DRA driver from installing with the legacy device plugin due to different GPU names. FeatureGates.IMEXDaemonsWithDNSNames=false disables an IMEX feature that requires a newer GRID driver version than what's supported on the A10 series in Azure.

(Diagram: DRA with A10 vGPU on AKS)

Beyond the baseline one-sixth slice (Standard_NV6ads_A10_v5), the series offers a one-third profile with 8 GB of accelerator memory and a one-half profile with 12 GB. Limits are enforced at the hypervisor layer, so AKS sees a single GPU device with predictable capacity. This gives platform teams flexibility to size GPU allocation based on workload needs without overprovisioning nodes.

The AKS team frames the broader significance as directional. As GPUs become first-class resources in Kubernetes, combining virtualized GPU with DRA offers a practical way to run shared, production-grade workloads. For large AKS deployments, especially in regulated or cost-sensitive industries, optimal GPU placement and utilization directly impact job throughput and infrastructure efficiency. Using DRA with vGPU helps organizations move from coarse node-level allocation to controlled, workload-driven GPU use at scale. Google Cloud is pursuing a [similar path on GKE](https://cloud.google.com/blog/products/compute/google-cloud-ai-infrastructure-at-nvidia-gtc-2026), focusing on DRA as a scheduling primitive for both GPUs and TPUs.

GKE's DRA support lets workloads use CEL expressions to filter devices with specific attributes. This allows a single manifest to deploy to different clusters with various GPU types without changes. Specifically for vGPU, Google recently previewed fractional G4 VMs using NVIDIA vGPU technology based on the RTX PRO 6000 Blackwell GPU, managed through GKE and combined with container binpacking for higher utilization. When scheduled via Google's Dynamic Workload Scheduler, fallback priorities can improve resource access.

About the Author: Claudio Masolo

🏢 About InfoQ
InfoQ is a well-known technology media outlet focused on the latest developments in artificial intelligence and technology innovation.
This article was compiled with AI assistance and is intended for learning and exchange only. Copyright belongs to the original author.

Image source: InfoQ
Article URL: http://searchkit.cn/article/15797