I want to know if it is possible to see the vGPU utilization per VM, not from within the guest OS but from the GRID K1 card itself. For instance, if GPU 0 is at 80%, it would be great to know that 45% of that number is coming from a specific VM. I looked through the forums but didn't find any specific posts on this. I tried the nvidia-smi CLI, for instance 'nvidia-smi -q', but while it shows detailed information on each physical GPU, including utilization, there is no per-VM utilization. Thanks, and I appreciate any help with this request.
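For readers who want the per-physical-GPU numbers described above in a script-friendly form, here is a minimal Python sketch. It assumes the installed nvidia-smi supports `--query-gpu` with `--format=csv,noheader,nounits`, which may not hold for every GRID host driver. The helper names and the sample text are hypothetical. As the post notes, this still reports utilization per physical GPU only, not per VM.

```python
import subprocess

# Fields requested from nvidia-smi; the per-VM breakdown asked about in the
# thread is NOT available here, only per-physical-GPU values.
QUERY = "index,name,utilization.gpu"


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        index, name, util = [field.strip() for field in line.split(",")]
        rows.append({
            "index": int(index),
            "name": name,
            # Some drivers print a placeholder instead of a number.
            "util_pct": int(util) if util.isdigit() else None,
        })
    return rows


def read_gpu_utilization():
    """Run nvidia-smi on the host and return one dict per physical GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)


# Hypothetical sample output, used so the sketch runs without a GPU.
SAMPLE = """0, GRID K1, 80
1, GRID K1, 12
"""

for gpu in parse_gpu_csv(SAMPLE):
    print(gpu)
```

On a real host, `read_gpu_utilization()` would replace the sample text.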
|
|
|
Also, the reason for this request is that we are finding high GPU utilization with a low number of users in our first vGPU deployment. Here is the setup:
- GRID K1
- K120Q vGPU profile
- 3-5 users per physical GPU equals 95%+ utilization
User behavior is fairly standard, with MS Office applications and Chrome/IE with hardware acceleration on.
|
|
|
Hi Taskman, I'm afraid it isn't, but this is a need that has been highlighted to product management. You can monitor the framebuffer for each VM, though, but not the GPU processing. https://virtuallyvisual.wordpress.com/2015/07/27/limitations-in-monitoring-shared-nvidia-gpu-technologies/ (this is worth reading, as it explains how trying to monitor inside the VM would be very misleading) https://virtuallyvisual.wordpress.com/2015/09/09/monitoring-nvidia-gpu-usage-of-the-framebuffer-for-vgpu-and-gpu-passthrough/ The framebuffer usage may give you an idea of which applications are using the GPU, but this has to be checked via the process manager in the VM. I will pass this feedback on to the product managers. Best wishes, Rachel
|
|
|
BTW, what is the stack, e.g. XenDesktop + vSphere? The K1 is essentially 4 x K600 cards, and Chrome (and browsers in general) can be very GPU hungry, see: https://www.virtualexperience.no/2015/11/05/mythbusting-browser-gpu-usage-on-xenapp/ So with 4-5 users per pGPU, each user gets roughly 1/4 of a K600, which is not much if they are watching a lot of video. The codecs/graphics mode in use will also use CPU and/or GPU (the new Blast Extreme on View uses the GPU). Best wishes, Rachel
|
|
|
Thanks, Rachel, for the quick response. We are currently using vSphere ESXi 6 with Horizon View 6.2. I looked at both links and I'm trying to find where they mention how to monitor the framebuffer. Is that a perfmon counter or a CLI command? Thanks!
|
|
|
JS made an explanation video a year ago saying that it is not possible to monitor vGPU performance. Some third parties sell vGPU performance monitoring tools: http://goliathtechnologies.com/software/goliath-nvidia-performance-monitor/. (But it seems to use the same global "NVML/nvidia-smi" performance metrics. Updated by RachelBerry.) Is there any reliable API for detailed vGPU performance monitoring today? It would be very useful if "nvidia-smi pmon" worked in Dom0 (for monitoring per DomU) or in a vGPU DomU (for monitoring per process inside the DomU). This is again about the mysterious GPU time-sliced scheduler configuration and observability (https://gridforums.nvidia.com/default/topic/743/talks-with-the-developers/gpu-scheduler-for-vgpu/)! It brings shame on NVIDIA if, after 3 years, it is still not possible to monitor detailed vGPU performance per DomU or per process in a DomU. Best regards, M.C. Edited 05/06/2016
|
|
|
You can use passthrough mode and run gpusizer for one VM to test how much a single VM uses before deciding on a vGPU profile. Another way is to make sure you have only one VM with an active vGPU on one pGPU; then you can use GPU-Z or uberAgent to get GPU usage per process and per VM with correct results. If you have multiple VMs on the same physical GPU, you cannot rely on in-VM metrics. Browser video usage on a K1 is typically 3-4 users per physical GPU (pGPU), but CPU usage is still quite intense with GPU-enabled browsers. Have a look at these blog posts: http://www.virtualexperience.no/2015/11/05/mythbusting-browser-gpu-usage-on-xenapp/ http://www.virtualexperience.no/2015/01/07/im-100-sure-that-100-is-not-100/
|
|
|
Thank you, everyone, for the quick responses; this is a very active forum. Now, let me go back to the reason for this post. We are doing our first vGPU deployment and I'm noticing 99% utilization with 3-5 users on a pGPU. I have completed more testing with Process Explorer and GPU-Z. When a VM is on a pGPU by itself, it consumes 20-25% GPU utilization. At first it looked like the tools would not help, as they showed no process using GPU resources. However, once I loaded Chrome (GPU accelerated), Process Explorer registered it as using GPU resources. No other process is using any GPU resources, and I have tried stripping the running processes down to just the system, VMware, and NVIDIA processes. I reproduced the behavior on two different hosts, each with two K1 cards in use. As soon as my user logs into the desktop on the VM, pGPU utilization hits 20-25%. This makes it seem like the cause is the parent image or a configuration issue on the K1.
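The numbers in this post line up with a simple back-of-the-envelope calculation, sketched below in Python. It assumes each logged-in session adds the same fixed 20-25% baseline and that the baselines add linearly. That is a simplification, not a description of how the vGPU scheduler actually accounts for time, and the function name is hypothetical.

```python
def idle_load(sessions, per_session_pct):
    """Idle pGPU load if every session adds the same fixed baseline.

    Assumes the baselines simply add up, capped at 100%.
    """
    return min(100, sessions * per_session_pct)


# 20% and 25% are the low and high per-session baselines reported in the post.
for sessions in range(1, 6):
    low = idle_load(sessions, 20)
    high = idle_load(sessions, 25)
    print(f"{sessions} session(s): {low}-{high}% before any application work")
```

Under these assumptions, 4 to 5 idle sessions are already enough to saturate one pGPU, which is consistent with the 95%+ readings reported for 3-5 users.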
|
|
|
There can be many hidden consumers of the vGPU:
- The Windows Aero compositor; try switching it off.
- The remoting protocol also uses vGPU resources (nvifr/nvfbc/nvenc); try accessing the desktop over the direct console.
- There can also be a problem with power management. Many times I see 25% utilization after two Windows guests start, but only 5% after the third Windows guest. Check the "Perf" column in nvidia-smi to see whether the GPU stays in P8 (power saving state). It cannot be regulated from outside (https://gridforums.nvidia.com/default/topic/378/).
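The "Perf" column check suggested above can also be scripted. The following Python sketch assumes nvidia-smi accepts `--query-gpu=index,pstate,utilization.gpu --format=csv,noheader,nounits` on the host. The function name, the 20% threshold, and the sample text are hypothetical choices for illustration.

```python
def low_power_busy_gpus(csv_text, util_threshold=20):
    """Return indices of GPUs that report P8 while showing notable utilization.

    Expects lines of the form "<index>, <pstate>, <utilization>".
    """
    flagged = []
    for line in csv_text.strip().splitlines():
        index, pstate, util = [field.strip() for field in line.split(",")]
        if pstate == "P8" and util.isdigit() and int(util) >= util_threshold:
            flagged.append(int(index))
    return flagged


# Hypothetical sample: GPU 0 is busy but still in P8, GPU 1 is in P0,
# GPU 2 is in P8 but nearly idle.
SAMPLE = "0, P8, 25\n1, P0, 90\n2, P8, 3\n"

print(low_power_busy_gpus(SAMPLE))  # [0]
```

A GPU that appears in this list is a candidate for the power-management effect described in the post. It does not prove that effect is the cause.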
|
|
|
Thanks @mcerveny, I didn't realize the PCoIP protocol played a part in the pGPU utilization number. As soon as I disconnect from the VM, utilization drops to 0%, as it was the only VM on that pGPU. It then goes back to 22-25% when I log back in. I'm guessing (hoping) this is not normal. In case it is a factor, here are the GPO settings being used for PCoIP.
PCoIP Session Variables/Not Overridable Administrator Settings
- Configure clipboard redirection: Enabled (Configure clipboard redirection: Enabled client to server only)
- Configure the PCoIP session bandwidth floor: Enabled (Set PCoIP session bandwidth floor in kilobits per second to: 2000)
- Turn off Build-to-Lossless feature: Enabled
|
|
|
New update: I was digging through the release notes and came across this in the known issues section. This is exactly what I'm seeing, but it is odd that I didn't find anyone else reporting it online. Can anyone at NVIDIA provide a status on Ref. #1735009? Thanks
From the release notes of 361.40/362.13:
nvidia-smi shows high GPU utilization for vGPU VMs with active Horizon sessions
Description: vGPU VMs with an active Horizon connection utilize a high percentage of the GPU on the ESXi host. The GPU utilization remains high for the duration of the Horizon session even if there are no active applications running on the VM.
Version:
Workaround: None
Status: Open
Ref. #1735009
|
|
|
Hi Taskman, That issue is open with VMware for resolution. I'm afraid I don't know the root cause or any workaround, and it doesn't affect every session. PCoIP itself doesn't use the GPU for encoding, but it does query the APIs to read directly from the framebuffer. Blast (since 7.0) will use the GPU for encoding when accessed from a client with a single display. Magnar and mcerveny have pretty much covered all the other likely causes. Remember that the K1 is a pretty small GPU, so it's easy to load it up with a few browser apps. Often, though it is counterintuitive, the cards with just 2 GPUs (K2 / M60) can give better performance and density if the application load needs GPU resources more than graphics memory.
|
|
|
As M.C points out, there are third-party tools like Goliath, which is very good. They use the NVIDIA APIs and those derived from them by the hypervisor vendors, and they work closely with us to ensure the APIs are used properly and interoperability is good. However, they are limited, as nvidia-smi is, by the underlying capabilities of the card to provide per-VM GPU resource usage information, so it's not functionality a third party can provide either. Best wishes, Rachel
|
|
|
Thanks, everyone. I just spoke with VMware, and my issue matches the known issue in the release notes. They are escalating it on their end to NVIDIA. For the POC we are in, I had changed the deployment to depth-first instead of breadth-first in order to do a load test and identify potential issues like this. For now, I'll switch back to breadth-first to mitigate this issue until NVIDIA releases a resolution. Given @JasonSouthern's comments on the capabilities of the K1s and the numbers we are seeing, I am also going to contact NVIDIA about an eval of the M60 as part of our POC. I'll update this post once that occurs. Thanks again.
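To see why switching back to breadth-first mitigates the issue, here is a toy Python model of the two placement policies. It is not the hypervisor's actual algorithm. The 4 pGPUs match the K1 description earlier in the thread, while the 8 vGPU slots per pGPU and the function name `place` are assumptions made for illustration.

```python
def place(vm_count, pgpu_count=4, slots_per_pgpu=8, policy="breadth"):
    """Return the number of VMs on each pGPU after placing vm_count VMs.

    breadth: each VM goes to the least-loaded pGPU (spread out).
    depth:   each VM goes to the most-loaded pGPU that still has a free slot.
    Ties are broken by the lowest pGPU index.
    """
    if policy not in ("breadth", "depth"):
        raise ValueError(f"unknown policy: {policy}")
    if vm_count > pgpu_count * slots_per_pgpu:
        raise ValueError("not enough vGPU slots for that many VMs")
    load = [0] * pgpu_count
    for _ in range(vm_count):
        candidates = [i for i, used in enumerate(load) if used < slots_per_pgpu]
        pick = min if policy == "breadth" else max
        target = pick(candidates, key=lambda i: load[i])
        load[target] += 1
    return load


print(place(6, policy="depth"))    # [6, 0, 0, 0]
print(place(6, policy="breadth"))  # [2, 2, 1, 1]
```

With the 20-25% idle baseline per session reported earlier in the thread, six sessions stacked on one pGPU would saturate it, while spreading them leaves each pGPU at roughly 25-50% before any application work.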
|
|
|
Hi folks, Support have now published a KB article explaining framebuffer monitoring, both on the host and per VM. So while you can't get GPU resource usage per VM, this may be useful for understanding your application use: http://nvidia.custhelp.com/app/answers/detail/a_id/4108/ Best wishes, Rachel
|
|
|
@Taskman, I think the K1 may be underpowered for your purposes. The M60 solution with passthrough licensing is quite affordable, will give you a lot more GPU power, and will scale much better. That's the route we're in the process of implementing.
|
|
|
Taskman, any update on this 20-25% GPU utilization issue when initiating a Horizon PCoIP session? This is definitely a contributing factor to my lack of GRID performance. Pascal
|
|
|
Hi Pascal, It's an issue identified in the VMware stack (i.e. not one NVIDIA can resolve), so you need to raise a ticket with them and request a fix (although I don't believe one has been released yet). We are tracking it and passing cases on to VMware. We have a KB article in draft:
Symptom / Error: High GPU load is seen with vSphere/View deployments and NVIDIA GRID vGPU; this may be seen even when sessions/VMs are idle. nvidia-smi shows high GPU utilization for vGPU VMs with an active Horizon session. vGPU VMs with an active Horizon connection utilize a high percentage of the GPU on the ESXi host. The GPU utilization remains high for the duration of the Horizon session even if there are no active applications running on the VM.
NVIDIA Ref. #1735009
Workaround / Solution: There is currently no workaround. Affected customers need to raise a support case with VMware, who hope to release a fix in a future release of their product. The issue is within the Horizon View product, so it is not an issue NVIDIA can resolve.
Affected Products: VMware Horizon View 7.0 and earlier when using NVIDIA GRID vGPU and NVIDIA GRID cards (K1, K2, M60, M6, M10).
Citrix Products: This issue only affects VMware Horizon View and the related Blast Extreme and PCoIP protocols. Citrix XenDesktop/XenApp and HDX/ICA are unaffected by this issue.
References: This issue is documented in the latest release notes (Version 361.40 / 362.13) for NVIDIA GRID vGPU for VMware.
|
|
|
Many thanks, Rachel. I have just raised a ticket with VMware. Thanks again. Pascal
|
|
|
Re: Symptom / Error: High GPU load is seen with vSphere/View deployments and NVIDIA GRID vGPU; this may be seen even when sessions/VMs are idle. nvidia-smi shows high GPU utilization for vGPU VMs with an active Horizon session. vGPU VMs with an active Horizon connection utilize a high percentage of the GPU on the ESXi host. The GPU utilization remains high for the duration of the Horizon session even if there are no active applications running on the VM. NVIDIA Ref. #1735009
VMware have released a fix for the Blast Extreme protocol with the VMware Horizon 7.0.1 update. Users with issues on PCoIP need to continue to raise the need for a fix for that protocol with VMware. Best wishes, Rachel
|
|
|