Hello.

I have questions and a proposal for the NVidia developers. There is little information about the true function of the GPU scheduler. Is the scheduler just simple round-robin? Is it programmable? Is it programmed from dom0 (e.g. by the vgpu/libnvidia-vgpu process in Dom0)?

More sophisticated schedulers have existed for more than a decade. If you look at network hardware you can see many more advanced schedulers (https://en.wikipedia.org/wiki/Network_scheduler). Because part of NVidia's background comes from Sun Microsystems, there is a more sophisticated example of a processor scheduler in SunOS/Solaris: the combination of the Fair Share Scheduler (FSS) (which implements sharing, including hierarchical shares via zones/projects) and dynamic pools (which implement capping and pinning/binding) is VERY powerful, simple to implement, and has been demonstrating its power for nearly 20 years.

Can the GPU scheduler be more sophisticated? If yes, there are several practical goals:
- If the share is programmable, then the restriction that all vGPUs on one physical GPU must be of one type (for example k120q) should be removed!
- If the share is hierarchically programmable, then CUDA should be available in all vGPU types!
- If the scheduler has pinning/binding capability (to SMX), then performance should be boosted due to fewer instruction and data cache misses!
- If the scheduler (probably non-hierarchical) can be moved to domU for the Grid 2.0 "full" profiles M6-8Q and M60-8Q, removing the dom0 overhead and enabling CUDA in domU, then the same feature should be available for k180q and k280q (yes, I am still optimistic that NVidia HQ will allow this feature and more to be backported to the K1/K2 grid)!

Is there any observability API (performance monitor API) for the GPU scheduler, per vGPU (in Dom0) and per process inside a vGPU (in DomU)? (https://gridforums.nvidia.com/de ... utilization-per-vm/)

Thanks for answers,
Martin

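To make the FSS-style proposal concrete, here is a minimal, hypothetical sketch (not NVIDIA's implementation) of the policy described above: each vGPU gets a share (a guaranteed minimum ratio, with unused time redistributed) and an optional cap. The class and function names (VgpuClient, pick_next, ...) are invented for illustration only.

```python
# Hypothetical illustration of an FSS-style time-slice scheduler for vGPUs:
# shares give a guaranteed minimum ratio, unused demand is redistributed,
# and an optional cap limits a vGPU even when the GPU is otherwise idle.
# All names here are invented for this sketch.
from dataclasses import dataclass

@dataclass
class VgpuClient:
    name: str
    share: float          # relative weight (like Solaris FSS shares)
    cap: float = 1.0      # max fraction of GPU time (1.0 = uncapped)
    runnable: bool = True # does this vGPU have work queued?
    used: float = 0.0     # GPU time consumed so far

def pick_next(clients, horizon):
    """Pick the runnable, un-capped client furthest below its fair share."""
    active = [c for c in clients if c.runnable and c.used < c.cap * horizon]
    if not active:
        return None
    total_share = sum(c.share for c in active)  # redistribute unused shares
    # deficit = entitled time minus consumed time; largest deficit runs next
    return max(active, key=lambda c: horizon * c.share / total_share - c.used)

def run(clients, slices=12, quantum=1.0):
    now = 0.0
    for _ in range(slices):
        nxt = pick_next(clients, now + quantum)
        now += quantum
        if nxt:
            nxt.used += quantum
    for c in clients:
        print(f"{c.name}: {c.used:.0f}/{now:.0f} time units")

run([VgpuClient("k120q-A", share=1),
     VgpuClient("k120q-B", share=1),
     VgpuClient("k160q-C", share=2, cap=0.5)])
```

With these made-up numbers the capped client ends up with exactly half the GPU time and the remaining slices are shared between the other two, which is the behaviour (guaranteed share plus cap, nothing wasted) the post argues for.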
9 replies
Hi Martin,

The restriction on homogeneous (all the same) vGPU types could, I guess, be lifted; however, to my mind it's a bit like ordinary fixed-size arrays in a program: a fixed size means many things can be done efficiently. The need to avoid memory fragmentation, particularly as GPUs are reassigned (I'm thinking of the day when vMotion and similar become possible), would also be a consideration. Some restrictions are imposed by the need to ensure continual and ongoing testing, QA and regression testing. Back-porting always requires investment in extra QA and testing, not just for us but also for the OEMs' test labs. All sorts of things are possible, but we must maintain quality and reliability.

It is possible to pin and cap CPUs, but my own experience has been extremely mixed, particularly with CAD/3D applications - reverse pinning PTC Creo actually improved performance, and the intuitive pinning degraded it because of some very strange semaphore behaviour, IIRC. Too many configuration options can often mean users get themselves in a real muddle.

I'm not an expert in this area - I'm hoping someone who is will pop along. With every feature request, though, we need to know what the user story/business case is: why you _need_ to mix vGPU types, and evidence that it's worth a substantial expansion in the test matrix, etc.

Best wishes,
Rachel

There is a "breadth-first" allocation mechanism for vGPU startup that is optimal for performance, but the first allocation determines the vGPU profile for the whole GPU and is not movable. For example, start four new k120q on a K1 and the next new k160q is unstartable, while the old k120q instances are unmovable. Yes, there is also "depth-first", but it has a performance impact when four k120q share one GPU. This leads to lower UX (user experience, the NVidia buzzword for this year) for this five-VM/VDI example.

Best regards, M.C>

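Here is a small sketch of the two placement strategies as described above. The board layout (4 physical GPUs, with 4x k120q or 1x k160q per GPU) and all names are assumptions for illustration; this is not how the GRID manager is actually implemented.

```python
# Toy model of vGPU placement on a 4-GPU board (e.g. a GRID K1), assuming
# each physical GPU is locked to the profile of the first vGPU placed on it.
# Capacities and names are illustrative only.
CAPACITY = {"k120q": 4, "k160q": 1}   # assumed vGPUs per physical GPU

class PhysGpu:
    def __init__(self, idx):
        self.idx, self.profile, self.vgpus = idx, None, []

    def can_host(self, profile):
        if self.profile is None:
            return True                      # empty GPU accepts any profile
        return self.profile == profile and len(self.vgpus) < CAPACITY[profile]

    def place(self, profile, name):
        self.profile = profile
        self.vgpus.append(name)

def allocate(gpus, profile, name, policy):
    candidates = [g for g in gpus if g.can_host(profile)]
    if not candidates:
        return False
    if policy == "breadth-first":            # emptiest GPU first -> best perf
        target = min(candidates, key=lambda g: len(g.vgpus))
    else:                                    # depth-first: pack the fullest GPU
        target = max(candidates, key=lambda g: len(g.vgpus))
    target.place(profile, name)
    return True

for policy in ("breadth-first", "depth-first"):
    gpus = [PhysGpu(i) for i in range(4)]
    for i in range(4):
        allocate(gpus, "k120q", f"vm{i}", policy)
    ok = allocate(gpus, "k160q", "vm4", policy)
    print(policy, [f"GPU{g.idx}:{g.profile}x{len(g.vgpus)}" for g in gpus],
          "k160q placed:", ok)
```

Breadth-first spreads the four k120q over all four GPUs (good performance) but then the k160q cannot start anywhere; depth-first packs them onto one GPU so the k160q fits, at the cost of four VMs contending for a single GPU - exactly the trade-off complained about above.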
Hi Martin,

The breadth-first and depth-first allocations are functionality implemented by XenServer/XenCenter and by the equivalent in VMware. I'm wondering whether you really need more control in the management tools. I'm still somewhat wary that this could expand the QA matrix substantially; a lot of customers have enough users, or sufficiently similar apps, that they can pool them easily. I haven't heard a large number of people telling me that having homogeneous VMs per pGPU is a big issue...

Best wishes,
Rachel

Hi Martin,

I had a word with the product management team at Citrix, and whilst they could possibly tweak the distribution, it would still only apply at start of day. As a longer-term goal they feel VMotion/XenMotion is the way forward, since it would balance load as needed (this is something both Citrix/VMware and NVIDIA are keen to achieve long term).

Best wishes,
Rachel

Grid 5.0

New "QoS scheduler" for Pascal chips: I do not know whether this "QoS scheduler" for Pascal is just a marketing-branded "fixed/equal share scheduler". "... Pascal has a new hardware feature called Preemption that allows Compute on vGPU profiles. Preemption is a feature that allows task context switching. It gives the GPU the ability to essentially pause and resume a task ..."
- see https://gridforums.nvidia.com/default/topic/1604/nvidia-grid-vgpu/compute-mode-quot-prohibited-quot-grid-m60-/post/5161/#5161
- see http://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/10
- search for "preemption" in http://international.download.nvidia.com/geforce-com/international/pdfs/GeForce_GTX_1080_Whitepaper_FINAL.pdf
- search for "preemption" in http://on-demand.gputechconf.com/gtc/2016/presentation/s6810-swapna-matwankar-optimizing-application-performance-cuda-tools.pdf
- cuDeviceGetAttribute() - CU_DEVICE_ATTRIBUTE_COMPUTE_PREEMPTION_SUPPORTED (a query sketch follows below)
- https://devtalk.nvidia.com/default/topic/1023524/system-management-and-monitoring-nvml-/-vgpu-management-qos-api-/
- docs: https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#changing-vgpu-scheduling-policy
- BUT compute preemption isn't exposed as a programmer-visible control!

Now it is clear that NVidia rediscovered the wheel - "preemption" - in the Pascal chip. Welcome to the year 1964! (See https://en.wikipedia.org/wiki/Computer_multitasking#Preemptive_multitasking.) This disclosure explains all the pitfalls with vGPU and CUDA in previous chip generations: the vGPU paravirtualized driver was unable to force an SMX/SMM context switch and depended heavily on the guest drivers' cooperative multitasking (limited by the FRL) and on the guest operating system. Unbelievable; shame, shame, shame on NVidia!

CUDA is now enabled in all "GRID P*-*Q" profiles.

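For reference, here is a minimal sketch of querying the preemption attribute mentioned in the list above through the CUDA driver API via ctypes. The enum value 90 for CU_DEVICE_ATTRIBUTE_COMPUTE_PREEMPTION_SUPPORTED is what the CUDA 8 era cuda.h defines as far as I recall; verify it against your own headers before relying on it.

```python
# Sketch: ask the CUDA driver whether the device supports compute preemption.
# Uses ctypes against libcuda; the attribute value (90) is taken from cuda.h
# for CUDA 8+ and should be double-checked against your installed headers.
import ctypes

CU_DEVICE_ATTRIBUTE_COMPUTE_PREEMPTION_SUPPORTED = 90  # from cuda.h (CUDA 8+)

cuda = ctypes.CDLL("libcuda.so.1")
assert cuda.cuInit(0) == 0, "cuInit failed"

device = ctypes.c_int()
assert cuda.cuDeviceGet(ctypes.byref(device), 0) == 0, "cuDeviceGet failed"

supported = ctypes.c_int()
rc = cuda.cuDeviceGetAttribute(
    ctypes.byref(supported),
    CU_DEVICE_ATTRIBUTE_COMPUTE_PREEMPTION_SUPPORTED,
    device,
)
print("compute preemption supported:",
      bool(supported.value) if rc == 0 else f"error {rc}")
```

On Pascal this should report True; on Kepler/Maxwell parts it should report False, which matches the generational difference discussed above.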
New "observability": a per-process utilization API (usable with drivers >= r375), with the finally disclosed functions nvmlDeviceGetProcessUtilization() and nvmlDeviceGetVgpuProcessUtilization() (see https://devtalk.nvidia.com/default/topic/934756/system-management-and-monitoring-nvml-/per-process-statistics-nvidia-smi-pmon-/).

Let's wait a few more years for pinning/binding on SMX/SMM/SMP to become cache-effective, and for mixing vGPU profiles on one GPU ...

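A minimal sketch of polling the per-process utilization API named above from Python. It assumes a pynvml (nvidia-ml-py) release that exposes nvmlDeviceGetProcessUtilization - older bindings do not have it - and the sample field names follow NVML's nvmlProcessUtilizationSample_t struct; treat both as assumptions, not a reference.

```python
# Sketch of polling per-process GPU utilization through NVML from Python.
# Assumes a pynvml/nvidia-ml-py build exposing nvmlDeviceGetProcessUtilization
# (older bindings lack it); field names follow nvmlProcessUtilizationSample_t
# and may differ slightly in your binding.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

last_ts = 0  # only samples newer than this timestamp are returned
try:
    while True:
        try:
            samples = pynvml.nvmlDeviceGetProcessUtilization(handle, last_ts)
        except pynvml.NVMLError_NotFound:
            samples = []  # no new samples since last_ts (assumed error mapping)
        for s in samples:
            print(f"pid={s.pid} sm={s.smUtil}% mem={s.memUtil}% "
                  f"enc={s.encUtil}% dec={s.decUtil}%")
            last_ts = max(last_ts, s.timeStamp)
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()
```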
Hi Martin,

CUDA is enabled for all profiles on Pascal GPUs (A, B & Q) (App, vPC & vDWS).

As for mixing FB (framebuffer) profiles on the same physical GPU, a few of us raised this with NVIDIA engineering a while back; however, there are reasons why it hasn't been offered. As you say, hopefully this will be added as a feature as the technology develops.

Regards,
Ben

CUDA/OpenCL is only available in the P*-*Q profiles (https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#features-grid-vgpu).

The digitally signed /usr/share/nvidia/vgpu/vgpuConfig.xml takes precedence over /usr/share/nvidia/vgx/*.conf (check with "egrep -i 'cuda|vgpuType|signature' /usr/share/nvidia/vgpu/vgpuConfig.xml" and "grep cuda_enabled /usr/share/nvidia/vgx/*.conf") (https://gridforums.nvidia.com/default/topic/258/nvidia-grid-vgpu/documentation-for-vgpu-configs/post/2087/#2087) ... you should post your /usr/share/nvidia/vgpu/vgpuConfig.xml and /usr/bin/nvidia-vgpud.

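If grepping the signed XML is too coarse, a small sketch like the following walks vgpuConfig.xml and prints any element or attribute whose name or value mentions CUDA, a vGPU type, or a signature. The file's schema is not publicly documented, so the script deliberately makes no assumptions about element names.

```python
# Sketch: scan the signed vGPU config for anything CUDA- or vgpuType-related,
# equivalent in spirit to the egrep above. The schema of vgpuConfig.xml is not
# documented publicly, so this walks every element generically.
import xml.etree.ElementTree as ET

KEYWORDS = ("cuda", "vgputype", "signature")

tree = ET.parse("/usr/share/nvidia/vgpu/vgpuConfig.xml")
for elem in tree.iter():
    hits = {k: v for k, v in elem.attrib.items()
            if any(w in (k + str(v)).lower() for w in KEYWORDS)}
    if hits or any(w in elem.tag.lower() for w in KEYWORDS):
        print(elem.tag, hits or (elem.text or "").strip())
```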
My apologies, you're correct.

I've just re-checked, and those were evaluation drivers, not production. Production drivers do not have this functionality. Please note that I've edited my post above to remove the incorrect driver information, so as not to add confusion for anyone else reading this.

Nvidia updated the scheduler slides.

As expected, the "QoS" title was removed (the new preemptive schedulers are far from true QoS). You can use the old "Shared/Best Effort/Time Sliced Scheduler" with cooperative multitasking, OR you can use the "Fixed/Equal Share" schedulers with preemptive multitasking and with card performance lost due to "empty/unused" slots. It is not possible to redistribute "unused" slots! The "slots" per VM should be programmable (e.g. set a ratio/share (minimum guaranteed, with unused time redistributed) and set a maximum (capping)!). (The scheduler is chosen by a driver parameter; see https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#changing-vgpu-scheduling-policy.)

[Slide: updated summary with "QoS" removed]
[Slide: Shared/Best Effort/Time Sliced Scheduler based on cooperative multitasking]
[Slide: Fixed/Equal Share schedulers based on preemptive multitasking, with performance lost to "empty/unused slots"]
[Slide: update from GTC-EU-2017]

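To illustrate the complaint about unused slots, here is a small, hypothetical simulation (not based on NVIDIA's implementation) comparing a fixed equal-share schedule with a work-conserving one that redistributes the slots of idle VMs. The demand figures are invented.

```python
# Toy comparison of a fixed equal-share schedule (idle VMs still own their
# slots, so their time is wasted) with a work-conserving schedule that
# redistributes unused slots. Demands are made-up fractions of one GPU.
def fixed_share(demands):
    slot = 1.0 / len(demands)                 # each VM owns an equal slot
    return [min(d, slot) for d in demands]    # unused remainder is wasted

def work_conserving(demands):
    got = [0.0] * len(demands)
    remaining = 1.0
    active = set(range(len(demands)))
    while remaining > 1e-9 and active:
        share = remaining / len(active)       # redistribute equally
        progressed = False
        for i in list(active):
            take = min(share, demands[i] - got[i])
            got[i] += take
            remaining -= take
            progressed = progressed or take > 0
            if demands[i] - got[i] <= 1e-9:
                active.discard(i)             # VM satisfied, free its slot
        if not progressed:
            break
    return got

demands = [0.05, 0.10, 0.90, 0.90]            # two near-idle VMs, two busy VMs
for name, policy in (("fixed/equal share", fixed_share),
                     ("work-conserving", work_conserving)):
    got = policy(demands)
    print(f"{name:18s} utilisation={sum(got):.2f} "
          f"per-VM={[round(g, 2) for g in got]}")
```

With these made-up demands the fixed schedule leaves roughly a third of the GPU idle, while the work-conserving one gives the leftover slots to the busy VMs - which is the "programmable ratio/share plus cap" behaviour being asked for.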