How to Investigate Linux Performance Problems with Off-CPU Flame Graphs: An Overview

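This article tells the story of using an off-CPU flame graph to analyze a program's latency (spent mostly on acquiring a lock), find the bottleneck, and eliminate it. It is well worth reading. 阅码场 did not have enough time to translate it into Chinese, so readers are encouraged to read the English directly.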

The Setup

As a performance engineer at MemSQL, one of my primary responsibilities is to ensure that customer Proof of Concepts (POCs) run smoothly. I was recently asked to assist with a big POC, where I was surprised to encounter an uncommon Linux performance issue. I was running a synthetic workload of 16 threads (one for each CPU core). Each one simultaneously executed a very simple query (select count(*) from t where i > 5) against a columnstore table.

In theory, this ought to be a CPU bound operation, since it would be reading from a file that was already in the disk buffer cache. In practice, our cores were spending about 50% of their time idle.

In this post, I’ll walk through some of the debugging techniques and reveal exactly how we reached resolution.

What were our threads doing?

After confirming that our workload was indeed using 16 threads, I looked at the state of our various threads. In every refresh of my htop window, I saw that a handful of threads were in the D state, corresponding to “Uninterruptible sleep”.
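The same check can be scripted rather than read off an htop screen. The following is a minimal sketch, assuming a Linux /proc filesystem; the helper names `parse_state` and `count_thread_states` are illustrative and were not part of the original investigation. It reads the state field from each `/proc/<pid>/task/<tid>/stat` file and tallies how many threads are in each state.

```python
import os
from collections import Counter


def parse_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/task/<tid>/stat line.

    The command name (field 2) is wrapped in parentheses and may itself
    contain spaces or parentheses, so we split after the LAST ')'.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def count_thread_states(pid, proc_root="/proc"):
    """Count the threads of `pid` by scheduler state (R, S, D, ...)."""
    counts = Counter()
    task_dir = os.path.join(proc_root, str(pid), "task")
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                counts[parse_state(f.read())] += 1
        except (FileNotFoundError, ProcessLookupError):
            # The thread exited between listdir() and open(); skip it.
            continue
    return counts
```

Calling `count_thread_states(pid)` a few times in a row on the process of interest and watching the `"D"` count gives roughly the same signal as repeated htop refreshes.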

Why were we going off CPU?

At this point, I generated an off-cpu flamegraph using Linux perf_events to see why we entered this state. Off-CPU means that instead of looking at what is keeping the CPU busy, you look at what is preventing it from being busy by things happening elsewhere (e.g. waiting for IO or a lock). The normal way to generate these visualizations is to use perf inject -s, but the machine I tested on did not have a new enough version of perf. Instead I had to use an awk script I had previously written:

$ sudo perf record --call-graph=fp -e 'sched:sched_switch' -e 'sched:sched_stat_sleep' -e 'sched:sched_stat_blocked' --pid $(pgrep memsqld | head -n 1) -- sleep 1

[ perf record: Woken up 1 times to write data ]

[ perf record: Captured and wrote 1.343 MB perf.data (~58684 samples) ]

$ sudo perf script -f time,comm,pid,tid,event,ip,sym,dso,trace -i perf.data | ~/FlameGraph/stackcollapse-perf-sched.awk | ~/FlameGraph/flamegraph.pl --color=io --countname=us > off-cpu.svg

Note: recording scheduler events via perf record can have a very large overhead and should be used cautiously in production environments. This is why I wrap the perf record around a sleep 1 to limit the duration.

In an off-cpu flamegraph, the width of a bar is proportional to the total time spent off cpu. Here we see a lot of time is spent in rwsem_down_write_failed.
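To make the "width is proportional to off-CPU time" idea concrete, here is a minimal sketch of the aggregation step that comes before rendering. It assumes the scheduler events have already been parsed into (stack, microseconds) pairs. That input shape, the function name `fold_stacks`, and the sample frames such as `sys_mmap` are hypothetical; they do not reproduce the output of perf script or the awk script mentioned above. The result is the "folded" text that flamegraph.pl reads: one line per unique stack, with semicolon-separated frames followed by a count.

```python
from collections import defaultdict


def fold_stacks(samples):
    """Sum off-CPU time per unique call stack.

    `samples` is an iterable of (frames, microseconds) pairs, where
    `frames` lists function names from the root caller down to the
    function that blocked.
    """
    totals = defaultdict(int)
    for frames, micros in samples:
        totals[";".join(frames)] += micros
    # One line per stack: "root;...;leaf <total microseconds>".
    return [f"{stack} {total}" for stack, total in sorted(totals.items())]


# Hypothetical samples, for illustration only (not measured data).
samples = [
    (["memsqld", "sys_mmap", "rwsem_down_write_failed"], 900),
    (["memsqld", "sys_mmap", "rwsem_down_write_failed"], 600),
    (["memsqld", "sys_read", "io_schedule"], 200),
]

for line in fold_stacks(samples):
    print(line)
```

Because identical stacks are summed, a stack that blocks often or for a long time ends up with a large total, and the renderer draws it as a wide bar.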

From the repeated calls to rwsem_down_read_failed and rwsem_down_write_failed, we see that the culprit was mmap contending in the kernel on the mm->mmap_sem lock:

down_write(&mm->mmap_sem);

ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff, &populate);

up_write(&mm->mmap_sem);

This was causing every mmap syscall to take 10-20ms (almost half the latency of the query itself). MemSQL was so fast that we had inadvertently written a benchmark for Linux mmap!

The fix was simple: we switched from using mmap to using the traditional file read interface. After this change, we nearly doubled our throughput and became CPU bound, as we expected.
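For readers unfamiliar with the two interfaces, the following is a minimal sketch of the same scan done both ways, loosely mirroring the `i > 5` filter from the benchmark query. It is not MemSQL's code: the one-byte-per-value file layout, the function names, and the chunk size are assumptions made purely for illustration. Each `mmap` call creates a new mapping in the process address space, which is the operation that contended on mmap_sem above, whereas the read path copies data into a user buffer without creating a new mapping.

```python
import mmap
import os
import tempfile

CHUNK = 64 * 1024  # arbitrary chunk size chosen for this sketch


def count_gt_mmap(path, threshold):
    """Count bytes greater than `threshold` by mapping the file."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return 0  # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            total = 0
            for off in range(0, size, CHUNK):
                # Slicing the mapping yields bytes; iterating yields ints.
                total += sum(1 for b in m[off:off + CHUNK] if b > threshold)
            return total


def count_gt_read(path, threshold, chunk_size=CHUNK):
    """Count bytes greater than `threshold` using plain read() calls."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            total += sum(1 for b in chunk if b > threshold)
    return total


# Small self-check on a temporary file holding the values 0..9 repeated.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(10)) * 1000)
    demo_path = tmp.name
try:
    print(count_gt_mmap(demo_path, 5), count_gt_read(demo_path, 5))
finally:
    os.remove(demo_path)
```

Both functions return the same count; the difference that mattered in this story is which kernel path each one exercises when many threads run it at once.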

For more information and discussion around Linux performance, check out the original post on my personal blog.

Download MemSQL Community Edition to run your own performance tests for free today: memsql.com/download

Alex Reece is a systems and performance engineer. He believes in active benchmarking, root cause analysis, and fast code.



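Original title: Investigating Linux Performance Problems with Off-CPU Flame Graphs

Source: WeChat ID LinuxDev, WeChat official account Linux阅码场. Please credit the source when reposting.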
