Handy scripts, 1: finding the top N CPU-consuming threads of a Java process

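I. Purpose

For various reasons (deadlocks, infinite loops, lock waits), a Java process sometimes shows abnormally high CPU usage. Knowing how much CPU each thread in the process is consuming, together with that thread's stack, makes the problem much easier to locate.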

II. Usage

1. Code

#!/bin/bash
############################################################################
# @desc: dump the stacks of the top N CPU-consuming threads of a Java process
# @author: leifu
############################################################################
if [ $# -eq 0 ]; then
    echo "please enter java pid"
    exit 1
fi
# Locate jstack: prefer $JAVA_HOME, otherwise fall back to PATH
jstack_cmd=""
if [[ -n "$JAVA_HOME" ]]; then
    jstack_cmd="$JAVA_HOME/bin/jstack"
else
    r=$(which jstack 2>/dev/null)
    if [[ -n "$r" ]]; then
        jstack_cmd=$r
    else
        echo "can not find jstack"
        exit 2
    fi
fi
pid=$1
topN=${2:-10}  # number of threads to report; defaults to 10 if omitted
now=$(date "+%Y%m%d%H%M%S")
jstackFile=jstack_${pid}_${now}.txt
topFile=top_${pid}_${now}.txt
resultFile=result_${pid}_${now}.txt
# Save the jstack snapshot locally
"$jstack_cmd" "$pid" > "${jstackFile}"
# Save the top -H -p thread list locally, sorted by CPU usage in descending order
# (the row whose thread ID equals the pid, i.e. the main thread, is skipped)
top -H -b -n 1 -p "$pid" | sed '1,/^$/d' | sed '1d;/^$/d' | awk -v pid="$pid" '$1 != pid' | sort -nrk9 | head -n "${topN}" > "${topFile}"
# Look up each thread in the jstack snapshot
while read -r line
do
    echo "$line" | awk '{print "tid: "$1," cpu: %"$9}' >> "${resultFile}"
    tid_0x=$(printf "%x" "$(echo "$line" | awk '{print $1}')")
    # jstack prints the native thread ID in hex as nid=0x...
    grep -w -A20 "nid=0x${tid_0x}" "${jstackFile}" | sed -n '1,/^$/p' >> "${resultFile}"
done < "${topFile}"
# Back up the jstack and top snapshots
mkdir -p backup
mv "${jstackFile}" backup/
mv "${topFile}" backup/

For implementation details, see Section III.

2. Running the script

bash jstack_top_cpu.sh ${pid} ${topN}

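3. Output

The backup directory contains the jstack snapshot and the top -H -p ${pid} snapshot.
The current directory contains result_${pid}_${timestamp}.txt, which is the final result.

A sample result looks like this: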

tid: 37745 cpu: %27.7
"DubboServerHandler-10.10.34.11:20880-thread-254" daemon prio=10 tid=0x00007ffaa0053000 nid=0x9371 waiting on condition [0x00007ff576458000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000006bfddea40> (a java.util.concurrent.FutureTask)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:422)
at java.util.concurrent.FutureTask.get(FutureTask.java:199)
at com.sohu.tv.mobil.async.service.impl.AsyncComponentImpl.executeMultiTask(AsyncComponentImpl.java:76)
at com.sohu.tv.mobil.service.feature.data.UgcFeatureBatchDataComponentImplV5.get(UgcFeatureBatchDataComponentImplV5.java:115)
at com.sohu.tv.mobil.common.data.impl.BatchDataServiceImpl.mget(BatchDataServiceImpl.java:88)
at com.sohu.tv.mobil.service.feature.impl.UgcFeatureServiceImpl.getUgcFeatureBatch(UgcFeatureServiceImpl.java:66)
at com.sohu.tv.mobil.service.strategy.ugc.traffic.AbstractPairTrafficStrategy.getResult(AbstractPairTrafficStrategy.java:104)
at com.sohu.tv.mobil.service.api.BlogServiceImpl.dealStrategy(BlogServiceImpl.java:217)
at com.sohu.tv.mobil.service.api.BlogServiceImpl.doRecommend(BlogServiceImpl.java:130)
at com.sohu.tv.mobil.service.api.BlogServiceImpl.newRecommend(BlogServiceImpl.java:61)
at com.alibaba.dubbo.common.bytecode.Wrapper25.invokeMethod(Wrapper25.java)
at com.alibaba.dubbo.rpc.proxy.javassist.JavassistProxyFactory$1.doInvoke(JavassistProxyFactory.java:46)
at com.alibaba.dubbo.rpc.proxy.AbstractProxyInvoker.invoke(AbstractProxyInvoker.java:72)
at com.alibaba.dubbo.rpc.protocol.InvokerWrapper.invoke(InvokerWrapper.java:53)
at com.alibaba.dubbo.rpc.protocol.dubbo.filter.TraceFilter.invoke(TraceFilter.java:78)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:91)
tid: 37261 cpu: %25.9
"DubboServerHandler-10.10.34.11:20880-thread-136" daemon prio=10 tid=0x00007ffaa8022000 nid=0x918d waiting on condition [0x00007ff57df3d000]
java.lang.Thread.State: TIMED_WAITING (parking)

III. How the script works

1. Use top -H -p ${pid} to list all threads of the process, sort them by CPU usage in descending order, and keep the top N

37745 root 20 0 34.3g 23g 18m S 27.7 76.3 53:45.06 java
37261 root 20 0 34.3g 23g 18m S 25.9 76.3 15:58.38 java
38167 root 20 0 34.3g 23g 18m S 24.0 76.3 16:48.27 java
37070 root 20 0 34.3g 23g 18m R 24.0 76.3 2:50.49 java
37400 root 20 0 34.3g 23g 18m S 22.2 76.3 5:24.99 java
38116 root 20 0 34.3g 23g 18m S 20.3 76.3 5:48.78 java
37437 root 20 0 34.3g 23g 18m S 16.6 76.3 11:34.33 java
37895 root 20 0 34.3g 23g 18m S 14.8 76.3 22:10.59 java
37005 root 20 0 34.3g 23g 18m S 11.1 76.3 9:52.72 java
37426 root 20 0 34.3g 23g 18m S 5.5 76.3 1:45.73 java
37105 root 20 0 34.3g 23g 18m S 3.7 76.3 8:53.36 java
38602 root 20 0 34.3g 23g 18m S 1.8 76.3 0:04.94 java
37775 root 20 0 34.3g 23g 18m S 1.8 76.3 0:24.95 java
37431 root 20 0 34.3g 23g 18m S 1.8 76.3 3:45.74 java
37310 root 20 0 34.3g 23g 18m S 1.8 76.3 0:49.59 java
37277 root 20 0 34.3g 23g 18m S 1.8 76.3 3:19.19 java
37179 root 20 0 34.3g 23g 18m S 1.8 76.3 0:45.26 java
37169 root 20 0 34.3g 23g 18m S 1.8 76.3 0:54.43 java
36972 root 20 0 34.3g 23g 18m S 1.8 76.3 2:50.36 java
36965 root 20 0 34.3g 23g 18m S 1.8 76.3 2:49.13 java
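
The filtering in this step can be sketched as a small function that reads the batch-mode output of top from standard input. This is a minimal sketch rather than the script's exact pipeline: the name top_threads is hypothetical, and it assumes the default top layout, in which the summary area ends at the first blank line, the next line is the column header, and %CPU is the ninth column.

```shell
# top_threads N: read `top -H -b -n 1 -p <pid>` output on stdin and print
# the N thread rows with the highest %CPU (hypothetical helper name).
top_threads() {
    local n=${1:-10}
    sed '1,/^$/d' |                  # drop the summary area, up to the first blank line
        sed '1d;/^$/d' |             # drop the column header and any blank lines
        LC_ALL=C sort -k9,9nr |      # %CPU is assumed to be column 9
        head -n "$n"
}
```

It could be used as: top -H -b -n 1 -p "$pid" | top_threads 5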

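2. Use jstack ${pid} to capture a snapshot of the process's thread stacks

3. Convert the thread IDs from step 1 (first column) to hexadecimal and look up the matching stack in the jstack snapshot (grep -A)

For example, decimal 37745 is 0x9371 in hexadecimal, and jstack reports it as nid=0x9371, so the corresponding stack can be found in the snapshot: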

cat jstack_36645_20170712135221.txt | grep -A 20 0x9371
"DubboServerHandler-10.10.34.11:20880-thread-254" daemon prio=10 tid=0x00007ffaa0053000 nid=0x9371 waiting on condition [0x00007ff576458000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000006bfddea40> (a java.util.concurrent.FutureTask)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:422)
at java.util.concurrent.FutureTask.get(FutureTask.java:199)
at com.sohu.tv.mobil.async.service.impl.AsyncComponentImpl.executeMultiTask(AsyncComponentImpl.java:76)
at com.sohu.tv.mobil.service.feature.data.UgcFeatureBatchDataComponentImplV5.get(UgcFeatureBatchDataComponentImplV5.java:115)
at com.sohu.tv.mobil.common.data.impl.BatchDataServiceImpl.mget(BatchDataServiceImpl.java:88)
at com.sohu.tv.mobil.service.feature.impl.UgcFeatureServiceImpl.getUgcFeatureBatch(UgcFeatureServiceImpl.java:66)
at com.sohu.tv.mobil.service.strategy.ugc.traffic.AbstractPairTrafficStrategy.getResult(AbstractPairTrafficStrategy.java:104)
at com.sohu.tv.mobil.service.api.BlogServiceImpl.dealStrategy(BlogServiceImpl.java:217)
at com.sohu.tv.mobil.service.api.BlogServiceImpl.doRecommend(BlogServiceImpl.java:130)
at com.sohu.tv.mobil.service.api.BlogServiceImpl.newRecommend(BlogServiceImpl.java:61)
at com.alibaba.dubbo.common.bytecode.Wrapper25.invokeMethod(Wrapper25.java)
at com.alibaba.dubbo.rpc.proxy.javassist.JavassistProxyFactory$1.doInvoke(JavassistProxyFactory.java:46)
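
The conversion and lookup above can be sketched as two small functions. Both names (tid_to_nid, stack_for_tid) are hypothetical, and the sketch swaps grep -A for awk's paragraph mode: it assumes a plain jstack dump in which thread entries are separated by blank lines, so the whole entry is returned instead of a fixed 20 lines.

```shell
# tid_to_nid TID: convert a decimal thread ID from top into the
# "0x..." form that jstack prints in its nid= field.
tid_to_nid() {
    printf '0x%x\n' "$1"
}

# stack_for_tid FILE TID: print the entry of one thread from a saved
# jstack dump. With RS="" awk treats each blank-line-separated block
# as one record, so the full stack is printed however long it is.
stack_for_tid() {
    local file=$1 nid
    nid="nid=$(tid_to_nid "$2")"
    awk -v nid="$nid" '
        BEGIN { RS = "" }
        # Compare whole fields so that nid=0x937 does not match nid=0x9371
        { for (i = 1; i <= NF; i++) if ($i == nid) { print; next } }
    ' "$file"
}
```

For instance, stack_for_tid jstack_36645_20170712135221.txt 37745 would print the entry of the thread whose nid is 0x9371.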
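Takeaway: besides sorting by CPU usage, you can also sort by the TIME+ column to find the threads that have accumulated the most CPU time over their lifetime.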
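
Sorting by TIME+ needs one extra step, because values such as 53:45.06 do not sort correctly as plain numbers. The following is a minimal sketch under stated assumptions: the function name by_cpu_time is hypothetical, the input is thread rows in the layout shown in Section III with the header already removed, and TIME+ is column 11 in the form minutes:seconds.hundredths.

```shell
# by_cpu_time N: read thread rows on stdin and print the N rows with the
# largest accumulated CPU time (hypothetical helper name).
# Assumes TIME+ is column 11, formatted as minutes:seconds.hundredths.
by_cpu_time() {
    local n=${1:-10}
    # Prefix each row with TIME+ converted to seconds, sort on that key,
    # then strip the key again.
    awk '{ split($11, t, ":"); printf "%.2f\t%s\n", t[1] * 60 + t[2], $0 }' |
        LC_ALL=C sort -k1,1nr |
        head -n "$n" |
        cut -f2-
}
```

Fed the rows saved in the backup top snapshot, by_cpu_time 5 would list the five threads with the most accumulated CPU time.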