Linux Performance Testing Tools

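The Linux Benchmark Suite Homepage lists many Linux performance testing tools, covering CPU, RAM, ROM, cache, network, and more.
We introduced and used iozone in an earlier post. Today we focus on measuring RAM read and write performance.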


lmbench

Test tools

The suite contains the benchmarks listed below. We mainly use the bw_mem tool to run the memory read and write tests.

1. Bandwidth benchmarks

  • Cached file read
  • Memory copy (bcopy)
  • Memory read
  • Memory write
  • Pipe
  • TCP

2. Latency benchmarks

  • Context switching
  • Networking: connection establishment, pipe, TCP, UDP, and RPC hot potato
  • File system creates and deletes
  • Process creation.
  • Signal handling
  • System call overhead
  • Memory read latency

3. Miscellaneous

  • Processor clock rate calculation

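Cross-compiling

1. Download the source from the "How do I get LMbench?" page or from GitHub.
2. Set the cross toolchain and the compiler flags.
Edit CC and EXFLAGS in src/Makefile as follows: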

CC=/home/xxx/work2/xxx/imx8x/prebuilt/toolchains/aarch64-imx8x-linux/bin/aarch64-poky-linux-gcc
EXFLAGS=-static -march=armv8-a -mfpu=neon -mfloat-abi=hard -mtune=cortex-a35 -funroll-loops

3. Build
Run make OS=arm-linux build.
If you hit the following error:

cd src && make
make[1]: Entering directory '/home/xxx/work2/util/lmbench3/src'
make[2]: Entering directory '/home/xxx/work2/util/lmbench3/src'
make[2]: *** No rule to make target '../SCCS/s.ChangeSet', needed by 'bk.ver'. Stop.
make[2]: Leaving directory '/home/xxx/work2/util/lmbench3/src'
Makefile:117: recipe for target 'lmbench' failed
make[1]: *** [lmbench] Error 2
make[1]: Leaving directory '/home/xxx/work2/util/lmbench3/src'
Makefile:20: recipe for target 'build' failed
make: *** [build] Error 2

The fix is to create the missing file, an empty SCCS/s.ChangeSet in the top-level lmbench3 directory:

mkdir SCCS
cd SCCS
touch s.ChangeSet

Once the build finishes, the executables are generated in the bin/arm-linux/ directory of the source tree. For a brief guide to each executable, see:
lmbench1.0 manual pages
lmbench

Memory performance test

Copy the bw_mem binary to the board and run the following commands:

@android:/var # ./bw_mem 256M wr                                              
268.44 573.30
@android:/var # ./bw_mem 256M fwr
268.44 3034.69
@android:/var #
@android:/var #
@android:/var # ./bw_mem 256M rd
268.44 896.46
@android:/var # ./bw_mem 256M frd
268.44 867.99

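The test modes passed as the second argument (rd, wr, fwr, and so on) are described in the comment in the bw_mem source: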

/*
* rd - 4 byte read, 32 byte stride
* wr - 4 byte write, 32 byte stride
* rdwr - 4 byte read followed by 4 byte write to same place, 32 byte stride
* cp - 4 byte read then 4 byte write to different place, 32 byte stride
* fwr - write every 4 byte word
* frd - read every 4 byte word
* fcp - copy every 4 byte word
*
* All tests do 512 byte chunks in a loop.
*
* XXX - do a 64bit version of this.
*/

Each output line has two columns: megabytes, megabytes_per_second.
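
As a sanity check on the first column, the sketch below parses the two-column lines from the run above and compares the reported size with the requested 256M. It assumes that 256M means 256 × 1024 × 1024 bytes and that bw_mem reports megabytes in units of 10^6 bytes, which is consistent with the 268.44 shown. The helper name parse_bw_mem is made up for this example.

```python
# Hypothetical helper: parse one "megabytes megabytes_per_second" line from bw_mem.
def parse_bw_mem(line):
    size_mb, rate_mb_s = (float(field) for field in line.split())
    return size_mb, rate_mb_s

# Results copied from the run above, keyed by test mode.
results = {
    "wr": "268.44 573.30",
    "fwr": "268.44 3034.69",
    "rd": "268.44 896.46",
    "frd": "268.44 867.99",
}

# Assumption: 256M is 256 * 1024 * 1024 bytes, reported in units of 10^6 bytes.
requested_mb = 256 * 1024 * 1024 / 1e6

for mode, line in results.items():
    size_mb, rate_mb_s = parse_bw_mem(line)
    # The reported size should match the requested size after rounding.
    assert abs(size_mb - requested_mb) < 0.01
    print(f"{mode:>4}: {size_mb:.2f} MB at {rate_mb_s:.2f} MB/s")
```

Under these assumptions the first column is simply the buffer size echoed back, so only the second column carries the measurement.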

References

Lmbench-Ti
LMbench - Tools for Performance Analysis


STREAM

The official description is:

The STREAM benchmark is a simple synthetic benchmark program that measures sustainable memory bandwidth (in MB/s) and the corresponding computation rate for simple vector kernels.

In short, STREAM measures the memory bandwidth a system can sustain (in MB/s) while running a few simple vector kernels, along with the corresponding computation rate.

Cross-compiling

1. Download the source

mkdir STREAM
cd STREAM
wget -r -R 'index*' -np -nH --cut-dirs=3 http://www.cs.virginia.edu/stream/FTP/Code/

2. Specify the cross toolchain:

CC=/home/xxx/work2/xxx/imx8x/prebuilt/toolchains/aarch64-imx8x-linux/bin/aarch64-poky-linux-gcc

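Several parameters have to be set at compile time (for example STREAM_ARRAY_SIZE and NTIMES), and they strongly affect the results. Reading the stream.c source directly is the best way to understand them.

3. Build the C version with make stream_c.exe, which produces the stream_c.exe executable.

4. Running it on the board gives: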

@android:/var # ./stream_c.exe                                                
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 134214 microseconds.
(= 134214 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            1731.3     0.106711     0.092416     0.125942
Scale:           1294.3     0.140618     0.123622     0.164197
Add:             1107.7     0.232000     0.216672     0.263167
Triad:            991.0     0.252958     0.242168     0.276101
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

I do not yet fully understand these results. They differ considerably from the lmbench bw_mem numbers above, and this needs further study.
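
The Best Rate column can at least be reproduced from the other numbers in the output. The sketch below assumes the byte accounting described in the stream.c comments: Copy and Scale touch two arrays per pass, Add and Triad touch three, and the rate is the bytes moved divided by the minimum time, in units of 10^6 bytes per second. The variable and function names are chosen for this example only.

```python
# Values taken from the STREAM output above.
ARRAY_ELEMENTS = 10_000_000
BYTES_PER_ELEMENT = 8

# Assumption (from the stream.c comments): number of arrays each kernel touches.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" column from the output, in seconds.
MIN_TIME = {"Copy": 0.092416, "Scale": 0.123622, "Add": 0.216672, "Triad": 0.242168}

def best_rate_mb_s(kernel):
    """Bytes moved in one pass divided by the best time, in 10^6 bytes/s."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * ARRAY_ELEMENTS
    return bytes_moved / MIN_TIME[kernel] / 1e6

# Memory per array in MiB, for comparison with the "Memory per array" line.
array_mib = ARRAY_ELEMENTS * BYTES_PER_ELEMENT / 2**20

print(f"Memory per array: {array_mib:.1f} MiB")
for kernel in ARRAYS_TOUCHED:
    print(f"{kernel}: {best_rate_mb_s(kernel):.1f} MB/s")
```

The computed rates agree with the reported 1731.3, 1294.3, 1107.7, and 991.0 MB/s to within the rounding of the printed times, which suggests that the Best Rate column is derived from the Min time column in this way.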

STREAM references:
STREAM: Sustainable Memory Bandwidth in High Performance Computers
What is STREAM
stream gitlab