0-Day CI Linux Kernel Performance Report (V5.13)

Published: 07/06/2021

By Beibei Si

Introduction

0-Day CI is an automated Linux kernel test service that provides comprehensive test coverage of the Linux kernel, spanning kernel build, static analysis, boot, functional, performance and power tests. This report shows recent observations of kernel performance status on IA platforms, based on the test results from the 0-Day CI service. It is structured in the following manner:

  • Section 2, test parameter description 
  • Section 3, merged regressions and improvements in v5.13 release candidates
  • Section 4, regressions and improvements captured by shift-left testing on developers’ and maintainers’ trees during the v5.13 release cycle
  • Section 5, performance comparison among different kernel releases
  • Section 6, test machine list

Test Parameter Descriptions

Here are the descriptions for each parameter/field used in the tests.

General
  • runtime: run the test case for a certain time period (seconds or minutes)
  • nr_task: if an integer, the number of processes/threads (to run the workload) for this job; the default is 1. If a percentage, it is relative to the CPU count, e.g. 200% means twice the number of CPUs.
  • nr_threads: alias of nr_task
  • iterations: number of times to repeat the job
  • test_size: test disk size or memory size
  • set_nic_irq_affinity: set NIC interrupt affinity
  • disable_latency_stats: latency_stats may introduce too much noise when there are many context switches; this option allows disabling it
  • transparent_hugepage: set the transparent hugepage policy (/sys/kernel/mm/transparent_hugepage)
  • boot_params:bp1_memmap: memmap boot parameters
  • disk:nr_pmem: number of pmem partitions used by the test
  • swap:priority: priority of the swap device, a value between -1 and 32767; the default is -1, and a higher value means higher priority

Test Machine
  • model: name of the Intel processor microarchitecture
  • brand: CPU brand name
  • cpu_number: number of CPUs
  • memory: memory size
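
As an illustration of the nr_task/nr_threads convention above (a sketch, not 0-Day CI's actual code), a percentage value resolves against the CPU count:

```python
import os

def resolve_nr_task(value, nr_cpu=None):
    """Resolve an lkp-style nr_task value to a process/thread count.

    An integer means an absolute count; a percentage such as "200%"
    is taken relative to the number of CPUs (200% -> 2 * nr_cpu).
    Illustrative sketch only, not 0-Day CI's implementation.
    """
    if nr_cpu is None:
        nr_cpu = os.cpu_count()
    s = str(value).strip()
    if s.endswith("%"):
        # keep at least one worker after rounding
        return max(1, int(nr_cpu * float(s[:-1]) / 100))
    return int(s)

print(resolve_nr_task("200%", nr_cpu=96))  # twice the CPU count
print(resolve_nr_task(16, nr_cpu=96))      # absolute count
```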

Linux Kernel V5.13 Release Test

Linus Torvalds released the 5.13 kernel and mentioned: “Of course, if the last week was small and calm, 5.13 overall is actually fairly large. In fact, it's one of the bigger 5.x releases, with over 16k commits (over 17k if you count merges), from over 2k developers. But it's a 'big all over' kind of thing, not something particular that stands out as particularly unusual.” Headline features in this release included the "misc" group controller, multiple sources for trusted keys, kernel stack randomization on every system call, support for Clang control-flow integrity enforcement, the ability to call kernel functions directly from BPF programs, minor-fault handling for userfaultfd(), the removal of /dev/kmem, the Landlock security module, and, of course, thousands of cleanups and fixes.

0-Day CI monitored the release closely to track the performance status on IA platforms. 0-Day observed 11 regressions and 10 improvements during the feature development phase for v5.13. We will share more detailed information together with the correlated patches that led to the results. Note that the assessment is limited by the test coverage 0-Day has now. The list is summarized in the observation summary section.

Observation Summary

0-Day CI observed 11 regressions and 10 improvements during the feature development phase for v5.13, which is in the time frame from v5.13-rc1 to v5.13 release.
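
Throughout this report, the quoted percentages are relative changes of the test indicator against the base kernel; since these indicators are throughput-style metrics (higher is better), a negative change is a regression. A minimal sketch of the arithmetic:

```python
def relative_change(base, new):
    """Percent change of a test indicator versus the base kernel."""
    return (new - base) / base * 100.0

def classify(base, new, threshold=1.0):
    """Label a result, assuming a higher indicator value is better."""
    pct = relative_change(base, new)
    if pct <= -threshold:
        return f"{pct:.1f}% regression"
    if pct >= threshold:
        return f"+{pct:.1f}% improvement"
    return "no significant change"

print(classify(100.0, 70.9))   # the tbench entry below is this shape
print(classify(100.0, 132.2))
```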

Test Indicator Mail Test Scenario Test Machine Development Base Status
aim7.jobs-per-min [sched,fair] 0c2de3f054: -4.1% regression

disk: 1BRD_48G

fs: xfs

test: sync_disk_rw

load: 600

cpufreq_governor: performance

lkp-cpl-4sp1 v5.12-rc2 merged at v5.13-rc1, no response from author yet
aim9.sync_disk_rw.ops_per_sec [sched] d27e9ae2f2: -2.1% regression

testtime: 300s

test: sync_disk_rw

cpufreq_governor: performance

lkp-knl-f1 v5.12-rc2 merged at v5.13-rc1, no response from author yet
fio.write_iops [mm] 8cc621d2f4: -21.8% regression

disk: 2pmem

fs: ext4

runtime: 200s

nr_task: 50%

time_based: tb

rw: randwrite

bs: 4k

ioengine: libaio

test_size: 200G

cpufreq_governor: performance

lkp-csl-2sp6 v5.12 merged at v5.13-rc1, author accepted the regression and provided a test patch, the regression reduced to -2.9%
netperf.Throughput_tps [smp] a32a4d8a81: -2.1% regression

ip: ipv4

runtime: 300s

nr_threads: 1

cluster: cs-localhost

test: UDP_RR

cpufreq_governor: performance

lkp-csl-2ap3 v5.12-rc2 merged at v5.13-rc1, author couldn't reproduce in his environment, but he would work further
stress-ng.fanotify.ops_per_sec [fanotify] 7cea2a3c50: -23.4% regression

nr_threads: 10%

disk: 1HDD

testtime: 60s

fs: ext4

class: os

test: fanotify

cpufreq_governor: performance

lkp-csl-2sp5 v5.12-rc3 merged at v5.13-rc1, author couldn't reproduce in his environment, but there’s already fix patch merged at v5.13-rc5, regression recovered to +32.2%
stress-ng.loop.ops_per_sec [block] c76f48eb5c: -99.9% regression

nr_threads: 10%

disk: 1HDD

testtime: 60s

fs: ext4

class: os

test: loop

cpufreq_governor: performance

lkp-csl-2sp5 v5.12-rc4 merged at v5.13-rc1, no response from author but there's already fix patch merged at v5.13-rc6, regression recovered to +101263.6% 
stress-ng.procfs.ops_per_sec [sched, debug] 3b87f136f8: -31.7% regression

nr_threads: 10%

disk: 1HDD

testtime: 60s

fs: ext4

class: os

test: procfs

cpufreq_governor: performance

lkp-csl-2sp5 v5.12-rc2 merged at v5.13-rc1, author accepted the regression and considered it expected
stress-ng.sigpending.ops_per_sec [perf/x86/intel] f83d2f91d2: -4.0% regression

nr_threads: 10%

disk: 1HDD

testtime: 60s

fs: ext4

class: os

test: sigpending

cpufreq_governor: performance

lkp-csl-2sp5 v5.12-rc3 merged at v5.13-rc1, no response from author yet
tbench.throughput-MB/sec [sched/fair] c722f35b51: -29.1% regression

nr_threads: 100%

cluster: cs-localhost

cpufreq_governor: performance

lkp-csl-2sp8 v5.12-rc2 merged at v5.13-rc1, author accepted the regression and is in progress to fix it
vm-scalability.throughput [mm] 2d146aa3aa: -2.8% regression

runtime: 300s

test: lru-file-mmap-read-rand

cpufreq_governor: performance

lkp-csl-2ap4 v5.12 merged at v5.13-rc1, no response from author yet
will-it-scale.per_thread_ops [spi] c7299fea67: -4.0% regression

nr_task: 100%

mode: thread

test: getppid1

cpufreq_governor: performance

lkp-csl-2sp9 v5.12-rc2 merged at v5.13-rc4, no response from author yet
Improvement
aim7.jobs-per-min [xfs] 1fea323ff0: 2.4% improvement

disk: 4BRD_12G

md: RAID1

fs: xfs

test: disk_rw

load: 3000

cpufreq_governor: performance

lkp-csl-2sp9 v5.12-rc4 merged at v5.13-rc1
phoronix-test-suite.npb.FT.A.total_mop_s [mm/gup] 31b912de13: 61.8% improvement

test: npb-1.3.1

option_a: FT.A

cpufreq_governor: performance

lkp-csl-2sp8 v5.12 merged at v5.13-rc1
pigz.throughput [tty] ffb324e6f8: 1.3% improvement

nr_threads: 25%

blocksize: 128K

cpufreq_governor: performance

lkp-knm02 v5.13-rc1 merged at v5.13-rc2
stress-ng.fanotify.ops_per_sec [fanotify] a8b98c808e: 32.2% improvement

nr_threads: 10%

disk: 1HDD

testtime: 60s

fs: ext4

class: filesystem

test: fanotify

cpufreq_governor: performance

lkp-csl-2sp7 v5.13-rc1 merged at v5.13-rc5
stress-ng.klog.ops_per_sec [printk] 996e966640: 1097.4% improvement

nr_threads: 10%

disk: 1HDD

testtime: 60s

fs: ext4

class: os

test: klog

cpufreq_governor: performance

lkp-csl-2sp5 v5.11 merged at v5.13-rc1
stress-ng.link.ops_per_sec [f2fs] b5d15199a2: 175.3% improvement

nr_threads: 10%

disk: 1HDD

testtime: 60s

fs: f2fs

class: filesystem

test: link

cpufreq_governor: performance

lkp-ivb-2ep1 v5.12-rc2 merged at v5.13-rc1 
stress-ng.loop.ops_per_sec [block] 990e78116d: 101263.6% improvement

nr_threads: 10%

disk: 1HDD

testtime: 60s

fs: ext4

class: os

test: loop

cpufreq_governor: performance

lkp-csl-2sp5 v5.12 merged at v5.13-rc6
will-it-scale.per_process_ops [objtool/x86] 9bc0bb5072: 5.6% improvement

nr_task: 16

mode: process

test: eventfd1

cpufreq_governor: performance

lkp-csl-2ap2 v5.12-rc5 merged at v5.13-rc1
will-it-scale.per_process_ops [bpf] a10787e6d5: 3.5% improvement

nr_task: 16

mode: process

test: mmap2

cpufreq_governor: performance

lkp-csl-2sp9 v5.11 merged at v5.13-rc1 
will-it-scale.per_thread_ops [x86/entry] fe950f6020: 5.2% improvement

nr_task: 50%

mode: thread

test: lseek1

cpufreq_governor: performance

lkp-csl-2sp9 v5.12-rc6 merged at v5.13-rc1
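
The stress-ng.loop entries above show why the fix for a -99.9% regression reports a five-digit recovery percentage: once throughput drops to 0.1% of baseline, merely returning to baseline is roughly a +99900% change relative to the regressed value. A quick check:

```python
base = 100.0                     # baseline throughput (arbitrary units)
regressed = base * (1 - 0.999)   # after the -99.9% regression -> 0.1
# percent change needed just to return to baseline from the regressed value
recovery_to_baseline = (base - regressed) / regressed * 100
print(recovery_to_baseline)      # about +99900%
# the reported +101263.6% therefore lands slightly above the old baseline
recovered = regressed * (1 + 101263.6 / 100)
print(recovered / base)          # roughly 1.01x of baseline
```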

stress-ng.procfs.ops_per_sec

Stress-ng will stress test a computer system in various selectable ways. It was designed to exercise various physical subsystems of a computer as well as the various operating system kernel interfaces.

Scenario: procfs Test on ext4

Commit 3b87f136f8 was reported to show a -31.7% regression in stress-ng.procfs.ops_per_sec compared to v5.12-rc2. It was merged into mainline at v5.13-rc1.

Correlated commits

3b87f136f8 sched,debug: Convert sysctl sched_domains to debugfs
branch tip/sched/core
report [sched, debug] 3b87f136f8: -31.7% regression
test scenario

nr_threads: 10%

disk: 1HDD

testtime: 60s

fs: ext4

class: os

test: procfs

cpufreq_governor: performance

test machine lkp-csl-2sp5
status merged on v5.13-rc1, author accepted the regression and considered it expected

stress-ng.klog.ops_per_sec

Stress-ng will stress test a computer system in various selectable ways. It was designed to exercise various physical subsystems of a computer as well as the various operating system kernel interfaces.

Scenario: klog Test on ext4

Commit 996e966640 was reported to show a 1097.4% improvement in stress-ng.klog.ops_per_sec compared to v5.11. It was merged into mainline at v5.13-rc1.

Correlated commits

996e966640 printk: remove logbuf_lock
branch linux-next/master
report [printk] 996e966640: 1097.4% improvement
test scenario

nr_threads: 10%

disk: 1HDD

testtime: 60s

fs: ext4

class: os

test: klog

cpufreq_governor: performance

test machine lkp-csl-2sp5
status merged on v5.13-rc1

tbench.throughput-MB/sec

Dbench is a load tester for various protocols such as iSCSI, NFS, SCSI and SMB; tbench produces just the network-load portion of that benchmark.

Scenario: throughput on Localhost

Commit c722f35b51 was reported to show a -29.1% regression in tbench.throughput-MB/sec compared to v5.12-rc2. It was merged into mainline at v5.13-rc1.

Correlated commits

c722f35b51 sched/fair: Bring back select_idle_smt(), but differently
branch linux/master
report [sched/fair] c722f35b51: -29.1% regression
test scenario

nr_threads: 100%

cluster: cs-localhost

cpufreq_governor: performance

test machine lkp-csl-2sp8
status merged on v5.13-rc1, author accepted the regression and is in progress to fix it

shift-left Testing

Beyond testing trees in the upstream kernel, 0-Day CI also tests developers’ and maintainers’ trees, which can catch issues earlier and reduce their wider impact; we call this “shift-left” testing. During the v5.13 release cycle, 0-Day CI reported 14 major performance regressions and 16 major improvements through shift-left testing. We will share more detailed information, together with the possible code changes that led to these results, for some of them, though the assessment is limited by the test coverage we have now. The whole list is summarized in the report summary section.

Report Summary

0-Day CI had reported 14 performance regressions and 16 improvements by doing shift-left testing on developer and maintainer repos.

Test Indicator Mail Test Scenario Test Machine Status
netperf.Throughput_Mbps [sched/fair] 69bc1b14c3: -36.9% regression

ip: ipv4 

runtime: 300s 

nr_threads: 1 

cluster: cs-localhost 

send_size: 5K 

test: TCP_SENDFILE 

cpufreq_governor: performance

lkp-csl-2ap3 currently not merged, author accepted the regression and is in progress to fix it
phoronix-test-suite.nuttcp.TCPTransfer.ServerToClient.127.0.0.1.mbits_sec [sched/fair] c3fbef9e2f: -14.5% regression

test: nuttcp-1.0.3

cpufreq_governor: performance

lkp-csl-2sp8 currently not merged, no response from author yet
phoronix-test-suite.supertuxkart.1280x1024.Windowed.Basic.1.OldMine.frames_per_second [drm/auth] b657695085: -90.8% regression

need_x: true

test: supertuxkart-1.5.2

option_a: Windowed

option_b: Basic

option_c: 1

option_d: Old Mine [Approximately 90k triangles]

cpufreq_governor: performance

lkp-cfl-d1 currently not merged, no response from author yet
phoronix-test-suite.x264.0.frames_per_second [sched] 7a86d20411: -3.8% regression

test: x264-2.5.0

cpufreq_governor: performance

lkp-csl-2sp8 currently not merged, no response from author yet
qperf.udp.recv_bw [sched/fair] 4a98e7875e: -68.1% regression

runtime: 600s

cluster: cs-localhost

cpufreq_governor: performance

lkp-knl-f1 currently not merged, no response from author yet
stress-ng.copy-file.ops_per_sec [libata] 2c76f9f255: -25.9% regression

nr_threads: 10%

disk: 1HDD

testtime: 60s

fs: f2fs

class: filesystem

test: copy-file

cpufreq_governor: performance

lkp-csl-2sp7 currently not merged, no response from author yet
stress-ng.dentry.ops_per_sec [xfs] 752d5513e6: -26.1% regression

nr_threads: 10%

disk: 1HDD

testtime: 60s

fs: xfs

class: filesystem

test: dentry

cpufreq_governor: performance

lkp-ivb-2ep1 currently not merged, no response from author yet
stress-ng.lockbus.ops_per_sec [clocksource] 8901ecc231: -9.5% regression

nr_threads: 100%

disk: 1HDD

testtime: 60s

class: memory

test: lockbus

cpufreq_governor: performance

lkp-csl-2sp5 currently not merged, author accepted the regression and is in progress to fix it
stress-ng.sync-file.ops_per_sec [xfs] 0279bbbbc0: -19.1% regression

nr_threads: 10%

disk: 1HDD

testtime: 60s

fs: xfs

class: filesystem

test: sync-file

cpufreq_governor: performance

lkp-csl-2sp7 currently not merged, no response from author yet
vm-scalability.throughput [mm] 9cdbf239b5: -12.4% regression

runtime: 300s

size: 8T

test: anon-cow-seq-mt

cpufreq_governor: performance

lkp-csl-2ap4 currently not merged, author doubted the regression and required more test
vm-scalability.throughput [mm/mremap] ecf8443e51: -29.4% regression

runtime: 300

thp_enabled: never

thp_defrag: always

nr_task: 8

nr_ssd: 1

test: swap-w-seq-mt

cpufreq_governor: performance

lkp-csl-2ap1 currently not merged, author accepted the regression and dropped the patch
will-it-scale.per_process_ops [mm/page_alloc] f846e2dfa4: -7.5% regression

nr_task: 16

mode: process

test: page_fault2

cpufreq_governor: performance

lkp-hsw-4ex1 currently not merged, no response from author yet
will-it-scale.per_process_ops [irqflags] 21db66c4ff: -4.5% regression

nr_task: 16

mode: process

test: dup1

cpufreq_governor: performance

lkp-csl-2ap2 currently not merged, no response from author yet
will-it-scale.per_thread_ops [kentry] 5c61d03b2b: -5.5% regression

nr_task: 50%

mode: thread

test: lseek2

cpufreq_governor: performance

lkp-csl-2sp9 currently not merged, no response from author yet
Improvement
aim7.jobs-per-min [xfs] 25f25648e5: 22.3% improvement

disk: 4BRD_12G

md: RAID0

fs: xfs

test: sync_disk_rw

load: 300

cpufreq_governor: performance

lkp-csl-2sp9 currently not merged
filebench.sum_operations/s [cpufreq] c1d6d2fd2f: 50.0% improvement

disk: 1HDD

fs: xfs

test: filemicro_create.f

cpufreq_governor: performance

lkp-knm02 currently not merged
fio.write_iops [sched] 9edeaea1bc: 4.7% improvement

disk: 2pmem

fs: xfs

mount_option: dax

runtime: 200s

nr_task: 50%

time_based: tb

rw: write

bs: 2M

ioengine: mmap

test_size: 200G

cpufreq_governor: performance

lkp-csl-2sp6 currently not merged
iozone.average_KBps [mm/page_alloc.c] 0b1c982be8: 8.1% improvement

disk: 2HDD

fs: xfs

iosched: kyber

cpufreq_governor: performance

lkp-knl-f1 currently not merged
netperf.Throughput_Mbps [mm/page_alloc] 2ee804b294: 118.7% improvement

ip: ipv4

runtime: 300s

nr_threads: 25%

cluster: cs-localhost

send_size: 10K

test: SCTP_STREAM_MANY

cpufreq_governor: performance

lkp-csl-2ap3 currently not merged
netperf.Throughput_Mbps [mm/page_alloc] 442b3ab9ff: 120.5% improvement

ip: ipv4

runtime: 300s

nr_threads: 25%

cluster: cs-localhost

send_size: 10K

test: SCTP_STREAM_MANY

cpufreq_governor: performance

lkp-csl-2ap3 currently not merged
netperf.Throughput_tps [iommu/vt] e93a67f5a0: 28.9% improvement

ip: ipv4

runtime: 300s

nr_threads: 16

cluster: cs-localhost

test: TCP_CRR

cpufreq_governor: performance

lkp-csl-2ap3 currently not merged
phoronix-test-suite.stress-ng.SystemVMessagePassing.bogo_ops_s [sched/fair] 5359f5ca0f: 77.9% improvement

test: stress-ng-1.2.2

option_a: System V Message Passing

cpufreq_governor: performance

lkp-csl-2sp8 currently not merged
stress-ng.clock.ops_per_sec [clocksource] df29d3cd5a: 4.5% improvement

nr_threads: 100%

disk: 1HDD

testtime: 60s

class: interrupt

test: clock

cpufreq_governor: performance

lkp-csl-2sp7 currently not merged
stress-ng.dir.ops_per_sec [xfs] 80d287de7b: 80.1% improvement

nr_threads: 10%

disk: 1HDD

testtime: 60s

fs: xfs

class: filesystem

test: dir

cpufreq_governor: performance

lkp-csl-2sp7 currently not merged
stress-ng.get.ops_per_sec [kernfs] 9a658329cd: 191.4% improvement

nr_threads: 10%

disk: 1HDD

testtime: 60s

fs: ext4

class: os

test: get

cpufreq_governor: performance

lkp-csl-2sp5 currently not merged
stress-ng.inode-flags.ops_per_sec [xfs] be05dd0e68: 31.0% improvement

nr_threads: 10%

disk: 1HDD

testtime: 60s

fs: xfs

class: filesystem

test: inode-flags

cpufreq_governor: performance

lkp-csl-2sp7 currently not merged
stress-ng.msg.ops_per_sec [trace] 3d3d9c072e: 18.5% improvement

nr_threads: 10%

disk: 1HDD

testtime: 60s

fs: ext4

class: os

test: msg

cpufreq_governor: performance

lkp-csl-2sp5 currently not merged
will-it-scale.per_process_ops [dma] 45c604d2b7: 3.0% improvement

nr_task: 100%

mode: process

test: mmap2

cpufreq_governor: performance

lkp-skl-fpga01 currently not merged
will-it-scale.per_process_ops [mm] c0af5203b9: 7.4% improvement

nr_task: 16

mode: process

test: mmap1

cpufreq_governor: performance

lkp-csl-2sp9 currently not merged
will-it-scale.per_thread_ops [kprobes] ec6aba3d2b: 3.8% improvement

nr_task: 100%

mode: thread

test: getppid1

cpufreq_governor: performance

lkp-csl-2sp9 currently not merged

netperf.Throughput_Mbps

Netperf is a benchmark that can be used to measure the performance of many different types of networking. It provides tests for both unidirectional throughput, and end-to-end latency.

Scenario: TCP_SENDFILE Test on Localhost

Commit 69bc1b14c3 was reported to show a -36.9% regression in netperf.Throughput_Mbps compared to v5.13-rc2.

Correlated commits

69bc1b14c3 sched/fair: skip select_idle_sibling() in presence of sync wakeups
branch aa/master
report [sched/fair] 69bc1b14c3: -36.9% regression
test scenario

ip: ipv4 

runtime: 300s 

nr_threads: 1 

cluster: cs-localhost 

send_size: 5K 

test: TCP_SENDFILE 

cpufreq_governor: performance

test machine lkp-csl-2ap3
status currently not merged, author accepted the regression and is in progress to fix it

qperf.udp.recv_bw

Qperf is a small benchmark program that measures bandwidth and latency between two nodes, over TCP, UDP, SCTP or RDMA transports.

Scenario: recv_bw Test on Localhost

Commit 4a98e7875e was reported to show a -68.1% regression in qperf.udp.recv_bw compared to v5.13-rc2.

Correlated commits

4a98e7875e sched/fair: skip select_idle_sibling() in presence of sync wakeups
branch aa/main
report [sched/fair] 4a98e7875e: -68.1% regression
test scenario

runtime: 600s

cluster: cs-localhost

cpufreq_governor: performance

test machine lkp-knl-f1
status currently not merged, no response from author yet

stress-ng.dir.ops_per_sec

Stress-ng will stress test a computer system in various selectable ways. It was designed to exercise various physical subsystems of a computer as well as the various operating system kernel interfaces.

Scenario: 1hdd-xfs-filesystem-dir

Commit 80d287de7b was reported to show an 80.1% improvement in stress-ng.dir.ops_per_sec compared to v5.13-rc2.

Correlated commits

802f1d522d xfs: journal IO cache flush reductions
branch linux-review/Dave-Chinner/xfs-CIL-and-log-optimisations/20210603-134113
report [xfs] 80d287de7b: 80.1% improvement
test scenario

nr_threads: 10%

disk: 1HDD

testtime: 60s

fs: xfs

class: filesystem

test: dir

cpufreq_governor: performance

test machine lkp-csl-2sp7
status currently not merged

Latest Release Performance Comparison

This section compares performance among different kernel releases, especially between v5.13 and v5.12. There are 50+ performance benchmarks running in 0-Day CI, and we selected 9 benchmarks that historically showed the most regressions/improvements reported by 0-Day CI. Typical configurations/parameters are used to run the tests. For some of the regressions in the comparison, 0-Day did not successfully bisect to a culprit commit, so no related report was sent out during the release development period, but they are still worth checking. The root causes of the regressions are not covered in this section.

In the following figures, the value on the Y-axis is the relative performance number. We used the v5.12 data as the base (performance number is 100).
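
The normalization is simple scaling; a sketch, using hypothetical raw numbers for illustration:

```python
def normalize(results, base_release="v5.12"):
    """Scale raw benchmark numbers so the base release reads as 100."""
    base = results[base_release]
    return {release: value / base * 100 for release, value in results.items()}

# hypothetical raw throughput numbers, for illustration only
raw = {"v5.11": 118.0, "v5.12": 120.0, "v5.13": 130.3}
print(normalize(raw))  # v5.12 maps to 100.0
```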

Test Suite: vm-scalability

Vm-scalability exercises functions and regions of the mm subsystem of the Linux kernel. The test below shows a typical result.

vm-scalability Test 1

Here are the test configuration and performance summary for the above test:

  vm-scalability Test 1
test machine

model: Cascade Lake

brand: Intel® Xeon® Platinum 9242 CPU @ 2.30GHz

cpu_number: 192

memory: 192G

runtime 300s
size 1T
vm-scalability test parameter test: lru-shm
performance summary vm-scalability.throughput on kernel v5.13 has -45.93% regression when comparing to v5.12

Test Suite: will-it-scale

Will-it-scale takes a test case and runs it from 1 through to n parallel copies to see if the test case will scale. It builds both process and threads based tests in order to see any differences between the two.
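
The idea can be sketched with a trivial workload run at increasing parallelism; per-copy throughput that falls as copies are added indicates poor scaling. This is an illustrative sketch, not will-it-scale's actual harness:

```python
import time
from multiprocessing import Pool

def workload(duration=0.1):
    """Count iterations of a trivial operation for a fixed interval."""
    end = time.perf_counter() + duration
    ops = 0
    while time.perf_counter() < end:
        ops += 1
    return ops

def scaling_curve(max_copies=4):
    """Run 1..n parallel copies and report per-copy operations."""
    curve = {}
    for n in range(1, max_copies + 1):
        with Pool(n) as pool:
            total = sum(pool.map(workload, [0.1] * n))
        curve[n] = total // n   # per-process ops, as will-it-scale reports
    return curve

if __name__ == "__main__":
    print(scaling_curve())
```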

will-it-scale Test 1

Here are the parameters and performance summary for the above test:

  will-it-scale Test 1 
test machine

model: Cascade Lake

brand: Intel® Xeon® Gold 6238M CPU @ 2.10GHz

cpu_number: 88

memory: 128G

nr_task 16
will-it-scale test parameter

mode: process

test: mmap2

summary will-it-scale.per_process_ops on kernel v5.13 has 8.59% improvement when comparing to v5.12

Test Suite: unixbench

UnixBench is a system benchmark to provide a basic indicator of the performance of a Unix-like system.

Unixbench Test 1

Here are the test configuration and performance summary for the above test:

  Unixbench Test 1 
test machine

model: Coffee Lake

brand: Intel® Xeon® E-2278G CPU @ 3.40GHz

nr_cpu: 16

memory: 32G

runtime 300s
nr_task 30%
unixbench test parameter test: syscall
performance summary unixbench.score on kernel v5.13 has 1.57% improvement when comparing to v5.12

Test Suite: reaim

Reaim updates and improves the existing open-source AIM7 benchmark. AIM7 is a traditional UNIX system-level benchmark suite used to test and measure the performance of a multiuser system.

reaim Test 1

Here are the test configuration and performance summary for the above test:

  reaim Test 1
test machine

model: Cascade Lake

brand: Intel® Xeon® Platinum 9242 CPU @ 2.30GHz

cpu_number: 192

memory: 192G

runtime 300s
nr_task 100%
disk No requirement
fs No requirement
reaim test parameter test: dbase
performance  summary reaim.jobs_per_min on kernel v5.13 has -42.19% regression when comparing to v5.12

Test Suite: pigz

Pigz, which stands for Parallel Implementation of GZip, is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data.
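
The principle pigz exploits can be illustrated by compressing independent fixed-size blocks on a process pool. This sketch uses zlib per block rather than pigz's actual gzip stream format:

```python
import zlib
from concurrent.futures import ProcessPoolExecutor

BLOCKSIZE = 128 * 1024  # 128K, as in the pigz test configuration above

def compress_parallel(data, workers=4):
    """Compress fixed-size blocks concurrently, one block per task."""
    blocks = [data[i:i + BLOCKSIZE] for i in range(0, len(data), BLOCKSIZE)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, blocks))

def decompress(blocks):
    """Reassemble the original data from independently compressed blocks."""
    return b"".join(zlib.decompress(b) for b in blocks)

if __name__ == "__main__":
    data = b"performance " * 100_000
    packed = compress_parallel(data)
    assert decompress(packed) == data
    print(len(data), "->", sum(len(b) for b in packed), "bytes")
```

Note that pigz itself emits a single gzip stream and primes each block's dictionary with the tail of the previous block for a better ratio; the independent-block sketch above trades a little compression for simplicity.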

pigz Test 1

Here are the test configuration and performance summary for the above test:

  pigz Test 1
test machine

model: Knights Mill

brand: Intel® Xeon Phi™ CPU 7255 @ 1.10GHz

cpu_number: 272

memory: 112G

nr_threads 100%
pigz Test parameter blocksize: 512K
performance  summary pigz.throughput on kernel v5.13 has 3.53% improvement when comparing to v5.12

Test Suite: netperf

Netperf is a benchmark that can be used to measure the performance of many different types of networking. It provides tests for both unidirectional throughput, and end-to-end latency.

netperf Test 1

Here are the test configuration and performance summary for the above test:

  netperf Test 1
test machine

model: Cascade Lake

brand: Intel® Xeon® Platinum 9242 CPU @ 2.30GHz

cpu_number: 192

memory: 192G

disable_latency_stats 1
set_nic_irq_affinity 1
runtime 300s
nr_threads 1
ip ipv4
netperf test parameter test: SCTP_STREAM_MANY
performance  summary netperf.Throughput_Mbps on kernel v5.13 has 11.98% improvement when comparing to v5.12

Test Suite: hackbench

Hackbench is both a benchmark and a stress test for the Linux kernel scheduler. Its main job is to create a specified number of pairs of schedulable entities (either threads or traditional processes) which communicate via either sockets or pipes, and to time how long it takes each pair to send data back and forth.
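
A single hackbench-style pair can be sketched with a socketpair and two processes bouncing a message, timing the exchanges (an illustration of the pattern, not hackbench itself):

```python
import os
import socket
import time

def pingpong_pair(rounds=1000, size=100):
    """Time one sender/receiver pair exchanging messages over a socketpair."""
    parent, child = socket.socketpair()
    payload = b"x" * size
    if os.fork() == 0:
        # child: echo every message back to the sender
        parent.close()
        for _ in range(rounds):
            child.sendall(child.recv(size))
        child.close()
        os._exit(0)
    child.close()
    start = time.perf_counter()
    for _ in range(rounds):
        parent.sendall(payload)
        parent.recv(size)
    elapsed = time.perf_counter() - start
    parent.close()
    os.wait()
    return elapsed

if __name__ == "__main__":
    print(f"{pingpong_pair():.4f}s for 1000 round trips")
```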

hackbench Test 1

Here are the test configuration and performance summary for the above test:

  hackbench Test 1
test machine

model: Cascade Lake

brand: Intel® Xeon® Platinum 9242 CPU @ 2.30GHz

nr_cpu: 192

memory: 192G

disable_latency_stats 1
nr_threads 100%
hackbench test parameter

mode: threads

ipc: socket

performance summary hackbench.throughput on kernel v5.13 has -17.04% regression when comparing to v5.12

Test Suite: fio

Fio was originally written to save the hassle of writing special test-case programs when one wants to test a specific workload, either for performance reasons or to find/reproduce a bug.

fio Test 1

fio Test 2

Here are the test configuration and performance summary for the above tests:

  fio Test 1 fio Test 2
test machine

model: Cascade Lake

brand: Intel® Xeon® CPU @ 2.20GHz

cpu_number: 192

memory: 192G

model: Ivy Bridge-EP

brand: Intel® Xeon® CPU E5-2697 v2 @ 2.70GHz

cpu_number: 48

memory: 112G

runtime 300s 300s
file system xfs btrfs
disk 1SSD 1HDD
boot_params No requirement No requirement
nr_task 32 1
time_based No requirement No requirement
fio test parameter

fio-setup-basic:

  rw: randwrite

  bs: 4k

  ioengine: io_uring

  test_size: 256G

fio-setup-basic:

  rw: randwrite

  bs: 4M

  ioengine: sync

  test_size: 128G

performance summary fio.write_bw_MBps on kernel v5.13 is almost the same as that in v5.12 fio.write_bw_MBps on kernel v5.13 has -12% regression when comparing to v5.12

Test Suite: ebizzy

Ebizzy is designed to generate a workload resembling common web application server workloads. It is highly threaded, has a large in-memory working set, and allocates and deallocates memory frequently.

ebizzy Test 1

Here are the test configuration and performance summary for the above test:

  ebizzy Test 1
test machine

model: Cascade Lake

brand: Intel® Xeon® Gold 6252 CPU @ 2.10GHz

cpu_number: 96

memory: 192G

transparent_hugepage No requirement
nr_threads 200%
iterations 100x
ebizzy test parameter duration: 10s
performance  summary ebizzy.throughput on kernel v5.13 is almost the same as that in v5.12

Test Machines

IVB Desktop

model Ivy Bridge
brand Intel® Core™ i3-3220 CPU @ 3.30GHz
cpu number 8
memory 16G
model Ivy Bridge
brand Intel® Core™ i3-3220 CPU @ 3.30GHz
cpu number 4
memory 8G

SKL SP

model Skylake
brand Intel® Xeon® CPU E5-2697 v2 @ 2.70GHz
cpu number 80
memory 64G

BDW EP

model Broadwell-EP
brand Intel® Xeon® CPU E5-2699 v4 @ 2.20GHz
cpu number 88
memory 128G

HSW EP

model Haswell-EP
brand Intel® Xeon® CPU E5-2699 v3 @ 2.30GHz
cpu number 72
memory 128G

IVB EP

model Ivy Bridge-EP
brand Intel® Xeon® CPU E5-2690 v2 @ 3.00GHz
cpu number 40
memory 384G
model Ivytown Ivy Bridge-EP
brand Intel® Xeon® CPU E5-2697 v2 @ 2.70GHz
cpu number 48
memory 64G

HSX EX

model Brickland Haswell-EX
brand Intel® Xeon® CPU E7-8890 v3 @ 2.50GHz
cpu number 144
memory 512G

Product and Performance Information

1. Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.