0-Day CI Linux Kernel Performance Report (V5.12)

Published: 12/18/2021

Introduction

0-Day CI is an automated Linux kernel test service that provides comprehensive test coverage of the Linux kernel, including kernel build, static analysis, boot, functional, performance, and power tests. This report summarizes recent observations of kernel performance on Intel Architecture (IA) platforms, based on test results from the 0-Day CI service. It is structured as follows:

  • Section 2, test parameter description 

  • Section 3, merged regressions and improvements in v5.12 release candidates

  • Section 4, regressions and improvements captured by shift-left testing in developers’ and maintainers’ trees during the v5.12 release cycle

  • Section 5, performance comparison among different kernel releases

  • Section 6, test machine list

Test Parameter Descriptions

Here are the descriptions for each parameter/field used in the tests.

Classification | Name | Description
General | runtime | Run the test case for a fixed period of time (seconds or minutes)
General | nr_task | Number of processes/threads that run the workload of this job. An integer is used as-is (default: 1); a percentage is relative to the CPU count, e.g. 200% means twice as many processes/threads as CPUs
General | nr_threads | Alias of nr_task
General | iterations | Number of times to repeat this job
General | test_size | Test disk size or memory size
General | set_nic_irq_affinity | Set NIC interrupt affinity
General | disable_latency_stats | Disable latency_stats, which may introduce too much noise when there are many context switches
General | transparent_hugepage | Set the transparent hugepage policy (/sys/kernel/mm/transparent_hugepage)
General | boot_params:bp1_memmap | memmap boot parameter
General | disk:nr_pmem | Number of pmem partitions used by the test
General | swap:priority | Priority of the swap device, a value between -1 and 32767. The default is -1; a higher value means a higher priority
Test Machine | model | Name of the Intel processor microarchitecture
Test Machine | brand | Brand name of the CPU
Test Machine | cpu_number | Number of CPUs
Test Machine | memory | Size of memory
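
The nr_task rule described above (an integer is a literal count, a percentage scales with the CPU count) can be sketched in Python as follows. The helper name resolve_nr_task is hypothetical, and the rounding behaviour (round down, never below 1) is an assumption; this report does not say how 0-Day CI handles fractional results.

```python
import math


def resolve_nr_task(nr_task: str, cpu_number: int) -> int:
    """Turn an nr_task/nr_threads setting into a process/thread count.

    Assumption (not stated in the report): percentages are rounded down
    and never yield fewer than one process/thread.
    """
    value = nr_task.strip()
    if value.endswith("%"):
        percent = float(value[:-1])
        return max(1, math.floor(cpu_number * percent / 100))
    # A plain integer is used as the count directly.
    return int(value)


if __name__ == "__main__":
    # 192 matches the cpu_number of several machines listed in this report.
    for setting in ("1", "16", "50%", "200%"):
        print(setting, "->", resolve_nr_task(setting, 192))
```

With this convention, a job that sets nr_task to 200% on a 96-CPU machine would run 192 processes/threads.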

Linux Kernel V5.12 Release Test

Linus Torvalds released the 5.12 kernel on April 25, 2021: "Thanks to everybody who made last week very calm indeed, which just makes me feel much happier about the final 5.12 release." Headline features in 5.12 include the removal of a number of obsolete, (mostly) 32-bit Arm subarchitectures, atomic instructions for BPF, conditional file lookups with LOOKUP_CACHED, support for zoned block devices in the Btrfs filesystem, threaded NAPI polling in the network stack, filesystem ID mapping, support for building the kernel with Clang link-time optimization, the KFENCE kernel-debugging tool, and more. See the LWN merge-window summaries (part 1, part 2) and the (in-progress) KernelNewbies 5.12 page for more information.

0-Day CI monitored the release closely to track the performance status on IA platforms. It observed 11 regressions and 13 improvements during the feature development phase for v5.12. This section shares more detailed information, together with the correlated patches that led to the results. Note that the assessment is limited by the current test coverage of 0-Day CI. The full list is given in the Observation Summary section.

Observation Summary

0-Day CI observed 11 regressions and 13 improvements during the feature development phase for v5.12, that is, from v5.12-rc1 to the v5.12 release.

Test Indicator | Report | Test Scenario | Test Machine | Development Base | Status
adrestia.wakeup_cost_periodic_us | [sched/fair] 9fe1f127b9: -8.1% regression | num_threads: 100; cpufreq_governor: performance | lkp-knl-f1 | v5.11 | merged at v5.12-rc1, no response from author yet
aim7.jobs-per-min | [ext4] efc6134527: -6.8% regression | disk: 1BRD_48G; fs: ext4; test: creat-clo; load: 1000; cpufreq_governor: performance | lkp-cpl-4sp1 | v5.12-rc2 | merged at v5.12-rc4, author doesn't have comparable hardware to reproduce, 0-Day CI team is following up
aim9.signal_test.ops_per_sec | [sched] 9e81889c76: -2.9% regression | nr_threads: 100%; blocksize: 128K; cpufreq_governor: performance | lkp-knl-f1 | v5.12-rc2 | merged at v5.12-rc3, no response from author yet
aim9.signal_test.ops_per_sec | [workqueue/tracing] 83b62687a0: -6.1% regression | testtime: 5s; test: all; cpufreq_governor: performance | lkp-knl-f1 | v5.12-rc3 | merged at v5.12-rc4, no response from author yet
fio.read_iops | [io_uring] 7a612350a9: -6.5% regression | disk: 2pmem; fs: ext2; mount_option: dax; runtime: 200s; nr_task: 50%; time_based: tb; rw: read; bs: 2M; ioengine: mmap; test_size: 200G; cpufreq_governor: performance | lkp-csl-2sp6 | v5.12-rc2 | merged at v5.12-rc3, no response from author yet
fxmark.hdd_btrfs_DWAL_63_bufferedio.works/sec | [mm] f3344adf38: -52.4% regression | disk: 1HDD; media: hdd; test: DWAL; fstype: btrfs; directio: bufferedio; cpufreq_governor: performance | lkp-knm01 | v5.11 | merged at v5.12-rc1, fixed by later patch on mainline
fxmark.hdd_btrfs_DWOL_9_bufferedio.works/sec | [btrfs] 2e294c6049: -37.9% regression | disk: 1HDD; media: hdd; test: DWOL; fstype: btrfs; directio: bufferedio; cpufreq_governor: performance | lkp-knm01 | v5.11-rc7 | merged at v5.12-rc1, fixed by later patch on mainline
fxmark.hdd_ext4_no_jnl_DRBM_9_bufferedio.works/sec | [mm/filemap] cbd59c48ae: -7.6% regression | disk: 1HDD; media: hdd; test: DRBM; fstype: ext4_no_jnl; directio: bufferedio; cpufreq_governor: performance | lkp-knm01 | v5.11 | merged at v5.12-rc1, author acknowledged a possible regression, fix is WIP
netperf.Throughput_tps | [x86/mce] 7bb39313cd: -4.5% regression | ip: ipv4; runtime: 300s; nr_threads: 16; cluster: cs-localhost; test: TCP_CRR; cpufreq_governor: performance | lkp-csl-2ap3 | v5.11-rc2 | merged at v5.12-rc1, test environment dependent
netperf.Throughput_tps | [kbuild] 6a3193cdd5: -11.9% regression | ip: ipv4; runtime: 300s; nr_threads: 200%; cluster: cs-localhost; test: SCTP_RR; cpufreq_governor: performance | lkp-csl-2ap4 | v5.12-rc5 | merged at v5.12-rc6, no response from author yet
will-it-scale.per_process_ops | [entry] 47b8ff194c: -3.0% regression | nr_task: 100%; mode: process; test: futex3; cpufreq_governor: performance | lkp-csl-2ap2 | v5.11 | merged at v5.12-rc1, no response from author yet

 

Improvement

Test Indicator | Report | Test Scenario | Test Machine | Development Base | Status
fio.write_iops | [btrfs] 5deb17e18e: 1.2% improvement | runtime: 300s; disk: 1HDD; fs: btrfs; nr_task: 1; test_size: 128G; rw: randwrite; bs: 4k; ioengine: sync; cpufreq_governor: performance | lkp-cfl-e1 | v5.11-rc7 | merged at v5.12-rc1
fsmark.files_per_sec | [btrfs] ab12313a9f: 21.3% improvement | iterations: 1x; nr_threads: 32t; disk: 1SSD; fs: btrfs; filesize: 8K; test_size: 400M; sync_method: fsyncBeforeClose; nr_directories: 16d; nr_files_per_directory: 256fpd; cpufreq_governor: performance | lkp-csl-2sp7 | v5.11-rc7 | merged at v5.12-rc1
netperf.Throughput_tps | [bpf] a9ed15dae0: 3.9% improvement | ip: ipv4; runtime: 300s; nr_threads: 25%; cluster: cs-localhost; test: UDP_RR; cpufreq_governor: performance | lkp-csl-2sp9 | v5.11-rc4 | merged at v5.12-rc1
pigz.throughput | [workqueue/tracing] 83b62687a0: 4.4% improvement | nr_threads: 100%; blocksize: 128K; cpufreq_governor: performance | lkp-knm02 | v5.12-rc3 | merged at v5.12-rc4
stress-ng.loop.ops_per_sec | [loop] 6cc8e74308: 139.4% improvement | nr_threads: 10%; disk: 1HDD; testtime: 60s; fs: ext4; class: os; test: loop; cpufreq_governor: performance | lkp-csl-2sp5 | v5.11-rc5 | merged at v5.12-rc1
stress-ng.memfd.ops_per_sec | [mm] 802f1d522d: 8.7% improvement | nr_threads: 10%; disk: 1HDD; testtime: 60s; fs: ext4; class: os; test: memfd; cpufreq_governor: performance | lkp-csl-2sp5 | v5.11 | merged at v5.12-rc1
stress-ng.timer.ops_per_sec | [x86/perf] abd562df94: 6.1% improvement | nr_threads: 100%; disk: 1HDD; testtime: 60s; class: interrupt; test: timer; cpufreq_governor: performance | lkp-csl-2sp7 | v5.11-rc2 | merged at v5.12-rc1
stress-ng.timerfd.ops_per_sec | [x86/pv] ab234a260b: 6.6% improvement | nr_threads: 10%; disk: 1HDD; testtime: 60s; fs: ext4; class: os; test: timerfd; cpufreq_governor: performance | lkp-csl-2sp5 | v5.11-rc7 | merged at v5.12-rc1
stress-ng.vfork.ops_per_sec | [mm, slub] 3286222fc6: 32.1% improvement | nr_threads: 100%; disk: 1HDD; testtime: 60s; sc_pid_max: 4194304; class: scheduler; test: vfork; cpufreq_governor: performance | lkp-csl-2sp7 | v5.11-rc7 | merged at v5.11
stress-ng.xattr.ops_per_sec | [xfs] 06058bc405: 33.8% improvement | nr_threads: 10%; disk: 1HDD; testtime: 60s; fs: xfs; class: filesystem; test: xattr; cpufreq_governor: performance | lkp-csl-2sp7 | v5.11-rc4 | merged at v5.12-rc1
vm-scalability.throughput | [mm] f9ce0be71d: 2.2% improvement | runtime: 300s; size: 2T; test: shm-xread-seq-mt; cpufreq_governor: performance | lkp-csl-2ap4 | v5.11-rc4 | merged at v5.12-rc1
vm-scalability.throughput | [hugetlb] 4eae4efa2c: 1.1% improvement | runtime: 300s; size: 8T; test: anon-cow-seq-hugetlb; cpufreq_governor: performance | lkp-csl-2sp6 | v5.12-rc2 | merged at v5.12-rc3
will-it-scale.per_thread_ops | [io_uring] 7c30f36a98: 9.1% improvement | nr_task: 50%; mode: thread; test: unix1; cpufreq_governor: performance | lkp-hsw-4ex1 | v5.12-rc2 | merged at v5.12-rc3
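
Each Report entry in the summary above follows the pattern "[subsystem] commit: change% regression" or "[subsystem] commit: change% improvement". As an illustration only, a minimal Python sketch for splitting such an entry into fields might look like this. ReportEntry and parse_report are hypothetical names, not part of 0-Day CI, and the pattern is inferred solely from the entries listed in this report.

```python
import re
from typing import NamedTuple, Optional


class ReportEntry(NamedTuple):
    subsystem: Optional[str]  # e.g. "sched/fair"; None when the tag is absent
    commit: str               # abbreviated commit ID
    change_percent: float     # signed change, e.g. -8.1
    kind: str                 # "regression" or "improvement"


# Assumed shape: optional "[subsystem]" tag, hex commit ID, signed percentage.
_REPORT_RE = re.compile(
    r"^(?:\[(?P<subsystem>[^\]]+)\]\s*)?"
    r"(?P<commit>[0-9a-f]{7,40}):\s*"
    r"(?P<change>-?\d+(?:\.\d+)?)%\s+"
    r"(?P<kind>regression|improvement)$"
)


def parse_report(text: str) -> ReportEntry:
    """Parse one Report cell; raise ValueError if it does not match."""
    match = _REPORT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized report entry: {text!r}")
    return ReportEntry(
        subsystem=match.group("subsystem"),
        commit=match.group("commit"),
        change_percent=float(match.group("change")),
        kind=match.group("kind"),
    )


if __name__ == "__main__":
    print(parse_report("[sched/fair] 9fe1f127b9: -8.1% regression"))
```

The subsystem tag is treated as optional because at least one entry in this report is listed without one.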

netperf.Throughput_tps

Netperf is a benchmark that can be used to measure the performance of many different types of networking. It provides tests for both unidirectional throughput, and end-to-end latency.

Scenario: TCP_CRR test on localhost

Commit 7bb39313cd was reported to cause a -4.5% regression in netperf.Throughput_tps compared with v5.11-rc2. It was merged into mainline in v5.12-rc1.

Correlated commits

7bb39313cd x86/mce: Make mce_timed_out() identify holdout CPUs
branch | linus/master
report | [x86/mce] 7bb39313cd: -4.5% regression
test scenario | ip: ipv4; runtime: 300s; nr_threads: 16; cluster: cs-localhost; test: TCP_CRR; cpufreq_governor: performance
test machine | lkp-csl-2ap3
status | merged at v5.12-rc1, test environment dependent

aim7.jobs-per-min

Aim7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of a multiuser system.

Scenario: creat-clo test on ext4

Commit efc6134527 was reported to cause a -6.8% regression in aim7.jobs-per-min compared with v5.12-rc2. It was merged into mainline in v5.12-rc4.

Correlated commits

efc6134527 ext4: shrink race window in ext4_should_retry_alloc()
branch | linus/master
report | [ext4] efc6134527: -6.8% regression
test scenario | disk: 1BRD_48G; fs: ext4; test: creat-clo; load: 1000; cpufreq_governor: performance
test machine | lkp-cpl-4sp1
status | merged at v5.12-rc4, author doesn't have comparable hardware to reproduce, 0-Day CI team is following up

stress-ng.memfd.ops_per_sec

Stress-ng will stress test a computer system in various selectable ways. It was designed to exercise various physical subsystems of a computer as well as the various operating system kernel interfaces.

Scenario: 1hdd-ext4-os-memfd

Commit 802f1d522d was reported to bring an 8.7% improvement in stress-ng.memfd.ops_per_sec compared with v5.11. It was merged into mainline in v5.12-rc1.

Correlated commits

802f1d522d mm: page_counter: re-layout structure to reduce false sharing
branch | linus/master
report | [mm] 802f1d522d: 8.7% improvement
test scenario | nr_threads: 10%; disk: 1HDD; testtime: 60s; fs: ext4; class: os; test: memfd; cpufreq_governor: performance
test machine | lkp-csl-2sp5
status | merged at v5.12-rc1

Shift-Left Testing

Beyond testing trees in the upstream kernel, 0-Day CI also tests developers’ and maintainers’ trees, which catches issues earlier and limits their wider impact. We call this “shift-left” testing. During the v5.12 release cycle, 0-Day CI reported 18 major performance regressions and 12 major improvements through shift-left testing. This section shares more detailed information for some of them, together with the code changes that possibly led to each result, though the assessment is limited by our current test coverage. The whole list is summarized in the Report Summary section.

Report Summary

0-Day CI reported 18 performance regressions and 12 improvements through shift-left testing on developer and maintainer repositories.

Test Indicator | Report | Test Scenario | Test Machine | Status
aim7.jobs-per-min | [xfs] 571519716f: -8.5% regression | disk: 4BRD_12G; md: RAID0; fs: xfs; test: disk_src; load: 3000; cpufreq_governor: performance | lkp-csl-2sp9 | currently not merged, no response from author yet
aim7.jobs-per-min | [xfs] 7c60766161: -56.3% regression | disk: 4BRD_12G; md: RAID0; fs: xfs; test: disk_src; load: 3000; cpufreq_governor: performance | lkp-csl-2sp9 | currently not merged, no response from author yet
aim7.jobs-per-min | [xfs] 301157ab53: -65.3% regression | disk: 4BRD_12G; md: RAID1; fs: xfs; test: disk_rw; load: 3000; cpufreq_governor: performance | lkp-csl-2sp9 | currently not merged, no response from author yet
aim9.sync_disk_rw.ops_per_sec | [percpu] ace7e70901: -2.3% regression | testtime: 300s; test: sync_disk_rw; cpufreq_governor: performance | lkp-knl-f1 | currently not merged, author WIP
hackbench.throughput | [ZEN] 2d56caddc2: -20.5% regression | nr_threads: 100%; iterations: 4; mode: threads; ipc: pipe; cpufreq_governor: performance | lkp-skl-fpga01 | currently not merged, no response from author yet
netperf.Throughput_Mbps | [mm/page_alloc] c14b25d7bc: -2.4% regression | ip: ipv4; runtime: 300s; nr_threads: 50%; cluster: cs-localhost; send_size: 10K; test: SCTP_STREAM_MANY; cpufreq_governor: performance | lkp-csl-2ap3 | currently not merged, no response from author yet
netperf.Throughput_tps | [sched/fair] 322b5a8117: -84.0% regression | ip: ipv4; runtime: 300s; nr_threads: 200%; cluster: cs-localhost; test: TCP_RR; cpufreq_governor: performance | lkp-csl-2ap4 | author dropped the patch
stress-ng.dnotify.ops_per_sec | [xfs] 7f83561097: -73.9% regression | nr_threads: 10%; disk: 1HDD; testtime: 60s; fs: xfs; class: filesystem; test: dnotify; cpufreq_governor: performance | lkp-csl-2sp7 | currently not merged, no response from author yet
stress-ng.eventfd.ops_per_sec | [seq_file] 5fd6060e50: -49.1% regression | nr_threads: 10%; disk: 1HDD; testtime: 60s; fs: ext4; class: os; test: eventfd; cpufreq_governor: performance | lkp-csl-2sp5 | currently not merged, author WIP
stress-ng.loop.ops_per_sec | [block] c76f48eb5c: -99.9% regression | nr_threads: 10%; disk: 1HDD; testtime: 60s; fs: ext4; class: os; test: loop; cpufreq_governor: performance | lkp-csl-2sp5 | currently not merged, no response from author yet
stress-ng.opcode.ops_per_sec | [clocksource] 6c52b5f3cf: -14.4% regression | nr_threads: 10%; disk: 1HDD; testtime: 60s; fs: ext4; class: os; test: opcode; cpufreq_governor: performance | lkp-csl-2sp5 | currently not merged, author WIP
stress-ng.procfs.ops_per_sec | [sched, debug] 3b87f136f8: -31.7% regression | nr_threads: 10%; disk: 1HDD; testtime: 60s; fs: ext4; class: os; test: procfs; cpufreq_governor: performance | lkp-csl-2sp5 | currently not merged, author confirmed it's expected
stress-ng.sigsegv.ops_per_sec | 08ed4efad6: -41.9% regression | nr_threads: 100%; disk: 1HDD; testtime: 60s; class: interrupt; test: sigsegv; cpufreq_governor: performance | lkp-ivb-2ep1 | currently not merged, author WIP
stress-ng.vm-segv.ops_per_sec | [sched/fair] b360fb5e59: -13.9% regression | nr_threads: 10%; disk: 1HDD; testtime: 60s; fs: ext4; class: vm; test: vm-segv; cpufreq_governor: performance | lkp-csl-2sp7 | currently not merged, author accepted the regression but doubted the value of the regression
stress-ng.vm-segv.ops_per_sec | [sched/fair] 38ac256d1c: -13.8% regression | nr_threads: 10%; disk: 1HDD; testtime: 60s; fs: ext4; class: os; test: vm-segv; cpufreq_governor: performance | lkp-csl-2sp5 | currently not merged, author WIP
vm-scalability.throughput | [mm] 4f09feb8bf: -4.3% regression | runtime: 300s; test: lru-file-readonce; cpufreq_governor: performance | lkp-csl-2ap4 | currently not merged, no response from author yet
will-it-scale.per_process_ops | [proc] 43b2a76b1a: -11.3% regression | nr_task: 16; mode: process; test: eventfd1; cpufreq_governor: performance | lkp-csl-2ap2 | currently not merged, no response from author yet
will-it-scale.per_thread_ops | [mm] 81a779a1a4: -4.2% regression | nr_task: 100%; mode: thread; test: futex1; cpufreq_governor: performance | lkp-csl-2ap2 | currently not merged, author WIP

 

Improvement

Test Indicator | Report | Test Scenario | Test Machine | Status
aim7.jobs-per-min | [xfs] 1fea323ff0: 2.4% improvement | disk: 4BRD_12G; md: RAID1; fs: xfs; test: disk_rw; load: 3000; cpufreq_governor: performance | lkp-csl-2sp9 | currently not merged
apachebench.requests_per_second | [x86] 0c8dacb551: 9.6% improvement | runtime: 300s; concurrency: 1000; cluster: cs-localhost; cpufreq_governor: performance | lkp-bdw-de1 | currently not merged
fsmark.files_per_sec | [btrfs] b05645404a: 81.3% improvement | iterations: 1x; nr_threads: 64t; disk: 1BRD_48G; fs: btrfs; filesize: 4M; test_size: 24G; sync_method: NoSync; cpufreq_governor: performance | lkp-csl-2ap2 | currently not merged
stress-ng.klog.ops_per_sec | [printk] 996e966640: 1097.4% improvement | nr_threads: 10%; disk: 1HDD; testtime: 60s; fs: ext4; class: os; test: klog; cpufreq_governor: performance | lkp-csl-2sp5 | currently not merged
stress-ng.link.ops_per_sec | [f2fs] b5d15199a2: 175.3% improvement | nr_threads: 10%; disk: 1HDD; testtime: 60s; fs: f2fs; class: filesystem; test: link; cpufreq_governor: performance | lkp-ivb-2ep1 | currently not merged
stress-ng.msg.ops_per_sec | [io_uring] 860d1bed91: 34.9% improvement | nr_threads: 10%; disk: 1HDD; testtime: 60s; fs: ext4; class: os; test: msg; cpufreq_governor: performance | lkp-csl-2sp5 | currently not merged
stress-ng.sock.ops_per_sec | [sched/fair] d619f7afd7: 69.4% improvement | nr_threads: 100%; disk: 1HDD; testtime: 60s; class: network; test: sock; cpufreq_governor: performance | lkp-csl-2sp5 | currently not merged
unixbench.score | [fs] aec499039e: 19.2% improvement | runtime: 300s; nr_task: 30%; test: syscall; cpufreq_governor: performance | lkp-csl-2sp4 | currently not merged
vm-scalability.median | [mm] bcb0df12bc: 1.5% improvement | runtime: 300s; size: 8T; test: anon-cow-seq-hugetlb; cpufreq_governor: performance | lkp-csl-2sp6 | currently not merged
vm-scalability.throughput | [mm] 599aa62474: 55.2% improvement | runtime: 300s; size: 8T; test: anon-cow-seq; cpufreq_governor: performance | lkp-csl-2ap4 | currently not merged
will-it-scale.per_process_ops | [objtool/x86] 9bc0bb5072: 5.6% improvement | nr_task: 16; mode: process; test: eventfd1; cpufreq_governor: performance | lkp-csl-2ap2 | currently not merged
will-it-scale.per_process_ops | [bpf] a10787e6d5: 3.5% improvement | nr_task: 16; mode: process; test: mmap2; cpufreq_governor: performance | lkp-csl-2sp9 | currently not merged

will-it-scale.per_thread_ops

Will-it-scale takes a test case and runs it from 1 through to n parallel copies to see if the test case will scale. It builds both process-based and thread-based tests in order to see any differences between the two.

Scenario: futex1 test

Commit 81a779a1a4 was reported to cause a -4.2% regression in will-it-scale.per_thread_ops compared with v5.12-rc1.

Correlated commits

81a779a1a4 mm: introduce memfd_secret system call to create "secret" memory areas
branch | rppt/memfd-secret/v18
report | [mm] 81a779a1a4: -4.2% regression
test scenario | nr_task: 100%; mode: thread; test: futex1; cpufreq_governor: performance
test machine | lkp-csl-2ap2
status | currently not merged, author WIP

stress-ng.opcode.ops_per_sec

stress-ng will stress test a computer system in various selectable ways. It was designed to exercise various physical subsystems of a computer as well as the various operating system kernel interfaces.

Scenario: opcode test on ext4

Commit 6c52b5f3cf was reported to cause a -14.4% regression in stress-ng.opcode.ops_per_sec compared with v5.12-rc5.

Correlated commits

6c52b5f3cf clocksource: Reduce WATCHDOG_THRESHOLD
branch | rcu/dev.2021.04.13a
report | [clocksource] 6c52b5f3cf: -14.4% regression
test scenario | nr_threads: 10%; disk: 1HDD; testtime: 60s; fs: ext4; class: os; test: opcode; cpufreq_governor: performance
test machine | lkp-csl-2sp5
status | currently not merged, author WIP

unixbench.score

UnixBench is a system benchmark to provide a basic indicator of the performance of a Unix-like system.

Scenario: syscall test

Commit aec499039e was reported to bring a 19.2% improvement in unixbench.score compared with v5.12-rc5.

Correlated commits

aec499039e fs: Optimized file struct to improve performance
branch | linux-review/Shaokun-Zhang/fs-Optimized-file-struct-to-improve-performance/20210409-114859
report | [fs] aec499039e: 19.2% improvement
test scenario | runtime: 300s; nr_task: 30%; test: syscall; cpufreq_governor: performance
test machine | lkp-csl-2sp4
status | currently not merged

Latest Release Performance Comparison

This section compares performance across kernel releases, in particular between v5.12 and v5.10. More than 50 performance benchmarks run in 0-Day CI; we selected the 9 that have historically produced the most regression and improvement reports. Each test runs with a typical configuration. For some regressions found in this comparison, 0-Day CI could not bisect to a specific commit, so no report was sent during the release development period, but they are still worth checking. Root-cause analysis of these regressions is outside the scope of this section.

In the following figures, the Y-axis shows the relative performance number, with the v5.10 result as the baseline (performance number 100).
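
As a worked illustration of the relative performance number, the baseline release's result is scaled to 100 and every other release is expressed against it. The Python sketch below shows the arithmetic under that assumption; the function names and the raw numbers in the example are hypothetical, and the sketch assumes a metric where a higher value is better (throughput, operations per second).

```python
def relative_performance(value: float, base: float) -> float:
    """Scale a raw result so that the baseline release maps to 100."""
    if base <= 0:
        raise ValueError("baseline result must be positive")
    return value / base * 100.0


def percent_change(value: float, base: float) -> float:
    """Signed change versus the baseline: negative means lower than baseline."""
    return relative_performance(value, base) - 100.0


if __name__ == "__main__":
    # Hypothetical raw throughput numbers, for illustration only.
    baseline, candidate = 1000.0, 930.0
    print(f"relative: {relative_performance(candidate, baseline):.2f}")
    print(f"change:   {percent_change(candidate, baseline):+.2f}%")
```

For metrics where lower is better, such as latency or wakeup cost, the sign would need to be interpreted the other way around.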

Test Suite: vm-scalability

Vm-scalability exercises functions and regions of the mm subsystem of the Linux kernel. The two tests below show typical results.

vm-scalability Test 1

vm-scalability Test 2

Here are the test configuration and performance summary for the above tests:

Parameter | vm-scalability Test 1 | vm-scalability Test 2
test machine | model: Skylake; brand: (not listed); cpu_number: 104; memory: 192G | model: Knights Mill; brand: Intel® Xeon Phi™ CPU 7295 @ 1.50GHz; cpu_number: 288; memory: 80G
runtime | 350s | 300s
size | 1T | 8T
vm-scalability test parameter | test case: lru-shm | test case: anon-cow-seq-hugetlb
performance summary | vm-scalability.throughput on kernel v5.12 shows a 3.04% improvement compared with v5.1 | vm-scalability.throughput on kernel v5.12 shows a -14.54% regression compared with v5.11

Test Suite: will-it-scale

Will-it-scale takes a test case and runs it from 1 through to n parallel copies to see if the test case will scale. It builds both process-based and thread-based tests in order to see any differences between the two.

will-it-scale Test 1

will-it-scale Test 2

Here are the parameters and performance summary for the above tests:

Parameter | will-it-scale Test 1 | will-it-scale Test 2
test machine | model: Cascade Lake; brand: Intel® Xeon® Platinum 9242 CPU @ 2.30GHz; cpu_number: 192; memory: 192G | model: Cascade Lake; brand: Intel® Xeon® Platinum 9242 CPU @ 2.30GHz; cpu_number: 192; memory: 192G
nr_task | 16 | 100%
will-it-scale test parameter | mode: thread; test: futex2 | mode: process; test: futex3
performance summary | will-it-scale.per_thread_ops on kernel v5.12 shows a 3.94% improvement compared with v5.11 | will-it-scale.per_process_ops on kernel v5.12 shows a -78.7% regression compared with v5.11

Test Suite: unixbench

UnixBench is a system benchmark to provide a basic indicator of the performance of a Unix-like system.

Unixbench Test 1

Here are the test configuration and performance summary for the above test:

Parameter | Unixbench Test 1
test machine | model: Cascade Lake; brand: Intel® Xeon® CPU @ 2.30GHz; cpu_number: 96; memory: 128G
runtime | 300s
nr_task | 1
unixbench test parameter | test: execl
performance summary | unixbench.score on kernel v5.12 shows a -7.02% regression compared with v5.11

Test Suite: reaim

Reaim updates and improves the existing Open Source AIM 7 benchmark. aim7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of a multiuser system.

reaim Test 1

Here are the test configuration and performance summary for the above test:

Parameter | reaim Test 1
test machine | model: Cascade Lake; brand: Intel® Xeon® Platinum 9242 CPU @ 2.30GHz; cpu_number: 192; memory: 192G
runtime | 300s
nr_task | 100%
disk | No requirement
fs | No requirement
reaim test parameter | test case: short
performance summary | reaim.jobs_per_min on kernel v5.12 shows a -10.39% regression compared with v5.11

Test Suite: pigz

Pigz, which stands for Parallel Implementation of GZip, is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data.

pigz Test 1

Here are the test configuration and performance summary for the above test:

Parameter | pigz Test 1
test machine | model: Knights Mill; brand: Intel® Xeon Phi™ CPU 7255 @ 1.10GHz; cpu_number: 272; memory: 112G
nr_threads | 100%
pigz test parameter | blocksize: 512K
performance summary | pigz.throughput on kernel v5.12 shows a 2.09% improvement compared with v5.11

Test Suite: netperf

Netperf is a benchmark that can be used to measure the performance of many different types of networking. It provides tests for both unidirectional throughput, and end-to-end latency.

netperf Test 1

Here are the test configuration and performance summary for the above test:

Parameter | netperf Test 1
test machine | model: Cascade Lake; brand: Intel® Xeon® Platinum 9242 CPU @ 2.30GHz; cpu_number: 192; memory: 192G
disable_latency_stats | 1
set_nic_irq_affinity | 1
runtime | 300s
nr_threads | 50%
ip | ipv4
netperf test parameter | test case: UDP_RR
performance summary | netperf.Throughput_tps on kernel v5.12 shows a 5.75% improvement compared with v5.11

Test Suite: hackbench

Hackbench is both a benchmark and a stress test for the Linux kernel scheduler. Its main job is to create a specified number of pairs of schedulable entities (either threads or traditional processes) which communicate via either sockets or pipes and time how long it takes for each pair to send data back and forth.

hackbench Test 1

Here are the test configuration and performance summary for the above test:

Parameter | hackbench Test 1
test machine | model: Cascade Lake; brand: Intel® Xeon® CPU @ 2.30GHz; cpu_number: 96; memory: 128G
disable_latency_stats | 1
nr_task | 50%
hackbench test parameter | mode: threads; ipc: socket
performance summary | hackbench.throughput on kernel v5.12 shows an 8.8% improvement compared with v5.11

Test Suite: Fio

Fio is a flexible I/O tester. Its author originally wrote it to avoid the hassle of writing special test case programs when testing a specific workload, either for performance reasons or to find/reproduce a bug.

fio Test 1

Here are the test configuration and performance summary for the above test:

Parameter | fio Test 1
test machine | model: Cascade Lake; brand: Intel® Xeon® CPU @ 2.20GHz; cpu_number: 192; memory: 192G
runtime | 300s
file system | xfs
disk | 1SSD
boot_params | No requirement
nr_task | 32
time_based | No requirement
fio test parameter | fio-setup-basic: rw: randwrite; bs: 4k; ioengine: io_uring; test_size: 256g
performance summary | fio.write_iops on kernel v5.12 is almost the same as on v5.11

Test Suite: ebizzy

Ebizzy is designed to generate a workload resembling common web application server workloads. It is highly threaded, has a large in-memory working set, and allocates and deallocates memory frequently.

ebizzy Test 1

Here are the test configuration and performance summary for the above test:

Parameter | ebizzy Test 1
test machine | model: Knights Mill; brand: Intel® Xeon Phi™ CPU 7295 @ 1.50GHz; cpu_number: 288; memory: 80G
transparent_hugepage | No requirement
nr_threads | 200%
iterations | 100x
ebizzy test parameter | duration: 10s
performance summary | ebizzy.throughput on kernel v5.12 is almost the same as on v5.11

Test Machines

IVB DESKTOP

model Ivy Bridge
brand Intel® Core™ i3-3220 CPU @ 3.30GHz
cpu number 8
memory 16G

 

model Ivy Bridge
brand Intel® Core™ i3-3220 CPU @ 3.30GHz
cpu number 4
memory 8G

SKL SP

model Skylake
brand Intel® Xeon® CPU E5-2697 v2 @ 2.70GHz
cpu number 80
memory 64G

BDW EP

model Broadwell-EP
brand Intel® Xeon® CPU E5-2699 v4 @ 2.20GHz
cpu number 88
memory 128G

HSW EP

model Haswell-EP
brand Intel® Xeon® CPU E5-2699 v3 @ 2.30GHz
cpu number 72
memory 128G

IVB EP

model Ivy Bridge-EP
brand Intel® Xeon® CPU E5-2690 v2 @ 3.00GHz
cpu number 40
memory 384G

 

model Ivytown Ivy Bridge-EP
brand Intel® Xeon® CPU E5-2697 v2 @ 2.70GHz
cpu number 48
memory 64G

HSX EX

model Brickland Haswell-EX
brand Intel® Xeon® CPU E7-8890 v3 @ 2.50GHz
cpu number 144
memory 512G

 

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.