0-Day CI Linux Kernel Performance Report (V5.12)
Published: 12/18/2021
Introduction
0-Day CI is an automated Linux kernel test service that provides comprehensive test coverage of the Linux kernel. It covers kernel build, static analysis, boot, functional, performance and power tests. This report shows recent observations of kernel performance status on the IA platform, based on test results from the 0-Day CI service. It is structured as follows:
- Section 2, test parameter description
- Section 3, merged regressions and improvements in v5.12 release candidates
- Section 4, regressions and improvements captured by shift-left testing on developers’ and maintainers’ trees during the v5.12 release cycle
- Section 5, performance comparison among different kernel releases
- Section 6, test machine list
Test Parameters Descriptions
Here are the descriptions for each parameter/field used in the tests.
Classification | Name | Description |
---|---|---|
General | runtime | Run the test case within a certain time period (seconds or minutes) |
General | nr_task | If it is an integer, it is the number of processes/threads that run the workload of this job (default is 1). If it is a percentage, it is relative to the number of CPUs, e.g. 200% means twice as many processes/threads as CPUs |
General | nr_threads | Alias of nr_task |
General | iterations | Number of times to repeat this job |
General | test_size | Test disk size or memory size |
General | set_nic_irq_affinity | Set NIC interrupt affinity |
General | disable_latency_stats | latency_stats may introduce too much noise if there are too many context switches; this parameter allows disabling it |
General | transparent_hugepage | Set transparent hugepage policy (/sys/kernel/mm/transparent_hugepage) |
General | boot_params:bp1_memmap | Boot parameters of memmap |
General | disk:nr_pmem | Number of pmem partitions used by the test |
General | swap:priority | Priority of the swap device, a value between -1 and 32767. The default is -1; a higher value means a higher priority |
Test Machine | model | Name of Intel processor microarchitecture |
Test Machine | brand | Brand name of the CPU |
Test Machine | cpu_number | Number of CPUs |
Test Machine | memory | Size of memory |
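The nr_task rule above (plain integer versus percentage of the CPU count) can be sketched in a few lines of Python. This is only an illustration, not the 0-Day CI implementation: the function name `resolve_nr_task` is hypothetical, and rounding fractional results down (with a floor of one worker) is an assumption the report does not state.

```python
def resolve_nr_task(nr_task, cpu_number):
    """Return the worker count for an nr_task value given as an int or 'N%'.

    Hypothetical helper; truncating fractional results is an assumption.
    """
    if nr_task is None:
        return 1  # documented default
    text = str(nr_task).strip()
    if text.endswith("%"):
        # Percentage of the CPU count, e.g. "200%" means twice the CPUs.
        percent = int(text[:-1])
        return max(1, cpu_number * percent // 100)
    return int(text)


if __name__ == "__main__":
    print(resolve_nr_task("200%", 96))   # twice the CPU count
    print(resolve_nr_task("50%", 192))   # half the CPU count
    print(resolve_nr_task(16, 192))      # plain integer is used as-is
```

With this reading, `nr_task: 100%` on a 192-CPU machine starts 192 workers, while `nr_task: 16` starts 16 regardless of the machine.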
Linux Kernel V5.12 Release Test
Linus Torvalds released the 5.12 kernel on April 25, 2021: "Thanks to everybody who made last week very calm indeed, which just makes me feel much happier about the final 5.12 release." Headline features in 5.12 include the removal of a number of obsolete, (mostly) 32-bit Arm subarchitectures, atomic instructions for BPF, conditional file lookups with LOOKUP_CACHED, support for zoned block devices in the Btrfs filesystem, threaded NAPI polling in the network stack, filesystem ID mapping, support for building the kernel with Clang link-time optimization, the KFENCE kernel-debugging tool, and more. See the LWN merge-window summaries (part 1, part 2) and the (in-progress) KernelNewbies 5.12 page for more information.
0-Day CI monitored the release closely to track the performance status on the IA platform. It observed 11 regressions and 13 improvements during the feature development phase for v5.12. More detailed information is shared below, together with the correlated patches that led to the results. Note that the assessment is limited by the test coverage 0-Day CI currently has. The full list is given in the Observation Summary section.
Observation Summary
0-Day CI observed 11 regressions and 13 improvements during the feature development phase for v5.12, which is in the time frame from v5.12-rc1 to v5.12 release.
Test Indicator | Report | Test Scenario | Test Machine | Development Base | Status |
---|---|---|---|---|---|
adrestia.wakeup_cost_periodic_us | [sched/fair] 9fe1f127b9: -8.1% regression | num_threads: 100 cpufreq_governor: performance | lkp-knl-f1 | v5.11 | merged at v5.12-rc1, no response from author yet |
aim7.jobs-per-min | [ext4] efc6134527: -6.8% regression | disk: 1BRD_48G fs: ext4 test: creat-clo load: 1000 cpufreq_governor: performance | lkp-cpl-4sp1 | v5.12-rc2 | merged at v5.12-rc4, author doesn't have comparable hardware to reproduce, 0-Day CI team is following up |
aim9.signal_test.ops_per_sec | [sched] 9e81889c76: -2.9% regression | nr_threads: 100% blocksize: 128K cpufreq_governor: performance | lkp-knl-f1 | v5.12-rc2 | merged at v5.12-rc3, no response from author yet |
aim9.signal_test.ops_per_sec | [workqueue/tracing] 83b62687a0: -6.1% regression | testtime: 5s test: all cpufreq_governor: performance | lkp-knl-f1 | v5.12-rc3 | merged at v5.12-rc4, no response from author yet |
fio.read_iops | [io_uring] 7a612350a9: -6.5% regression | disk: 2pmem fs: ext2 mount_option: dax runtime: 200s nr_task: 50% time_based: tb rw: read bs: 2M ioengine: mmap test_size: 200G cpufreq_governor: performance | lkp-csl-2sp6 | v5.12-rc2 | merged at v5.12-rc3, no response from author yet |
fxmark.hdd_btrfs_DWAL_63_bufferedio.works/sec | [mm] f3344adf38: -52.4% regression | disk: 1HDD media: hdd test: DWAL fstype: btrfs directio: bufferedio cpufreq_governor: performance | lkp-knm01 | v5.11 | merged at v5.12-rc1, fixed by later patch on mainline |
fxmark.hdd_btrfs_DWOL_9_bufferedio.works/sec | [btrfs] 2e294c6049: -37.9% regression | disk: 1HDD media: hdd test: DWOL fstype: btrfs directio: bufferedio cpufreq_governor: performance | lkp-knm01 | v5.11-rc7 | merged at v5.12-rc1, fixed by later patch on mainline |
fxmark.hdd_ext4_no_jnl_DRBM_9_bufferedio.works/sec | [mm/filemap] cbd59c48ae: -7.6% regression | disk: 1HDD media: hdd test: DRBM fstype: ext4_no_jnl directio: bufferedio cpufreq_governor: performance | lkp-knm01 | v5.11 | merged at v5.12-rc1, author accepted the possibility of regression and WIP |
netperf.Throughput_tps | [x86/mce] 7bb39313cd: -4.5% regression | ip: ipv4 runtime: 300s nr_threads: 16 cluster: cs-localhost test: TCP_CRR cpufreq_governor: performance | lkp-csl-2ap3 | v5.11-rc2 | merged at v5.12-rc1, test environment dependent |
netperf.Throughput_tps | [kbuild] 6a3193cdd5: -11.9% regression | ip: ipv4 runtime: 300s nr_threads: 200% cluster: cs-localhost test: SCTP_RR cpufreq_governor: performance | lkp-csl-2ap4 | v5.12-rc5 | merged at v5.12-rc6, no response from author yet |
will-it-scale.per_process_ops | [entry] 47b8ff194c: -3.0% regression | nr_task: 100% mode: process test: futex3 cpufreq_governor: performance | lkp-csl-2ap2 | v5.11 | merged at v5.12-rc1, no response from author yet |
Improvement | | | | | |
---|---|---|---|---|---|
fio.write_iops | [btrfs] 5deb17e18e: 1.2% improvement | runtime: 300s disk: 1HDD fs: btrfs nr_task: 1 test_size: 128G rw: randwrite bs: 4k ioengine: sync cpufreq_governor: performance | lkp-cfl-e1 | v5.11-rc7 | merged at v5.12-rc1 |
fsmark.files_per_sec | [btrfs] ab12313a9f: 21.3% improvement | iterations: 1x nr_threads: 32t disk: 1SSD fs: btrfs filesize: 8K test_size: 400M sync_method: fsyncBeforeClose nr_directories: 16d nr_files_per_directory: 256fpd cpufreq_governor: performance | lkp-csl-2sp7 | v5.11-rc7 | merged at v5.12-rc1 |
netperf.Throughput_tps | [bpf] a9ed15dae0: 3.9% improvement | ip: ipv4 runtime: 300s nr_threads: 25% cluster: cs-localhost test: UDP_RR cpufreq_governor: performance | lkp-csl-2sp9 | v5.11-rc4 | merged at v5.12-rc1 |
pigz.throughput | [workqueue/tracing] 83b62687a0: 4.4% improvement | nr_threads: 100% blocksize: 128K cpufreq_governor: performance | lkp-knm02 | v5.12-rc3 | merged at v5.12-rc4 |
stress-ng.loop.ops_per_sec | [loop] 6cc8e74308: 139.4% improvement | nr_threads: 10% disk: 1HDD testtime: 60s fs: ext4 class: os test: loop cpufreq_governor: performance | lkp-csl-2sp5 | v5.11-rc5 | merged at v5.12-rc1 |
stress-ng.memfd.ops_per_sec | [mm] 802f1d522d: 8.7% improvement | nr_threads: 10% disk: 1HDD testtime: 60s fs: ext4 class: os test: memfd cpufreq_governor: performance | lkp-csl-2sp5 | v5.11 | merged at v5.12-rc1 |
stress-ng.timer.ops_per_sec | [x86/perf] abd562df94: 6.1% improvement | nr_threads: 100% disk: 1HDD testtime: 60s class: interrupt test: timer cpufreq_governor: performance | lkp-csl-2sp7 | v5.11-rc2 | merged at v5.12-rc1 |
stress-ng.timerfd.ops_per_sec | [x86/pv] ab234a260b: 6.6% improvement | nr_threads: 10% disk: 1HDD testtime: 60s fs: ext4 class: os test: timerfd cpufreq_governor: performance | lkp-csl-2sp5 | v5.11-rc7 | merged at v5.12-rc1 |
stress-ng.vfork.ops_per_sec | [mm, slub] 3286222fc6: 32.1% improvement | nr_threads: 100% disk: 1HDD testtime: 60s sc_pid_max: 4194304 class: scheduler test: vfork cpufreq_governor: performance | lkp-csl-2sp7 | v5.11-rc7 | merged at v5.11 |
stress-ng.xattr.ops_per_sec | [xfs] 06058bc405: 33.8% improvement | nr_threads: 10% disk: 1HDD testtime: 60s fs: xfs class: filesystem test: xattr cpufreq_governor: performance | lkp-csl-2sp7 | v5.11-rc4 | merged at v5.12-rc1 |
vm-scalability.throughput | [mm] f9ce0be71d: 2.2% improvement | runtime: 300s size: 2T test: shm-xread-seq-mt cpufreq_governor: performance | lkp-csl-2ap4 | v5.11-rc4 | merged at v5.12-rc1 |
vm-scalability.throughput | [hugetlb] 4eae4efa2c: 1.1% improvement | runtime: 300s size: 8T test: anon-cow-seq-hugetlb cpufreq_governor: performance | lkp-csl-2sp6 | v5.12-rc2 | merged at v5.12-rc3 |
will-it-scale.per_thread_ops | [io_uring] 7c30f36a98: 9.1% improvement | nr_task: 50% mode: thread test: unix1 cpufreq_governor: performance | lkp-hsw-4ex1 | v5.12-rc2 | merged at v5.12-rc3 |
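The percentages in the Report column are relative changes of the test indicator against the development base. A minimal sketch of that arithmetic follows. It assumes a higher-is-better metric, and the function names, the commit label and the sample values are hypothetical; the report does not give the raw measurements.

```python
def percent_change(base_value, new_value):
    """Relative change of new_value against base_value, in percent."""
    return (new_value - base_value) / base_value * 100.0


def describe(label, base_value, new_value):
    """Format a change in the style used in the Report column.

    Assumes a higher-is-better metric; the label is supplied by the caller.
    """
    delta = percent_change(base_value, new_value)
    kind = "improvement" if delta >= 0 else "regression"
    return f"{label}: {delta:.1f}% {kind}"


if __name__ == "__main__":
    # Hypothetical measurements, not taken from the report.
    print(describe("[demo] 0123456789", 1000.0, 919.0))
    print(describe("[demo] 0123456789", 1000.0, 1012.0))
```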
netperf.Throughput_tps
Netperf is a benchmark that can be used to measure the performance of many different types of networking. It provides tests for both unidirectional throughput, and end-to-end latency.
Scenario: tcp_crr test on Localhost
Commit 7bb39313cd was reported to cause a -4.5% regression in netperf.Throughput_tps compared to v5.11-rc2. It was merged into mainline at v5.12-rc1.
Correlated commits
7bb39313cd | x86/mce: Make mce_timed_out() identify holdout CPUs |
branch | linus/master |
report | [x86/mce] 7bb39313cd: -4.5% regression |
test scenario | ip: ipv4 runtime: 300s nr_threads: 16 cluster: cs-localhost test: TCP_CRR cpufreq_governor: performance |
test machine | lkp-csl-2ap3 |
status | merged at v5.12-rc1, test environment dependent |
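The test scenario field is a flat run of `key: value` pairs. For anyone post-processing these tables, a small parser along the following lines may help. It is a sketch only: `parse_scenario` is a hypothetical helper, and it assumes that values never contain whitespace, which holds for the scenarios listed in this report.

```python
import re

# One "key: value" pair; values are assumed to contain no whitespace.
PAIR_RE = re.compile(r"(\w+):\s+(\S+)")


def parse_scenario(text):
    """Split a scenario string such as 'ip: ipv4 runtime: 300s' into a dict."""
    return dict(PAIR_RE.findall(text))


if __name__ == "__main__":
    scenario = ("ip: ipv4 runtime: 300s nr_threads: 16 "
                "cluster: cs-localhost test: TCP_CRR cpufreq_governor: performance")
    for key, value in parse_scenario(scenario).items():
        print(f"{key} = {value}")
```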
aim7.jobs-per-min
Aim7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of a multiuser system.
Scenario: creat-clo test on ext4
Commit efc6134527 was reported to cause a -6.8% regression in aim7.jobs-per-min compared to v5.12-rc2. It was merged into mainline at v5.12-rc4.
Correlated commits
efc6134527 | ext4: shrink race window in ext4_should_retry_alloc() |
branch | linus/master |
report | [ext4] efc6134527: -6.8% regression |
test scenario | disk: 1BRD_48G fs: ext4 test: creat-clo load: 1000 cpufreq_governor: performance |
test machine | lkp-cpl-4sp1 |
status | merged at v5.12-rc4, author doesn't have comparable hardware to reproduce, 0-Day CI team is following up |
stress-ng.memfd.ops_per_sec
Stress-ng will stress test a computer system in various selectable ways. It was designed to exercise various physical subsystems of a computer as well as the various operating system kernel interfaces.
Scenario: 1hdd-ext4-os-memfd
Commit 802f1d522d was reported to bring an 8.7% improvement in stress-ng.memfd.ops_per_sec compared to v5.11. It was merged into mainline at v5.12-rc1.
Correlated commits
802f1d522d | mm: page_counter: re-layout structure to reduce false sharing |
branch | linus/master |
report | [mm] 802f1d522d: 8.7% improvement |
test scenario | nr_threads: 10% disk: 1HDD testtime: 60s fs: ext4 class: os test: memfd cpufreq_governor: performance |
test machine | lkp-csl-2sp5 |
status | merged at v5.12-rc1 |
Shift-Left Testing
Beyond testing trees in the upstream kernel, 0-Day CI also tests developers’ and maintainers’ trees, which can catch issues earlier and reduce wider impact. We call it “shift-left” testing. During the v5.12 release cycle, 0-Day CI reported 18 major performance regressions and 12 major improvements through shift-left testing. For some of these, we share more detailed information together with the possible code changes that led to the result, though the assessment is limited by the test coverage we have now. The whole list is summarized in the Report Summary section.
Report Summary
0-Day CI reported 18 performance regressions and 12 improvements through shift-left testing on developer and maintainer repos.
Test Indicator | Report | Test Scenario | Test Machine | Status |
---|---|---|---|---|
aim7.jobs-per-min | [xfs] 571519716f: -8.5% regression | disk: 4BRD_12G md: RAID0 fs: xfs test: disk_src load: 3000 cpufreq_governor: performance | lkp-csl-2sp9 | currently not merged, no response from author yet |
aim7.jobs-per-min | [xfs] 7c60766161: -56.3% regression | disk: 4BRD_12G md: RAID0 fs: xfs test: disk_src load: 3000 cpufreq_governor: performance | lkp-csl-2sp9 | currently not merged, no response from author yet |
aim7.jobs-per-min | [xfs] 301157ab53: -65.3% regression | disk: 4BRD_12G md: RAID1 fs: xfs test: disk_rw load: 3000 cpufreq_governor: performance | lkp-csl-2sp9 | currently not merged, no response from author yet |
aim9.sync_disk_rw.ops_per_sec | [percpu] ace7e70901: -2.3% regression | testtime: 300s test: sync_disk_rw cpufreq_governor: performance | lkp-knl-f1 | currently not merged, author WIP |
hackbench.throughput | [ZEN] 2d56caddc2: -20.5% regression | nr_threads: 100% iterations: 4 mode: threads ipc: pipe cpufreq_governor: performance | lkp-skl-fpga01 | currently not merged, no response from author yet |
netperf.Throughput_Mbps | [mm/page_alloc] c14b25d7bc: -2.4% regression | ip: ipv4 runtime: 300s nr_threads: 50% cluster: cs-localhost send_size: 10K test: SCTP_STREAM_MANY cpufreq_governor: performance | lkp-csl-2ap3 | currently not merged, no response from author yet |
netperf.Throughput_tps | [sched/fair] 322b5a8117: -84.0% regression | ip: ipv4 runtime: 300s nr_threads: 200% cluster: cs-localhost test: TCP_RR cpufreq_governor: performance | lkp-csl-2ap4 | author dropped the patch |
stress-ng.dnotify.ops_per_sec | [xfs] 7f83561097: -73.9% regression | nr_threads: 10% disk: 1HDD testtime: 60s fs: xfs class: filesystem test: dnotify cpufreq_governor: performance | lkp-csl-2sp7 | currently not merged, no response from author yet |
stress-ng.eventfd.ops_per_sec | [seq_file] 5fd6060e50: -49.1% regression | nr_threads: 10% disk: 1HDD testtime: 60s fs: ext4 class: os test: eventfd cpufreq_governor: performance | lkp-csl-2sp5 | currently not merged, author WIP |
stress-ng.loop.ops_per_sec | [block] c76f48eb5c: -99.9% regression | nr_threads: 10% disk: 1HDD testtime: 60s fs: ext4 class: os test: loop cpufreq_governor: performance | lkp-csl-2sp5 | currently not merged, no response from author yet |
stress-ng.opcode.ops_per_sec | [clocksource] 6c52b5f3cf: -14.4% regression | nr_threads: 10% disk: 1HDD testtime: 60s fs: ext4 class: os test: opcode cpufreq_governor: performance | lkp-csl-2sp5 | currently not merged, author WIP |
stress-ng.procfs.ops_per_sec | [sched, debug] 3b87f136f8: -31.7% regression | nr_threads: 10% disk: 1HDD testtime: 60s fs: ext4 class: os test: procfs cpufreq_governor: performance | lkp-csl-2sp5 | currently not merged, author confirmed it's expected |
stress-ng.sigsegv.ops_per_sec | 08ed4efad6: -41.9% regression | nr_threads: 100% disk: 1HDD testtime: 60s class: interrupt test: sigsegv cpufreq_governor: performance | lkp-ivb-2ep1 | currently not merged, author WIP |
stress-ng.vm-segv.ops_per_sec | [sched/fair] b360fb5e59: -13.9% regression | nr_threads: 10% disk: 1HDD testtime: 60s fs: ext4 class: vm test: vm-segv cpufreq_governor: performance | lkp-csl-2sp7 | currently not merged, author accepted the regression but doubted the value of the regression |
stress-ng.vm-segv.ops_per_sec | [sched/fair] 38ac256d1c: -13.8% regression | nr_threads: 10% disk: 1HDD testtime: 60s fs: ext4 class: os test: vm-segv cpufreq_governor: performance | lkp-csl-2sp5 | currently not merged, author WIP |
vm-scalability.throughput | [mm] 4f09feb8bf: -4.3% regression | runtime: 300s test: lru-file-readonce cpufreq_governor: performance | lkp-csl-2ap4 | currently not merged, no response from author yet |
will-it-scale.per_process_ops | [proc] 43b2a76b1a: -11.3% regression | nr_task: 16 mode: process test: eventfd1 cpufreq_governor: performance | lkp-csl-2ap2 | currently not merged, no response from author yet |
will-it-scale.per_thread_ops | [mm] 81a779a1a4: -4.2% regression | nr_task: 100% mode: thread test: futex1 cpufreq_governor: performance | lkp-csl-2ap2 | currently not merged, author WIP |
Improvement | | | | |
---|---|---|---|---|
aim7.jobs-per-min | [xfs] 1fea323ff0: 2.4% improvement | disk: 4BRD_12G md: RAID1 fs: xfs test: disk_rw load: 3000 cpufreq_governor: performance | lkp-csl-2sp9 | currently not merged |
apachebench.requests_per_second | [x86] 0c8dacb551: 9.6% improvement | runtime: 300s concurrency: 1000 cluster: cs-localhost cpufreq_governor: performance | lkp-bdw-de1 | currently not merged |
fsmark.files_per_sec | [btrfs] b05645404a: 81.3% improvement | iterations: 1x nr_threads: 64t disk: 1BRD_48G fs: btrfs filesize: 4M test_size: 24G sync_method: NoSync cpufreq_governor: performance | lkp-csl-2ap2 | currently not merged |
stress-ng.klog.ops_per_sec | [printk] 996e966640: 1097.4% improvement | nr_threads: 10% disk: 1HDD testtime: 60s fs: ext4 class: os test: klog cpufreq_governor: performance | lkp-csl-2sp5 | currently not merged |
stress-ng.link.ops_per_sec | [f2fs] b5d15199a2: 175.3% improvement | nr_threads: 10% disk: 1HDD testtime: 60s fs: f2fs class: filesystem test: link cpufreq_governor: performance | lkp-ivb-2ep1 | currently not merged |
stress-ng.msg.ops_per_sec | [io_uring] 860d1bed91: 34.9% improvement | nr_threads: 10% disk: 1HDD testtime: 60s fs: ext4 class: os test: msg cpufreq_governor: performance | lkp-csl-2sp5 | currently not merged |
stress-ng.sock.ops_per_sec | [sched/fair] d619f7afd7: 69.4% improvement | nr_threads: 100% disk: 1HDD testtime: 60s class: network test: sock cpufreq_governor: performance | lkp-csl-2sp5 | currently not merged |
unixbench.score | [fs] aec499039e: 19.2% improvement | runtime: 300s nr_task: 30% test: syscall cpufreq_governor: performance | lkp-csl-2sp4 | currently not merged |
vm-scalability.median | [mm] bcb0df12bc: 1.5% improvement | runtime: 300s size: 8T test: anon-cow-seq-hugetlb cpufreq_governor: performance | lkp-csl-2sp6 | currently not merged |
vm-scalability.throughput | [mm] 599aa62474: 55.2% improvement | runtime: 300s size: 8T test: anon-cow-seq cpufreq_governor: performance | lkp-csl-2ap4 | currently not merged |
will-it-scale.per_process_ops | [objtool/x86] 9bc0bb5072: 5.6% improvement | nr_task: 16 mode: process test: eventfd1 cpufreq_governor: performance | lkp-csl-2ap2 | currently not merged |
will-it-scale.per_process_ops | [bpf] a10787e6d5: 3.5% improvement | nr_task: 16 mode: process test: mmap2 cpufreq_governor: performance | lkp-csl-2sp9 | currently not merged |
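The Report entries follow a regular shape: an optional `[subsystem]` tag, an abbreviated commit ID, a signed percentage, and the word "regression" or "improvement". The sketch below shows one way to parse such entries and tally them, for example to cross-check the 18/12 totals. The names `REPORT_RE`, `parse_report` and `tally` are hypothetical and not part of any 0-Day CI tooling.

```python
import re
from collections import Counter

# "[subsystem] <commit>: <delta>% regression|improvement"; the tag is optional.
REPORT_RE = re.compile(
    r"(?:\[(?P<subsystem>[^\]]+)\]\s*)?"
    r"(?P<commit>[0-9a-f]+):\s*"
    r"(?P<delta>-?\d+(?:\.\d+)?)%\s+"
    r"(?P<kind>regression|improvement)"
)


def parse_report(text):
    """Parse one Report cell into its subsystem, commit, delta and kind."""
    match = REPORT_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized report entry: {text!r}")
    return {
        "subsystem": match["subsystem"],  # None when the tag is absent
        "commit": match["commit"],
        "delta_pct": float(match["delta"]),
        "kind": match["kind"],
    }


def tally(reports):
    """Count regressions and improvements in a list of Report cells."""
    return Counter(parse_report(entry)["kind"] for entry in reports)


if __name__ == "__main__":
    sample = [
        "[xfs] 571519716f: -8.5% regression",
        "08ed4efad6: -41.9% regression",
        "[printk] 996e966640: 1097.4% improvement",
    ]
    print(tally(sample))
```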
will-it-scale.per_thread_ops
Will-it-scale takes a test case and runs it from 1 through to n parallel copies to see if the test case will scale. It builds both process and threads based tests in order to see any differences between the two.
Scenario: futex1 Test
Commit 81a779a1a4 was reported to cause a -4.2% regression in will-it-scale.per_thread_ops compared to v5.12-rc1.
Correlated commits
81a779a1a4 | mm: introduce memfd_secret system call to create "secret" memory areas |
branch | rppt/memfd-secret/v18 |
report | [mm] 81a779a1a4: -4.2% regression |
test scenario | nr_task: 100% mode: thread test: futex1 cpufreq_governor: performance |
test machine | lkp-csl-2ap2 |
status | currently not merged, author WIP |
stress-ng.opcode.ops_per_sec
stress-ng will stress test a computer system in various selectable ways. It was designed to exercise various physical subsystems of a computer as well as the various operating system kernel interfaces.
Scenario: opcode test on ext4
Commit 6c52b5f3cf was reported to cause a -14.4% regression in stress-ng.opcode.ops_per_sec compared to v5.12-rc5.
Correlated commits
6c52b5f3cf | clocksource: Reduce WATCHDOG_THRESHOLD |
branch | rcu/dev.2021.04.13a |
report | [clocksource] 6c52b5f3cf: -14.4% regression |
test scenario | nr_threads: 10% disk: 1HDD testtime: 60s fs: ext4 class: os test: opcode cpufreq_governor: performance |
test machine | lkp-csl-2sp5 |
status | currently not merged, author WIP |
unixbench.score
UnixBench is a system benchmark to provide a basic indicator of the performance of a Unix-like system.
Scenario: syscall test
Commit aec499039e was reported to bring a 19.2% improvement in unixbench.score compared to v5.12-rc5.
Correlated commits
aec499039e | fs: Optimized file struct to improve performance |
branch | linux-review/Shaokun-Zhang/fs-Optimized-file-struct-to-improve-performance/20210409-114859 |
report | [fs] aec499039e: 19.2% improvement |
test scenario | runtime: 300s nr_task: 30% test: syscall cpufreq_governor: performance |
test machine | lkp-csl-2sp4 |
status | currently not merged |
Latest Release Performance Comparison
This section gives some information about the performance differences among kernel releases, especially between v5.12 and v5.10. More than 50 performance benchmarks run in 0-Day CI; we selected 9 that historically showed the most regressions/improvements reported by 0-Day CI, and ran them with typical configurations/parameters. For some of the regressions in this comparison, 0-Day CI did not bisect them successfully, so no related report was sent out during the release development period, but they are still worth checking. The root causes of the regressions are not covered in this section.
In the following figures, the value on the Y-axis is the relative performance number. We used the v5.10 data as the base (performance number is 100).
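The relative performance number can be computed by scaling each release's result so that the base release equals 100. The sketch below illustrates this; the function name `relative_performance` and the sample throughput values are hypothetical, since the report does not list the raw measurements behind the figures.

```python
def relative_performance(results, base="v5.10"):
    """Scale raw results so that the base release maps to 100."""
    base_value = results[base]
    return {release: round(100.0 * value / base_value, 2)
            for release, value in results.items()}


if __name__ == "__main__":
    # Hypothetical raw throughput values, not taken from the report.
    raw = {"v5.10": 2000.0, "v5.11": 2100.0, "v5.12": 1900.0}
    for release, score in relative_performance(raw).items():
        print(release, score)
```

A value above 100 then means the release scored higher than the base on that benchmark, and a value below 100 means it scored lower.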
Test Suite: vm-scalability
Vm-scalability exercises functions and regions of the mm subsystem of the Linux kernel. The two tests below show typical results.
vm-scalability Test 1
vm-scalability Test 2
Here are the test configuration and performance test summary for above tests:
Parameter | vm-scalability Test 1 | vm-scalability Test 2 |
---|---|---|
test machine | model: Skylake brand: cpu_number: 104 memory: 192G | model: Knights Mill brand: Intel® Xeon Phi™ CPU 7295 @ 1.50GHz cpu_number: 288 memory: 80G |
runtime | 350s | 300s |
size | 1T | 8T |
vm-scalability test parameter | test case: lru-shm | test case: anon-cow-seq-hugetlb |
performance summary | vm-scalability.throughput on kernel v5.12 has 3.04% improvement when comparing to v5.1 | vm-scalability.throughput on kernel v5.12 has -14.54% regression when comparing to v5.11 |
Test Suite: will-it-scale
Will-it-scale takes a test case and runs it from 1 through to n parallel copies to see if the test case will scale. It builds both process and threads based tests in order to see any differences between the two.
will-it-scale Test 1
will-it-scale Test 2
Here are the parameters and performance test summary for above tests:
Parameter | will-it-scale Test 1 | will-it-scale Test 2 |
---|---|---|
test machine | model: Cascade Lake brand: Intel® Xeon® Platinum 9242 CPU @ 2.30GHz cpu_number: 192 memory: 192G | model: Cascade Lake brand: Intel® Xeon® Platinum 9242 CPU @ 2.30GHz cpu_number: 192 memory: 192G |
nr_task | 16 | 100% |
will-it-scale test parameter | mode: thread test: futex2 | mode: process test: futex3 |
summary | will-it-scale.per_thread_ops on kernel v5.12 has 3.94% improvement when comparing to v5.11 | will-it-scale.per_process_ops on kernel v5.12 has -78.7% regression when comparing to v5.11 |
Test Suite: unixbench
UnixBench is a system benchmark to provide a basic indicator of the performance of a Unix-like system.
Unixbench Test 1
Here are the test configuration and performance test summary for above tests:
Parameter | Unixbench Test 1 |
---|---|
test machine | model: Cascade Lake brand: Intel® Xeon® CPU @ 2.30GHz cpu_number: 96 memory: 128G |
runtime | 300s |
nr_task | 1 |
unixbench test parameter | test: execl |
performance summary | unixbench.score on kernel v5.12 has -7.02% regression when comparing to v5.11 |
Test Suite: reaim
Reaim updates and improves the existing Open Source AIM 7 benchmark. aim7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of a multiuser system.
reaim Test 1
Here are the test configuration and performance test summary for above tests:
Parameter | reaim Test 1 |
---|---|
test machine | model: Cascade Lake brand: Intel® Xeon® Platinum 9242 CPU @ 2.30GHz cpu_number: 192 memory: 192G |
runtime | 300s |
nr_task | 100% |
disk | No requirement |
fs | No requirement |
reaim test parameter | test case: short |
performance summary | reaim.jobs_per_min on kernel v5.12 has -10.39% regression when comparing to v5.11 |
Test Suite: pigz
Pigz, which stands for Parallel Implementation of GZip, is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data.
pigz Test 1
Here are the test configuration and performance test summary for above tests:
Parameter | pigz Test 1 |
---|---|
test machine | model: Knights Mill brand: Intel® Xeon Phi™ CPU 7255 @ 1.10GHz cpu_number: 272 memory: 112G |
nr_threads | 100% |
pigz Test parameter | blocksize: 512K |
performance summary | pigz.throughput on kernel v5.12 has 2.09% improvement when comparing to v5.11 |
Test Suite: netperf
Netperf is a benchmark that can be used to measure the performance of many different types of networking. It provides tests for both unidirectional throughput, and end-to-end latency.
netperf Test 1
Here are the test configuration and performance test summary for above tests:
Parameter | netperf Test 1 |
---|---|
test machine | model: Cascade Lake brand: Intel® Xeon® Platinum 9242 CPU @ 2.30GHz cpu_number: 192 memory: 192G |
disable_latency_stats | 1 |
set_nic_irq_affinity | 1 |
runtime | 300s |
nr_threads | 50% |
ip | ipv4 |
netperf test parameter | test case: UDP_RR |
performance summary | netperf.Throughput_tps on kernel v5.12 has 5.75% improvement when comparing to v5.11 |
Test Suite: hackbench
Hackbench is both a benchmark and a stress test for the Linux kernel scheduler. Its main job is to create a specified number of pairs of schedulable entities (either threads or traditional processes) which communicate via either sockets or pipes, and to time how long it takes for each pair to send data back and forth.
hackbench Test 1
Here are the test configuration and performance test summary for above tests:
Parameter | hackbench Test 1 |
---|---|
test machine | model: Cascade Lake brand: Intel® Xeon® CPU @ 2.30GHz cpu_number: 96 memory: 128G |
disable_latency_stats | 1 |
nr_task | 50% |
hackbench test parameter | mode: threads ipc: socket |
performance summary | hackbench.throughput on kernel v5.12 has 8.8% improvement when comparing to v5.11 |
Test Suite: Fio
Fio (Flexible I/O Tester) is an I/O workload generator. In its author's words, fio "was originally written to save me the hassle of writing special test case programs when I wanted to test a specific workload, either for performance reasons or to find/reproduce a bug."
fio Test 1
Here are the test configuration and performance test summary for above tests:
Parameter | fio Test 1 |
---|---|
test machine | model: Cascade Lake brand: Intel® Xeon® CPU @ 2.20GHz cpu_number: 192 memory: 192G |
runtime | 300s |
file system | xfs |
disk | 1SSD |
boot_params | No requirement |
nr_task | 32 |
time_based | No requirement |
fio test parameter | fio-setup-basic: rw: randwrite bs: 4k ioengine: io_uring test_size: 256g |
performance summary | fio.write_iops on kernel v5.12 is almost the same as that in v5.11 |
Test Suite: ebizzy
Ebizzy is designed to generate a workload resembling common web application server workloads. It is highly threaded, has a large in-memory working set, and allocates and deallocates memory frequently.
ebizzy Test 1
Here are the test configuration and performance test summary for above test:
Parameter | ebizzy Test 1 |
---|---|
test machine | model: Knights Mill brand: Intel® Xeon Phi™ CPU 7295 @ 1.50GHz cpu_number: 288 memory: 80G |
transparent_hugepage | No requirement |
nr_threads | 200% |
iterations | 100x |
ebizzy test parameter | duration: 10s |
performance summary | ebizzy.throughput on kernel v5.12 is almost the same as that in v5.11 |
Test Machines
IVB DESKTOP
model | Ivy Bridge |
brand | Intel® Core™ i3-3220 CPU @ 3.30GHz |
cpu number | 8 |
memory | 16G |
model | Ivy Bridge |
brand | Intel® Core™ i3-3220 CPU @ 3.30GHz |
cpu number | 4 |
memory | 8G |
SKL SP
model | Skylake |
brand | Intel® Xeon® CPU E5-2697 v2 @ 2.70GHz |
cpu number | 80 |
memory | 64G |
BDW EP
model | Broadwell-EP |
brand | Intel® Xeon® CPU E5-2699 v4 @ 2.20GHz |
cpu number | 88 |
memory | 128G |
HSW EP
model | Haswell-EP |
brand | Intel® Xeon® CPU E5-2699 v3 @ 2.30GHz |
cpu number | 72 |
memory | 128G |
IVB EP
model | Ivy Bridge-EP |
brand | Intel® Xeon® CPU E5-2690 v2 @ 3.00GHz |
cpu number | 40 |
memory | 384G |
model | Ivytown Ivy Bridge-EP |
brand | Intel® Xeon® CPU E5-2697 v2 @ 2.70GHz |
cpu number | 48 |
memory | 64G |
HSX EX
model | Brickland Haswell-EX |
brand | Intel® Xeon® CPU E7-8890 v3 @ 2.50GHz |
cpu number | 144 |
memory | 512G |
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.