0-Day CI Linux Kernel Performance Report (V5.14)
Published: 10/09/2021
By Beibei Si
Introduction
0-Day CI is an automated Linux kernel test service that provides comprehensive test coverage of the Linux kernel. It covers kernel build, static analysis, boot, functional, performance and power tests. This report presents recent observations of kernel performance on IA platforms, based on test results from the 0-Day CI service. It is structured as follows:
- Section 2, test parameter descriptions
- Section 3, merged regressions and improvements in v5.14 release candidates
- Section 4, regressions and improvements captured by shift-left testing on developers’ and maintainers’ trees during the v5.14 release cycle
- Section 5, performance comparison among different kernel releases
- Section 6, test machine list
Test Parameter Descriptions
Here are the descriptions for each parameter/field used in the tests.
Classification | Name | Description |
---|---|---|
General | runtime | Duration to run the test case (in seconds or minutes) |
General | nr_task | If an integer, the number of processes/threads used to run the workload of this job (default: 1). If a percentage, the count is relative to the number of CPUs; e.g., 200% means twice as many processes/threads as CPUs |
General | nr_threads | Alias of nr_task |
General | iterations | Number of times to repeat this job |
General | test_size | Test disk size or memory size |
General | set_nic_irq_affinity | Set NIC interrupt affinity |
General | disable_latency_stats | Disable latency_stats, which may introduce too much noise when there are many context switches |
General | transparent_hugepage | Set the transparent hugepage policy (/sys/kernel/mm/transparent_hugepage) |
General | boot_params:bp1_memmap | memmap boot parameter |
General | disk:nr_pmem | Number of pmem partitions used by the test |
General | swap:priority | Priority of the swap device: a value between -1 and 32767, where a higher value means a higher priority; the default is -1 |
Test Machine | model | Name of the Intel processor microarchitecture |
Test Machine | brand | Brand name of the CPU |
Test Machine | cpu_number | Number of CPUs |
Test Machine | memory | Size of memory |
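To make the nr_task / nr_threads rule concrete, here is a minimal sketch of how such a value could be turned into a process/thread count. It is not taken from the 0-Day CI (LKP) code: the helper name `resolve_nr_task`, the round-down behavior for percentages, and the handling of a trailing "t" (as in the fsmark rows, e.g. "32t") are all assumptions made for illustration.

```python
def resolve_nr_task(value: str, nr_cpu: int) -> int:
    """Hypothetical helper: convert an nr_task/nr_threads value into a count.

    Follows the rule in the table above: an integer is an absolute count,
    and a trailing '%' means a percentage of the CPU count.
    """
    if nr_cpu < 1:
        raise ValueError("nr_cpu must be at least 1")
    value = value.strip()
    if value.endswith("%"):
        percent = int(value[:-1])
        # Assumption: round down, but never run fewer than one task.
        return max(1, nr_cpu * percent // 100)
    if value.endswith("t"):
        # Assumption: a 't' suffix (e.g. "32t") denotes an absolute thread count.
        value = value[:-1]
    return int(value)


if __name__ == "__main__":
    # Example: an 88-CPU machine such as the BDW EP system under Test Machines.
    for raw in ("1", "16", "32t", "10%", "50%", "200%"):
        print(raw, "->", resolve_nr_task(raw, nr_cpu=88))
```

Under this sketch, nr_task: 50% on an 88-CPU machine resolves to 44 tasks and 200% to 176; the real service may round differently.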
Linux Kernel V5.14 Release Test
Linus has released the 5.14 kernel, noting: “So I realize you must all still be busy with all the galas and fancy balls and all the other 30th anniversary events, but at some point you must be getting tired of the constant glitz, the fireworks, and the champagne. That ball gown or tailcoat isn't the most comfortable thing, either. The celebrations will go on for a few more weeks yet, but you all may just need a breather from them. And when that happens, I have just the thing for you - a new kernel release to test and enjoy.”
Headline features in 5.14 include: core scheduling (at last), the burstable CFS bandwidth controller, some initial infrastructure for BPF program loaders, the rq_qos I/O priority policy, some improvements to the SO_REUSEPORT networking option, the control-group "kill" button, the memfd_secret() system call, the quotactl_fd() system call, and much more. See the LWN merge-window summaries (part 1, part 2) for more details.
0-Day CI monitored the release closely to track the performance status on IA platforms. During the feature development phase for v5.14, 0-Day observed 9 regressions and 7 improvements. Below we share more detailed information, together with the correlated patches that led to these results. Note that the assessment is limited by the test coverage 0-Day currently has. The list is summarized in the observation summary section.
Observation Summary
0-Day CI observed 9 regressions and 7 improvements during the feature development phase for v5.14, which is in the time frame from v5.14-rc1 to v5.14 release.
Test Indicator | Test Scenario | Test Parameters | Test Machine | Development Base | Status |
---|---|---|---|---|---|
fsmark.files_per_sec | [xfs] a79b28c284: -4.6% regression | iterations: 1x nr_threads: 32t disk: 1SSD fs: xfs filesize: 8K test_size: 400M sync_method: fsyncBeforeClose nr_directories: 16d nr_files_per_directory: 256fpd cpufreq_governor: performance | lkp-csl-2sp7 | v5.13-rc4 | merged at v5.14-rc1, no response from the author yet |
stress-ng.loop.ops_per_sec | [pipe] 3a34b13a88: -12.6% regression | nr_threads: 100% iterations: 4 mode: process ipc: pipe cpufreq_governor: performance | lkp-icl-2sp1 | v5.14-rc3 | merged at v5.14-rc4, the author acknowledged the regression and considers it expected |
stress-ng.fallocate.ops_per_sec | [xfs] eef983ffea: -15.2% regression | nr_threads: 10% disk: 1HDD testtime: 60s fs: xfs class: filesystem test: fallocate cpufreq_governor: performance | lkp-csl-2sp7 | v5.13-rc4 | merged at v5.14-rc1, no response from the author, but the regression disappeared in v5.14 |
tbench.throughput-MB/sec | [xfs] bad77c375e: -10.0% regression | nr_threads: 10% disk: 1HDD testtime: 60s fs: xfs class: filesystem test: fallocate cpufreq_governor: performance | lkp-csl-2sp7 | v5.13-rc4 | merged at v5.14-rc1, no response from the author yet |
stress-ng.link.ops_per_sec | [btrfs] ecc64fab7d: -81.7% regression | nr_threads: 10% disk: 1HDD testtime: 60s fs: btrfs class: filesystem test: link cpufreq_governor: performance | lkp-csl-2sp7 | v5.13-rc4 | merged at v5.14-rc4, the author acknowledged the regression and sent a fix (6e3688e66f2f), which improved the result by 443.3% |
stress-ng.lockbus.ops_per_sec | [clocksource] db3a34e174: -10.1% regression | nr_threads: 100% testtime: 60s class: cpu-cache test: lockbus cpufreq_governor: performance | lkp-csl-2sp7 | v5.13-rc4 | merged at v5.14-rc1, no response from the author yet; the 0-Day CI team is following up |
stress-ng.mknod.ops_per_sec | [xfs] 2bf1ec0ff0: -45.4% regression | nr_threads: 10% disk: 1HDD testtime: 60s fs: xfs class: filesystem test: mknod cpufreq_governor: performance | lkp-csl-2sp7 | v5.14-rc1 | merged at v5.14-rc4, no response from the author, but the 0-Day CI team considers it acceptable |
stress-ng.sigio.ops_per_sec | [pipe] 3b844826b6: -99.3% regression | nr_threads: 100% disk: 1HDD testtime: 60s class: interrupt test: sigio cpufreq_governor: performance | lkp-csl-2sp7 | v5.14-rc6 | merged at v5.14-rc7, the author acknowledged the regression and Linus sent a fix (fe67f4dd8daa), merged at v5.14 |
will-it-scale.per_process_ops | [mm/memcg] 5387c90490: -21.3% regression | nr_task: 50% mode: process test: unix1 cpufreq_governor: performance | lkp-skl-fpga01 | v5.13 | merged at v5.14-rc1, no response from the author yet; the 0-Day CI team is following up |
Improvement | | | | | |
---|---|---|---|---|---|
fio.write_iops | [sched] 9edeaea1bc: 4.7% improvement | disk: 2pmem fs: xfs mount_option: dax runtime: 200s nr_task: 50% time_based: tb rw: write bs: 2M ioengine: mmap test_size: 200G cpufreq_governor: performance | lkp-csl-2sp6 | v5.13-rc1 | merged at v5.14-rc1 |
aim9.sync_disk_rw.ops_per_sec | [io] 49e7f0c789: 6.5% improvement | disk: 1SSD fs: btrfs runtime: 300s nr_task: 8 rw: randwrite bs: 4k ioengine: io_uring test_size: 256g cpufreq_governor: performance | lkp-csl-2ap1 | v5.14-rc1 | merged at v5.14-rc6 |
hackbench.throughput | [mm/memcg] 5387c90490: 41.3% improvement | nr_threads: 100% iterations: 4 mode: threads ipc: socket cpufreq_governor: performance | lkp-skl-fpga01 | v5.13 | merged at v5.14-rc1 |
stress-ng.fanotify.ops_per_sec | [mm/memcg] 68ac5b3c8d: 42.4% improvement | nr_threads: 100% iterations: 4 mode: threads ipc: socket cpufreq_governor: performance | lkp-csl-2ap4 | v5.13 | merged at v5.14-rc1 |
netperf.Throughput_tps | [iommu/vt] e93a67f5a0: 28.9% improvement | ip: ipv4 runtime: 300s nr_threads: 16 cluster: cs-localhost test: TCP_CRR cpufreq_governor: performance | lkp-csl-2ap3 | v5.13-rc4 | merged at v5.14-rc1 |
stress-ng.msg.ops_per_sec | [trace] 3d3d9c072e: 18.5% improvement | nr_threads: 10% disk: 1HDD testtime: 60s fs: ext4 class: os test: msg cpufreq_governor: performance | lkp-csl-2sp5 | v5.13-rc5 | merged at v5.14-rc1 |
will-it-scale.per_thread_ops | [kprobes] ec6aba3d2b: 3.8% improvement | nr_task: 100% mode: thread test: getppid1 cpufreq_governor: performance | lkp-csl-2sp9 | v5.13-rc1 | merged at v5.14-rc1 |
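Each row above reports a percentage change in a test indicator. As an illustration only, the sketch below assumes the conventional definition of relative change, change = (head - base) / base * 100, where "base" is the result on the parent or base kernel and "head" is the result with the commit applied, and it assumes the indicators listed here (ops_per_sec, files_per_sec, throughput, per_process_ops, and so on) are higher-is-better. It is not 0-Day CI's actual comparison code, and the input numbers are hypothetical.

```python
def percent_change(base: float, head: float) -> float:
    """Relative change of head versus base, in percent."""
    if base == 0:
        raise ValueError("base value must be non-zero")
    return (head - base) / base * 100.0


def classify(change: float, higher_is_better: bool = True) -> str:
    """Label a change; assumes a throughput-style metric by default."""
    if change == 0:
        return "no change"
    better = (change > 0) == higher_is_better
    return "improvement" if better else "regression"


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to reproduce a -4.6% change.
    change = percent_change(base=1000.0, head=954.0)
    print(f"{change:+.1f}% {classify(change)}")
```

For a latency-style metric, where lower is better, passing higher_is_better=False flips the label.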
Shift-left Testing
Beyond testing trees in the upstream kernel, 0-Day CI also tests developers’ and maintainers’ trees, which catches issues earlier and limits their wider impact. We call this “shift-left” testing. During the v5.14 release cycle, 0-Day CI reported 9 major performance regressions and 8 major improvements through shift-left testing. For some of these, we share more detailed information together with the code changes that possibly led to the result, though the assessment is limited by our current test coverage. The whole list is summarized in the report summary section.
Report Summary
0-Day CI reported 9 performance regressions and 8 improvements through shift-left testing on developer and maintainer repos.
Test Indicator | Test Scenario | Test Parameters | Test Machine | Status |
---|---|---|---|---|
aim7.jobs-per-min | [xfs] 6df693ed7b: -15.7% regression | disk: 4BRD_12G md: RAID1 fs: xfs test: disk_wrt load: 3000 cpufreq_governor: performance | lkp-csl-2sp9 | currently not merged, the author acknowledged the regression and the 0-Day CI team is working with the author on a fix |
aim7.jobs-per-min | [memcg] 45208c9105: -14.0% regression | disk: 1BRD_48G fs: xfs test: disk_rr load: 3000 cpufreq_governor: performance | lkp-icl-2sp2 | currently not merged, the author acknowledged the regression and is working on a fix |
fxmark.hdd_ext4_no_jnl_DWTL_1_directio.works/sec | [loop] 2112f5c133: -49.6% regression | disk: 1HDD media: hdd test: DWTL fstype: ext4_no_jnl directio: directio cpufreq_governor: performance | lkp-knm02 | merged at v5.15-rc1, no response from the author yet |
netperf.Throughput_tps | [bpf] b89fbfbb85: -21.3% regression | ip: ipv4 runtime: 300s nr_threads: 16 cluster: cs-localhost test: TCP_CRR cpufreq_governor: performance | lkp-csl-2ap3 | merged at v5.15-rc1, the author could not reproduce it in their environment; the 0-Day CI team is following up |
stress-ng.memhotplug.ops_per_sec | [mm/migrate] 9eeb73028c: -53.8% regression | nr_threads: 10% disk: 1HDD testtime: 60s fs: ext4 class: os test: memhotplug cpufreq_governor: performance ucode: 0x5003006 | lkp-csl-2sp5 | currently not merged, the author acknowledged the regression and the 0-Day CI team is working with the author on a fix |
will-it-scale.per_process_ops | [memcg] 059dd9003a: -39.8% regression | nr_task: 100% mode: process test: lock1 cpufreq_governor: performance | lkp-icl-2sp1 | currently not merged, no response from the author yet |
will-it-scale.per_process_ops | [memcg] 0f12156dff: -33.6% regression | nr_task: 50% mode: process test: lock1 cpufreq_governor: performance | lkp-skl-fpga01 | merged at v5.15-rc1, the author acknowledged the regression and the patch was reverted in the latest kernel tree |
will-it-scale.per_thread_ops | [posix] 63a17eea7d: -43.8% regression | nr_task: 100% mode: thread test: lseek1 cpufreq_governor: performance | lkp-skl-fpga01 | currently not merged, no response from the author yet |
will-it-scale.per_thread_ops | [memcg] fa4e6b1ad5: -15.4% regression | nr_task: 50% mode: thread test: poll2 cpufreq_governor: performance | lkp-hsw-4ex1 | currently not merged, no response from the author yet |
Improvement | | | | |
---|---|---|---|---|
aim7.jobs-per-min | [ext4] cc883236b7: 69.4% improvement | disk: 4BRD_12G md: RAID0 fs: ext4 test: disk_rw load: 3000 cpufreq_governor: performance | lkp-csl-2sp9 | currently not merged |
filebench.sum_operations/s | [sched] 260916b537: 5.6% improvement | disk: 1HDD fs: f2fs test: filemicro_writefsync.f cpufreq_governor: performance | lkp-knm02 | currently not merged |
fsmark.files_per_sec | [SUNRPC] e38b3f2005: 1857.1% improvement | iterations: 1x nr_threads: 1t disk: 1BRD_48G fs: f2fs fs2: nfsv4 filesize: 4M test_size: 24G sync_method: NoSync cpufreq_governor: performance | lkp-csl-2ap2 | merged at v5.15-rc1 |
stress-ng.link.ops_per_sec | [btrfs] 6e3688e66f: 443.3% improvement | nr_threads: 10% disk: 1HDD testtime: 60s fs: btrfs class: filesystem test: link cpufreq_governor: performance | lkp-csl-2sp7 | currently not merged |
stress-ng.loop.ops_per_sec | [loop] 8883efd909: 148.4% improvement | nr_threads: 100% disk: 1HDD testtime: 60s class: device test: loop cpufreq_governor: performance | lkp-csl-2sp7 | currently not merged |
stress-ng.loop.ops_per_sec | [loop] acd1746478: 140.9% improvement | nr_threads: 100% disk: 1HDD testtime: 60s class: device test: loop cpufreq_governor: performance | lkp-csl-2sp7 | currently not merged |
stress-ng.netdev.ops_per_sec | [net] b0e99d0377: 7.7% improvement | nr_threads: 100% testtime: 60s class: network test: netdev cpufreq_governor: performance | lkp-csl-2sp5 | merged at v5.15-rc1 |
will-it-scale.per_thread_ops | [fsnotify] e43de7f086: 10.2% improvement | nr_task: 100% mode: thread test: eventfd1 cpufreq_governor: performance | lkp-csl-2ap2 | merged at v5.15-rc1 |
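The stress-ng.link.ops_per_sec improvement from 6e3688e66f and the ecc64fab7d regression in the v5.14 observation summary use the same machine (lkp-csl-2sp7) and the same parameters. Percentage changes compound multiplicatively rather than adding, so a large improvement on top of a large regression need not overshoot the original baseline. The sketch below assumes each percentage is measured against the commit's immediate parent and that no other commit in between moved the metric; under those assumptions, -81.7% followed by +443.3% leaves the result within about one percent of where it started.

```python
def compound(*changes_percent: float) -> float:
    """Combine successive relative changes (in percent) into one net change."""
    factor = 1.0
    for change in changes_percent:
        # Each change scales the value left by the previous ones.
        factor *= 1.0 + change / 100.0
    return (factor - 1.0) * 100.0


# stress-ng.link.ops_per_sec on lkp-csl-2sp7:
# ecc64fab7d reported -81.7%, and the fix 6e3688e66f reported +443.3%.
net = compound(-81.7, 443.3)
print(f"net change vs. the pre-regression baseline: {net:+.1f}%")
```

Adding the two figures naively would suggest a +361.6% gain; multiplying the factors (0.183 * 5.433 ≈ 0.994) shows the fix essentially restores the original throughput.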
Test Machines
IVB DESKTOP
model | Ivy Bridge |
brand | Intel® Core™ i3-3220 CPU @ 3.30GHz |
cpu number | 8 |
memory | 16G |
model | Ivy Bridge |
brand | Intel® Core™ i3-3220 CPU @ 3.30GHz |
cpu number | 4 |
memory | 8G |
SKL SP
model | Skylake |
brand | Intel® Xeon® CPU E5-2697 v2 @ 2.70GHz |
cpu number | 80 |
memory | 64G |
BDW EP
model | Broadwell-EP |
brand | Intel® Xeon® CPU E5-2699 v4 @ 2.20GHz |
cpu number | 88 |
memory | 128G |
HSW EP
model | Haswell-EP |
brand | Intel® Xeon® CPU E5-2699 v3 @ 2.30GHz |
cpu number | 72 |
memory | 128G |
IVB EP
model | Ivy Bridge-EP |
brand | Intel® Xeon® CPU E5-2690 v2 @ 3.00GHz |
cpu number | 40 |
memory | 384G |
model | Ivytown Ivy Bridge-EP |
brand | Intel® Xeon® CPU E5-2697 v2 @ 2.70GHz |
cpu number | 48 |
memory | 64G |
HSX EX
model | Brickland Haswell-EX |
brand | Intel® Xeon® CPU E7-8890 v3 @ 2.50GHz |
cpu number | 144 |
memory | 512G |
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.