Analysis of why a node may crash multiple times during reboot (Intel® Rack Scale Design Direct)
Segmentation fault (segfault) errors from various processes may appear in the system log. The following output is an example:
2018-06-11T02:21:11.407233+02:00 jrc5065 kernel: chroma[22449]: segfault at 0 ip 00002b9380c262e9 sp 00007ffefc169c40 error 4
2018-06-11T02:21:11.407405+02:00 jrc5065 kernel: chroma[22451]: segfault at 0 ip 00002ab0069dd2e9 sp 00007ffe41d38640 error 4
2018-06-11T02:21:11.407490+02:00 jrc5065 kernel: chroma[22443]: segfault at 0 ip 00002b1b4fcde2e9 sp 00007ffdf555a0c0 error 4
2018-06-11T02:21:11.407760+02:00 jrc5065 kernel: chroma[22447]: segfault at 0 ip 00002b36481552e9 sp 00007ffd349e1bc0 error 4
2018-07-10T15:17:56.674483+02:00 jrc5065 kernel: systemd-udevd[1812]: segfault at 67247d ip 000055b97d97fd54 sp 00007ffeb37912d0 error 6 in systemd-udevd[55b97d976000+4b000]
2018-07-10T15:17:56.728418+02:00 jrc5065 kernel: systemd-udevd[1815]: segfault at 55b97d9aecf0 ip 000055b97d97bdf8 sp 00007ffeb37912d0 error 7 in systemd-udevd[55b97d976000+4b000]
2018-07-10T15:17:56.903902+02:00 jrc5065 kernel: systemd-udevd[1839]: segfault at 55b97d9aecf0 ip 000055b97d97bdf8 sp 00007ffeb37912d0 error 7 in systemd-udevd[55b97d976000+4b000]
2018-07-10T15:17:56.932901+02:00 jrc5065 kernel: systemd-udevd[1810]: segfault at 55b97d9aecf0 ip 000055b97d97bdf8 sp 00007ffeb37912d0 error 7 in systemd-udevd[55b97d976000+4b000]
2018-07-10T15:17:56.979885+02:00 jrc5065 kernel: systemd-udevd[1826]: segfault at 55b97d9aecf0 ip 000055b97d97bdf8 sp 00007ffeb37912d0 error 7 in systemd-udevd[55b97d976000+4b000]
2018-07-10T15:17:57.273517+02:00 jrc5065 kernel: systemd-udevd[1821]: segfault at 5 ip 00007ff4618ffcff sp 00007ffeb3790d00 error 4 in liblzma.so.5.2.2[7ff4618eb000+25000]
2018-07-10T15:17:57.273677+02:00 jrc5065 kernel: systemd-udevd[1831]: segfault at 67247d ip 000055b97d97fd54 sp 00007ffeb37912d0 error 6 in systemd-udevd[55b97d976000+4b000]
2018-07-10T15:17:57.273750+02:00 jrc5065 kernel: systemd-udevd[1835]: segfault at 7 ip 000055b97d97bdff sp 00007ffeb37912d0 error 6 in systemd-udevd[55b97d976000+4b000]
2018-07-10T15:17:57.277861+02:00 jrc5065 kernel: systemd-udevd[1859]: segfault at 5 ip 00007ff4618ffcff sp 00007ffeb3790d00 error 4 in liblzma.so.5.2.2[7ff4618eb000+25000]
2018-07-10T15:18:42.602983+02:00 jrc5065 kernel: mmremote[6059]: segfault at fffffffffffee0bc ip 00000000004adfde sp 00007ffeb9a2b2c0 error 7 in mmksh[400000+128000]
2018-07-10T15:19:26.350965+02:00 jrc5065 kernel: runmmfs[5832]: segfault at f200f70a ip 00007f713e553706 sp 00007ffcdc458328 error 6 in libc-2.17.so[7f713e422000+1b8000]
2018-07-10T15:21:21.926079+02:00 jrc5065 kernel: mmremote[16345]: segfault at fffffffffffee0bc ip 00000000004adfde sp 00007fff2e2f3d90 error 7 in mmksh[400000+128000]
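To make entries like the ones above easier to read, the following is a minimal Python sketch that parses a kernel segfault line and decodes its error code. It assumes the x86 page-fault convention, in which the code is printed in hexadecimal and the low three bits mean: bit 0 set for a protection violation (clear for an unmapped page), bit 1 set for a write (clear for a read), bit 2 set for a user-mode access. The function names (decode_error, parse_segfault, summarize) are illustrative only and are not part of any Intel tool.

```python
import re
from collections import Counter

# Matches the kernel "segfault at ..." lines; the trailing "in module[base+size]"
# part is optional because some entries (e.g. the chroma lines) do not have it.
SEGFAULT_RE = re.compile(
    r"kernel: (?P<proc>[^\[\s]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<module>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_error(code):
    """Describe the low three bits of an x86 page-fault error code."""
    cause = "protection violation" if code & 1 else "page not mapped"
    access = "write" if code & 2 else "read"
    mode = "user" if code & 4 else "kernel"
    return f"{mode}-mode {access}, {cause}"


def parse_segfault(line):
    """Return a dict describing one segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    code = int(fields["err"], 16)  # the kernel prints the code in hex
    offset = None
    if fields["base"] is not None:
        # Offset of the faulting instruction inside the mapped module.
        offset = int(fields["ip"], 16) - int(fields["base"], 16)
    return {
        "process": fields["proc"],
        "pid": int(fields["pid"]),
        "address": int(fields["addr"], 16),
        "error": code,
        "meaning": decode_error(code),
        "module": fields["module"],
        "offset": offset,
    }


def summarize(lines):
    """Count segfaults per (process, module) pair across many log lines."""
    counts = Counter()
    for line in lines:
        info = parse_segfault(line)
        if info is not None:
            counts[(info["process"], info["module"])] += 1
    return counts
```

Under that convention, "error 4" in the example is a user-mode read of an unmapped page, "error 6" is a user-mode write to an unmapped page, and "error 7" is a user-mode write that hit a protection violation. Passing the saved log lines to summarize() shows at a glance how many unrelated processes and libraries are affected.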
However, nothing is recorded in the System Event Log (SEL).
If you need further assistance, contact support and provide the system event log [1] and, if available, the sos report [2] for diagnosis.
[1] How to Extract and Read the System Event Log (SEL) for Intel® Server Boards
[2] Notes on the sos report:
. It is a utility that gathers configuration and diagnostic information about the system.
. It must be installed first, for example with the "sudo apt-get install sosreport" command on Debian or Ubuntu systems.
. Reboot the system after the installation, then run the utility.
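As a rough illustration of the last step, the Python sketch below picks the command to run once the utility is installed. It assumes that newer sos releases provide a "sos" binary with a "report" subcommand, that older releases provide only "sosreport", and that the "--batch" option runs the collection without interactive prompts. The helper name sos_command is hypothetical.

```python
import shutil


def sos_command(which=shutil.which):
    """Return the sos report command as a list, or None if sos is not installed.

    `which` is injectable so the lookup can be exercised without touching PATH.
    """
    if which("sos"):
        # Assumed layout of newer sos releases: "sos report".
        return ["sos", "report", "--batch"]
    if which("sosreport"):
        # Assumed layout of older releases: standalone "sosreport" binary.
        return ["sosreport", "--batch"]
    return None


if __name__ == "__main__":
    cmd = sos_command()
    if cmd is None:
        print("sos is not installed; install it first.")
    else:
        # The utility normally needs root privileges, hence the sudo prefix.
        print("Run:", " ".join(["sudo"] + cmd))
```

The script only prints the command rather than executing it, so it can be reviewed before anything is collected from the node.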