TL;DR Solution
Update the Linux kernel to the latest version.
Simple Conclusion
Older Linux kernels do not fully support Intel's latest CPU architecture. On such a kernel, the OS falls back to the HPET clock source instead of TSC, which causes poor performance and other issues.
Issues
Servers
- An overclocked Core i9 server, which is the new machine. Its CPU frequency is above 4.6 GHz.
- The other servers use E5-2600 v3/v4 series CPUs. Their CPU frequency is generally 3.x GHz.
OS
- Ubuntu 14.04 runs on all of them for convenience.
Low Performance on i9 server
The same program should run faster on the i9 server, but my tests showed it was much slower there. The key function takes about 1 microsecond on the old servers and about 10 microseconds on the new one.
Troubleshooting
- According to the perf result on the i9 server, the key function spent most of its CPU time in the system call gettimeofday().
- I wrote a simple test program to measure the cost of calling gettimeofday() on each machine.
- That program showed that a call to gettimeofday() took 4.5 microseconds on the i9 machine, but only about 100 nanoseconds on an E5 series machine.
- gettimeofday() is supposed to read a tick count held in a CPU register, which costs only a few CPU cycles, so something had to be wrong with the clock source.
I checked the current clock source on the i9 server. It was using the hpet clock source.
    $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
    hpet
I then checked the available clock sources on the i9 server. Only the hpet source was listed. The default clock source, TSC, was missing.
    $ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
    hpet
The cause was now clear: gettimeofday() was reading the HPET clock source, and that caused the performance problem. The HPET clock comes from a chip on the motherboard, and reading it costs far more than reading the TSC.
More details
Why TSC is missing
According to the Linux kernel repo and the Red Hat web site, older Linux kernels could not determine the correct clock frequency of Skylake CPUs. That number is actually hardcoded in the kernel. As a result, the boot process failed to calibrate the clock against the TSC source, so the kernel dropped TSC and used HPET as the clock source.
Why hpet is slower than TSC
HPET is provided by timers in a chip on the motherboard. Communication between the CPU and this chip is slower than reading a value from a CPU register.
Clock sources in Linux OS
- TSC is the default clock source in Linux. Many years ago, the TSC depended on the CPU frequency, which could not change after boot. On recent Intel CPUs, however, the frequency can drop to save energy or boost for performance, so on a modern Intel CPU the TSC is driven by a separate frequency that does not change. Its resolution is finer than 1 nanosecond on recent CPUs.
- HPET is provided by timers in a chip on the motherboard. Those timers run at 10 MHz or more, which is much lower than the CPU frequency.
- There are many other clock sources. For more details, see the page "clock sources in Linux".