Deeperf

Dive Deeper into Performance


Troubleshooting: gcloud compute ssh could not fetch instance

Posted on 2019-05-14 | Edited on 2019-05-15

Issues

On Google Cloud Compute Engine, I created an instance in the project gvm. The command gcloud compute ssh then failed to log in to that instance. The error message is

$ gcloud compute ssh instance-1
ERROR: (gcloud.compute.ssh) Could not fetch instance:
 - The resource 'projects/gvm/zones/us-west1-a/instances/instance-1' was not found

Troubleshooting

According to this Google Cloud document, the error occurs because the instance's zone differs from the gcloud default zone. The command line did not specify the instance zone, so gcloud used the default zone, and the instance cannot be found there. Adding the zone option to the command solves the problem. The command format is

gcloud compute --project "gvm" ssh --zone "instance-zone" "instance-name"

The exact command can be found on the VM instances page of the Google Cloud console.

  • Find the instance.
  • Open the SSH drop-down list.
  • Choose View gcloud command.

[Screenshot: gcloud ssh command]
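The zone can also be looked up from the command line instead of the console. The following is only a sketch: the helper name instance_zone is made up for illustration, and it assumes the instance name is unique within the project. It relies on the --filter and --format flags of gcloud compute instances list.

```shell
# instance_zone PROJECT INSTANCE: print the zone that holds the instance.
# (illustrative helper name, not part of gcloud)
instance_zone() {
  gcloud compute instances list \
    --project "$1" \
    --filter="name=($2)" \
    --format="value(zone.basename())"
}

# Typical use:
#   gcloud config get-value compute/zone      # default zone gcloud falls back to
#   zone="$(instance_zone gvm instance-1)"    # zone the instance actually lives in
#   gcloud compute ssh --project gvm --zone "$zone" instance-1
#   gcloud config set compute/zone "$zone"    # optional: make it the new default
```

Setting the default zone with gcloud config set is convenient when most instances live in one zone. Passing --zone explicitly is safer when they are spread across several.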

This post is based on the original blog post. The text and screenshot have been updated.

A Performance Issue Caused by the TSC Clock Source Missing in Linux

Posted on 2019-04-30 | Edited on 2019-05-09

TL;DR Solution

Update the Linux kernel to the latest version.

Simple Conclusion

Early Linux kernels do not support Intel's latest CPU architecture very well. On such a kernel, the OS uses the hpet clock source instead of TSC, which causes low performance and other issues.

Issues

Servers

  • A new, overclocked Core i9 server. Its CPU frequency is above 4.6 GHz.
  • Other servers with E5-2600 v3/v4 series CPUs. Their CPU frequency is generally 3.x GHz.

OS

  • Ubuntu 14.04 runs on all of them for convenience.

Low Performance on i9 server

The same program is supposed to be faster on the i9 server, but my test shows it is much slower there. The key function takes about 1 microsecond on the old servers and 10 microseconds on the new one.

Troubleshooting

  • According to the perf result on the i9 server, the key function spent most of its CPU time in the gettimeofday() system call.
  • I wrote a simple test program to measure the cost of calling gettimeofday() on each machine.
  • The program showed that a gettimeofday() call took 4.5 microseconds on the i9 machine but only about 100 nanoseconds on the E5 series machines.
  • gettimeofday() is supposed to read a tick count from a CPU register, which costs only a few CPU cycles. Something had to be wrong with the clock source.
  • I checked the clock source on the i9 server. It was using the hpet clock source.

    $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
    hpet
  • I checked the available clock sources on the i9 server. Only hpet was listed. The default clock source, TSC, was missing.

    $ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
    hpet
  • Now it is clear. gettimeofday() used the hpet clock source, and that caused the performance issue. The HPET clock comes from a chip on the motherboard, and reading it costs much more than reading the TSC.
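The two sysfs checks above can be combined into one small function for running on several machines. This is a minimal sketch: the name check_tsc is made up, and the optional directory argument exists only so the function can be pointed at a copy of the files.

```shell
# check_tsc [DIR]: report whether tsc is listed as an available clock source.
# DIR defaults to the standard sysfs location used in the commands above.
check_tsc() {
  dir="${1:-/sys/devices/system/clocksource/clocksource0}"
  current="$(cat "$dir/current_clocksource")"
  # -w matches "tsc" as a whole word in the space-separated list
  if grep -qw tsc "$dir/available_clocksource"; then
    echo "tsc is available (current: $current)"
  else
    echo "tsc is missing (current: $current)"
  fi
}

# Typical use:
#   check_tsc
```

On the i9 server described here, this would report tsc as missing with hpet as the current source.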

More details

Why TSC is missing
According to the Linux kernel repo and the Red Hat web site, early Linux kernels could not get the right clock frequency for Skylake CPUs. That number is actually hardcoded in the kernel. During boot, the kernel failed to calibrate the clock against the TSC source, so it dropped TSC and used hpet as the clock source.
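To see what the kernel decided about the TSC during boot, the kernel log can be filtered for the relevant lines. This is a small sketch with a made-up function name. The exact wording of the messages differs between kernel versions, so the filter only matches on the keywords.

```shell
# tsc_messages: keep only log lines that mention the TSC or a clock source.
# Reads the log on stdin, so it works with dmesg output or a saved log file.
tsc_messages() {
  grep -i -E 'tsc|clocksource'
}

# Typical use (may need root on systems that restrict dmesg):
#   dmesg | tsc_messages
```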

Why hpet is slower than TSC
HPET comes from timers in a chip on the motherboard. Communication between the CPU and this chip is slower than reading a value from a CPU register.

Clock sources in Linux OS

  • TSC is the default clock source in Linux. Many years ago, the TSC rate depended on the CPU frequency, which could not change after boot. On recent Intel CPUs, the core frequency can drop to save energy or boost for performance, so the TSC on a modern Intel CPU runs from a separate frequency that does not change. Its resolution is below 1 nanosecond on recent CPUs.
  • HPET comes from timers in a chip on the motherboard. Those timers run at 10 MHz or more, which is much lower than the CPU frequency.
  • There are many other clock sources. More details are on the page clock sources in Linux.
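The resolution figures follow from the counter frequency: one tick lasts 1e9 / frequency nanoseconds. A quick check of that arithmetic, assuming a TSC rate of 3 GHz purely for illustration (the real rate depends on the CPU model); the function name tick_ns is made up:

```shell
# tick_ns HZ: print the length of one tick in nanoseconds for a counter running at HZ.
tick_ns() {
  awk -v hz="$1" 'BEGIN { printf "%.2f\n", 1e9 / hz }'
}

tick_ns 10000000     # HPET at its 10 MHz minimum -> 100.00
tick_ns 3000000000   # counter at 3 GHz (assumed TSC rate) -> 0.33
```

This covers only the tick length. The cost of reading each counter is a separate matter, and it is the read cost that makes hpet slow in the measurements above.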

Links

  • Got wrong CPU frequency issue in Linux Kernel
  • A bug report of skylake frequency issue
  • The CPU frequency error messages in booting.
  • TSC clocksource is not available in RHEL 6 and 7
  • clock sources in Linux

Xing

© 2019 Xing