status: show guest VM health (cpu/ram/disk) #3574

Open
tstromberg opened this issue Jan 23, 2019 · 15 comments
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@tstromberg
Contributor

The status command should make it obvious when one is out of guest VM resources - particularly disk space.

@tstromberg tstromberg added kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Jan 23, 2019
@afbjorklund
Collaborator

You can use ssh to run some simple ad-hoc Linux commands for this kind of monitoring...

$ minikube ssh -- top -b -n 1 | head -n 4
top - 20:47:27 up  1:01,  1 user,  load average: 0.00, 0.01, 0.00
Tasks: 142 total,   1 running, 141 sleeping,   0 stopped,   0 zombie
%Cpu0  :   0.0/6.2     6[||||||                                                                                              ]
%Cpu1  :   0.0/0.0     0[                                                                                                    ]
$ minikube ssh -- free -m
              total        used        free      shared  buff/cache   available
Mem:           1990          72        1127          16         790        1871
Swap:           999           0         999
$ minikube ssh -- df -h /var/lib/docker
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        17G   49M   16G   1% /var/lib/docker

Nothing wrong with having the basic information available with a simple command, though.
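
As a rough illustration of how the `df` output above could be turned into a machine-readable check, here is a minimal Go sketch. The `diskUsage` type and `parseDf` function are hypothetical names made up for this example, not part of minikube. It assumes the two-line layout shown above; `df` may wrap rows for very long filesystem names unless it is run with `-P`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// diskUsage holds a few fields of one df data row (hypothetical type).
type diskUsage struct {
	Filesystem string
	UsePercent int
	MountedOn  string
}

// parseDf parses the output of `df -h <path>`: a header line followed by
// one data row. Only the last row is inspected.
func parseDf(out string) (diskUsage, error) {
	lines := strings.Split(strings.TrimSpace(out), "\n")
	if len(lines) < 2 {
		return diskUsage{}, fmt.Errorf("unexpected df output: %q", out)
	}
	row := lines[len(lines)-1]
	fields := strings.Fields(row)
	if len(fields) < 6 {
		return diskUsage{}, fmt.Errorf("unexpected df row: %q", row)
	}
	// The fifth column is "Use%", for example "1%" or "100%".
	pct, err := strconv.Atoi(strings.TrimSuffix(fields[4], "%"))
	if err != nil {
		return diskUsage{}, fmt.Errorf("parsing use percentage: %w", err)
	}
	return diskUsage{Filesystem: fields[0], UsePercent: pct, MountedOn: fields[5]}, nil
}

func main() {
	out := "Filesystem      Size  Used Avail Use% Mounted on\n" +
		"/dev/sda1        17G   49M   16G   1% /var/lib/docker\n"
	du, err := parseDf(out)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s is %d%% full (mounted on %s)\n", du.Filesystem, du.UsePercent, du.MountedOn)
}
```

A status command could compare `UsePercent` against a threshold and print a warning; the threshold itself would be a design decision.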

@tstromberg tstromberg added priority/backlog Higher priority than priority/awaiting-more-evidence. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. and removed priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Jan 23, 2019
@afbjorklund
Collaborator

Maybe this would muddy the "status" command output, and it would be better off as a separate command (or an extra flag)?

This library looked quite useful, for making a health check binary that can be deployed on the VM:

https://github.com/shirou/gopsutil

Strange that there is no built-in monitoring of node disk usage, only CPU and RAM? (cAdvisor has it)

https://github.com/google/cadvisor/tree/master/deploy/kubernetes

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 29, 2019
@afbjorklund
Collaborator

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 4, 2019
@afbjorklund
Collaborator

afbjorklund commented May 4, 2019

Here's a little hack to show the available information:

package main

import (
	"fmt"
	"log"
	"os"
	"time"

	"github.com/shirou/gopsutil/cpu"
	"github.com/shirou/gopsutil/disk"
	"github.com/shirou/gopsutil/mem"
)

// busy returns the overall CPU busy percentage. With percpu=false,
// cpu.Percent returns a single combined value.
func busy(percent []float64) float64 {
	return percent[0] // single
}

// idle is the complement of busy.
func idle(percent []float64) float64 {
	return 100.0 - busy(percent)
}

// megs converts bytes to mebibytes.
func megs(bytes uint64) float64 {
	return float64(bytes) / 1024.0 / 1024.0
}

// check aborts on error instead of silently printing zero values.
func check(err error) {
	if err != nil {
		log.Fatal(err)
	}
}

func main() {
	interval := time.Second
	// Prefer the minikube data directory, fall back to the root filesystem.
	path := "/var/lib/minikube"
	if _, err := os.Stat(path); os.IsNotExist(err) {
		path = "/"
	}

	i, err := cpu.Info()
	check(err)
	p, err := cpu.Percent(interval, false) // blocks for one interval
	check(err)
	v, err := mem.VirtualMemory()
	check(err)
	s, err := mem.SwapMemory()
	check(err)
	d, err := disk.Usage(path)
	check(err)

	fmt.Printf("CPU\tNumber:%d, Idle:%.1f%%, Busy:%.1f%%\n",
		len(i), idle(p), busy(p))
	fmt.Printf("MEM\tTotal:%.0f, Available:%.0f, Used:%.1f%%\n",
		megs(v.Total), megs(v.Available), v.UsedPercent)
	fmt.Printf("SWP\tTotal:%.0f, Free:%.0f, Used:%.1f%%\n",
		megs(s.Total), megs(s.Free), s.UsedPercent)
	fmt.Printf("HDD\tTotal:%.0f, Free:%.0f, Used:%.1f%%\n",
		megs(d.Total), megs(d.Free), d.UsedPercent)
}

Typical output:

$ ./health 
CPU	Number:2, Idle:95.2%, Busy:4.8%
MEM	Total:1991, Available:1285, Used:24.9%
SWP	Total:0, Free:0, Used:0.0%
HDD	Total:17368, Free:14847, Used:9.2%

@tstromberg tstromberg added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label May 6, 2019
@tstromberg tstromberg added the r/2019q2 Issue was last reviewed 2019q2 label May 22, 2019
@tstromberg
Contributor Author

tstromberg commented Sep 20, 2019

I noticed today that kubectl describe node shows health data that is almost certainly available through an API call:

Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Fri, 20 Sep 2019 13:38:48 -0700   Fri, 20 Sep 2019 11:47:38 -0700   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 20 Sep 2019 13:38:48 -0700   Fri, 20 Sep 2019 11:47:38 -0700   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Fri, 20 Sep 2019 13:38:48 -0700   Fri, 20 Sep 2019 11:47:38 -0700   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Fri, 20 Sep 2019 13:38:48 -0700   Fri, 20 Sep 2019 11:50:07 -0700   KubeletReady                 kubelet is posting ready status
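
The same conditions are available in structured form under `status.conditions` in the output of `kubectl get node <name> -o json`. The sketch below is one possible way to flag problem conditions from that JSON. The `nodeCondition` type and `unhealthy` function are hypothetical names for this example, and the embedded JSON is a made-up sample (with DiskPressure set to True) rather than output captured from a real cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nodeCondition mirrors the fields of one entry in status.conditions.
type nodeCondition struct {
	Type    string `json:"type"`
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

// node contains only the part of the Node object needed here.
type node struct {
	Status struct {
		Conditions []nodeCondition `json:"conditions"`
	} `json:"status"`
}

// unhealthy returns the conditions that indicate a problem: Ready must be
// "True", every other condition (the pressure types) must be "False".
func unhealthy(raw []byte) ([]nodeCondition, error) {
	var n node
	if err := json.Unmarshal(raw, &n); err != nil {
		return nil, err
	}
	var bad []nodeCondition
	for _, c := range n.Status.Conditions {
		want := "False"
		if c.Type == "Ready" {
			want = "True"
		}
		if c.Status != want {
			bad = append(bad, c)
		}
	}
	return bad, nil
}

func main() {
	// Made-up sample in the shape of `kubectl get node -o json` output.
	raw := []byte(`{"status":{"conditions":[
		{"type":"MemoryPressure","status":"False","reason":"KubeletHasSufficientMemory","message":"kubelet has sufficient memory available"},
		{"type":"DiskPressure","status":"True","reason":"KubeletHasDiskPressure","message":"kubelet has disk pressure"},
		{"type":"Ready","status":"True","reason":"KubeletReady","message":"kubelet is posting ready status"}]}}`)
	bad, err := unhealthy(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range bad {
		fmt.Printf("%s=%s (%s): %s\n", c.Type, c.Status, c.Reason, c.Message)
	}
}
```

Note that these conditions only report whether the kubelet considers the node under pressure; they do not give the actual usage numbers.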

@tstromberg
Contributor Author

This issue is still relevant in minikube v1.6. It'd be nice if someone plumbed kubectl describe get node calls into minikube status. help wanted!

@prasadkatti
Contributor

This issue is still relevant in minikube v1.6. It'd be nice if someone plumbed kubectl describe get node calls into minikube status. help wanted!

@tstromberg - kubectl describe node -A is now part of minikube logs, merged in #7105

@afbjorklund
Collaborator

We can get some of this from the metrics-server, by using kubectl top node.

But it would be nice to be able to get the "raw" data from the VM as well?
Using SSH could be a fallback, but it should be possible to ask the hypervisor.

http://www.virtualbox.org/manual/ch08.html#vboxmanage-metrics
VBoxManage metrics query minikube

Some available:

Object          Metric                                   Values
--------------- ---------------------------------------- --------------------------------------------
minikube        Guest/CPU/Load/User                      8.00%
minikube        Guest/CPU/Load/Kernel                    4.00%
minikube        Guest/CPU/Load/Idle                      86.00%
minikube        Guest/RAM/Usage/Total                    5954124 kB
minikube        Guest/RAM/Usage/Free                     5166640 kB
minikube        Guest/RAM/Usage/Cache                    1079048 kB

When using KIC, we need to make sure not to display the data for the host.
So we need to ask the container runtime in use what applies to the container.

$ docker stats docker --no-stream --format '{{json .}}'
{"BlockIO":"245MB / 0B","CPUPerc":"23.86%","Container":"minikube","ID":"6c28dceb5e35","MemPerc":"9.91%","MemUsage":"792.8MiB / 7.812GiB","Name":"docker","NetIO":"92.1kB / 297kB","PIDs":"455"}
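
A minimal Go sketch of consuming that JSON line follows. The field names come from the output above; the `containerStats` type and the `parsePercent` and `parseStats` functions are hypothetical names for this example. The percentages arrive as strings with a trailing `%`, so they need converting before they can be compared against a threshold.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// containerStats holds the subset of `docker stats --format '{{json .}}'`
// fields used here. All values are strings in the JSON output.
type containerStats struct {
	Name     string `json:"Name"`
	CPUPerc  string `json:"CPUPerc"`
	MemPerc  string `json:"MemPerc"`
	MemUsage string `json:"MemUsage"`
}

// parsePercent converts a string such as "23.86%" to 23.86.
func parsePercent(s string) (float64, error) {
	return strconv.ParseFloat(strings.TrimSuffix(strings.TrimSpace(s), "%"), 64)
}

// parseStats decodes one JSON line and returns CPU and memory percentages.
func parseStats(line string) (stats containerStats, cpu, mem float64, err error) {
	if err = json.Unmarshal([]byte(line), &stats); err != nil {
		return stats, 0, 0, err
	}
	if cpu, err = parsePercent(stats.CPUPerc); err != nil {
		return stats, 0, 0, fmt.Errorf("CPUPerc: %w", err)
	}
	if mem, err = parsePercent(stats.MemPerc); err != nil {
		return stats, 0, 0, fmt.Errorf("MemPerc: %w", err)
	}
	return stats, cpu, mem, nil
}

func main() {
	line := `{"BlockIO":"245MB / 0B","CPUPerc":"23.86%","Container":"minikube","ID":"6c28dceb5e35","MemPerc":"9.91%","MemUsage":"792.8MiB / 7.812GiB","Name":"docker","NetIO":"92.1kB / 297kB","PIDs":"455"}`
	stats, cpu, mem, err := parseStats(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s: cpu %.2f%%, mem %.2f%% (%s)\n", stats.Name, cpu, mem, stats.MemUsage)
}
```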

@NixBiks

NixBiks commented Sep 22, 2020

What to do when running out of space?

> minikube ssh -- df -h /var/lib/docker

Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        10G  9.9G     0 100% /var/lib/docker

I know I can run minikube delete of course but it is really time consuming to rebuild everything.

@tstromberg
Contributor Author

@mr-bjerre - if you are on Windows/macOS, try giving Docker more space:

https://docs.docker.com/docker-for-mac/space/

@NixBiks

NixBiks commented Sep 28, 2020

Skaffold more space, right?

What I usually do is

minikube ssh -- docker system prune

@sharifelgamal sharifelgamal removed the r/2019q2 Issue was last reviewed 2019q2 label Oct 21, 2020
@afbjorklund
Collaborator

afbjorklund commented Oct 30, 2020

I added some parsers for free -m and df -m output, for showing the total available.
They could be used for getting the "available" and "/var/lib/docker" (or containers) too

4197935
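
For readers who want a feel for what such a parser involves, here is a minimal Go sketch for `free -m` output. It is not the code from the commit referenced above; `memInfo` and `parseFree` are hypothetical names, and the sketch assumes a `free` version whose header includes an "available" column, as in the sample earlier in this thread.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// memInfo holds the values of interest from the "Mem:" row, in MiB.
type memInfo struct {
	TotalMB     int
	AvailableMB int
}

// parseFree parses `free -m` output by matching header names to the
// columns of the "Mem:" row (the row has one extra leading label field).
func parseFree(out string) (memInfo, error) {
	lines := strings.Split(strings.TrimSpace(out), "\n")
	if len(lines) < 2 {
		return memInfo{}, fmt.Errorf("unexpected free output: %q", out)
	}
	header := strings.Fields(lines[0])
	for _, line := range lines[1:] {
		f := strings.Fields(line)
		if len(f) == 0 || f[0] != "Mem:" {
			continue
		}
		value := func(name string) (int, error) {
			for i, h := range header {
				if h == name && i+1 < len(f) {
					return strconv.Atoi(f[i+1])
				}
			}
			return 0, fmt.Errorf("column %q not found", name)
		}
		total, err := value("total")
		if err != nil {
			return memInfo{}, err
		}
		avail, err := value("available")
		if err != nil {
			return memInfo{}, err
		}
		return memInfo{TotalMB: total, AvailableMB: avail}, nil
	}
	return memInfo{}, fmt.Errorf("no Mem: row found")
}

func main() {
	out := "              total        used        free      shared  buff/cache   available\n" +
		"Mem:           1990          72        1127          16         790        1871\n" +
		"Swap:           999           0         999\n"
	m, err := parseFree(out)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("memory: %d MiB total, %d MiB available\n", m.TotalMB, m.AvailableMB)
}
```

Matching on header names rather than fixed positions keeps the sketch from silently reading the wrong column if the layout differs.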

The CPU usage is trickier, since /proc/stat only has counters (reason for "sleep" above)
Probably easier to just call some existing program, I do believe that most have "vmstat"

vmstat 1 2

  The first report produced gives averages since the last reboot. Additional
  reports give information on a sampling period of length delay.
  The process and memory reports are instantaneous in either case.
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 3501344  12540 1763764    0    0    30  7024 1491 2562  6  4 89  0  0
 2  0      0 3500864  12540 1763764    0    0     0    48 2417 4753  5  3 92  0  0

5% user, 3% system, 92% idle
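
Reading those three numbers out of the second report could look roughly like the Go sketch below. `cpuSample` and `parseVMStat` are hypothetical names for this illustration. The sketch locates the `us`, `sy` and `id` columns by name in the header row and reads them from the last data row, which is the sampled interval when running `vmstat 1 2`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuSample holds the user, system and idle percentages of one report.
type cpuSample struct {
	User, System, Idle int
}

// hasField reports whether name is one of the fields.
func hasField(fields []string, name string) bool {
	for _, f := range fields {
		if f == name {
			return true
		}
	}
	return false
}

// parseVMStat parses `vmstat 1 2` output and returns the last report.
func parseVMStat(out string) (cpuSample, error) {
	col := map[string]int{} // column name -> index, filled from the header row
	var last []string       // fields of the most recent data row
	for _, line := range strings.Split(out, "\n") {
		f := strings.Fields(line)
		if len(f) == 0 {
			continue
		}
		if len(col) == 0 {
			// Skip everything until the header row that names the columns.
			if hasField(f, "us") && hasField(f, "id") {
				for i, name := range f {
					col[name] = i
				}
			}
			continue
		}
		last = f
	}
	get := func(name string) (int, error) {
		i, ok := col[name]
		if !ok || i >= len(last) {
			return 0, fmt.Errorf("column %q not found", name)
		}
		return strconv.Atoi(last[i])
	}
	us, err := get("us")
	if err != nil {
		return cpuSample{}, err
	}
	sy, err := get("sy")
	if err != nil {
		return cpuSample{}, err
	}
	id, err := get("id")
	if err != nil {
		return cpuSample{}, err
	}
	return cpuSample{User: us, System: sy, Idle: id}, nil
}

func main() {
	out := `procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 3501344  12540 1763764    0    0    30  7024 1491 2562  6  4 89  0  0
 2  0      0 3500864  12540 1763764    0    0     0    48 2417 4753  5  3 92  0  0
`
	s, err := parseVMStat(out)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d%% user, %d%% system, %d%% idle\n", s.User, s.System, s.Idle)
}
```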

@afbjorklund
Collaborator

afbjorklund commented Nov 1, 2020

Added implementation of a vmstat parser as well, in 4c78a65

        count := runtime.NumCPU()
        busy, idle, _ := util.LocalCPU()
        fmt.Printf("Local CPU: #%d %d%% %d%%\n", count, busy, idle)

        rr, _ := cr.RunCmd(exec.Command("nproc"))
        count, _ = strconv.Atoi(strings.TrimSpace(rr.Stdout.String()))
        rr, _ = cr.RunCmd(exec.Command("vmstat", "1", "2"))
        busy, idle, _ = util.ParseVMStat(rr.Stdout.String())
        fmt.Printf("Remote CPU: #%d %d%% %d%%\n", count, busy, idle)

Since it is based on integers, the numbers don't always add up:

Local CPU: #8 7% 92%
Remote CPU: #2 4% 97%

The local implementation just uses gopsutil, per above: 03d950f

Put the code in "generic-status" for now, needs better status cmd...

@backtrackshubham

Hi, a probably related but off-topic question: I was wondering if we could somehow split the memory for minikube, e.g. a minimal amount from RAM plus swap-style memory from disk?


8 participants