Ganglia Quick Start
The Ganglia monitoring suite consists of three main parts: gmond, gmetad and the web interface, usually called ganglia-web.
- gmond is a daemon that must run on every node you want to monitor. It gathers monitoring statistics and both sends them to and receives them from the other hosts on the same multicast or unicast channel.
  - If it's a sender (mute=no) it collects basic metrics such as system load (load_one) and CPU utilization. It can also send user-defined metrics through additional C/Python modules.
  - If it's a receiver (deaf=no) it aggregates all metrics sent to it from other hosts and keeps an in-memory cache of them.
- gmetad is a daemon that polls gmonds periodically and stores their metrics in a storage engine such as RRD. It can poll multiple clusters and aggregate their metrics. The web frontend also uses it when generating the UI.
- ganglia-web is the web frontend. It should run on the same machine as gmetad because it needs access to the RRD files.
In general you will need one receiving gmond per cluster and one gmetad per site.
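As an illustration of the user-defined metrics mentioned above, here is a minimal sketch of a gmond Python metric module. The module name, the metric name `example_random` and the descriptor values are made up for this example; where the file is installed and the `.pyconf` file that loads it depend on your package and are not shown.

```python
# Hypothetical gmond Python metric module, e.g. saved as example_random.py.
# gmond calls metric_init() once at load time and metric_cleanup() on shutdown.
import random


def get_random(name):
    """Callback invoked by gmond; receives the metric name and returns its value."""
    return random.randint(0, 100)


def metric_init(params):
    """Return the list of metric descriptors this module provides."""
    descriptor = {
        "name": "example_random",      # illustrative metric name
        "call_back": get_random,       # function gmond calls to collect the value
        "time_max": 90,
        "value_type": "uint",
        "units": "N",
        "slope": "both",
        "format": "%u",
        "description": "Example random number",
        "groups": "example",
    }
    return [descriptor]


def metric_cleanup():
    """Nothing to release in this example."""
    pass


if __name__ == "__main__":
    # Lets you exercise the module from the command line without gmond.
    for d in metric_init({}):
        print(d["name"], d["call_back"](d["name"]))
```

Running the file directly is a quick way to check that the callback works before handing the module to gmond.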
The easiest way to install Ganglia is from binary packages. On Ubuntu/Debian you can install it with apt-get, e.g.

    apt-get install ganglia-monitor ganglia-monitor-python gmetad
By default gmond communicates on UDP port 8649 (set in udp_send_channel and udp_recv_channel), and gmetad downloads metrics data over TCP port 8649 (or a different port, depending on what is specified in tcp_accept_channel). If you have any firewall rules that block traffic on those ports, your metrics will not show up.
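If metrics do not show up, one quick check is to connect to the TCP port yourself. The sketch below assumes that gmond writes an XML dump of the cluster state to any client that connects to its tcp_accept_channel port, which is how gmetad polls it. The function names and the hand-written, heavily abbreviated `SAMPLE` document are illustrative only; real output carries many more attributes.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Connect to gmond's TCP channel and read the XML dump until EOF."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    # Return bytes so the XML parser can honour the declared encoding.
    return b"".join(chunks)


def list_hosts(xml_data):
    """Return {host name: [metric names]} from a gmond XML dump."""
    root = ET.fromstring(xml_data)
    return {
        host.get("NAME"): [m.get("NAME") for m in host.iter("METRIC")]
        for host in root.iter("HOST")
    }


# Hand-written, abbreviated sample used for offline testing.
SAMPLE = """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
<CLUSTER NAME="Production">
<HOST NAME="web1" IP="10.0.0.11">
<METRIC NAME="load_one" VAL="0.12" TYPE="float" UNITS=" "/>
</HOST>
</CLUSTER>
</GANGLIA_XML>"""

if __name__ == "__main__":
    print(list_hosts(SAMPLE))
    # Against a live gmond (hostname is an assumption):
    # print(list_hosts(fetch_gmond_xml("mon1")))
```

If `fetch_gmond_xml` times out or the connection is refused, look for a firewall rule or a missing tcp_accept_channel block first.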
If you have only a handful of machines, we recommend using a single cluster, as that is the easiest thing to set up and configure. The only other decision you need to make is whether to use multicast or unicast transport. Per the FAQ:
Multicast mode is the default setting and is the simplest to set up and also provides redundancy. Environments that are sensitive to "jitter" may consider setting up Ganglia in unicast mode, which significantly reduces the chatter but is a bit more complex to configure. Environments such as Amazon's AWS EC2 offerings do not support multicast, so unicast is the only setup option available.
If you are using multicast transport you shouldn't need to configure anything, as that is the default that Ganglia packages come with. The only thing you may need to do is point your gmetad to one or a few of the hosts that are running gmond. There is no need to list every single host, since a gmond in receive mode holds the list of all hosts and metrics in the cluster:
    # /etc/gmetad.conf on monhost
    data_source "MyCluster" monhost
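To make the shape of a data_source line explicit, here is a small parsing sketch. It assumes the form `data_source "name" [polling interval] host[:port] ...`, with port 8649 used when none is given. The function is hypothetical and handles only hostnames and IPv4 addresses.

```python
import shlex

DEFAULT_PORT = 8649  # gmond's default TCP port


def parse_data_source(line):
    """Split a gmetad data_source line into (name, interval, [(host, port), ...]).

    interval is None when the line does not specify one.
    """
    tokens = shlex.split(line, comments=True)
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError(f"not a data_source line: {line!r}")
    name, rest = tokens[1], tokens[2:]

    interval = None
    if rest[0].isdigit():  # optional polling interval in seconds
        interval = int(rest[0])
        rest = rest[1:]

    hosts = []
    for item in rest:
        host, _, port = item.partition(":")
        hosts.append((host, int(port) if port else DEFAULT_PORT))
    if not hosts:
        raise ValueError(f"no hosts listed: {line!r}")
    return name, interval, hosts


if __name__ == "__main__":
    print(parse_data_source('data_source "MyCluster" monhost'))
```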
To configure unicast you should designate one (or more) machines as receivers. For this example I will pick host mon1 as my receiver. mon1's gmond.conf should look like this (I'm showing only the portion above the module { block):
    globals {
      daemonize = yes
      setuid = yes
      user = nobody
      debug_level = 0
      max_udp_msg_len = 1472
      mute = no
      deaf = no
      allow_extra_data = yes
      host_dmax = 86400 /* Remove host from UI after it hasn't reported for a day */
      cleanup_threshold = 300 /*secs */
      gexec = no
      send_metadata_interval = 30 /*secs */
    }

    cluster {
      name = "Production"
      owner = "unspecified"
      latlong = "unspecified"
      url = "unspecified"
    }

    host {
      location = "unspecified"
    }

    udp_send_channel {
      host = mon1
      port = 8649
      ttl = 1
    }

    udp_recv_channel {
      port = 8649
    }

    tcp_accept_channel {
      port = 8649
    }
On all the other machines you will need to configure only this:
    globals {
      daemonize = yes
      setuid = yes
      user = nobody
      debug_level = 0
      max_udp_msg_len = 1472
      mute = no
      deaf = yes
      allow_extra_data = yes
      host_dmax = 86400 /* Remove host from UI after it hasn't reported for a day */
      cleanup_threshold = 300 /*secs */
      gexec = no
      send_metadata_interval = 30 /*secs */
    }

    cluster {
      name = "Production"
      owner = "unspecified"
      latlong = "unspecified"
      url = "unspecified"
    }

    host {
      location = "unspecified"
    }

    udp_send_channel {
      host = mon1
      port = 8649
      ttl = 1
    }
Note that send_metadata_interval is set to 30 (seconds). Ganglia sends metric values separately from their metadata, which contains information such as the metric group and type. If you restart the receiving gmond, its metadata is lost; it will not know what to do with incoming metric data and will discard it, which may result in blank graphs. In multicast mode gmonds can talk to each other and will ask for metadata if it is missing. This is not possible in unicast mode, so you need to instruct gmond to send metadata periodically.
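The effect of send_metadata_interval can be pictured with a toy model. This is only a simulation of the behaviour described above, not gmond's actual code; the class, the function and the timings are invented for illustration, and an interval of 0 here simply means "never resend".

```python
class ToyReceiver:
    """Toy stand-in for a receiving gmond: values without metadata are dropped."""

    def __init__(self):
        self.metadata = {}
        self.values = {}

    def on_metadata(self, metric, meta):
        self.metadata[metric] = meta

    def on_value(self, metric, value):
        if metric not in self.metadata:
            return False  # unknown metric: discard the value
        self.values[metric] = value
        return True

    def restart(self):
        # A restart loses everything held in memory.
        self.metadata.clear()
        self.values.clear()


def simulate(seconds, value_every, metadata_every, restart_at):
    """Return the seconds at which the receiver accepted a value."""
    rx = ToyReceiver()
    accepted = []
    for t in range(seconds):
        if t == restart_at:
            rx.restart()
        # The sender announces metadata at start-up and then, if configured, periodically.
        if t == 0 or (metadata_every > 0 and t % metadata_every == 0):
            rx.on_metadata("load_one", {"type": "float"})
        if t % value_every == 0 and rx.on_value("load_one", 0.0):
            accepted.append(t)
    return accepted


if __name__ == "__main__":
    print("resend every 30s:", simulate(120, 10, 30, restart_at=35))
    print("never resend:    ", simulate(120, 10, 0, restart_at=35))
```

With periodic metadata the receiver recovers at the next resend after the restart; without it, every later value is discarded, which corresponds to the blank graphs.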
Now in your gmetad.conf put:

    # /etc/gmetad.conf on mon1
    data_source "Production" mon1
Restart everything and you should be set :-).
Image(ganglia_multiple_clusters1.png)
As the diagram above shows, suppose we have three clusters on the same broadcast domain (same network). Instead of running three separate Ganglia web interfaces and gmetad collector daemons, we can run one of each on the node0.c1 node, which then collects stats from three different multicast (in our case) channels.
So what components are needed on what server:
- ganglia-gmond is needed on every single node
- ganglia-gmetad and ganglia-web are needed on node0.c1 only (let’s say we want to dedicate
Here are the relevant snippets of the configuration files.
/etc/gmond.conf is identical on all ClusterOne nodes (node0, node1, node2, node3). Only the most important part is shown:
    # /etc/gmond.conf - on ClusterOne
    cluster {
      name = "ClusterOne"
      owner = "unspecified"
      latlong = "unspecified"
      url = "unspecified"
    }

    udp_send_channel {
      mcast_join = 239.2.11.71
      port = 8661
      ttl = 1
    }

    udp_recv_channel {
      mcast_join = 239.2.11.71
      port = 8661
      bind = 239.2.11.71
    }

    tcp_accept_channel {
      port = 8661
    }
/etc/gmond.conf identical on ClusterTwo nodes (node0, node1, node2, node3):
    # /etc/gmond.conf - on ClusterTwo
    cluster {
      name = "ClusterTwo"
      owner = "unspecified"
      latlong = "unspecified"
      url = "unspecified"
    }

    udp_send_channel {
      mcast_join = 239.2.11.71
      port = 8662
      ttl = 1
    }

    udp_recv_channel {
      mcast_join = 239.2.11.71
      port = 8662
      bind = 239.2.11.71
    }

    tcp_accept_channel {
      port = 8662
    }
/etc/gmond.conf identical on ClusterThree nodes (node0, node1, node2, node3):
    # /etc/gmond.conf - on ClusterThree
    cluster {
      name = "ClusterThree"
      owner = "unspecified"
      latlong = "unspecified"
      url = "unspecified"
    }

    udp_send_channel {
      mcast_join = 239.2.11.71
      port = 8663
      ttl = 1
    }

    udp_recv_channel {
      mcast_join = 239.2.11.71
      port = 8663
      bind = 239.2.11.71
    }

    tcp_accept_channel {
      port = 8663
    }
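The three gmond.conf snippets above differ only in the cluster name and the port, so they lend themselves to being generated from one template. The following is a minimal sketch of that idea; the template text mirrors the snippets above, while the `render` helper and the `CLUSTERS` table are hypothetical names introduced for this example.

```python
# Doubled braces are literal braces in str.format templates.
GMOND_TEMPLATE = """\
cluster {{
  name = "{name}"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}}

udp_send_channel {{
  mcast_join = {mcast}
  port = {port}
  ttl = 1
}}

udp_recv_channel {{
  mcast_join = {mcast}
  port = {port}
  bind = {mcast}
}}

tcp_accept_channel {{
  port = {port}
}}
"""

# Cluster name -> port, taken from the snippets above.
CLUSTERS = {"ClusterOne": 8661, "ClusterTwo": 8662, "ClusterThree": 8663}


def render(name, port, mcast="239.2.11.71"):
    """Return the cluster-specific part of gmond.conf for one cluster."""
    return GMOND_TEMPLATE.format(name=name, port=port, mcast=mcast)


if __name__ == "__main__":
    for cluster_name, cluster_port in CLUSTERS.items():
        print(f"# /etc/gmond.conf - on {cluster_name}")
        print(render(cluster_name, cluster_port))
```

Keeping the port in a single table makes it harder for the send, receive and TCP channels of one cluster to drift apart.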
/etc/gmetad.conf – only exists on node0.c1 (again the most important part below):
    # /etc/gmetad.conf on node0.c1
    data_source "ClusterOne" node0.c1:8661 node1.c1:8661
    data_source "ClusterTwo" node0.c2:8662 node1.c2:8662
    data_source "ClusterThree" node0.c3:8663 node1.c3:8663
Notice that we did not list every node of each cluster as a data source (imagine if you had a thousand nodes per cluster :-) ). That is not necessary. Think of the clusters as three different pools, each with its own virtual boundaries. gmetad polls the configured data sources, and if one node dies, the other one can still provide stats for the whole cluster, because gmond nodes exchange stats within their configured UDP channels.
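The failover idea can be sketched as follows: try each listed source in turn and use the first one that answers. This illustrates the concept only and is not gmetad's actual implementation; `poll_cluster` and `fake_fetch` are hypothetical names, and the dead node is simulated.

```python
def poll_cluster(sources, fetch):
    """Try each (host, port) in order; return (source, payload) from the first that answers."""
    errors = []
    for host, port in sources:
        try:
            return (host, port), fetch(host, port)
        except OSError as exc:  # connection refused, timeout, unreachable, ...
            errors.append(f"{host}:{port}: {exc}")
    raise ConnectionError("no data source reachable: " + "; ".join(errors))


def fake_fetch(host, port):
    """Stand-in for a real TCP poll: node0.c1 is pretended to be down."""
    if host == "node0.c1":
        raise ConnectionRefusedError("simulated dead node")
    return f"<xml from {host}:{port}>"


if __name__ == "__main__":
    cluster_one = [("node0.c1", 8661), ("node1.c1", 8661)]
    print(poll_cluster(cluster_one, fake_fetch))
```

Because every receiving gmond holds the state of the whole cluster, whichever source answers can serve the complete set of metrics.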
Now all you have to do is configure your web server on node0.c1, start gmetad (the default location for RRDs is /var/lib/ganglia/rrds) and start the gmond services on all the clusters. You should now have a working monitoring system for your three clusters on a single node.
The multiple-cluster configuration was adapted from a post written by Vaidas Jablonskis: http://jablonskis.org/2011/monitoring-multiple-clusters-using-ganglia/