Mesos

Mesos Master

The Mesos master exposes metrics over HTTP as JSON out of the box, so all you need to do is symlink checks_available/check_mesos_master.py into checks_enabled and restart the Agent.

Install
cd ${OE_AGENT_HOME}/checks_enabled
ln -s ../checks_available/check_mesos_master.py ./
Configure

In most cases there is no need to configure the Agent. If you have a non-default Mesos installation, or need to monitor a Mesos Master that is not running on the same node as the Agent, edit conf/bigdata.ini and make your changes in the Mesos-Master section.

[Mesos-Master]
stats: http://127.0.0.1:5050/metrics/snapshot
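
Before restarting the Agent, it can help to confirm that the configured URL is the one you expect and that it answers with JSON. The sketch below is an illustration only and is not the Agent's own code: the helper names `read_stats_url` and `fetch_snapshot` are hypothetical, and `SAMPLE_INI` simply mirrors the section shown above.

```python
import configparser
import json
import urllib.request


def read_stats_url(ini_text, section="Mesos-Master"):
    """Return the `stats` URL from the given section of INI-formatted text."""
    # interpolation=None keeps any '%' characters in URLs literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.get(section, "stats")


def fetch_snapshot(url, timeout=5.0):
    """Fetch a metrics snapshot over HTTP and decode the JSON body into a dict."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Mirrors the [Mesos-Master] section from conf/bigdata.ini shown above.
SAMPLE_INI = """
[Mesos-Master]
stats: http://127.0.0.1:5050/metrics/snapshot
"""
```

Calling `fetch_snapshot(read_stats_url(SAMPLE_INI))` on a node where the master is reachable should return a dictionary of metric names and values; a connection error at this point suggests the `stats` URL needs adjusting.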
Restart
${OE_AGENT_HOME}/oddeye.sh restart
Provides
| Name | Description | Type | Unit |
| --- | --- | --- | --- |
| mesos_allocator_mesos_allocation_run_ms | Duration of Mesos allocation runs | gauge | Milliseconds |
| mesos_allocator_mesos_allocation_run_ms_p99 | 99th percentile of the duration of Mesos allocation runs | gauge | Milliseconds |
| mesos_master_cpus_percent | Percentage of Mesos master CPUs in use | gauge | Percent |
| mesos_master_cpus_revocable_percent | Mesos master revocable CPU percentage | gauge | Percent |
| mesos_master_cpus_used | Number of Mesos master CPUs in use | gauge | Integer |
| mesos_master_disk_percent | Mesos master disk usage in percent | gauge | Percent |
| mesos_master_disk_revocable_percent | Mesos master revocable disk usage in percent | gauge | Percent |
| mesos_master_disk_used | Mesos master disk usage in bytes | gauge | Bytes |
| mesos_master_event_queue_dispatches | Number of dispatches in the master's event queue | gauge | Integer |
| mesos_master_event_queue_http_requests | Number of HTTP requests in the master's event queue | gauge | Integer |
| mesos_master_event_queue_messages | Number of messages in the master's event queue | gauge | Integer |
| mesos_master_frameworks_active | Number of currently active frameworks | gauge | Integer |
| mesos_master_gpus_percent | Percentage of Mesos master GPUs in use | gauge | Percent |
| mesos_master_gpus_used | Number of Mesos master GPUs in use | gauge | Integer |
| mesos_master_mem_percent | Master memory usage in percent | gauge | Percent |
| mesos_master_mem_used | Master memory usage in bytes | gauge | Bytes |
| mesos_master_messages_kill_task | Number of kill task messages on the master | gauge | Integer |
| mesos_master_messages_reregister_framework | Number of framework re-registration messages on the master | gauge | Integer |
| mesos_master_slaves_connected | Number of connected slaves | gauge | Integer |
| mesos_master_tasks_dropped | Number of dropped tasks | counter | Integer |
| mesos_master_tasks_error | Number of tasks with errors | counter | Integer |
| mesos_master_tasks_failed | Number of failed tasks | counter | Integer |
| mesos_master_tasks_finished | Number of finished tasks | counter | Integer |
| mesos_master_tasks_gone | Number of gone tasks | gauge | Integer |
| mesos_master_tasks_lost | Number of lost tasks | gauge | Integer |
| mesos_master_tasks_running | Number of running tasks | gauge | Integer |
| mesos_master_tasks_staging | Number of staging tasks | gauge | Integer |
| mesos_master_tasks_starting | Number of starting tasks | gauge | Integer |
| mesos_master_tasks_unreachable | Number of unreachable tasks | gauge | Integer |
| mesos_registrar_state_fetch_ms | Duration of registrar state fetches | gauge | Milliseconds |
| mesos_registrar_state_store_ms_p99 | 99th percentile of the duration of registrar state stores | gauge | Milliseconds |
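
The metric names in the table above look like the keys of the Mesos snapshot (for example `master/cpus_percent`) with `/` replaced by `_` and a `mesos_` prefix added. That mapping is inferred from the names and is an assumption, not a description of how check_mesos_master.py is written. A minimal sketch of it, with hypothetical helper names and made-up sample values:

```python
import json


def to_agent_name(mesos_key, prefix="mesos"):
    """Convert a snapshot key such as 'master/cpus_percent' into
    'mesos_master_cpus_percent' (assumed naming scheme)."""
    return prefix + "_" + mesos_key.replace("/", "_")


def select_metrics(snapshot, wanted):
    """Return {agent_name: value} for snapshot keys whose converted name is in `wanted`."""
    selected = {}
    for key, value in snapshot.items():
        name = to_agent_name(key)
        if name in wanted:
            selected[name] = value
    return selected


# Illustrative snapshot; the values are invented for the example.
SAMPLE_SNAPSHOT = json.loads("""
{
  "master/cpus_percent": 0.25,
  "master/tasks_running": 3,
  "allocator/mesos/allocation_run_ms/p99": 1.7,
  "system/load_1min": 0.4
}
""")

# A small subset of the names listed in the table above.
WANTED = {
    "mesos_master_cpus_percent",
    "mesos_master_tasks_running",
    "mesos_allocator_mesos_allocation_run_ms_p99",
}

selected = select_metrics(SAMPLE_SNAPSHOT, WANTED)
```

Here `selected` keeps the three listed metrics and drops `system/load_1min`, which does not appear in the table.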

Mesos Slave

Install
cd ${OE_AGENT_HOME}/checks_enabled
ln -s ../checks_available/check_mesos_slave.py ./
Configure

In most cases there is no need to configure the Agent. If you have a non-default Mesos installation, or need to monitor a Mesos Slave that is not running on the same node as the Agent, edit conf/bigdata.ini and make your changes in the Mesos-Slave section.

[Mesos-Slave]
stats: http://127.0.0.1:5051/metrics/snapshot
Restart
${OE_AGENT_HOME}/oddeye.sh restart
Provides
| Name | Description | Type | Unit |
| --- | --- | --- | --- |
| mesos_slave_cpus_percent | Slave CPU usage in percent | gauge | Percent |
| mesos_slave_cpus_revocable_used | Slave revocable CPU usage | gauge | Integer |
| mesos_slave_cpus_used | Number of slave CPUs in use | gauge | Integer |
| mesos_slave_disk_percent | Slave disk usage in percent | gauge | Percent |
| mesos_slave_disk_used | Slave disk usage in bytes | gauge | Bytes |
| mesos_slave_executors_running | Number of running executors | gauge | Integer |
| mesos_slave_executors_terminated | Number of terminated executors | counter | Integer |
| mesos_slave_executors_terminating | Number of terminating executors | gauge | Integer |
| mesos_slave_frameworks_active | Number of active frameworks on the current node | gauge | Integer |
| mesos_slave_gpus_revocable_used | Slave revocable GPU usage | gauge | Integer |
| mesos_slave_gpus_used | Slave GPU usage in percent | gauge | Percent |
| mesos_slave_invalid_status_updates | Number of invalid status updates | gauge | Integer |
| mesos_slave_mem_percent | Slave memory usage in percent | gauge | Percent |
| mesos_slave_mem_revocable_percent | Slave revocable memory usage in percent | gauge | Percent |
| mesos_slave_mem_used | Slave memory usage in bytes | gauge | Bytes |
| mesos_slave_recovery_errors | Number of recovery errors | gauge | Integer |
| mesos_slave_tasks_failed | Number of failed tasks on the current node | gauge | Integer |
| mesos_slave_tasks_killing | Number of tasks being killed on the current node | gauge | Integer |
| mesos_slave_tasks_lost | Number of lost tasks on the current node | gauge | Integer |
| mesos_slave_tasks_running | Number of running tasks on the current node | counter | Integer |
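
The Type column matters when reading these values: a gauge is meaningful as a point-in-time reading, while a counter only grows, so it is usually viewed as a change between two samples. The sketch below shows one common way to turn two counter samples into a per-second rate. It is a generic illustration, not a description of how the Agent or its backend processes counters, and the helper name `counter_rate` is hypothetical.

```python
def counter_rate(previous, current, interval_seconds):
    """Per-second increase of a counter between two samples.

    If the value went down, the counter is assumed to have been reset
    (for example after a process restart), and the current value is
    taken as the increase since that reset.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    increase = current - previous if current >= previous else current
    return increase / interval_seconds
```

For example, if mesos_slave_executors_terminated reads 10 and then 40 half a minute later, `counter_rate(10, 40, 30)` gives one terminated executor per second over that interval.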