»Nomad Autoscaler Telemetry

The Nomad Autoscaler agent collects various runtime metrics about the performance of different libraries and subsystems. These metrics are aggregated on a ten-second interval and retained for one minute. To configure the telemetry output, see the agent configuration.

This data can be accessed via the /v1/metrics HTTP endpoint, by sending a signal to the Nomad Autoscaler process, or through a number of integrations.
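As a rough sketch of the HTTP option, the snippet below fetches /v1/metrics with Python's standard library and extracts the gauges. The address http://127.0.0.1:8080 and the response layout (a JSON object with a "Gauges" list of "Name"/"Value" entries) are assumptions rather than details stated on this page; adjust both to match your agent configuration and the output you actually receive.

```python
import json
import urllib.request

# Assumption: the agent's HTTP API listens here; change it to match your setup.
AGENT_ADDR = "http://127.0.0.1:8080"


def gauges_from_payload(payload: dict) -> dict:
    """Map gauge name -> value, assuming a {"Gauges": [{"Name", "Value"}]} layout."""
    return {g["Name"]: g["Value"] for g in payload.get("Gauges") or []}


def fetch_metrics(addr: str = AGENT_ADDR, timeout: float = 5.0) -> dict:
    """GET /v1/metrics from the agent and decode the JSON body."""
    with urllib.request.urlopen(f"{addr}/v1/metrics", timeout=timeout) as resp:
        return json.load(resp)
```

Against a running agent, gauges_from_payload(fetch_metrics()) would return a dictionary keyed by metric name. The parsing step is kept separate from the network call so it can be exercised without an agent.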

To view this data by sending a signal, send USR1 on Unix or BREAK on Windows to the Nomad Autoscaler process. Once the Nomad Autoscaler receives the signal, it dumps the current telemetry information to the agent's stderr.
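On Unix, the signal can also be sent programmatically. The following is a minimal sketch using Python's standard library; the function name is hypothetical, and finding the agent's PID (for example from a PID file or your process supervisor) is left to you. The dump appears on the agent's stderr, not in the output of the calling script.

```python
import os
import signal


def request_telemetry_dump(pid: int) -> None:
    """Send SIGUSR1 to the process with the given PID (Unix only)."""
    os.kill(pid, signal.SIGUSR1)
```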

This telemetry information can be used for debugging or for getting a better view of what the Nomad Autoscaler is doing.

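Below is sample output of a telemetry dump. Lines marked [G] are gauges; the line marked [S] is a sample summary (count, min, mean, max, standard deviation, and sum):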

[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.sys_bytes': 74793216.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.malloc_count': 219856.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.free_count': 183613.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.total_gc_pause_ns': 348822.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.total_gc_runs': 5.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.num_goroutines': 12.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.policy.total_num': 0.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.alloc_bytes': 4316568.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.heap_objects': 36243.000
[2020-08-25 10:01:20 +0100 BST][S] 'nomad-autoscaler.runtime.gc_pause_ns': Count: 5 Min: 38083.000 Mean: 69764.400 Max: 122291.000 Stddev: 31487.808 Sum: 348822.000 LastUpdated: 2020-08-25 10:01:26.574809 +0100 BST m=+1.241576679
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.alloc_bytes': 4370504.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.malloc_count': 220853.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.free_count': 183613.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.policy.total_num': 0.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.num_goroutines': 12.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.total_gc_pause_ns': 348822.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.total_gc_runs': 5.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.sys_bytes': 74793216.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.heap_objects': 37240.000
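
If the dump has been captured from stderr, the gauge lines can be turned into structured records. The sketch below assumes the line layout matches the sample above exactly; the regular expression and function name are illustrative, not part of the Nomad Autoscaler.

```python
import re

# Matches lines such as:
# [2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.policy.total_num': 0.000
GAUGE_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[G\] '(?P<name>[^']+)': (?P<value>-?\d+(?:\.\d+)?)\s*$"
)


def parse_gauges(dump: str) -> list[tuple[str, str, float]]:
    """Return (interval timestamp, metric name, value) for each gauge line."""
    rows = []
    for line in dump.splitlines():
        m = GAUGE_LINE.match(line.strip())
        if m:  # sample ([S]) lines and anything unrecognised are skipped
            rows.append((m["ts"], m["name"], float(m["value"])))
    return rows
```

Grouping the returned rows by timestamp gives one set of gauges per ten-second interval.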

»Runtime Metrics

Runtime metrics help you understand the memory usage and load pressure of the Nomad Autoscaler agent.

| Metric | Description | Type |
| --- | --- | --- |
| nomad-autoscaler.runtime.num_goroutines | Number of running goroutines | Gauge |
| nomad-autoscaler.runtime.alloc_bytes | The number of allocated heap bytes | Gauge |
| nomad-autoscaler.runtime.sys_bytes | The total bytes of memory obtained from the OS | Gauge |
| nomad-autoscaler.runtime.malloc_count | Cumulative count of heap objects allocated | Gauge |
| nomad-autoscaler.runtime.free_count | Cumulative count of heap objects freed | Gauge |
| nomad-autoscaler.runtime.heap_objects | Number of allocated heap objects | Gauge |
| nomad-autoscaler.runtime.total_gc_pause_ns | Cumulative nanoseconds in GC stop-the-world pauses | Gauge |
| nomad-autoscaler.runtime.total_gc_runs | Number of completed GC cycles | Gauge |
| nomad-autoscaler.runtime.gc_pause_ns | Number of nanoseconds to complete the last GC cycle | Timer |
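
Several of these metrics relate to one another, which is useful when sanity-checking a dump. The worked example below uses the gauge values from the 10:01:20 interval of the sample output above: the number of live heap objects is the allocation count minus the free count, and the mean GC pause is the cumulative pause time divided by the number of GC runs. The variable names are illustrative only.

```python
# Gauge values copied from the 10:01:20 interval of the sample dump.
malloc_count = 219856
free_count = 183613
total_gc_pause_ns = 348822
total_gc_runs = 5

# Live objects: matches the heap_objects gauge in the same interval.
live_objects = malloc_count - free_count

# Mean pause per GC cycle: matches the Mean reported for gc_pause_ns.
mean_gc_pause_ns = total_gc_pause_ns / total_gc_runs

print(live_objects)      # 36243
print(mean_gc_pause_ns)  # 69764.4
```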

»Policy Metrics

Policy metrics provide insights into the performance of the Nomad Autoscaler's policy handling.

| Metric | Description | Type | Labels |
| --- | --- | --- | --- |
| nomad-autoscaler.policy.total_num | The number of policies currently held within the autoscaler | Gauge | |
| nomad-autoscaler.policy.source.error_count | Tracks the number of errors generated by the policy sources | Counter | policy_source |

»Scaling Metrics

Scaling metrics provide insight into the performance of scaling actions and include overall success and failure counters.

| Metric | Description | Type | Labels |
| --- | --- | --- | --- |
| nomad-autoscaler.scale.evaluate_ms | The time taken to evaluate the checks within a single policy | Timer | policy_id, target_name |
| nomad-autoscaler.scale.invoke_ms | The time taken to invoke scaling based on the scaling evaluations | Timer | policy_id, target_name |
| nomad-autoscaler.scale.invoke.success_count | Tracks the number of successful scaling actions triggered | Counter | |
| nomad-autoscaler.scale.invoke.error_count | Tracks the number of unsuccessful scaling actions triggered | Counter | |
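
One way to use the two counters together is to compute the share of scaling actions that failed. The helper below is a hypothetical sketch: it assumes you have already obtained the two counts over the same time window from your metrics sink, and the example values are invented for illustration.

```python
def scale_error_ratio(success_count: int, error_count: int) -> float:
    """Fraction of triggered scaling actions that failed (0.0 when none ran)."""
    total = success_count + error_count
    return error_count / total if total else 0.0


# Hypothetical counts for one window: 47 successes and 3 errors.
ratio = scale_error_ratio(47, 3)
```

A ratio that rises over time suggests checking the target plugin timers and the agent logs for the failing policies.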

»Plugin Metrics

Plugin metrics provide insight into the performance of Nomad Autoscaler plugins and help identify potential bottlenecks or latency issues.

| Metric | Description | Type | Labels |
| --- | --- | --- | --- |
| nomad-autoscaler.plugin.manager.access_ms | The time taken to dispense a plugin | Timer | |
| nomad-autoscaler.target.status.invoke_ms | The time taken to perform the target plugin status call | Timer | policy_id, plugin_name |
| nomad-autoscaler.target.scale.invoke_ms | The time taken to perform the target plugin scale call | Timer | policy_id, plugin_name |
| nomad-autoscaler.apm.query.invoke_ms | The time taken to perform the APM plugin query call | Timer | policy_id, plugin_name |
| nomad-autoscaler.strategy.run.invoke_ms | The time taken to perform the strategy plugin run call | Timer | policy_id, plugin_name |