
»Metrics Reference

The Nomad agent collects various runtime metrics about the performance of different libraries and subsystems. These metrics are aggregated on a 10-second interval and retained for one minute.

This data can be accessed through an HTTP endpoint or by sending a signal to the Nomad process. Over HTTP, the data is available at /metrics. See Metrics for more information.

To view this data by signal, send USR1 to the Nomad process on Unix or BREAK on Windows. Once Nomad receives the signal, it dumps the current telemetry information to the agent's stderr.

This telemetry information can be used for debugging or otherwise getting a better view of what Nomad is doing.

Telemetry information can be streamed to both statsite and statsd by providing the appropriate configuration options.

To configure the telemetry output, see the agent configuration.

Below is sample output of a telemetry dump:

[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.broker.total_blocked': 0.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.plan.queue_depth': 0.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.malloc_count': 7568.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.total_gc_runs': 8.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.broker.total_ready': 0.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.num_goroutines': 56.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.sys_bytes': 3999992.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.heap_objects': 4135.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.heartbeat.active': 1.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.broker.total_unacked': 0.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.broker.total_waiting': 0.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.alloc_bytes': 634056.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.free_count': 3433.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.total_gc_pause_ns': 6572135.000
[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.memberlist.msg.alive': Count: 1 Sum: 1.000
[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.serf.member.join': Count: 1 Sum: 1.000
[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.raft.barrier': Count: 1 Sum: 1.000
[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.raft.apply': Count: 1 Sum: 1.000
[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.nomad.rpc.query': Count: 2 Sum: 2.000
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.serf.queue.Query': Count: 6 Sum: 0.000
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.fsm.register_node': Count: 1 Sum: 1.296
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.serf.queue.Intent': Count: 6 Sum: 0.000
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.runtime.gc_pause_ns': Count: 8 Min: 126492.000 Mean: 821516.875 Max: 3126670.000 Stddev: 1139250.294 Sum: 6572135.000
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.raft.leader.dispatchLog': Count: 3 Min: 0.007 Mean: 0.018 Max: 0.039 Stddev: 0.018 Sum: 0.054
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.leader.reconcileMember': Count: 1 Sum: 0.007
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.leader.reconcile': Count: 1 Sum: 0.025
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.raft.fsm.apply': Count: 1 Sum: 1.306
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.client.get_allocs': Count: 1 Sum: 0.110
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.worker.dequeue_eval': Count: 29 Min: 0.003 Mean: 363.426 Max: 503.377 Stddev: 228.126 Sum: 10539.354
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.serf.queue.Event': Count: 6 Sum: 0.000
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.raft.commitTime': Count: 3 Min: 0.013 Mean: 0.037 Max: 0.079 Stddev: 0.037 Sum: 0.110
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.leader.barrier': Count: 1 Sum: 0.071
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.client.register': Count: 1 Sum: 1.626
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.eval.dequeue': Count: 21 Min: 500.610 Mean: 501.753 Max: 503.361 Stddev: 1.030 Sum: 10536.813
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.memberlist.gossip': Count: 12 Min: 0.009 Mean: 0.017 Max: 0.025 Stddev: 0.005 Sum: 0.204
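
Lines in a dump like this one can be turned into structured records with a small parser. The following is a minimal Python sketch, not part of Nomad: the function name parse_dump_line and the returned dictionary layout are hypothetical, and it assumes the bracketed letters G, C, and S mark gauges, counters, and timer samples respectively.

```python
import re
from typing import Optional

# One dump line looks like: [timestamp][K] 'metric.name': <values>
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<kind>[GCS])\] '(?P<name>[^']+)': (?P<rest>.*)$"
)

# Assumed meaning of the single-letter type marker.
KINDS = {"G": "gauge", "C": "counter", "S": "sample"}


def parse_dump_line(line: str) -> Optional[dict]:
    """Parse one telemetry dump line; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rest = m.group("rest")
    if m.group("kind") == "G":
        # Gauges carry a single bare number.
        fields = {"Value": float(rest)}
    else:
        # Counters and samples carry "Key: number" pairs (Count, Sum, Min, ...).
        fields = {k: float(v) for k, v in re.findall(r"(\w+): (-?[\d.]+)", rest)}
    return {
        "timestamp": m.group("ts"),
        "type": KINDS[m.group("kind")],
        "name": m.group("name"),
        "fields": fields,
    }


if __name__ == "__main__":
    line = "[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.num_goroutines': 56.000"
    print(parse_dump_line(line))
```

Returning None for non-matching lines lets the parser be run over the agent's whole stderr stream, which may contain ordinary log lines between dumps.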

»Metric Types

Type | Description | Quantiles
Gauge | Gauge types report an absolute number at the end of the aggregation interval | false
Counter | Counts are incremented and flushed at the end of the aggregation interval and then are reset to zero | true
Timer | Timers measure the time to complete a task and will include quantiles, means, standard deviation, etc. per interval | true

»Tagged Metrics

Nomad emits metrics in a tagged format. Each metric can carry more than one tag, so it is possible to match on a datapoint such as a particular datacenter and return all metrics with that tag. Nomad also supports labels for namespaces.
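
Matching on tags can be sketched in a few lines of Python. This is an illustration only: the record layout (dictionaries with "Name", "Value", and "Labels" keys), the sample values, and the function name match_labels are assumptions made for the example, not a documented Nomad format.

```python
from typing import Dict, Iterable, List


def match_labels(metrics: Iterable[Dict], **wanted: str) -> List[Dict]:
    """Return the metrics whose labels contain every wanted key/value pair."""
    return [
        m
        for m in metrics
        if all(m.get("Labels", {}).get(k) == v for k, v in wanted.items())
    ]


# Hypothetical sample records, invented for this example.
SAMPLE = [
    {"Name": "nomad.client.uptime", "Value": 120.0,
     "Labels": {"datacenter": "dc1", "host": "node-a"}},
    {"Name": "nomad.client.uptime", "Value": 300.0,
     "Labels": {"datacenter": "dc2", "host": "node-b"}},
    {"Name": "nomad.client.allocations.running", "Value": 4.0,
     "Labels": {"datacenter": "dc1", "host": "node-a"}},
]

if __name__ == "__main__":
    for metric in match_labels(SAMPLE, datacenter="dc1"):
        print(metric["Name"], metric["Value"])
```

Passing several keyword arguments narrows the match, for example match_labels(SAMPLE, datacenter="dc1", host="node-a").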

»Key Metrics

The metrics in the table below are the most important metrics for monitoring the overall health of a Nomad cluster.

When telemetry is streamed to statsite or statsd, the interval in the table below is their flush interval. Otherwise, the interval can be assumed to be 10 seconds when retrieving metrics using the signals described above.

Metric | Description | Unit | Type
nomad.runtime.alloc_bytes | Memory utilization | # of bytes | Gauge
nomad.runtime.heap_objects | Number of objects on the heap. General memory pressure indicator | # of heap objects | Gauge
nomad.runtime.num_goroutines | Number of goroutines and general load pressure indicator | # of goroutines | Gauge
nomad.nomad.broker.total_blocked | Evaluations that are blocked until an existing evaluation for the same job completes | # of evaluations | Gauge
nomad.nomad.broker.total_ready | Number of evaluations ready to be processed | # of evaluations | Gauge
nomad.nomad.broker.total_unacked | Evaluations dispatched for processing but incomplete | # of evaluations | Gauge
nomad.nomad.heartbeat.active | Number of active heartbeat timers. Each timer represents a Nomad Client connection | # of heartbeat timers | Gauge
nomad.nomad.heartbeat.invalidate | The length of time it takes to invalidate a Nomad Client due to failed heartbeats | ms / Heartbeat Invalidation | Timer
nomad.nomad.plan.evaluate | Time to validate a scheduler Plan. Higher values cause lower scheduling throughput. Similar to nomad.plan.submit but does not include RPC time or time in the Plan Queue | ms / Plan Evaluation | Timer
nomad.nomad.plan.node_rejected | Number of times a node has had a plan rejected. A node with a high rate of rejections may have an underlying issue causing it to be unschedulable. Refer to this link for more information | # of rejected plans | Counter
nomad.nomad.plan.queue_depth | Number of scheduler Plans waiting to be evaluated | # of plans | Gauge
nomad.nomad.plan.submit | Time to submit a scheduler Plan. Higher values cause lower scheduling throughput | ms / Plan Submit | Timer
nomad.nomad.rpc.query | Number of RPC queries | RPC Queries / interval | Counter
nomad.nomad.rpc.request_error | Number of RPC requests being handled that result in an error | RPC Errors / interval | Counter
nomad.nomad.rpc.request | Number of RPC requests being handled | RPC Requests / interval | Counter
nomad.nomad.vault.token_last_renewal | Time since last successful Vault token renewal | Milliseconds | Gauge
nomad.nomad.vault.token_next_renewal | Time until next Vault token renewal attempt | Milliseconds | Gauge
nomad.nomad.worker.invoke_scheduler.<type> | Time to run the scheduler of the given type | ms / Scheduler Run | Timer
nomad.nomad.worker.wait_for_index | Time waiting for Raft log replication from leader. High delays result in lower scheduling throughput | ms / Raft Index Wait | Timer
nomad.raft.apply | Number of Raft transactions | Raft transactions / interval | Counter
nomad.raft.leader.lastContact | Time since last contact to leader. General indicator of Raft latency | ms / Leader Contact | Timer
nomad.raft.replication.appendEntries | Raft transaction commit time | ms / Raft Log Append | Timer
nomad.license.expiration_time_epoch | Time as epoch (seconds since Jan 1 1970) at which license will expire | Seconds | Gauge
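
Counter values in this table are reported per interval, so comparing them across sinks with different flush intervals means normalizing to a per-second rate. The following is a minimal Python sketch of that arithmetic, assuming the 10-second default described above when no sink flush interval applies. The function name per_second is hypothetical.

```python
def per_second(count: float, interval_seconds: float = 10.0) -> float:
    """Convert a per-interval counter value into a per-second rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return count / interval_seconds


if __name__ == "__main__":
    # 2 RPC queries counted in one 10-second interval.
    print(per_second(2.0))
    # The same count flushed by a sink with a 60-second interval (assumed value).
    print(per_second(2.0, interval_seconds=60.0))
```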

»Client Metrics

The Nomad client emits metrics related to the resource usage of the allocations and tasks running on it and the node itself. Operators have to explicitly turn on publishing host and allocation metrics by setting publish_allocation_metrics and publish_node_metrics to true.

By default the collection interval is 1 second, but it can be changed by setting the collection_interval key in the telemetry configuration block.

Please see the agent configuration page for more details.

As of Nomad 0.9, Nomad emits additional labels for parameterized and periodic jobs. The parent job ID is emitted as a new label, parent_id. The labels dispatch_id and periodic_id are also emitted, containing the ID of the specific invocation of the parameterized or periodic job respectively. For example, a dispatch job with the ID myjob/dispatch-1312323423423 will have the following labels.

Label | Value
job | myjob/dispatch-1312323423423
parent_id | myjob
dispatch_id | 1312323423423
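
The relationship between a child job ID and these labels can be sketched as follows. This is an illustrative Python reconstruction, not Nomad's own code: the function name invocation_labels is hypothetical, and the "/periodic-" separator for periodic jobs is an assumption made by analogy with the "/dispatch-" separator in the example above.

```python
from typing import Dict


def invocation_labels(job_id: str) -> Dict[str, str]:
    """Derive job, parent_id, and dispatch_id/periodic_id labels from a job ID."""
    labels = {"job": job_id}
    # "/dispatch-" comes from the example above; "/periodic-" is assumed.
    for marker, label in (("/dispatch-", "dispatch_id"), ("/periodic-", "periodic_id")):
        parent, sep, child = job_id.rpartition(marker)
        if sep:  # the marker was found in the job ID
            labels["parent_id"] = parent
            labels[label] = child
            break
    return labels


if __name__ == "__main__":
    print(invocation_labels("myjob/dispatch-1312323423423"))
```

A job ID without either marker yields only the job label, which matches the case of an ordinary job that is neither parameterized nor periodic.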

»Host Metrics

Nomad emits tagged metrics in the following format:

Metric | Description | Unit | Type | Labels
nomad.client.allocated.cpu | Total amount of CPU shares the scheduler has allocated to tasks | MHz | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.allocated.memory | Total amount of memory the scheduler has allocated to tasks | Megabytes | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.allocated_disk | Total amount of disk space the scheduler has allocated to tasks | Megabytes | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.allocations.blocked | Number of allocations blocked | Integer | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.allocations.migrating | Number of allocations migrating | Integer | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.allocations.pending | Number of allocations pending | Integer | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.allocations.running | Number of allocations running | Integer | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.allocations.start | Number of allocations starting | Integer | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.allocations.terminal | Number of allocations terminal | Integer | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.allocs.oom_killed | Number of allocations OOM killed | Integer | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.host.cpu.idle | CPU utilization in idle state | Percentage | Gauge | cpu, datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.host.cpu.system | CPU utilization in system space | Percentage | Gauge | cpu, datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.host.cpu.total | Total CPU utilization | Percentage | Gauge | cpu, datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.host.cpu.user | CPU utilization in user space | Percentage | Gauge | cpu, datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.host.disk.available | Amount of space which is available | Bytes | Gauge | datacenter, disk, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.host.disk.inodes_percent | Disk space consumed by the inodes | Percentage | Gauge | datacenter, disk, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.host.disk.size | Total size of the device | Bytes | Gauge | datacenter, disk, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.host.disk.used_percent | Percentage of disk space used | Percentage | Gauge | datacenter, disk, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.host.disk.used | Amount of space which has been used | Bytes | Gauge | datacenter, disk, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.host.memory.available | Total amount of memory available to processes which includes free and cached memory | Bytes | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.host.memory.free | Amount of memory which is free | Bytes | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.host.memory.total | Total amount of physical memory on the node | Bytes | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.host.memory.used | Amount of memory used by processes | Bytes | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.unallocated.cpu | Total amount of CPU shares free for the scheduler to allocate to tasks | MHz | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.unallocated.disk | Total amount of disk space free for the scheduler to allocate to tasks | Megabytes | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.unallocated.memory | Total amount of memory free for the scheduler to allocate to tasks | Megabytes | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status
nomad.client.uptime | Uptime of the host running the Nomad client | Seconds | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status
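
The allocated and unallocated gauges can be combined into a per-node scheduling utilization figure. The sketch below is one possible way to do that in Python, assuming that the allocated and unallocated values for a resource together make up the node's schedulable capacity. The function name allocated_fraction and the sample numbers are hypothetical.

```python
def allocated_fraction(allocated: float, unallocated: float) -> float:
    """Fraction of schedulable capacity that the scheduler has already allocated."""
    total = allocated + unallocated
    # A node reporting no capacity at all is treated as 0% allocated.
    return allocated / total if total > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical readings of nomad.client.allocated.memory and
    # nomad.client.unallocated.memory for one node, both in megabytes.
    print(allocated_fraction(allocated=1536.0, unallocated=2560.0))
```

The same function applies to the cpu and disk pairs, since each pair shares a unit.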

»Allocation Metrics

The following metrics are emitted for each allocation if allocation metrics are enabled. Note that the allocation metrics available may depend on the task driver; not all task drivers can provide all metrics.

Metric | Description | Unit | Type | Labels
nomad.client.allocs.cpu.allocated | Total CPU resources allocated by the task across all cores | MHz | Gauge | alloc_id, host, job, namespace, task, task_group
nomad.client.allocs.cpu.system | Total CPU resources consumed by the task in system space | Percentage | Gauge | alloc_id, host, job, namespace, task, task_group
nomad.client.allocs.cpu.throttled_periods | Total number of CPU periods that the task was throttled | Nanoseconds | Gauge | alloc_id, host, job, namespace, task, task_group
nomad.client.allocs.cpu.throttled_time | Total time that the task was throttled | Nanoseconds | Gauge | alloc_id, host, job, namespace, task, task_group
nomad.client.allocs.cpu.total_percent | Total CPU resources consumed by the task across all cores | Percentage | Gauge | alloc_id, host, job, namespace, task, task_group
nomad.client.allocs.cpu.total_ticks | CPU ticks consumed by the process in the last collection interval | Integer | Gauge | alloc_id, host, job, namespace, task, task_group
nomad.client.allocs.cpu.user | Total CPU resources consumed by the task in the user space | Percentage | Gauge | alloc_id, host, job, namespace, task, task_group
nomad.client.allocs.memory.allocated | Amount of memory allocated by the task | Bytes | Gauge | alloc_id, host, job, namespace, task, task_group
nomad.client.allocs.memory.cache | Amount of memory cached by the task | Bytes | Gauge | alloc_id, host, job, namespace, task, task_group
nomad.client.allocs.memory.kernel_max_usage | Maximum amount of memory ever used by the kernel for this task | Bytes | Gauge | alloc_id, host, job, namespace, task, task_group
nomad.client.allocs.memory.kernel_usage | Amount of memory used by the kernel for this task | Bytes | Gauge | alloc_id, host, job, namespace, task, task_group
nomad.client.allocs.memory.max_usage | Maximum amount of memory ever used by the task | Bytes | Gauge | alloc_id, host, job, namespace, task, task_group
nomad.client.allocs.memory.rss | Amount of RSS memory consumed by the task | Bytes | Gauge | alloc_id, host, job, namespace, task, task_group
nomad.client.allocs.memory.swap | Amount of memory swapped by the task | Bytes | Gauge | alloc_id, host, job, namespace, task, task_group
nomad.client.allocs.memory.usage | Total amount of memory used by the task | Bytes | Gauge | alloc_id, host, job, namespace, task, task_group

»Job Summary Metrics

Job summary metrics are emitted by the Nomad leader server.

Metric | Description | Unit | Type | Labels
nomad.nomad.job_summary.complete | Number of complete allocations for a job | Integer | Gauge | host, job, namespace, task_group
nomad.nomad.job_summary.failed | Number of failed allocations for a job | Integer | Gauge | host, job, namespace, task_group
nomad.nomad.job_summary.lost | Number of lost allocations for a job | Integer | Gauge | host, job, namespace, task_group
nomad.nomad.job_summary.unknown | Number of unknown allocations for a job | Integer | Gauge | host, job, namespace, task_group
nomad.nomad.job_summary.queued | Number of queued allocations for a job | Integer | Gauge | host, job, namespace, task_group
nomad.nomad.job_summary.running | Number of running allocations for a job | Integer | Gauge | host, job, namespace, task_group
nomad.nomad.job_summary.starting | Number of starting allocations for a job | Integer | Gauge | host, job, namespace, task_group

»Job Status Metrics

Job status metrics are emitted by the Nomad leader server.

Metric | Description | Unit | Type | Labels
nomad.nomad.job_status.dead | Number of dead jobs | Integer | Gauge | host
nomad.nomad.job_status.pending | Number of pending jobs | Integer | Gauge | host
nomad.nomad.job_status.running | Number of running jobs | Integer | Gauge | host

»Server Metrics

The following table includes metrics for overall cluster health in addition to those listed in Key Metrics above.

Metric | Description | Unit | Type | Labels
nomad.memberlist.gossip | Time elapsed to broadcast gossip messages | Nanoseconds | Summary | host
nomad.nomad.acl.bootstrap | Time elapsed for ACL.Bootstrap RPC call | Nanoseconds | Summary | host
nomad.nomad.acl.delete_policies | Time elapsed for ACL.DeletePolicies RPC call | Nanoseconds | Summary | host
nomad.nomad.acl.delete_tokens | Time elapsed for ACL.DeleteTokens RPC call | Nanoseconds | Summary | host
nomad.nomad.acl.get_policies | Time elapsed for ACL.GetPolicies RPC call | Nanoseconds | Summary | host
nomad.nomad.acl.get_policy | Time elapsed for ACL.GetPolicy RPC call | Nanoseconds | Summary | host
nomad.nomad.acl.get_token | Time elapsed for ACL.GetToken RPC call | Nanoseconds | Summary | host
nomad.nomad.acl.get_tokens | Time elapsed for ACL.GetTokens RPC call | Nanoseconds | Summary | host
nomad.nomad.acl.list_policies | Time elapsed for ACL.ListPolicies RPC call | Nanoseconds | Summary | host
nomad.nomad.acl.list_tokens | Time elapsed for ACL.ListTokens RPC call | Nanoseconds | Summary | host
nomad.nomad.acl.resolve_token | Time elapsed for ACL.ResolveToken RPC call | Nanoseconds | Summary | host
nomad.nomad.acl.upsert_policies | Time elapsed for ACL.UpsertPolicies RPC call | Nanoseconds | Summary | host
nomad.nomad.acl.upsert_tokens | Time elapsed for ACL.UpsertTokens RPC call | Nanoseconds | Summary | host
nomad.nomad.alloc.exec | Time elapsed to establish alloc exec | Nanoseconds | Summary | host
nomad.nomad.alloc.get_alloc | Time elapsed for Alloc.GetAlloc RPC call | Nanoseconds | Summary | host
nomad.nomad.alloc.get_allocs | Time elapsed for Alloc.GetAllocs RPC call | Nanoseconds | Summary | host
nomad.nomad.alloc.list | Time elapsed for Alloc.List RPC call | Nanoseconds | Summary | host
nomad.nomad.alloc.stop | Time elapsed for Alloc.Stop RPC call | Nanoseconds | Summary | host
nomad.nomad.alloc.update_desired_transition | Time elapsed for Alloc.UpdateDesiredTransition RPC call | Nanoseconds | Summary | host
nomad.nomad.blocked_evals.cpu | Amount of CPU shares requested by blocked evals | Integer | Gauge | datacenter, host, node_class
nomad.nomad.blocked_evals.memory | Amount of memory requested by blocked evals | Integer | Gauge | datacenter, host, node_class
nomad.nomad.blocked_evals.job.cpu | Amount of CPU shares requested by blocked evals of a job | Integer | Gauge | host, job, namespace
nomad.nomad.blocked_evals.job.memory | Amount of memory requested by blocked evals of a job | Integer | Gauge | host, job, namespace
nomad.nomad.blocked_evals.total_blocked | Count of evals in the blocked state | Integer | Gauge | host
nomad.nomad.blocked_evals.total_escaped | Count of evals that have escaped computed node classes. This indicates a scheduler optimization was skipped and is not usually a source of concern. | Integer | Gauge | host
nomad.nomad.blocked_evals.total_quota_limit | Count of blocked evals due to quota limits | Integer | Gauge | host
nomad.nomad.broker.batch_ready | Count of batch evals ready to be scheduled | Integer | Gauge | host
nomad.nomad.broker.batch_unacked | Count of unacknowledged batch evals | Integer | Gauge | host
nomad.nomad.broker.eval_waiting | Time elapsed with evaluation waiting to be enqueued | Nanoseconds | Gauge | eval_id, job, namespace
nomad.nomad.broker.service_ready | Count of service evals ready to be scheduled | Integer | Gauge | host
nomad.nomad.broker.service_unacked | Count of unacknowledged service evals | Integer | Gauge | host
nomad.nomad.broker.system_ready | Count of system evals ready to be scheduled | Integer | Gauge | host
nomad.nomad.broker.system_unacked | Count of unacknowledged system evals | Integer | Gauge | host
nomad.nomad.broker.total_ready | Count of evals in the ready state | Integer | Gauge | host
nomad.nomad.broker.total_waiting | Count of evals waiting to be enqueued | Integer | Gauge | host
nomad.nomad.client.batch_deregister | Time elapsed for Node.BatchDeregister RPC call | Nanoseconds | Summary | host
nomad.nomad.client.deregister | Time elapsed for Node.Deregister RPC call | Nanoseconds | Summary | host
nomad.nomad.client.derive_si_token | Time elapsed for Node.DeriveSIToken RPC call | Nanoseconds | Summary | host
nomad.nomad.client.derive_vault_token | Time elapsed for Node.DeriveVaultToken RPC call | Nanoseconds | Summary | host
nomad.nomad.client.emit_events | Time elapsed for Node.EmitEvents RPC call | Nanoseconds | Summary | host
nomad.nomad.client.evaluate | Time elapsed for Node.Evaluate RPC call | Nanoseconds | Summary | host
nomad.nomad.client.get_allocs | Time elapsed for Node.GetAllocs RPC call | Nanoseconds | Summary | host
nomad.nomad.client.get_client_allocs | Time elapsed for Node.GetClientAllocs RPC call | Nanoseconds | Summary | host
nomad.nomad.client.get_node | Time elapsed for Node.GetNode RPC call | Nanoseconds | Summary | host
nomad.nomad.client.list | Time elapsed for Node.List RPC call | Nanoseconds | Summary | host
nomad.nomad.client.register | Time elapsed for Node.Register RPC call | Nanoseconds | Summary | host
nomad.nomad.client.stats | Time elapsed for Client.Stats RPC call | Nanoseconds | Summary | host
nomad.nomad.client.update_alloc | Time elapsed for Node.UpdateAlloc RPC call | Nanoseconds | Summary | host
nomad.nomad.client.update_drain | Time elapsed for Node.UpdateDrain RPC call | Nanoseconds | Summary | host
nomad.nomad.client.update_eligibility | Time elapsed for Node.UpdateEligibility RPC call | Nanoseconds | Summary | host
nomad.nomad.client.update_status | Time elapsed for Node.UpdateStatus RPC call | Nanoseconds | Summary | host
nomad.nomad.client_allocations.garbage_collect_all | Time elapsed for ClientAllocations.GarbageCollectAll RPC call | Nanoseconds | Summary | host
nomad.nomad.client_allocations.garbage_collect | Time elapsed for ClientAllocations.GarbageCollect RPC call | Nanoseconds | Summary | host
nomad.nomad.client_allocations.restart | Time elapsed for ClientAllocations.Restart RPC call | Nanoseconds | Summary | host
nomad.nomad.client_allocations.signal | Time elapsed for ClientAllocations.Signal RPC call | Nanoseconds | Summary | host
nomad.nomad.client_allocations.stats | Time elapsed for ClientAllocations.Stats RPC call | Nanoseconds | Summary | host
nomad.nomad.client_csi_controller.attach_volume | Time elapsed for Controller.AttachVolume RPC call | Nanoseconds | Summary | host
nomad.nomad.client_csi_controller.detach_volume | Time elapsed for Controller.DetachVolume RPC call | Nanoseconds | Summary | host
nomad.nomad.client_csi_controller.validate_volume | Time elapsed for Controller.ValidateVolume RPC call | Nanoseconds | Summary | host
nomad.nomad.client_csi_node.detach_volume | Time elapsed for Node.DetachVolume RPC call | Nanoseconds | Summary | host
nomad.nomad.deployment.allocationsTime elapsed for Deployment.Allocations RPC callNanosecondsSummaryhost
nomad.nomad.deployment.cancelTime elapsed for Deployment.Cancel RPC callNanosecondsSummaryhost
nomad.nomad.deployment.failTime elapsed for Deployment.Fail RPC callNanosecondsSummaryhost
nomad.nomad.deployment.get_deploymentTime elapsed for Deployment.GetDeployment RPC callNanosecondsSummaryhost
nomad.nomad.deployment.listTime elapsed for Deployment.List RPC callNanosecondsSummaryhost
nomad.nomad.deployment.pauseTime elapsed for Deployment.Pause RPC callNanosecondsSummaryhost
nomad.nomad.deployment.promoteTime elapsed for Deployment.Promote RPC callNanosecondsSummaryhost
nomad.nomad.deployment.reapTime elapsed for Deployment.Reap RPC callNanosecondsSummaryhost
nomad.nomad.deployment.runTime elapsed for Deployment.Run RPC callNanosecondsSummaryhost
nomad.nomad.deployment.set_alloc_healthTime elapsed for Deployment.SetAllocHealth RPC callNanosecondsSummaryhost
nomad.nomad.deployment.unblockTime elapsed for Deployment.Unblock RPC callNanosecondsSummaryhost
nomad.nomad.eval.ackTime elapsed for Eval.Ack RPC callNanosecondsSummaryhost
nomad.nomad.eval.allocationsTime elapsed for Eval.Allocations RPC callNanosecondsSummaryhost
nomad.nomad.eval.createTime elapsed for Eval.Create RPC callNanosecondsSummaryhost
nomad.nomad.eval.dequeueTime elapsed for Eval.Dequeue RPC callNanosecondsSummaryhost
nomad.nomad.eval.get_evalTime elapsed for Eval.GetEval RPC callNanosecondsSummaryhost
nomad.nomad.eval.listTime elapsed for Eval.List RPC callNanosecondsSummaryhost
nomad.nomad.eval.nackTime elapsed for Eval.Nack RPC callNanosecondsSummaryhost
nomad.nomad.eval.reapTime elapsed for Eval.Reap RPC callNanosecondsSummaryhost
nomad.nomad.eval.reblockTime elapsed for Eval.Reblock RPC callNanosecondsSummaryhost
nomad.nomad.eval.updateTime elapsed for Eval.Update RPC callNanosecondsSummaryhost
nomad.nomad.file_system.listTime elapsed for FileSystem.List RPC callNanosecondsSummaryhost
nomad.nomad.file_system.logsTime elapsed to establish FileSystem.Logs RPCNanosecondsSummaryhost
nomad.nomad.file_system.statTime elapsed for FileSystem.Stat RPC callNanosecondsSummaryhost
nomad.nomad.file_system.streamTime elapsed to establish FileSystem.Stream RPCNanosecondsSummaryhost
nomad.nomad.fsm.alloc_client_updateTime elapsed to apply AllocClientUpdate raft entryNanosecondsSummaryhost
nomad.nomad.fsm.alloc_update_desired_transitionTime elapsed to apply AllocUpdateDesiredTransition raft entryNanosecondsSummaryhost
nomad.nomad.fsm.alloc_updateTime elapsed to apply AllocUpdate raft entryNanosecondsSummaryhost
nomad.nomad.fsm.apply_acl_policy_deleteTime elapsed to apply ApplyACLPolicyDelete raft entryNanosecondsSummaryhost
nomad.nomad.fsm.apply_acl_policy_upsertTime elapsed to apply ApplyACLPolicyUpsert raft entryNanosecondsSummaryhost
nomad.nomad.fsm.apply_acl_token_bootstrapTime elapsed to apply ApplyACLTokenBootstrap raft entryNanosecondsSummaryhost
nomad.nomad.fsm.apply_acl_token_deleteTime elapsed to apply ApplyACLTokenDelete raft entryNanosecondsSummaryhost
nomad.nomad.fsm.apply_acl_token_upsertTime elapsed to apply ApplyACLTokenUpsert raft entryNanosecondsSummaryhost
nomad.nomad.fsm.apply_csi_plugin_deleteTime elapsed to apply ApplyCSIPluginDelete raft entryNanosecondsSummaryhost
nomad.nomad.fsm.apply_csi_volume_batch_claimTime elapsed to apply ApplyCSIVolumeBatchClaim raft entryNanosecondsSummaryhost
nomad.nomad.fsm.apply_csi_volume_claimTime elapsed to apply ApplyCSIVolumeClaim raft entryNanosecondsSummaryhost
nomad.nomad.fsm.apply_csi_volume_deregisterTime elapsed to apply ApplyCSIVolumeDeregister raft entryNanosecondsSummaryhost
nomad.nomad.fsm.apply_csi_volume_registerTime elapsed to apply ApplyCSIVolumeRegister raft entryNanosecondsSummaryhost
nomad.nomad.fsm.apply_deployment_alloc_healthTime elapsed to apply ApplyDeploymentAllocHealth raft entryNanosecondsSummaryhost
nomad.nomad.fsm.apply_deployment_deleteTime elapsed to apply ApplyDeploymentDelete raft entryNanosecondsSummaryhost
nomad.nomad.fsm.apply_deployment_promotionTime elapsed to apply ApplyDeploymentPromotion raft entryNanosecondsSummaryhost
nomad.nomad.fsm.apply_deployment_status_updateTime elapsed to apply ApplyDeploymentStatusUpdate raft entryNanosecondsSummaryhost
nomad.nomad.fsm.apply_job_stabilityTime elapsed to apply ApplyJobStability raft entryNanosecondsSummaryhost
nomad.nomad.fsm.apply_namespace_deleteTime elapsed to apply ApplyNamespaceDelete raft entryNanosecondsSummaryhost
nomad.nomad.fsm.apply_namespace_upsertTime elapsed to apply ApplyNamespaceUpsert raft entryNanosecondsSummaryhost
nomad.nomad.fsm.apply_plan_resultsTime elapsed to apply ApplyPlanResults raft entryNanosecondsSummaryhost
nomad.nomad.fsm.apply_scheduler_configTime elapsed to apply ApplySchedulerConfig raft entryNanosecondsSummaryhost
nomad.nomad.fsm.autopilotTime elapsed to apply Autopilot raft entryNanosecondsSummaryhost
nomad.nomad.fsm.batch_deregister_jobTime elapsed to apply BatchDeregisterJob raft entryNanosecondsSummaryhost
nomad.nomad.fsm.batch_deregister_nodeTime elapsed to apply BatchDeregisterNode raft entryNanosecondsSummaryhost
nomad.nomad.fsm.batch_node_drain_updateTime elapsed to apply BatchNodeDrainUpdate raft entryNanosecondsSummaryhost
nomad.nomad.fsm.cluster_metaTime elapsed to apply ClusterMeta raft entryNanosecondsSummaryhost
nomad.nomad.fsm.delete_evalTime elapsed to apply DeleteEval raft entryNanosecondsSummaryhost
nomad.nomad.fsm.deregister_jobTime elapsed to apply DeregisterJob raft entryNanosecondsSummaryhost
nomad.nomad.fsm.deregister_nodeTime elapsed to apply DeregisterNode raft entryNanosecondsSummaryhost
nomad.nomad.fsm.deregister_si_accessorTime elapsed to apply DeregisterSITokenAccessor raft entryNanosecondsSummaryhost
nomad.nomad.fsm.deregister_vault_accessorTime elapsed to apply DeregisterVaultAccessor raft entryNanosecondsSummaryhost
nomad.nomad.fsm.node_drain_updateTime elapsed to apply NodeDrainUpdate raft entryNanosecondsSummaryhost
nomad.nomad.fsm.node_eligibility_updateTime elapsed to apply NodeEligibilityUpdate raft entryNanosecondsSummaryhost
nomad.nomad.fsm.node_status_updateTime elapsed to apply NodeStatusUpdate raft entryNanosecondsSummaryhost
nomad.nomad.fsm.persistTime elapsed to apply Persist raft entryNanosecondsSummaryhost
nomad.nomad.fsm.register_jobTime elapsed to apply RegisterJob raft entryNanosecondsSummaryhost
nomad.nomad.fsm.register_nodeTime elapsed to apply RegisterNode raft entryNanosecondsSummaryhost
nomad.nomad.fsm.update_evalTime elapsed to apply UpdateEval raft entryNanosecondsSummaryhost
nomad.nomad.fsm.upsert_node_eventsTime elapsed to apply UpsertNodeEvents raft entryNanosecondsSummaryhost
nomad.nomad.fsm.upsert_scaling_eventTime elapsed to apply UpsertScalingEvent raft entryNanosecondsSummaryhost
nomad.nomad.fsm.upsert_si_accessorTime elapsed to apply UpsertSITokenAccessors raft entryNanosecondsSummaryhost
nomad.nomad.fsm.upsert_vault_accessorTime elapsed to apply UpsertVaultAccessor raft entryNanosecondsSummaryhost
nomad.nomad.job.allocationsTime elapsed for Job.Allocations RPC callNanosecondsSummaryhost
nomad.nomad.job.batch_deregisterTime elapsed for Job.BatchDeregister RPC callNanosecondsSummaryhost
nomad.nomad.job.deploymentsTime elapsed for Job.Deployments RPC callNanosecondsSummaryhost
nomad.nomad.job.deregisterTime elapsed for Job.Deregister RPC callNanosecondsSummaryhost
nomad.nomad.job.dispatchTime elapsed for Job.Dispatch RPC callNanosecondsSummaryhost
nomad.nomad.job.evaluateTime elapsed for Job.Evaluate RPC callNanosecondsSummaryhost
nomad.nomad.job.evaluationsTime elapsed for Job.Evaluations RPC callNanosecondsSummaryhost
nomad.nomad.job.get_job_versionsTime elapsed for Job.GetJobVersions RPC callNanosecondsSummaryhost
nomad.nomad.job.get_jobTime elapsed for Job.GetJob RPC callNanosecondsSummaryhost
nomad.nomad.job.latest_deploymentTime elapsed for Job.LatestDeployment RPC callNanosecondsSummaryhost
nomad.nomad.job.listTime elapsed for Job.List RPC callNanosecondsSummaryhost
nomad.nomad.job.planTime elapsed for Job.Plan RPC callNanosecondsSummaryhost
nomad.nomad.job.registerTime elapsed for Job.Register RPC callNanosecondsSummaryhost
nomad.nomad.job.revertTime elapsed for Job.Revert RPC callNanosecondsSummaryhost
nomad.nomad.job.scale_statusTime elapsed for Job.ScaleStatus RPC callNanosecondsSummaryhost
nomad.nomad.job.scaleTime elapsed for Job.Scale RPC callNanosecondsSummaryhost
nomad.nomad.job.stableTime elapsed for Job.Stable RPC callNanosecondsSummaryhost
nomad.nomad.job.validateTime elapsed for Job.Validate RPC callNanosecondsSummaryhost
nomad.nomad.job_summary.get_job_summaryTime elapsed for Job.Summary RPC callNanosecondsSummaryhost
nomad.nomad.leader.barrierTime elapsed to establish a raft barrier during leader transitionNanosecondsSummaryhost
nomad.nomad.leader.reconcileMemberTime elapsed to reconcile a serf peer with state storeNanosecondsSummaryhost
nomad.nomad.leader.reconcileTime elapsed to reconcile all serf peers with state storeNanosecondsSummaryhost
nomad.nomad.namespace.delete_namespacesTime elapsed for Namespace.DeleteNamespacesNanosecondsSummaryhost
nomad.nomad.namespace.get_namespaceTime elapsed for Namespace.GetNamespaceNanosecondsSummaryhost
nomad.nomad.namespace.get_namespacesTime elapsed for Namespace.GetNamespacesNanosecondsSummaryhost
nomad.nomad.namespace.list_namespaceTime elapsed for Namespace.ListNamespacesNanosecondsSummaryhost
nomad.nomad.namespace.upsert_namespacesTime elapsed for Namespace.UpsertNamespacesNanosecondsSummaryhost
nomad.nomad.periodic.forceTime elapsed for Periodic.Force RPC callNanosecondsSummaryhost
nomad.nomad.plan.applyTime elapsed to apply a planNanosecondsSummaryhost
nomad.nomad.plan.evaluateTime elapsed to evaluate a planNanosecondsSummaryhost
nomad.nomad.plan.node_rejectedNumber of times a node has had a plan rejectedIntegerCounterhost, node_id
nomad.nomad.plan.queue_depthCount of evals in the plan queueIntegerGaugehost
nomad.nomad.plan.submitTime elapsed for Plan.Submit RPC callNanosecondsSummaryhost
nomad.nomad.plan.wait_for_indexTime elapsed that planner waits for the raft index of the plan to be processedNanosecondsSummaryhost
nomad.nomad.plugin.deleteTime elapsed for CSIPlugin.Delete RPC callNanosecondsSummaryhost
nomad.nomad.plugin.getTime elapsed for CSIPlugin.Get RPC callNanosecondsSummaryhost
nomad.nomad.plugin.listTime elapsed for CSIPlugin.List RPC callNanosecondsSummaryhost
nomad.nomad.scaling.get_policyTime elapsed for Scaling.GetPolicy RPC callNanosecondsSummaryhost
nomad.nomad.scaling.list_policiesTime elapsed for Scaling.ListPolicies RPC callNanosecondsSummaryhost
nomad.nomad.search.prefix_searchTime elapsed for Search.PrefixSearch RPC callNanosecondsSummaryhost
nomad.nomad.vault.create_tokenTime elapsed to create Vault tokenNanosecondsGaugehost
nomad.nomad.vault.distributed_tokens_revokedCount of revoked tokensIntegerGaugehost
nomad.nomad.vault.lookup_tokenTime elapsed to lookup Vault tokenNanosecondsGaugehost
nomad.nomad.vault.renew_failedCount of failed attempts to renew Vault tokenIntegerGaugehost
nomad.nomad.vault.renewTime elapsed to renew Vault tokenNanosecondsGaugehost
nomad.nomad.vault.revoke_tokensTime elapsed to revoke Vault tokensNanosecondsGaugehost
nomad.nomad.vault.token_last_renewalTime since last successful Vault token renewalMillisecondsGaugehost
nomad.nomad.vault.token_next_renewalTime until next Vault token renewal attemptMillisecondsGaugehost
nomad.nomad.vault.token_ttlTime to live for Vault tokenMillisecondsGaugehost
nomad.nomad.vault.undistributed_tokens_abandonedCount of abandoned tokensIntegerGaugehost
nomad.nomad.volume.claimTime elapsed for CSIVolume.Claim RPC callNanosecondsSummaryhost
nomad.nomad.volume.deregisterTime elapsed for CSIVolume.Deregister RPC callNanosecondsSummaryhost
nomad.nomad.volume.getTime elapsed for CSIVolume.Get RPC callNanosecondsSummaryhost
nomad.nomad.volume.listTime elapsed for CSIVolume.List RPC callNanosecondsSummaryhost
nomad.nomad.volume.registerTime elapsed for CSIVolume.Register RPC callNanosecondsSummaryhost
nomad.nomad.volume.unpublishTime elapsed for CSIVolume.Unpublish RPC callNanosecondsSummaryhost
nomad.nomad.worker.create_evalTime elapsed for worker to create an evalNanosecondsSummaryhost
nomad.nomad.worker.dequeue_evalTime elapsed for worker to dequeue an evalNanosecondsSummaryhost
nomad.nomad.worker.invoke_scheduler_serviceTime elapsed for worker to invoke the schedulerNanosecondsSummaryhost
nomad.nomad.worker.send_ackTime elapsed for worker to send acknowledgementNanosecondsSummaryhost
nomad.nomad.worker.submit_planTime elapsed for worker to submit planNanosecondsSummaryhost
nomad.nomad.worker.update_evalTime elapsed for worker to submit updated evalNanosecondsSummaryhost
nomad.nomad.worker.wait_for_indexTime elapsed that worker waits for the raft index of the eval to be processedNanosecondsSummaryhost
nomad.raft.appliedIndexCurrent index applied to FSMIntegerGaugehost
nomad.raft.barrierCount of blocking raft API callsIntegerCounterhost
nomad.raft.commitNumLogsCount of logs enqueuedIntegerGaugehost
nomad.raft.commitTimeTime elapsed to commit writesNanosecondsSummaryhost
nomad.raft.fsm.applyTime elapsed to apply write to FSMNanosecondsSummaryhost
nomad.raft.fsm.enqueueTime elapsed to enqueue write to FSMNanosecondsSummaryhost
nomad.raft.lastIndexMost recent index seenIntegerGaugehost
nomad.raft.leader.dispatchLogTime elapsed to write log, mark in flight, and start replicationNanosecondsSummaryhost
nomad.raft.leader.dispatchNumLogsCount of logs dispatchedIntegerGaugehost
nomad.raft.replication.appendEntriesRaft transaction commit timems / Raft Log AppendTimer
nomad.raft.state.candidateCount of entering candidate stateIntegerGaugehost
nomad.raft.state.followerCount of entering follower stateIntegerGaugehost
nomad.raft.state.leaderCount of entering leader stateIntegerGaugehost
nomad.raft.transition.heartbeat_timeoutCount of failing to heartbeat and starting electionIntegerGaugehost
nomad.raft.transition.leader_lease_timeoutCount of stepping down as leader after losing quorumIntegerGaugehost
nomad.runtime.free_countCount of objects freed from heap by go runtime GCIntegerGaugehost
nomad.runtime.gc_pause_nsGo runtime GC pause timesNanosecondsSummaryhost
nomad.runtime.sys_bytesGo runtime GC metadata size# of bytesGaugehost
nomad.runtime.total_gc_pause_nsTotal elapsed go runtime GC pause timesNanosecondsGaugehost
nomad.runtime.total_gc_runsCount of go runtime GC runsIntegerGaugehost
nomad.serf.queue.EventCount of memberlist events receivedIntegerSummaryhost
nomad.serf.queue.IntentCount of memberlist changesIntegerSummaryhost
nomad.serf.queue.QueryCount of memberlist queriesIntegerSummaryhost
nomad.scheduler.allocs.rescheduled.attemptedCount of attempts to reschedule an allocationIntegerCountalloc_id, job, namespace, task_group
nomad.scheduler.allocs.rescheduled.limitMaximum number of attempts to reschedule an allocationIntegerCountalloc_id, job, namespace, task_group
nomad.scheduler.allocs.rescheduled.wait_untilTime that a rescheduled allocation will be delayedFloatGaugealloc_id, job, namespace, task_group, follow_up_eval_id
nomad.state.snapshotIndexCurrent snapshot indexIntegerGaugehost
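
Most timing metrics in the table above are reported in nanoseconds, while the Vault token timers (token_last_renewal, token_next_renewal, token_ttl) are reported in milliseconds. When these values feed a single dashboard, it can help to normalize them to one unit first. The following is a minimal sketch of that idea, not part of Nomad: `UNIT_BY_METRIC`, `DIVISOR`, and `to_seconds` are hypothetical names, and the mapping copies only a few rows from the table by hand.

```python
# Hypothetical helper: normalize Nomad timing metrics to seconds.
# The units below are copied by hand from a few rows of the table;
# extend the mapping with whichever metrics you actually collect.
UNIT_BY_METRIC = {
    "nomad.nomad.plan.submit": "nanoseconds",
    "nomad.raft.commitTime": "nanoseconds",
    "nomad.nomad.vault.token_ttl": "milliseconds",
    "nomad.nomad.vault.token_next_renewal": "milliseconds",
}

# How many of each unit make up one second.
DIVISOR = {"nanoseconds": 1e9, "milliseconds": 1e3}


def to_seconds(metric: str, value: float) -> float:
    """Convert a raw timing value to seconds using the documented unit."""
    unit = UNIT_BY_METRIC.get(metric)
    if unit is None:
        # Refuse to guess: an unknown metric may not be a duration at all.
        raise KeyError(f"no unit recorded for {metric!r}")
    return value / DIVISOR[unit]


if __name__ == "__main__":
    # A 2.5 ms raft commit and a 72 s Vault token TTL, both in seconds.
    print(to_seconds("nomad.raft.commitTime", 2_500_000))
    print(to_seconds("nomad.nomad.vault.token_ttl", 72_000))
```

Raising on unknown metric names is a deliberate choice here: many rows in the table are counts or indexes, and silently dividing those would produce misleading numbers.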

»Raft BoltDB Metrics

Raft database metrics are emitted by the raft-boltdb library.

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| nomad.raft.boltdb.numFreePages | Number of free pages | Integer | Gauge |
| nomad.raft.boltdb.numPendingPages | Number of pending pages | Integer | Gauge |
| nomad.raft.boltdb.freePageBytes | Number of free page bytes | Integer | Gauge |
| nomad.raft.boltdb.freelistBytes | Number of freelist bytes | Integer | Gauge |
| nomad.raft.boltdb.totalReadTxn | Count of total read transactions | Integer | Counter |
| nomad.raft.boltdb.openReadTxn | Number of current open read transactions | Integer | Gauge |
| nomad.raft.boltdb.txstats.pageCount | Number of pages in use | Integer | Gauge |
| nomad.raft.boltdb.txstats.pageAlloc | Number of page allocations | Integer | Gauge |
| nomad.raft.boltdb.txstats.cursorCount | Count of total database cursors | Integer | Counter |
| nomad.raft.boltdb.txstats.nodeCount | Count of total database nodes | Integer | Counter |
| nomad.raft.boltdb.txstats.nodeDeref | Count of total database node dereferences | Integer | Counter |
| nomad.raft.boltdb.txstats.rebalance | Count of total rebalance operations | Integer | Counter |
| nomad.raft.boltdb.txstats.rebalanceTime | Sample of rebalance operation times | Nanoseconds | Summary |
| nomad.raft.boltdb.txstats.split | Count of total split operations | Integer | Counter |
| nomad.raft.boltdb.txstats.spill | Count of total spill operations | Integer | Counter |
| nomad.raft.boltdb.txstats.spillTime | Sample of spill operation times | Nanoseconds | Summary |
| nomad.raft.boltdb.txstats.write | Count of total write operations | Integer | Counter |
| nomad.raft.boltdb.txstats.writeTime | Sample of write operation times | Nanoseconds | Summary |
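
Several of the BoltDB metrics above are counters, so a single reading says little on its own; what usually matters is the change between two readings. A minimal sketch of that calculation follows. It assumes your metrics backend exposes the counter as a running total (some sinks report per-interval counts instead, in which case no differencing is needed), and `counter_rate` is a hypothetical helper, not part of Nomad or raft-boltdb.

```python
from typing import Optional


def counter_rate(prev_value: float, prev_ts: float,
                 curr_value: float, curr_ts: float) -> Optional[float]:
    """Per-second rate between two samples of a cumulative counter.

    Timestamps are in seconds. Returns None when no meaningful rate can
    be computed: no time has elapsed, or the counter went backwards
    (for example because the agent restarted and the total reset).
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        return None
    delta = curr_value - prev_value
    if delta < 0:
        return None
    return delta / elapsed


if __name__ == "__main__":
    # Two hypothetical readings of nomad.raft.boltdb.txstats.write, 60 s apart.
    print(counter_rate(1000, 0.0, 1600, 60.0))
```

Returning None on a reset, rather than a negative rate, keeps a restart from showing up as a spurious dip in a graph.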