
»Devices

Nomad has built-in support for scheduling compute resources such as CPU, memory, and networking. Nomad device plugins are used to support scheduling tasks with other devices, such as GPUs. They are responsible for fingerprinting these devices and working with the Nomad client to make them available to assigned tasks.

For a real-world example of a Nomad device plugin implementation, see the Nvidia GPU plugin.

»Authoring Device Plugins

Authoring a device plugin in Nomad consists of implementing the DevicePlugin interface alongside a main package to launch the plugin.

The device plugin skeleton project exists to help bootstrap the development of new device plugins. It provides most of the boilerplate necessary for a device plugin, along with detailed comments.

»Lifecycle and State

A device plugin is long-lived. Nomad will ensure that one instance of the plugin is running. If the plugin crashes or otherwise terminates, Nomad will launch another instance of it.

However, unlike task drivers, device plugins do not currently have an interface for persisting state to the Nomad client. Instead, the device plugin API emphasizes fingerprinting devices and reporting their status. After helping to provision a task with a scheduled device, a device plugin does not have any responsibility (or ability) to monitor the task.

»Device Plugin API

In addition to the base plugin API, a device plugin must implement the following functions.

»Fingerprint(context.Context) (<-chan *FingerprintResponse, error)

The Fingerprint function is called by the client when the plugin is started. It allows the plugin to provide Nomad with a list of discovered devices, along with their attributes, for the purpose of scheduling workloads using devices. The channel returned should immediately send an initial FingerprintResponse, then send periodic updates at an appropriate interval until the context is canceled.

Each fingerprint response consists of either an error or a list of device groups. A device group is a list of detected devices that are identical for the purpose of scheduling; that is, they will have identical attributes.
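The send-immediately-then-periodically pattern described above can be sketched as follows. This is a minimal illustration, not the plugin SDK itself: the `Device`, `DeviceGroup`, and `FingerprintResponse` types are simplified stand-ins declared locally, and `ExamplePlugin`, its `detect` callback, `newExamplePlugin`, and `firstFingerprints` are hypothetical names introduced only for this example.

```go
package main

import (
	"context"
	"time"
)

// Device and DeviceGroup are simplified, hypothetical stand-ins for the
// types a real device plugin would import from Nomad.
type Device struct {
	ID      string
	Healthy bool
}

// DeviceGroup lists devices that are identical for scheduling purposes.
type DeviceGroup struct {
	Vendor, Type, Name string
	Devices            []Device
	Attributes         map[string]string
}

// FingerprintResponse carries either an error or the detected device groups.
type FingerprintResponse struct {
	Devices []*DeviceGroup
	Error   error
}

// ExamplePlugin is hypothetical; detect stands in for real device discovery.
type ExamplePlugin struct {
	interval time.Duration // must be positive
	detect   func() ([]*DeviceGroup, error)
}

// Fingerprint sends an initial response right away, then one per interval,
// and stops (closing the channel) once ctx is canceled.
func (p *ExamplePlugin) Fingerprint(ctx context.Context) (<-chan *FingerprintResponse, error) {
	out := make(chan *FingerprintResponse)
	go func() {
		defer close(out)
		ticker := time.NewTicker(p.interval)
		defer ticker.Stop()
		for {
			groups, err := p.detect()
			select {
			case out <- &FingerprintResponse{Devices: groups, Error: err}:
			case <-ctx.Done():
				return
			}
			select {
			case <-ticker.C:
			case <-ctx.Done():
				return
			}
		}
	}()
	return out, nil
}

// newExamplePlugin returns a plugin that always reports one group of two
// identical, made-up devices.
func newExamplePlugin() *ExamplePlugin {
	return &ExamplePlugin{
		interval: 10 * time.Millisecond,
		detect: func() ([]*DeviceGroup, error) {
			return []*DeviceGroup{{
				Vendor: "example", Type: "gpu", Name: "model-x",
				Devices:    []Device{{ID: "dev-0", Healthy: true}, {ID: "dev-1", Healthy: true}},
				Attributes: map[string]string{"memory": "8 GiB"},
			}}, nil
		},
	}
}

// firstFingerprints plays the role of the client: it reads n responses and
// then cancels the context so the plugin goroutine exits.
func firstFingerprints(p *ExamplePlugin, n int) []*FingerprintResponse {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	ch, err := p.Fingerprint(ctx)
	if err != nil {
		return nil
	}
	var got []*FingerprintResponse
	for len(got) < n {
		resp, ok := <-ch
		if !ok {
			break
		}
		got = append(got, resp)
	}
	return got
}
```

Both the send and the wait select on `ctx.Done()`, so the goroutine cannot block forever if the client stops reading before canceling.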

»Stats(context.Context, time.Duration) (<-chan *StatsResponse, error)

The Stats function returns a channel on which the plugin should emit device statistics, at the specified interval, until either an error is encountered or the specified context is canceled. The StatsResponse object allows dimensioned statistics to be returned for each device in a device group.
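A rough sketch of that emit loop is shown below. The `StatsResponse` here is a flattened, locally declared stand-in (device ID to named measurements) rather than the real Nomad type, and `StatsPlugin`, its `collect` callback, and `drainStats` are hypothetical names used only for illustration. The choice to report a collection error on the channel and then stop is one reading of "until an error is encountered", not a confirmed requirement.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// StatsResponse is a simplified, hypothetical stand-in for the real type.
type StatsResponse struct {
	// Stats maps a device ID to its named measurements.
	Stats map[string]map[string]float64
	Error error
}

// StatsPlugin is hypothetical; collect stands in for querying the hardware.
type StatsPlugin struct {
	collect func() (map[string]map[string]float64, error)
}

// Stats emits one response per interval until ctx is canceled or collect
// fails. A failure is reported on the channel before the channel is closed.
// The interval must be positive.
func (p *StatsPlugin) Stats(ctx context.Context, interval time.Duration) (<-chan *StatsResponse, error) {
	out := make(chan *StatsResponse)
	go func() {
		defer close(out)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
			}
			stats, err := p.collect()
			select {
			case out <- &StatsResponse{Stats: stats, Error: err}:
			case <-ctx.Done():
				return
			}
			if err != nil {
				return
			}
		}
	}()
	return out, nil
}

// drainStats runs a plugin whose collect call fails on call number failOn
// and returns every response emitted before the channel closed.
func drainStats(failOn int) []*StatsResponse {
	calls := 0
	p := &StatsPlugin{collect: func() (map[string]map[string]float64, error) {
		calls++
		if calls == failOn {
			return nil, errors.New("device query failed")
		}
		return map[string]map[string]float64{"gpu-0": {"temperature_c": 40}}, nil
	}}
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	ch, err := p.Stats(ctx, time.Millisecond)
	if err != nil {
		return nil
	}
	var got []*StatsResponse
	for resp := range ch {
		got = append(got, resp)
	}
	return got
}
```

Calling `drainStats(3)` yields two successful responses followed by one carrying the error, after which the channel is closed.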

»Reserve(deviceIDs []string) (*ContainerReservation, error)

The Reserve function accepts a list of device IDs and returns the information necessary for the client to make those devices available to a task. Currently, the ContainerReservation object allows the plugin to specify environment variables for the task, as well as a list of host devices and files to be mounted into the task's filesystem. Any orchestration required to prepare the device for use should also be performed in this function.
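As an illustration, a reservation for a set of device IDs might be assembled like this. The `ContainerReservation`, `Mount`, and `DeviceSpec` types are locally declared stand-ins modeled on the description above; `ReservePlugin`, its `known` map, `newReservePlugin`, the `/dev/example*` paths, and the `EXAMPLE_VISIBLE_DEVICES` variable are all hypothetical and exist only for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// Mount and DeviceSpec are simplified, hypothetical stand-ins describing a
// file and a host device to expose inside the task.
type Mount struct {
	TaskPath string
	HostPath string
	ReadOnly bool
}

type DeviceSpec struct {
	TaskPath    string
	HostPath    string
	CgroupPerms string
}

// ContainerReservation tells the client how to expose devices to a task.
type ContainerReservation struct {
	Envs    map[string]string
	Mounts  []*Mount
	Devices []*DeviceSpec
}

// ReservePlugin is hypothetical; known maps a fingerprinted device ID to
// the host device node that backs it.
type ReservePlugin struct {
	known map[string]string
}

// Reserve validates every requested ID and returns the environment
// variables and host devices the task needs. Any preparation of the
// hardware itself would also happen here.
func (p *ReservePlugin) Reserve(deviceIDs []string) (*ContainerReservation, error) {
	res := &ContainerReservation{Envs: map[string]string{}}
	for _, id := range deviceIDs {
		hostPath, ok := p.known[id]
		if !ok {
			return nil, fmt.Errorf("unknown device ID %q", id)
		}
		res.Devices = append(res.Devices, &DeviceSpec{
			TaskPath:    hostPath,
			HostPath:    hostPath,
			CgroupPerms: "rw",
		})
	}
	// EXAMPLE_VISIBLE_DEVICES is a made-up variable name for illustration.
	res.Envs["EXAMPLE_VISIBLE_DEVICES"] = strings.Join(deviceIDs, ",")
	return res, nil
}

// newReservePlugin returns a plugin that knows two made-up devices.
func newReservePlugin() *ReservePlugin {
	return &ReservePlugin{known: map[string]string{
		"dev-0": "/dev/example0",
		"dev-1": "/dev/example1",
	}}
}
```

Rejecting unknown IDs up front keeps a partially built reservation from ever reaching the client.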
