Kubernetes is a powerful platform for automating the deployment, scaling, and management of containerized applications. One of the hardest problems in running large Kubernetes clusters is keeping track of the hardware capabilities of the nodes where workloads can be scheduled. Node Feature Discovery (NFD) is one of the most important Kubernetes add-ons for addressing this problem: it discovers the hardware features of each node and automatically labels the node with them.

With Kubernetes 1.32, Node Feature Discovery brings several important additions. In this post, we will look at NFD's core features, what has improved, why the improvements matter, and how you can use them to make workload scheduling in Kubernetes more efficient.


What is Node Feature Discovery (NFD)?

Node Feature Discovery (NFD) is a Kubernetes add-on that helps you discover and label node features automatically based on the underlying hardware. These features can include things like CPU models, GPU availability, local storage, network interfaces, and more. By labeling nodes with these features, Kubernetes can ensure that pods requiring specific hardware resources (such as GPUs for AI/ML workloads) are scheduled on the right nodes.

Why is NFD Important?

In a large Kubernetes cluster, you may have nodes with diverse hardware configurations. Some nodes may have GPUs for machine learning workloads, while others may only have general-purpose CPUs. Without proper discovery and labeling, scheduling workloads efficiently becomes a complex task. NFD automates the process of discovering these hardware features and labels the nodes accordingly, making scheduling decisions more efficient and automated.


Key Enhancements in Node Feature Discovery (NFD) in Kubernetes 1.32

Kubernetes 1.32 introduces several enhancements to Node Feature Discovery that improve its accuracy, extensibility, and usability. Let's explore the key improvements.

Expanded Hardware Feature Detection 💻

In Kubernetes 1.32, NFD expands its capabilities to discover a wider range of hardware features. These features now include:

  • Extended GPU detection: Kubernetes 1.32 improves NFD's ability to detect GPUs, including NVIDIA GPUs and other hardware accelerators, which are commonly used for AI/ML workloads.
  • Storage support: NFD now automatically detects local storage devices like SSDs or NVMe drives, which can be particularly useful for high-performance workloads that require fast local storage.
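GPUs and other accelerators usually show up through NFD's PCI feature source. The sketch below is a simplified Python illustration of that idea, not NFD's actual implementation: it reads a sysfs-like directory tree (modelled on /sys/bus/pci/devices) and emits labels of the form feature.node.kubernetes.io/pci-<class>_<vendor>.present. The class allowlist and label fields are assumed to match NFD's default PCI configuration; pci_labels, make_device, and the fake device addresses are hypothetical names used only for illustration.

```python
"""Simplified sketch of NFD-style PCI labelling (not NFD's own Go code).

Assumes NFD's default PCI settings: label fields "class" and "vendor", and a
device-class allowlist of "03" (display), "0b40" (co-processor) and "12"
(processing accelerator).
"""
import tempfile
from pathlib import Path

PREFIX = "feature.node.kubernetes.io/"
CLASS_ALLOWLIST = ("03", "0b40", "12")  # assumed defaults; check your nfd-worker config


def pci_labels(pci_root: Path) -> dict[str, str]:
    """Build pci-<class>_<vendor>.present labels from a sysfs-like tree."""
    labels: dict[str, str] = {}
    for dev in sorted(pci_root.iterdir()):
        # sysfs stores e.g. "0x030200" in `class` and "0x10de" in `vendor`;
        # keep the 4-digit class + subclass and the bare vendor ID.
        dev_class = (dev / "class").read_text().strip().removeprefix("0x")[:4]
        vendor = (dev / "vendor").read_text().strip().removeprefix("0x")
        if dev_class.startswith(CLASS_ALLOWLIST):
            labels[f"{PREFIX}pci-{dev_class}_{vendor}.present"] = "true"
    return labels


def make_device(root: Path, address: str, dev_class: str, vendor: str) -> None:
    """Create a fake PCI device directory for demonstration purposes."""
    dev = root / address
    dev.mkdir()
    (dev / "class").write_text(dev_class + "\n")
    (dev / "vendor").write_text(vendor + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        make_device(root, "0000:3b:00.0", "0x030200", "0x10de")  # NVIDIA 3D controller
        make_device(root, "0000:5e:00.0", "0x020000", "0x8086")  # Intel Ethernet NIC
        print(pci_labels(root))
```

Under these assumptions, a data-center NVIDIA GPU (PCI class 0302, vendor ID 10de) yields feature.node.kubernetes.io/pci-0302_10de.present=true, while the network card is skipped because its class (0200) is not in the allowlist.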

Why is this important?

These extended capabilities ensure that workloads requiring specific hardware features, such as GPUs for machine learning or fast storage for databases, are scheduled on the correct nodes. This helps optimize resource usage and ensures that workloads run efficiently.

NFD Integration with Extended Resource Labels 🏷️

In Kubernetes 1.32, NFD enhances its ability to label nodes with extended resources such as:

  • GPU resources: Nodes with GPUs are now labeled accordingly, enabling the Kubernetes scheduler to place GPU-intensive workloads, such as AI models or data pipelines, on the correct nodes.

  • Local storage labels: NFD now also labels nodes with local storage (e.g., SSD or NVMe), which is essential for workloads that require fast, high-throughput storage solutions.

These labels can be used in nodeSelector and node affinity rules, and combined with taints and tolerations where nodes are tainted for specific hardware, so that workloads are scheduled based on the resources each node actually has.
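To see how such labels drive placement, the sketch below models the documented semantics of a required node-affinity rule in plain Python: nodeSelectorTerms are ORed, the matchExpressions inside a term are ANDed, and the supported operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt. It is an illustration, not the kube-scheduler implementation; matchFields is ignored, and the function names and the storage-type label are hypothetical.

```python
"""Simplified model of requiredDuringSchedulingIgnoredDuringExecution node
affinity, evaluated against a node's labels. Not kube-scheduler code:
matchFields is ignored and operator validation is minimal."""


def expression_matches(labels: dict[str, str], expr: dict) -> bool:
    """Evaluate one matchExpressions entry against the node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    present = key in labels
    if op == "In":
        return present and labels[key] in values
    if op == "NotIn":
        return not present or labels[key] not in values
    if op == "Exists":
        return present
    if op == "DoesNotExist":
        return not present
    if op in ("Gt", "Lt"):
        # Gt/Lt compare the label value with a single integer value.
        try:
            actual, bound = int(labels[key]), int(values[0])
        except (KeyError, IndexError, ValueError):
            return False
        return actual > bound if op == "Gt" else actual < bound
    raise ValueError(f"unsupported operator: {op}")


def node_matches(labels: dict[str, str], node_selector_terms: list[dict]) -> bool:
    """Terms are ORed; expressions within a term are ANDed.

    A term with no matchExpressions matches nothing (matchFields is not modelled).
    """
    for term in node_selector_terms:
        exprs = term.get("matchExpressions", [])
        if exprs and all(expression_matches(labels, e) for e in exprs):
            return True
    return False


if __name__ == "__main__":
    ssd_terms = [{"matchExpressions": [
        {"key": "storage-type", "operator": "In", "values": ["ssd"]},
    ]}]
    print(node_matches({"storage-type": "ssd"}, ssd_terms))  # True
    print(node_matches({"storage-type": "hdd"}, ssd_terms))  # False
```

The OR-of-ANDs structure is why one term can demand several hardware features at once, while separate terms express alternatives.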

Real-World Example:

If you're running a custom machine learning accelerator or specialized network hardware in your data center, you can extend NFD to detect these features and label your nodes accordingly, ensuring that the correct workloads are scheduled on these nodes.
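One way to extend NFD without writing a plugin is its local feature source: a probe you supply (a script, cron job, or sidecar) writes feature files into /etc/kubernetes/node-feature-discovery/features.d/ on the node, and nfd-worker turns each line into a label. The Python sketch below approximates that file format under stated assumptions (one name[=value] per line, lines starting with # ignored, unprefixed names placed under feature.node.kubernetes.io/, a missing value treated as "true"). It is not NFD's parser, and the feature names network-speed and ml-accelerator are hypothetical.

```python
"""Sketch of NFD's "local" feature-file format (simplified).

Assumption: a custom probe writes one `name[=value]` line per feature into a
file under /etc/kubernetes/node-feature-discovery/features.d/ on the node.
Names without a namespace get NFD's default prefix; a missing value means "true".
"""
DEFAULT_PREFIX = "feature.node.kubernetes.io/"


def render_feature_file(features: dict[str, str]) -> str:
    """Serialize probe results as name=value lines."""
    return "".join(f"{name}={value}\n" for name, value in features.items())


def labels_from_feature_file(text: str) -> dict[str, str]:
    """Approximate how NFD turns feature-file lines into node labels."""
    labels: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition("=")
        if "/" not in name:
            name = DEFAULT_PREFIX + name
        labels[name] = value if sep else "true"
    return labels


if __name__ == "__main__":
    # Hypothetical probe results for a node with specialised hardware.
    text = render_feature_file({"network-speed": "high-speed"}) + "ml-accelerator\n"
    print(text, end="")
    print(labels_from_feature_file(text))
```

Because unprefixed names receive NFD's default namespace, an affinity rule would match feature.node.kubernetes.io/network-speed rather than a bare network-speed key. NFD also limits which label namespaces it publishes, so a custom prefix may need extra nfd-master configuration.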

Improved Performance and Scalability ⚡

As Kubernetes clusters grow in size, scalability and performance become critical. Kubernetes 1.32 improves NFD’s performance by reducing the overhead of feature detection, particularly when running on large clusters with hundreds or thousands of nodes. The optimizations ensure that the feature discovery process does not become a bottleneck in your cluster.


Real-World Use Cases for NFD Enhancements in Kubernetes 1.32

Let’s explore a few real-world use cases to understand how the enhancements in NFD can be applied.

1) Running AI/ML Workloads with GPUs 🤖

In modern AI/ML applications, GPUs are critical for training complex models. With Kubernetes 1.32 and the NFD enhancements, GPU-enabled nodes are detected and labeled automatically, so machine learning and AI workloads can be directed to them. In the example below, the pod requests the nvidia.com/gpu extended resource; that resource is advertised by the NVIDIA device plugin rather than by NFD, and NFD's GPU labels are commonly used (for example by the NVIDIA GPU Operator) to decide which nodes run that plugin.

Example:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ml-model
spec:
  serviceName: ml-model  # headless Service that governs the StatefulSet (create it separately)
  selector:
    matchLabels:
      app: ml-model
  replicas: 1
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
      - name: ml-container
        image: my-ml-image
        resources:
          limits:
            nvidia.com/gpu: 1  # extended resource from the NVIDIA device plugin; only GPU nodes can satisfy it
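Unlike a label match, a limits entry such as nvidia.com/gpu: 1 is satisfied through the node's allocatable extended resources; when only a limit is set, the request defaults to the same value. The sketch below is a simplified Python model of that fit check (what remains of a node's allocatable amount after existing requests must cover the new request). It is not the kube-scheduler implementation, and the node names and quantities are hypothetical.

```python
"""Simplified model of the scheduler's resource-fit check for an extended
resource such as nvidia.com/gpu (not kube-scheduler code)."""


def fits(request: dict[str, int], allocatable: dict[str, int],
         requested_on_node: dict[str, int]) -> bool:
    """True if the node still has enough of every requested resource."""
    return all(
        allocatable.get(name, 0) - requested_on_node.get(name, 0) >= qty
        for name, qty in request.items()
    )


def candidate_nodes(nodes: list[dict], request: dict[str, int]) -> list[str]:
    """Names of nodes whose remaining allocatable resources cover the request."""
    return [n["name"] for n in nodes
            if fits(request, n["allocatable"], n["requested"])]


if __name__ == "__main__":
    # Hypothetical nodes; only nodes running a GPU device plugin advertise nvidia.com/gpu.
    nodes = [
        {"name": "cpu-node", "allocatable": {}, "requested": {}},
        {"name": "gpu-node-busy", "allocatable": {"nvidia.com/gpu": 1},
         "requested": {"nvidia.com/gpu": 1}},
        {"name": "gpu-node-free", "allocatable": {"nvidia.com/gpu": 4},
         "requested": {"nvidia.com/gpu": 2}},
    ]
    print(candidate_nodes(nodes, {"nvidia.com/gpu": 1}))  # ['gpu-node-free']
```

A node without the resource is treated as having zero of it, which is why the CPU-only node is never a candidate for this pod.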

2) Optimizing Storage for High-Throughput Applications 💾

For workloads that require high-throughput storage, such as databases or big data processing, Kubernetes 1.32’s enhancements to NFD allow you to schedule pods on nodes with fast local storage like SSDs or NVMe drives.

Why is this important?
Labeling nodes by storage type lets the scheduler place storage-bound workloads such as databases and data pipelines on nodes with SSD or NVMe drives, which gives them higher throughput and lower latency.
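NFD's storage source publishes feature.node.kubernetes.io/storage-nonrotationaldisk=true on nodes that have a non-rotational disk. The sketch below illustrates the underlying idea under a stated assumption: that the Linux queue/rotational flag under /sys/block/<device>/ (0 for SSD/NVMe, 1 for spinning disks) is the signal. It is a simplified illustration rather than NFD's code, and storage_labels and make_disk are hypothetical helpers.

```python
"""Simplified sketch of detecting a non-rotational (SSD/NVMe) disk from a
sysfs-like tree, in the spirit of NFD's storage source (not NFD's code)."""
import tempfile
from pathlib import Path

LABEL = "feature.node.kubernetes.io/storage-nonrotationaldisk"


def storage_labels(sys_block: Path) -> dict[str, str]:
    """Return the label if any block device reports queue/rotational == 0."""
    for dev in sorted(sys_block.iterdir()):
        flag = dev / "queue" / "rotational"
        if flag.is_file() and flag.read_text().strip() == "0":
            return {LABEL: "true"}
    return {}


def make_disk(root: Path, name: str, rotational: int) -> None:
    """Create a fake <root>/<name>/queue/rotational entry."""
    queue = root / name / "queue"
    queue.mkdir(parents=True)
    (queue / "rotational").write_text(f"{rotational}\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        make_disk(root, "sda", 1)      # spinning disk
        make_disk(root, "nvme0n1", 0)  # NVMe SSD
        print(storage_labels(root))
```

A node-affinity rule matching this label with operator In and values ["true"] targets SSD-backed nodes without a hand-maintained storage label.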

Example:
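In a Kubernetes deployment, you can use node affinity to schedule database pods on nodes with local SSD storage. The storage-type label below is a custom label you would apply yourself; NFD's built-in equivalent is feature.node.kubernetes.io/storage-nonrotationaldisk, which is set to "true" when a node has a non-rotational disk: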

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: storage-type
              operator: In
              values:
                - "ssd"

3) Efficient Scheduling for Custom Hardware 🛠️

If you have custom hardware in your data center, such as specialized network devices or accelerators, you can extend NFD to discover and label these features. This makes Kubernetes more flexible and adaptable to your custom infrastructure.

Example:
Let’s say you have specialized network hardware for high-speed data processing. You can extend NFD to label nodes with this feature and then use node affinity to ensure that data-processing workloads are scheduled on the correct nodes.

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: network-speed
              operator: In
              values:
                - "high-speed"

How to Set Up Node Feature Discovery in Kubernetes 1.32

Here’s a simple guide to getting started with Node Feature Discovery in Kubernetes 1.32.

  1. Install Node Feature Discovery (NFD)
To install NFD, apply the deployment manifest published with an NFD release (the URL below pins v0.9.0; use the release that matches your cluster):
kubectl apply -f https://github.com/kubernetes-sigs/node-feature-discovery/releases/download/v0.9.0/nfd.yaml
  2. Verify NFD is running
Check that the NFD pods, including the nfd-worker DaemonSet pods, are running:
kubectl get pods -n node-feature-discovery
  3. Use node affinity with NFD labels

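Once NFD is running, you can use the labels it generates. For example, to schedule a pod on a node with a GPU (the key nfd.example.com/gpu below is a placeholder; substitute the GPU label NFD actually publishes on your nodes):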

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-app
  template:
    metadata:
      labels:
        app: gpu-app
    spec:
      containers:
      - name: gpu-container
        image: my-gpu-image
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
                - key: nfd.example.com/gpu
                  operator: In
                  values:
                    - "true"
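Before wiring a label key into an affinity rule, it is worth checking which labels NFD actually published, since the nfd.example.com/gpu key above is only a placeholder. The helper below is a hypothetical sketch that reads the output of kubectl get node <node-name> -o json from standard input and prints the labels under NFD's default feature.node.kubernetes.io/ prefix; the script name nfd_labels.py is illustrative, and labels in custom namespaces are not covered.

```python
"""Sketch: list the labels NFD published on a node.

Usage (assumed workflow): kubectl get node <node-name> -o json | python3 nfd_labels.py
Only NFD's default namespace is shown; custom namespaces are not covered.
"""
import json
import sys

NFD_PREFIX = "feature.node.kubernetes.io/"


def nfd_labels(node_json: str) -> dict[str, str]:
    """Return the node's labels that use NFD's default prefix, sorted by key."""
    node = json.loads(node_json)
    labels = node.get("metadata", {}).get("labels", {})
    return {k: labels[k] for k in sorted(labels) if k.startswith(NFD_PREFIX)}


if __name__ == "__main__":
    for key, value in nfd_labels(sys.stdin.read()).items():
        print(f"{key}={value}")
```

Any key this prints can be dropped straight into the matchExpressions of the Deployment above.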

Conclusion

The Node Feature Discovery enhancements in Kubernetes 1.32 make it easier to manage heterogeneous hardware in large-scale clusters. Improved hardware feature detection, better GPU detection, and the ability to add your own custom feature discovery together make Kubernetes more flexible and better optimized for demanding workloads such as AI/ML, high-throughput storage, and custom hardware.

As clusters grow in size and complexity, tools like NFD become essential for matching workloads to the right hardware, keeping scheduling efficient and minimizing wasted resources. With these enhancements, Kubernetes remains a strong candidate for managing heterogeneous, large-scale workloads across diverse environments.