I. Deploy GPU Sharing Plugin in Kubernetes

Before deployment, ensure that the NVIDIA driver and nvidia-docker are installed on the Kubernetes GPU nodes, and that Docker's default runtime has been set to nvidia.

# cat /etc/docker/daemon.json
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
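
If the default runtime was just changed, restart Docker and confirm that it now reports nvidia as the default (a quick sanity check; the exact output wording may vary by Docker version):

$ systemctl restart docker
$ docker info | grep -i "default runtime"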

1. Install gpushare-device-plugin via Helm

$ git clone https://github.com/AliyunContainerService/gpushare-scheduler-extender.git
$ cd gpushare-scheduler-extender/deployer/chart
$ helm install --name gpushare --namespace kube-system --set masterCount=3 gpushare-installer
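
The command above uses Helm v2 syntax (--name); with Helm v3 the release name is positional, e.g. helm install gpushare gpushare-installer --namespace kube-system --set masterCount=3. After installation, confirm that the gpushare components (scheduler extender and device plugin) are running in kube-system; pod names depend on the chart release:

$ kubectl get pods -n kube-system | grep gpushare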

2. Label GPU Nodes

$ kubectl label node sd-cluster-04 gpushare=true
$ kubectl label node sd-cluster-05 gpushare=true
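
The device plugin is only scheduled onto nodes carrying this label, so verify that both nodes show up:

$ kubectl get nodes -l gpushare=true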

3. Install kubectl-inspect-gpushare

Ensure kubectl is already installed (installation steps are omitted here).

$ cd /usr/bin/
$ wget https://github.com/AliyunContainerService/gpushare-device-plugin/releases/download/v0.3.0/kubectl-inspect-gpushare
$ chmod u+x /usr/bin/kubectl-inspect-gpushare
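
kubectl discovers plugins by looking for executables named kubectl-<plugin> on the PATH, so placing the binary in /usr/bin is sufficient. Confirm that it is picked up:

$ kubectl plugin list | grep inspect-gpushare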

Check current GPU resource usage in the cluster:

$ kubectl inspect gpushare
NAME           IPADDRESS      GPU0(Allocated/Total)  GPU Memory(GiB)
sd-cluster-04  192.168.1.214  0/14                   0/14
sd-cluster-05  192.168.1.215  8/14                   8/14
----------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
8/28 (28%)

To disable GPU sharing on a node, simply set gpushare=false:

$ kubectl label node sd-cluster-04 gpushare=false
$ kubectl label node sd-cluster-05 gpushare=false

II. Validation Test

1. Deploy First Application

Request 2GiB GPU memory; the application should be scheduled onto one GPU card.

apiVersion: apps/v1
kind: Deployment

metadata:
  name: binpack-1
  labels:
    app: binpack-1

spec:
  replicas: 1

  selector:
    matchLabels:
      app: binpack-1

  template:
    metadata:
      labels:
        app: binpack-1

    spec:
      containers:
      - name: binpack-1
        image: cheyang/gpu-player:v2
        resources:
          limits:
            # GiB
            aliyun.com/gpu-mem: 2
$ kubectl apply -f 1.yaml -n test
$ kubectl inspect gpushare
NAME           IPADDRESS      GPU0(Allocated/Total)  GPU Memory(GiB)
sd-cluster-04  192.168.1.214  0/14                   0/14
sd-cluster-05  192.168.1.215  2/14                   2/14
----------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
2/28 (7%)
$ kubectl get pod -n test
NAME                               READY   STATUS    RESTARTS   AGE
binpack-1-6d6955c487-j4c4b         1/1     Running   0          28m
$ kubectl logs -f binpack-1-6d6955c487-j4c4b -n test
ALIYUN_COM_GPU_MEM_DEV=14
ALIYUN_COM_GPU_MEM_CONTAINER=2
2021-08-13 02:47:22.395557: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2021-08-13 02:47:22.552831: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:af:00.0
totalMemory: 14.75GiB freeMemory: 14.65GiB
2021-08-13 02:47:22.552873: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla T4, pci bus id: 0000:af:00.0, compute capability: 7.5)
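
The two ALIYUN_COM_GPU_MEM_* variables are injected by the device plugin: ALIYUN_COM_GPU_MEM_DEV is the total memory of the assigned card and ALIYUN_COM_GPU_MEM_CONTAINER is the share granted to this container; the application itself is expected to keep its usage within that share. They can also be read directly from the running pod (using the pod name from the output above):

$ kubectl exec binpack-1-6d6955c487-j4c4b -n test -- env | grep ALIYUN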

2. Deploy Second Application

Request 8GiB GPU memory per pod with 2 replicas, for 16GiB in total.

apiVersion: apps/v1
kind: Deployment

metadata:
  name: binpack-2
  labels:
    app: binpack-2

spec:
  replicas: 2

  selector:
    matchLabels:
      app: binpack-2

  template:
    metadata:
      labels:
        app: binpack-2

    spec:
      containers:
      - name: binpack-2
        image: cheyang/gpu-player:v2
        resources:
          limits:
            aliyun.com/gpu-mem: 8
$ kubectl apply -f 2.yaml -n test
$ kubectl inspect gpushare
NAME           IPADDRESS      GPU0(Allocated/Total)  GPU Memory(GiB)
sd-cluster-04  192.168.1.214  8/14                   8/14
sd-cluster-05  192.168.1.215  10/14                  10/14
----------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
18/28 (64%)
$ kubectl get pod -n test
NAME                               READY   STATUS    RESTARTS   AGE
binpack-1-6d6955c487-j4c4b         1/1     Running   0          28m
binpack-2-58579b95f7-4wpbl         1/1     Running   0          27m
$ kubectl logs -f binpack-2-58579b95f7-4wpbl -n test
ALIYUN_COM_GPU_MEM_DEV=14
ALIYUN_COM_GPU_MEM_CONTAINER=8
2021-08-13 02:48:41.246585: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2021-08-13 02:48:41.338992: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:af:00.0
totalMemory: 14.75GiB freeMemory: 13.07GiB
2021-08-13 02:48:41.339031: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla T4, pci bus id: 0000:af:00.0, compute capability: 7.5)

As the resource usage shows, the two replicas of the second application each received 8GiB on separate GPU cards, one per node, for 16GiB in total.
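
To confirm that the two binpack-2 replicas landed on different nodes, check pod placement:

$ kubectl get pod -n test -o wide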

3. Deploy Third Application

Request 2GiB GPU memory.

apiVersion: apps/v1
kind: Deployment

metadata:
  name: binpack-3
  labels:
    app: binpack-3

spec:
  replicas: 1

  selector:
    matchLabels:
      app: binpack-3

  template:
    metadata:
      labels:
        app: binpack-3

    spec:
      containers:
      - name: binpack-3
        image: cheyang/gpu-player:v2
        resources:
          limits:
            aliyun.com/gpu-mem: 2
$ kubectl apply -f 3.yaml -n test
$ kubectl inspect gpushare
NAME           IPADDRESS      GPU0(Allocated/Total)  GPU Memory(GiB)
sd-cluster-04  192.168.1.214  8/14                   8/14
sd-cluster-05  192.168.1.215  12/14                  12/14
----------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
20/28 (71%)
$ kubectl get pod -n test
NAME                               READY   STATUS    RESTARTS   AGE
binpack-1-6d6955c487-j4c4b         1/1     Running   0          28m
binpack-2-58579b95f7-4wpbl         1/1     Running   0          27m
binpack-2-58579b95f7-sjhwt         1/1     Running   0          27m
binpack-3-556bbd84f9-9xqg7         1/1     Running   0          14m
$ kubectl logs -f binpack-3-556bbd84f9-9xqg7 -n test
ALIYUN_COM_GPU_MEM_DEV=14
ALIYUN_COM_GPU_MEM_CONTAINER=2
2021-08-13 03:01:53.897423: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2021-08-13 03:01:54.008665: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:af:00.0
totalMemory: 14.75GiB freeMemory: 7.08GiB
2021-08-13 03:01:54.008716: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla T4, pci bus id: 0000:af:00.0, compute capability: 7.5)

After deploying the third application, 8GiB of GPU memory remains available cluster-wide (6GiB on sd-cluster-04 and 2GiB on sd-cluster-05). However, a single task can be allocated at most 6GiB, because one task cannot span multiple GPU cards.
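
A quick way to confirm this constraint is to request more memory than any single card has free; the following hypothetical pod asks for 8GiB and should stay Pending even though 8GiB is free cluster-wide (delete it afterwards so it does not clutter the next step):

$ cat <<EOF | kubectl apply -n test -f -
apiVersion: v1
kind: Pod
metadata:
  name: binpack-pending
spec:
  containers:
  - name: binpack-pending
    image: cheyang/gpu-player:v2
    resources:
      limits:
        aliyun.com/gpu-mem: 8
EOF
$ kubectl get pod binpack-pending -n test
$ kubectl delete pod binpack-pending -n test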

4. Deploy Fourth Application

Request 5GiB GPU memory; since only sd-cluster-04 has at least 5GiB free (6GiB, versus 2GiB on sd-cluster-05), the pod should be scheduled there.

apiVersion: apps/v1
kind: Deployment

metadata:
  name: binpack-4
  labels:
    app: binpack-4

spec:
  replicas: 1

  selector:
    matchLabels:
      app: binpack-4

  template:
    metadata:
      labels:
        app: binpack-4

    spec:
      containers:
      - name: binpack-4
        image: cheyang/gpu-player:v2
        resources:
          limits:
            aliyun.com/gpu-mem: 5
$ kubectl apply -f 4.yaml -n test
$ kubectl inspect gpushare
NAME           IPADDRESS      GPU0(Allocated/Total)  GPU Memory(GiB)
sd-cluster-04  192.168.1.214  13/14                  13/14
sd-cluster-05  192.168.1.215  12/14                  12/14
------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
25/28 (89%)
$ kubectl get pod -n test
NAME                               READY   STATUS    RESTARTS   AGE
binpack-1-6d6955c487-j4c4b         1/1     Running   0          26m
binpack-2-58579b95f7-4wpbl         1/1     Running   0          24m
binpack-2-58579b95f7-sjhwt         1/1     Running   0          24m
binpack-3-556bbd84f9-9xqg7         1/1     Running   0          11m
binpack-4-6956458f85-cv62j         1/1     Running   0          6s
$ kubectl logs -f binpack-4-6956458f85-cv62j -n test
ALIYUN_COM_GPU_MEM_DEV=14
ALIYUN_COM_GPU_MEM_CONTAINER=5
2021-08-13 03:13:20.208122: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2021-08-13 03:13:20.361391: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:af:00.0
totalMemory: 14.75GiB freeMemory: 6.46GiB
2021-08-13 03:13:20.361481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla T4, pci bus id: 0000:af:00.0, compute capability: 7.5)
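
After validation, the test deployments can be removed (assuming the manifests 1.yaml through 4.yaml used above) and the cluster re-checked:

$ kubectl delete -f 1.yaml -f 2.yaml -f 3.yaml -f 4.yaml -n test
$ kubectl inspect gpushare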

III. Summary

The gpushare-device-plugin has the following limitations:

  • A single task cannot share GPU memory across multiple GPU cards or machines.
  • Resources within a single GPU can only be divided by memory size; allocation by utilization percentage is not supported.

However, it is sufficient for the algorithm team's model-testing scenarios. There are two alternative GPU sharing solutions, which are not covered here; refer to their official repositories if needed:

https://github.com/tkestack/gpu-manager
https://github.com/vmware/bitfusion-with-kubernetes-integration

References:
https://github.com/AliyunContainerService/gpushare-scheduler-extender/tree/master/deployer