首页 > 极客资料博客日记

如何在kubernetes环境中共享GPU

2024-10-13 12:30:02极客资料围观25次

这篇文章介绍了如何在kubernetes环境中共享GPU，分享给大家做个参考，收藏极客之家收获更多编程知识

随着人工智能和大模型的快速发展，云上GPU资源共享变得必要，因为它可以降低硬件成本，提升资源利用效率，并满足模型训练和推理对大规模并行计算的需求。

在kubernetes内置的资源调度功能中，GPU调度只能根据“核数”进行调度，但是深度学习等算法程序执行过程中，资源占用比较高的是显存，这样就形成了很多的资源浪费。

目前的GPU资源共享方案有两种。一种是将一个真正的GPU分解为多个虚拟GPU，即vGPU，这样就可以基于vGPU的数量进行调度；另一种是根据GPU的显存进行调度。

本文将讲述如何安装kubernetes组件实现根据GPU显存调度资源。

系统信息

系统：centos stream8
内核：4.18.0-490.el8.x86_64
驱动：NVIDIA-Linux-x86_64-470.182.03
docker：20.10.24
kubernetes版本：1.24.0

1. 驱动安装

请登录nvida官网自行安装：https://www.nvidia.com/Download/index.aspx?lang=en-us

2. docker安装

请自行安装docker或其他容器运行时，如果使用其他容器运行时，第三步配置请参考NVIDA官网 https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#installation-guide

注意：官方支持docker、containerd、podman，但本文档只验证过docker的使用，如果使用其他容器运行时，请注意差异性。

3. NVIDIA Container Toolkit 安装

设置仓库与GPG Key

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo

开始安装

sudo dnf clean expire-cache --refresh
sudo dnf install -y nvidia-container-toolkit

修改docker配置文件添加容器运行时实现

sudo nvidia-ctk runtime configure --runtime=docker

修改/etc/docker/daemon.json,设置nvidia为默认容器运行时（必需）

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

重启docker并开始验证是否生效

sudo systemctl restart docker
sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

如果返回如下数据，说明配置成功

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

4. 安装K8S GPU调度器

首先执行以下yaml，部署调度器

# rbac.yaml
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: gpushare-schd-extender
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - events
    verbs:
      - create
      - patch
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - update
      - patch
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - bindings
      - pods/binding
    verbs:
      - create
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - get
      - list
      - watch
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gpushare-schd-extender
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: gpushare-schd-extender
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gpushare-schd-extender
subjects:
  - kind: ServiceAccount
    name: gpushare-schd-extender
    namespace: kube-system

# deployment yaml
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: gpushare-schd-extender
  namespace: kube-system
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: gpushare
      component: gpushare-schd-extender
  template:
    metadata:
      labels:
        app: gpushare
        component: gpushare-schd-extender
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      hostNetwork: true
      tolerations:
        - effect: NoSchedule
          operator: Exists
          key: node-role.kubernetes.io/master
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
          operator: Exists
        - effect: NoSchedule
          operator: Exists
          key: node.cloudprovider.kubernetes.io/uninitialized
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      serviceAccount: gpushare-schd-extender
      containers:
        - name: gpushare-schd-extender
          image: registry.cn-hangzhou.aliyuncs.com/acs/k8s-gpushare-schd-extender:1.11-d170d8a
          env:
            - name: LOG_LEVEL
              value: debug
            - name: PORT
              value: "12345"

# service.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: gpushare-schd-extender
  namespace: kube-system
  labels:
    app: gpushare
    component: gpushare-schd-extender
spec:
  type: NodePort
  ports:
    - port: 12345
      name: http
      targetPort: 12345
      nodePort: 32766
  selector:
    # select app=ingress-nginx pods
    app: gpushare
    component: gpushare-schd-extender

在/etc/kubernetes目录下添加调度策略配置文件

#scheduler-policy-config.yaml
---
apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
extenders:
    # 不知道为什么不支持svc的方式调用，必须用nodeport
  - urlPrefix: "http://gpushare-schd-extender.kube-system:12345/gpushare-scheduler"
    filterVerb: filter
    bindVerb: bind
    enableHTTPS: false
    nodeCacheCapable: true
    managedResources:
      - name: aliyun.com/gpu-mem
        ignoredByScheduler: false
    ignorable: false

上面的 http://gpushare-schd-extender.kube-system:12345 注意要替换为你本地部署的{nodeIP}:{gpushare-schd-extender的nodeport端口}，否则会访问不到

查询命令如下：

kubectl get service gpushare-schd-extender -n kube-system -o jsonpath='{.spec.ports[?(@.name=="http")].nodePort}'

修改kubernetes调度配置 /etc/kubernetes/manifests/kube-scheduler.yaml

1. 在commond中添加
 - --config=/etc/kubernetes/scheduler-policy-config.yaml

2. 添加pod挂载目录
在volumeMounts:中添加
- mountPath: /etc/kubernetes/scheduler-policy-config.yaml
  name: scheduler-policy-config
  readOnly: true
在volumes:中添加
- hostPath:
      path: /etc/kubernetes/scheduler-policy-config.yaml
      type: FileOrCreate
  name: scheduler-policy-config

注意：这里千万不要改错，否则可能会出现莫名其妙的错误
示例如下：

配置rbac及安装device插件

kubectl create -f https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-rbac.yaml
kubectl create -f https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-ds.yaml

5. 在GPU节点上添加标签

kubectl label node <target_node> gpushare=true

6. 安装kubectl Gpu 插件

cd /usr/bin/
wget https://github.com/AliyunContainerService/gpushare-device-plugin/releases/download/v0.3.0/kubectl-inspect-gpushare
chmod u+x /usr/bin/kubectl-inspect-gpushare

7. 验证

使用kubectl查询GPU资源使用情况

# kubectl inspect gpushare
NAME                                IPADDRESS     GPU0(Allocated/Total)  GPU Memory(GiB)
cn-shanghai.i-uf61h64dz1tmlob9hmtb  192.168.0.71  6/15                   6/15
cn-shanghai.i-uf61h64dz1tmlob9hmtc  192.168.0.70  3/15                   3/15
------------------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
9/30 (30%)

创建一个有GPU需求的资源，查看其资源调度情况

apiVersion: apps/v1
kind: Deployment
metadata:
  name: binpack-1
  labels:
    app: binpack-1
spec:
  replicas: 1
  selector: # define how the deployment finds the pods it manages
    matchLabels:
      app: binpack-1
  template: # define the pods specifications
    metadata:
      labels:
        app: binpack-1
    spec:
      tolerations:
        - effect: NoSchedule
          key: cloudClusterNo
          operator: Exists        
      containers:
        - name: binpack-1
          image: cheyang/gpu-player:v2
          resources:
            limits:
              # 单位GiB
              aliyun.com/gpu-mem: 3

8. 问题排查

如果在安装过程中发现资源未安装成功，可以通过pod查看日志

kubectl get po -n kube-system -o=wide | grep gpushare-device 
kubecl logs -n kube-system <pod_name>

参考地址：
NVIDA官网container-toolkit安装文档: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker
阿里云GPU插件安装：https://github.com/AliyunContainerService/gpushare-scheduler-extender/blob/master/docs/install.md

标签：

上一篇：Oracle问题：alter update modify 的区别是什么？
下一篇：Nuxt.js 应用中的 close 事件钩子详解

首页 > 极客资料博客日记

如何在kubernetes环境中共享GPU

系统信息

1. 驱动安装

2. docker安装

3. NVIDIA Container Toolkit 安装

4. 安装K8S GPU调度器

5. 在GPU节点上添加标签

6. 安装kubectl Gpu 插件

7. 验证

8. 问题排查

相关文章

最新发布

点击排行

本站推荐

标签云

首页 > 极客资料 博客日记

如何在kubernetes环境中共享GPU

系统信息

1. 驱动安装

2. docker安装

3. NVIDIA Container Toolkit 安装

4. 安装K8S GPU调度器

5. 在GPU节点上添加标签

6. 安装kubectl Gpu 插件

7. 验证

8. 问题排查

相关文章

最新发布

点击排行

本站推荐

标签云

首页 > 极客资料博客日记