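Run a RayService
A guide to running a RayService on Kueue.
This page shows how to use Kueue's scheduling and resource management capabilities when running a RayService.
Kueue manages a RayService through the RayCluster that is created for it.
The RayService therefore needs the label kueue.x-k8s.io/queue-name: user-queue in metadata.labels. This label is propagated to the corresponding RayCluster, which triggers Kueue's management.
This guide is for serving users who have a basic understanding of Kueue. For more information, see the Kueue overview.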
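Before you begin
- Make sure you are using Kueue v0.6.0 or newer and KubeRay v1.3.0 or newer.
- See Administer cluster quotas for details on the initial Kueue setup.
- See the KubeRay installation instructions for details on installing and configuring KubeRay.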
Note
Kueue manages a RayService through its RayCluster. In versions before v0.8.1, you need to restart Kueue after installation before you can use RayCluster. You can do this by running kubectl delete pods -l control-plane=controller-manager -n kueue-system.
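RayService definition
When running a RayService on Kueue, take the following aspects into consideration:
a. Queue selection
Specify the name of the target local queue in the metadata.labels section of the RayService configuration. This label is propagated to its RayCluster.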
metadata:
  labels:
    kueue.x-k8s.io/queue-name: user-queue
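The label only takes effect if a LocalQueue with that name exists in the RayService's namespace. The following is a minimal sketch of such a LocalQueue, not a configuration taken from this page: it assumes the default namespace used in the example further down and an already configured ClusterQueue, here given the hypothetical name cluster-queue.

```yaml
# Sketch only: "cluster-queue" is an assumed name for an existing ClusterQueue.
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: default   # must match the namespace of the RayService
  name: user-queue     # must match the kueue.x-k8s.io/queue-name label value
spec:
  clusterQueue: cluster-queue
```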
b. Configure the resource needs
The resource needs of the workload can be configured in spec.rayClusterConfig.
spec:
  rayClusterConfig:
    headGroupSpec:
      template:
        spec:
          containers:
            - resources:
                requests:
                  cpu: "1"
    workerGroupSpecs:
      - template:
          spec:
            containers:
              - resources:
                  requests:
                    cpu: "1"
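c. Limitations
- Limited worker groups: A Kueue workload can have at most 8 PodSets, and the head group takes one of them, so spec.rayClusterConfig.workerGroupSpecs can contain at most 7 entries.
- In-tree autoscaling disabled: Kueue manages resource allocation for the RayService, so the cluster's internal autoscaling mechanism needs to be disabled.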
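As an illustration of the autoscaling limitation, the fragment below sketches one way to keep the in-tree autoscaler off. It assumes the enableInTreeAutoscaling field of the KubeRay RayCluster spec, which this page does not mention. Leaving the field unset should have the same effect, so setting it explicitly only documents the intent.

```yaml
# Sketch only: explicitly disables KubeRay's in-tree autoscaler
# so that Kueue alone decides resource allocation.
spec:
  rayClusterConfig:
    enableInTreeAutoscaling: false
```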
Example RayService
The RayService looks like the following:
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: test-rayservice
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: user-queue
spec:
  # serveConfigV2 takes a yaml multi-line scalar, which should be a Ray Serve multi-application config. See https://docs.ray.io/en/latest/serve/multi-app.html.
  serveConfigV2: |
    applications:
      - name: fruit_app
        import_path: fruit.deployment_graph
        route_prefix: /fruit
        runtime_env:
          working_dir: "https://github.com/ray-project/test_dag/archive/78b4a5da38796123d9f9ffff59bab2792a043e95.zip"
        deployments:
          - name: MangoStand
            num_replicas: 2
            max_replicas_per_node: 1
            user_config:
              price: 3
            ray_actor_options:
              num_cpus: 0.1
          - name: OrangeStand
            num_replicas: 1
            user_config:
              price: 2
            ray_actor_options:
              num_cpus: 0.1
          - name: PearStand
            num_replicas: 1
            user_config:
              price: 1
            ray_actor_options:
              num_cpus: 0.1
          - name: FruitMarket
            num_replicas: 1
            ray_actor_options:
              num_cpus: 0.1
      - name: math_app
        import_path: conditional_dag.serve_dag
        route_prefix: /calc
        runtime_env:
          working_dir: "https://github.com/ray-project/test_dag/archive/78b4a5da38796123d9f9ffff59bab2792a043e95.zip"
        deployments:
          - name: Adder
            num_replicas: 1
            user_config:
              increment: 3
            ray_actor_options:
              num_cpus: 0.1
          - name: Multiplier
            num_replicas: 1
            user_config:
              factor: 5
            ray_actor_options:
              num_cpus: 0.1
          - name: Router
            num_replicas: 1
  rayClusterConfig:
    rayVersion: '2.46.0' # should match the Ray version in the image of the containers
    ######################headGroupSpecs#################################
    # Ray head pod template.
    headGroupSpec:
      # The `rayStartParams` are used to configure the `ray start` command.
      # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
      # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
      rayStartParams: {}
      #pod template
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.46.0
              resources:
                limits:
                  cpu: 4
                  memory: 6Gi
                requests:
                  cpu: 2
                  memory: 4Gi
    workerGroupSpecs:
      # the pod replicas in this group typed worker
      - replicas: 1
        minReplicas: 1
        maxReplicas: 5
        # logical group name, for this called small-group, also can be functional
        groupName: small-group
        # The `rayStartParams` are used to configure the `ray start` command.
        # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
        # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
        rayStartParams: {}
        #pod template
        template:
          spec:
            containers:
              - name: ray-worker # must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc')
                image: rayproject/ray:2.46.0
                resources:
                  limits:
                    cpu: "2"
                    memory: "4Gi"
                  requests:
                    cpu: "1"
                    memory: "2Gi"