Allocate failed due to rpc error: code = Unknown desc = no free node, which is unexpected #191

@mingkai-yang

Description

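I am running a mix of [shared single-GPU], [exclusive single-GPU], and [exclusive multi-GPU] tasks on the same batch of machines in a cluster. On some machines the vcuda-core resource is sufficient and the pod is scheduled successfully, but creation then fails, and the gpu-manager log keeps reporting "no free node". My guess is that mixing different modes on the same machine causes this. Do tasks in different modes need to be isolated from each other?

Pod status: UnexpectedAdmissionError
Event on the pod that failed to create: Allocate failed due to rpc error: code = Unknown desc = no free node, which is unexpected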
[Screenshot: nofreenode]
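
For context, here is a minimal sketch of how the three modes above are distinguished by the requested vcuda-core value. It assumes the usual gpu-manager convention that 100 vcuda-core units correspond to one physical GPU and that requests above 100 must be whole multiples of 100. The function name `classify_request` and the mode labels are hypothetical, used only for illustration; this is not gpu-manager's own allocation code.

```python
# Assumption: 100 vcuda-core units correspond to one physical GPU.
UNITS_PER_GPU = 100


def classify_request(vcuda_core: int) -> str:
    """Map a requested vcuda-core value to one of the three task modes.

    Hypothetical helper for illustration only.
    """
    if vcuda_core <= 0:
        raise ValueError("vcuda-core request must be positive")
    if vcuda_core < UNITS_PER_GPU:
        # A fraction of one card: the card can be shared with other pods.
        return "shared single-GPU"
    if vcuda_core == UNITS_PER_GPU:
        # Exactly one whole card.
        return "exclusive single-GPU"
    if vcuda_core % UNITS_PER_GPU == 0:
        # Several whole cards.
        return "exclusive multi-GPU"
    # Assumption: values above 100 that are not multiples of 100 are invalid.
    raise ValueError("requests above 100 must be a multiple of 100")


if __name__ == "__main__":
    for request in (50, 100, 200):
        print(request, classify_request(request))
```

On the affected machines, pods from all three categories end up on the same node.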
