[perf]Support MOE Multi-stream in Deepseek #947

David9857 · 2025-05-24T10:19:41Z

What this PR does / why we need it?

Support multi-stream execution inside the MoE layer for DeepSeek.

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: David9857 <985700846@qq.com>

wangxiyuan · 2025-05-29T13:14:15Z

vllm_ascend/quantization/w8a8_dynamic.py

    global_bs = 0
    moe_expert_num = len(expert_map)
    # hidden_states = hidden_states.bfloat16()
-    kwargs = {
+    kwargs1 = {


Rename `kwargs1` to a more readable name.

wangxiyuan · 2025-05-29T13:14:45Z

vllm_ascend/envs.py

@@ -36,6 +36,8 @@
    lambda: bool(int(os.getenv("COMPILE_CUSTOM_KERNELS", "1"))),
    "VLLM_ENABLE_MC2":
    lambda: bool(int(os.getenv("VLLM_ENABLE_MC2", '0'))),
+    "VLLM_ENABLE_CV_PARALLEL":


Use `additional_config` instead of an environment variable, as #839 does, since this change is only used in torchair GE mode. Three more new config options are coming as well.

How about:

    {
        "additional_config": {
            "torchair_graph_config": {
                "enable": True,
                "enable_cv_parallel": True,
                "batch_sizes": "12345",
                "batch_sizes_init": True
            }
        }
    }

cc @zzzzwwjj
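
For illustration, here is a minimal sketch of how a nested option like the one proposed above might be read. It is not the code from this PR. The helper name `cv_parallel_enabled` is hypothetical, the key name `enable_cv_parallel` is an assumption mirroring the `VLLM_ENABLE_CV_PARALLEL` variable, and gating on `enable` is an assumption based on the remark that the feature only applies in torchair GE mode. The helper takes the inner dict, that is, the value stored under `additional_config`.

```python
from typing import Any, Mapping, Optional


def cv_parallel_enabled(additional_config: Optional[Mapping[str, Any]]) -> bool:
    """Hypothetical helper: read the CV-parallel flag from additional_config."""
    if not additional_config:
        return False
    # Assumed layout: options live under "torchair_graph_config".
    graph_config = additional_config.get("torchair_graph_config") or {}
    # Assumption: the flag only matters when torchair graph mode is enabled.
    if not graph_config.get("enable", False):
        return False
    return bool(graph_config.get("enable_cv_parallel", False))


example_config = {
    "torchair_graph_config": {
        "enable": True,
        "enable_cv_parallel": True,
    }
}
print(cv_parallel_enabled(example_config))
```

Compared with an environment variable, a nested dict keeps the flag next to the other graph-mode options and lets a missing key fall back to a default of disabled.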

wangxiyuan · 2025-05-29T13:28:08Z

And don't forget to add an e2e test. The model weights are here: https://www.modelscope.cn/models/vllm-ascend/DeepSeek-V2-Lite-W8A8

Take https://github.com/vllm-project/vllm-ascend/blob/main/tests/multicard/test_offline_inference_distributed.py#L49 as an example.

github-actions bot added module:ops module:quantization labels May 24, 2025

David9857 changed the title ~~[perf][WIP] Support MOE Multi-stream in Deepseek~~ [perf]Support MOE Multi-stream in Deepseek May 26, 2025

David9857 force-pushed the cv branch from 25e3d2c to 78a00c3 Compare May 28, 2025 07:42

github-actions bot added module:core ci/build module:tests module:tools labels May 28, 2025

David9857 added 2 commits May 29, 2025 11:38

support moe multistream in deepseek

f6e30d2

Signed-off-by: David9857 <985700846@qq.com>

Merge remote-tracking branch 'upstream/main' into cv

9ddb591

David9857 force-pushed the cv branch from 88b4098 to 9ddb591 Compare May 29, 2025 03:42

github-actions bot removed ci/build module:tests module:tools labels May 29, 2025

fix codecheck

d118d63

Signed-off-by: David9857 <985700846@qq.com>

wangxiyuan reviewed May 29, 2025

wangxiyuan mentioned this pull request May 29, 2025

feat: support compile torchair graph while warming up #839

Open

David9857 added 3 commits May 29, 2025 22:10

use additional_config to enable cv parallel

b01cc94

Signed-off-by: David9857 <985700846@qq.com>

rename kwargs1 in fused_experts_with_mc2

a7195df

Signed-off-by: David9857 <985700846@qq.com>

add ut for cv parallel

8f2e33e

Signed-off-by: David9857 <985700846@qq.com>

github-actions bot added module:tests and removed module:core labels May 29, 2025

support cv parallel for float model

34df77e

Signed-off-by: David9857 <985700846@qq.com>
