
Tutorial: Deploying DeepSeek V3 with SGLang on HyperAI

Detailed guide on how to deploy and use the DeepSeek V3 large language model with SGLang on the HyperAI platform

Environment Preparation

Use "Workspace Cluster" to launch a batch workspace containing two nodes:

  1. Each job uses 8 x H800 compute resources.
  2. Use the sglang 0.4.1 image.
  3. Bind the DeepSeek V3 model dataset.

After successful launch, enter the master container.

Start the Service

Prepare the following two scripts and place them in the /openbayes/home directory, which Jupyter opens by default:

run.sh

export NCCL_IB_DISABLE=0
export NCCL_IB_HCA=mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_7:1
export NCCL_IB_GID_INDEX=5
export NCCL_SOCKET_IFNAME="eth0"
export NCCL_DEBUG=info

python -m sglang.launch_server \
    --model-path /input0/DeepSeek-V3 --served-model-name deepseek-v3 --tp 16 \
    --nccl-init $MASTER_IP:5000 --nnodes $NNODES --node-rank $NODE_RANK \
    --trust-remote-code \
    --host 0.0.0.0 --port 8080

master_run.py

import json
import subprocess

# Step 1: Read hostfile.json
with open('/hostfile.json') as f:
    hosts = json.load(f)

MASTER_IP = hosts[0]['ip']
NNODES = len(hosts)

# Step 2: Set environment variables on master node and execute run.sh
subprocess.run([
    'tmux', 'new-session', '-d', '-s', 'node_0', '-n', 'run_tab',
    f'bash -c "export MASTER_IP={MASTER_IP} && export NNODES={NNODES} && export NODE_RANK=0 && bash /openbayes/home/run.sh; exec bash"'
])

print(f"Master IP: {MASTER_IP}, NNODES: {NNODES}, NODE_RANK: 0")

# Step 3: Iterate over worker nodes and configure them
for rank, node in enumerate(hosts[1:], start=1):
    node_ip = node['ip']
    print(f"Configuring node {rank} at {node_ip}")
    
    # Copy run.sh from the master to the same path on the remote node
    subprocess.run([
        'scp', '/openbayes/home/run.sh', f'root@{node_ip}:/openbayes/home/run.sh'
    ])
    
    # Set environment variables on remote node and start the script in a tmux session with a new window
    subprocess.run([
        'ssh', f'root@{node_ip}',
        f'tmux new-session -d -s node_{rank} \; new-window -n run_tab bash -c "export MASTER_IP={MASTER_IP} && export NNODES={NNODES} && export NODE_RANK={rank} && bash /openbayes/home/run.sh; exec bash"'
    ])

print("All nodes have been configured and started in tmux sessions with a dedicated window for run.sh.")

Run python master_run.py to start the service. You can then run tmux attach (or the shorthand tmux a) to view the startup process and running status.

Note: Because the DeepSeek V3 model is very large, the service typically takes 30-40 minutes to start. Please be patient.

When you see the following information, it indicates that the service has started successfully:

[2025-01-07 09:38:20] INFO:     Started server process [9209]
[2025-01-07 09:38:20] INFO:     Waiting for application startup.
[2025-01-07 09:38:20] INFO:     Application startup complete.
[2025-01-07 09:38:20] INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
[2025-01-07 09:38:21] INFO:     127.0.0.1:46856 - "GET /get_model_info HTTP/1.1" 200 OK
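The last log line above shows the server answering a GET /get_model_info request. You can issue the same request yourself to confirm the service is ready. A minimal sketch, assuming you run it inside the master container (run.sh binds the server to 0.0.0.0:8080, so it is reachable on localhost):

```python
import json
import urllib.request

# Assumed address: port 8080 matches the --port flag in run.sh.
BASE_URL = "http://127.0.0.1:8080"

def get_model_info(base_url=BASE_URL):
    """Fetch /get_model_info, the readiness endpoint seen in the startup log."""
    with urllib.request.urlopen(f"{base_url}/get_model_info", timeout=10) as r:
        return json.load(r)

# Example usage (requires the running service):
# print(get_model_info())
```

If the call succeeds with a 200 response, the server has finished loading the model and is ready to accept inference requests.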

Testing Results

After enabling the Jupyter service, you can find the service's API address in the "API Address" section of the right sidebar. The sglang server exposes an OpenAI-compatible API, so you can call it by following OpenAI's API documentation.
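As a sketch of such a call, the snippet below POSTs a chat completion request to the OpenAI-compatible /v1/chat/completions endpoint. The base URL is a placeholder you must replace with the API address from the sidebar; the model name must match the --served-model-name flag in run.sh:

```python
import json
import urllib.request

# Placeholder: replace with the "API Address" shown in the right sidebar.
API_URL = "http://127.0.0.1:8080"

def build_payload(prompt):
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": "deepseek-v3",  # must match --served-model-name in run.sh
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt, api_url=API_URL):
    """Send the request and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{api_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=600) as r:
        return json.load(r)["choices"][0]["message"]["content"]

# Example usage (requires the running service):
# print(chat("Hello, who are you?"))
```

The official OpenAI Python client can be used instead of raw HTTP by pointing its base_url at the same address with the /v1 prefix.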