In this tutorial, we will deploy the large language model (LLM) Mistral 7B for inference using SkyPilot.

SkyPilot is a framework that makes it easy to run training and inference of deep learning models across multiple cloud providers (AWS, Azure, GCP). It can save you a lot of money by automatically finding cheap instances for your jobs.

Mistral 7B is the new cool kid in the town of large language models, showing impressive results across multiple benchmarks.

Set up SkyPilot

Let’s install SkyPilot on your machine. We will use AWS in this example.

pip install "skypilot[aws]"

You also need the AWS CLI if you don’t have it already. You can follow these instructions to download and set up the AWS CLI.
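If you haven’t configured AWS credentials yet, the standard CLI setup looks like this (the access key and secret come from your IAM console):

```shell
# Interactive prompt for access key, secret key, default region, and output format
aws configure

# Sanity check: prints your account ID and user ARN if the credentials work
aws sts get-caller-identity
```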

Let’s check that SkyPilot has access to your AWS resources.

sky check

If `sky check` reports AWS as enabled, SkyPilot has access to your AWS resources.

Launch Mistral 7B with SkyPilot

The Mistral team was cool enough to release a Docker container for Mistral 7B with vLLM built in. vLLM significantly increases the token throughput of LLMs.

SkyPilot executes instructions from a YAML file. Use the following inference.yaml to launch Mistral on an NVIDIA A10G.

envs:
  MODEL_NAME: mistralai/Mistral-7B-v0.1

resources:
  cloud: aws
  accelerators: A10G:1
  ports:
    - 8000

run: |
  docker run --gpus all -p 8000:8000 ghcr.io/mistralai/mistral-src/vllm:latest \
                   --host 0.0.0.0 \
                   --model $MODEL_NAME \
                   --tensor-parallel-size $SKYPILOT_NUM_GPUS_PER_NODE
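With the file saved as inference.yaml, launching looks like this (the cluster name `mistral-7b` is arbitrary, chosen here for illustration):

```shell
# Provision an A10G instance on AWS and run the Docker container from inference.yaml
sky launch -c mistral-7b inference.yaml

# Print the head node's public IP, which you will use to query the server
sky status --ip mistral-7b

# Tear the cluster down when you are done, so you stop paying for the GPU
sky down mistral-7b
```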

SkyPilot will automatically search for a suitable instance for you. If the deployment fails due to quota limits, follow these steps:

  1. Go to the EC2 Quotas console.
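Once the cluster is up, you can send a test request to the server on port 8000. This is a sketch: it assumes the container exposes vLLM's simple `/generate` API, and `<CLUSTER_IP>` is a placeholder for the IP printed by `sky status --ip`.

```python
import json

# Placeholder: replace <CLUSTER_IP> with the IP from `sky status --ip`
ENDPOINT = "http://<CLUSTER_IP>:8000/generate"

# vLLM's simple API server takes a JSON body with the prompt and sampling parameters
payload = {
    "prompt": "My favourite condiment is",
    "max_tokens": 64,      # cap on the number of generated tokens
    "temperature": 0.7,    # sampling temperature
}
body = json.dumps(payload)

# Uncomment to send the request once the cluster is running:
# import requests
# print(requests.post(ENDPOINT, data=body, timeout=60).json())
```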