Environment Setup

  1. Clone the repo using git:

    cd /Users/<user name>/Documents/GitHub/
    
    git clone https://github.com/PromtEngineer/localGPT.git localgpt_llama2
    cd localgpt_llama2
    
  2. In the Visual Studio Code terminal, do the following

    1. Create and activate your virtual environment

      conda create -n localgpt_llama2 python=3.10.0
      conda activate localgpt_llama2
      
    2. Install dependencies

      python -m pip install -r requirements.txt
      
    3. Enable GPU support, optimized for Apple Metal (M1/M2)

      CMAKE_ARGS="-DLLAMA_METAL=on"  FORCE_CMAKE=1 pip install llama-cpp-python==0.1.83 --no-cache-dir
      


      Note:

      • BLAS = 1 in the model-load output means the GPU on your Apple Silicon Mac is being used.
      • The llama-cpp-python package lets you run GGML or GGUF models on the Apple Silicon GPU.
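
      To confirm that Metal support was compiled in, you can print llama.cpp's build-time system info. This is a quick sanity check, assuming the low-level llama_print_system_info binding is exposed by your installed llama-cpp-python version:

      python -c "import llama_cpp; print(llama_cpp.llama_print_system_info().decode())"

      Look for BLAS = 1 in the output; BLAS = 0 suggests the Metal build flags did not take effect.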

Ingesting and processing your documents

  1. Move the documents that you’d like to chat with into the SOURCE_DOCUMENTS folder.


  2. Create a vector representation (embeddings) of your documents

    python ingest.py --device_type mps
    

    Note: This script uses HuggingFaceInstructEmbeddings to create embeddings for the files in your folder; once the embeddings exist, a vector store is built on top of them so they can be queried efficiently (see the sketch below).
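
    For intuition, here is a minimal sketch of that pipeline, not the actual ingest.py: the PDF loader, chunk sizes, Instructor model name, and the "DB" persist directory are illustrative assumptions.

    from langchain.document_loaders import PyPDFLoader
    from langchain.embeddings import HuggingFaceInstructEmbeddings
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.vectorstores import Chroma

    # Load one source document and split it into overlapping chunks
    docs = PyPDFLoader("SOURCE_DOCUMENTS/example.pdf").load()
    chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(docs)

    # Create the embeddings on the Apple Silicon GPU via the "mps" device
    embeddings = HuggingFaceInstructEmbeddings(
        model_name="hkunlp/instructor-large",
        model_kwargs={"device": "mps"},
    )

    # Build and persist a Chroma vector store on top of the embeddings
    db = Chroma.from_documents(chunks, embeddings, persist_directory="DB")
    db.persist()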


Running the model using Llama-2-7b-Chat-GGUF

  1. Go to Hugging Face and pick the model ID and model basename.

  2. Open the constants.py file and update MODEL_ID and MODEL_BASENAME accordingly.

    For my purposes, I used the following settings

    MODEL_ID = "TheBloke/Llama-2-7b-Chat-GGUF"
    MODEL_BASENAME = "llama-2-7b-chat.Q4_K_M.gguf"
    
    # MODEL_ID = "TheBloke/Llama-2-13b-Chat-GGUF"
    # MODEL_BASENAME = "llama-2-13b-chat.Q4_K_M.gguf"
    # Note: --device_type mps does not seem to work
    

    Note: GPTQ model versions are for Nvidia GPUs, while GGML/GGUF versions work on Apple Silicon. A sketch of how these constants are used to load the model follows this list.

  3. Run the model

    python run_localGPT.py --device_type mps          #Run using Apple Silicon
    python run_localGPT.py --device_type mps -s -h    #Run using Apple Silicon; Show Source; Use History
    streamlit run localGPT_UI.py                      #Start the GUI
    
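  As referenced in step 2, here is a minimal sketch of how MODEL_ID and MODEL_BASENAME are typically turned into a running model: the GGUF file is fetched from the Hugging Face Hub and handed to a llama.cpp wrapper. The context size, batch size, and GPU-layer settings below are illustrative assumptions, not localGPT's actual values.

    from huggingface_hub import hf_hub_download
    from langchain.llms import LlamaCpp

    MODEL_ID = "TheBloke/Llama-2-7b-Chat-GGUF"
    MODEL_BASENAME = "llama-2-7b-chat.Q4_K_M.gguf"

    # Download the quantized GGUF file from the Hugging Face Hub (cached locally)
    model_path = hf_hub_download(repo_id=MODEL_ID, filename=MODEL_BASENAME)

    # n_gpu_layers=1 enables Metal offload on Apple Silicon;
    # verbose=True prints the system info line where BLAS = 1 should appear
    llm = LlamaCpp(
        model_path=model_path,
        n_ctx=4096,
        n_batch=512,
        n_gpu_layers=1,
        max_tokens=256,
        verbose=True,
    )

    print(llm("Q: What does localGPT do? A:"))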

Outcome using Llama-2-7b-Chat-GGUF

Demo video: AugustMille Teks Chat.mov