Clone the repo using git:
cd /Users/<user name>/Documents/GitHub/
git clone https://github.com/PromtEngineer/localGPT.git localgpt_llama2
cd localgpt_llama2
In the Visual Studio Code terminal, do the following:
Create and activate your virtual environment
conda create -n localgpt_llama2 python=3.10.0
conda activate localgpt_llama2
Install dependencies
python -m pip install -r requirements.txt
Enable GPU support, optimized for Apple Metal (M1/M2):
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.83 --no-cache-dir
Note: *BLAS = 1 in the model load output means that the GPU on your Apple M1 is being used.*
Move the documents that you'd like to chat with into the SOURCE_DOCUMENTS folder.
Create a vector representation (embeddings) of your documents
python ingest.py --device_type mps
Note: ingest.py uses HuggingFaceInstructEmbeddings to create embeddings for the files in your folder; once the embeddings are created, a vector store is built on top of them so the embeddings can be searched efficiently.
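For context, the ingestion step boils down to roughly the following LangChain-style sketch. This is not the exact code in ingest.py; the loader, chunking parameters, embedding model, and persist directory below are illustrative assumptions.

```python
# Rough sketch of the ingestion flow, using LangChain building blocks.
# File path, chunk sizes, embedding model, and DB directory are illustrative.
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma

# Load and chunk the source documents
docs = TextLoader("SOURCE_DOCUMENTS/example.txt").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
).split_documents(docs)

# Create instructor embeddings on the Apple GPU (Metal / "mps")
embeddings = HuggingFaceInstructEmbeddings(
    model_name="hkunlp/instructor-large",
    model_kwargs={"device": "mps"},
)

# Build a persistent vector store on top of the embeddings
db = Chroma.from_documents(chunks, embeddings, persist_directory="DB")
db.persist()
```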
Go to HuggingFace and pick the model ID and model basename
Go to the constants.py file and update MODEL_ID and MODEL_BASENAME appropriately
For my purposes, I used the following settings
MODEL_ID = "TheBloke/Llama-2-7b-Chat-GGUF"
MODEL_BASENAME = "llama-2-7b-chat.Q4_K_M.gguf"
# MODEL_ID = "TheBloke/Llama-2-13b-Chat-GGUF"
# MODEL_BASENAME = "llama-2-13b-chat.Q4_K_M.gguf"
# Note: --device_type mps does not seem to work
Note: GPTQ models are for Nvidia GPUs, while GGML/GGUF versions can work on Apple Silicon.
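Before running the full pipeline, it can help to sanity-check that the GGUF model loads with Metal offload. A minimal standalone sketch with llama-cpp-python is shown below; it assumes the repo ID and filename from the constants.py settings above, while the context size, prompt, and generation parameters are illustrative.

```python
# Minimal sanity check: download the GGUF file and load it with Metal offload.
# Repo/filename match the constants.py settings above; other values are illustrative.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7b-Chat-GGUF",
    filename="llama-2-7b-chat.Q4_K_M.gguf",
)

# Setting n_gpu_layers enables Metal; look for "BLAS = 1" in the load log.
llm = Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=1)
result = llm("Q: What does localGPT do? A:", max_tokens=64)
print(result["choices"][0]["text"])
```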
Run the model
python run_localGPT.py --device_type mps  # Run using Apple Silicon
python run_localGPT.py --device_type mps -s -h  # Run using Apple Silicon; show sources; use chat history
streamlit run localGPT_UI.py  # Start the GUI