Clone the repo using git:
cd /Users/<user name>/Documents/GitHub/
git clone https://github.com/PromtEngineer/localGPT.git localgpt_llama2
cd localgpt_llama2
In the Visual Studio Code terminal, do the following:
Create and activate your virtual environment
conda create -n localgpt_llama2 python=3.10.0
conda activate localgpt_llama2
Install dependencies
python -m pip install -r requirements.txt
Enable GPU support, optimized for Apple Metal (M1/M2):
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.83 --no-cache-dir
Note: *BLAS = 1 in the model load output means that the GPU on your Apple M1 is being used.*
Move the documents that you'd like to chat with into the SOURCE_DOCUMENTS folder.
Create a vector representation (embeddings) of your documents
python ingest.py --device_type mps
Note: ingest.py uses HuggingFaceInstructEmbeddings to create embeddings for the files in your folder; once the embeddings are created, a vector store is built on top of them so the embeddings can be searched efficiently.
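For context, the ingestion step boils down to roughly the following LangChain-style sketch. This is not the exact code in ingest.py; the loader, chunking parameters, embedding model, and persist directory below are illustrative assumptions.

```python
# Rough sketch of the ingestion flow, using LangChain building blocks.
# File path, chunk sizes, embedding model, and DB directory are illustrative.
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma

# Load and chunk the source documents
docs = TextLoader("SOURCE_DOCUMENTS/example.txt").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
).split_documents(docs)

# Create instructor embeddings on the Apple GPU (Metal / "mps")
embeddings = HuggingFaceInstructEmbeddings(
    model_name="hkunlp/instructor-large",
    model_kwargs={"device": "mps"},
)

# Build a persistent vector store on top of the embeddings
db = Chroma.from_documents(chunks, embeddings, persist_directory="DB")
db.persist()
```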
Go to HuggingFace and pick the model ID and model basename
Go to the constants.py file and update MODEL_ID and MODEL_BASENAME appropriately
For my purposes, I used the following settings
MODEL_ID = "TheBloke/Llama-2-7b-Chat-GGUF"
MODEL_BASENAME = "llama-2-7b-chat.Q4_K_M.gguf"
# MODEL_ID = "TheBloke/Llama-2-13b-Chat-GGUF"
# MODEL_BASENAME = "llama-2-13b-chat.Q4_K_M.gguf"
# Note: --device_type mps does not seem to work
Note: GPTQ models are for Nvidia GPUs, while GGML/GGUF versions can work on Apple Silicon.
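Before running the full pipeline, it can help to sanity-check that the GGUF model loads with Metal offload. A minimal standalone sketch with llama-cpp-python is shown below; it assumes the repo ID and filename from the constants.py settings above, while the context size, prompt, and generation parameters are illustrative.

```python
# Minimal sanity check: download the GGUF file and load it with Metal offload.
# Repo/filename match the constants.py settings above; other values are illustrative.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7b-Chat-GGUF",
    filename="llama-2-7b-chat.Q4_K_M.gguf",
)

# Setting n_gpu_layers enables Metal; look for "BLAS = 1" in the load log.
llm = Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=1)
result = llm("Q: What does localGPT do? A:", max_tokens=64)
print(result["choices"][0]["text"])
```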
Run the model
python run_localGPT.py --device_type mps  # Run using Apple Silicon
python run_localGPT.py --device_type mps -s -h  # Run using Apple Silicon; show sources; use chat history
streamlit run localGPT_UI.py  # Start the GUI