In this tutorial, we'll explore how to use the Moondream 2 vision model with llama-cpp-python to generate text descriptions from images. We'll cover the installation of required libraries, setting up the MoondreamVision class, and using it to process images and generate text.
Step 1: Install Required Libraries
To get started, we need to install the required libraries. Run the following command in your terminal or Python environment:
Update 2: DON’T FORGET accelerate!!!!! It takes the model from somewhere on the order of ~1m install / ~45s inference to ~1.5m install / ~3s inference on a Colab T4. I had under 1s inference one time when I left the llama-cpp-python instance (wrapped in the MoondreamVision class below) lying around and ran from that, so I suspect ~1.5s is a safe estimate for the runtime during sustained, efficient operation. No promises, and I’m sure there are myriad ways to make it a lot faster, so don’t trust my clumsy hands too far.
This doesn’t sound like a big deal, but the exciting thing about this model is the ability to obtain verbose, detailed descriptions and in-depth visual question answering, all within ~1s, all with Colab-ready technology. Think of what you could do if any picture or screenshot were an instant repository of information, for free, forever. And it could run on your Android phone ( that one’s coming soon ;)
!pip install -U llama-cpp-python huggingface-hub accelerate
This installs llama-cpp-python, huggingface-hub, and accelerate, the libraries required for this tutorial.
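If you want a quick sanity check that the install took, a minimal sketch like this will do (the version string is just whatever pip resolved):

# Verify that llama-cpp-python imports cleanly and print its version.
import llama_cpp

print(llama_cpp.__version__)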
Step 2: Import Libraries and Define the Class Structure
Next, we'll import the required libraries and define the skeleton of the MoondreamVision class:
import os
import requests

from llama_cpp import Llama
from llama_cpp.llama_chat_format import MoondreamChatHandler

class MoondreamVision:
    def __init__(self, **kwargs):
        pass

    def __call__(self, **kwargs):
        pass
Step 3: Class Definition
Step 3.1: __init__ ( the Python class constructor )
So, several things are happening here, but nothing that should be too crazy.
def __init__(self, path: str, text_model: str, mmproj: str):
    self.path = path
    self.text_model = text_model
    self.mmproj = mmproj

    # Fetch the multimodal projector (the vision side of Moondream) and
    # wrap it in a chat handler that knows how to embed images.
    self.chat_handler = MoondreamChatHandler.from_pretrained(
        repo_id=self.path,
        filename=self.mmproj,
    )

    # Initialize our LLM instance with our chat handler.
    self.llm = Llama.from_pretrained(
        repo_id=self.path,
        filename=self.text_model,
        chat_handler=self.chat_handler,  # Trust me, you don't want to write one for a mere example.
        n_ctx=32768,  # n_ctx should be maxed out to accommodate the image embedding.
    )
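With that, we can spin one up. Here's a minimal sketch, assuming the vikhyatk/moondream2 GGUF repo and the glob-style filenames from the llama-cpp-python multimodal docs ( from_pretrained matches filenames against the glob; substitute exact filenames if you prefer ):

# First run downloads the matching GGUF files from the Hugging Face Hub
# and caches them; later runs load straight from the cache.
moondream = MoondreamVision(
    path="vikhyatk/moondream2",
    text_model="*text-model*",
    mmproj="*mmproj*",
)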
Step 3.2: __call__ ( the Python instance-call operator )
For this one, the code description would just be the tutorial description verbatim, so it’s a pure code explanation! Wheee!!!
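Since the body of __call__ is what does the talking, here's a minimal sketch of it; the signature and default prompt are my own choices, but the messages payload follows the OpenAI-style multimodal chat format that llama-cpp-python's chat handlers accept:

def __call__(self, image_url: str, prompt: str = "Describe this image in detail."):
    # Hand the prompt and image to the model through the OpenAI-style
    # chat completion API; the chat handler embeds the image for us.
    response = self.llm.create_chat_completion(
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ]
    )
    # Return just the generated text.
    return response["choices"][0]["message"]["content"]

And a quick usage sketch tying back to Update 2 above: construction pays the load cost once, and every call after that is pure inference ( the image URL is a placeholder ):

import time

start = time.perf_counter()
print(moondream(image_url="https://example.com/some-image.png"))
print(f"inference took {time.perf_counter() - start:.2f}s")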