Before diving in: if you run into any difficulties or have any questions, join our Slack community and we’ll be there to help! Please post them in the ‘marqo-courses’ channel 🚀 If you’re new to Slack, don’t worry: join the channel and Ellie will send you a ‘getting started with Slack’ guide! 😊

If you want to build your own embedding search applications, try out Marqo for free!

Introduction to Vector Databases

Vector databases have grown in popularity over the last few years, with a surge in interest coinciding with the release of large language model (LLM) applications like ChatGPT. They captured the wider developer community’s attention as developers began to realise the impact that vector databases can have on such models.

This article will guide you through the fundamentals of vector databases and vector search. Specifically, we will look at concepts such as vector embeddings, vector indexes, search, similarity measures and nearest neighbor methods. In the next article we will look at how we can implement our own vector search system using Marqo!

What is a Vector Database?

The clue is in the name: a vector database is a database that stores information as vectors. These vectors, also known as vector embeddings, are numerical representations of data objects.

The biggest advantage of a vector database is that it allows for fast, accurate similarity search and retrieval of data. This is different from traditional methods that query databases based on exact matches: a vector database can find the most similar or relevant data based on its contextual meaning (also known as semantic meaning). To do this, vector databases index and store vector embeddings. Let’s take a look at what these are.
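
To make the contrast between exact matching and similarity search concrete, here is a minimal sketch. The three-dimensional vectors are hand-picked purely for illustration (real embeddings come from a model and typically have hundreds of dimensions), and the function names are our own rather than part of any library. It also scans every stored vector, which a real vector database would avoid by using an index.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-picked for illustration only.
documents = {
    "kitten playing with yarn": [0.9, 0.1, 0.0],
    "puppy chasing a ball": [0.7, 0.3, 0.1],
    "quarterly sales report": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_match(query, docs):
    """Traditional lookup: keep only documents containing the literal query."""
    return [text for text in docs if query in text]


def similarity_search(query_vector, docs, top_k=2):
    """Brute-force search: score every stored vector and keep the best top_k."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in docs.items()]
    scored.sort(reverse=True)
    return [text for _, text in scored[:top_k]]


# The word "cat" appears in none of the documents, so exact matching finds nothing.
keyword_hits = exact_match("cat", documents)

# Assumed embedding for the query "cat" (again, hand-picked for this sketch).
cat_vector = [0.95, 0.05, 0.0]
similar_hits = similarity_search(cat_vector, documents)

print(keyword_hits)
print(similar_hits)
```

Even though no document contains the word "cat", the similarity search still ranks the two animal-related documents first, because their vectors point in a similar direction to the query vector.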

What are Vector Embeddings?

Vector embeddings (as discussed in detail in Module 1 of our Embedding Model Course) are numerical representations of data such as images, text and audio. When these vectors are created, they capture the semantic meaning of the data. This, in turn, allows for better search and retrieval results. Representing data in this way is crucial because it lets computer systems compare and process the data more easily.

The figure below illustrates the idea behind vector embeddings. We can take some data, e.g. text or audio, and pass it through the respective encoder model, which produces vector embeddings. These represent the input data in a numerical format. For more information on these embedding models, read our previous article.
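
As a rough sketch of the "data in, vector out" idea, the snippet below uses a toy stand-in for an encoder. `toy_text_encoder` is a hypothetical function of our own that hashes each word into one of a few slots (the hashing trick). Unlike a trained embedding model, it does not capture meaning at all; it only shows the interface: any text goes in, and a fixed-length list of numbers comes out.

```python
import hashlib
import math


def toy_text_encoder(text, dim=8):
    """Toy stand-in for an encoder: map text to a fixed-length unit vector.

    Assumption for illustration only: each word is hashed into one of `dim`
    slots. A real embedding model learns its vectors from data instead.
    """
    vector = [0.0] * dim
    for token in text.lower().split():
        # hashlib gives the same result on every run, unlike Python's built-in hash().
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vector[int(digest, 16) % dim] += 1.0
    # Scale to unit length so that vectors for texts of different lengths are comparable.
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


embedding = toy_text_encoder("vector databases store embeddings")
other = toy_text_encoder("a much longer sentence about something else entirely")

print(len(embedding), len(other))
print(embedding)
```

Both inputs produce a vector of the same length regardless of how long the text is, which is what lets a vector database store and compare them uniformly.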

Figure 3a: Illustration of vector embedding generation for both text and audio.

Vector databases, as mentioned, store these vector embeddings. They take advantage of a mathematical property of embeddings: similar items lie close together in the vector space. This is where vector indexing becomes important.

Because these embeddings capture the contextual meaning behind words (and other forms of data), we can run queries and get search results in a more human-like way. This makes vector search engines a strong choice, especially for applications where queries may contain spelling mistakes that would break exact matching. In the next article, we’ll look at building our own vector search engine using Marqo.
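
The tolerance to spelling mistakes mentioned above can be illustrated with a small sketch. Real embedding models learn this behaviour from data; as a simplified stand-in, we assume here that each word is represented by counts of its character trigrams. This is an assumption of the example only, not how Marqo or any particular model works, and the catalog and function names are hypothetical.

```python
import math
from collections import Counter


def trigram_vector(text):
    """Represent text as a sparse vector of character-trigram counts."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)


catalog = ["restaurant", "laboratory", "umbrella"]
query = "restaraunt"  # deliberately misspelled

# Exact matching fails because no catalog entry equals the misspelled query.
exact_hits = [item for item in catalog if item == query]

# Similarity search still finds the closest entry.
query_vector = trigram_vector(query)
best_match = max(catalog, key=lambda item: cosine_similarity(query_vector, trigram_vector(item)))

print(exact_hits)
print(best_match)
```

The misspelled query shares most of its trigrams with "restaurant" and none with the other entries, so the nearest vector is still the intended word even though an exact lookup returns nothing.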

What is a Vector Index?