`pooling` parameter for `huggingFace` embedders

The `embedders` setting of the `PATCH /indexes/{:indexUid}/settings` and `PATCH /indexes/{:indexUid}/settings/embedders` routes is modified as follows:
- `pooling` is added to the embedder object and allows overriding the pooling method of `huggingFace` embedders.
- When a `huggingFace` embedder transforms a text into an embedding, it starts by transforming the text into tokens, then it computes an embedding for each of these tokens. Lastly, it computes a single sentence embedding from the token embeddings by using a pooling method.
- `pooling` is a string with values `"useModel"`, `"forceMean"` or `"forceCls"`:
  - `"useModel"`: fetch the pooling method from the model configuration
  - `"forceMean"`: always use mean pooling
  - `"forceCls"`: always use CLS pooling
- `pooling` is optional and defaults to `"useModel"`.
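
To make the difference between the pooling values concrete, here is a minimal Python sketch of how a sentence embedding could be derived from token embeddings. It is an illustration only, not Meilisearch's implementation: the function names and the `model_default` parameter (standing in for the pooling method declared in the model configuration) are hypothetical, the CLS token is assumed to be the first token, and attention masks are ignored.

```python
from typing import List

Vector = List[float]


def mean_pooling(token_embeddings: List[Vector]) -> Vector:
    """Average the token embeddings dimension by dimension."""
    count = len(token_embeddings)
    dims = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / count for d in range(dims)]


def cls_pooling(token_embeddings: List[Vector]) -> Vector:
    """Use the embedding of the first token (assumed to be the CLS token)."""
    return list(token_embeddings[0])


def pool(
    token_embeddings: List[Vector],
    pooling: str = "useModel",
    model_default: str = "mean",  # hypothetical: what the model configuration asks for
) -> Vector:
    """Pick the pooling method according to the `pooling` setting."""
    if pooling == "forceMean":
        method = "mean"
    elif pooling == "forceCls":
        method = "cls"
    elif pooling == "useModel":
        method = model_default
    else:
        raise ValueError(f"unknown pooling value: {pooling!r}")

    if method == "mean":
        return mean_pooling(token_embeddings)
    if method == "cls":
        return cls_pooling(token_embeddings)
    raise ValueError(f"unknown pooling method: {method!r}")


# Two tokens, two dimensions each.
tokens = [[1.0, 2.0], [3.0, 4.0]]
sentence_mean = pool(tokens, "forceMean")  # [2.0, 3.0]
sentence_cls = pool(tokens, "forceCls")    # [1.0, 2.0]
```

The two methods generally produce different sentence embeddings for the same tokens.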
- `huggingFace` embedders that were created in a previous version of Meilisearch and imported using a dump or the dumpless upgrade feature will have `pooling` set to `"forceMean"`, as this was the behavior of these embedders in previous versions of Meilisearch.
- `pooling` is only available for embedders with source `huggingFace`.
- Modifying `pooling` always triggers a full reindexing.

- `compositeEmbedders` is added to the `/experimental-features` route.
- The `embedders` setting of the `PATCH /indexes/{:indexUid}/settings` and `PATCH /indexes/{:indexUid}/settings/embedders` routes is modified as follows:
  - A new value for the `source` parameter is allowed: `"composite"`. This value is selectable when the `compositeEmbedders` feature is set to true.
  - The following parameters are available for embedders with source `"composite"`:
    - `searchEmbedder`: an object whose keys are the same as an embedder object. The embedder it describes will be used at search time.
    - `indexingEmbedder`: an object whose keys are the same as an embedder object. The embedder it describes will be used at indexing time.
  - These parameters are only available when the `compositeEmbedders` feature is set to true.
  - Meilisearch checks that the `searchEmbedder` and the `indexingEmbedder` are "similar enough": it computes the angular distance between both embeddings in each test case, and checks that this distance is < 0.01.

This feature allows using different embedders at search and indexing time, which can be used to optimize the embedder for each use case:
- Mapping a local embedder (source `huggingFace`) to a Hugging Face inference endpoint:

```json5
{
  "embedders": {
    "text": {
      "source": "composite",
      "searchEmbedder": {
        "source": "huggingFace", // locally computed embeddings using a model from the Hugging Face Hub
        "model": "baai/bge-base-en-v1.5",
        "revision": "a5beb1e3e68b9ab74eb54cfd186867f64f240e1a"
      },
      "indexingEmbedder": {
        "source": "rest", // remotely computed embeddings using Hugging Face inference endpoints
        "url": "https://URL.endpoints.huggingface.cloud",
        "apiKey": "hf_XXXXXXX",
        "documentTemplate": "Your {{doc.template}}",
        "request": {
          "inputs": [
            "{{text}}",
            "{{..}}"
          ]
        },
        "response": [
          "{{embedding}}",
          "{{..}}"
        ]
      }
    }
  }
}
```
- Using a local embedder (`huggingFace` source) with a Cloudflare AI worker:

```json5
{
  "embedders": {
    "text": {
      "source": "composite",
      "searchEmbedder": {
        "source": "huggingFace",
        "model": "baai/bge-base-en-v1.5",
        "revision": "a5beb1e3e68b9ab74eb54cfd186867f64f240e1a",
        "pooling": "forceMean"
      },
      "indexingEmbedder": {
        "source": "rest",
        "url": "https://api.cloudflare.com/client/v4/accounts/ACCOUNT_NUMBER/ai/run/@cf/baai/bge-base-en-v1.5",
        "apiKey": "API_KEY",
        "documentTemplate": "Your {{doc.template}}",
        "request": {
          "text": [
            "{{text}}",
            "{{..}}"
          ]
        },
        "response": {
          "result": {
            "data": [
              "{{embedding}}",
              "{{..}}"
            ]
          }
        }
      }
    }
  }
}
```
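
The "similar enough" check between the search and indexing embedders can be sketched in Python as follows. This is an assumption-laden illustration rather than Meilisearch's code: it takes "angular distance" to mean the angle in radians between the two vectors (Meilisearch's exact definition or normalization may differ), and the function names are hypothetical. Only the 0.01 threshold comes from the description of the check.

```python
import math
from typing import Sequence


def angular_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle in radians between two vectors (assumed definition)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("angular distance is undefined for a zero vector")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


def similar_enough(
    search_embeddings: Sequence[Sequence[float]],
    indexing_embeddings: Sequence[Sequence[float]],
    threshold: float = 0.01,  # threshold quoted in the text
) -> bool:
    """True if every pair of test embeddings is closer than the threshold."""
    return all(
        angular_distance(s, i) < threshold
        for s, i in zip(search_embeddings, indexing_embeddings)
    )
```

Under this reading, two embedders that run the same model with the same pooling method should pass, while a mismatch in model or pooling would most likely be rejected.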
`embedders.sources` can now contain the value `composite`.
| Name | Description | Example |
|---|---|---|
| e.g. `infos.log_level` | e.g. "value of `--log-level`" | e.g. "debug" |
| `infos.experimental_composite_embedders` | `true` if the `compositeEmbedders` feature is set to true for this instance, otherwise `false` | `false` |
| `composite_embedders` sent with Experimental features Updated event | `true` if the `compositeEmbedders` feature is set to true after that call to `/experimental-features` | `true` |

"composite"
source cannot be nested inside of a composite embedder: trying to set `searchEmbedder.source` or `indexingEmbedder.source` to `"composite"` will return a 400 `invalid_settings_embedder` error, such as:

`.embedders.test.searchEmbedder.source`: Source `composite` is not available in a nested embedder
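
The nesting rule can be illustrated with a small client-side check in Python, useful for validating a settings payload before sending it. This is a hypothetical helper, not part of Meilisearch or any SDK; the message format simply mirrors the example error above, and the leading dot in the path is an assumption.

```python
from typing import Optional


def check_nested_source(name: str, embedder: dict) -> Optional[str]:
    """Return an error message if a composite embedder nests another composite one.

    `name` is the embedder's key in the `embedders` setting.
    Returns None when the embedder respects the rule.
    """
    if embedder.get("source") != "composite":
        return None
    for key in ("searchEmbedder", "indexingEmbedder"):
        nested = embedder.get(key) or {}
        if nested.get("source") == "composite":
            return (
                f"`.embedders.{name}.{key}.source`: "
                "Source `composite` is not available in a nested embedder"
            )
    return None


# A payload that violates the rule: the search embedder is itself composite.
invalid = {
    "source": "composite",
    "searchEmbedder": {"source": "composite"},
    "indexingEmbedder": {"source": "rest"},
}
error = check_nested_source("test", invalid)
```

Running the check locally avoids a round trip that would end in a 400 response.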