Contributors: @Anonymous @Anonymous @Masih Derkani @Will Scott
Date: 29/11/2022
Network indexers build their indexes by ingesting chains of Advertisements. An Advertisement is a construct that allows Storage Providers to publish their CIDs in bulk (FIL deals) rather than one CID at a time. Each group of CIDs is identified by a unique ContextID, as can be seen in the diagram below:
To optimise lookup time and storage space, the Indexer's data model is designed to store a large number of Multihashes that point to a small number of ProviderRecords, which in turn point to an even smaller number of ProviderInfos:
A lookup for each multihash involves three internal lookups (served either from disk or from caches):
An example of a network indexer query output can be seen here.
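The layered data model and the three-step lookup described above can be illustrated with a small in-memory sketch. This is not the indexer's actual storage code: the class `ToyIndex`, its field names, the `(provider_id, context_id)` record key, and the sample address are all assumptions made for illustration. It only shows why many multihashes can share one ProviderRecord, and many ProviderRecords can share one ProviderInfo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderInfo:
    # Hypothetical fields: who the provider is and how to reach it.
    provider_id: str
    addrs: tuple


@dataclass(frozen=True)
class ProviderRecord:
    # Hypothetical fields: one record per (provider, ContextID) group.
    provider_id: str
    context_id: bytes
    metadata: bytes


class ToyIndex:
    """Illustrative three-level index; plain dicts stand in for disk and caches."""

    def __init__(self):
        self.mh_to_record_keys = {}  # multihash -> set of record keys (largest map)
        self.records = {}            # record key -> ProviderRecord (small map)
        self.providers = {}          # provider_id -> ProviderInfo (smallest map)

    def put(self, info, context_id, metadata, multihashes):
        # All multihashes advertised under one ContextID share a single record.
        key = (info.provider_id, context_id)
        self.providers[info.provider_id] = info
        self.records[key] = ProviderRecord(info.provider_id, context_id, metadata)
        for mh in multihashes:
            self.mh_to_record_keys.setdefault(mh, set()).add(key)

    def find(self, mh):
        results = []
        for key in sorted(self.mh_to_record_keys.get(mh, ())):  # lookup 1
            record = self.records[key]                           # lookup 2
            info = self.providers[record.provider_id]            # lookup 3
            results.append((record, info))
        return results


index = ToyIndex()
sp = ProviderInfo("provider-A", ("/dns4/example.invalid/tcp/1234",))
index.put(sp, b"deal-1", b"", [b"mh-1", b"mh-2", b"mh-3"])
hits = index.find(b"mh-2")
```

In this sketch three multihashes are stored against a single ProviderRecord and a single ProviderInfo, so the per-multihash cost is just a small key reference rather than a full copy of the provider details.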
At the current rate, production indexers ingest ≈40 billion CIDs per week; however, not all of them result in new records being created (some are duplicate advertisements). The number of providers from which a CID is available also varies. At the time of writing, the production indexers have ≈50 billion unique multihashes indexed across ≈350 providers. In the analysis below we assume that the indexers will index ≈4 billion new CIDs per week, each available from 5 unique providers on average.
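As a rough worked example of what these assumptions imply, the sketch below projects the number of indexed multihashes and the weekly number of new multihash-to-provider mappings. The constants come from the figures above; the function names are hypothetical, and the calculation assumes linear growth and one new unique multihash per new CID, which is a simplification.

```python
# Figures taken from the text above.
UNIQUE_MULTIHASHES_NOW = 50 * 10**9   # ≈50 billion unique multihashes indexed today
NEW_CIDS_PER_WEEK = 4 * 10**9         # ≈4 billion new CIDs per week (assumed rate)
AVG_PROVIDERS_PER_CID = 5             # average number of providers per CID (assumed)


def projected_multihashes(weeks):
    """Unique multihashes after `weeks`, assuming constant linear growth
    and one new unique multihash per new CID."""
    return UNIQUE_MULTIHASHES_NOW + weeks * NEW_CIDS_PER_WEEK


def new_mappings_per_week():
    """New multihash-to-provider mappings created each week."""
    return NEW_CIDS_PER_WEEK * AVG_PROVIDERS_PER_CID


one_year = projected_multihashes(52)
weekly_mappings = new_mappings_per_week()
print(f"after 52 weeks: {one_year / 10**9:.0f} billion multihashes")
print(f"per week: {weekly_mappings / 10**9:.0f} billion new mappings")
```

Under these assumptions the index would grow to roughly 258 billion unique multihashes after one year, with about 20 billion new multihash-to-provider mappings written each week.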