Indexers
An Indexer is a thread that retrieves processed Documents from the end of a Pipeline and sends them in batches to a specific destination. For users of Lucille, this destination will most commonly be a search engine.
Only one Indexer can be defined in a Lucille run. All pipelines will feed to the same Indexer.
Indexer configuration has two parts: the generic indexer configuration, and configuration for the implementation you are using. For example, if you are using Solr, you'd provide solr config, or elastic for Elasticsearch, csv for CSV, etc.
Here’s what using the SolrIndexer might look like:
# Generic indexer config
indexer {
  type: "solr"
  ignoreFields: ["city_temp"]
  batchSize: 100
}

# Specific implementation (Solr) config
solr {
  useCloudClient: true
  url: "localhost:9200"
  defaultCollection: "test_index"
}
At a minimum, indexer must contain either type or class. type is shorthand for an indexer provided by lucille-core: it can be "Solr", "OpenSearch", "ElasticSearch", or "CSV". indexer can contain a variety of additional properties as well. Not every Indexer supports every property, however. For example, OpenSearchIndexer and ElasticsearchIndexer do not support indexer.indexOverrideField.
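As a rough sketch of the class alternative, the generic block might name the implementation directly instead of using the type shorthand. The fully qualified class name below is an assumption about where lucille-core keeps its Solr indexer; check it against the version of Lucille you are running. The other properties are carried over from the example above.

```hocon
# Generic indexer config, selecting the implementation by class rather than type
indexer {
  # Assumed class name for the Solr indexer in lucille-core; verify before use
  class: "com.kmwllc.lucille.indexer.SolrIndexer"
  ignoreFields: ["city_temp"]
  batchSize: 100
}

# The implementation-specific block is still required
solr {
  useCloudClient: true
  url: "localhost:9200"
  defaultCollection: "test_index"
}
```

The class form is presumably most useful for an indexer that is not one of the four type shorthands, such as one supplied by a plugin module.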
The lucille-core module contains a number of commonly used indexers. Additional indexers with a large number of dependencies are provided as optional plugin modules.