Connectors

A component that retrieves data from a source system, packages the data into “documents,” and publishes them.

Lucille Connectors

Lucille Connectors are components that retrieve data from a source system, package the data into “Documents,” and publish them to a pipeline.

To configure a Connector, provide its class (under class) and a name in its config. Optionally, you can also specify the pipeline, a docIdPrefix, and whether the Connector requires a collapsing Publisher (collapse).

You also provide any parameters the Connector itself needs. For example, the SequenceConnector requires one parameter, numDocs, and accepts an optional parameter, startWith. A SequenceConnector config would look something like this:

{
  name: "Sequence-Connector-1"
  class: "com.kmwllc.lucille.connector.SequenceConnector"
  docIdPrefix: "sequence-connector-1-"
  pipeline: "pipeline1"
  numDocs: 500
  startWith: 50
}
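To make the effect of these settings more concrete, the Java sketch below mimics what a sequence-style connector might do with the values from the config above. It is not Lucille's implementation: the SequenceSketch class and its generateIds method are hypothetical, and the sketch assumes that each Document ID is the docIdPrefix followed by a counter that begins at startWith.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a sequence-style connector; not a Lucille class.
class SequenceSketch {

  // Builds the IDs that numDocs documents would receive, ASSUMING each ID is
  // docIdPrefix followed by a counter that starts at startWith.
  static List<String> generateIds(String docIdPrefix, int numDocs, int startWith) {
    List<String> ids = new ArrayList<>(numDocs);
    for (int i = startWith; i < startWith + numDocs; i++) {
      ids.add(docIdPrefix + i);
    }
    return ids;
  }

  public static void main(String[] args) {
    // Values copied from the example config above.
    List<String> ids = generateIds("sequence-connector-1-", 500, 50);
    System.out.println(ids.size() + " documents, first=" + ids.get(0)
        + ", last=" + ids.get(ids.size() - 1));
  }
}
```

Under that assumption, the config above would yield 500 Documents whose IDs run from sequence-connector-1-50 through sequence-connector-1-549.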

The lucille-core module contains a number of commonly used connectors. Additional connectors with a large number of dependencies are provided as optional plugin modules.

Lucille Connectors (Core)

The following connectors are deprecated. Use FileConnector instead, along with a corresponding FileHandler.

Lucille Connectors (Plugins)


RSS Connector

A Connector that publishes Documents representing items found in an RSS feed.
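As a rough illustration of the idea, the Java sketch below turns each item element of an RSS feed into a simple field map, one map per item. It uses only the JDK's XML parser. The RssSketch class, the itemsToDocs method, and the choice to copy just the title and link fields are assumptions made for this example and do not describe the plugin's actual behavior or configuration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration only; not the Lucille RSS connector.
class RssSketch {

  // Parses RSS XML and returns one map per <item>, holding its title and link
  // when present. A plain Map stands in for a Lucille Document here.
  static List<Map<String, String>> itemsToDocs(String rssXml) {
    try {
      org.w3c.dom.Document xml = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));
      NodeList items = xml.getElementsByTagName("item");
      List<Map<String, String>> docs = new ArrayList<>();
      for (int i = 0; i < items.getLength(); i++) {
        Element item = (Element) items.item(i);
        Map<String, String> doc = new LinkedHashMap<>();
        for (String field : new String[] {"title", "link"}) {
          NodeList values = item.getElementsByTagName(field);
          if (values.getLength() > 0) {
            doc.put(field, values.item(0).getTextContent());
          }
        }
        docs.add(doc);
      }
      return docs;
    } catch (Exception e) {
      // Wrap parser errors so callers do not need checked-exception handling.
      throw new IllegalArgumentException("Could not parse RSS feed", e);
    }
  }
}
```

A real connector would fetch the feed over HTTP and publish each result to a pipeline; the sketch stops at building the per-item records.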

Database Connector

File Connector

A Connector that, given a path to S3, Azure, Google Cloud, or the local file system, traverses the content at the given path and publishes Lucille documents representing its findings.
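The traversal idea can be sketched in Java for the local file system case only. The example walks a directory tree and builds one record per regular file. The FileTraversalSketch class, the traverse method, and the id and size_bytes field names are hypothetical choices for this illustration. Cloud storage (S3, Azure, Google Cloud) and the real connector's options are not modeled.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical illustration only; not the Lucille FileConnector.
class FileTraversalSketch {

  // Walks root recursively and returns one map per regular file, in sorted
  // path order. A plain Map stands in for a Lucille Document here.
  static List<Map<String, Object>> traverse(Path root) {
    List<Map<String, Object>> docs = new ArrayList<>();
    try (Stream<Path> paths = Files.walk(root)) {
      paths.filter(Files::isRegularFile).sorted().forEach(p -> {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", p.toString()); // assumed: the path doubles as the ID
        try {
          doc.put("size_bytes", Files.size(p));
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
        docs.add(doc);
      });
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return docs;
  }
}
```

Directories themselves produce no record in this sketch; only the files found beneath the starting path do.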