Contribution Guidelines
1 - Setup & Standards
Local Developer Setup
Prerequisite(s):
- IntelliJ application installed on machine
- Java project
Setting up Google Code Formatting Scheme
- Make sure that IntelliJ is open
- Go to the following link: styleguide/intellij-java-google-style.xml at gh-pages · google/styleguide
- Download the .xml file
- Open the file in an editor of your choice
- Navigate to the <option …> tag with the name RIGHT_MARGIN (the right margin) and change its value to 132 (the default is 100)
- Save the file
- In IntelliJ IDEA, navigate to Settings | Preferences → Editor → Code Style → Java
- Click on the gear icon on the right panel, select Import Scheme, and then select IntelliJ IDEA code style XML
- In the file explorer that opens, navigate to where you stored the .xml file you downloaded and edited
- After selecting the file, you should see a pop-up allowing you to name the scheme; enter a name and click ‘OK’
- Click ‘Apply’ in the Settings panel
- Restart the IDE. You can then use the ‘Reformat Code’ option to apply the scheme to your code
Excluding Non-Java Files
Assuming that we don’t want to auto-format non-Java files via a directory-level ‘Reformat Code’ option, we need to exclude all other files from being reformatted:
- Navigate to Settings | Preferences in IntelliJ IDEA
- Navigate to Editor → Code Style
- Click on the tab on the right window labeled ‘Formatter’
- In the ‘Do Not Format’ text box, paste the following and click ‘Apply’:
*.{yml,xml,md,json,yaml,jsonl,sql}
- A restart of IntelliJ may be required to see changes
This method can become cumbersome, especially as new file types are added to the codebase. Consider the following simpler method instead:
- When clicking on ‘Reformat Code’ at the directory level, a window will pop up
- Under the filter sections in the window, select the ‘File Mask(s)’ option and set the value to ‘*.java’
- This will INCLUDE all .java files in your reformatting
Eclipse Users
Eclipse import conf .xml files
The linked post details some useful information for how Eclipse users can use the same .xml for their code formatting on Eclipse IDE.
2 - Developing New Components
Introduction
This guide covers the basics of developing new components for Lucille and explains which parts of a component are required and which are optional. After reading it, you should be able to start development with a good understanding of how testing works, how configuration is handled, and what features the base classes afford us.
Prerequisites
- An up-to-date version of Lucille that has been appropriately installed
- Understanding of the Java programming language
- Understanding of Lucille
Project and Package Layout
You can add Stages, Connectors, and Indexers to Lucille in many ways. All approaches require you to reference the component’s fully qualified class name (class = "...") in the config.
Contribute to lucille-core (PR to core)
- Stages: lucille/lucille-core/src/main/java/com/kmwllc/lucille/stage/
- Connectors: lucille/lucille-core/src/main/java/com/kmwllc/lucille/connector/
- Indexers: lucille/lucille-core/src/main/java/com/kmwllc/lucille/indexer/
- Tests/resources mirror these under src/test/java and src/test/resources.
Create a Lucille plugin (PR to lucille repo under lucille-plugins)
- Example layout:
lucille-plugins/
  my-plugin/
    src/main/java/com/kmwllc/lucille/my-plugin/stage/...
    src/main/java/com/name/lucille/my-plugin/connector/...
    src/main/java/com/name/lucille/my-plugin/indexer/...
    src/test/java/...
    src/test/resources/...
    pom.xml
Use your own local code
- Put classes anywhere in your own package; e.g., com.name.ingest.MyStage.
- Requirement: Ensure the compiled JAR is on the classpath when running Lucille and reference the fully qualified class name in your config.
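The classpath requirement follows from how Java resolves a class that is named only by a string. The sketch below uses nothing but the standard library to illustrate this. It is not Lucille's loading code (which this guide does not show), and ClassNameLookup and isLoadable are hypothetical names chosen for the example.

```java
// Sketch only: shows why a fully qualified class name in a config can be
// resolved only when the class is on the classpath. Not Lucille's loader.
class ClassNameLookup {

  // Returns true if the named class can be found on the current classpath.
  static boolean isLoadable(String fullyQualifiedName) {
    try {
      Class.forName(fullyQualifiedName);
      return true;
    } catch (ClassNotFoundException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // A JDK class is always resolvable.
    System.out.println(isLoadable("java.util.ArrayList"));
    // com.name.ingest.MyStage is the example name from the text; it resolves
    // only if a JAR containing it has been added to the classpath.
    System.out.println(isLoadable("com.name.ingest.MyStage"));
  }
}
```

If the second lookup fails in your environment, the usual cause is that the JAR containing your component was not added to the classpath used to launch Lucille.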
Developing Stages
Stage Skeleton
Every Stage must expose a static SPEC that declares its config schema. Use SpecBuilder to define required/optional fields, lists, parents, and types. The base class consumes this to validate user config at load time. See the configuration docs for information on specs.
Every stage must follow the Javadoc Standards.
package com.kmwllc.lucille.stage;

import com.kmwllc.lucille.core.Stage;
import com.kmwllc.lucille.core.StageException;
import com.kmwllc.lucille.core.spec.Spec;
import com.kmwllc.lucille.core.spec.SpecBuilder;
import com.kmwllc.lucille.core.ConfigUtils;
import com.typesafe.config.Config;
import java.util.Iterator;
import com.kmwllc.lucille.core.Document;

/**
 * One‑line summary.
 * <p>
 * Config Parameters -
 * <ul>
 *   <li>foo (String, Required) : Description.</li>
 *   <li>bar (Integer, Optional) : Description. Defaults to 10.</li>
 * </ul>
 */
public class ExampleStage extends Stage {

  public static final Spec SPEC = SpecBuilder.stage()
      .requiredString("foo")
      .optionalNumber("bar")
      .build();

  private final String foo;
  private final int bar;

  public ExampleStage(Config config) throws StageException {
    super(config);
    this.foo = config.getString("foo");
    this.bar = ConfigUtils.getOrDefault(config, "bar", 10);
  }

  @Override
  public Iterator<Document> processDocument(Document doc) throws StageException {
    // mutate doc as needed
    doc.setField("out", foo + ":" + bar);
    // return null unless emitting child docs
    return null;
  }
}
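To make the idea of load-time validation concrete, here is a rough model of what checking a config against declared required and optional names involves. MiniSpec is a hypothetical stand-in written in plain Java, not Lucille's Spec or SpecBuilder. Rejecting unknown keys is an assumption of this sketch, so check the configuration docs for the actual rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: a hypothetical MiniSpec that mimics validating config keys
// against declared required/optional names. Not Lucille's Spec.
class MiniSpec {
  private final Set<String> required;
  private final Set<String> optional;

  MiniSpec(Set<String> required, Set<String> optional) {
    this.required = required;
    this.optional = optional;
  }

  // Returns a list of problems; an empty list means the config is valid.
  List<String> validate(Map<String, Object> config) {
    List<String> errors = new ArrayList<>();
    for (String name : required) {
      if (!config.containsKey(name)) {
        errors.add("missing required property: " + name);
      }
    }
    // Assumption: keys that were never declared are treated as errors.
    for (String name : config.keySet()) {
      if (!required.contains(name) && !optional.contains(name)) {
        errors.add("unknown property: " + name);
      }
    }
    return errors;
  }

  public static void main(String[] args) {
    MiniSpec spec = new MiniSpec(Set.of("foo"), Set.of("bar"));
    System.out.println(spec.validate(Map.of("foo", "x", "bar", 5)));
    System.out.println(spec.validate(Map.of("baz", 1)));
  }
}
```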
Lifecycle Methods
- start() for allocating resources and precomputing data structures.
- processDocument(Document doc) for transforming the current document and (optionally) returning child docs.
- stop() for releasing resources on shutdown.
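The division of work between the three lifecycle methods can be pictured with a small stand-in class. StopwordStageSketch is hypothetical and uses plain strings instead of Lucille's Document type; it only illustrates what kind of work typically belongs in each method.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch only: a hypothetical stage-like class (plain Java, no Lucille types)
// showing what typically belongs in each lifecycle method.
class StopwordStageSketch {
  private Set<String> stopwords;

  // start(): build the lookup structure once, before any document is seen.
  void start() {
    stopwords = new HashSet<>(List.of("the", "a", "of"));
  }

  // processDocument(): per-document work that reuses the precomputed set.
  String processDocument(String text) {
    return Arrays.stream(text.split(" "))
        .filter(word -> !stopwords.contains(word))
        .collect(Collectors.joining(" "));
  }

  // stop(): release whatever start() allocated.
  void stop() {
    stopwords = null;
  }

  public static void main(String[] args) {
    StopwordStageSketch stage = new StopwordStageSketch();
    stage.start();
    try {
      System.out.println(stage.processDocument("the history of a river"));
    } finally {
      stage.stop();
    }
  }
}
```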
Reading & Writing Fields
Lucille’s Document API supports single-valued and multi-valued fields with strong typing and convenience updaters.
Supported types: String, Boolean, Integer, Double, Float, Long, Instant, byte[], JsonNode, Timestamp, Date.
Getting Values
- Single Value: getString(name), getInt(name), etc.
- Lists: getStringList(name), getIntList(name), etc.
- Nested JSON: getNestedJson("a.b[2].c") or getNestedJson(List<Segment>).
Writing Values
- Overwrite (single-valued): setField(name, value) replaces any existing values and makes the field single-valued.
- Append (multi-valued): addToField(name, value) converts to a list if needed and appends.
- Create or append: setOrAdd(name, value) creates as single-valued if missing, otherwise appends.
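The difference between the three write methods is easiest to see on a toy model. FieldWriteSketch below is a hypothetical in-memory map that mirrors the behavior described above; it is not Lucille's Document implementation, and it stores a single value as a bare object and a multi-valued field as a list purely for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: a hypothetical in-memory model of the three write methods.
// It mirrors the behavior described above, not Lucille's Document internals.
class FieldWriteSketch {
  final Map<String, Object> fields = new HashMap<>();

  // Replaces any existing value(s); the field becomes single-valued.
  void setField(String name, Object value) {
    fields.put(name, value);
  }

  // Converts the field to a list if needed, then appends.
  void addToField(String name, Object value) {
    Object current = fields.get(name);
    List<Object> list = new ArrayList<>();
    if (current instanceof List<?>) {
      list.addAll((List<?>) current);
    } else if (current != null) {
      list.add(current);
    }
    list.add(value);
    fields.put(name, list);
  }

  // Creates a single-valued field if missing, otherwise appends.
  void setOrAdd(String name, Object value) {
    if (fields.containsKey(name)) {
      addToField(name, value);
    } else {
      setField(name, value);
    }
  }
}
```

In this model, calling setOrAdd twice on a new field leaves a two-element list, while a later setField collapses it back to a single value.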
Updating Values
Use update(name, mode, values...):
- OVERWRITE: first value overwrites, the rest append
- APPEND: all values append
- SKIP: no‑op if the field already exists
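The three update modes can be sketched over a simple name-to-list map. UpdateModeSketch and its nested Mode enum are hypothetical names for this illustration and do not reproduce Lucille's own types. Treating a call with no values as a no-op is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: a hypothetical model of the three update modes over a simple
// name -> list-of-values map. The enum and class names are illustrative.
class UpdateModeSketch {
  enum Mode { OVERWRITE, APPEND, SKIP }

  final Map<String, List<Object>> fields = new HashMap<>();

  void update(String name, Mode mode, Object... values) {
    if (values.length == 0) {
      return; // assumption: nothing to write means nothing changes
    }
    if (mode == Mode.SKIP && fields.containsKey(name)) {
      return; // SKIP: leave an existing field untouched
    }
    if (mode == Mode.OVERWRITE) {
      fields.remove(name); // OVERWRITE: the first value replaces what was there
    }
    // From here all modes behave alike: append each value in order.
    List<Object> list = fields.computeIfAbsent(name, k -> new ArrayList<>());
    for (Object value : values) {
      list.add(value);
    }
  }
}
```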
Nested JSON (Objects & Arrays)
- Set: setNestedJson("a.b[2].c", jsonNode) or setNestedJson(List<Segment>, jsonNode).
- Remove: removeNestedJson("a.b[2].c") removes the last segment from its parent.
- Segments: Document.Segment.parse("a.b[2].c") ⇄ Document.Segment.stringify(segments) helps convert between string paths and structured paths.
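To show how a string path such as "a.b[2].c" decomposes into structured segments, here is a rough standalone parser. PathSegmentSketch is hypothetical: it represents object keys as String and array indexes as Integer, handles well-formed paths only, and may differ from Document.Segment in representation and edge cases.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a hypothetical path parser illustrating how "a.b[2].c" breaks
// into name and array-index segments. Lucille's Document.Segment may differ.
class PathSegmentSketch {

  // Each segment is either a String (object key) or an Integer (array index).
  // Assumes a well-formed path; malformed brackets will throw.
  static List<Object> parse(String path) {
    List<Object> segments = new ArrayList<>();
    for (String part : path.split("\\.")) {
      int bracket = part.indexOf('[');
      if (bracket < 0) {
        segments.add(part);
        continue;
      }
      if (bracket > 0) {
        segments.add(part.substring(0, bracket));
      }
      // Consume every "[n]" that follows the key, e.g. "b[2][0]".
      String rest = part.substring(bracket);
      while (!rest.isEmpty()) {
        int close = rest.indexOf(']');
        segments.add(Integer.parseInt(rest.substring(1, close)));
        rest = rest.substring(close + 1);
      }
    }
    return segments;
  }

  // Inverse of parse for segment lists produced by it.
  static String stringify(List<Object> segments) {
    StringBuilder sb = new StringBuilder();
    for (Object segment : segments) {
      if (segment instanceof Integer) {
        sb.append('[').append(segment).append(']');
      } else {
        if (sb.length() > 0) {
          sb.append('.');
        }
        sb.append(segment);
      }
    }
    return sb.toString();
  }
}
```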
Unit Testing
See the Testing Standards.
Developing Connectors
Connector Skeleton
Every Connector must expose a static SPEC that declares its config schema. Use SpecBuilder to define required/optional fields, lists, parents, and types. The base class consumes this to validate user config at load time. See the configuration docs for information on specs.
Every connector must follow the Javadoc Standards.
package com.kmwllc.lucille.connector;

import com.kmwllc.lucille.core.ConnectorException;
import com.kmwllc.lucille.core.Document;
import com.kmwllc.lucille.core.Publisher;
import com.kmwllc.lucille.core.spec.Spec;
import com.kmwllc.lucille.core.spec.SpecBuilder;
import com.typesafe.config.Config;

/**
 * One-line summary of what this Connector reads and how it emits Documents.
 * <p>
 * Config Parameters -
 * <ul>
 *   <li>sourceUri (String, Required) : Where to read from (file://, s3://, http://, etc.).</li>
 *   <li>batchSize (Integer, Optional) : Max items to read before publishing a batch. Defaults to 100.</li>
 * </ul>
 */
public class ExampleConnector extends AbstractConnector {

  public static final Spec SPEC = SpecBuilder.connector()
      .requiredString("sourceUri")
      .optionalNumber("batchSize")
      .build();

  private final String sourceUri;
  private final int batchSize;

  public ExampleConnector(Config config) {
    super(config);
    this.sourceUri = config.getString("sourceUri");
    this.batchSize = config.hasPath("batchSize") ? config.getInt("batchSize") : 100;
  }

  @Override
  public void execute(Publisher publisher) throws ConnectorException {
    // Read from sourceUri and publish Documents.
    for (int i = 0; i < batchSize; i++) {
      Document d = Document.create(createDocId("item-" + i));
      // Populate fields on d as needed, e.g.: d.setField("source_uri", sourceUri);
      try {
        publisher.publish(d);
      } catch (Exception e) {
        throw new ConnectorException("Failed to publish document " + d.getId(), e);
      }
    }
  }

  @Override
  public void close() throws ConnectorException {
    // Optional: Close network or file handlers.
  }
}
Lifecycle & Behavior Tips
- preExecute(runId) for preparing external connections.
- execute(publisher) for reading from your source and calling publisher.publish(doc) for each Document.
- postExecute(runId) for optional cleanup or follow-up actions after execute completes successfully.
- close() for releasing resources.
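One plausible way to picture how these four methods fit together is a small driver. ConnectorLifecycleSketch and MiniConnector are hypothetical and use a plain list in place of Lucille's Publisher. The sketch assumes that close() always runs and that postExecute() is skipped when execute() throws, so verify both against Lucille itself before relying on them.

```java
import java.util.List;

// Sketch only: a hypothetical driver showing one plausible call order for the
// connector lifecycle methods. Assumes close() always runs and postExecute()
// is skipped when execute() throws; verify against Lucille itself.
class ConnectorLifecycleSketch {

  interface MiniConnector {
    void preExecute(String runId) throws Exception;
    void execute(List<String> published) throws Exception;
    void postExecute(String runId) throws Exception;
    void close() throws Exception;
  }

  // Returns true if the run completed, false if any lifecycle step failed.
  static boolean run(MiniConnector connector, String runId, List<String> published) {
    try {
      try {
        connector.preExecute(runId);
        connector.execute(published);
        connector.postExecute(runId); // reached only if execute() succeeded
        return true;
      } finally {
        connector.close(); // release resources on success and on failure
      }
    } catch (Exception e) {
      return false;
    }
  }
}
```

A connector written with this order in mind can open connections in preExecute, rely on them in execute, and know that close is the single place where they are released.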
Unit Testing
See the Testing Standards.
Developing Indexers
Indexer Skeleton
Every Indexer must expose a static SPEC that declares its config schema. Use SpecBuilder to define required/optional fields, lists, parents, and types. The base class consumes this to validate user config at load time. See the configuration docs for information on specs.
Every indexer must follow the Javadoc Standards.
package com.kmwllc.lucille.indexer;

import com.kmwllc.lucille.core.Document;
import com.kmwllc.lucille.core.Indexer;
import com.kmwllc.lucille.core.ConfigUtils;
import com.kmwllc.lucille.core.spec.Spec;
import com.kmwllc.lucille.core.spec.SpecBuilder;
import com.kmwllc.lucille.message.IndexerMessenger;
import com.typesafe.config.Config;
import java.util.List;
import java.util.Set;
import org.apache.commons.lang3.tuple.Pair;

/**
 * One-line summary of what this Indexer does and where it sends documents. Additional details may go here as needed.
 * <p>
 * Config Parameters -
 * <ul>
 *   <li>url (String, Required) : Destination endpoint (e.g., base URL).</li>
 *   <li>index (String, Optional) : Default index/collection name. Defaults to "index1".</li>
 *   <li>batchSize (Integer, Optional) : Max docs per request. Defaults to 100.</li>
 * </ul>
 */
public class ExampleIndexer extends Indexer {

  public static final Spec SPEC = SpecBuilder.indexer()
      .requiredString("url")
      .optionalString("index")
      .optionalNumber("batchSize")
      .build();

  private final String url;
  private final String defaultIndex;
  private final int batchSize;

  public ExampleIndexer(Config config, IndexerMessenger messenger, String metricsPrefix, String localRunId) {
    super(config, messenger, metricsPrefix, localRunId);
    this.url = config.getString("url");
    this.defaultIndex = ConfigUtils.getOrDefault(config, "index", "index1");
    this.batchSize = ConfigUtils.getOrDefault(config, "batchSize", 100);
  }

  @Override
  public boolean validateConnection() {
    // Health check to the destination
    return true;
  }

  @Override
  protected Set<Pair<Document, String>> sendToIndex(List<Document> documents) throws Exception {
    // Send the batch using your destination client
    // Return any failed docs as pairs of (Document, reason)
    return Set.of();
  }

  @Override
  public void closeConnection() {
    // Close client resources
  }
}
Lifecycle Methods
- validateConnection() for a quick destination availability check before starting the main loop.
- sendToIndex(List<Document> docs) to perform a write and return any per-document failures.
- closeConnection() for releasing resources on shutdown.
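The contract of returning per-document failures, rather than failing the whole batch, can be sketched without any Lucille types. BatchSendSketch is hypothetical: documents are modeled as plain id strings, the destination is an in-memory map, and the rule that a blank id fails is invented purely to produce a failure.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: a hypothetical batch writer that reports per-document failures
// instead of failing the whole batch, the contract described for sendToIndex.
// Documents are modeled as plain id strings; the "destination" is a map.
class BatchSendSketch {
  final Map<String, String> destination = new LinkedHashMap<>();

  // Writes each document; returns failed ids mapped to a reason.
  Map<String, String> sendToIndex(List<String> docIds) {
    Map<String, String> failed = new LinkedHashMap<>();
    for (String id : docIds) {
      if (id.isBlank()) {
        // Invented failure rule for the sketch: record it and keep going.
        failed.put(id, "blank document id");
      } else {
        destination.put(id, "indexed");
      }
    }
    return failed;
  }
}
```

An empty result means every document in the batch was written, which matches the skeleton's default of returning an empty set.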
Unit Testing
See the Testing Standards.
Testing Standards
Test Layout & Naming
- One test class per component (e.g., MyStageTest).
- Group related assertions into focused test methods with descriptive names.
- Place configs under a matching resources folder.
Locations
- Tests:
lucille/lucille-core/src/test/java/com/kmwllc/lucille/<stage || indexer || connector>/
- Per-test resources:
lucille/lucille-core/src/test/resources/<StageName || IndexerName || ConnectorName>Test/
General Testing Guidelines
- Maximize coverage: Aim to cover as many branches, error paths, and edge cases as practical.
- Fast and offline: No network or external services. Use mocks/spies only.
- Exercise every parameter: Ensure each parameter is covered by at least one test path.
- Test failures: Ensure bad configs, exceptions, empty inputs, etc. are tested.
- Assert behavior: Prefer testing state/interactions over log output.
- Time: Avoid sleeps to prevent longer test runs.
- Configuration clarity: Aim to make test configuration explicit and readable. Use inline config factories, descriptive config names, and inline scripts where applicable.
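As one illustration of the "avoid sleeps" guideline, time-dependent logic can take a java.time.Clock so that a test supplies a fixed instant instead of waiting. ExpiryChecker is a hypothetical class invented for this sketch and is not part of Lucille; whether a given component can be structured this way is a judgment call.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

// Sketch only: testing time-dependent logic without sleeping by passing in a
// java.time.Clock. ExpiryChecker is a hypothetical class, not a Lucille type.
class ExpiryChecker {
  private final Instant createdAt;
  private final Duration ttl;

  ExpiryChecker(Instant createdAt, Duration ttl) {
    this.createdAt = createdAt;
    this.ttl = ttl;
  }

  // Expired once the clock reaches createdAt + ttl.
  boolean isExpired(Clock clock) {
    return !clock.instant().isBefore(createdAt.plus(ttl));
  }

  public static void main(String[] args) {
    Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
    ExpiryChecker checker = new ExpiryChecker(t0, Duration.ofMinutes(1));
    // Fixed clocks stand in for "30 seconds later" and "90 seconds later".
    Clock before = Clock.fixed(t0.plusSeconds(30), ZoneOffset.UTC);
    Clock after = Clock.fixed(t0.plusSeconds(90), ZoneOffset.UTC);
    System.out.println(checker.isExpired(before));
    System.out.println(checker.isExpired(after));
  }
}
```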
JaCoCo Coverage Report
- Run: mvn clean install.
- Open report: lucille-core/target/jacoco-ut/index.html.
- Interpretation: Summarizes test coverage across packages and classes, highlighting covered and missed lines and branches so you can see what executed during the test run at a glance.
Javadoc Standards
Lucille includes a small internal parser used during documentation builds to extract class-level Javadoc from components (Connectors, Stages, Indexers) and render their config fields in the UI. It runs as part of the docs generation tooling, not at runtime, and expects the exact formatting described below so the parameters can be displayed correctly. For reference, see the parser implementation.
Rules:
- Put a clear description before the <p> tag (can be multi-sentence).
- After <p>, include the literal heading Config Parameters - and a <ul> list.
- Each item must be: name (Type, Required | Optional) : Description.
- Use exact casing.
- Escape generics (e.g., List&lt;String&gt;).
- Don’t add extra blank lines. Keep punctuation consistent.
Template:
/**
* Description of what this stage/connector/indexer does. This text can span
* multiple sentences and be as long as you want as long as it appears before <p>.
* <p>
* Config Parameters -
* <ul>
* <li>paramA (String, Required) : Example description.</li>
* <li>paramB (Integer, Optional) : Example description.</li>
* <li>flags (List&lt;String&gt;, Optional) : Example description.</li>
* <li>options (Map&lt;String, Object&gt;, Optional) : Example description.</li>
* </ul>
*/
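To see why the exact item format matters, consider a simple pattern-based reader for one list item. ParamLineSketch and its regular expression are hypothetical and written only for this illustration; Lucille's actual parser may accept or reject different inputs, so treat the parser implementation as the authority.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: a hypothetical regex for the list-item shape described above.
// It is not Lucille's parser; it just shows why exact formatting matters.
class ParamLineSketch {
  private static final Pattern ITEM = Pattern.compile(
      "<li>(\\w+) \\((.+), (Required|Optional)\\) : (.+)</li>");

  // Returns {name, type, requirement, description}, or null if the line
  // does not contain an item in the expected format.
  static String[] parse(String line) {
    Matcher m = ITEM.matcher(line);
    if (!m.find()) {
      return null;
    }
    return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
  }

  public static void main(String[] args) {
    String good = " * <li>bar (Integer, Optional) : Description. Defaults to 10.</li>";
    String wrongCase = " * <li>bar (Integer, optional) : Description.</li>";
    System.out.println(String.join(" | ", parse(good)));
    // Lowercase "optional" breaks the exact-casing rule, so nothing is parsed.
    System.out.println(parse(wrongCase) == null);
  }
}
```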