Stages
Lucille Stages
Stages are the building blocks of a Lucille pipeline. Each Stage performs a specific transformation on a Document.
Lucille Stages should have Javadocs that describe their purpose and the parameters accepted in their Config. On this site,
you’ll find more in-depth documentation for some of the more advanced or complex Lucille Stages.
To configure a Stage, you have to provide its class (under class) in its Config. You can also specify a name for the Stage,
in addition to conditions and conditionPolicy (described below).
You’ll also provide the parameters needed by the Stage. For example, the AddRandomBoolean Stage accepts two optional parameters,
field_name and percent_true. So, an AddRandomBoolean Config would look something like this:
{
  name: "AddRandomBoolean-First"
  class: "com.kmwllc.lucille.stage.AddRandomBoolean"
  field_name: "rand_bool_1"
  percent_true: 65
}
Conditions
For any Stage, you can specify “conditions” in its Config, controlling when the Stage will process a Document. Each
condition has a required parameter, fields, and two optional parameters, operator and values.
- fields is a list of field names that will determine whether the Stage applies to a Document.
- values is a list of values to look for in the condition’s fields. (If not specified, only the existence of the fields is checked.)
- operator is either "must" or "must_not" (defaults to "must").
In the root of the Stage’s Config, you can also specify a conditionPolicy, either "any" or "all", specifying whether
any or all of your conditions must be met for the Stage to process a Document. (Defaults to "any".)
Let’s say we are running the Print Stage, but we only want it to execute on a Document where city = Boston or city = New York.
Our Config for this Stage would look something like this:
{
  name: "print-1"
  class: "com.kmwllc.lucille.stage.Print"
  conditions: [
    {
      fields: ["city"]
      values: ["Boston", "New York"]
    }
  ]
}
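The condition rules above can be modeled in a few lines of Python. This is only an illustrative sketch, not Lucille’s Java implementation: condition_met and should_process are hypothetical names, and the sketch assumes that a condition with several fields is satisfied when any one of them matches, that a multi-valued field matches when any of its values is listed, and that a Stage with no conditions processes every Document.

```python
def condition_met(doc, condition):
    """Evaluate a single condition against a Document (simplified model)."""
    fields = condition["fields"]                  # required
    values = condition.get("values")              # optional
    operator = condition.get("operator", "must")  # optional, defaults to "must"

    def field_matches(name):
        if name not in doc:
            return False
        if values is None:
            return True  # no values given: only the field's existence is checked
        field_value = doc[name]
        # ASSUMPTION: a multi-valued field matches if any of its values is listed.
        candidates = field_value if isinstance(field_value, list) else [field_value]
        return any(v in values for v in candidates)

    # ASSUMPTION: with several fields, one matching field is enough.
    found = any(field_matches(name) for name in fields)
    return found if operator == "must" else not found


def should_process(doc, stage_config):
    """Combine all conditions according to conditionPolicy ("any" by default)."""
    conditions = stage_config.get("conditions", [])
    if not conditions:
        return True  # ASSUMPTION: no conditions means the Stage always runs
    policy = stage_config.get("conditionPolicy", "any")
    results = [condition_met(doc, c) for c in conditions]
    return any(results) if policy == "any" else all(results)


# The Print Stage Config from above, written as a Python dict.
print_stage = {
    "conditions": [{"fields": ["city"], "values": ["Boston", "New York"]}]
}
print(should_process({"id": "1", "city": "Boston"}, print_stage))   # True
print(should_process({"id": "2", "city": "Chicago"}, print_stage))  # False
```

Under this model, switching conditionPolicy to "all" and adding a second condition with operator "must_not" would restrict the Stage to Documents that satisfy the first condition and do not satisfy the second.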
1 - PromptOllama
Connect to Ollama Server and send a Document to an LLM for enrichment.
What if you could just, actually, put an LLM on everything?
Ollama
Ollama allows you to run a variety of Large Language Models (LLMs) with minimal setup. You can also create custom models
using Modelfiles and system prompts.
The PromptOllama Stage allows you to connect to a running instance of Ollama Server, which communicates with an LLM through a simple API.
The Stage sends part (or all) of a Document to the LLM for generic enrichment. You’ll want to create a custom model (with a Modelfile)
or provide a System Prompt in the Stage Config that is tailored to your pipeline.
We strongly recommend you have the LLM output only a JSON object, for two main reasons. First, LLMs tend to follow instructions more reliably when
asked to produce a strict output format such as JSON. Second, Lucille can then parse the JSON response and fully integrate it into your Document.
Example
Let’s say you are working with Documents which represent emails, and you want to monitor them for potential signs of fraud. Lucille doesn’t
have a DetectFraud Stage (at time of writing), but you can use PromptOllama to add this information with an LLM.
- Modelfile: Let’s say you created a custom model, fraud_detector, in your instance of Ollama Server. As part of the Modelfile,
you instruct the model to check the contents for fraud and output a JSON object containing just a boolean value (under fraud).
Your Stage would be configured like so:
{
  name: "Ollama-Fraud"
  class: "com.kmwllc.lucille.stage.PromptOllama"
  hostURL: "http://localhost:9200"
  modelName: "fraud_detector"
  fields: ["email_text"]
}
- System Prompt: You can also just reference a specific LLM directly, and provide a system prompt in the Stage configuration.
{
  name: "Ollama-Fraud"
  class: "com.kmwllc.lucille.stage.PromptOllama"
  hostURL: "http://localhost:9200"
  modelName: "gemma3"
  systemPrompt: "You are to read the text inside \"email_text\" and output a JSON object containing only one field, fraud, a boolean, representing whether the text contains evidence of fraud or not."
  fields: ["email_text"]
}
Regardless of the approach you choose, the LLM will receive a request that looks like this:
{
  "email_text": "Let's be sure to juice the numbers in our next quarterly earnings report."
}
(Since fields: ["email_text"], any other fields on this Document are not part of the request.)
And the response from the LLM should look like this:
{
  "fraud": true
}
Lucille will then add all key-value pairs in this response JSON into your Document. So, the Document will become:
{
  "id": "emails.csv-85",
  "run-id": "f9538992-5900-459a-90ce-2e8e1a85695c",
  "email_text": "Let's be sure to juice the numbers in our next quarterly earnings report.",
  "fraud": true
}
As you can see, PromptOllama is very versatile, and can be used to enrich your Documents in a lot of ways.
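The request and response handling described in this example can be sketched as follows. This is an illustration in Python rather than Lucille’s actual Java code: build_request and merge_response are hypothetical names, the LLM’s reply is hard-coded instead of being fetched from Ollama Server, and the sketch assumes the reply is a single well-formed JSON object.

```python
import json


def build_request(doc, fields):
    """Serialize only the configured fields of the Document for the LLM."""
    return json.dumps({name: doc[name] for name in fields if name in doc})


def merge_response(doc, llm_response):
    """Add every key-value pair of the LLM's JSON reply to a copy of the Document."""
    enriched = dict(doc)
    # ASSUMPTION: the reply is one JSON object; anything else is not handled here.
    enriched.update(json.loads(llm_response))
    return enriched


doc = {
    "id": "emails.csv-85",
    "run-id": "f9538992-5900-459a-90ce-2e8e1a85695c",
    "email_text": "Let's be sure to juice the numbers in our next quarterly earnings report.",
}

request_body = build_request(doc, ["email_text"])  # "id" and "run-id" are left out
llm_response = '{"fraud": true}'                   # stand-in for the model's reply
enriched = merge_response(doc, llm_response)
print(enriched["fraud"])  # True
```

The sketch also shows why a JSON-only reply matters: a reply with extra prose around the object would fail at json.loads, so there would be nothing to merge.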
2 - QueryOpensearch
Execute an OpenSearch Template using information from a Document, and add the response to it.
OpenSearch Templates
You can use templates in OpenSearch to repeatedly run a certain query using different parameters. For example,
if we have an index full of parks, and we want to search for a certain park, we might use a template like this:
{
  "source": {
    "query": {
      "match_phrase": {
        "park_name": "{{park_to_search}}"
      }
    }
  }
}
In OpenSearch, you could then call this template (providing it park_to_search) instead of writing out the full query each time you want to search.
Templates can also have default values. For example, if you want park_to_search to default to “Central Park” when a value is not provided,
it would be written as: "park_name": "{{park_to_search}}{{^park_to_search}}Central Park{{/park_to_search}}"
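To see why that default-value syntax works, here is a tiny, deliberately incomplete renderer for the two Mustache tag types used above. It is a sketch for illustration only: render is a hypothetical helper, it is not how OpenSearch evaluates templates, and it ignores escaping, regular sections, and every other Mustache feature. It also treats any missing or falsy value as “not provided”, which is a simplification.

```python
import re


def render(template, params):
    """Render {{name}} variables and {{^name}}...{{/name}} inverted sections."""

    def inverted(match):
        name, body = match.group(1), match.group(2)
        # An inverted section keeps its body only when the parameter is absent.
        return "" if params.get(name) else body

    out = re.sub(r"\{\{\^(\w+)\}\}(.*?)\{\{/\1\}\}", inverted, template, flags=re.DOTALL)
    # A plain variable renders as its value, or as nothing when it is missing.
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(params.get(m.group(1), "")), out)


template = '"park_name": "{{park_to_search}}{{^park_to_search}}Central Park{{/park_to_search}}"'

print(render(template, {"park_to_search": "Boston Common"}))  # "park_name": "Boston Common"
print(render(template, {}))                                   # "park_name": "Central Park"
```

When the parameter is present, the variable tag produces the value and the inverted section produces nothing. When it is absent, the roles swap, which is what makes the pair behave like a default.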
QueryOpensearch Stage
The QueryOpensearch Stage executes a search template, using certain fields from a Document as the parameters, and adds OpenSearch’s response to the Document.
You’ll specify either templateName, the name of a search template you’ve saved, or searchTemplate, the template you want to execute, in your Config.
You’ll also need to specify the names of the parameters in your search template. These will need to match the names of fields on your Documents.
If your names don’t match, you can use the RenameFields Stage first.
In particular, you have to specify which parameters are required and which are optional. If a name in requiredParamNames is
missing from a Document, an Exception will be thrown, and the template will not be executed. If a name in optionalParamNames
is missing, it (naturally) won’t be part of the template execution, so OpenSearch will use the template’s default value.
If a parameter without a default value is missing, OpenSearch doesn’t throw an Exception; it just returns an empty response with zero hits.
So, it is very important that requiredParamNames and optionalParamNames are defined carefully!
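The difference between required and optional parameter names can be modeled like this. The sketch is an assumption-laden illustration in Python, not the Stage’s Java implementation: collect_template_params is a hypothetical name, and ValueError stands in for whatever Exception Lucille actually throws.

```python
def collect_template_params(doc, required_param_names, optional_param_names):
    """Gather template parameters from a Document's fields."""
    missing = [name for name in required_param_names if name not in doc]
    if missing:
        # Required names must be present; the template is not executed otherwise.
        raise ValueError(f"Document is missing required parameter(s): {missing}")

    params = {name: doc[name] for name in required_param_names}
    # Optional names are simply left out when absent, so that the template's
    # own default value (if it defines one) applies on the OpenSearch side.
    params.update({name: doc[name] for name in optional_param_names if name in doc})
    return params


doc = {"id": "parks-1", "park_to_search": "Boston Common"}

# "state" is a hypothetical optional parameter that this Document does not have.
print(collect_template_params(doc, ["park_to_search"], ["state"]))
# {'park_to_search': 'Boston Common'}
```

If a template parameter has no default, listing it under the required names makes the problem surface as an Exception in the pipeline instead of as a silent zero-hit response.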