Stages


Lucille Stages

Stages are the building blocks of a Lucille pipeline. Each Stage performs a specific transformation on a Document.

Lucille Stages should have JavaDocs that describe their purpose and the parameters they accept in their Config. This site provides more in-depth documentation for some of the more advanced or complex Lucille Stages.

To configure a Stage, you have to provide its class (under class) in its Config. You can also give the Stage a name, and specify conditions and a conditionPolicy (described below).

You’ll also provide any parameters the Stage needs. For example, the AddRandomBoolean Stage accepts two optional parameters: field_name and percent_true. So, an AddRandomBoolean Config would look something like this:

{
  name: "AddRandomBoolean-First"
  class: "com.kmwllc.lucille.stage.AddRandomBoolean"
  field_name: "rand_bool_1"
  percent_true: 65
}
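Conceptually, each Stage receives a Document and modifies it. The Python sketch below is purely illustrative and is not Lucille’s Java API: it models a Document as a plain dict and mimics what an AddRandomBoolean-style Stage could do with field_name and percent_true, assuming percent_true is the percentage chance of writing true. The function name add_random_boolean and the injectable rng argument are hypothetical, and Lucille’s defaults for the optional parameters are not modeled.

```python
import random
from typing import Any, Optional


def add_random_boolean(doc: dict[str, Any], field_name: str, percent_true: int,
                       rng: Optional[random.Random] = None) -> dict[str, Any]:
    """Hypothetical stand-in for an AddRandomBoolean-style Stage.

    Writes True to doc[field_name] with probability percent_true / 100,
    otherwise False (assumed semantics).
    """
    if not 0 <= percent_true <= 100:
        raise ValueError("percent_true must be between 0 and 100")
    if rng is None:
        rng = random.Random()
    # rng.random() returns a float in [0.0, 1.0), so percent_true=100 always
    # yields True and percent_true=0 always yields False.
    doc[field_name] = rng.random() < percent_true / 100
    return doc


doc = add_random_boolean({"id": "doc-1"}, "rand_bool_1", 65)
```

Passing a seeded random.Random makes the output reproducible, which is handy when testing a pipeline.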

Conditions

For any Stage, you can specify “conditions” in its Config, controlling when the Stage will process a Document. Each condition has a required parameter, fields, and two optional parameters, operator and values.

fields is a list of field names that will determine whether the Stage applies to a Document.
values is a list of values to look for in the conditional fields. (If not specified, only the existence of the fields is checked.)
operator is either "must" or "must_not" (defaults to "must").

In the root of the Stage’s Config, you can also specify a conditionPolicy, either "any" or "all", which determines whether any or all of your conditions must be met for the Stage to process a Document. (Defaults to "any".)

Let’s say we are running the Print Stage, but we only want it to execute on a Document where city = Boston or city = New York. Our Config for this Stage would look something like this:

{
  name: "print-1"
  class: "com.kmwllc.lucille.stage.Print"
  conditions: [
    {
      fields: ["city"]
      values: ["Boston", "New York"]
    }
  ]
}
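The Python sketch below models one plausible reading of these rules, with a Document as a dict mapping field names to a value or a list of values. Several details are assumptions not confirmed on this page: under "must", a condition is taken to be satisfied when at least one listed field is present and, if values is given, at least one of that field’s values appears in values; "must_not" simply negates that result; and a Stage with no conditions processes every Document. The function names condition_met and should_process are hypothetical.

```python
from typing import Any, Optional

# Hypothetical model: a Document is a dict of field name -> value (or list of values).
Document = dict[str, Any]


def _values_of(doc: Document, field: str) -> list[Any]:
    """Return a field's values as a list, wrapping single values."""
    value = doc[field]
    return value if isinstance(value, list) else [value]


def condition_met(doc: Document, condition: dict[str, Any]) -> bool:
    """Evaluate one condition under the assumed semantics described above."""
    fields: list[str] = condition["fields"]                # required
    values: Optional[list[Any]] = condition.get("values")  # optional
    operator: str = condition.get("operator", "must")      # defaults to "must"
    if operator not in ("must", "must_not"):
        raise ValueError(f"unknown operator: {operator!r}")

    present = [f for f in fields if f in doc]
    if values is None:
        # No values given: only check that at least one of the fields exists.
        matched = bool(present)
    else:
        # Otherwise, some value of some present field must appear in `values`.
        matched = any(v in values for f in present for v in _values_of(doc, f))
    return matched if operator == "must" else not matched


def should_process(doc: Document, conditions: list[dict[str, Any]],
                   condition_policy: str = "any") -> bool:
    """Combine the conditions using conditionPolicy ("any" by default)."""
    if not conditions:
        return True  # assumption: no conditions means the Stage always runs
    results = [condition_met(doc, c) for c in conditions]
    if condition_policy == "any":
        return any(results)
    if condition_policy == "all":
        return all(results)
    raise ValueError(f"unknown conditionPolicy: {condition_policy!r}")


print_conditions = [{"fields": ["city"], "values": ["Boston", "New York"]}]
```

With print_conditions, a Document whose city is "Boston" would be printed, while one whose city is "Chicago", or one with no city field at all, would be skipped.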

1 - PromptOllama

Connect to an Ollama Server and send a Document to an LLM for enrichment.

What if you could just, actually, put an LLM on everything?

Ollama

Ollama allows you to run a variety of Large Language Models (LLMs) with minimal setup. You can also create custom models using Modelfiles and system prompts.

The PromptOllama Stage allows you to connect to a running instance of Ollama Server, which communicates with an LLM through a simple API. The Stage sends part (or all) of a Document to the LLM for generic enrichment. You’ll want to create a custom model (with a Modelfile) or provide a System Prompt in the Stage Config that is tailored to your pipeline.

We strongly recommend having the LLM output only a JSON object, for two main reasons. First, LLMs tend to follow instructions more closely when told to respond with only JSON. Second, Lucille can then parse the JSON response and fully integrate it into your Document.

Example

Let’s say you are working with Documents that represent emails, and you want to monitor them for potential signs of fraud. Lucille doesn’t have a DetectFraud Stage (at the time of writing), but you can use PromptOllama to add this information with an LLM.

Modelfile: Let’s say you created a custom model, fraud_detector, in your instance of Ollama Server. In its Modelfile, you instruct the model to check the contents for fraud and output a JSON object containing just a boolean value (under fraud). Your Stage would be configured like so:

{
  name: "Ollama-Fraud"
  class: "com.kmwllc.lucille.stage.PromptOllama"
  hostURL: "http://localhost:11434"
  modelName: "fraud_detector"
  fields: ["email_text"]
}

System Prompt: You can also reference a specific LLM directly and provide a system prompt in the Stage Config.

{
  name: "Ollama-Fraud"
  class: "com.kmwllc.lucille.stage.PromptOllama"
  hostURL: "http://localhost:11434"
  modelName: "gemma3"
  systemPrompt: "You are to read the text inside \"email_text\" and output a JSON object containing only one field, fraud, a boolean, representing whether the text contains evidence of fraud or not."
  fields: ["email_text"]
}

Regardless of the approach you choose, the LLM will receive a request that looks like this:

{
  "email_text": "Let's be sure to juice the numbers in our next quarterly earnings report."
}

(Because fields is set to ["email_text"], no other fields on this Document are part of the request.)
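To make the request concrete, here is a Python sketch that selects the configured fields and wraps them in a request body for Ollama’s /api/generate REST endpoint (using its model, prompt, system, format, and stream fields). This wire format is an assumption for illustration only: PromptOllama may talk to Ollama differently, for example through a client library. The names build_ollama_payload and send_to_ollama are hypothetical, as is the choice to skip configured fields that a Document lacks.

```python
import json
import urllib.request
from typing import Any, Optional


def build_ollama_payload(doc: dict[str, Any], fields: Optional[list[str]],
                         model_name: str,
                         system_prompt: Optional[str] = None) -> dict[str, Any]:
    """Build a hypothetical /api/generate request body for one Document."""
    # Send only the configured fields (missing ones are skipped); None sends the whole Document.
    selected = dict(doc) if fields is None else {f: doc[f] for f in fields if f in doc}
    payload: dict[str, Any] = {
        "model": model_name,
        "prompt": json.dumps(selected),  # the selected fields, serialized as JSON text
        "format": "json",                # ask Ollama to constrain the reply to JSON
        "stream": False,                 # return one complete reply, not a stream of chunks
    }
    if system_prompt is not None:
        payload["system"] = system_prompt  # overrides the SYSTEM prompt from the Modelfile
    return payload


def send_to_ollama(host_url: str, payload: dict[str, Any]) -> str:
    """POST the payload to a running Ollama Server and return the model's reply text."""
    request = urllib.request.Request(
        host_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling send_to_ollama("http://localhost:11434", build_ollama_payload(doc, ["email_text"], "fraud_detector")) requires a running Ollama Server with that model available; building the payload alone does not.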

And the response from the LLM should look like this:

{
  "fraud": true
}

Lucille will then add all key-value pairs in this response JSON to your Document. So, the Document will become:

{
  "id": "emails.csv-85",
  "run-id": "f9538992-5900-459a-90ce-2e8e1a85695c",
  "email_text": "Let's be sure to juice the numbers in our next quarterly earnings report.",
  "fraud": true
}
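The merge step can be sketched in Python as follows, again treating a Document as a dict. The function name merge_llm_response is hypothetical. This page does not say how Lucille handles a reply that isn’t a JSON object or a key that already exists on the Document, so raising ValueError and overwriting existing fields are assumptions of this sketch. The first case also shows why asking the model for JSON only matters.

```python
import json
from typing import Any


def merge_llm_response(doc: dict[str, Any], response_text: str) -> dict[str, Any]:
    """Parse the LLM's reply as JSON and copy every key-value pair onto doc."""
    try:
        parsed = json.loads(response_text)
    except json.JSONDecodeError as e:
        # e.g. the model wrapped its JSON in chatty prose
        raise ValueError(f"LLM response is not valid JSON: {e}") from e
    if not isinstance(parsed, dict):
        raise ValueError("LLM response must be a JSON object")
    doc.update(parsed)  # assumption: existing fields with the same name are overwritten
    return doc


enriched = merge_llm_response(
    {"id": "emails.csv-85",
     "email_text": "Let's be sure to juice the numbers in our next quarterly earnings report."},
    '{"fraud": true}',
)
```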

As you can see, PromptOllama is very versatile and can enrich your Documents in many ways.

2 - QueryOpensearch

Execute an OpenSearch Template using information from a Document, and add the response to it.

OpenSearch Templates

You can use templates in OpenSearch to repeatedly run a certain query using different parameters. For example, if we have an index full of parks, and we want to search for a certain park, we might use a template like this:

{
  "source": {
    "query": {
      "match_phrase": {
        "park_name": "{{park_to_search}}"
      }
    }
  }
}

In OpenSearch, you could then call this template (passing a value for park_to_search) instead of writing out the full query each time you want to search.

Templates can also have default values. For example, if you want park_to_search to default to “Central Park” when a value is not provided, it would be written as: "park_name": "{{park_to_search}}{{^park_to_search}}Central Park{{/park_to_search}}"
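OpenSearch search templates use Mustache syntax, where {{^name}}...{{/name}} is an "inverted section" that renders only when name is missing or empty. The Python sketch below is a deliberately tiny renderer that supports just plain variables and inverted sections. It is not OpenSearch’s template engine: it does no escaping and ignores the rest of Mustache. The function name render is hypothetical. It shows both the default kicking in and what happens to a variable that has no default when its parameter is missing.

```python
import re
from typing import Any

# {{^name}}body{{/name}}: an inverted section (rendered only when name is missing/falsy)
_INVERTED = re.compile(r"\{\{\^(\w+)\}\}(.*?)\{\{/\1\}\}", re.DOTALL)
# {{name}}: a plain variable
_VARIABLE = re.compile(r"\{\{(\w+)\}\}")


def render(template: str, params: dict[str, Any]) -> str:
    """Render a tiny Mustache subset: variables and inverted sections only."""
    # Keep an inverted section's body only when its parameter is missing or falsy.
    out = _INVERTED.sub(lambda m: "" if params.get(m.group(1)) else m.group(2), template)
    # Substitute variables; a missing parameter renders as the empty string.
    return _VARIABLE.sub(lambda m: str(params.get(m.group(1), "")), out)


park_template = '"park_name": "{{park_to_search}}{{^park_to_search}}Central Park{{/park_to_search}}"'
```

Rendering park_template with {"park_to_search": "Yellowstone"} gives "park_name": "Yellowstone", and rendering it with {} falls back to "Central Park". Without the inverted section, a missing parameter would leave an empty string in the query.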

QueryOpensearch Stage

The QueryOpensearch Stage executes a search template, using certain fields from a Document as its parameters, and adds OpenSearch’s response to the Document. In your Config, you’ll specify either templateName, the name of a search template you’ve saved, or searchTemplate, the template you want to execute.

You’ll also need to specify the names of parameters in your search template. These will need to match the names of fields on your Documents. If your names don’t match, you can use the RenameFields Stage first.

In particular, you have to specify which parameters are required and which are optional. If a name in requiredParamNames is missing from a Document, an Exception is thrown and the template is not executed. If a name in optionalParamNames is missing, it (naturally) won’t be part of the template execution, so OpenSearch will use that parameter’s default value.

If a parameter without a default value is missing, OpenSearch doesn’t throw an Exception; it just returns an empty response with zero hits. So, it is very important to define requiredParamNames and optionalParamNames carefully!
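The parameter handling described above can be sketched in Python as two hypothetical helpers. build_template_params enforces requiredParamNames and includes an optionalParamNames entry only when the Document has it. build_search_template_body then wraps the params in the request body that OpenSearch’s _search/template endpoint accepts, using "id" for a stored template or "source" for an inline one. The helper names, MissingParamError, the "exactly one of" check, and the example template name park_search are all illustrative assumptions, not QueryOpensearch’s actual code.

```python
from typing import Any, Optional


class MissingParamError(Exception):
    """Hypothetical error for a Document that lacks a required parameter."""


def build_template_params(doc: dict[str, Any], required_param_names: list[str],
                          optional_param_names: list[str]) -> dict[str, Any]:
    missing = [name for name in required_param_names if name not in doc]
    if missing:
        # Fail before calling OpenSearch instead of silently getting zero hits.
        raise MissingParamError(f"Document is missing required params: {missing}")
    params = {name: doc[name] for name in required_param_names}
    # Optional params are included only when present, so template defaults apply otherwise.
    params.update({name: doc[name] for name in optional_param_names if name in doc})
    return params


def build_search_template_body(params: dict[str, Any],
                               template_name: Optional[str] = None,
                               search_template: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    if (template_name is None) == (search_template is None):
        raise ValueError("specify exactly one of template_name or search_template")
    body: dict[str, Any] = {"params": params}
    if template_name is not None:
        body["id"] = template_name        # a stored search template
    else:
        body["source"] = search_template  # an inline search template
    return body


park_doc = {"id": "parks.csv-1", "park_to_search": "Yellowstone", "state": "WY"}
park_params = build_template_params(park_doc, ["park_to_search"], ["state", "country"])
```

The resulting body would be sent to POST /<index>/_search/template. Checking required parameters up front turns a silent zero-hit result into an explicit error.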