In recent years, the integration of artificial intelligence (AI) in various fields has opened up new possibilities. One intriguing application of AI is generating domain-specific language (DSL) documents. DSLs target specific domains or problem spaces (such as enterprise integration), letting developers express solutions more naturally and without the cognitive overhead of boilerplate code. In this article, we will explore the idea of using AI to generate Apache Camel DSL.

ChatGPT and Camel

One of the primary areas where AI can be applied is generating DSL code. AI models trained on existing Apache Camel DSL codebases can learn the patterns, syntax, and semantic relationships unique to the language, and then generate new DSL snippets or even complete routes from specific inputs or requirements.

To experiment with this idea, the following prompt was fed to ChatGPT:

Write an Apache Camel DSL that logs a message every 5 seconds.

The answer was as follows:

import org.apache.camel.builder.RouteBuilder;

public class LogRoute extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("timer:myTimer?period=5000")
           .setBody().constant("Logging a message!")
           .to("log:myLogger");
    }
}

Encouraged by the result, I tried something more complex: a basic real-world question that comes up on mailing lists from time to time:

Also write an Apache Camel DSL that transforms CSV to XML.

The answer was:

import org.apache.camel.builder.RouteBuilder;

public class CsvToXmlRoute extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("file:/inputDirectory?noop=true")
            .unmarshal().csv() // Unmarshal CSV file into a List<List<String>>
            .marshal().jaxb() // Marshal List<List<String>> into XML
            .to("file:/outputDirectory?fileName=output.xml");
    }
}

Even though the answer assumes a lot of details, it is more or less correct. For example, it assumes that JAXB can marshal the List<List<String>> produced by the CSV data format, whereas JAXB normally needs annotated classes that describe the XML structure. To be fair, the prompt was quite generic (as most requests for help with a given technology are) and could have been refined to get a more precise answer. Nevertheless, the results were promising enough that I had to see whether something similar could be achieved with open components running on my laptop.
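As a rough illustration of the step the generated route glosses over, here is a minimal sketch that turns the List<List<String>> produced by unmarshal().csv() into XML with plain Java, without relying on JAXB. The class name CsvRowsToXml and the <rows>, <row> and <field> element names are hypothetical choices, not part of Camel or of the generated answer.

```java
import java.util.List;

// Hypothetical helper: converts parsed CSV rows into a simple XML string.
public class CsvRowsToXml {

    // Escapes the five characters that are special in XML text.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds <rows><row><field>...</field></row></rows>, one <row> per CSV line.
    public static String toXml(List<List<String>> rows) {
        StringBuilder xml = new StringBuilder("<rows>");
        for (List<String> row : rows) {
            xml.append("<row>");
            for (String field : row) {
                xml.append("<field>").append(escape(field)).append("</field>");
            }
            xml.append("</row>");
        }
        return xml.append("</rows>").toString();
    }
}
```

In a route, such a helper could be called from a .process(...) step in place of .marshal().jaxb(). For example, toXml(List.of(List.of("id", "name"), List.of("1", "Tom & Jerry"))) returns <rows><row><field>id</field><field>name</field></row><row><field>1</field><field>Tom &amp; Jerry</field></row></rows>.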

PrivateGPT and Camel

The space of self-hosted large language model (LLM) solutions is quite new and evolving rapidly, with new projects appearing almost daily. I decided to experiment with PrivateGPT, although there are other promising options such as serge chat and oobabooga text generation web UI.

According to its README, PrivateGPT is configured through an .env file. The first experiments used the following default settings:

PERSIST_DIRECTORY=../db
MODEL_TYPE=GPT4All
MODEL_PATH=../models/ggml-gpt4all-j-v1.3-groovy.bin
EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2
MODEL_N_CTX=1000
TARGET_SOURCE_CHUNKS=4

With these settings the results were not encouraging: the answers were too generic.

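Fortunately, PrivateGPT makes it easy to improve the answers by ingesting additional, domain-specific documents: they are embedded into a local vector store (PERSIST_DIRECTORY), and the most relevant chunks are retrieved as context for each question. Switching models is just as easy, since it only requires changing this configuration.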
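The EMBEDDINGS_MODEL_NAME and TARGET_SOURCE_CHUNKS settings govern PrivateGPT's retrieval step. As a simplified illustration only (PrivateGPT itself is written in Python and delegates this to its vector store), the sketch below ranks pre-computed chunk embeddings by cosine similarity to the question's embedding and keeps the top k, which is the role TARGET_SOURCE_CHUNKS=4 plays. Cosine similarity is an assumption here (a common choice; the vector store may use another distance), and the TopKRetrieval and Chunk names are hypothetical. Real embeddings would come from a model such as all-MiniLM-L6-v2 rather than being written by hand.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TopKRetrieval {

    // An ingested text chunk with its pre-computed embedding (hypothetical type).
    record Chunk(String text, double[] embedding) {}

    // Cosine similarity of two equal-length vectors (NaN if either is all zeros).
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the k chunks most similar to the query embedding, most similar first.
    static List<Chunk> topK(double[] query, List<Chunk> chunks, int k) {
        List<Chunk> ranked = new ArrayList<>(chunks);
        ranked.sort(Comparator.comparingDouble((Chunk c) -> cosine(query, c.embedding())).reversed());
        return ranked.subList(0, Math.min(k, ranked.size()));
    }
}
```

Sorting every chunk is fine for a sketch; vector stores typically use approximate nearest-neighbour indexes to avoid scanning all chunks.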

After some experimenting with different models, and after ingesting the Apache Camel web site and examples as additional material, I ended up with the following configuration:

PERSIST_DIRECTORY=../db
MODEL_TYPE=LlamaCpp
MODEL_PATH=../models/WizardLM-13B-1.0.ggmlv3.q4_1.bin
EMBEDDINGS_MODEL_NAME=all-mpnet-base-v2
MODEL_N_CTX=1000
TARGET_SOURCE_CHUNKS=4

This configuration uses the https://huggingface.co/TheBloke/wizardLM-13B-1.0-GGML model, which proved to work better.

DISCLAIMER: all the LLaMA models are licensed for research use only, which prevents commercial use of this model.

With it, PrivateGPT answered the same prompt used with ChatGPT:

Write an Apache Camel DSL that logs a message every 5 seconds.

with this:

To log a message every 5 seconds using Apache Camel DSL, you can use a combination of the quartz and the log components. Here is an example of how to do this:

from("quartz://myJob?cron=0/5+0+0+1+0&jobName=MyJob")
	.to("log:info?showAll=true");

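That snippet is close, but not quite correct. It uses the quartz component instead of the simpler timer component, and its cron expression (0/5 0 0 1 0) has only five fields, while Quartz expects six or seven starting with seconds, so it would not fire every 5 seconds; the Quartz equivalent is 0/5 * * * * ?. Changing the prompt a little bit: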

Write an Apache Camel route from timer each 5 seconds to logging.

This produced a result very close to ChatGPT's:

import org.apache.camel.builder.RouteBuilder;

public class MyRouteBuilder extends RouteBuilder {

    @Override
    public void configure() throws Exception {
        from("timer:foo?period=5000")
            .log("${body}");
    }
}

On my machine, which has no dedicated CUDA-capable NVIDIA GPU, generating the answers took about 1.5 minutes.
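A note on the quartz endpoint from the first PrivateGPT answer: Camel endpoint URIs cannot contain spaces, so the fields of a Quartz cron expression are separated by + in the cron option. The hypothetical helper below (QuartzUris and quartzUri are illustrative names, not Camel API) builds such a URI and rejects expressions without the six or seven fields Quartz expects (seconds, minutes, hours, day of month, month, day of week, optional year), such as the five-field expression in that answer. It is a sketch of a sanity check, not a full cron validator.

```java
public class QuartzUris {

    // Builds a Camel quartz endpoint URI, replacing the spaces between cron
    // fields with '+'. Only the field count is checked, not the field values.
    static String quartzUri(String jobName, String cron) {
        String[] fields = cron.trim().split("\\s+");
        // Quartz: sec min hour day-of-month month day-of-week [year]
        if (fields.length < 6 || fields.length > 7) {
            throw new IllegalArgumentException(
                "Quartz cron needs 6 or 7 fields, got " + fields.length + ": " + cron);
        }
        return "quartz://" + jobName + "?cron=" + String.join("+", fields);
    }
}
```

With it, a route equivalent to the timer examples could start from quartzUri("myJob", "0/5 * * * * ?"), which yields quartz://myJob?cron=0/5+*+*+*+*+?.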

More experiments and conclusions

With a more complex prompt, the quality of the answers dropped compared to ChatGPT, but there is room for improvement by experimenting with settings, additional resources, and the new models that come out almost daily (see the Open LLM Leaderboard).

AI techniques, particularly Natural Language Processing (NLP), can also contribute to DSL document generation. NLP models can interpret natural language descriptions of desired DSL functionality, bridging the gap between non-technical stakeholders and developers. By analyzing and translating plain English descriptions into corresponding DSL code, AI can facilitate effective communication of domain requirements and simplify the development process.

Another area where AI can make an impact is in the generation of DSL documentation. AI models can analyze existing Apache Camel DSL documentation, such as official guides, user manuals, and community discussions. By understanding the structure and content of these documents, AI models can generate new and relevant documentation for DSL elements, options, and usage patterns. This automation can improve the availability and quality of documentation, benefiting both developers and end-users.

Similarly, AI can be used to generate unit tests and usage examples for a given Apache Camel route.

In theory, AI techniques could also be used for DSL optimization and refactoring. AI models could analyze large code repositories, identify performance bottlenecks, suggest alternative constructs or components, and propose architectural changes. By helping developers write more efficient and maintainable DSL code, AI could streamline the development process and improve the overall quality of DSL applications.