Sajiron
Retrieval-Augmented Generation (RAG) is an emerging technique in AI applications that enhances responses by retrieving relevant documents from a knowledge base before generating an answer. In this post, we'll walk through building a simple RAG system in Spring Boot using Spring AI and Ollama for intelligent document retrieval and response generation.
RAG combines two essential components:
Retrieval: Fetch relevant documents from a stored knowledge base based on the user's query.
Generation: Use a language model to generate responses based on retrieved documents, improving accuracy and reliability.
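The two steps above can be sketched in plain Java before any framework is involved. This is only an illustration under stated assumptions: `RagFlowSketch`, `retrieve`, and `generate` are hypothetical names, the knowledge base is a list of strings, and the "model" is a stand-in function rather than a real language model.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class RagFlowSketch {

    // Retrieval step: keep documents that mention the query (case-insensitive).
    static List<String> retrieve(List<String> knowledgeBase, String query) {
        String needle = query.toLowerCase();
        return knowledgeBase.stream()
                .filter(doc -> doc.toLowerCase().contains(needle))
                .collect(Collectors.toList());
    }

    // Generation step: build a prompt from the retrieved text and pass it to the model.
    static String generate(List<String> context, String query, Function<String, String> model) {
        String prompt = "Context:\n" + String.join("\n", context) + "\n\nUser Query: " + query;
        return model.apply(prompt);
    }

    public static void main(String[] args) {
        List<String> knowledgeBase = List.of(
                "Ollama runs language models locally.",
                "Maven manages Java dependencies.");
        // Stand-in for a real language model: it only reports the prompt length.
        Function<String, String> fakeModel =
                prompt -> "(model saw " + prompt.length() + " characters)";

        List<String> context = retrieve(knowledgeBase, "ollama");
        System.out.println(generate(context, "ollama", fakeModel));
    }
}
```

Because the model is passed in as a function, the retrieval logic can be exercised without a running model, which is handy for unit tests.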
Spring AI is a project within the Spring ecosystem that provides integrations with AI models and frameworks. It enables AI-driven applications by allowing developers to seamlessly integrate machine learning models, generative AI, and natural language processing capabilities within Spring Boot applications.
Unified API for AI Models - Supports multiple model providers such as OpenAI, Ollama, Hugging Face, and Vertex AI.
Seamless Integration with Spring Boot - Works like other Spring projects, using dependency injection and service-based architectures.
Support for Different AI Tasks - Can handle text generation, embeddings, retrieval-augmented generation (RAG), and image generation.
Pluggable Architecture - Enables switching between different AI providers without changing the application logic.
Spring AI makes it easier to build AI-powered applications within a Spring Boot environment, handling communication with AI services efficiently.
Before you begin, ensure you have the following installed on your system:
Ollama - Download and install from Ollama's official website.
Java 21+ - Ensure you have Java Development Kit (JDK) 21 or later installed.
Maven - Install Maven to manage dependencies and build the Spring Boot project.
To use the llama3.1 model with Ollama, follow these steps:
Install Ollama on your system if not already installed. You can download it from Ollama's official website.
Pull the llama3.1 model using the following command:
ollama pull llama3.1
Verify the model is available by running:
ollama list
Ensure that Ollama is running before starting your Spring Boot application.
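The application also has to be told which model to call, otherwise the Ollama starter falls back to its own default model. A minimal application.properties sketch, assuming Ollama listens on its default local port and that these property names match the Spring AI version you are using (check the reference documentation for your release):

```properties
# Address of the local Ollama server (Ollama's default)
spring.ai.ollama.base-url=http://localhost:11434
# Use the model pulled above instead of the starter's default
spring.ai.ollama.chat.options.model=llama3.1
```

The file belongs in src/main/resources of the generated project.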
To generate a Spring Boot application, you can use Spring Initializr. Follow these steps:
Open Spring Initializr in your browser.
Select Maven Project and Java as the language.
Choose Spring Boot 3.x (or the latest stable version).
Add the dependencies:
Spring Web (for building REST APIs)
Spring AI Ollama (for integrating AI models)
Click Generate to download the ZIP file.
Extract the ZIP file and open the project in your preferred IDE.
Let's start by implementing our RagService, which will manage document ingestion, retrieval, and response generation.
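First, ensure you have the following dependencies in your pom.xml. The ${spring-ai.version} property used by the BOM import must be defined in the <properties> section; Spring Initializr adds it when you select a Spring AI dependency. Spring AI 1.0 and later rename the Ollama starter to spring-ai-starter-model-ollama, so use the artifact ID that matches your Spring AI version.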
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-bom</artifactId>
<version>${spring-ai.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
The RagService class performs three key functions:
Ingesting documents: Stores text documents in memory.
Retrieving relevant documents: Returns up to three stored documents whose text contains the query (case-insensitive substring match).
Generating a response: Uses Ollama to generate a response using retrieved documents as context.
package com.springai.demo.services;

import org.springframework.ai.document.Document;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.Objects;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.stream.Collectors;

@Service
public class RagService {

    // In-memory store. The service is a singleton shared by all request threads,
    // so a thread-safe list is used instead of a plain ArrayList.
    private final List<Document> documentStore = new CopyOnWriteArrayList<>();
    private final OllamaChatModel ollamaClient;

    // Spring injects the OllamaChatModel bean through this single constructor.
    public RagService(OllamaChatModel ollamaClient) {
        this.ollamaClient = ollamaClient;
    }

    // Wrap the raw text in a Document and keep it in memory.
    public void ingestDocument(String content) {
        documentStore.add(new Document(content));
    }

    // Return up to three documents whose text contains the whole query
    // as a case-insensitive substring.
    public List<Document> retrieveRelevantDocs(String query) {
        String needle = query.toLowerCase();
        return documentStore.stream()
                .filter(doc -> Objects.requireNonNull(doc.getText()).toLowerCase().contains(needle))
                .limit(3)
                .collect(Collectors.toList());
    }

    // Build a prompt from the retrieved documents and ask the model to answer the query.
    public String generateRagResponse(String query) {
        List<Document> retrievedDocs = retrieveRelevantDocs(query);
        if (retrievedDocs.isEmpty()) {
            return "No relevant documents found.";
        }

        StringBuilder context = new StringBuilder("Context:\n");
        for (Document doc : retrievedDocs) {
            context.append(doc.getText()).append("\n");
        }

        String prompt = context + "\nUser Query: " + query
                + "\nProvide a well-informed answer based on the above context.";
        return ollamaClient.call(prompt);
    }
}
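The substring filter in retrieveRelevantDocs only matches when the entire query appears verbatim in a document, so a full question rarely retrieves anything. One possible alternative, shown here as a sketch rather than as part of the original project, is to score documents by how many query words they share. `KeywordRetriever`, `tokenize`, `score`, and `topMatches` are hypothetical names, and the example works on plain strings so it runs without Spring.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class KeywordRetriever {

    // Split text into lowercase words, dropping punctuation and empty tokens.
    static Set<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase().split("[^a-z0-9]+"))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toSet());
    }

    // Count how many distinct query words also occur in the document.
    static long score(String document, Set<String> queryWords) {
        Set<String> docWords = tokenize(document);
        return queryWords.stream().filter(docWords::contains).count();
    }

    // Return up to `limit` documents sharing at least one word with the query, best match first.
    static List<String> topMatches(List<String> documents, String query, int limit) {
        Set<String> queryWords = tokenize(query);
        return documents.stream()
                .filter(doc -> score(doc, queryWords) > 0)
                .sorted(Comparator.comparingLong((String doc) -> score(doc, queryWords)).reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> documents = List.of(
                "Maven manages dependencies for Java projects.",
                "Spring AI is an abstraction layer for integrating AI models in Spring Boot applications.");
        System.out.println(topMatches(documents, "What is Spring AI?", 3));
    }
}
```

Common words such as "is" or "the" match almost every document, so a real implementation would likely filter stop words or weight rare terms more heavily.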
The RagController exposes three endpoints:
/api/rag/ingest (POST): Adds new documents to the store.
/api/rag/retrieve (POST): Fetches relevant documents.
/api/rag/chat (POST): Generates a response based on retrieved documents.
package com.springai.demo.controllers;

import com.springai.demo.services.RagService;
import org.springframework.ai.document.Document;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.List;
import java.util.Map;

@RestController
@RequestMapping("/api/rag")
public class RagController {

    private final RagService ragService;

    public RagController(RagService ragService) {
        this.ragService = ragService;
    }

    // POST /api/rag/ingest with a body like {"content": "..."}
    @PostMapping("/ingest")
    public ResponseEntity<String> ingestDocument(@RequestBody Map<String, String> request) {
        String content = request.get("content");
        if (content == null || content.isEmpty()) {
            return ResponseEntity.badRequest().body("Missing 'content' field in request body");
        }
        ragService.ingestDocument(content);
        return ResponseEntity.ok("Document ingested successfully!");
    }

    // POST /api/rag/chat with a body like {"query": "..."}
    @PostMapping("/chat")
    public ResponseEntity<String> getRagResponse(@RequestBody Map<String, String> request) {
        String query = request.get("query");
        if (query == null || query.isEmpty()) {
            return ResponseEntity.badRequest().body("Missing 'query' field in request body");
        }
        return ResponseEntity.ok(ragService.generateRagResponse(query));
    }

    // POST /api/rag/retrieve with a body like {"query": "..."}
    @PostMapping("/retrieve")
    public ResponseEntity<List<String>> retrieveRelevantDocs(@RequestBody Map<String, String> request) {
        String query = request.get("query");
        if (query == null || query.isEmpty()) {
            // Without this check a missing query causes a NullPointerException in the service.
            return ResponseEntity.badRequest().build();
        }
        List<String> results = ragService.retrieveRelevantDocs(query)
                .stream()
                .map(Document::getText)
                .toList();
        return ResponseEntity.ok(results);
    }
}
Once your application is running, you can test the API using Postman or curl.
1. Ingest a Document
curl -X POST "http://localhost:8080/api/rag/ingest" \
-H "Content-Type: application/json" \
-d '{"content":"Spring AI is an abstraction layer for integrating AI models in Spring Boot applications."}'
2. Retrieve Relevant Documents
curl -X POST "http://localhost:8080/api/rag/retrieve" \
-H "Content-Type: application/json" \
-d '{"query": "Spring AI"}'
3. Get a RAG-based Response
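curl -X POST "http://localhost:8080/api/rag/chat" \
-H "Content-Type: application/json" \
-d '{"query":"Spring AI"}'
Note: retrieveRelevantDocs matches the whole query as a case-insensitive substring, so a full question such as "What is Spring AI?" does not match the document ingested in step 1 and the endpoint returns "No relevant documents found." A short query that appears verbatim in the document, like "Spring AI", retrieves it.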
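The same requests can also be sent from Java. The sketch below uses the JDK's built-in java.net.http client and assumes the application is running on localhost:8080 as in the curl commands; `RagApiClient` and `jsonPost` are hypothetical names introduced only for this example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class RagApiClient {

    static final String BASE_URL = "http://localhost:8080/api/rag";

    // Build a JSON POST request for one of the endpoints ("/ingest", "/retrieve", "/chat").
    static HttpRequest jsonPost(String path, String jsonBody) {
        return HttpRequest.newBuilder()
                .uri(URI.create(BASE_URL + path))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = jsonPost("/retrieve", "{\"query\": \"Spring AI\"}");
        try {
            // Sending only succeeds while the Spring Boot application is running.
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        } catch (IOException e) {
            System.out.println("Could not reach the application: " + e.getMessage());
        }
    }
}
```

Swapping the path and body for "/ingest" or "/chat" covers the other two endpoints.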
In this tutorial, we built a simple RAG system in Spring Boot using Spring AI and Ollama. This implementation allows users to ingest documents, retrieve relevant content, and generate AI-driven responses. While this is a basic implementation, you can extend it by:
Storing documents in a database for persistence.
Implementing more sophisticated document retrieval techniques (e.g., vector embeddings).
Using a more advanced language model for better responses.
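To give a feel for the vector-embedding option, here is a minimal sketch of similarity ranking with cosine similarity. It is an illustration, not the project's code: `CosineRanker` and its methods are hypothetical names, and the vectors are hand-written toy values where a real system would obtain them from an embedding model (Spring AI has embedding and vector store abstractions for that purpose).

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CosineRanker {

    // Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 if either vector has zero length.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Rank document ids by similarity to the query vector, most similar first.
    static List<String> rank(Map<String, double[]> docVectors, double[] queryVector, int limit) {
        return docVectors.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> cosine(e.getValue(), queryVector))
                        .reversed())
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Toy 3-dimensional vectors; real embeddings have hundreds of dimensions or more.
        Map<String, double[]> docVectors = Map.of(
                "spring-ai", new double[] {0.9, 0.1, 0.0},
                "maven", new double[] {0.0, 0.2, 0.9});
        System.out.println(rank(docVectors, new double[] {1.0, 0.0, 0.0}, 1));
    }
}
```

Unlike substring or keyword matching, this kind of ranking can surface documents that are related in meaning even when they share no words with the query, provided the embeddings capture that meaning.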
If you're interested in AI-driven applications, RAG is a powerful approach for improving response quality using stored knowledge. Try building on this and experiment with different use cases!
Additionally, you can check out the project here: Spring AI RAG Demo
🚀 Happy Coding!