Sajiron

13 min read · Published on Feb 14, 2025

Building a Simple RAG System in Spring Boot with Ollama


Retrieval-Augmented Generation (RAG) is an emerging technique in AI applications that enhances responses by retrieving relevant documents from a knowledge base before generating an answer. In this post, we'll walk through building a simple RAG system in Spring Boot using Spring AI and Ollama for intelligent document retrieval and response generation.

What is RAG?

RAG combines two essential components:

Retrieval: Fetch relevant documents from a stored knowledge base based on the user's query.

Generation: Use a language model to generate responses based on retrieved documents, improving accuracy and reliability.
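To make the two steps concrete, here is a minimal plain-Java sketch with no AI service involved. The class name, the method names, and the stand-in "model" function are all hypothetical and only illustrate the flow: retrieve matching text first, then pass it along with the question.

```java
import java.util.List;
import java.util.function.Function;

public class RagFlowSketch {

    // Retrieval step: keep documents that mention the keyword (case-insensitive).
    public static List<String> retrieve(List<String> knowledgeBase, String keyword) {
        String needle = keyword.toLowerCase();
        return knowledgeBase.stream()
                .filter(doc -> doc.toLowerCase().contains(needle))
                .toList();
    }

    // Generation step: prepend the retrieved text to the question and hand it to a model.
    // The model is a plain function here so the sketch runs without any AI backend.
    public static String generate(List<String> context, String question, Function<String, String> model) {
        String prompt = "Context:\n" + String.join("\n", context) + "\n\nQuestion: " + question;
        return model.apply(prompt);
    }

    public static void main(String[] args) {
        List<String> knowledgeBase = List.of(
                "Ollama runs large language models locally.",
                "Maven manages Java build dependencies.");

        List<String> context = retrieve(knowledgeBase, "ollama");

        // Stand-in "model" that only reports how much text it received.
        Function<String, String> fakeModel =
                prompt -> "Received a prompt of " + prompt.length() + " characters.";

        System.out.println(generate(context, "What does Ollama do?", fakeModel));
    }
}
```

In the real application, the stand-in function is replaced by a call to the language model.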

What is Spring AI?

Spring AI is a project within the Spring ecosystem that provides integrations with AI models and frameworks. It enables AI-driven applications by allowing developers to seamlessly integrate machine learning models, generative AI, and natural language processing capabilities within Spring Boot applications.

Key Features of Spring AI

Unified API for AI Models - Supports various model providers such as OpenAI, Ollama, Hugging Face, and Vertex AI.

Seamless Integration with Spring Boot - Works like other Spring projects, using dependency injection and service-based architectures.

Support for Different AI Tasks - Can handle text generation, embeddings, retrieval-augmented generation (RAG), and image generation.

Pluggable Architecture - Enables switching between different AI providers without changing the application logic.
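The pluggable idea is easiest to see in code. The sketch below is not Spring AI's actual API. It is a hypothetical, stripped-down illustration in which the application logic depends on one interface and the provider behind it can be swapped freely.

```java
import java.util.Map;

public class PluggableProviderSketch {

    // Hypothetical abstraction: the application only ever sees this interface.
    public interface TextGenerator {
        String generate(String prompt);
    }

    // Two stand-in providers. Real ones would call a local or a hosted model.
    public static class LocalProvider implements TextGenerator {
        public String generate(String prompt) {
            return "[local] " + prompt;
        }
    }

    public static class HostedProvider implements TextGenerator {
        public String generate(String prompt) {
            return "[hosted] " + prompt;
        }
    }

    // Application logic depends on the interface, so it stays the same when the provider changes.
    public static String summarize(TextGenerator generator, String text) {
        return generator.generate("Summarize: " + text);
    }

    public static void main(String[] args) {
        Map<String, TextGenerator> providers = Map.of(
                "local", new LocalProvider(),
                "hosted", new HostedProvider());

        // Choosing a provider becomes a configuration decision, for example a property value.
        String configured = "local";
        System.out.println(summarize(providers.get(configured), "RAG in one line"));
    }
}
```

Changing the configured value to "hosted" switches the provider without touching summarize.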

Spring AI makes it easier to build AI-powered applications within a Spring Boot environment, handling communication with AI services efficiently.

Prerequisites

Before you begin, ensure you have the following installed on your system:

Ollama - Download and install from Ollama's official website.

Java 21+ - Ensure you have Java Development Kit (JDK) 21 or later installed.

Maven - Install Maven to manage dependencies and build the Spring Boot project.

Setting Up Ollama Model

To use the llama3.1 model with Ollama, follow these steps:

Install Ollama on your system if not already installed. You can download it from Ollama's official website.

Pull the llama3.1 model using the following command:

ollama pull llama3.1

Verify the model is available by running:

ollama list

Ensure that Ollama is running before starting your Spring Boot application.
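If you would rather check from Java that Ollama is up, a small probe like the one below can help. It is a sketch under two assumptions: Ollama listens on its default local address (http://localhost:11434), and its /api/tags endpoint, which lists locally available models, responds with HTTP 200. The class and method names are made up for this example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class OllamaHealthCheck {

    // Returns true if an HTTP GET to <baseUrl>/api/tags answers with status 200.
    public static boolean isReachable(String baseUrl) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/api/tags"))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        try {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200;
        } catch (IOException e) {
            // Connection refused, timeout, and similar failures all mean "not reachable".
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: Ollama's default local address.
        String baseUrl = "http://localhost:11434";
        System.out.println(isReachable(baseUrl)
                ? "Ollama is reachable"
                : "Ollama is not reachable at " + baseUrl);
    }
}
```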

Generating a Spring Boot Application

To generate a Spring Boot application, you can use Spring Initializr. Follow these steps:

Open Spring Initializr in your browser.

Select Maven Project and Java as the language.

Choose Spring Boot 3.x (or the latest stable version).

Add the dependencies:

Spring Web (for building REST APIs)

Spring AI Ollama (for integrating AI models)

Click Generate to download the ZIP file.

Extract the ZIP file and open the project in your preferred IDE.

Setting Up the Spring Boot Application

Our application centers on a RagService, which manages document ingestion, retrieval, and response generation, plus a REST controller that exposes it.

Dependencies

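First, ensure you have the following dependencies in your pom.xml. The ${spring-ai.version} property used by the BOM import must be defined in the <properties> section; a project generated by Spring Initializr with the Ollama dependency normally includes it.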

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>

    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    </dependency>

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
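The service relies on the starter's auto-configured OllamaChatModel, so the model name comes from configuration rather than code. Below is a minimal sketch of src/main/resources/application.properties, assuming Ollama runs on its default local port and that you want the llama3.1 model pulled earlier. To my knowledge, the starter otherwise falls back to its own default model.

```properties
# Where the Ollama server listens (Ollama's default local address)
spring.ai.ollama.base-url=http://localhost:11434
# The chat model to use; it should match a model shown by "ollama list"
spring.ai.ollama.chat.options.model=llama3.1
```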

Implementing the RAG Service

The RagService class performs three key functions:

Ingesting documents: Stores text documents in memory.

Retrieving relevant documents: Searches for stored documents matching a given query.

Generating a response: Uses Ollama to generate a response using retrieved documents as context.

package com.springai.demo.services;

import org.springframework.ai.document.Document;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.stereotype.Service;

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

@Service
public class RagService {

    // In-memory store. The service is a singleton shared across requests,
    // so a thread-safe list is used.
    private final List<Document> documentStore = new CopyOnWriteArrayList<>();
    private final OllamaChatModel ollamaClient;

    // Constructor injection; @Autowired is optional when there is a single constructor.
    public RagService(OllamaChatModel ollamaClient) {
        this.ollamaClient = ollamaClient;
    }

    public void ingestDocument(String content) {
        documentStore.add(new Document(content));
    }

    // Returns up to three documents that contain at least one keyword from the query.
    public List<Document> retrieveRelevantDocs(String query) {
        // Match on individual words rather than the whole query string, so a question
        // such as "What is Spring AI?" still finds a document that mentions "Spring".
        List<String> terms = Arrays.stream(query.toLowerCase().split("\\W+"))
                .filter(term -> term.length() > 2) // skip very short words such as "is"
                .toList();

        return documentStore.stream()
                .filter(doc -> {
                    String text = doc.getText();
                    if (text == null) {
                        return false;
                    }
                    String lower = text.toLowerCase();
                    return terms.stream().anyMatch(lower::contains);
                })
                .limit(3)
                .toList();
    }

    public String generateRagResponse(String query) {
        List<Document> retrievedDocs = retrieveRelevantDocs(query);

        if (retrievedDocs.isEmpty()) {
            return "No relevant documents found.";
        }

        // Concatenate the retrieved documents into a context block for the prompt.
        StringBuilder context = new StringBuilder("Context:\n");
        for (Document doc : retrievedDocs) {
            context.append(doc.getText()).append("\n");
        }

        String prompt = context + "\nUser Query: " + query + "\nProvide a well-informed answer based on the above context.";

        return ollamaClient.call(prompt);
    }
}
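Whether a question-style query finds anything depends on how the query is compared with each document. The standalone sketch below contrasts matching on the whole query string with matching on individual words. The class and method names are hypothetical and not part of the project.

```java
import java.util.Arrays;

public class QueryMatchingSketch {

    // Whole-query matching: true only if the document contains the query verbatim.
    public static boolean matchesWholeQuery(String document, String query) {
        return document.toLowerCase().contains(query.toLowerCase());
    }

    // Keyword matching: true if the document contains any query word longer than two characters.
    public static boolean matchesAnyKeyword(String document, String query) {
        String text = document.toLowerCase();
        return Arrays.stream(query.toLowerCase().split("\\W+"))
                .filter(word -> word.length() > 2)
                .anyMatch(text::contains);
    }

    public static void main(String[] args) {
        String doc = "Spring AI is an abstraction layer for integrating AI models in Spring Boot applications.";

        System.out.println(matchesWholeQuery(doc, "Spring AI"));          // true
        System.out.println(matchesWholeQuery(doc, "What is Spring AI?")); // false
        System.out.println(matchesAnyKeyword(doc, "What is Spring AI?")); // true
    }
}
```

Keyword matching is still crude: common words like "what" can match unrelated documents. It is enough for a demo, but embeddings are the usual next step.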

Building the REST Controller

The RagController exposes three endpoints:

/api/rag/ingest (POST): Adds new documents to the store.

/api/rag/retrieve (POST): Fetches relevant documents.

/api/rag/chat (POST): Generates a response based on retrieved documents.

package com.springai.demo.controllers;

import com.springai.demo.services.RagService;
import org.springframework.ai.document.Document;
import org.springframework.web.bind.annotation.*;
import org.springframework.http.ResponseEntity;

import java.util.List;
import java.util.Map;

@RestController
@RequestMapping("/api/rag")
public class RagController {

    private final RagService ragService;

    public RagController(RagService ragService) {
        this.ragService = ragService;
    }

    @PostMapping("/ingest")
    public ResponseEntity<String> ingestDocument(@RequestBody Map<String, String> request) {
        String content = request.get("content");
        if (content == null || content.isEmpty()) {
            return ResponseEntity.badRequest().body("Missing 'content' field in request body");
        }
        ragService.ingestDocument(content);
        return ResponseEntity.ok("Document ingested successfully!");
    }

    @PostMapping("/chat")
    public ResponseEntity<String> getRagResponse(@RequestBody Map<String, String> request) {
        String query = request.get("query");
        if (query == null || query.isEmpty()) {
            return ResponseEntity.badRequest().body("Missing 'query' field in request body");
        }
        return ResponseEntity.ok(ragService.generateRagResponse(query));
    }

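    @PostMapping("/retrieve")
    public ResponseEntity<List<String>> retrieveRelevantDocs(@RequestBody Map<String, String> request) {
        String query = request.get("query");
        // Reject requests without a query, as the other endpoints do;
        // a null query would otherwise cause a NullPointerException in the service.
        if (query == null || query.isEmpty()) {
            return ResponseEntity.badRequest().build();
        }

        List<String> results = ragService.retrieveRelevantDocs(query)
                .stream()
                .map(Document::getText)
                .toList();

        return ResponseEntity.ok(results);
    }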
}

Testing the API

Once your application is running, you can test the API using Postman or curl.

1. Ingest a Document

curl -X POST "http://localhost:8080/api/rag/ingest" \
     -H "Content-Type: application/json" \
     -d '{"content":"Spring AI is an abstraction layer for integrating AI models in Spring Boot applications."}'

2. Retrieve Relevant Documents

curl -X POST "http://localhost:8080/api/rag/retrieve" \
     -H "Content-Type: application/json" \
     -d '{"query": "Spring AI"}'

3. Get a RAG-based Response

curl -X POST "http://localhost:8080/api/rag/chat" \
     -H "Content-Type: application/json" \
     -d '{"query":"What is Spring AI?"}'
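If you prefer to exercise the endpoints from Java rather than curl, the JDK's built-in HttpClient is enough. The sketch below assumes the application runs on localhost:8080, as in the curl examples. The class and helper names are made up for illustration, and the JSON is built by hand only to avoid extra dependencies.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RagApiClientSketch {

    // Builds a one-field JSON object. Only backslashes and double quotes are escaped;
    // a JSON library is the safer choice for arbitrary input.
    public static String jsonBody(String field, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"" + field + "\":\"" + escaped + "\"}";
    }

    // Builds a JSON POST request for one of the /api/rag endpoints.
    public static HttpRequest jsonPost(String url, String field, String value) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody(field, value)))
                .build();
    }

    // Sends the request and returns the response body as text.
    static String post(HttpClient client, String url, String field, String value)
            throws IOException, InterruptedException {
        return client.send(jsonPost(url, field, value), HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        String base = "http://localhost:8080/api/rag"; // same address as the curl examples
        try {
            System.out.println(post(client, base + "/ingest", "content",
                    "Spring AI is an abstraction layer for integrating AI models in Spring Boot applications."));
            System.out.println(post(client, base + "/retrieve", "query", "Spring AI"));
            System.out.println(post(client, base + "/chat", "query", "What is Spring AI?"));
        } catch (IOException e) {
            System.out.println("Could not reach the application: " + e.getMessage());
        }
    }
}
```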

Conclusion

In this tutorial, we built a simple RAG system in Spring Boot using Spring AI and Ollama. This implementation allows users to ingest documents, retrieve relevant content, and generate AI-driven responses. While this is a basic implementation, you can extend it by:

Storing documents in a database for persistence.

Implementing more sophisticated document retrieval techniques (e.g., vector embeddings).

Using a more advanced language model for better responses.
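To give a feel for the vector-embedding idea, here is a small plain-Java sketch that ranks documents by cosine similarity. The three-dimensional vectors are hand-made for illustration. In a real system they would come from an embedding model, and the class and method names are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class VectorRetrievalSketch {

    // Cosine similarity: dot(a, b) / (|a| * |b|). Assumes both vectors have the same length.
    public static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0; // a zero vector has no direction, so treat it as dissimilar
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the ids of the k documents whose vectors are most similar to the query vector.
    public static List<String> topK(Map<String, double[]> index, double[] query, int k) {
        return index.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> cosine(e.getValue(), query)).reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        // Toy "embeddings": made-up numbers standing in for model output.
        Map<String, double[]> index = Map.of(
                "spring-ai-doc", new double[]{0.9, 0.1, 0.0},
                "maven-doc", new double[]{0.1, 0.9, 0.0},
                "ollama-doc", new double[]{0.7, 0.0, 0.7});

        double[] query = {1.0, 0.0, 0.0};
        System.out.println(topK(index, query, 2)); // [spring-ai-doc, ollama-doc]
    }
}
```

Unlike keyword matching, this ranks documents by how close they are to the query in vector space, so related wording can still score well.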

If you're interested in AI-driven applications, RAG is a powerful approach for improving response quality using stored knowledge. Try building on this and experiment with different use cases!

Additionally, you can check out the project here: Spring AI RAG Demo

🚀 Happy Coding!