# Indexing Workflow

## Why is the Indexing Workflow Needed?

The indexing workflow is the phase of building a Retrieval-Augmented Generation (RAG) application in which source data is prepared and organized for retrieval during the inference phase. It structures the data, makes it accessible, and optimizes it for quick lookup, all of which the application needs in order to generate accurate and contextually relevant responses.

### Key Reasons for Indexing

1. **Efficiency**: By organizing data into a structured format, the indexing workflow allows for faster retrieval, reducing the time it takes to generate responses.
2. **Scalability**: Proper indexing enables the application to handle large volumes of data, making it scalable and capable of managing extensive knowledge bases.
3. **Accuracy**: Preprocessing and vectorizing the data helps the retriever surface the most relevant information, which improves the accuracy of the generated responses.

### Important Steps in the Indexing Workflow

<figure><img src="/files/i7zXi6FtgwlUAwcp3g7s" alt=""><figcaption></figcaption></figure>

#### 1. Pre-processing

* **Purpose**: Converts raw, unstructured data into a format suitable for further processing.
* **Importance**: Ensures consistency and quality of data, removing noise and irrelevant information.
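As a rough illustration of what pre-processing can involve, the sketch below strips HTML-like tags, decodes entities, and collapses whitespace using only the Python standard library. The function name `preprocess` is hypothetical and is not part of Dynamiq's API; real pipelines typically add format-specific parsing (PDF, DOCX, and so on) on top of cleanup like this.

```python
import html
import re


def preprocess(raw: str) -> str:
    """Illustrative cleanup only: remove markup noise and normalize whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)  # drop HTML-like tags
    text = html.unescape(text)           # decode entities such as &amp;
    text = re.sub(r"\s+", " ", text)     # collapse runs of whitespace
    return text.strip()


print(preprocess("<p>Refunds &amp; returns</p>\n\n  within 30   days"))
# → Refunds & returns within 30 days
```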

#### 2. Chunking

* **Purpose**: Splits large documents into smaller, manageable pieces or chunks.
* **Importance**: Improves retrieval efficiency and accuracy by allowing the system to focus on relevant sections of a document.
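One common way to chunk text is a fixed-size window with a small overlap, so that a sentence cut at a boundary still appears whole in a neighboring chunk. The sketch below assumes word-based windows; the function name `chunk_words` and its default sizes are hypothetical choices for illustration, not Dynamiq settings.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into word windows of `chunk_size`, sharing `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
# → ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Larger chunks keep more context per retrieved item, while smaller chunks make matches more specific; the right balance depends on the documents and the embedding model.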

#### 3. Vectorization

* **Purpose**: Transforms text data into vector representations using embeddings.
* **Importance**: Enables the system to perform similarity searches, matching queries with relevant data based on vector proximity.
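To make "vector proximity" concrete, the sketch below uses a toy hashed bag-of-words embedding as a stand-in for a real embedding model, then compares vectors with cosine similarity. The names `toy_embed` and `cosine` are hypothetical, and this toy only captures shared words; production systems use learned embeddings that also capture meaning.

```python
import hashlib
import math


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Stand-in embedding: hash each token into a bucket, then L2-normalize."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two unit-length vectors (reduces to a dot product)."""
    return sum(x * y for x, y in zip(a, b))


query = toy_embed("refund policy")
print(cosine(query, toy_embed("refund policy")))  # identical text scores highest
print(cosine(query, toy_embed("refund rules")))   # shared token gives a positive score
```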

#### 4. Vector Storage

* **Purpose**: Saves the vectorized data in a database or storage system optimized for retrieval.
* **Importance**: Ensures that data is easily accessible and can be quickly retrieved during the inference phase.
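Conceptually, a vector store keeps each vector alongside its source text and returns the closest entries for a query vector. The in-memory class below is a minimal sketch of that idea; `InMemoryVectorStore` is a hypothetical name, and it scans every entry, whereas dedicated vector databases use approximate nearest-neighbor indexes to stay fast at scale.

```python
import math
from dataclasses import dataclass, field


def _cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class InMemoryVectorStore:
    """Illustrative store: parallel lists of vectors and their source texts."""
    vectors: list = field(default_factory=list)
    texts: list = field(default_factory=list)

    def add(self, vector: list[float], text: str) -> None:
        self.vectors.append(vector)
        self.texts.append(text)

    def search(self, query: list[float], top_k: int = 3) -> list[tuple[float, str]]:
        """Return up to `top_k` (score, text) pairs, best match first."""
        scored = [(_cosine(query, v), t) for v, t in zip(self.vectors, self.texts)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


store = InMemoryVectorStore()
store.add([1.0, 0.0], "chunk about refunds")
store.add([0.0, 1.0], "chunk about shipping")
print(store.search([0.9, 0.1], top_k=1))  # the refunds chunk ranks first
```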

Together, these steps lay the foundation for a RAG application that can respond to user queries precisely and quickly.

In the next sections, we will explore each of these steps in detail, providing guidance on how to implement them effectively within Dynamiq's Knowledge Base Builder.

