Dynamiq Docs

Indexing Workflow


Last updated 6 months ago

Why is the Indexing Workflow Needed?

The indexing workflow is the data-preparation phase of a Retrieval-Augmented Generation (RAG) application. It prepares and organizes the data that the application will search during the inference phase, so that the data is structured, accessible, and optimized for quick retrieval. Fast, relevant retrieval is in turn what lets the application generate accurate and contextually relevant responses.

Key Reasons for Indexing

  1. Efficiency: By organizing data into a structured format, the indexing workflow allows for faster retrieval, reducing the time it takes to generate responses.

  2. Scalability: Proper indexing enables the application to handle large volumes of data, making it scalable and capable of managing extensive knowledge bases.

  3. Accuracy: By preprocessing and vectorizing data, the workflow makes it more likely that the most relevant information is retrieved, which improves the accuracy of the generated responses.

Important Steps in the Indexing Workflow

1. Pre-processing

  • Purpose: Converts raw, unstructured data into a format suitable for further processing.

  • Importance: Ensures consistency and quality of data, removing noise and irrelevant information.
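
As an illustration of what a pre-processing step might do, here is a minimal sketch in plain Python. It is not Dynamiq's pre-processing node; the function name `preprocess` and the specific cleanup rules (unescaping entities, dropping HTML-like tags, normalizing Unicode, collapsing whitespace) are assumptions chosen for the example.

```python
import html
import re
import unicodedata


def preprocess(raw: str) -> str:
    """Hypothetical cleanup step: turn raw markup-laden text into plain text."""
    text = html.unescape(raw)                   # "&amp;" -> "&", "&nbsp;" -> U+00A0
    text = re.sub(r"<[^>]+>", " ", text)        # drop HTML-like tags
    text = unicodedata.normalize("NFKC", text)  # unify compatibility characters
    text = re.sub(r"\s+", " ", text)            # collapse runs of whitespace
    return text.strip()


if __name__ == "__main__":
    print(preprocess("<p>Hello&nbsp;&amp;   world</p>\n"))
```

Real documents (PDFs, slides, scanned pages) usually need format-specific converters before a text cleanup like this one applies.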

2. Chunking

  • Purpose: Splits large documents into smaller, manageable pieces or chunks.

  • Importance: Improves retrieval efficiency and accuracy by allowing the system to focus on relevant sections of a document.
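
One common way to chunk text is a fixed-size window with some overlap between neighbouring chunks, so that a sentence cut at a boundary still appears whole in at least one chunk. The sketch below assumes word-based windows; the function name `chunk_words` and the default sizes are hypothetical and are not taken from Dynamiq's document splitter.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of up to chunk_size words, sharing overlap words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1):
        print(chunk)
```

Smaller chunks tend to give more precise matches but less surrounding context per match; the right size depends on the documents and the model's context window.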

3. Vectorization

  • Purpose: Transforms text data into vector representations using embeddings.

  • Importance: Enables the system to perform similarity searches, matching queries with relevant data based on vector proximity.
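
To make "vector proximity" concrete, the sketch below uses a toy hashed bag-of-words embedding as a stand-in for a real embedding model. It is only meant to show the shape of the operation (text in, fixed-length unit vector out, similarity as a dot product); the names `embed` and `cosine` and the dimension of 64 are assumptions, and a production workflow would call an embedding model instead.

```python
import hashlib
import math


def embed(text: str, dim: int = 64) -> list[float]:
    """Toy embedding: hash each token into one of dim buckets, then L2-normalize."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        idx = int.from_bytes(digest[:4], "big") % dim  # stable bucket per token
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two unit-length vectors is just their dot product."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    query = embed("reset my password")
    print(cosine(query, embed("how to reset a password")))
    print(cosine(query, embed("quarterly revenue report")))
```

Unlike this toy version, learned embeddings place texts with similar meaning close together even when they share no words.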

4. Vector Storage

  • Purpose: Saves the vectorized data in a database or storage system optimized for retrieval.

  • Importance: Ensures that data is easily accessible and can be quickly retrieved during the inference phase.
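
The role of a vector store can be sketched with a small in-memory class that keeps each chunk alongside its vector and ranks stored chunks by cosine similarity to a query vector. The class `InMemoryVectorStore` and its methods are hypothetical and exist only for this example; a dedicated vector database provides the same add-and-search interface with persistence and approximate nearest-neighbour indexes for large collections.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class InMemoryVectorStore:
    """Hypothetical store: a list of (doc_id, vector, text) records."""
    records: list = field(default_factory=list)

    def add(self, doc_id, vector, text):
        self.records.append((doc_id, list(vector), text))

    def search(self, query_vector, top_k=3):
        # Brute-force scan: score every record, highest similarity first.
        scored = [
            (cosine_similarity(query_vector, vec), doc_id, text)
            for doc_id, vec, text in self.records
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:top_k]


if __name__ == "__main__":
    store = InMemoryVectorStore()
    store.add("a", [1.0, 0.0], "chunk about topic A")
    store.add("b", [0.0, 1.0], "chunk about topic B")
    for score, doc_id, text in store.search([1.0, 0.0], top_k=2):
        print(doc_id, round(score, 3), text)
```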

By following these steps, the indexing workflow sets the foundation for a robust and efficient RAG application, ensuring that the system can deliver precise and timely responses to user queries.

In the next sections, we will explore each of these steps in detail, providing guidance on how to implement them effectively within Dynamiq's Knowledge Base Builder.