Dynamiq Docs

Document Writers


Last updated 1 month ago

In the indexing workflow of a Retrieval-Augmented Generation (RAG) application, document writers store vectorized documents in a vector database so that they can be retrieved during the inference phase. Each writer saves the document content together with its embedding in a named index of the configured store, which lets retrievers find relevant information quickly.

Available Document Writers

Dynamiq provides document writers for Weaviate, Pinecone, Chroma, Qdrant, Milvus, Elasticsearch, and pgvector. The sections below describe the configuration of each.

Weaviate Writer

Configuration

  • Name: Define a unique name for the writer to identify it within your workflow.

  • Connection: Establish a connection to Weaviate, a vector database optimized for storing and retrieving vectorized data.

  • Index Name: Specify the index name where the data will be stored. This helps in organizing and retrieving data efficiently.

  • Options:

    • Create if not exists: Automatically creates the index if it doesn't already exist, ensuring seamless data storage.

  • Advanced configuration:

    • Content Key: Specify the custom field name used to store content in the storage.
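
The "Create if not exists" option can be pictured as a get-or-create step that runs before any data is written. The sketch below is a minimal in-memory illustration, not Dynamiq's or Weaviate's implementation; `get_or_create_index` and the `store` dictionary are hypothetical names introduced for this example.

```python
def get_or_create_index(store: dict, index_name: str, create_if_not_exists: bool) -> list:
    """Return the index's record list, creating it first when allowed."""
    if index_name not in store:
        if not create_if_not_exists:
            # Without the option, writing to a missing index is an error.
            raise KeyError(f"Index '{index_name}' does not exist")
        store[index_name] = []
    return store[index_name]


store = {}  # hypothetical stand-in for a vector database
index = get_or_create_index(store, "articles", create_if_not_exists=True)
index.append({"id": "doc-1"})
print(store)
```

With the option disabled, the same call on a missing index raises an error instead of silently creating a new one, which can be useful when a typo in the index name should fail loudly.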

Pinecone Writer

Configuration

  • Name: Assign a name to the writer for easy identification.

  • Connection: Set up a connection to Pinecone, a scalable vector database service.

  • Index Name: Enter the index name to organize your data.

  • Embedding Dimension: The number of values in each stored vector. Default is 1536. This must match the output dimension of the embedding model used in the embedder step.

  • Metric: Choose a metric, e.g., cosine, to determine how similarity is calculated between vectors.

  • Namespace: Define a namespace to segment data within the index, allowing for better organization.

  • Batch Size: Set the batch size for data writing, which can optimize performance by processing multiple entries at once.

  • Options:

    • Create if not exists: Ensures the index is created if it doesn't exist, facilitating uninterrupted data storage.

  • Index Type: There are two deployment options:

    • Serverless: Requires specifying the Cloud URL and Region for optimal data locality and access speed.

    • Pod: Requires specifying the Environment, Pod Type, and number of Pods for deployment.

  • Depending on the chosen deployment option, provide the related fields when "Create if not exists" is enabled.

  • Advanced configuration:

    • Content Key: Specify the custom field name used to store content in the storage.
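
To illustrate how Batch Size and Namespace interact, the sketch below splits records into fixed-size batches and writes them into one namespace of an in-memory index. It is an illustration only and does not call the Pinecone client; `batched`, `write_in_batches`, and the dictionary-based index are hypothetical names.

```python
from typing import Iterator


def batched(items: list, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def write_in_batches(index: dict, namespace: str, records: list, batch_size: int) -> int:
    """Append records to one namespace, one batch per simulated request."""
    requests = 0
    for batch in batched(records, batch_size):
        index.setdefault(namespace, []).extend(batch)
        requests += 1
    return requests


index = {}  # hypothetical in-memory index: namespace -> records
records = [{"id": f"doc-{i}"} for i in range(250)]
requests = write_in_batches(index, "support-articles", records, batch_size=100)
print(requests, len(index["support-articles"]))
```

A larger batch size means fewer requests for the same number of records, at the cost of larger individual payloads.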

Chroma Writer

Configuration

  • Name: Provide a name for the writer to distinguish it in your setup.

  • Connection: Connect to Chroma, an open-source vector database.

  • Index Name: Specify the index name for data storage.

  • Options:

    • Create if not exists: Automatically sets up the index if it's not present, ensuring smooth data operations.

Qdrant Writer

Configuration

  • Name: Set a name for the writer for easy reference.

  • Connection: Establish a connection to Qdrant, a high-performance vector database.

  • Index Name: Enter the index name to categorize your data.

  • Embedding Dimension: The number of values in each stored vector. Default is 1536. This must match the output dimension of the embedding model used in the embedder step.

  • Metric: Choose a metric, such as cosine, to measure vector similarity.

  • Options:

    • Create if not exists: Automatically creates the index if needed, ensuring continuous data flow.

  • Advanced configuration:

    • Content Key: Specify the custom field name used to store content in the storage.
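
The choice of Metric changes which stored vectors count as "closest" to a query. The sketch below compares cosine similarity with Euclidean distance on toy two-dimensional vectors. Euclidean distance is included only for contrast and is an assumption here, since the text above names cosine as the example; the function names are hypothetical.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between a and b; insensitive to vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a: list, b: list) -> float:
    """Straight-line distance between a and b; sensitive to vector length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
same_direction = [3.0, 0.0]  # points the same way as the query, but is longer
nearby = [1.0, 0.5]          # close in space, slightly different direction

# Higher cosine similarity means more similar; lower distance means closer.
print(cosine_similarity(query, same_direction), cosine_similarity(query, nearby))
print(euclidean_distance(query, same_direction), euclidean_distance(query, nearby))
```

Cosine similarity ranks `same_direction` first because only the angle matters, while Euclidean distance ranks `nearby` first. The metric chosen at index creation should match the one the embedding model is intended to be used with.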

Milvus Writer

Configuration

  • Name: Set a name for the writer for easy reference.

  • Connection: Establish a connection to Milvus, a highly performant, scalable vector database.

  • Index Name: Enter the index name to categorize your data.

  • Options:

    • Create if not exists: Automatically creates the index if needed, ensuring continuous data flow.

  • Advanced configuration:

    • Content Key: Specify a unique name for the field in the storage used to keep content.

    • Embedding Key: Specify a unique name for the field in the storage used to keep the vector.
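
The Content Key and Embedding Key settings decide which field names a stored record uses for the text and the vector. The following sketch shows the idea with a plain dictionary; `to_record` and the default names `content` and `embedding` are assumptions made for illustration, not Dynamiq's or Milvus's documented defaults.

```python
def to_record(doc_id: str, content: str, embedding: list,
              content_key: str = "content",
              embedding_key: str = "embedding") -> dict:
    """Build a stored record with text and vector under the configured field names."""
    if content_key == embedding_key:
        # Both values would land in the same field and one would be lost.
        raise ValueError("content_key and embedding_key must differ")
    return {"id": doc_id, content_key: content, embedding_key: embedding}


record = to_record(
    "doc-1",
    "Dynamiq stores vectors.",
    [0.1, 0.2, 0.3],
    content_key="text",
    embedding_key="vector",
)
print(record)
```

Custom keys are mainly useful when writing into a collection that already exists and uses its own field names; the retriever reading the data then needs the same keys.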

Elasticsearch Writer

Configuration

  • Name: Set a name for the writer for easy reference.

  • Connection: Establish a connection to Elasticsearch, a distributed search and analytics engine.

  • Index Name: Enter the index name to categorize your data.

  • Embedding Dimension: The number of values in each stored vector. Default is 1536. This must match the output dimension of the embedding model used in the embedder step.

  • Similarity: Choose a metric, e.g., cosine, to determine how similarity is calculated between vectors.

  • Write Batch Size: Defines the number of records processed and written in a single batch.

  • Options:

    • Create if not exists: Automatically creates the index if needed.

  • Advanced configuration:

    • Content Key: Specify a unique name for the field in the storage used to keep content.

    • Embedding Key: Specify a unique name for the field in the storage used to keep the vector.
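
For orientation, the settings above correspond closely to the fields of an Elasticsearch `dense_vector` mapping. The sketch below builds such a mapping as a Python dictionary. The mapping Dynamiq actually creates is not documented here, so `build_index_mapping` and its default key names are hypothetical; only the `dense_vector` options (`dims`, `index`, `similarity`) come from Elasticsearch itself.

```python
import json


def build_index_mapping(dims: int = 1536, similarity: str = "cosine",
                        content_key: str = "content",
                        embedding_key: str = "embedding") -> dict:
    """Return an index body with one text field and one dense_vector field."""
    return {
        "mappings": {
            "properties": {
                content_key: {"type": "text"},
                embedding_key: {
                    "type": "dense_vector",
                    "dims": dims,              # Embedding Dimension
                    "index": True,             # make the field searchable with kNN
                    "similarity": similarity,  # Similarity
                },
            }
        }
    }


mapping = build_index_mapping()
print(json.dumps(mapping, indent=2))
```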

PGvector Writer

Configuration

  • Name: Set a name for the writer for easy reference.

  • Connection: Establish a connection to pgvector, an open-source vector similarity search extension for Postgres.

  • Index Name: Enter the index name to categorize your data.

  • Table Name: Enter the name of the table where the vectors will be stored.

  • Schema Name: Enter the name of the schema in the database.

  • Embedding Dimension: The number of values in each stored vector. Default is 1536. This must match the output dimension of the embedding model used in the embedder step.

  • Metric: Choose a metric, e.g., cosine, to determine how similarity is calculated between vectors.

  • Index Method: Choose the indexing approach used for vector search.

  • Keyword Index Name: Enter the name of the index for keyword-based search.

  • Options:

    • Create extension: Enable automatic creation of the pgvector extension.

    • Create if not exists: Automatically creates the index if needed.

  • Advanced configuration:

    • Content Key: Specify a unique name for the field in the storage used to keep content.

    • Embedding Key: Specify a unique name for the field in the storage used to keep the vector.
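
To make the pgvector settings more concrete, the sketch below generates SQL statements of the kind these options could translate to. The SQL Dynamiq actually issues is not documented here, so `build_ddl`, the column layout, and the full-text keyword index are assumptions; the `vector` type, the `hnsw` index method, and the operator classes are standard pgvector features. Identifiers are interpolated without quoting for readability, which is acceptable only for trusted, illustrative input.

```python
# Operator classes defined by pgvector for each distance metric.
OPERATOR_CLASSES = {
    "cosine": "vector_cosine_ops",
    "l2": "vector_l2_ops",
    "inner_product": "vector_ip_ops",
}


def build_ddl(schema, table, index_name, keyword_index_name=None,
              dims=1536, metric="cosine", index_method="hnsw",
              content_key="content", embedding_key="embedding",
              create_extension=True) -> list:
    """Return the SQL statements for the extension, table, and indexes."""
    qualified = f"{schema}.{table}"
    statements = []
    if create_extension:
        # Corresponds to the "Create extension" option.
        statements.append("CREATE EXTENSION IF NOT EXISTS vector;")
    statements.append(
        f"CREATE TABLE IF NOT EXISTS {qualified} "
        f"(id TEXT PRIMARY KEY, {content_key} TEXT, {embedding_key} vector({dims}));"
    )
    statements.append(
        f"CREATE INDEX IF NOT EXISTS {index_name} ON {qualified} "
        f"USING {index_method} ({embedding_key} {OPERATOR_CLASSES[metric]});"
    )
    if keyword_index_name:
        # Hypothetical keyword index based on Postgres full-text search.
        statements.append(
            f"CREATE INDEX IF NOT EXISTS {keyword_index_name} ON {qualified} "
            f"USING gin (to_tsvector('english', {content_key}));"
        )
    return statements


ddl = build_ddl("public", "documents", "documents_embedding_idx",
                keyword_index_name="documents_keyword_idx")
for statement in ddl:
    print(statement)
```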

How to Use Document Writers

  1. Input:

    • Provide the documents with embeddings produced by the preceding Document Embedder node.

  2. Configuration:

    • Select the appropriate writer based on your storage requirements.

    • Configure the necessary parameters such as connection, index name, and embedding dimensions.

  3. Output:

    • The writer stores the vectorized data, making it accessible for retrieval during the inference phase.
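
The three steps above can be sketched as a small, self-contained flow. `InMemoryWriter` is a hypothetical stand-in for a document writer node, and the document shape (`id`, `content`, `embedding`) and the `upserted_count` output are assumptions chosen for illustration, not Dynamiq's API.

```python
class InMemoryWriter:
    """Hypothetical stand-in for a document writer node."""

    def __init__(self, index_name: str, dimension: int):
        self.index_name = index_name
        self.dimension = dimension
        self.records = {}

    def run(self, documents: list) -> dict:
        """Store each document under its id and report how many were written."""
        for doc in documents:
            embedding = doc.get("embedding")
            if embedding is None:
                raise ValueError(f"document {doc['id']} has no embedding")
            if len(embedding) != self.dimension:
                raise ValueError(
                    f"document {doc['id']}: expected {self.dimension} values, "
                    f"got {len(embedding)}"
                )
            self.records[doc["id"]] = doc
        return {"upserted_count": len(documents)}


# 1. Input: documents as they might leave an embedder step (tiny vectors for readability).
documents = [
    {"id": "doc-1", "content": "Refund policy", "embedding": [0.1, 0.9, 0.0]},
    {"id": "doc-2", "content": "Shipping times", "embedding": [0.8, 0.1, 0.1]},
]

# 2. Configuration: index name and embedding dimension.
writer = InMemoryWriter(index_name="help-center", dimension=3)

# 3. Output: the data is stored and ready for retrieval.
result = writer.run(documents)
print(result)
```

Rejecting vectors of the wrong length at write time mirrors the requirement that the configured embedding dimension match the embedding model's output.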

Benefits of Document Writers

  • Efficient Storage: Optimizes data storage for quick retrieval.

  • Scalability: Handles large datasets, supporting extensive knowledge bases.

  • Flexibility: Offers various configurations to suit different storage needs.

By effectively utilizing document writers, you can ensure that your RAG application is equipped to deliver precise and contextually relevant information efficiently.