Document Writers
In the indexing workflow of a Retrieval-Augmented Generation (RAG) application, document writers store vectorized documents in a vector database so that they can be retrieved during the inference phase. Writing embeddings into a structured index lets the application look up relevant content quickly at query time, which improves both its speed and its accuracy.
Dynamiq offers several document writers, one per supported vector database. The sections below describe the configuration of each.
Weaviate
Configuration:
Name: Define a unique name for the writer to identify it within your workflow.
Connection: Establish a connection to Weaviate, a vector database optimized for storing and retrieving vectorized data.
Index Name: Specify the index name where the data will be stored. This helps in organizing and retrieving data efficiently.
Options:
Create if not exists: Automatically creates the index if it doesn't already exist, ensuring seamless data storage.
Enable caching: Improves performance by caching frequently accessed data.
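The "Create if not exists" option can be pictured with a minimal sketch. It uses a hypothetical in-memory store rather than a real Weaviate client or the Dynamiq API; the class name, method name, and document shape are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database; not a real client."""

    indexes: dict = field(default_factory=dict)

    def write(self, index_name, documents, create_if_not_exists=True):
        """Append documents to an index, creating it first when allowed."""
        if index_name not in self.indexes:
            if not create_if_not_exists:
                # Without the option, writing to a missing index is an error.
                raise KeyError(f"index {index_name!r} does not exist")
            self.indexes[index_name] = []
        self.indexes[index_name].extend(documents)
        return len(documents)


store = InMemoryVectorStore()
written = store.write("articles", [{"id": "doc-1", "embedding": [0.1, 0.2]}])
```

With the option disabled, the same call on a missing index fails instead of silently creating it, which can be useful when index creation is managed elsewhere.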
Pinecone
Configuration:
Name: Assign a name to the writer for easy identification.
Connection: Set up a connection to Pinecone, a scalable vector database service.
Index Name: Enter the index name to organize your data.
Embedding Dimension: The length of each stored vector (default: 1536). It must match the output dimension of the embedding model used in the vectorization step.
Metric: Choose a metric, e.g., cosine, to determine how similarity is calculated between vectors.
Namespace: Define a namespace to segment data within the index, allowing for better organization.
Batch Size: Set the batch size for data writing, which can optimize performance by processing multiple entries at once.
Options:
Create if not exists: Ensures the index is created if it doesn't exist, facilitating uninterrupted data storage.
Enable caching: Reduces retrieval times by storing frequently accessed data.
Index Type: There are two deployment options:
Serverless: Requires specifying the Cloud URL and Region for optimal data locality and access speed.
Pod: Requires specifying the Environment, Pod Type, and number of Pods for deployment.
Depending on the chosen deployment option, provide the related fields when "Create if not exists" is enabled.
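The dependency between the index type and its required fields can be sketched as a small validation helper. The field keys below (`cloud`, `region`, `environment`, `pod_type`, `pods`) are hypothetical names chosen for this example and may not match the labels or parameters Dynamiq actually uses.

```python
# Fields each deployment option needs, following the list above.
# The key names are assumptions made for this sketch.
REQUIRED_FIELDS = {
    "serverless": ("cloud", "region"),
    "pod": ("environment", "pod_type", "pods"),
}


def missing_index_fields(index_type, settings):
    """Return the required fields that are absent or empty in settings."""
    if index_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown index type: {index_type!r}")
    return [
        name
        for name in REQUIRED_FIELDS[index_type]
        if settings.get(name) in (None, "")
    ]


serverless_gaps = missing_index_fields("serverless", {"cloud": "aws"})
pod_gaps = missing_index_fields(
    "pod", {"environment": "us-east1-gcp", "pod_type": "p1.x1", "pods": 1}
)
```

Here `serverless_gaps` lists `region` as still missing, while `pod_gaps` is empty because all three pod fields are set.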
Chroma
Configuration:
Name: Provide a name for the writer to distinguish it in your setup.
Connection: Connect to Chroma, an open-source vector database.
Index Name: Specify the index name for data storage.
Options:
Create if not exists: Automatically sets up the index if it's not present, ensuring smooth data operations.
Enable caching: Enhances retrieval speed by caching data.
Qdrant
Configuration:
Name: Set a name for the writer for easy reference.
Connection: Establish a connection to Qdrant, a high-performance vector database.
Index Name: Enter the index name to categorize your data.
Embedding Dimension: The length of each stored vector (default: 1536). It must match the output dimension of the embedding model used in the vectorization step.
Metric: Choose a metric, such as cosine, to measure vector similarity.
Options:
Create if not exists: Automatically creates the index if needed, ensuring continuous data flow.
Enable caching: Speeds up data retrieval by storing commonly accessed vectors.
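To make the Embedding Dimension and Metric settings concrete, here is a minimal sketch in plain Python. It shows the standard cosine similarity formula and a simple length check; the function names are illustrative, and this is not how Qdrant or Dynamiq implement either step internally.

```python
import math


def has_expected_dimension(embedding, expected=1536):
    """Check that a vector's length matches the index dimension."""
    return len(embedding) == expected


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Vectors pointing in the same direction score 1.0 regardless of magnitude, and orthogonal vectors score 0.0, which is why cosine is a common choice for comparing text embeddings.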
Input:
Provide the vectorized documents from the previous vectorization step.
Configuration:
Select the appropriate writer based on your storage requirements.
Configure the necessary parameters such as connection, index name, and embedding dimensions.
Output:
The writer stores the vectorized data, making it accessible for retrieval during the inference phase.
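The input, configuration, and output steps above can be sketched end to end. This is an illustration under stated assumptions, not the Dynamiq implementation: the index is a plain list, the `write_documents` helper and its returned keys are hypothetical, and the batch size mirrors the Batch Size setting described for Pinecone.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def write_documents(documents, index, batch_size=100):
    """Write vectorized documents to an index (a list here) in batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = 0
    for batch in batched(documents, batch_size):
        index.extend(batch)  # a real writer would send one request per batch
        batches += 1
    return {"written": len(documents), "batches": batches}


# Input: documents produced by the vectorization step (toy 2-d embeddings).
documents = [{"id": f"doc-{i}", "embedding": [0.0, 1.0]} for i in range(5)]
index = []
result = write_documents(documents, index, batch_size=2)
```

Five documents with a batch size of two are written in three batches, after which `index` holds all five documents ready for retrieval.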
Efficient Storage: Optimizes data storage for quick retrieval.
Scalability: Handles large datasets, supporting extensive knowledge bases.
Flexibility: Offers various configurations to suit different storage needs.
By effectively utilizing document writers, you can ensure that your RAG application is equipped to deliver precise and contextually relevant information efficiently.