# Document Writers

In the indexing workflow of a Retrieval-Augmented Generation (RAG) application, document writers persist vectorized documents (their embeddings together with their content and metadata) to a vector store, so that the documents can be retrieved efficiently during the inference phase. Because the data is stored in a structured, indexed form, the system can quickly find the most relevant information at query time, which improves both the speed and the accuracy of the application.

### Available Document Writers

Dynamiq provides document writers for several vector stores. The sections below describe each writer and its configuration options.

<figure><img src="https://4279757243-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTbBxR0Ob7RUmbvHZkQi2%2Fuploads%2F3UiRVYEhO5t10D3Zs074%2Fimage.png?alt=media&#x26;token=ddbb3c83-edcb-4992-87bc-a855b81a0e6f" alt="" width="172"><figcaption></figcaption></figure>

#### Weaviate Writer

<figure><img src="https://4279757243-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTbBxR0Ob7RUmbvHZkQi2%2Fuploads%2FArWdFMNuVhfSfNEzibYd%2Fimage.png?alt=media&#x26;token=ed6f8a26-ba44-4978-8122-43c3ff32d5a3" alt="" width="318"><figcaption></figcaption></figure>

**Configuration**

* **Name:** Define a unique name for the writer to identify it within your workflow.
* **Connection:** Establish a connection to Weaviate, a vector database optimized for storing and retrieving vectorized data.
* **Index Name:** Specify the index name where the data will be stored. This helps in organizing and retrieving data efficiently.
* **Options:**
  * **Create if not exists:** Automatically creates the index if it doesn't already exist, ensuring seamless data storage.
* **Advanced configuration:**
  * **Content Key:** Specify a custom name for the field that stores the document content.
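
To make the **Content Key** option concrete, here is a minimal sketch of how a writer might map a vectorized document onto a stored record. The helper `to_storage_record` and the `id`, `embedding`, and `metadata` field names are hypothetical and do not reflect Dynamiq's or Weaviate's actual schema; only the idea of a configurable content field comes from the option above.

```python
from typing import Any


def to_storage_record(document: dict[str, Any], content_key: str = "content") -> dict[str, Any]:
    """Hypothetical helper: store the document text under a configurable field name."""
    record = {
        "id": document["id"],
        content_key: document["content"],  # the custom "Content Key"
        "embedding": document["embedding"],
    }
    # Copy metadata fields without overwriting the reserved fields set above.
    for key, value in document.get("metadata", {}).items():
        record.setdefault(key, value)
    return record


doc = {
    "id": "doc-1",
    "content": "Document writers persist embeddings.",
    "embedding": [0.12, -0.08, 0.33],
    "metadata": {"source": "guide.md"},
}
record = to_storage_record(doc, content_key="text")
print(record)
# {'id': 'doc-1', 'text': 'Document writers persist embeddings.', 'embedding': [0.12, -0.08, 0.33], 'source': 'guide.md'}
```

A custom content key is mainly useful when writing into an existing index whose schema already uses a different field name (for example, `text`), so that writers and retrievers agree on where the content lives.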

#### Pinecone Writer

<figure><img src="https://4279757243-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTbBxR0Ob7RUmbvHZkQi2%2Fuploads%2FTXHi5rEyVG0u1o1ujIPi%2Fimage.png?alt=media&#x26;token=1f73c9ed-b624-4574-b03b-b8251a91e749" alt="" width="293"><figcaption></figcaption></figure>

**Configuration**

* **Name:** Assign a name to the writer for easy identification.
* **Connection:** Set up a connection to Pinecone, a scalable vector database service.
* **Index Name:** Enter the index name to organize your data.
* **Embedding Dimension:** The number of dimensions in each vector (default: 1536). It must match the output size of the embedding model used in the vectorization step.
* **Metric:** Choose a metric, e.g., cosine, to determine how similarity is calculated between vectors.
* **Namespace:** Define a namespace to segment data within the index, allowing for better organization.
* **Batch Size:** Set the batch size for data writing, which can optimize performance by processing multiple entries at once.
* **Options:**
  * **Create if not exists:** Ensures the index is created if it doesn't exist, facilitating uninterrupted data storage.
* **Index Type:** Choose one of two deployment options. When **Create if not exists** is enabled, fill in the fields for the selected option:
  * **Serverless:** The cloud provider and region that host the index. Pick a region close to your application to reduce latency.
  * **Pod:** The environment, pod type, and number of pods.
* **Advanced configuration:**
  * **Content Key:** Specify a custom name for the field that stores the document content.
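
The interplay of **Batch Size** and **Namespace** can be sketched as follows: the writer splits the incoming vectors into chunks of at most `batch_size` items and sends each chunk to the same namespace. This is a simplified illustration with a hypothetical `plan_upserts` helper; the request dictionaries only loosely resemble Pinecone's upsert payload (a list of vectors with `id` and `values`, plus a `namespace`), and no network calls are made.

```python
from collections.abc import Iterator, Sequence
from typing import Any


def batched(items: Sequence[Any], batch_size: int) -> Iterator[list[Any]]:
    """Yield consecutive chunks of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def plan_upserts(vectors: Sequence[dict[str, Any]], namespace: str, batch_size: int) -> list[dict[str, Any]]:
    """Hypothetical helper: one write request per batch, all targeting the same namespace."""
    return [{"namespace": namespace, "vectors": batch} for batch in batched(vectors, batch_size)]


vectors = [{"id": f"doc-{i}", "values": [0.0] * 1536} for i in range(250)]
upserts = plan_upserts(vectors, namespace="product-docs", batch_size=100)
print([len(u["vectors"]) for u in upserts])  # [100, 100, 50]
```

With 250 vectors and a batch size of 100, three requests are sent instead of 250 single-vector writes, which is where the performance benefit of batching comes from. Very large batches, however, may exceed the vector store's request-size limits.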

#### Chroma Writer

<figure><img src="https://4279757243-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTbBxR0Ob7RUmbvHZkQi2%2Fuploads%2FseAZNkmaNCZITby9HQy0%2Fimage.png?alt=media&#x26;token=e8d69dcd-0712-4261-be5e-d77392020b9a" alt="" width="293"><figcaption></figcaption></figure>

**Configuration**

* **Name:** Provide a name for the writer to distinguish it in your setup.
* **Connection:** Establish a connection to Chroma, an open-source vector database.
* **Index Name:** Specify the index name for data storage.
* **Options:**
  * **Create if not exists:** Automatically sets up the index if it's not present, ensuring smooth data operations.
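
The **Create if not exists** option boils down to a get-or-create check before writing. The toy `InMemoryStore` class below is purely illustrative and is not part of Dynamiq; Chroma's own Python client offers comparable behavior through `get_or_create_collection`.

```python
class IndexNotFoundError(Exception):
    """Raised when a write targets a missing index and auto-creation is disabled."""


class InMemoryStore:
    """Toy stand-in for a vector store, used only to illustrate the option."""

    def __init__(self) -> None:
        self.indexes: dict[str, list[dict]] = {}

    def get_index(self, name: str, create_if_not_exists: bool = False) -> list[dict]:
        if name not in self.indexes:
            if not create_if_not_exists:
                raise IndexNotFoundError(f"index {name!r} does not exist")
            self.indexes[name] = []  # create an empty index on first use
        return self.indexes[name]


store = InMemoryStore()
store.get_index("docs", create_if_not_exists=True).append({"id": "doc-1"})
print(list(store.indexes))  # ['docs']

try:
    store.get_index("missing")  # option disabled: the write fails fast
except IndexNotFoundError as err:
    print(err)  # index 'missing' does not exist
```

Leaving the option disabled can be a sensible choice in production when indexes are provisioned separately: a typo in the index name then fails loudly instead of silently creating a new, empty index.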

#### Qdrant Writer

<figure><img src="https://4279757243-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTbBxR0Ob7RUmbvHZkQi2%2Fuploads%2F3WMsR8pObgo9S0DyUUZu%2Fimage.png?alt=media&#x26;token=1b85592e-5faf-4233-a83d-fd3bebc2819d" alt="" width="294"><figcaption></figcaption></figure>

**Configuration**

* **Name:** Set a name for the writer for easy reference.
* **Connection:** Establish a connection to Qdrant, a high-performance vector database.
* **Index Name:** Enter the index name to categorize your data.
* **Embedding Dimension:** The number of dimensions in each vector (default: 1536). It must match the output size of your embedding model.
* **Metric:** Choose a metric, such as cosine, to measure vector similarity.
* **Options:**
  * **Create if not exists:** Automatically creates the index if needed, ensuring continuous data flow.
* **Advanced configuration:**
  * **Content Key:** Specify a custom name for the field that stores the document content.
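
The **Embedding Dimension** and **Metric** settings apply to every vector the writer stores. The sketch below, written in plain Python rather than against any Qdrant API, validates vector length and compares three common metrics (Qdrant's distance options include cosine, dot product, and Euclidean). The function names are illustrative only, and a 3-dimensional example stands in for the 1536-dimensional default to keep it readable.

```python
import math
from collections.abc import Sequence


def check_dimension(vector: Sequence[float], expected: int) -> None:
    """Reject vectors whose length does not match the configured Embedding Dimension."""
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Dot product divided by the product of the vector lengths (assumes non-zero vectors).
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.dist(a, b)


query = [1.0, 0.0, 0.0]
doc = [3.0, 0.0, 0.0]  # same direction as the query, three times as long
for vector in (query, doc):
    check_dimension(vector, expected=3)

print(cosine(query, doc))     # 1.0: identical direction
print(dot(query, doc))        # 3.0: grows with vector length
print(euclidean(query, doc))  # 2.0: straight-line distance (lower is closer)
```

Cosine similarity ignores vector length, which is usually what you want for text embeddings. If your embedding model already returns unit-length vectors, cosine similarity and dot product produce the same ranking.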

#### Milvus Writer

<figure><img src="https://4279757243-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTbBxR0Ob7RUmbvHZkQi2%2Fuploads%2F2W8ePGlnCQThNnloPn44%2Fimage.png?alt=media&#x26;token=c58e9788-5a84-4b3f-9bee-012e81dd7247" alt="" width="294"><figcaption></figcaption></figure>

**Configuration**

* **Name:** Set a name for the writer for easy reference.
* **Connection:** Establish a connection to Milvus, a highly performant, scalable vector database.
* **Index Name:** Enter the index name to categorize your data.
* **Options:**
  * **Create if not exists:** Automatically creates the index if needed, ensuring continuous data flow.
* **Advanced configuration:**
  * **Content Key:** Specify the name of the field that stores the document content.
  * **Embedding Key:** Specify the name of the field that stores the embedding vector.

#### Elasticsearch Writer

<figure><img src="https://4279757243-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTbBxR0Ob7RUmbvHZkQi2%2Fuploads%2FKOTS2uZ8KzEdK01eePzG%2Fimage.png?alt=media&#x26;token=7dccd494-4067-498c-b082-32e3e7d8581f" alt="" width="294"><figcaption></figcaption></figure>

**Configuration**

* **Name:** Set a name for the writer for easy reference.
* **Connection:** Establish a connection to Elasticsearch, a distributed search and analytics engine.
* **Index Name:** Enter the index name to categorize your data.
* **Embedding Dimension:** The number of dimensions in each vector (default: 1536). It must match the output size of your embedding model.
* **Similarity:** Choose a metric, e.g., cosine, to determine how similarity is calculated between vectors.
* **Write Batch Size:** Defines the number of records processed and written in a single batch.
* **Options:**
  * **Create if not exists:** Automatically creates the index if needed.
* **Advanced configuration:**
  * **Content Key:** Specify the name of the field that stores the document content.
  * **Embedding Key:** Specify the name of the field that stores the embedding vector.
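
To see how the Elasticsearch options fit together, the sketch below builds an index definition that uses Elasticsearch's `dense_vector` field type. It is an assumption about the general shape of such a mapping, not the exact mapping Dynamiq creates; `build_index_body` is a hypothetical helper, and `content`/`embedding` stand in for whatever **Content Key** and **Embedding Key** you configure.

```python
import json

# A subset of the similarity values accepted by dense_vector fields in Elasticsearch 8.x
# (newer releases accept additional values).
SUPPORTED_SIMILARITIES = {"cosine", "dot_product", "l2_norm"}


def build_index_body(
    dimension: int = 1536,
    similarity: str = "cosine",
    content_key: str = "content",
    embedding_key: str = "embedding",
) -> dict:
    """Hypothetical helper: map the writer options onto an index mapping."""
    if similarity not in SUPPORTED_SIMILARITIES:
        raise ValueError(f"unsupported similarity: {similarity}")
    return {
        "mappings": {
            "properties": {
                content_key: {"type": "text"},  # full-text field for the document content
                embedding_key: {
                    "type": "dense_vector",
                    "dims": dimension,         # Embedding Dimension
                    "index": True,             # enable approximate kNN search
                    "similarity": similarity,  # Similarity
                },
            }
        }
    }


print(json.dumps(build_index_body(similarity="dot_product"), indent=2))
```

With the official `elasticsearch` Python client (8.x), such a mapping could be passed as `es.indices.create(index="docs", mappings=body["mappings"])`. Note that the `dot_product` similarity expects unit-length vectors.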

#### PGvector Writer

<figure><img src="https://4279757243-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTbBxR0Ob7RUmbvHZkQi2%2Fuploads%2FxMtN70Uj3N2tYjsEHHnH%2Fimage.png?alt=media&#x26;token=d7a583cb-3e78-4cbe-97d8-9f49ced95805" alt=""><figcaption></figcaption></figure>

**Configuration**

* **Name:** Set a name for the writer for easy reference.
* **Connection:** Establish a connection to a PostgreSQL database with pgvector, the open-source vector similarity search extension for Postgres.
* **Index Name:** Enter the index name to categorize your data.
* **Table Name:** Enter the name of the table where the vectors will be stored.
* **Schema Name:** Enter the name of the database schema that contains the table.
* **Embedding Dimension:** The number of dimensions in each vector (default: 1536). It must match the output size of your embedding model.
* **Metric:** Choose a metric, e.g., cosine, to determine how similarity is calculated between vectors.
* **Index Method:** Choose the indexing approach used for vector search, such as HNSW or IVFFlat (both supported by pgvector).
* **Keyword Index Name:** Enter the name of the index used for keyword-based (full-text) search.
* **Options:**
  * **Create extension:** Enable automatic creation of the pgvector extension.
  * **Create if not exists:** Automatically creates the index if needed.
* **Advanced configuration:**
  * **Content Key:** Specify the name of the field that stores the document content.
  * **Embedding Key:** Specify the name of the field that stores the embedding vector.
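
The PGvector options map fairly directly onto SQL. The sketch below generates statements that a writer of this kind could run; it is an illustration, not the SQL Dynamiq actually executes. `build_pgvector_ddl`, the `id`/`content`/`embedding` column names, and the `'english'` text-search configuration are assumptions. The operator classes (`vector_cosine_ops`, `vector_l2_ops`, `vector_ip_ops`) and index methods (`hnsw`, `ivfflat`) are pgvector's own; HNSW requires pgvector 0.5.0 or later.

```python
# pgvector operator classes for each distance metric.
OPERATOR_CLASSES = {
    "cosine": "vector_cosine_ops",
    "l2": "vector_l2_ops",
    "inner_product": "vector_ip_ops",
}


def build_pgvector_ddl(
    schema: str,
    table: str,
    index_name: str,
    keyword_index_name: str,
    dimension: int = 1536,
    metric: str = "cosine",
    index_method: str = "hnsw",
    create_extension: bool = True,
) -> list[str]:
    """Hypothetical helper: SQL for the PGvector writer options.

    Identifiers are interpolated directly for readability; real code should
    quote them safely (for example with psycopg's `sql.Identifier`).
    """
    if index_method not in {"hnsw", "ivfflat"}:
        raise ValueError(f"unsupported index method: {index_method}")
    if metric not in OPERATOR_CLASSES:
        raise ValueError(f"unsupported metric: {metric}")
    qualified = f'"{schema}"."{table}"'
    statements = []
    if create_extension:  # "Create extension" option
        statements.append("CREATE EXTENSION IF NOT EXISTS vector;")
    statements.append(
        f"CREATE TABLE IF NOT EXISTS {qualified} "
        f"(id TEXT PRIMARY KEY, content TEXT, embedding vector({dimension}));"
    )
    statements.append(  # vector index used for similarity search
        f'CREATE INDEX IF NOT EXISTS "{index_name}" ON {qualified} '
        f"USING {index_method} (embedding {OPERATOR_CLASSES[metric]});"
    )
    statements.append(  # full-text index used for keyword search
        f'CREATE INDEX IF NOT EXISTS "{keyword_index_name}" ON {qualified} '
        f"USING gin (to_tsvector('english', content));"
    )
    return statements


for statement in build_pgvector_ddl("public", "documents", "documents_embedding_idx", "documents_keyword_idx"):
    print(statement)
```

If you choose IVFFlat, it is generally better to build the index after the table contains data, because its lists are computed from the existing rows; HNSW has no such requirement.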

### How to Use Document Writers

<figure><img src="https://4279757243-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTbBxR0Ob7RUmbvHZkQi2%2Fuploads%2FxxBhEbipBQuk0sNCN8ky%2Fimage.png?alt=media&#x26;token=a2ca4083-e5cc-44df-9a29-160b9dc085db" alt=""><figcaption></figcaption></figure>

1. **Input:**
   * Provide the vectorized documents from the previous vectorization step.
2. **Configuration:**
   * Select the appropriate writer based on your storage requirements.
   * Configure the required parameters, such as the connection, index name, and embedding dimension.
3. **Output:**
   * The writer stores the vectorized data, making it accessible for retrieval during the inference phase.
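
The three steps above can be sketched end to end with a toy, in-memory writer. `Document`, `InMemoryWriter`, and `search` are hypothetical names used only for illustration and are not Dynamiq classes; a real writer would send the documents to one of the vector stores described earlier.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Document:
    id: str
    content: str
    embedding: list[float]


@dataclass
class InMemoryWriter:
    index_name: str
    dimension: int
    store: dict[str, Document] = field(default_factory=dict)

    def write(self, documents: list[Document]) -> int:
        """Validate and store documents (upsert by id); return how many were written."""
        for doc in documents:
            if len(doc.embedding) != self.dimension:
                raise ValueError(f"{doc.id}: expected {self.dimension} dimensions")
            self.store[doc.id] = doc
        return len(documents)


def cosine(a: list[float], b: list[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def search(writer: InMemoryWriter, query: list[float], top_k: int = 1) -> list[Document]:
    """Inference-time lookup: rank stored documents by cosine similarity."""
    ranked = sorted(writer.store.values(), key=lambda d: cosine(query, d.embedding), reverse=True)
    return ranked[:top_k]


# 1. Input: documents produced by the vectorization step (toy 3-dimensional embeddings).
docs = [
    Document("a", "Pinecone supports namespaces.", [0.9, 0.1, 0.0]),
    Document("b", "PGvector runs inside PostgreSQL.", [0.0, 0.2, 0.9]),
]
# 2. Configuration: index name and embedding dimension.
writer = InMemoryWriter(index_name="knowledge-base", dimension=3)
# 3. Output: the stored documents are available for retrieval.
written = writer.write(docs)
print(written)                                     # 2
print(search(writer, [1.0, 0.0, 0.0])[0].content)  # Pinecone supports namespaces.
```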

### Benefits of Document Writers

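* **Efficient Storage:** Data is written into indexed vector stores, so relevant documents can be found quickly at query time.
* **Scalability:** The supported vector stores are built to handle large datasets, so your knowledge base can grow over time.
* **Flexibility:** Several vector stores are supported, each with configuration options to suit different storage needs.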

Choosing and configuring the right document writer helps your RAG application retrieve precise, contextually relevant information efficiently.
