Embedders & Vector Stores
Eight embedding providers and eight vector stores — the provider/store matrix with writer configuration for each.
Embedders turn text into vectors; vector stores persist them. Every embedding provider ships in two flavors: a document embedder (embeds a list of Document chunks at indexing time) and a text embedder (embeds a single query string at retrieval time). Every store ships a writer node for indexing and a retriever node for search.
Embedding providers
All embedders live in dynamiq.nodes.embedders. Connections read their API keys from environment variables — see Connections & Credentials.
| Provider | Document embedder | Text embedder | Default model |
|---|---|---|---|
| OpenAI | OpenAIDocumentEmbedder | OpenAITextEmbedder | text-embedding-3-small |
| Cohere | CohereDocumentEmbedder | CohereTextEmbedder | cohere/embed-english-v2.0 |
| AWS Bedrock | BedrockDocumentEmbedder | BedrockTextEmbedder | amazon.titan-embed-text-v1 |
| Mistral | MistralDocumentEmbedder | MistralTextEmbedder | mistral/mistral-embed |
| Gemini | GeminiDocumentEmbedder | GeminiTextEmbedder | gemini/gemini-embedding-exp-03-07 |
| Hugging Face | HuggingFaceDocumentEmbedder | HuggingFaceTextEmbedder | huggingface/BAAI/bge-large-zh (document) / huggingface/microsoft/codebert-base (text) |
| IBM watsonx | WatsonXDocumentEmbedder | WatsonXTextEmbedder | watsonx/ibm/slate-30m-english-rtrvr |
| Vertex AI | VertexAIDocumentEmbedder | VertexAITextEmbedder | vertex_ai/text-embedding-005 |
from dynamiq.connections import OpenAI as OpenAIConnection
from dynamiq.nodes.embedders import OpenAIDocumentEmbedder, OpenAITextEmbedder
from dynamiq.types import Document
connection = OpenAIConnection() # reads OPENAI_API_KEY
# Indexing side: documents in, documents-with-embeddings out
doc_embedder = OpenAIDocumentEmbedder(connection=connection, model="text-embedding-3-small")
docs = doc_embedder.run(
    input_data={"documents": [Document(content="Machine learning is a branch of AI.")]}
).output["documents"]
# Retrieval side: query in, embedding out
text_embedder = OpenAITextEmbedder(connection=connection, model="text-embedding-3-small")
out = text_embedder.run(input_data={"query": "What is machine learning?"}).output
embedding = out["embedding"] # list[float]
query = out["query"]  # original string, handy for prompts downstream
Index and query with the same provider and model. The document embedder sets the vector space; the text embedder must live in it. If you switch models, re-index.
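To see why both sides must share a model, it helps to look at how a retrieval score is typically computed. The sketch below is a plain-Python illustration, not Dynamiq's internals: `cosine_similarity` is a hypothetical helper and the vectors are made up. It refuses to compare vectors of different sizes, which is the most visible symptom of mixing models. Two models with the same output size would pass this check and still give meaningless scores, because their vector spaces are unrelated.

```python
import math


def cosine_similarity(query_vec: list[float], doc_vec: list[float]) -> float:
    """Cosine similarity between a query embedding and a document embedding."""
    if len(query_vec) != len(doc_vec):
        raise ValueError(
            f"dimension mismatch: query has {len(query_vec)}, "
            f"document has {len(doc_vec)}"
        )
    dot = sum(q * d for q, d in zip(query_vec, doc_vec))
    norm_q = math.sqrt(sum(q * q for q in query_vec))
    norm_d = math.sqrt(sum(d * d for d in doc_vec))
    if norm_q == 0.0 or norm_d == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_q * norm_d)


# Toy 3-dimensional vectors standing in for real embeddings (assumed values).
query_vec = [1.0, 0.0, 0.0]
same_direction = [2.0, 0.0, 0.0]
orthogonal = [0.0, 3.0, 0.0]

score_same = cosine_similarity(query_vec, same_direction)      # close to 1.0
score_orthogonal = cosine_similarity(query_vec, orthogonal)    # 0.0
```

A query embedded with a model of a different output size than the one used at indexing time would hit the `ValueError` branch here; in a real store the equivalent failure shows up as a rejected query or an index error.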
Vector stores
Writers live in dynamiq.nodes.writers; the underlying store clients in dynamiq.storages.vector:
| Store | Writer node | Retriever node |
|---|---|---|
| Pinecone | PineconeDocumentWriter | PineconeDocumentRetriever |
| Weaviate | WeaviateDocumentWriter | WeaviateDocumentRetriever |
| Qdrant | QdrantDocumentWriter | QdrantDocumentRetriever |
| Milvus | MilvusDocumentWriter | MilvusDocumentRetriever |
| Chroma | ChromaDocumentWriter | ChromaDocumentRetriever |
| Elasticsearch | ElasticsearchDocumentWriter | ElasticsearchDocumentRetriever |
| OpenSearch | OpenSearchDocumentWriter | OpenSearchDocumentRetriever |
| pgvector | PGVectorDocumentWriter | PGVectorDocumentRetriever |
Writers take documents (already embedded) as input and report upserted_count in their output. Set create_if_not_exist=True to create the index programmatically.
Pinecone
Serverless deployment:
from dynamiq.connections import Pinecone as PineconeConnection
from dynamiq.nodes.writers import PineconeDocumentWriter
writer = PineconeDocumentWriter(
    connection=PineconeConnection(),
    index_name="quickstart",
    dimension=1536,
    create_if_not_exist=True,
    index_type="serverless",
    cloud="aws",
    region="us-east-1",
)
Pod-based deployment:
writer = PineconeDocumentWriter(
    connection=PineconeConnection(),
    index_name="quickstart",
    dimension=1536,
    create_if_not_exist=True,
    index_type="pod",
    environment="us-west1-gcp",
    pod_type="p1.x1",
    pods=1,
)
Elasticsearch
from dynamiq.connections import Elasticsearch as ElasticsearchConnection
from dynamiq.nodes.writers import ElasticsearchDocumentWriter
writer = ElasticsearchDocumentWriter(
    connection=ElasticsearchConnection(
        url="https://localhost:9200",
        api_key="your-api-key",
    ),
    index_name="documents",
    dimension=1536,
    similarity="cosine",
)
For Elastic Cloud, authenticate with username, password, and cloud_id on the connection instead of url/api_key. You can also pass index_settings / mapping_settings dicts when creating the index.
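One way to keep the two authentication modes apart is to build the connection arguments in a single place. The helper below is a hypothetical sketch, not part of Dynamiq: the function name and the environment variable names are assumptions, and only the keyword names (`url`, `api_key`, `username`, `password`, `cloud_id`) come from the description above. It returns a plain dict that could then be unpacked into `ElasticsearchConnection(**kwargs)`.

```python
from typing import Mapping


def elasticsearch_connection_kwargs(env: Mapping[str, str]) -> dict[str, str]:
    """Pick Elastic Cloud credentials when a cloud ID is set, else url + API key."""
    # The variable names below are illustrative assumptions.
    if env.get("ELASTICSEARCH_CLOUD_ID"):
        return {
            "cloud_id": env["ELASTICSEARCH_CLOUD_ID"],
            "username": env["ELASTICSEARCH_USERNAME"],
            "password": env["ELASTICSEARCH_PASSWORD"],
        }
    return {
        "url": env["ELASTICSEARCH_URL"],
        "api_key": env["ELASTICSEARCH_API_KEY"],
    }


# Self-hosted cluster: url + api_key.
self_hosted = elasticsearch_connection_kwargs(
    {
        "ELASTICSEARCH_URL": "https://localhost:9200",
        "ELASTICSEARCH_API_KEY": "your-api-key",
    }
)

# Elastic Cloud: cloud_id + username + password.
elastic_cloud = elasticsearch_connection_kwargs(
    {
        "ELASTICSEARCH_CLOUD_ID": "your-cloud-id",
        "ELASTICSEARCH_USERNAME": "elastic",
        "ELASTICSEARCH_PASSWORD": "your-password",
    }
)
```

In practice you would pass `os.environ` instead of the literal dicts; the literals keep the sketch independent of the machine it runs on.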
Weaviate
from dynamiq.nodes.writers import WeaviateDocumentWriter
from dynamiq.storages.vector import WeaviateVectorStore
writer = WeaviateDocumentWriter(
    vector_store=WeaviateVectorStore(index_name="Documents", create_if_not_exist=True)
)
Any writer can be built either from a connection (the node constructs the store) or from a prebuilt vector_store instance, as shown here.
A complete embed-and-store fragment
from dynamiq import Workflow
from dynamiq.connections import OpenAI as OpenAIConnection, Pinecone as PineconeConnection
from dynamiq.nodes.embedders import OpenAIDocumentEmbedder
from dynamiq.nodes.writers import PineconeDocumentWriter
from dynamiq.types import Document
wf = Workflow()
embedder = OpenAIDocumentEmbedder(
    connection=OpenAIConnection(), model="text-embedding-3-small"
)
writer = (
    PineconeDocumentWriter(
        connection=PineconeConnection(),
        index_name="quickstart",
        dimension=1536,
        create_if_not_exist=True,
        index_type="serverless",
        cloud="aws",
        region="us-east-1",
    )
    .inputs(documents=embedder.outputs.documents)
    .depends_on(embedder)
)
wf.flow.add_nodes(embedder, writer)
result = wf.run(
    input_data={
        "documents": [
            Document(content="Dynamiq is an operating platform for agentic AI."),
        ]
    }
)
print(result.output[writer.id]["output"]["upserted_count"])  # 1
dimension must match the embedding model's output size: text-embedding-3-small produces 1536-dimensional vectors. On the platform, the same writers back Knowledge Base storage; see Vector Store vs Knowledge Base.
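A mismatch between `dimension` and the embedder's output tends to surface late, when the store rejects the write. As a minimal sketch, assuming each embedded document exposes its vector as a list of floats (or nothing, if it was never embedded), a pre-flight check could look like this. `find_dimension_mismatches` is a hypothetical helper, not part of Dynamiq, and bare lists stand in for the embeddings on `Document` objects.

```python
from typing import Optional, Sequence


def find_dimension_mismatches(
    embeddings: Sequence[Optional[Sequence[float]]],
    expected_dimension: int,
) -> list[int]:
    """Return positions of embeddings that are missing or have the wrong size."""
    return [
        position
        for position, embedding in enumerate(embeddings)
        if embedding is None or len(embedding) != expected_dimension
    ]


# Toy data: 4 dimensions instead of 1536 to keep the example readable.
embeddings = [
    [0.1, 0.2, 0.3, 0.4],  # correct size
    [0.1, 0.2, 0.3],       # wrong size, e.g. produced by a different model
    None,                  # document that was never embedded
]

bad_positions = find_dimension_mismatches(embeddings, expected_dimension=4)
print(bad_positions)  # [1, 2]
```

Running a check like this between the embedder and the writer makes it clear which documents are at fault before anything is sent to the store.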
Next steps
Document Processing
Convert files into documents and split them into chunks — the full converter and splitter catalog with configuration.
Retrievers & Rankers
Query vector stores with per-store retriever nodes, bundle retrieval into an agent tool, and re-rank results with Cohere, an LLM, or time weighting.