Retrievers & Rankers

Query vector stores with per-store retriever nodes, bundle retrieval into an agent tool, and re-rank results with Cohere, an LLM, or time weighting.

Retrievers (dynamiq.nodes.retrievers) search a vector store for the chunks closest to a query embedding. Rankers (dynamiq.nodes.rankers) reorder and trim those results before they reach the LLM. This page covers both, plus VectorStoreRetriever — the composite tool that packages embed → retrieve → re-rank for agents.

Retriever nodes

One retriever per store, all with the same shape: ChromaDocumentRetriever, ElasticsearchDocumentRetriever, MilvusDocumentRetriever, OpenSearchDocumentRetriever, PGVectorDocumentRetriever, PineconeDocumentRetriever, QdrantDocumentRetriever, WeaviateDocumentRetriever.

from dynamiq.connections import Pinecone as PineconeConnection
from dynamiq.nodes.retrievers import PineconeDocumentRetriever

retriever = PineconeDocumentRetriever(
    connection=PineconeConnection(),
    index_name="quickstart",
    top_k=5,
)

Configuration shared by all retrievers:

top_k (int)

How many documents to return (default 10).

filters (dict)

Metadata filters applied to every search.

similarity_threshold (float)

Minimum similarity (or maximum distance) for a document to be returned.

At run time a retriever takes the query embedding (produced by a text embedder) and optional per-call overrides for top_k, filters, and similarity_threshold; it returns documents, each carrying content, metadata, and a similarity score. The RAG Pipeline page shows the full embedder → retriever → LLM wiring.
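To make the effect of top_k and similarity_threshold concrete, here is a minimal, library-free sketch of the selection step. The ScoredDoc class and the select function are hypothetical names used only for illustration; they are not dynamiq's implementation, and the sketch assumes a higher-is-better similarity score (a distance metric would flip the comparison).

```python
from dataclasses import dataclass, field


@dataclass
class ScoredDoc:
    """Hypothetical stand-in for a retrieved document with a similarity score."""
    content: str
    score: float
    metadata: dict = field(default_factory=dict)


def select(candidates, top_k=10, similarity_threshold=None):
    """Drop candidates below the threshold, then keep the top_k highest scores."""
    kept = [
        doc for doc in candidates
        if similarity_threshold is None or doc.score >= similarity_threshold
    ]
    kept.sort(key=lambda doc: doc.score, reverse=True)
    return kept[:top_k]


candidates = [
    ScoredDoc("refund policy", 0.91),
    ScoredDoc("shipping times", 0.42),
    ScoredDoc("digital goods terms", 0.78),
    ScoredDoc("careers page", 0.15),
]

# Without a threshold, top_k alone decides how many documents come back.
default_hits = select(candidates, top_k=3)

# With a threshold, fewer than top_k documents may be returned.
strict_hits = select(candidates, top_k=3, similarity_threshold=0.5)
```

The second call shows why the two settings are complementary: top_k caps the result size, while the threshold can shrink it further when few candidates are genuinely close to the query.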

Stores with hybrid search (for example Weaviate and Elasticsearch) also accept the raw query string and an alpha parameter that blends keyword and vector scoring (0 = pure keyword, 1 = pure vector).
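The role of alpha can be illustrated with a simple convex combination of the two scores. This is a sketch under the assumption that both scores are already normalized to the same range; the blend function is hypothetical, and real stores may normalize or fuse rankings differently before combining them.

```python
def blend(keyword_score: float, vector_score: float, alpha: float) -> float:
    """Weighted mix of keyword and vector scores (illustrative only).

    alpha = 0 keeps only the keyword score, alpha = 1 keeps only the vector score.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * vector_score + (1.0 - alpha) * keyword_score


pure_keyword = blend(keyword_score=0.2, vector_score=0.8, alpha=0.0)
pure_vector = blend(keyword_score=0.2, vector_score=0.8, alpha=1.0)
balanced = blend(keyword_score=0.2, vector_score=0.8, alpha=0.5)
```

Lower alpha values tend to suit queries with exact identifiers or rare terms, while higher values favor paraphrased or conceptual queries.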

VectorStoreRetriever: RAG as an agent tool

VectorStoreRetriever bundles a text embedder, a store retriever, and an optional reranker behind a single query interface. Because it is a tool-group node, you can hand it to an agent:

from dynamiq.connections import (
    Cohere as CohereConnection,
    OpenAI as OpenAIConnection,
    Pinecone as PineconeConnection,
)
from dynamiq.nodes.agents import Agent
from dynamiq.nodes.embedders import OpenAITextEmbedder
from dynamiq.nodes.llms import OpenAI
from dynamiq.nodes.rankers import CohereReranker
from dynamiq.nodes.retrievers import PineconeDocumentRetriever
from dynamiq.nodes.retrievers.retriever import VectorStoreRetriever

rag_tool = VectorStoreRetriever(
    name="knowledge-search",
    text_embedder=OpenAITextEmbedder(
        connection=OpenAIConnection(), model="text-embedding-3-small"
    ),
    document_retriever=PineconeDocumentRetriever(
        connection=PineconeConnection(), index_name="quickstart", top_k=20
    ),
    document_reranker=CohereReranker(connection=CohereConnection(), top_k=5),
)

llm = OpenAI(connection=OpenAIConnection(), model="gpt-4o")

agent = Agent(
    name="kb-agent",
    llm=llm,
    tools=[rag_tool],
    role="Answer questions using the knowledge-search tool and cite the sources you used.",
    max_loops=6,
)

result = agent.run(input_data={"input": "What does our refund policy say about digital goods?"})
print(result.output["content"])

A common pattern: retrieve generously (top_k=20 on the retriever), then let the reranker keep the best 5. The counterpart for writes is VectorStoreWriter (dynamiq.nodes.writers.writer), which pairs a document embedder with a store writer so an agent can persist new documents.
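The retrieve-generously-then-rerank pattern can be sketched without any external service. In the toy example below, the retrieve and rerank functions are hypothetical, and word overlap stands in for a real cross-encoder or LLM judgment; the point is only to show that the second stage can promote a document the first stage ranked lower.

```python
def retrieve(corpus, top_k):
    """Stage 1: cheap, broad selection by stored vector-similarity score."""
    return sorted(corpus, key=lambda doc: doc["score"], reverse=True)[:top_k]


def rerank(query, docs, top_k):
    """Stage 2: more careful scoring on the shortlist (toy word-overlap scorer)."""
    terms = set(query.lower().split())

    def overlap(doc):
        return len(terms & set(doc["content"].lower().split()))

    return sorted(docs, key=overlap, reverse=True)[:top_k]


corpus = [
    {"content": "Refund policy for digital goods", "score": 0.61},
    {"content": "Refund policy for physical goods", "score": 0.74},
    {"content": "Office opening hours", "score": 0.70},
    {"content": "Gift card terms", "score": 0.55},
]

# Wide first pass, narrow second pass.
shortlist = retrieve(corpus, top_k=3)
best = rerank("refund digital goods", shortlist, top_k=1)
```

Here the digital-goods document is only third by vector score, yet it wins after re-ranking. A first-stage top_k that is too small would have excluded it before the reranker could see it.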

Rankers

CohereReranker and LLMDocumentRanker take a query plus documents; TimeWeightedDocumentRanker needs only documents. Each returns a reordered, trimmed documents list, so a ranker slots between a retriever and an LLM, or into the document_reranker field shown above.

CohereReranker

Cross-encoder re-ranking through Cohere's rerank API:

from dynamiq.connections import Cohere
from dynamiq.nodes.rankers import CohereReranker
from dynamiq.types import Document

ranker = CohereReranker(connection=Cohere())  # reads COHERE_API_KEY

output = ranker.run(
    input_data={
        "query": "What is machine learning?",
        "documents": [
            Document(content="Machine learning is a branch of AI...", score=0.8),
            Document(content="Deep learning is a subset of machine learning...", score=0.7),
        ],
    }
)
print(output.output["documents"])

model (str)

Default cohere/rerank-v3.5.

top_k (int)

Documents to keep (default 5).

threshold (float)

Drop documents scoring below this relevance (default 0).

LLMDocumentRanker

Uses any LLM node to judge relevance — no extra vendor, fully customizable via prompt_template:

from dynamiq.connections import OpenAI as OpenAIConnection
from dynamiq.nodes.llms import OpenAI
from dynamiq.nodes.rankers import LLMDocumentRanker

ranker = LLMDocumentRanker(
    llm=OpenAI(connection=OpenAIConnection(), model="gpt-4o-mini"),
    top_k=5,
)

TimeWeightedDocumentRanker

Boosts recent documents based on a date stored in metadata — useful for news, tickets, and logs:

from dynamiq.nodes.rankers import TimeWeightedDocumentRanker

ranker = TimeWeightedDocumentRanker(
    top_k=5,
    date_field="date",            # metadata key holding the date
    date_format="%d %B, %Y",
    max_days=3600,                # age horizon for the decay
    min_coefficient=0.9,          # floor for the recency multiplier
)
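One way to picture how max_days and min_coefficient interact is a linear decay from 1.0 for a document dated today down to min_coefficient at max_days and beyond. The recency_coefficient function below is a hypothetical sketch under that assumption; the ranker's actual formula may differ. It also shows what a date string matching the "%d %B, %Y" format looks like.

```python
from datetime import datetime


def recency_coefficient(
    date_str: str,
    now: datetime,
    date_format: str = "%d %B, %Y",
    max_days: int = 3600,
    min_coefficient: float = 0.9,
) -> float:
    """Assumed linear decay: 1.0 at age 0, min_coefficient at max_days or older."""
    age_days = (now - datetime.strptime(date_str, date_format)).days
    # Clamp so future dates count as age 0 and very old dates hit the floor.
    age_days = min(max(age_days, 0), max_days)
    return 1.0 - (1.0 - min_coefficient) * age_days / max_days


now = datetime(2024, 1, 1)

# A document dated today keeps its full score.
fresh = recency_coefficient("01 January, 2024", now)

# A document older than max_days is multiplied by roughly min_coefficient.
old = recency_coefficient("01 January, 2000", now)

# The coefficient would then scale the similarity score before sorting.
adjusted = 0.8 * old
```

With min_coefficient=0.9, even the oldest document loses at most about 10% of its score, so recency acts as a tie-breaker rather than overriding similarity.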

Choosing a ranker

Ranker	Best when	Cost
CohereReranker	You want the strongest general-purpose relevance and already use Cohere	Per-call API usage
LLMDocumentRanker	You need custom relevance criteria or want to stay on one provider	LLM tokens
TimeWeightedDocumentRanker	Freshness matters as much as similarity	Free (pure computation)

Next steps

RAG Pipeline

Embedders & Vector Stores

Agent

Rankers (platform)