Audio and voice

Overview

Audio and Voice Nodes are specialized components within the Dynamiq framework designed to handle Speech-to-Text (STT), Text-to-Speech (TTS) and Speech-to-Speech (STS) conversions. These nodes can transcribe audio files to text and synthesize spoken audio from text, leveraging Whisper for STT and ElevenLabs for TTS and STS.

The Audio and Voice nodes are essential for workflows that require:

  • Transcribing audio files to text (e.g., meeting recordings).

  • Converting text to audio, or audio to audio, for generating voice responses (e.g., virtual assistants, interactive systems).

Whisper Speech-to-Text (STT)

The Whisper node enables audio transcription using the Whisper model, providing high-quality speech-to-text conversion. This node is part of the Audio group in the Workflow editor and requires a connection to either the Whisper or OpenAI API.

Configuration

  • Name: Customizable name for identifying this node.

  • Connection: The connection configuration for Whisper.

  • Model: Model name, e.g., whisper-1.

Input

  • audio: Audio file input as bytes or BytesIO, supporting various audio formats (default is audio/wav).

Output

  • content: Transcription output as a string containing the recognized text from the audio input.

When used for tracing or as the final output, content is a base64-encoded string to facilitate easy handling and transport.

Connection Configuration

  • Type: Whisper

  • Name: Customizable name for identifying this connection.

  • API key: Your API key

  • URL: Whisper API URL, e.g., https://api.openai.com/v1/ for OpenAI

Usage Example

  1. Add an Input node and connect your audio file.

  2. Drag a Whisper node into the workspace and connect it to the Input node. Set the desired model and other configurations.

  3. Make sure the Whisper node's Input section uses an input transformer such as {"audio": "$.input.output.files[0]"} to pass the exact file from the files list.

  4. Attach a downstream node (e.g. Output) to handle the transcribed content.
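
Outside the builder, the same transcription can be reproduced directly against the OpenAI API, which is roughly what the node does when connected to OpenAI. The sketch below is illustrative only; it assumes the official openai Python package, and the file path and API key are placeholders.

```python
# Minimal sketch of the call the Whisper node makes, assuming the official
# `openai` Python package and an OpenAI-compatible endpoint. File path and
# API key are placeholders.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.openai.com/v1/")

# In the workflow the audio arrives as bytes/BytesIO via the input transformer
# {"audio": "$.input.output.files[0]"}; here we simply read a local file.
with open("meeting_recording.wav", "rb") as f:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",  # same model name as in the node configuration
        file=f,
    )

# Equivalent to the node's `content` output: the recognized text.
print(transcription.text)
```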

ElevenLabs Text-to-Speech

The ElevenLabs TTS node converts text into high-quality synthesized speech using ElevenLabs’ advanced TTS models. It provides options to adjust the voice characteristics, making it suitable for generating lifelike audio from text.

Configuration

  • Name: Customizable name for identifying this node.

  • Connection: The connection configuration for ElevenLabs.

  • Model: Model name, e.g., eleven_monolingual_v1.

  • Voices: Select from available voices, e.g. Rachel, to match your required voice profile.

  • Stability: Controls the stability and consistency of the voice.

  • Similarity: Adjusts how closely the voice resembles the original.

  • Style Exaggeration: Amplifies the style of the speaker, enhancing expressiveness.

  • Speaker Boost: Toggle to increase the likeness to the selected voice.

Input

  • text: Text input in string format for conversion to speech.

Output

  • content: Audio output as bytes, containing the synthesized speech.

When used for tracing or as the final output, content is a base64-encoded string to facilitate easy handling and transport.
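
If you consume the traced or final output programmatically, the encoded string can be decoded back into raw audio bytes and saved to a playable file. A minimal sketch, assuming the synthesized audio is MP3 (typical for ElevenLabs); the content value is a placeholder.

```python
import base64

# Placeholder: the node's `content` as it appears in tracing / final output.
content_b64 = "..."

audio_bytes = base64.b64decode(content_b64)

# Write the decoded bytes to a playable file; an MP3 container is assumed here.
with open("speech.mp3", "wb") as f:
    f.write(audio_bytes)
```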

Connection Configuration

  • Type: ElevenLabs

  • Name: Customizable name for identifying this connection.

  • API key: Your API key

Usage Example

  1. Add an Input node to pass in text data.

  2. Optionally, add an OpenAI LLM node to handle the question and return an answer.

  3. Connect the ElevenLabs TTS node to the OpenAI node (or directly to the Input node) and configure the model, voice, and settings as desired.

  4. Attach a downstream node (e.g. Output) to save or process the generated audio content.
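
For reference, the node's settings map onto ElevenLabs' public text-to-speech REST API roughly as sketched below. This is not the node's internal implementation: the voice ID, API key, and output path are placeholders, and the voice_settings keys follow the public API, where Similarity and Style Exaggeration correspond to similarity_boost and style.

```python
import requests

API_KEY = "YOUR_ELEVENLABS_API_KEY"   # placeholder
VOICE_ID = "YOUR_VOICE_ID"            # placeholder, e.g. the ID of the "Rachel" voice

response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY},
    json={
        "text": "Hello from the Dynamiq workflow.",
        "model_id": "eleven_monolingual_v1",  # same model name as in the node configuration
        "voice_settings": {
            "stability": 0.5,           # Stability
            "similarity_boost": 0.75,   # Similarity
            "style": 0.0,               # Style Exaggeration
            "use_speaker_boost": True,  # Speaker Boost
        },
    },
)
response.raise_for_status()

# The response body is the synthesized audio (MP3 by default), matching the
# node's `content` output before base64 encoding.
with open("answer.mp3", "wb") as f:
    f.write(response.content)
```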

ElevenLabs Speech-to-Speech

The ElevenLabs STS node enables the transformation of an audio input into a new synthesized audio output in the selected voice. This node is particularly useful for voice modulation or re-synthesis applications, where the input speech is “re-voiced” using ElevenLabs models.

Configuration

  • Name: Customizable name for identifying this node.

  • Connection: The connection configuration for ElevenLabs.

  • Model: Model name, e.g., eleven_english_sts_v2.

  • Voices: Select from available voices, e.g. Dave, to match your required voice profile.

  • Stability: Controls the stability and consistency of the voice.

  • Similarity: Adjusts how closely the voice resembles the original.

  • Style Exaggeration: Amplifies the style of the speaker, enhancing expressiveness.

  • Speaker Boost: Toggle to increase the likeness to the selected voice.

Input

  • audio: Audio file input in bytes or BytesIO format, representing the original speech to be transformed.

Output

  • content: Audio output as bytes, containing the synthesized speech that mirrors the input but with the selected voice characteristics.

When used for tracing or as the final output, content is a base64-encoded string to facilitate easy handling and transport.

Connection Configuration

  • Type: ElevenLabs

  • Name: Customizable name for identifying this connection.

  • API key: Your API key

Usage Example

  1. Add an Input node to provide the original audio file.

  2. Connect it to the ElevenLabs STS node, select the desired model and voice, and configure the settings.

  3. Make sure the ElevenLabs STS node's Input section uses an input transformer such as {"audio": "$.input.output.files[0]"} to pass the exact file from the files list.

  4. Attach a downstream node (e.g. Output) to export the generated audio content.
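
The conversion can likewise be approximated against ElevenLabs' public speech-to-speech REST endpoint. This is a hedged sketch rather than the node's internal implementation; the voice ID, API key, and file paths are placeholders.

```python
import requests

API_KEY = "YOUR_ELEVENLABS_API_KEY"   # placeholder
VOICE_ID = "YOUR_VOICE_ID"            # placeholder, e.g. the ID of the "Dave" voice

# Upload the original speech and request re-synthesis in the selected voice.
with open("original_speech.wav", "rb") as f:
    response = requests.post(
        f"https://api.elevenlabs.io/v1/speech-to-speech/{VOICE_ID}",
        headers={"xi-api-key": API_KEY},
        data={"model_id": "eleven_english_sts_v2"},  # same model name as in the node configuration
        files={"audio": f},  # original speech to be re-voiced
    )
response.raise_for_status()

# Synthesized audio in the selected voice, matching the node's `content` output.
with open("revoiced_speech.mp3", "wb") as f:
    f.write(response.content)
```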

Screenshots: Whisper node, Whisper connection, and an audio workflow with Whisper; ElevenLabs TTS node, ElevenLabs connection, and an audio flow with ElevenLabs TTS; ElevenLabs STS node, ElevenLabs connection, and an audio flow with ElevenLabs STS.