# Audio and voice

## Overview

Audio and Voice nodes are specialized components within the Dynamiq framework that handle Speech-to-Text (STT), Text-to-Speech (TTS), and Speech-to-Speech (STS) conversions. They transcribe audio files to text, synthesize spoken audio from text, and re-voice existing speech, using **Whisper** for STT and **ElevenLabs** for TTS and STS.

The Audio and Voice nodes are essential for workflows that require:

* Transcribing audio files to text (e.g., meeting recordings).
* Converting text to audio, or audio to audio, for generating voice responses (e.g., virtual assistants, interactive systems).

## Whisper Speech-to-Text (STT) <a href="#whisper" id="whisper"></a>

<figure><img src="/files/r8CXFrin59eOvhewrJkn" alt="" width="563"><figcaption><p>Whisper node</p></figcaption></figure>

The Whisper node transcribes audio using the Whisper model, providing high-quality speech-to-text conversion. This node is part of the Audio group in the **Workflow editor** and requires a connection to either the Whisper API or the OpenAI API.

### **Configuration** <a href="#whisper-configuration" id="whisper-configuration"></a>

* **Name**: Customizable name for identifying this node.
* **Connection**: The connection configuration for Whisper.
* **Model**: Model name, e.g., `whisper-1`.

### **Input** <a href="#whisper-input" id="whisper-input"></a>

* **audio**: Audio file input as `bytes` or `BytesIO`, supporting various audio formats (default is `audio/wav`).
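To make the two accepted input types concrete, here is a minimal sketch that builds a small WAV file in memory with the Python standard library and exposes it both as `bytes` and as a `BytesIO` stream. The helper name `make_wav_bytes` and the tone parameters are hypothetical and only serve to produce valid sample data; in practice the audio would come from an uploaded file.

```python
import io
import math
import struct
import wave


def make_wav_bytes(duration_s: float = 0.5, freq_hz: float = 440.0, rate: int = 16000) -> bytes:
    """Return a mono, 16-bit PCM WAV file containing a sine tone, as raw bytes."""
    n_frames = int(duration_s * rate)
    # Pack each sample as a little-endian signed 16-bit integer.
    frames = b"".join(
        struct.pack("<h", int(0.3 * 32767 * math.sin(2 * math.pi * freq_hz * i / rate)))
        for i in range(n_frames)
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(frames)
    return buffer.getvalue()


# Form 1: plain bytes.
audio_bytes = make_wav_bytes()

# Form 2: a file-like BytesIO stream wrapping the same data.
audio_stream = io.BytesIO(audio_bytes)
```

Either `audio_bytes` or `audio_stream` has the shape described for the **audio** input above.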

### **Output** <a href="#whisper-output" id="whisper-output"></a>

* **content**: Transcription output as a string containing the recognized text from the audio input.

{% hint style="info" %}
When used for tracing or as final output, **content** is a **base64-encoded string** to facilitate easy handling and transport.
{% endhint %}

### **Connection Configuration** <a href="#whisper-connection-configuration" id="whisper-connection-configuration"></a>

<figure><img src="/files/zlobq1jVPWZ4l8ZIeyyY" alt="" width="375"><figcaption><p>Whisper connection</p></figcaption></figure>

* **Type**: Whisper
* **Name**: Customizable name for identifying this connection.
* **API key**: Your API key
* **URL**: Whisper API URL, e.g., `https://api.openai.com/v1/` for OpenAI
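To illustrate how the connection fields fit together, the sketch below builds (without sending) a transcription request against OpenAI's public `audio/transcriptions` endpoint, combining the URL, API key, and model name. This is an assumption about what an equivalent raw HTTP call looks like, not a description of how the Whisper node issues requests internally; the function name and the placeholder key are hypothetical.

```python
import uuid
from urllib.request import Request


def build_transcription_request(
    base_url: str,
    api_key: str,
    model: str,
    audio: bytes,
    filename: str = "audio.wav",
    content_type: str = "audio/wav",
) -> Request:
    """Build (but do not send) a multipart POST to <base_url>/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    closing = f"--{boundary}--\r\n".encode()
    body = model_part + file_header + audio + b"\r\n" + closing

    # Join the connection URL and the endpoint path without doubling slashes.
    url = base_url.rstrip("/") + "/audio/transcriptions"
    return Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # the connection's API key
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Placeholder values only; nothing is sent over the network here.
request = build_transcription_request(
    "https://api.openai.com/v1/", "sk-placeholder", "whisper-1", b"RIFF-placeholder-audio"
)
print(request.get_method(), request.full_url)
```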

### **Usage Example** <a href="#whisper-usage-example" id="whisper-usage-example"></a>

<figure><img src="/files/2ahROHcLVIgLws7yAUaL" alt=""><figcaption><p>Audio workflow with Whisper</p></figcaption></figure>

1. Add an **Input** node and connect your audio file.
2. Drag a **Whisper** node into the workspace and connect it to the **Input** node. Set the desired model and other configurations.
3. In the **Input** section of the **Whisper** node, use an input transformer such as `{"audio": "$.input.output.files[0]"}` to pass a specific file from the list.
4. Attach a downstream node (e.g. **Output**) to handle the transcribed content.
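The input transformer in step 3 can be read as a path into the upstream node's output. The following minimal sketch shows what such a path selects. The resolver below is a hypothetical stand-in that handles only dotted keys and numeric indices, and the shape of `upstream` is inferred from the path itself rather than taken from Dynamiq's implementation.

```python
import re
from typing import Any

# Matches either ".key" or "[index]" segments.
_TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_\-]*)|\[(\d+)\]")


def resolve_path(path: str, data: Any) -> Any:
    """Resolve a simple JSONPath-like expression such as '$.a.b[0]'."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    current = data
    pos = 1  # skip the leading '$'
    while pos < len(path):
        match = _TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported path syntax at offset {pos}: {path!r}")
        key, index = match.groups()
        current = current[key] if key is not None else current[int(index)]
        pos = match.end()
    return current


def apply_transformer(selector: dict[str, str], data: Any) -> dict[str, Any]:
    """Build a node input by resolving every path in the selector."""
    return {name: resolve_path(path, data) for name, path in selector.items()}


# Hypothetical upstream data: an Input node that received two files.
upstream = {"input": {"output": {"files": [b"first-file", b"second-file"]}}}
node_input = apply_transformer({"audio": "$.input.output.files[0]"}, upstream)
print(node_input)  # {'audio': b'first-file'}
```

Changing the index to `[1]` would select the second uploaded file instead.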

## ElevenLabs Text-to-Speech

<figure><img src="/files/g93kBiGZOEUPOy8h3cws" alt="" width="563"><figcaption><p>ElevenLabs TTS node</p></figcaption></figure>

The ElevenLabs TTS node converts text into high-quality synthesized speech using ElevenLabs’ advanced TTS models. It provides options to adjust the voice characteristics, making it suitable for generating lifelike audio from text.

### **Configuration** <a href="#elevenlabs-tts-configuration" id="elevenlabs-tts-configuration"></a>

* **Name**: Customizable name for identifying this node.
* **Connection**: The connection configuration for ElevenLabs.
* **Model**: Model name, e.g., `eleven_monolingual_v1`.
* **Voices**: Select from available voices, e.g. `Rachel`, to match your required voice profile.
* **Stability**: Controls the stability and consistency of the voice.
* **Similarity**: Adjusts how closely the voice resembles the original.
* **Style Exaggeration**: Amplifies the style of the speaker, enhancing expressiveness.
* **Speaker Boost**: Toggle to increase the likeness to the selected voice.
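As a rough illustration of how these options could translate into a request, the sketch below maps them onto the `voice_settings` object of the public ElevenLabs text-to-speech API (`stability`, `similarity_boost`, `style`, `use_speaker_boost`). The mapping from the node's field names to the API fields, the default values, and the class and function names are assumptions made for this example.

```python
import json
from dataclasses import dataclass


@dataclass
class VoiceSettings:
    # Defaults are illustrative placeholders, not documented node defaults.
    stability: float = 0.5
    similarity: float = 0.5
    style_exaggeration: float = 0.0
    speaker_boost: bool = True

    def __post_init__(self) -> None:
        # The numeric settings are expected to lie in the range 0 to 1.
        for field_name in ("stability", "similarity", "style_exaggeration"):
            value = getattr(self, field_name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{field_name} must be between 0 and 1, got {value}")

    def to_api(self) -> dict:
        """Translate the node-style names to ElevenLabs API field names."""
        return {
            "stability": self.stability,
            "similarity_boost": self.similarity,
            "style": self.style_exaggeration,
            "use_speaker_boost": self.speaker_boost,
        }


def build_tts_body(text: str, model: str, settings: VoiceSettings) -> str:
    """Serialize a text-to-speech request body as JSON."""
    return json.dumps(
        {"text": text, "model_id": model, "voice_settings": settings.to_api()}
    )


body = build_tts_body(
    "Hello from the workflow.",
    "eleven_monolingual_v1",
    VoiceSettings(stability=0.4, similarity=0.8),
)
print(body)
```

Validating the ranges up front gives a clear error before any request is made, which is why the check lives in `__post_init__`.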

### **Input** <a href="#elevenlabs-tts-input" id="elevenlabs-tts-input"></a>

* **text**: Text input in string format for conversion to speech.

### **Output** <a href="#elevenlabs-tts-output" id="elevenlabs-tts-output"></a>

* **content**: Audio output as `bytes`, containing the synthesized speech.

{% hint style="info" %}
When used for tracing or as final output, **content** is a **base64-encoded string** to facilitate easy handling and transport.
{% endhint %}
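When the audio arrives as a base64-encoded string, it has to be decoded before it can be played or saved. A minimal sketch, assuming the string is standard base64 and that the decoded data is an MP3 file (the `.mp3` extension, the helper name, and the sample payload are placeholders):

```python
import base64
import tempfile
from pathlib import Path


def save_base64_audio(encoded: str, target: Path) -> int:
    """Decode base64 audio, write it to target, and return the byte count."""
    audio = base64.b64decode(encoded, validate=True)
    target.write_bytes(audio)
    return len(audio)


# Stand-in for the `content` value taken from a trace or final output.
fake_audio = b"ID3\x04\x00placeholder-audio-payload"
encoded_content = base64.b64encode(fake_audio).decode("ascii")

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "speech.mp3"
    written = save_base64_audio(encoded_content, out_path)
    roundtrip = out_path.read_bytes()
```

Passing `validate=True` makes the decoder reject strings that contain non-base64 characters instead of silently skipping them.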

### **Connection Configuration** <a href="#elevenlabs-tts-connection-configuration" id="elevenlabs-tts-connection-configuration"></a>

<figure><img src="/files/Xj4KkeEgYxP71NORUjDC" alt="" width="375"><figcaption><p>ElevenLabs connection</p></figcaption></figure>

* **Type**: ElevenLabs
* **Name**: Customizable name for identifying this connection.
* **API key**: Your API key

### **Usage Example** <a href="#elevenlabs-tts-usage-example" id="elevenlabs-tts-usage-example"></a>

<figure><img src="/files/etmSqyICend80bePjM3v" alt=""><figcaption><p>Audio flow with ElevenLabs TTS</p></figcaption></figure>

1. Add an **Input** node to pass in text data.
2. Optionally, add an **OpenAI** node to process the question and return an answer.
3. Connect the **ElevenLabs TTS** node to the **OpenAI** node (or directly to the **Input** node) and configure the model, voice, and settings as desired.
4. Attach a downstream node (e.g. **Output**) to save or process the generated audio content.

## ElevenLabs Speech-to-Speech

<figure><img src="/files/3tegpdFrarsmTXbltb5t" alt="" width="563"><figcaption><p>ElevenLabs STS node</p></figcaption></figure>

The ElevenLabs STS node enables the transformation of an audio input into a new synthesized audio output in the selected voice. This node is particularly useful for voice modulation or re-synthesis applications, where the input speech is “re-voiced” using ElevenLabs models.

### **Configuration** <a href="#elevenlabs-sts-configuration" id="elevenlabs-sts-configuration"></a>

* **Name**: Customizable name for identifying this node.
* **Connection**: The connection configuration for ElevenLabs.
* **Model**: Model name, e.g., `eleven_english_sts_v2`.
* **Voices**: Select from available voices, e.g. `Dave`, to match your required voice profile.
* **Stability**: Controls the stability and consistency of the voice.
* **Similarity**: Adjusts how closely the voice resembles the original.
* **Style Exaggeration**: Amplifies the style of the speaker, enhancing expressiveness.
* **Speaker Boost**: Toggle to increase the likeness to the selected voice.

### **Input** <a href="#elevenlabs-sts-input" id="elevenlabs-sts-input"></a>

* **audio**: Audio file input in `bytes` or `BytesIO` format, representing the original speech to be transformed.

### **Output** <a href="#elevenlabs-sts-output" id="elevenlabs-sts-output"></a>

* **content**: Audio output as `bytes`, containing the synthesized speech that mirrors the input but with the selected voice characteristics.

{% hint style="info" %}
When used for tracing or as final output, **content** is a **base64-encoded string** to facilitate easy handling and transport.
{% endhint %}

### **Connection Configuration** <a href="#elevenlabs-sts-connection-configuration" id="elevenlabs-sts-connection-configuration"></a>

<figure><img src="/files/Xj4KkeEgYxP71NORUjDC" alt="" width="375"><figcaption><p>ElevenLabs connection</p></figcaption></figure>

* **Type**: ElevenLabs
* **Name**: Customizable name for identifying this connection.
* **API key**: Your API key

### **Usage Example** <a href="#elevenlabs-sts-usage-example" id="elevenlabs-sts-usage-example"></a>

<figure><img src="/files/HLxcdXnzKslow0PiF8Sd" alt="" width="563"><figcaption><p>Audio flow with ElevenLabs STS</p></figcaption></figure>

1. Add an **Input** node to provide the original audio file.
2. Connect it to the **ElevenLabs STS** node, select the desired model and voice, and configure the settings.
3. In the **Input** section of the **ElevenLabs STS** node, use an input transformer such as `{"audio": "$.input.output.files[0]"}` to pass a specific file from the list.
4. Attach a downstream node (e.g. **Output**) to export the generated audio content.


