# AWS

This document walks you through deploying the **Dynamiq GenAI Operating Platform** into **your own AWS VPC** from an AWS Marketplace subscription.\
It is aimed at DevOps engineers, SREs, software engineers, and data‑science practitioners who are comfortable with the AWS CLI and Kubernetes tooling.

***

### Table of Contents

1. [Prerequisites](#prerequisites)
2. [Subscribe on AWS Marketplace](#subscribe-on-aws-marketplace)
3. [Set your environment variables](#set-your-environment-variables)
4. [Create the prerequisite IAM roles](#create-the-prerequisite-iam-roles)
5. [Provision the EKS cluster](#provision-the-eks-cluster)
6. [Create the RDS database](#create-the-rds-database)
7. [Install Karpenter](#install-karpenter)
8. [Create the node pools](#create-the-node-pools)
9. [Install External Secrets & supporting add‑ons](#install-external-secrets-and-supporting-add-ons)
10. [Store Dynamiq secrets](#store-dynamiq-secrets)
11. [Create the Dynamiq service account](#create-the-dynamiq-service-account)
12. [Prepare the S3 bucket and Helm values](#prepare-the-s3-bucket-and-helm-values)
13. [Authenticate to ECR and deploy Dynamiq](#authenticate-to-ecr-and-deploy-dynamiq)
14. [Validate the deployment](#validate-the-deployment)
15. [Cleanup (optional)](#cleanup-optional)

***

### Prerequisites

* **AWS account** with Administrator‑level access (or the specific permissions listed below).
* **AWS CLI ≥ 2.15**, **kubectl ≥ 1.31**, **eksctl ≥ 0.175**, **Helm ≥ 3.14**, **jq**, and **envsubst** installed locally.
* Public or private **domain name** (e.g. `example.com`) that you control and are able to delegate to Route 53.
* At least **one VPC quota slot** for a new EKS cluster (eksctl will create the VPC by default).
* **Service quotas** for the EC2 instance families you plan to use (`m5` for platform nodes, `g5` for GPU nodes).

The acting IAM principal must be allowed to manage EKS, CloudFormation, IAM, RDS, Secrets Manager, S3, STS, and associated resources. For production we recommend deploying from a short‑lived CI user or assume‑role with the following AWS managed policies attached:

* `AmazonEKSClusterPolicy`
* `AmazonEKSServicePolicy`
* `AmazonEKSWorkerNodePolicy`
* `AmazonEC2ContainerRegistryPowerUser`
* `AmazonRDSFullAccess`
* `AWSCloudFormationFullAccess`
* `IAMFullAccess`
* `SecretsManagerReadWrite`
* `AmazonS3FullAccess`

***

### Subscribe on AWS Marketplace

1. Open the [Dynamiq GenAI Operating Platform listing](https://aws.amazon.com/marketplace) in your browser.
2. Click **Continue to Subscribe** → **Accept terms**.
3. Wait until the subscription status shows **Subscribed**.

*No additional Marketplace configuration is required; the Helm chart (deployed later) records usage automatically.*

***

### Set your environment variables

Edit only the three highlighted variables, then copy‑paste the whole block:

```bash
# ---------- BEGIN USER CONFIG -------------------
export AWS_DEFAULT_REGION="us-east-2"        # <— Change to your preferred AWS region
export CLUSTER_NAME="dynamiq-demo"           # <— Unique, lowercase, DNS‑compatible cluster name
export BASE_DOMAIN="example.com"             # <— Root or sub‑domain you control
# ---------- END USER CONFIG ---------------------

export K8S_VERSION="1.31"
export AWS_PARTITION="aws"

export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"

export AMD_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2/recommended/image_id --query Parameter.Value --output text)"
export GPU_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2-gpu/recommended/image_id --query Parameter.Value --output text)"
```

> **Tip**  Add `set -euo pipefail` to abort on errors; all commands below are idempotent unless otherwise noted.
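Before continuing, it can be worth sanity-checking the user config. The snippet below is an optional guard, not part of the official setup; it falls back to the example values (`dynamiq-demo`, `example.com`) so it is runnable standalone:

```shell
# validate_config NAME DOMAIN — returns non-zero with a message if either
# value is malformed. Pure bash; no AWS calls.
validate_config() {
  local name="$1" domain="$2"
  [[ "$name" =~ ^[a-z][a-z0-9-]{0,62}$ ]] || {
    echo "cluster name must be lowercase and DNS-compatible: $name" >&2; return 1; }
  [[ "$domain" =~ ^([a-z0-9-]+\.)+[a-z]{2,}$ ]] || {
    echo "base domain does not look like a domain: $domain" >&2; return 1; }
}

validate_config "${CLUSTER_NAME:-dynamiq-demo}" "${BASE_DOMAIN:-example.com}"
```

Running the checks up front surfaces a bad `CLUSTER_NAME` or `BASE_DOMAIN` immediately instead of mid-way through `eksctl create cluster`.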

***

### Create the prerequisite IAM roles

The CloudFormation template bundled with Dynamiq creates the minimal IAM roles and policies required by Karpenter and External Secrets.

```bash
aws cloudformation deploy \
  --stack-name "${CLUSTER_NAME}" \
  --template-file ./dynamiq-stack.yaml \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameter-overrides "ClusterName=${CLUSTER_NAME}"
```

> **Wait** until the stack status reads **CREATE\_COMPLETE** (≈ 1–2 minutes).

***

### Provision the EKS cluster

Paste the snippet below **as‑is**; `envsubst` injects your variables inline:

```bash
envsubst < <(cat <<'EOF'
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${CLUSTER_NAME}
  region: ${AWS_DEFAULT_REGION}
  version: "${K8S_VERSION}"
  tags:
    karpenter.sh/discovery: ${CLUSTER_NAME}

iam:
  withOIDC: true
  podIdentityAssociations:
  - serviceAccountName: karpenter
    namespace: kube-system
    roleName: ${CLUSTER_NAME}-karpenter
    permissionPolicyARNs:
      - arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-${CLUSTER_NAME}
  - serviceAccountName: external-secrets
    namespace: external-secrets
    roleName: ${CLUSTER_NAME}-external-secrets
    permissionPolicyARNs:
      - arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:policy/ExternalSecretsPolicy-${CLUSTER_NAME}

iamIdentityMappings:
- arn: arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}
  username: system:node:{{EC2PrivateDNSName}}
  groups:
    - system:bootstrappers
    - system:nodes

managedNodeGroups:
- name: ${CLUSTER_NAME}-ng
  instanceType: m5.large
  desiredCapacity: 1
  minSize: 1
  maxSize: 2
  amiFamily: AmazonLinux2

addons:
- name: eks-pod-identity-agent
EOF
) | eksctl create cluster -f -
```

When the command completes you will have:

* An EKS cluster with one **m5.large** node.
* OIDC provider enabled for IAM Roles for Service Accounts (IRSA).

Retrieve a few handy values:

```bash
export CLUSTER_ENDPOINT="$(aws eks describe-cluster --name "${CLUSTER_NAME}" --query 'cluster.endpoint' --output text)"
export KARPENTER_IAM_ROLE_ARN="arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/${CLUSTER_NAME}-karpenter"
export EXTERNALSECRETS_IAM_ROLE_ARN="arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/${CLUSTER_NAME}-external-secrets"
```

***

### Create the RDS database

Dynamiq stores structured metadata in PostgreSQL. A convenience CloudFormation stack provisions a single-AZ **db.t3.medium** instance with encrypted storage.

```bash
export RDS_PASSWORD="d$(openssl rand -hex 16)"  # literal "d" + 32 random hex chars (33 chars total)

aws cloudformation deploy \
  --stack-name "${CLUSTER_NAME}-rds" \
  --template-file ./dynamiq-stack-rds.yaml \
  --parameter-overrides ClusterName=${CLUSTER_NAME} DBMasterUserPassword=${RDS_PASSWORD}
```

> **Security note**  Store `RDS_PASSWORD` securely (e.g. in AWS Secrets Manager) after creation.
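For example, the freshly generated password can be written to Secrets Manager right away. The secret name below is an assumption (align it with your own naming convention), and the call is skipped when no AWS credentials or password are present:

```shell
# Persist the RDS master password before the shell session is lost.
# SECRET_NAME is illustrative; adjust to your conventions.
SECRET_NAME="${CLUSTER_NAME:-dynamiq-demo}-rds-master-password"
if [ -n "${RDS_PASSWORD:-}" ] && command -v aws >/dev/null 2>&1 \
   && aws sts get-caller-identity >/dev/null 2>&1; then
  aws secretsmanager create-secret \
    --name "${SECRET_NAME}" \
    --description "RDS master password for ${CLUSTER_NAME:-dynamiq-demo}" \
    --secret-string "${RDS_PASSWORD}"
else
  echo "AWS credentials not available; store ${SECRET_NAME} manually"
fi
```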

***

### Install Karpenter

```bash
aws iam create-service-linked-role --aws-service-name spot.amazonaws.com || true

helm registry logout public.ecr.aws 2>/dev/null || true
export KARPENTER_VERSION="1.0.6"

helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "${KARPENTER_VERSION}" \
  --namespace kube-system \
  --create-namespace \
  --set replicas=1 \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait
```
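A quick health check confirms the controller is running. The label selector below is the standard one set by the Karpenter chart; adjust it if you changed the release name:

```shell
# Controller pod should be Running with 0 restarts.
kubectl get pods -n kube-system -l app.kubernetes.io/name=karpenter

# Recent logs should show the controller watching the cluster, with no errors.
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter --tail=20
```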

***

### Create the node pools

The following manifests declare two node pools:

* **Platform (m5)** for web/API workloads.
* **GPU (g5)** for model inference.

```bash
# Platform nodes
cat <<'EOF' | envsubst | kubectl apply -f -
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: platform
spec:
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
  amiFamily: AL2
  amiSelectorTerms:
    - id: ${AMD_AMI_ID}
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        deleteOnTermination: true
        encrypted: true
        iops: 3000
        throughput: 125
        volumeSize: 100Gi
        volumeType: gp3
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: m5
spec:
  disruption:
    budgets:
      - nodes: 10%
    consolidationPolicy: WhenUnderutilized
    expireAfter: 48h
  limits:
    cpu: 128
  template:
    spec:
      nodeClassRef:
        kind: EC2NodeClass
        name: platform
      requirements:
        - key: getdynamiq.ai/workload
          operator: In
          values: ["application"]
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["m5"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["large","xlarge","2xlarge","4xlarge","8xlarge"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
EOF

# GPU nodes
cat <<'EOF' | envsubst | kubectl apply -f -
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: gpu
spec:
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
  amiFamily: AL2
  amiSelectorTerms:
    - id: ${GPU_AMI_ID}
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        deleteOnTermination: true
        encrypted: true
        iops: 3000
        throughput: 125
        volumeSize: 300Gi
        volumeType: gp3
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: gpu-g5
spec:
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 1m0s
  limits:
    cpu: 256
  template:
    spec:
      nodeClassRef:
        kind: EC2NodeClass
        name: gpu
      requirements:
        - key: nvidia.com/gpu
          operator: In
          values: ["true"]
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["g5"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["xlarge","2xlarge","4xlarge","8xlarge","16xlarge","12xlarge","24xlarge","48xlarge"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      taints:
        - key: nvidia.com/gpu
          value: "true"
          effect: NoSchedule
EOF
```
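To confirm both pools registered, list the Karpenter resources. Optionally, a pending pod that selects `getdynamiq.ai/workload=application` should trigger an `m5` node within a couple of minutes; the pause-pod smoke test below is a sketch (the pod name is hypothetical) and is safe to delete afterwards:

```shell
kubectl get ec2nodeclasses,nodepools

# Optional smoke test: a pod targeting the platform pool.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: karpenter-smoke-test
spec:
  nodeSelector:
    getdynamiq.ai/workload: "application"
  containers:
    - name: pause
      image: registry.k8s.io/pause:3.9
      resources:
        requests:
          cpu: "1"
EOF

kubectl get nodeclaims -w   # Ctrl-C once a node appears
kubectl delete pod karpenter-smoke-test
```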

***

### Install External Secrets & supporting add‑ons

```bash
# External Secrets Operator
helm upgrade --install external-secrets external-secrets \
  --repo https://charts.external-secrets.io \
  --namespace external-secrets \
  --create-namespace \
  --wait

# Ingress Nginx
helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace \
  --set controller.ingressClassResource.default=true \
  --wait
```

Create a **ClusterSecretStore** pointing External Secrets to Secrets Manager:

```bash
cat <<'EOF' | envsubst | kubectl apply -f -
apiVersion: external-secrets.io/v1
kind: ClusterSecretStore
metadata:
  name: dynamiq
spec:
  provider:
    aws:
      region: "${AWS_DEFAULT_REGION}"
      service: SecretsManager
EOF
```

***

### Store Dynamiq secrets

Update the placeholders *before* running:

```bash
cat > dynamiq_secrets.json <<'EOF'
{
  "AUTH_ACCESS_TOKEN_KEY": "<CHANGE_THIS_VALUE>",
  "AUTH_REFRESH_TOKEN_KEY": "<CHANGE_THIS_VALUE>",
  "AUTH_VERIFICATION_TOKEN_KEY": "<CHANGE_THIS_VALUE>",
  "AUTH_INTERNAL_TOKEN_KEY": "<CHANGE_THIS_VALUE>",
  "HUGGING_FACE_ACCESS_TOKEN": "<CHANGE_THIS_VALUE>",
  "SMTP_HOST": "<CHANGE_THIS_VALUE>",
  "SMTP_USERNAME": "<CHANGE_THIS_VALUE>",
  "SMTP_PASSWORD": "<CHANGE_THIS_VALUE>",
  "FIRECRAWL_API_KEY": "<CHANGE_THIS_VALUE>",
  "TOGETHER_API_KEY": "<CHANGE_THIS_VALUE>",
  "OPENAI_API_KEY": "<CHANGE_THIS_VALUE>"
}
EOF

aws secretsmanager create-secret \
  --name DYNAMIQ \
  --description "Dynamiq Platform Secret" \
  --secret-string file://dynamiq_secrets.json
```
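The four `AUTH_*` values are signing keys and can be any high-entropy strings. One way to generate them (assuming `openssl` is available, as it is used elsewhere in this guide); substitute the values into `dynamiq_secrets.json` before running `create-secret`, or update an existing secret with `aws secretsmanager put-secret-value`:

```shell
# Generate four independent 256-bit keys, hex-encoded (64 characters each).
AUTH_ACCESS_TOKEN_KEY="$(openssl rand -hex 32)"
AUTH_REFRESH_TOKEN_KEY="$(openssl rand -hex 32)"
AUTH_VERIFICATION_TOKEN_KEY="$(openssl rand -hex 32)"
AUTH_INTERNAL_TOKEN_KEY="$(openssl rand -hex 32)"
echo "${AUTH_ACCESS_TOKEN_KEY}"   # 64 hex characters
```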

***

### Create the Dynamiq service account

```bash
kubectl create namespace dynamiq --dry-run=client -o yaml | kubectl apply -f -

eksctl create iamserviceaccount \
  --name dynamiq-aws \
  --namespace dynamiq \
  --cluster ${CLUSTER_NAME} \
  --attach-policy-arn arn:aws:iam::aws:policy/AWSMarketplaceMeteringFullAccess \
  --attach-policy-arn arn:aws:iam::aws:policy/AWSMarketplaceMeteringRegisterUsage \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AWSLicenseManagerConsumptionPolicy \
  --approve \
  --override-existing-serviceaccounts
```

***

### Prepare the S3 bucket and Helm values

```bash
export STORAGE_S3_BUCKET="${CLUSTER_NAME}-data-$(openssl rand -hex 4)"

# Note: in us-east-1 you must omit --create-bucket-configuration entirely.
aws s3api create-bucket \
  --bucket "${STORAGE_S3_BUCKET}" \
  --region "${AWS_DEFAULT_REGION}" \
  --create-bucket-configuration LocationConstraint="${AWS_DEFAULT_REGION}"
```
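Optionally, make the bucket's security posture explicit. Recent AWS accounts already apply default encryption and public-access blocking to new buckets, so this is belt-and-braces; the snippet is guarded so it is a no-op without AWS credentials:

```shell
# Enable default encryption and block all public access on the data bucket.
if command -v aws >/dev/null 2>&1 && aws sts get-caller-identity >/dev/null 2>&1; then
  aws s3api put-bucket-encryption \
    --bucket "${STORAGE_S3_BUCKET}" \
    --server-side-encryption-configuration \
      '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
  aws s3api put-public-access-block \
    --bucket "${STORAGE_S3_BUCKET}" \
    --public-access-block-configuration \
      BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
fi
```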

Create a **local.values.yaml** file with domain overrides:

```bash
envsubst <<EOF > local.values.yaml
dynamiq:
  domain: ${BASE_DOMAIN}

nexus:
  image:
    repository: 709825985650.dkr.ecr.us-east-1.amazonaws.com/dynamiq/enterprise/nexus
  ingress:
    enabled: true
  externalSecrets:
    enabled: true
  configMapData:
    SMTP_FROM_NAME: 'Dynamiq'
    SMTP_FROM_EMAIL: 'noreply@dynamiq.local'
    STORAGE_SERVICE: s3
    STORAGE_S3_BUCKET: ${STORAGE_S3_BUCKET}

synapse:
  image:
    repository: 709825985650.dkr.ecr.us-east-1.amazonaws.com/dynamiq/enterprise/synapse
  ingress:
    enabled: true
  externalSecrets:
    enabled: true
  configMapData:
    STORAGE_SERVICE: s3
    STORAGE_S3_BUCKET: ${STORAGE_S3_BUCKET}

catalyst:
  image:
    repository: 709825985650.dkr.ecr.us-east-1.amazonaws.com/dynamiq/enterprise/catalyst
  externalSecrets:
    enabled: true
  configMapData:
    STORAGE_SERVICE: s3
    STORAGE_S3_BUCKET: ${STORAGE_S3_BUCKET}

ui:
  image:
    repository: 709825985650.dkr.ecr.us-east-1.amazonaws.com/dynamiq/enterprise/ui
  ingress:
    enabled: true
  configMapData: {}
EOF
```

***

### Authenticate to ECR and deploy Dynamiq

```bash
aws ecr get-login-password --region us-east-1 | \
  helm registry login --username AWS --password-stdin 709825985650.dkr.ecr.us-east-1.amazonaws.com

helm upgrade --install dynamiq oci://709825985650.dkr.ecr.us-east-1.amazonaws.com/dynamiq/enterprise/dynamiq \
  --namespace dynamiq \
  --values local.values.yaml \
  --wait
```

***

### Validate the deployment

```bash
kubectl get ingress -n dynamiq -o wide
```

Create `A` (alias) or `CNAME` records in Route 53 for each ingress hostname shown, pointing at the load balancer address in the `ADDRESS` column. Once DNS propagates you should be able to visit:

* `https://app.${BASE_DOMAIN}` — Dynamiq web console
* `https://api.${BASE_DOMAIN}` — Dynamiq API
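You can confirm DNS is resolving before opening the browser. This assumes `dig` and `curl` are installed and that the default ingress hostnames are in use; until TLS is configured the `curl` check may fail even though DNS is correct:

```shell
# Both lookups should return the load balancer address from the ingress.
dig +short "app.${BASE_DOMAIN}"
dig +short "api.${BASE_DOMAIN}"

# -k skips certificate verification; expect an HTTP status line.
curl -skI "https://app.${BASE_DOMAIN}" | head -n1
```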

***

### Cleanup (optional)

The following commands remove **all** resources created by this guide. **Irreversible!**

```bash
helm uninstall dynamiq -n dynamiq || true
helm uninstall karpenter -n kube-system || true
kubectl delete nodeclaims --all || true

eksctl delete cluster --name "${CLUSTER_NAME}"
aws cloudformation delete-stack --stack-name "${CLUSTER_NAME}-rds"
aws cloudformation delete-stack --stack-name "${CLUSTER_NAME}"

# The bucket must be empty before delete-bucket will succeed.
aws s3 rm "s3://${STORAGE_S3_BUCKET}" --recursive
aws s3api delete-bucket --bucket "${STORAGE_S3_BUCKET}"
```

***

### Next Steps

* Enable **HTTPS** by attaching an AWS Certificate Manager (ACM) certificate to the ALB Ingress Controller or by terminating TLS at an external load balancer.
* Adjust **Karpenter NodePool limits** to meet your workload demands.
* Integrate with your **observability stack** (Dynatrace, Datadog, CloudWatch) using Helm `--set` overrides.

Enjoy building with Dynamiq! ✨
