Amazon MSK makes it easy to run Apache Kafka clusters on AWS. Sometimes you need to expose MSK to external clients, and the most common way to do that is to deploy the cluster in a public subnet. In some cases, however, you need to keep MSK in a private subnet, or to expose it under your own domain name.
TL;DR
You need to set up kafka-proxy with custom configuration and a custom TLS certificate, then expose it via a Network Load Balancer.
AWS Blog references:
- Secure connectivity patterns to access Amazon MSK across AWS Regions
- Connect Kafka client applications securely to your Amazon MSK cluster from different VPCs and AWS accounts
Use Cases
Exposing Amazon MSK from a private subnet with a custom FQDN and TLS certificate is useful in the following scenarios:
- Multi-Region Deployments: When you need to connect clients from different AWS regions to your MSK cluster securely.
- Hybrid Cloud Environments: When external clients or on-premises systems need access to your MSK cluster without exposing it directly to the public internet.
- Compliance Requirements: When your organization mandates the use of private subnets and custom domain names for secure communication.
- Custom Domain Branding: When you want to use a custom FQDN for branding or easier client configuration.
High-Level Design
This design makes the following assumptions:
- Amazon MSK is deployed in a private subnet with its default configuration; no extra setup steps are required.
- The TLS certificate and private key are already stored in AWS Secrets Manager.
- AWS Certificate Manager is not used in this case because we need to retrieve the private key.
- An NLB TLS listener is not used in this case because we need to replace not only the certificate but also the FQDN advertised by the Kafka listeners, and kafka-proxy does both jobs for us.
- Appropriate IAM roles are already configured to provide access to necessary resources.
- Security groups configuration is out of scope.
Step by Step
Precondition
The VPC is configured, and you have enough unused IP addresses in your private subnets to run MSK, ECS, and the NLB.
Certificates
Put your certificate and private key into a secret in AWS Secrets Manager.
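As an illustration, here is a hedged sketch using the AWS CLI; the secret names match the ones referenced by the init container in the task definition below, so adjust them to your own naming convention:

```shell
# Store the PEM-encoded certificate chain and private key as two
# plaintext secrets. The names (/demo/kafka/tls-certificate*) are
# the example values used later in the ECS task definition.
aws secretsmanager create-secret \
  --name /demo/kafka/tls-certificate \
  --secret-string file://certificate.crt

aws secretsmanager create-secret \
  --name /demo/kafka/tls-certificate-key \
  --secret-string file://certificate.key
```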
You can automatically get a free certificate from various CAs that support the ACME protocol, for example Let's Encrypt or ZeroSSL.
Amazon MSK Deployment
Use your favorite automation tool (Terraform, Pulumi, or CDK), or create the cluster manually, to get Amazon MSK up and running.
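Once the cluster is up, you will need the broker endpoints for the proxy mapping shown later. One way to list them is via the AWS CLI (the cluster ARN below is a placeholder):

```shell
# Fetch the bootstrap broker endpoints for the cluster.
# The ARN is an example; substitute your own cluster ARN.
aws kafka get-bootstrap-brokers \
  --cluster-arn arn:aws:kafka:us-east-1:111122223333:cluster/mskname/example-uuid
```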
Elastic Container Services
ECS setup and configuration are pretty standard as well.
Here are a few hints:
- Do not forget to mirror all used Docker images to ECR to avoid cases where you are unable to spin up the service due to image unavailability or Docker Hub rate limits.
- To deliver the certificate, we use a small init container that retrieves the certificate and private key from AWS Secrets Manager and places them in a shared volume; in this example, the aws-secret-to-file tool does that job.
Below you can find ECS task definitions to bring everything together:
[
  {
    "name": "aws-secret-to-file",
    "image": "ghcr.io/technicaldomain/aws-secret-to-file:latest",
    "cpu": 100,
    "enableExecuteCommand": true,
    "memory": 32,
    "memoryReservation": 32,
    "essential": false,
    "command": [
      "/bin/aws-secret-to-file",
      "--secret=/demo/kafka/tls-certificate",
      "--output=/app/certificate.crt",
      "--secret=/demo/kafka/tls-certificate-key",
      "--output=/app/certificate.key"
    ],
    "environment": [
      {
        "name": "AWS_DEFAULT_REGION",
        "value": "us-east-1"
      }
    ],
    "mountPoints": [
      {
        "readOnly": false,
        "containerPath": "/app",
        "sourceVolume": "certificates"
      }
    ],
    "portMappings": [],
    "volumesFrom": [],
    "linuxParameters": {
      "initProcessEnabled": true,
      "maxSwap": 1024,
      "swappiness": 60
    },
    "secrets": [],
    "startTimeout": 120,
    "stopTimeout": 120,
    "disableNetworking": false,
    "privileged": false,
    "readonlyRootFilesystem": false,
    "interactive": true,
    "pseudoTerminal": true,
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "demo-log-group",
        "awslogs-region": "us-east-1",
        "awslogs-stream-prefix": "ecs"
      }
    }
  },
  {
    "name": "kafka-proxy",
    "image": "account_id.dkr.ecr.us-east-1.amazonaws.com/kafka-proxy:0.4.2",
    "cpu": 1000,
    "enableExecuteCommand": true,
    "memory": 2048,
    "memoryReservation": 512,
    "essential": true,
    "command": [
      "server",
      "--http-metrics-path=/metrics",
      "--tls-enable",
      "--proxy-listener-cert-file=/app/certificate.crt",
      "--proxy-listener-key-file=/app/certificate.key",
      "--proxy-listener-tls-enable=true",
      "--bootstrap-server-mapping=b-1.mskname.random.a123.kafka.us-east-1.amazonaws.com:9096,0.0.0.0:1196,kafka.demo.technicaldomain.xyz:1196",
      "--bootstrap-server-mapping=b-2.mskname.random.a123.kafka.us-east-1.amazonaws.com:9096,0.0.0.0:1197,kafka.demo.technicaldomain.xyz:1197",
      "--bootstrap-server-mapping=b-3.mskname.random.a123.kafka.us-east-1.amazonaws.com:9096,0.0.0.0:1198,kafka.demo.technicaldomain.xyz:1198",
      "--log-format=json"
    ],
    "environment": [],
    "mountPoints": [
      {
        "readOnly": true,
        "containerPath": "/app",
        "sourceVolume": "certificates"
      }
    ],
    "portMappings": [
      {
        "containerPort": 1196,
        "hostPort": 0,
        "protocol": "tcp"
      },
      {
        "containerPort": 1197,
        "hostPort": 0,
        "protocol": "tcp"
      },
      {
        "containerPort": 1198,
        "hostPort": 0,
        "protocol": "tcp"
      }
    ],
    "volumesFrom": [],
    "linuxParameters": {
      "initProcessEnabled": true,
      "maxSwap": 1024,
      "swappiness": 60
    },
    "dependsOn": [
      {
        "containerName": "aws-secret-to-file",
        "condition": "COMPLETE"
      }
    ],
    "secrets": [],
    "startTimeout": 120,
    "stopTimeout": 120,
    "disableNetworking": false,
    "privileged": false,
    "readonlyRootFilesystem": false,
    "interactive": true,
    "healthCheck": {
      "retries": 3,
      "command": [
        "CMD-SHELL",
        "wget -q -O - http://localhost:9080/health | grep OK || exit 1"
      ],
      "timeout": 5,
      "interval": 30,
      "startPeriod": null
    },
    "pseudoTerminal": true,
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "demo-log-group",
        "awslogs-region": "us-east-1",
        "awslogs-stream-prefix": "ecs"
      }
    }
  }
]
All the magic happens in the `bootstrap-server-mapping` parameter. In this configuration, we:
- Connect to the `b-1.mskname.random.a123.kafka.us-east-1.amazonaws.com:9096` Kafka bootstrap server.
- Map it to the local host and port `0.0.0.0:1196`.
- Update the advertised host and port to the value we provide to the outside (`kafka.demo.technicaldomain.xyz:1196`).
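To make the three-part format concrete, here is a tiny bash sketch (using the example hostnames from the task definition) that splits one mapping value into its components:

```shell
#!/usr/bin/env bash
# A bootstrap-server-mapping value has the form "broker,listener,advertised".
mapping="b-1.mskname.random.a123.kafka.us-east-1.amazonaws.com:9096,0.0.0.0:1196,kafka.demo.technicaldomain.xyz:1196"

# Split on commas into the three parts kafka-proxy expects.
IFS=',' read -r broker listener advertised <<< "$mapping"

echo "connect to:   $broker"
echo "listen on:    $listener"
echo "advertise as: $advertised"
```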
Then you can expose this ECS service via an NLB and serve clients the correct custom FQDN with a valid TLS certificate.
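Assuming DNS for the custom FQDN already points at the NLB, you can sanity-check the endpoint with openssl and kcat (the hostname is the example value used above):

```shell
# Inspect the certificate the proxy presents on the external endpoint.
openssl s_client -connect kafka.demo.technicaldomain.xyz:1196 \
  -servername kafka.demo.technicaldomain.xyz </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer -enddate

# Fetch cluster metadata through the proxy over TLS; the broker list
# returned should show the advertised FQDNs, not the internal MSK ones.
kcat -b kafka.demo.technicaldomain.xyz:1196 -X security.protocol=SSL -L
```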
Job done.