I Found Suspicious DNS Calls on AWS

I was debugging flaky outbound calls from an AWS Fargate task. To gain better visibility into potential network issues, I followed the common best practice of enabling Route 53 Resolver query logging for DNS queries within my VPC on AWS. Shortly after the logs started populating, while examining them for clues related to the original outbound problems, I noticed something odd...

The logs showed repeated lookups for metadata.google.internal., an endpoint meant exclusively for Google Cloud VMs, yet my workload had been running on AWS the whole time!

At first glance, it looked like an attacker running reconnaissance against my AWS Fargate tasks 🕵️‍♂️. In reality, it was an open-source Application Performance Monitoring (APM) agent auto-detecting the cloud provider: it tried the GCP metadata endpoint after the AWS one turned out to be unavailable (AWS Fargate tasks don't expose the EC2 instance metadata endpoint at 169.254.169.254). Without query logging, these phantom calls would’ve remained invisible 👻.


Why Enable Resolver Query Logging?

  1. 🔍 Complete DNS Visibility

    Capture every query name, source IP/instance ID, response code, and timestamp, so there are no blind spots and the culprit workload is quick to identify.

  2. 🛡️ Security Insights & Threat Hunting

    Spot malware or phishing domain lookups. Feed logs into SIEMs for automated alerting.

  3. 📋 Audit & Compliance

    Demonstrate continuous network monitoring for regulated workloads (PCI, HIPAA, etc.).

  4. 🚑 Faster Troubleshooting

    Diagnose application failures by correlating DNS resolution errors with service issues.

Quick Start: Enable Query Logging & Basic DNS Firewall via CloudFormation

This template enables comprehensive query logging and sets up a basic DNS Firewall rule group, so you can have query logging live in minutes.

Simply paste it into your stack, provide your VPC ID, and start capturing all DNS queries to CloudWatch Logs 🛠️📝:

AWSTemplateFormatVersion: '2010-09-09'
Description: >
  Route 53 Resolver DNS Firewall with VPC‑level DNS **query logging**.
  Logs **all** DNS queries for a supplied VPC to CloudWatch Logs.

Parameters:
  VpcId:
    Type: AWS::EC2::VPC::Id
    Description: VPC ID to protect and monitor.
  AssociationPriority:
    Type: Number
    Default: 150
    Description: 'Priority (100‑9900) for the VPC firewall association; must be unique within the VPC.'
    MinValue: 100
    MaxValue: 9900
  LogRetentionDays:
    Type: Number
    Default: 30
    Description: Retention (days) for the CloudWatch Logs group.

Resources:

  QueryLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      RetentionInDays: !Ref LogRetentionDays

  BlockedDomains:
    Type: AWS::Route53Resolver::FirewallDomainList
    Properties:
      Name: blocked-domains
      Domains:
        - badexample.com
        - malware.example
        - phishing.test

  FirewallRuleGroup:
    Type: AWS::Route53Resolver::FirewallRuleGroup
    Properties:
      Name: vpc-dns-firewall-rule-group
      FirewallRules:
        - FirewallDomainListId: !Ref BlockedDomains
          Priority: 10
          Action: BLOCK
          BlockResponse: NODATA

  FirewallAssociation:
    Type: AWS::Route53Resolver::FirewallRuleGroupAssociation
    Properties:
      FirewallRuleGroupId: !Ref FirewallRuleGroup
      VpcId: !Ref VpcId
      Priority: !Ref AssociationPriority
      Name: vpc-dns-firewall-association

  DNSQueryLogConfig:
    Type: AWS::Route53Resolver::ResolverQueryLoggingConfig
    Properties:
      Name: vpc-dns-query-logs
      DestinationArn: !GetAtt QueryLogGroup.Arn

  DNSQueryLogAssociation:
    Type: AWS::Route53Resolver::ResolverQueryLoggingConfigAssociation
    Properties:
      ResolverQueryLogConfigId: !Ref DNSQueryLogConfig
      ResourceId: !Ref VpcId

  DNSQueryLogsPolicy:
    Type: AWS::Logs::ResourcePolicy
    Properties:
      PolicyName: route53resolver-query-logs
      PolicyDocument: !Sub |
        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Sid": "Route53ResolverQueryLogsToCloudWatch",
              "Effect": "Allow",
              "Principal": { "Service": "route53resolver.amazonaws.com" },
              "Action": [
                "logs:CreateLogStream",
                "logs:PutLogEvents"
              ],
              "Resource": "arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:${QueryLogGroup}:*"
            }
          ]
        }

Outputs:
  FirewallRuleGroupId:
    Description: ID of the DNS Firewall Rule Group
    Value: !Ref FirewallRuleGroup
  QueryLogConfigId:
    Description: ID of the Resolver Query Log Config
    Value: !Ref DNSQueryLogConfig
  QueryLogGroupName:
    Description: CloudWatch Logs group for DNS query logs
    Value: !Ref QueryLogGroup

Alternatively, download it from my GitHub repository:

gabrielkoo / aws-route53-dns-firewall-logging-cfn

Protect any AWS VPC from malicious domains and capture every DNS query for observability—all in one drop-in stack.

AWS Route 53 DNS Firewall & VPC Query-Logging — CloudFormation


License: MIT

✨ What you get

Resource | Purpose
---|---
AWS::Route53Resolver::FirewallRuleGroup | Blocks domains you supply (sample list included).
AWS::Route53Resolver::FirewallRuleGroupAssociation | Attaches the rule group to your VPC with a configurable priority.
AWS::Logs::LogGroup | Stores every DNS query sent from the VPC. Retention is parameterised.
AWS::Route53Resolver::ResolverQueryLoggingConfig | Streams all queries to CloudWatch Logs.
AWS::Logs::ResourcePolicy | Grants route53resolver.amazonaws.com permission to write to the log group.

🔒 Least-privilege‡: Nothing outside Route 53 Resolver can write to the log group.
📝 Auditable: Every query—blocked or allowed—hits CloudWatch for real-time dashboards or Athena analysis.

🚀 Quick start

aws cloudformation deploy \
  --stack-name r53-dns-firewall \
  --template-file route53-dns-firewall.yaml \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameter-overrides \
      VpcId=vpc-0123456789abcdef0 \
      AssociationPriority=150 \
      LogRetentionDays=30

Parameter | Default | Description
---|---|---
VpcId | | Required – Target

Once the template has been deployed, go to CloudWatch Logs and start inspecting the DNS queries made in your VPC!

📄 Sample Log Entry from a Blocked DNS Query

With DNS Firewall in place, blocked lookups are logged with a firewall_rule_action field. Here’s how a blocked query appears in CloudWatch Logs or an S3 log file:

{
    "version": "1.100000",
    "account_id": "452954105288",
    "region": "us-east-1",
    "vpc_id": "vpc-0b20bb8a5eb28cb99",
    "query_timestamp": "2025-04-27T04:45:23Z",
    "query_name": "badexample.com.",
    "query_type": "A",
    "query_class": "IN",
    "rcode": "NOERROR",
    "answers": [],
    "srcaddr": "10.0.101.123",
    "srcport": "46275",
    "transport": "UDP",
    "srcids": {
        "instance": "i-0cf6fbb4f8960f245"
    },
    "firewall_rule_action": "BLOCK",
    "firewall_rule_group_id": "rslvr-frg-69632fc987684fbd",
    "firewall_domain_list_id": "rslvr-fdl-de20dd498c14dce"
}
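
To triage records like the one above in bulk, a small parser can help. Here is a minimal sketch, assuming each log event is a single JSON object shaped like the sample. The function name `summarize_query_log` is hypothetical, and treating a missing `firewall_rule_action` as "no firewall rule matched" is an assumption on my part.

```python
import json


def summarize_query_log(raw: str) -> dict:
    """Reduce one Resolver query log record to the fields useful for triage."""
    entry = json.loads(raw)
    action = entry.get("firewall_rule_action")  # assumed absent when no rule matched
    return {
        "query_name": entry["query_name"].rstrip("."),
        # Prefer the instance ID when present, otherwise fall back to the source IP.
        "source": entry.get("srcids", {}).get("instance") or entry.get("srcaddr"),
        "rcode": entry.get("rcode"),
        "blocked": action == "BLOCK",
    }


# Trimmed copy of the sample record shown above.
sample = """
{
    "query_name": "badexample.com.",
    "rcode": "NOERROR",
    "srcaddr": "10.0.101.123",
    "srcids": {"instance": "i-0cf6fbb4f8960f245"},
    "firewall_rule_action": "BLOCK"
}
"""

summary = summarize_query_log(sample)
print(summary)
```

Falling back to `srcaddr` keeps the summary usable for sources that carry no instance ID in `srcids`.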

Layer on Route 53 DNS Firewall

The provided CloudFormation template establishes a basic DNS Firewall. You can further enhance this by:

  1. 🧠 Managed Threat Lists – Block known malicious domains without manual upkeep.
  2. Custom Allow/Deny Rules – Enforce corporate-approved domains.
  3. Real-Time Enforcement – Choose between the ALLOW, ALERT, and BLOCK actions, with NODATA, NXDOMAIN, or OVERRIDE as the response for blocked queries.

Apply firewall policies at the VPC level — no per-instance agents needed.


What actually happened with the metadata.google.internal DNS query

Again, my workload was a Python Flask app with Elastic APM installed, running on AWS Fargate.

elastic/apm-agent-python:elasticapm/base.py#L419-L456

def get_cloud_info(self):
    provider = str(self.config.cloud_provider).lower()

    if not provider or provider == "none" or provider == "false":
        return {}
    if provider == "aws":
        # Not reached: I hadn't explicitly set the `cloud_provider` option in my Elastic APM setup.
        ...
    elif provider == "gcp":
        ...
    elif provider == "azure":
        ...
    elif provider == "auto" or provider == "true":
        data = {}
        # This line returned `None` as my workload was based on AWS Fargate, which does not support the EC2 Metadata endpoint.
        data = cloud.aws_metadata()
        if data:
            return data
        # This line caused the DNS query.
        data = cloud.gcp_metadata()
        if data:
            return data
        data = cloud.azure_metadata()
        return data
    else:
        self.logger.warning("Unknown value for CLOUD_PROVIDER, skipping cloud metadata: {}".format(provider))
        return {}

As I hadn't configured the cloud provider, the agent helpfully tried to auto-detect it by trial and error. It started with the AWS EC2 metadata endpoint, which didn't respond because my workload was running on AWS Fargate, and then checked whether the workload was running on Google Cloud.
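
If you would rather skip the probing entirely, the first branch of `get_cloud_info` suggests a way: with the provider set to `none`, the method returns early. Here is a minimal sketch, assuming the agent reads its `cloud_provider` option from an `ELASTIC_APM_CLOUD_PROVIDER` environment variable (please check the agent documentation for your version) and that the variable is set before the agent is initialised.

```python
import os

# Assumption: the agent maps ELASTIC_APM_CLOUD_PROVIDER to `cloud_provider`.
# "none" takes the early `return {}` branch, so no metadata endpoint is probed.
os.environ["ELASTIC_APM_CLOUD_PROVIDER"] = "none"

# ... create the Flask app and initialise Elastic APM after this point ...
print(os.environ["ELASTIC_APM_CLOUD_PROVIDER"])
```

The trade-off is that an empty dict comes back from `get_cloud_info`, so no cloud metadata is collected at all.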

The AWS and Azure metadata endpoints are both IP-based, so those probes produced no DNS queries for Route 53 Resolver query logging to record.
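
To make the difference concrete, here is a minimal sketch that checks which metadata hosts are IP literals and which are names that must be resolved. The host values are the well-known defaults for each provider, listed here as an assumption rather than read out of the APM agent, and `needs_dns_lookup` is a hypothetical helper.

```python
import ipaddress

# Assumption: well-known default metadata hosts for each provider,
# not values read out of the APM agent itself.
METADATA_HOSTS = {
    "aws": "169.254.169.254",
    "gcp": "metadata.google.internal",
    "azure": "169.254.169.254",
}


def needs_dns_lookup(host: str) -> bool:
    """Return True when `host` is a name to resolve, False for an IP literal."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return True
    return False


dns_visible = sorted(p for p, h in METADATA_HOSTS.items() if needs_dns_lookup(h))
print(dns_visible)  # ['gcp']
```

Only the GCP probe needs name resolution, which is why it was the only one to show up in the query logs.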

Another Real-World Case: Ubuntu cloud-init Tests

Another thing I noticed after enabling DNS query logging: on standalone Ubuntu cloud servers, cloud-init looks up does-not-exist.example.com. to detect DNS redirection. These benign checks add noise to your DNS logs unless you filter them out, and spotting them at all is another great reason to keep query logging on.

canonical/cloud:cloudinit/util.py#L1297-L1318:

if _DNS_REDIRECT_IP is None:
    badips = set()
    badnames = (
        "does-not-exist.example.com.",
        "example.invalid.",
        "__cloud_init_expected_not_found__",
    )
    badresults: dict = {}
    for iname in badnames:
        try:
            result = socket.getaddrinfo(
                iname, None, 0, 0, socket.SOCK_STREAM, socket.AI_CANONNAME
            )
            badresults[iname] = []
            for _fam, _stype, _proto, cname, sockaddr in result:
                badresults[iname].append("%s: %s" % (cname, sockaddr[0]))
                badips.add(sockaddr[0])
        except (socket.gaierror, socket.error):
            pass
    _DNS_REDIRECT_IP = badips
    if badresults:
        LOG.debug("detected dns redirection: %s", badresults)

Another unexpected finding — but thankfully, it wasn’t an actual exploit 😅.
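
Once you know which lookups are harmless, you can filter them out before reviewing the rest. Here is a minimal sketch, assuming log entries have already been parsed into dicts with a `query_name` key. The names `BENIGN_PROBE_NAMES`, `is_benign_probe`, and `drop_benign` are hypothetical, and the list itself only reflects the two cases in this article, so review it for your own environment.

```python
# Assumption: a hand-maintained list of probe names considered harmless,
# taken from the two cases described in this article.
BENIGN_PROBE_NAMES = {
    "metadata.google.internal.",
    "does-not-exist.example.com.",
    "example.invalid.",
}
# cloud-init also looks up this bare label; the resolver may append a search
# domain, so it is matched on the first label only (an assumption).
BENIGN_FIRST_LABELS = {"__cloud_init_expected_not_found__"}


def normalize(name: str) -> str:
    """Lower-case a query name and make sure it ends with a trailing dot."""
    name = name.strip().lower()
    return name if name.endswith(".") else name + "."


def is_benign_probe(query_name: str) -> bool:
    name = normalize(query_name)
    return name in BENIGN_PROBE_NAMES or name.split(".")[0] in BENIGN_FIRST_LABELS


def drop_benign(entries: list[dict]) -> list[dict]:
    """Keep only log entries whose query_name is not a known benign probe."""
    return [e for e in entries if not is_benign_probe(e["query_name"])]


entries = [
    {"query_name": "does-not-exist.example.com."},
    {"query_name": "__cloud_init_expected_not_found__.ec2.internal."},
    {"query_name": "api.example.org."},
]
remaining = drop_benign(entries)
print(remaining)
```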


Best Practices

We have now walked through how to set up Route 53 Resolver query logging and a simple DNS Firewall. Here are a few more points to take things further:

  • 🧱 Automate with IaC: Use CloudFormation/Terraform for consistent setup across environments.
  • 📊 Centralize Logs: Stream to CloudWatch Logs or S3 and integrate with security platforms, or run automated analysis on the aggregated data, potentially using Generative AI.
  • 🧾 Tune Policies: Review and whitelist legitimate domains to avoid false positives, and keep the domain lists under version control (for example, in Git).
  • 🔁 Periodic Review: Analyze logs regularly to refine firewall rules and detect new patterns.
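
As one way to approach the periodic review, here is a minimal sketch that compares the query names in a batch of log entries against a baseline of names you have already reviewed. The function `new_query_names`, the baseline set, and the sample entries are all made up for illustration.

```python
from collections import Counter


def new_query_names(entries, baseline):
    """Count query names in `entries` that are not in the `baseline` set."""
    counts = Counter(e["query_name"].lower().rstrip(".") for e in entries)
    return {name: n for name, n in counts.items() if name not in baseline}


# Hypothetical data: names seen in earlier reviews, then a new batch of entries.
baseline = {"sts.amazonaws.com", "api.example.org"}
entries = [
    {"query_name": "sts.amazonaws.com."},
    {"query_name": "metadata.google.internal."},
    {"query_name": "metadata.google.internal."},
    {"query_name": "API.example.org."},
]

unseen = new_query_names(entries, baseline)
print(unseen)  # {'metadata.google.internal': 2}
```

Anything that shows up here is a candidate for either the baseline or a firewall rule, which is exactly how the metadata.google.internal lookups would have surfaced.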

Related

Getting more serious on DNS security after this article? Also read another article of mine:

Conclusion

Turning on Route 53 Resolver query logging can transform DNS into a powerful diagnostic and security tool 🧠🔐. Paired with DNS Firewall, you gain both visibility and control—preventing unwanted traffic, uncovering phantom queries, and strengthening your AWS network posture.

Stay curious, stay secure! 🛡️✨
