Browser Automation as a Service

This post walks through a Model Context Protocol (MCP) server for Amazon Nova Act that runs as a container on Amazon ECS with AWS Fargate, providing browser automation as a service.

Introduction

Amazon Nova Act is an AI model and SDK, currently in early research preview, for building reliable web agents that perform actions in a web browser. Integrating Nova Act with the Model Context Protocol (MCP) standardizes its browser automation capabilities and exposes them to diverse clients (e.g., AI assistants, web UIs, IDE extensions) over several transports:

  • Standard I/O (stdio) for local scripting and CLI tools such as the Amazon Q Developer CLI and the open-source Cline VS Code extension
  • Server-Sent Events (SSE) over HTTP for real-time streaming: the server pushes events over a long-lived SSE connection, and the client sends its requests as HTTP POSTs
  • Streamable HTTP, a transport recently introduced in the MCP specification, for scalable web-based interactions

This implementation deploys Nova Act as a containerized service on Amazon ECS, fronted by an Application Load Balancer (ALB), and uses SSE over HTTP for real-time communication.

Note: Nova Act API keys can currently be requested only in the US, so this solution is deployed in the AWS us-east-1 Region.


Architecture Overview

The solution consists of the following components:

  1. MCP Server: A FastAPI application that exposes Nova Act functionality over the Model Context Protocol using the FastAPI-MCP library
  2. Streamlit Client: A web UI and MCP client for interacting with the MCP server
  3. AWS Infrastructure: ECS Fargate tasks, ALB, and supporting resources

Nova Act MCP Architecture

Deployment Guide

Prerequisites

Before deploying the Nova Act MCP server on ECS, ensure you have:

  1. AWS CLI configured with appropriate credentials
  2. Node.js and npm installed
  3. AWS CDK installed (npm install -g aws-cdk)
  4. Docker installed and running
  5. A valid Nova Act API key

Step 1: Clone the Repository

git clone https://github.com/awsdataarchitect/nova-act-ecs.git
cd nova-act-ecs

Step 2: Export Your API Key as an Environment Variable

export NOVA_ACT_API_KEY=your-api-key-here

Step 3: Deploy the CDK Stack

npm install
cdk bootstrap
cdk deploy

Step 4: Access the Application

After deployment completes, CDK prints the ALB DNS name as a stack output. You can then access:

  • MCP Server: http://<ALB_DNS_NAME>/mcp
  • Streamlit UI: http://<ALB_DNS_NAME>:8501

MCP Server Implementation

The server implements MCP using the fastapi_mcp library and exposes the following endpoints:

  1. MCP Endpoints (automatically handled by fastapi_mcp):

    • /mcp - SSE endpoint for event streaming
    • /mcp/schema - Schema endpoint for method discovery
    • /mcp/jsonrpc - JSON-RPC endpoint for method calls
  2. Core API Endpoint:

    • /browse - Combined endpoint for all browser automation tasks
  3. Additional Endpoints:

    • /health - Used by ALB for health checks
    • /logs - Endpoint to retrieve recent server logs
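Under the hood, the /mcp SSE stream is plain text: each event is a few `field: value` lines terminated by a blank line, and under MCP's HTTP+SSE transport the first event the server sends is named `endpoint` and tells the client where to POST its JSON-RPC messages. The minimal parser below is a sketch (not part of fastapi_mcp) that handles only the `event:` and `data:` fields plus `:` comment lines and ignores `id:` and `retry:`; the session path in the sample stream is made up.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Minimal Server-Sent Events parser that yields (event, data) pairs.

    Simplified: supports `event:`, multi-line `data:`, comment lines
    starting with ':', and dispatch on a blank line.
    """
    event, data = "message", []  # "message" is the default SSE event type
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:  # a blank line dispatches the buffered event, if any
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive ping
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


# Hypothetical stream: the endpoint path and session id are placeholders
sample: List[str] = [
    "event: endpoint",
    "data: /mcp/messages/?session_id=abc123",
    "",
    ": ping",
    "",
    "event: message",
    'data: {"jsonrpc": "2.0", "id": 1, "result": {}}',
    "",
]
for name, payload in parse_sse(sample):
    print(name, payload)
```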

The "browse" Method

The implementation uses a simplified approach with a single "browse" method that combines browser control and instruction execution:

@app.post("/browse", operation_id="browse")
async def browse(request: BrowseRequest) -> BrowseResponse:
    """
    Execute a browsing task with Nova Act.
    This method handles browser initialization, navigation, and instruction execution.
    """
    # Implementation details excluded here for brevity...

Request Schema

The browse method accepts a flexible request schema that can handle various browsing scenarios:

from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field

class BrowseRequest(BaseModel):
    starting_url: str
    instructions: List[str] = Field(..., description="List of instructions to execute sequentially")
    max_steps_per_instruction: int = 30
    timeout_per_instruction: Optional[int] = None
    schema: Optional[Dict[str, Any]] = None
    headless: bool = True
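Because fastapi_mcp uses a route's operation_id as the MCP tool name (here `browse`), an MCP client such as Cline invokes this endpoint with a JSON-RPC 2.0 `tools/call` message whose `arguments` follow the request schema. The sketch below builds such a message from a stdlib dataclass that mirrors BrowseRequest; `BrowseArgs` and `tools_call_request` are hypothetical helpers, and the URL and instructions are placeholders.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class BrowseArgs:
    """Hypothetical stdlib mirror of BrowseRequest (same fields and defaults)."""
    starting_url: str
    instructions: List[str]
    max_steps_per_instruction: int = 30
    timeout_per_instruction: Optional[int] = None
    schema: Optional[Dict[str, Any]] = None
    headless: bool = True


def tools_call_request(args: BrowseArgs, request_id: int = 1,
                       tool_name: str = "browse") -> Dict[str, Any]:
    """Wrap browse arguments in an MCP `tools/call` JSON-RPC 2.0 request."""
    # Drop unset optional fields so the server applies its own defaults
    arguments = {k: v for k, v in asdict(args).items() if v is not None}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


req = tools_call_request(BrowseArgs(
    starting_url="https://www.amazon.com",
    instructions=["search for a coffee maker", "return the price of the first result"],
    schema={"type": "object", "properties": {"price": {"type": "string"}}},
))
print(json.dumps(req, indent=2))
```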

Response Schema

The response includes detailed information about the browsing session:

class BrowseResponse(BaseModel):
    status: str
    results: List[Dict[str, Any]]
    errors: List[Dict[str, Any]] = []

Server Features

  • Single Global Browser Instance: The server maintains a single global Nova Act instance
  • Headless Mode: Browser always runs in headless mode, since Fargate containers have no display
  • API Key Management: Retrieves API key from environment variables or AWS Secrets Manager
  • Structured Data Extraction: Supports schema-based data extraction
  • Error Handling: Comprehensive error handling and logging
  • Thread Pool Execution: Runs synchronous Nova Act code in a thread pool to avoid asyncio conflicts
  • Resource Monitoring: Monitors system resources (CPU, memory) for debugging
  • Log Buffering: Maintains a circular buffer of recent logs for client display
  • Console Output Capture: Captures all stdout/stderr output including Nova Act's thinking steps
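The API key lookup from the feature list above can be sketched as "environment first, Secrets Manager second". The ordering is an assumption, not the repository's code. Reading the environment first also covers the common ECS pattern of injecting a Secrets Manager secret into the container as an environment variable through the task definition. The fetcher is passed in so the function runs without AWS; the secret name `nova-act-api-key` is hypothetical.

```python
import os
from typing import Callable, Optional


def resolve_api_key(
    fetch_secret: Optional[Callable[[str], str]] = None,
    secret_id: str = "nova-act-api-key",  # hypothetical secret name
    env_var: str = "NOVA_ACT_API_KEY",
) -> str:
    """Return the Nova Act API key from the environment, else from a secret store."""
    key = os.environ.get(env_var, "").strip()
    if key:
        return key  # exported locally, or injected by the ECS task definition
    if fetch_secret is None:
        raise RuntimeError(f"{env_var} is not set and no secret fetcher was provided")
    key = fetch_secret(secret_id).strip()
    if not key:
        raise RuntimeError(f"Secret {secret_id!r} is empty")
    return key


# Wiring the fetcher to Secrets Manager with boto3 (needs AWS credentials,
# e.g. the ECS task role with secretsmanager:GetSecretValue):
#   import boto3
#   sm = boto3.client("secretsmanager")
#   key = resolve_api_key(lambda sid: sm.get_secret_value(SecretId=sid)["SecretString"])
```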

Key Implementation Details

The Nova Act SDK is synchronous, so calling it directly from an async FastAPI handler would block the event loop. The server therefore runs it in a thread pool:

# Inside the async browse() handler: execute the blocking browse sequence in a thread pool
logger.info("Running browse sequence in thread pool")
browse_result = await asyncio.get_running_loop().run_in_executor(
    thread_pool, run_browse_sequence
)
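Here is a self-contained sketch of the same pattern using only the standard library. `run_browse_sequence` is a hypothetical stand-in that sleeps instead of driving a browser, and the single-worker pool is an assumption that fits the single global browser instance: concurrent requests queue up rather than driving the browser at the same time. The heartbeat task shows that the event loop keeps running while the blocking work happens in the worker thread.

```python
import asyncio
import contextlib
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Assumption: one worker, so only one request drives the global browser at a time
thread_pool = ThreadPoolExecutor(max_workers=1)


def run_browse_sequence(instructions: List[str]) -> List[Dict[str, str]]:
    """Hypothetical stand-in for the blocking Nova Act calls."""
    results = []
    for instruction in instructions:
        time.sleep(0.05)  # simulate blocking browser work
        results.append({"instruction": instruction, "status": "done"})
    return results


async def heartbeat(ticks: List[float]) -> None:
    """Records a timestamp every 10 ms; it only runs while the event loop is free."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def browse(instructions: List[str]) -> List[Dict[str, str]]:
    loop = asyncio.get_running_loop()
    # run_in_executor forwards positional arguments to the blocking function
    return await loop.run_in_executor(thread_pool, run_browse_sequence, instructions)


async def main():
    ticks: List[float] = []
    hb = asyncio.create_task(heartbeat(ticks))
    results = await browse(["search for laptops", "open the first result"])
    hb.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await hb
    return results, len(ticks)


results, tick_count = asyncio.run(main())
print(results)
print("heartbeat ticks during the blocking work:", tick_count)
```

If the Nova Act call ran directly inside the coroutine instead, the heartbeat would record a single tick before the blocking work and nothing until it finished.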

The server also implements a log capture mechanism to provide real-time logs to clients, including stdout/stderr interception to capture Nova Act's thinking process:

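import collections
import sys
import threading
from io import StringIO

from fastapi import FastAPI

app = FastAPI()

# Log buffer implementation: a thread-safe ring buffer of recent log lines
class LogBuffer:
    def __init__(self, max_size=1000):
        self.logs = collections.deque(maxlen=max_size)  # oldest entries drop off automatically
        self.lock = threading.Lock()

    def add(self, log_entry):
        with self.lock:
            self.logs.append(log_entry)

    def get_logs(self, limit=100):
        with self.lock:
            # Guard limit <= 0: a slice of [-0:] would return every entry
            return list(self.logs)[-limit:] if limit > 0 else []

# Custom stdout/stderr interceptor to capture Nova Act outputs
class OutputInterceptor(StringIO):
    def __init__(self, log_buffer, stream_name, original_stream):
        super().__init__()
        self.log_buffer = log_buffer
        self.stream_name = stream_name
        self.original_stream = original_stream

    def write(self, text):
        # Write to the original stream so container logs still receive the output
        self.original_stream.write(text)

        # Add to log buffer if not empty
        if text.strip():
            self.log_buffer.add(text.rstrip())
        return len(text)  # file-like write() returns the number of characters written

    def flush(self):
        self.original_stream.flush()

log_buffer = LogBuffer()
# Installation sketch: route process-wide stdout/stderr (including Nova Act's) through the interceptors
sys.stdout = OutputInterceptor(log_buffer, "stdout", sys.stdout)
sys.stderr = OutputInterceptor(log_buffer, "stderr", sys.stderr)

# Log endpoint
@app.get("/logs")
async def get_logs(limit: int = 100):
    return {"logs": log_buffer.get_logs(limit)}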
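The interceptor only sees text written to stdout/stderr after it is installed, and a `logging.StreamHandler` created earlier keeps a reference to the original `sys.stderr`, so the server's own logger output can bypass the buffer. One way to close that gap (an addition, not necessarily what the repository does) is a small `logging.Handler` that forwards each formatted record to a sink callable; in the server, the sink would be `log_buffer.add`. The demo uses a plain deque with `maxlen=3` to show the ring-buffer behavior, and the logger name is hypothetical.

```python
import collections
import logging
from typing import Callable


class BufferHandler(logging.Handler):
    """Forward each formatted log record to a sink callable (e.g. LogBuffer.add)."""

    def __init__(self, sink: Callable[[str], None]):
        super().__init__()
        self.sink = sink

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self.sink(self.format(record))
        except Exception:
            self.handleError(record)  # standard logging error path; never raise from emit


recent = collections.deque(maxlen=3)  # keeps only the three newest lines
logger = logging.getLogger("nova_act_mcp_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger
handler = BufferHandler(recent.append)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)

for step in range(5):
    logger.info("step %d", step)

print(list(recent))  # ['INFO step 2', 'INFO step 3', 'INFO step 4']
```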

For the full server implementation, see the GitHub repository.

MCP Client Implementation

The client implementation provides a Python interface to the Nova Act MCP server. I implemented a synchronous version based on requests because it proved more stable in the Streamlit environment.

Client Features

  1. Connection Management:

    • Connects to the server's health endpoint to verify availability
    • Manages an HTTP session for all requests
    • Handles connection errors gracefully
  2. API Method:

    • browse(starting_url, instructions, max_steps_per_instruction, timeout_per_instruction, schema, headless) - Execute a browsing task
  3. Error Handling:

    • Proper error propagation
    • Detailed error messages
    • Connection retry logic
  4. Log Retrieval:

    • get_logs(limit) - Retrieve recent server logs

Synchronous Client Implementation

import logging
from typing import Optional, Dict, Any, List, Union

import requests

logger = logging.getLogger(__name__)

class MCPClient:
    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip('/')
        self.session = requests.Session()  # one session reuses connections across requests
        self.connected = False
        self._current_url = None

    def connect(self) -> bool:
        """Initialize connection to MCP server by probing its /health endpoint"""
        try:
            response = self.session.get(f"{self.base_url}/health", timeout=10)
            if response.status_code == 200:
                self.connected = True
                logger.info("Connected to MCP server")
                return True
            return False
        except requests.RequestException as e:
            logger.error(f"Connection error: {str(e)}")
            return False

    def browse(self, starting_url: str, instructions: Union[str, List[str]],
               max_steps_per_instruction: int = 30,
               timeout_per_instruction: Optional[int] = None,
               schema: Optional[Dict[str, Any]] = None,
               headless: bool = True) -> Dict[str, Any]:
        """Execute a sequence of instructions in a single browser session."""
        # Convert single instruction to list
        if isinstance(instructions, str):
            instructions = [instructions]

        if not self.connected:
            self.connect()

        try:
            data = {
                "starting_url": starting_url,
                "instructions": instructions,
                "max_steps_per_instruction": max_steps_per_instruction,
                "headless": headless
            }

            # Only send optional fields that were set, so the server applies its defaults
            if timeout_per_instruction is not None:
                data["timeout_per_instruction"] = timeout_per_instruction
            if schema:
                data["schema"] = schema

            logger.info(f"Sending browse request with {len(instructions)} instructions to {starting_url}")
            # No client-side timeout: tasks can run for minutes, and the ALB idle timeout bounds the wait
            response = self.session.post(
                f"{self.base_url}/browse",
                json=data
            )

            if response.status_code != 200:
                raise Exception(f"Server returned {response.status_code}: {response.text}")

            result = response.json()
            logger.info(f"Browse request completed with status: {result.get('status')}")

            # Update current URL
            self._current_url = starting_url
            return result
        except Exception as e:
            raise Exception(f"Error in browse operation: {str(e)}") from e

    def get_logs(self, limit: int = 100) -> List[str]:
        """Get recent logs from the server"""
        if not self.connected:
            self.connect()

        try:
            response = self.session.get(
                f"{self.base_url}/logs",
                params={"limit": limit},
                timeout=10
            )

            if response.status_code != 200:
                raise Exception(f"Server returned {response.status_code}: {response.text}")

            result = response.json()
            return result.get("logs", [])
        except Exception as e:
            logger.error(f"Error getting logs: {str(e)}")
            return []

For the full client implementation, see the GitHub repository.

Streamlit UI Implementation

The Streamlit UI provides a user-friendly interface to the Nova Act MCP server:

UI Features

  • Single Form Interface: Combines URL and instruction inputs in one form
  • Schema Builder: UI for creating extraction schemas (Boolean, Text, Product Info, List Items, Custom)
  • Execution Options: Configure max steps and timeout
  • Result Display: Formatted display of execution results and parsed responses
  • History Tracking: Maintains a record of previous operations and results
  • Live Logs Display: Shows real-time server logs in a scrollable window
  • Amazon-Specific Examples: Pre-configured examples for common Amazon shopping tasks

Live Logs Display

A key feature of the UI is the live logs display, which shows the server's output, including Nova Act's thinking process, by fetching recent lines from the server's /logs endpoint:

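# In the Streamlit UI
import time

import streamlit as st

# Initialize the session-state keys used below if missing (defaults here are illustrative;
# st.session_state.client is created elsewhere in the app)
for key, default in (("logs", []), ("auto_update_logs", False), ("last_log_update", 0.0)):
    if key not in st.session_state:
        st.session_state[key] = default

with st.expander("Server Logs", expanded=True):
    # Add a refresh button and auto-refresh toggle
    col1, col2 = st.columns([1, 5])
    with col1:
        if st.button("🔄 Refresh Logs"):
            try:
                st.session_state.logs = st.session_state.client.get_logs(limit=100)
                st.session_state.last_log_update = time.time()
            except Exception as e:
                st.error(f"Failed to fetch logs: {str(e)}")

    with col2:
        st.session_state.auto_update_logs = st.checkbox("Auto-refresh logs", value=st.session_state.auto_update_logs)

    # Refresh automatically when enabled (update_logs() is defined in the full UI implementation)
    if st.session_state.auto_update_logs:
        update_logs()

    # Display logs in a scrollable code area
    if st.session_state.logs:
        st.code("\n".join(st.session_state.logs), language="bash")
    else:
        st.info("No logs available. Run a task or click refresh to see logs.")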

This feature helps users understand what's happening behind the scenes, especially during long-running operations:

  1. Debugging: See exactly what Nova Act is doing in real-time
  2. Progress Tracking: Monitor the progress of browser automation tasks
  3. Error Identification: Quickly identify issues when operations fail
  4. Transparency: Understand the steps Nova Act takes to complete instructions

UI Screenshot

Streamlit UI


For the full UI implementation, see the GitHub repository.

AWS CDK Infrastructure

The AWS CDK infrastructure code creates all the necessary resources for deploying the Nova Act MCP server and client on Amazon ECS.

Infrastructure Components

  1. VPC and Networking:

    • VPC with public subnets
    • Internet Gateway
    • Security Groups for ECS tasks and ALB
  2. ECS Resources:

    • ECS Cluster
    • Fargate Task Definitions for server and client
    • ECS Services for server and client
  3. Load Balancing:

    • Application Load Balancer (ALB) with increased idle timeout (300 seconds)
    • Target Groups for server and client with increased deregistration delay
    • Listener Rules for routing traffic
  4. Storage and Secrets:

    • ECR Repository for Docker images
    • Secrets Manager for API key storage

ALB Configuration for Long-Running Operations

To support long-running operations and SSE connections, the ALB is configured with extended timeouts:

// In your CDK stack
const alb = new elbv2.ApplicationLoadBalancer(this, 'NovaActALB', {
  vpc,
  internetFacing: true,
  loadBalancerName: 'nova-act-alb',
  idleTimeout: cdk.Duration.seconds(300) // Increased from default 60 seconds
});

// Target group with increased deregistration delay
const serverTargetGroup = new elbv2.ApplicationTargetGroup(this, 'ServerTargetGroup', {
  vpc,
  port: 8080,
  protocol: elbv2.ApplicationProtocol.HTTP,
  targetType: elbv2.TargetType.IP,
  targets: [serverService],
  deregistrationDelay: cdk.Duration.seconds(120), // Increased deregistration delay
  healthCheck: {
    path: '/health',
    interval: cdk.Duration.seconds(60),
    timeout: cdk.Duration.seconds(5),
    healthyHttpCodes: '200',
  }
});

For the full infrastructure implementation, see the GitHub repository.

Integration with VS Code Cline Extension

The Nova Act MCP server can also be used with the VS Code Cline extension:

  1. Install the Cline extension from the VS Code marketplace
  2. Add your MCP server to Cline's MCP server settings:
{
  "mcpServers": {
    "nova-act": {
      "autoApprove": [],
      "disabled": false,
      "timeout": 300,
      "url": "http://<ALB_DNS_NAME>/mcp",
      "transportType": "sse"
    }
  }
}

This configuration enables the browse MCP tool in Cline, so you can run browser automation tasks directly from your IDE.


Troubleshooting

Connection Issues

  1. ALB Connection Timeouts:

    • Check that the ALB security group allows inbound traffic on port 80 (and on port 8501 for the Streamlit UI)
    • Verify that the ALB health checks are passing
    • Ensure the ECS tasks are running and registered with the target group
    • Important: For SSE (Server-Sent Events) connections, increase the ALB idle timeout (default is 60 seconds)
    • Configure the ALB target group's deregistration delay to match long-running operations
  2. API Key Issues:

    • Verify the API key is correctly stored in Secrets Manager
    • Check server logs for API key retrieval errors
    • Ensure the ECS task role has permission to read the secret
  3. Browser Control Issues:

    • Check server logs for browser initialization errors
    • Verify that Playwright is correctly installed in the container
    • Ensure the ECS task has sufficient memory and CPU resources

Handling 504 Gateway Timeout Errors

Browser automation tasks can run for minutes. If the server sends no data for longer than the ALB idle timeout (60 seconds by default), the ALB stops waiting and returns 504 Gateway Timeout to the client; idle SSE connections are subject to the same limit. To address this:

Increase ALB Idle Timeout:

// In your CDK stack
const alb = new elbv2.ApplicationLoadBalancer(this, 'NovaActAlb', {
  vpc,
  internetFacing: true,
  securityGroup: albSecurityGroup,
  idleTimeout: cdk.Duration.seconds(300) // Increase from default 60 seconds
});
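To pick a value, note that a /browse call returns nothing until every instruction has finished, so the whole call has to fit inside the idle timeout. A rough worst case is a startup allowance plus the number of instructions times timeout_per_instruction, plus headroom. The helper below is a back-of-the-envelope sketch: the 30-second startup allowance and 20% margin are assumptions, while 4000 seconds is the largest idle timeout an ALB accepts.

```python
ALB_MAX_IDLE_TIMEOUT_S = 4000  # upper limit of the ALB idle timeout setting


def required_idle_timeout(num_instructions: int, timeout_per_instruction_s: int,
                          startup_allowance_s: int = 30, margin_percent: int = 20) -> int:
    """Return an idle timeout (seconds) that covers a worst-case /browse call.

    Assumptions: browser startup takes at most `startup_allowance_s`, and each
    instruction is bounded by `timeout_per_instruction_s`.
    """
    if num_instructions < 1 or timeout_per_instruction_s < 1:
        raise ValueError("need at least one instruction and a positive per-instruction timeout")
    worst_case = startup_allowance_s + num_instructions * timeout_per_instruction_s
    # Integer ceiling of worst_case * (1 + margin), avoiding float rounding
    needed = -(-worst_case * (100 + margin_percent) // 100)
    if needed > ALB_MAX_IDLE_TIMEOUT_S:
        raise ValueError(f"{needed}s exceeds the ALB maximum; split the task into smaller requests")
    return needed


print(required_idle_timeout(3, 60))  # 252: fits within the 300 s idle timeout above
print(required_idle_timeout(5, 60))  # 396: would need a larger idle timeout
```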

Implement Client-Side Retry Logic:

import time

# Method on MCPClient: retry browse() when the ALB returns 504 Gateway Time-out
def browse_with_retry(self, url, instruction, max_retries=3):
    for attempt in range(max_retries):
        try:
            return self.browse(url, instruction)
        except Exception as e:
            if "504 Gateway Time-out" in str(e) and attempt < max_retries - 1:
                time.sleep(2)  # Wait before retrying
                continue
            raise
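A variant of this retry with exponential backoff and jitter is sketched below. Keep in mind that an ALB 504 only means the load balancer stopped waiting: the server keeps running the original task in its thread pool, so an immediate retry can queue a duplicate browser session behind it, and a growing delay gives the first run time to finish. `retry_on_gateway_timeout` is a hypothetical helper; its `sleep` parameter exists so the demo runs without real waiting.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_on_gateway_timeout(call: Callable[[], T], max_retries: int = 3,
                             base_delay_s: float = 2.0, max_delay_s: float = 30.0,
                             sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call()`, retrying only when the error message mentions a 504."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as e:
            if "504" not in str(e) or attempt == max_retries - 1:
                raise  # not a gateway timeout, or out of attempts
            # Exponential backoff (2 s, 4 s, 8 s, ... capped) with full jitter
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise ValueError("max_retries must be at least 1")


# Demo with a fake call that fails twice, then succeeds
attempts: List[int] = []


def flaky_browse() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise Exception("Error in browse operation: Server returned 504: 504 Gateway Time-out")
    return "ok"


slept: List[float] = []
print(retry_on_gateway_timeout(flaky_browse, sleep=slept.append))  # prints "ok"
# Real use (assuming an MCPClient instance): retry_on_gateway_timeout(lambda: client.browse(url, instructions))
```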

Use Synchronous Client in Streamlit:

If you see "Timeout context manager should be used inside a task" errors in Streamlit, use the synchronous requests-based client instead of an aiohttp-based one. (An earlier version of this project used aiohttp; the current version does not.)

Cleanup

To delete all AWS resources provisioned via CDK, run:

cdk destroy

This ensures that no unnecessary infrastructure is left running and helps avoid additional costs.

Conclusion

This implementation of a Nova Act MCP server on Amazon ECS provides a scalable, containerized solution for browser automation. By combining the Model Context Protocol with a simplified API built around a single "browse" method, it integrates with a range of clients, including AI assistants, Streamlit UIs, and IDE tools such as the Cline VS Code extension and Cursor, and hopefully soon the Amazon Q Developer CLI. At the time of writing, the Amazon Q Developer CLI supports only locally hosted MCP servers over the stdio transport, not servers reached via SSE over HTTP.

I haven't yet tried other popular MCP clients such as Claude Code, Claude for Desktop, LibreChat, LangChain (via its MCP adapters module), and JetBrains IDEs with Windsurf's Cascade, so I invite you to test the Nova Act MCP server with them and share your feedback.

Key benefits of this architecture include:

  1. Simplicity: A single method handles all browser automation tasks
  2. Scalability: ECS Fargate services can scale out on demand; because each task runs a single global browser instance, concurrent capacity grows with the number of tasks
  3. Security: API keys are securely stored in AWS Secrets Manager
  4. Reliability: ALB health checks and automatic ECS task replacement improve availability
  5. Monitoring: CloudWatch integration provides visibility into performance
  6. Transparency: Real-time logs show Nova Act's thinking process
  7. Integration: Works with the Cline VS Code extension

This solution can be extended to support multiple concurrent browser sessions, additional automation capabilities, and integration with other AWS services like Step Functions for complex workflows.
