Overview
Nova Act is an experimental SDK from Amazon that enables developers to build browser automation agents. It combines natural language instructions with direct browser manipulation capabilities.
Key Features
- Hybrid Automation Approach:
  - Combines natural language instructions (the act() method) with direct Playwright browser control
  - Allows breaking complex workflows into smaller, more reliable steps
- Information Extraction:
  - Supports structured data extraction using Pydantic models
  - Includes convenience schemas such as BOOL_SCHEMA for simple yes/no questions
- Parallel Execution:
  - Enables running multiple browser sessions concurrently using ThreadPoolExecutor
  - Useful for tasks such as scraping multiple pages simultaneously
- Authentication Handling:
  - Supports persistent browser state through Chrome user data directories
  - Allows starting from pre-authenticated sessions
- Sensitive Data Handling:
  - Recommends entering passwords through Playwright APIs directly
  - Warns that screenshots collected during a session may capture sensitive information
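As a minimal sketch of how Information Extraction and Authentication Handling fit together, the Python below opens a persistent Chrome profile and asks a yes/no question with BOOL_SCHEMA. It assumes `NovaAct` is used as a context manager with `starting_page`, `user_data_dir` and `clone_user_data_dir` arguments, and that `act()` returns an object exposing `matches_schema` and `parsed_response`; the function names are hypothetical. Only the result-interpretation helper runs without the SDK installed.

```python
import os
from typing import Any, Optional


def interpret_yes_no(result: Any) -> Optional[bool]:
    """Map a yes/no act() result to True/False, or None if it is unusable.

    Assumes the result exposes `matches_schema` and `parsed_response`.
    """
    if not result.matches_schema:
        return None  # the answer did not fit the boolean schema
    return bool(result.parsed_response)


def check_logged_in(profile_dir: str, url: str) -> Optional[bool]:
    """Ask whether a persistent Chrome profile is already signed in at `url`.

    Requires the nova-act package, an API key, and a browser, so it is only
    defined here, not called. `user_data_dir` and `clone_user_data_dir` are
    assumed NovaAct constructor parameters.
    """
    from nova_act import BOOL_SCHEMA, NovaAct  # lazy import: helper above works without the SDK

    os.makedirs(profile_dir, exist_ok=True)  # the profile directory survives between runs
    with NovaAct(
        starting_page=url,
        user_data_dir=profile_dir,
        clone_user_data_dir=False,  # use the profile in place so logins persist
    ) as nova:
        result = nova.act("Am I logged in?", schema=BOOL_SCHEMA)
    return interpret_yes_no(result)
```

A None result means the model's answer could not be parsed, which is worth treating as "unknown" rather than as "not logged in".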
Technical Implementation
- Built on Playwright for browser automation
- Requires Python 3.10+
- Supports macOS and Ubuntu
- Uses temporary directories for isolated browser sessions by default
Best Practices
- Prompt Design:
  - Be prescriptive and specific in instructions
  - Break large tasks into smaller steps
  - Avoid high-level, vague prompts
- Error Handling:
  - Check matches_schema when using structured responses
  - Handle potential schema mismatches gracefully
Performance:
- First run requires Playwright browser installation (1-2 minutes)
- Subsequent runs start quickly
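The prompt-design and error-handling advice can be sketched as several short, specific act() calls followed by one structured extraction whose result is checked with matches_schema before use. The documented route to a schema is a Pydantic model's `model_json_schema()`; to keep this sketch dependency-free it passes an equivalent hand-written JSON Schema dict, which is an assumption about what `act(schema=...)` accepts. `STEPS`, `PRODUCT_LIST_SCHEMA` and the helper names are hypothetical.

```python
from typing import Any, Dict, List, Optional

# Hand-written JSON Schema standing in for a Pydantic model's model_json_schema()
# (assumption: act() accepts any JSON Schema dict).
PRODUCT_LIST_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "products": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["products"],
}

# Small, prescriptive steps instead of one vague "find me a good coffee maker".
STEPS: List[str] = [
    "search for coffee makers",
    "sort the results by customer rating",
]


def products_from_result(result: Any) -> Optional[List[str]]:
    """Return product names, or None if the response did not match the schema."""
    if not result.matches_schema:
        return None
    products = result.parsed_response.get("products", [])
    return [str(name) for name in products]


def top_products(nova: Any) -> List[str]:
    """Run each step separately, then extract; fall back to [] on a mismatch."""
    for step in STEPS:
        nova.act(step)  # one small, specific instruction per call
    result = nova.act(
        "Return the names of the first five products in the results",
        schema=PRODUCT_LIST_SCHEMA,
    )
    products = products_from_result(result)
    return products if products is not None else []
```

Returning an empty list on a mismatch is one choice; a caller that needs the data might instead retry with a more specific prompt.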
Limitations
- Currently doesn't support IPython
- Cannot interact with non-browser applications
- Struggles with hidden elements (mouseover menus)
- Doesn't handle browser window dialogs
- Is an early research preview, so many further limitations are expected
Use Cases
The SDK demonstrates several practical applications:
- E-commerce workflows:
  - Product search and cart operations
  - Order history management
- Data Collection:
  - Scraping structured data from websites
  - Aggregating information from multiple sources
- Task Automation:
  - Food ordering
  - Travel booking
  - Research tasks
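For aggregating information from multiple sources, here is a sketch of the ThreadPoolExecutor pattern mentioned under Parallel Execution: each URL gets its own task, and each task is expected to open its own NovaAct session rather than sharing one across threads. `run_in_parallel` is a hypothetical helper that takes any per-URL function, so it can be exercised without a browser; `extract_headline` shows where the SDK call would go, assuming the `NovaAct(starting_page=...)` context-manager usage.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Iterable


def run_in_parallel(
    urls: Iterable[str],
    task: Callable[[str], Any],
    max_workers: int = 4,
) -> Dict[str, Any]:
    """Run task(url) for every URL on a thread pool.

    A failure on one URL is recorded as the exception object instead of
    aborting the whole batch.
    """
    results: Dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(task, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going if one page fails
                results[url] = exc
    return results


def extract_headline(url: str) -> Any:
    """One independent browser session per call (needs the SDK; not run here)."""
    from nova_act import NovaAct

    with NovaAct(starting_page=url) as nova:
        return nova.act("Return the title of the main article on this page")
```

With the SDK installed, `run_in_parallel(urls, extract_headline)` would fan the pages out across worker threads; `max_workers` caps how many browsers run at once.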
Security Considerations
- API keys must be protected
- Sensitive data should be entered via Playwright, not natural language prompts
- Screenshots may capture sensitive information visible in the browser
- Use is subject to an Acceptable Use Policy
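Following the recommendation to enter passwords through Playwright rather than in natural-language prompts, here is a minimal sketch. It assumes `nova.page` exposes the underlying Playwright `Page`; `page.keyboard.type()` is Playwright's call for typing into the focused element. The `sign_in` helper and its prompt wording are illustrative, and the password reader defaults to the standard library's `getpass` so the secret is neither echoed nor embedded in an act() prompt.

```python
from getpass import getpass
from typing import Any, Callable


def sign_in(nova: Any, username: str, read_password: Callable[[], str] = getpass) -> None:
    """Let act() drive the form, but type the password through Playwright.

    Assumes nova.page is a Playwright Page; the password is never part of
    any act() prompt.
    """
    nova.act(f"enter the username {username} and click on the password field")
    nova.page.keyboard.type(read_password())  # typed directly into the focused field
    nova.act("click the sign in button")
```

This keeps the password out of prompts, but screenshots can still capture whatever is visible on the page, so the screenshot caveat above still applies.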
Comparison to Alternatives
Compared to other automation tools:
- More structured than pure Playwright scripting
- More controllable than end-to-end LLM automation
- Combines benefits of programmatic control with natural language flexibility
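To make the hybrid idea concrete, here is a sketch in which act() handles the fuzzy navigation and Playwright reads the final state directly. It assumes `nova.page` is the Playwright `Page` (its `url` property is standard Playwright); `cart_steps` and `add_to_cart` are hypothetical names, and the prompts are only examples.

```python
from typing import Any, List


def cart_steps(item: str) -> List[str]:
    """Split one goal into short, specific act() prompts (illustrative wording)."""
    return [
        f"search for {item}",
        "open the first result",
        "add it to the cart",
    ]


def add_to_cart(nova: Any, item: str) -> str:
    """Natural language for navigation, Playwright for a deterministic check.

    The final URL is read from the Playwright Page instead of asking the model.
    """
    for step in cart_steps(item):
        nova.act(step)
    return nova.page.url
```

Reading state through Playwright keeps checks exact and cheap, while the model is only asked to do the parts that are hard to script.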
Getting Started
Basic requirements:
- Python 3.10+
- macOS or Ubuntu
- API key from nova.amazon.com/act
Installation:
pip install nova-act
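A minimal first script, assuming the API key is supplied through the `NOVA_ACT_API_KEY` environment variable (the name used in the project's README; verify it against your version) and that `NovaAct` accepts a `starting_page` argument. `get_api_key` and `main` are hypothetical names; `main` is defined but not invoked, so call it from your own entry point after installing the package.

```python
import os


def get_api_key() -> str:
    """Read the key from the environment instead of hard-coding it in source."""
    key = os.environ.get("NOVA_ACT_API_KEY", "")
    if not key:
        raise RuntimeError("Set NOVA_ACT_API_KEY to the key from nova.amazon.com/act")
    return key


def main() -> None:
    """Open a browser session and run one natural-language step."""
    from nova_act import NovaAct  # requires `pip install nova-act`

    get_api_key()  # fail fast with a clear message if the key is missing
    with NovaAct(starting_page="https://www.amazon.com") as nova:
        nova.act("search for a coffee maker")
```

Keeping the key in the environment also serves the security point about protecting API keys, since it never appears in the script itself.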
The SDK is particularly suited to developers who need to automate web-based workflows while keeping control over the automation process. Its hybrid approach balances flexibility and reliability, which could make it useful for prototyping and for certain production use cases.