“In the beginning there was the word... and the word was scraped.”
🎯 What is ScraperAgent?
ScraperAgent is a Swarm-native scraper —
no Docker, no headless Chrome, no external queues.
Just:
- 🧠 Drop a
.json
job intopayload/
- 🕷️ Agent fetches + parses the target
- 📜 Logs result to Mailman in seconds
✅ What It Extracts
From any URL, it pulls:
Example log:
{
"target": "https://matrixswarm.com",
"title": "MatrixSwarm - Autonomous Agent OS",
"description": "A self-healing AI swarm framework.",
"link_count": 27,
"links": [
"https://matrixswarm.com/about",
"https://matrixswarm.com/docs"
]
}
💾 How It Works
You drop this in:
json
Copy
Edit
{
"target_url": "https://matrixswarm.com",
"mode": "summary"
}
ScraperAgent sees the .json, processes the target, and drops a parsed report into Mailman.
⚙️ Spawn Directive
json
Copy
Edit
{
"permanent_id": "scraper-1",
"name": "scraper",
"filesystem": {
"folders": [
{ "name": "payload", "type": "d" }
]
},
"config": {
"report_to": "mailman-1"
}
}
🧠 Why It Changes Everything
Traditional Stack MatrixSwarm
Puppeteer ❌
Chrome Headless ❌
Containerized Scraper API ❌
200MB image + queue ❌
Swarm Alternative:
💡 scraper_agent.py + payload/ folder + mailman-1
No containers.
No browser.
No mess.
🤯 Bonus Potential
✨ Feed results into OracleAgent for summary
🔁 Schedule crawl via Commander
💣 Pair with ReaperAgent to wipe stale data after analysis
💬 The Takeaway
This isn't scraping.
This is Swarm Recon — deployed by Matrix, logged by Mailman, interpreted by Oracle, and remembered forever.
🔗 Get Started
GitHub: https://github.com/matrixswarm/matrixswarm
Website: https://matrixswarm.com
YouTube: MatrixSwarm OS – Spawn, Kill, Resurrect
X/Twitter: @matrixswarm
📜 Fork It Clause
MatrixSwarm is open.
Fork it.
Or Fork U.
(The swarm is open. Bring tools or get logged.)