“In the beginning there was the word... and the word was scraped.”


🎯 What is ScraperAgent?

ScraperAgent is a Swarm-native scraper: no Docker, no headless Chrome, no external queues.

Just:

  • 🧠 Drop a .json job into payload/
  • 🕷️ Agent fetches + parses the target
  • 📜 Logs result to Mailman in seconds

✅ What It Extracts

From any URL, it pulls:

  • ✅ Page title
  • ✅ Meta description
  • ✅ First 10 links (plus the total link count)

It then logs it all into /comm/mailman-1/payload/ as a .json file.

Example log (link list shortened):

{
  "target": "https://matrixswarm.com",
  "title": "MatrixSwarm - Autonomous Agent OS",
  "description": "A self-healing AI swarm framework.",
  "link_count": 27,
  "links": [
    "https://matrixswarm.com/about",
    "https://matrixswarm.com/docs"
  ]
}
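
As a rough illustration of how a report like the one above could be built without a browser, here is a minimal sketch using only the Python standard library. The names PageSummaryParser and summarize_html are hypothetical and are not taken from scraper_agent.py; the field names simply mirror the example log.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageSummaryParser(HTMLParser):
    """Collects the <title>, the meta description, and every <a href>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.description = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize_html(target, html, max_links=10):
    """Build a report shaped like the example log (hypothetical helper)."""
    parser = PageSummaryParser(target)
    parser.feed(html)
    return {
        "target": target,
        "title": parser.title.strip(),
        "description": parser.description,
        "link_count": len(parser.links),   # every link found on the page
        "links": parser.links[:max_links],  # only the first max_links are kept
    }
```

Calling summarize_html(url, html) on already-downloaded HTML returns a plain dict that can be written out with json.dumps. Whether the real agent uses html.parser or another library is not stated here, so treat this as one possible approach.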

💾 How It Works

You drop this in:

{
  "target_url": "https://matrixswarm.com",
  "mode": "summary"
}

ScraperAgent sees the .json job in payload/, fetches and parses the target, and drops a parsed report into Mailman.
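
The drop-in flow can be sketched roughly as follows. This is an illustrative sketch, not the actual scraper_agent.py: fetch_html and process_jobs are hypothetical names, the report file naming is invented, and the error report shape ("job" and "error" keys) is an assumption. Only the "target_url" job field comes from the example above.

```python
import json
import time
import urllib.request
from pathlib import Path


def fetch_html(url, timeout=10):
    """Download a page with the standard library (no browser involved)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def process_jobs(payload_dir, mailman_dir, summarize, fetch=fetch_html):
    """Handle every .json job in payload_dir once; return the report paths.

    summarize(url, html) is a caller-supplied function that returns a dict.
    """
    payload_dir, mailman_dir = Path(payload_dir), Path(mailman_dir)
    mailman_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for job_path in sorted(payload_dir.glob("*.json")):
        try:
            job = json.loads(job_path.read_text(encoding="utf-8"))
            url = job["target_url"]
            report = summarize(url, fetch(url))
        except Exception as exc:
            # Assumed behaviour: log the failure rather than stop the loop.
            report = {"job": job_path.name, "error": str(exc)}
        # Hypothetical naming scheme: job name plus a nanosecond timestamp.
        out_path = mailman_dir / f"{job_path.stem}_{time.time_ns()}.json"
        out_path.write_text(json.dumps(report, indent=2), encoding="utf-8")
        job_path.unlink()  # consume the job so it is not processed twice
        written.append(out_path)
    return written
```

A long-running agent would call process_jobs in a loop with a short sleep between passes. Passing fetch as a parameter keeps the sketch testable without network access.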

⚙️ Spawn Directive
{
  "permanent_id": "scraper-1",
  "name": "scraper",
  "filesystem": {
    "folders": [
      { "name": "payload", "type": "d" }
    ]
  },
  "config": {
    "report_to": "mailman-1"
  }
}
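
To make the directive's fields concrete, here is a small sketch of how such a directive might map onto folders on disk. It is not MatrixSwarm's spawn code. It assumes that "type": "d" means "directory", that each agent lives under <comm_root>/<permanent_id>/, and that reports go to <comm_root>/<report_to>/payload/ (matching the /comm/mailman-1/payload/ path mentioned earlier). The function name prepare_agent_dirs is hypothetical.

```python
from pathlib import Path


def prepare_agent_dirs(directive, comm_root):
    """Create the directive's folders and return (created_dirs, report_dir)."""
    agent_root = Path(comm_root) / directive["permanent_id"]
    created = []
    for folder in directive.get("filesystem", {}).get("folders", []):
        if folder.get("type") == "d":  # assumed to mean "directory"
            path = agent_root / folder["name"]
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    # Assumed layout: reports land in <comm_root>/<report_to>/payload/
    report_to = directive.get("config", {}).get("report_to")
    report_dir = Path(comm_root) / report_to / "payload" if report_to else None
    return created, report_dir
```

With the directive above and a comm root of /comm, this would create /comm/scraper-1/payload and point reports at /comm/mailman-1/payload.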

🧠 Why It Changes Everything

Traditional Stack            MatrixSwarm
Puppeteer                    ❌
Chrome Headless              ❌
Containerized Scraper API    ❌
200MB image + queue          ❌

Swarm Alternative:
💡 scraper_agent.py + payload/ folder + mailman-1

No containers.
No browser.
No mess.

🤯 Bonus Potential

  • ✨ Feed results into OracleAgent for summarization
  • 🔁 Schedule crawls via Commander
  • 💣 Pair with ReaperAgent to wipe stale data after analysis

💬 The Takeaway
This isn't scraping.

This is Swarm Recon — deployed by Matrix, logged by Mailman, interpreted by Oracle, and remembered forever.

🔗 Get Started
GitHub: https://github.com/matrixswarm/matrixswarm

Website: https://matrixswarm.com

YouTube: MatrixSwarm OS – Spawn, Kill, Resurrect

X/Twitter: @matrixswarm

📜 Fork It Clause
MatrixSwarm is open.
Fork it.
Or Fork U.
(The swarm is open. Bring tools or get logged.)