If you've ever written a Selenium script, you know the pain:
driver.find_element(By.XPATH, "//div[@class='form']/button[2]")
It’s fragile. It’s unreadable. And when the DOM changes — it breaks.
That’s why I built talk2dom, a Python tool that lets you locate DOM elements using natural language + an LLM.
🧠 The Problem
The hard part of web automation isn’t clicking — it’s locating.
- Buttons with no ID
- Forms wrapped in modals
- Dynamic pages with dozens of similar elements
And writing XPath by hand? That’s not automation — that’s archaeology.
🛠️ The Tool
talk2dom does one thing really well:
🗣️ You describe the element.
🤖 It gives you a selector.
Example:
from talk2dom import get_locator
by, val = get_locator(driver, "Find the login button")
driver.find_element(by, val).click()
Internally, it sends the raw HTML + your instruction to an LLM (OpenAI, Groq or other providers), then returns:
("xpath", "//button[@type='submit']")
You still control the browser — it just tells you where to look.
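Because the result is a plain (strategy, value) pair, it can be checked before it reaches the browser. The sketch below is one way to do that and is not part of talk2dom itself: `checked_locator` and `SELENIUM_STRATEGIES` are hypothetical names, and the `locate` argument stands in for `get_locator` so the snippet runs without a browser or an API key. The strategy strings are the values Selenium's `By` class defines.

```python
from typing import Callable, Tuple

Locator = Tuple[str, str]

# Strategy strings defined by Selenium's By class (By.XPATH == "xpath", etc.).
SELENIUM_STRATEGIES = frozenset({
    "id", "xpath", "link text", "partial link text",
    "name", "tag name", "class name", "css selector",
})


def checked_locator(locate: Callable[[object, str], Locator],
                    driver: object, description: str) -> Locator:
    """Call a get_locator-style function and sanity-check its result.

    `locate` is assumed to take (driver, description) and return (by, value).
    This helper is illustrative only, not a talk2dom API.
    """
    by, value = locate(driver, description)
    if by not in SELENIUM_STRATEGIES:
        raise ValueError(f"unknown locator strategy: {by!r}")
    if not value or not value.strip():
        raise ValueError("empty selector returned")
    return by, value
```

With a real driver, this would be called as `checked_locator(get_locator, driver, "Find the login button")`, and the returned pair passed straight to `driver.find_element`.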
🔓 Free model support (Groq)
You can use open models like LLaMA-3 via Groq:
export GROQ_API_KEY="your_key"
by, val = get_locator(driver, "Find the search field", model="llama-3.3-70b-versatile", model_provider="groq")
It’s fast, and Groq’s free tier means you can try it without paying.
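A missing key tends to surface as an opaque provider error in the middle of a run, so it can help to check for it up front. This is a minimal sketch, assuming the key is read from the `GROQ_API_KEY` environment variable as in the export line above. `groq_options` is a hypothetical helper, and the model and provider values are copied from the example.

```python
import os
from typing import Dict, Mapping, Optional


def groq_options(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return keyword arguments for a Groq-backed get_locator call.

    Hypothetical helper: fails early if GROQ_API_KEY is not set.
    `env` defaults to the process environment and can be overridden in tests.
    """
    env = os.environ if env is None else env
    if not env.get("GROQ_API_KEY"):
        raise RuntimeError("GROQ_API_KEY is not set; export it first")
    return {
        "model": "llama-3.3-70b-versatile",
        "model_provider": "groq",
    }
```

The call then becomes `get_locator(driver, "Find the search field", **groq_options())`.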
📦 Install
pip install talk2dom
🤖 Built for LLM-native workflows
- Structured function call output (Pydantic)
- Works with either a full driver or a scoped WebElement
- No hallucinations — just selectors
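The scoped-element bullet is worth a sketch, since passing a container instead of the whole driver presumably shrinks the HTML sent to the model. The helper below assumes `get_locator` accepts a WebElement as its first argument, as that bullet suggests, and that the returned selector should be resolved against the same element. `find_within` is a hypothetical name, the `login-modal` id is an invented example, and `locate` is injected so the snippet runs without a browser.

```python
from typing import Any, Callable, Tuple


def find_within(driver: Any, container: Tuple[str, str], description: str,
                locate: Callable[[Any, str], Tuple[str, str]]) -> Any:
    """Locate an element by description inside a known container.

    `container` is a hand-written (by, value) pair for a stable parent,
    for example ("id", "login-modal"). `locate` stands in for get_locator.
    """
    # Narrow the search to the container first.
    scope = driver.find_element(*container)
    # Ask for a selector using only the scoped element.
    by, value = locate(scope, description)
    # Resolve the selector relative to the same scope.
    return scope.find_element(by, value)
```

One Selenium detail to keep in mind: an XPath that starts with `//` searches the whole document even when called on an element, so a scoped lookup needs a relative path such as `.//button`.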
📌 TL;DR
If you:
- Write automation scripts
- Build LLM-based web agents
- Hate writing XPath
Try talk2dom.
You might never inspect the DOM manually again.
Feedback and PRs welcome 🙌