No more writing brittle XPath. No more guessing CSS selectors.
With talk2dom, you describe what you want to find — it returns the right locator.
🧠 What is talk2dom?
talk2dom is a Python tool that lets you locate DOM elements by describing them in natural language; an LLM turns your description into a locator.
```python
from talk2dom import get_locator

by, value = get_locator(driver, "Find the login button")
```
Internally, it sends the page HTML and your instruction to an LLM (e.g., GPT-4o or LLaMA-3) and returns a `(by, value)` locator pair such as:

```python
("xpath", "//div[@class='header']//button[1]")
```
You then use it to perform your next action.
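The flow above can be sketched in a few lines of plain Python. This is not talk2dom's actual implementation: `build_prompt`, `parse_locator`, `locate`, and the `ask_llm` callable are hypothetical names, and the JSON reply format is an assumption. The strategy strings are the values of Selenium's `By` constants (for example, `By.XPATH == "xpath"`), so a parsed pair can be passed straight to `driver.find_element(by, value)`.

```python
import json
from typing import Callable, Tuple

# Selenium's locator strategy strings (the values of the By constants).
SELENIUM_STRATEGIES = {
    "id", "xpath", "link text", "partial link text",
    "name", "tag name", "class name", "css selector",
}


def build_prompt(html: str, instruction: str) -> str:
    """Combine the page HTML and the instruction into one prompt (hypothetical format)."""
    return (
        "You are given an HTML document and an instruction.\n"
        'Reply with JSON only: {"by": <strategy>, "value": <locator>}.\n'
        f"Allowed strategies: {sorted(SELENIUM_STRATEGIES)}\n\n"
        f"Instruction: {instruction}\n\nHTML:\n{html}"
    )


def parse_locator(reply: str) -> Tuple[str, str]:
    """Parse the model's JSON reply into (by, value) and reject unknown strategies."""
    data = json.loads(reply)
    by, value = data["by"], data["value"]
    if by not in SELENIUM_STRATEGIES:
        raise ValueError(f"unsupported locator strategy: {by!r}")
    if not value:
        raise ValueError("empty locator value")
    return by, value


def locate(html: str, instruction: str, ask_llm: Callable[[str], str]) -> Tuple[str, str]:
    """Send the prompt to any text-in/text-out model and return (by, value)."""
    return parse_locator(ask_llm(build_prompt(html, instruction)))


if __name__ == "__main__":
    # Stand-in for a real model call that returns a canned reply.
    def fake_llm(prompt: str) -> str:
        return json.dumps({"by": "xpath", "value": "//div[@class='header']//button[1]"})

    html = "<div class='header'><button>Log in</button></div>"
    print(locate(html, "Find the login button", fake_llm))
    # ('xpath', "//div[@class='header']//button[1]")
```

Validating the strategy before touching the browser turns a malformed model reply into a clear `ValueError` instead of a confusing Selenium error later.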
💡 Why is this useful?
Because in test automation and AI agent flows, the hard part is not clicking — it’s locating the right thing to click.
- Pages are complex
- IDs and classes often change
- Writing XPath is error-prone
talk2dom solves that by letting you focus on intent rather than HTML structure.
✨ New: Automatic Element Highlighting
We recently added built-in highlighting to make debugging and demos easier.
```python
from talk2dom import get_element

elem = get_element(driver, "Click the confirm button")
elem.click()
```
What it does:
- Adds a solid red outline and a light background to the element
- Keeps it visible for `duration` seconds
- Resets the style afterward, so no manual cleanup is needed
This helps you:
- Visually confirm what the LLM selected
- Record videos with agent feedback
- Build no-code workflows with visual trace
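The highlighting behavior described above can be reproduced with plain Selenium. The sketch below is an assumption about how such a feature could work, not talk2dom's code: `highlight` is a hypothetical helper and the colors are illustrative. It relies on Selenium's `driver.execute_script`, which exposes extra arguments to the script as `arguments[0]`, `arguments[1]`, and so on. The script saves the element's inline `style`, applies the outline and background, and restores the saved style with `setTimeout` after `duration` seconds.

```python
# JavaScript run in the page: arguments[0] is the element, arguments[1] the delay in ms.
HIGHLIGHT_JS = """
const el = arguments[0];
const original = el.getAttribute('style');
el.style.outline = '2px solid red';
el.style.backgroundColor = 'rgba(255, 0, 0, 0.1)';
setTimeout(() => {
  if (original === null) {
    el.removeAttribute('style');
  } else {
    el.setAttribute('style', original);
  }
}, arguments[1]);
"""


def highlight(driver, element, duration: float = 2.0) -> None:
    """Outline `element` in red for `duration` seconds, then restore its inline style.

    Hypothetical helper: `driver` is a Selenium WebDriver, `element` a WebElement.
    The call returns immediately; the browser resets the style on its own timer.
    """
    if duration < 0:
        raise ValueError("duration must be non-negative")
    driver.execute_script(HIGHLIGHT_JS, element, int(duration * 1000))
```

Restoring the saved inline style as a whole, rather than clearing individual properties, avoids clobbering styles the page had already set on the element.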
📦 Install
```shell
pip install talk2dom
```
Works with OpenAI, Groq, or any LangChain-compatible model.
🧪 Sample Script
```python
from selenium import webdriver
from talk2dom import get_element

driver = webdriver.Chrome()
driver.get("http://example.com")

# Describe the element in plain language, then act on the element you get back
el = get_element(driver, "Click the search icon")
el.click()
```
You can also pass a WebElement to scope the HTML sent to the model and save tokens:

```python
from selenium.webdriver.common.by import By

sidebar = driver.find_element(By.ID, "sidebar")
el = get_element(driver, "Find the settings button", element=sidebar)
```
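Scoping saves tokens because the model only receives the subtree you pass instead of the whole page. A rough way to see the difference is to compare `driver.page_source` with the element's `outerHTML`, read through Selenium's `get_attribute`. The helper names below are hypothetical, and the four-characters-per-token ratio is only a common rule of thumb for English text, not a real tokenizer.

```python
def html_for_prompt(driver, element=None) -> str:
    """Return the HTML that would go to the model: the element's subtree if given, else the full page."""
    if element is not None:
        return element.get_attribute("outerHTML")
    return driver.page_source


def rough_token_count(text: str, chars_per_token: int = 4) -> int:
    """Very rough token estimate; use the model's own tokenizer for exact counts."""
    return -(-len(text) // chars_per_token)  # ceiling division
```

For example, `rough_token_count(html_for_prompt(driver))` versus `rough_token_count(html_for_prompt(driver, sidebar))` gives a quick sense of how much smaller the scoped prompt is.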
🧠 Philosophy
We don’t control your browser.
We help you find what to control.
GitHub → https://talk2dom.itbanque.com/
PyPI → https://pypi.org/project/talk2dom/
Feedback, stars, contributions — all welcome ✨