If you've ever written a Selenium script, you know the pain:

driver.find_element(By.XPATH, "//div[@class='form']/button[2]")

It’s fragile. It’s unreadable. And when the DOM changes — it breaks.

That’s why I built talk2dom, a Python tool that lets you locate DOM elements using natural language + an LLM.


🧠 The Problem

The hard part of web automation isn’t clicking — it’s locating.

  • Buttons with no ID
  • Forms wrapped in modals
  • Dynamic pages with dozens of similar <div>s

    And writing XPath by hand? That’s not automation — that’s archaeology.


    🛠️ The Tool

    talk2dom does one thing really well:

    🗣️ You describe the element.

    🤖 It gives you a selector.

    Example:

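    from selenium import webdriver
    from talk2dom import get_locator

    driver = webdriver.Chrome()
    driver.get("https://example.com/login")  # placeholder URL
    by, val = get_locator(driver, "Find the login button")
    driver.find_element(by, val).click()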

    Internally, it sends the raw HTML + your instruction to an LLM (OpenAI, Groq or other providers), then returns:

    ("xpath", "//button[@type='submit']")

    You still control the browser — it just tells you where to look.
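Because the selector comes from a model, it can be worth checking it before you act on it. Here is a minimal sketch of one way to do that. `find_unique` is a hypothetical helper and not part of talk2dom. It only assumes that `scope` has Selenium's standard `find_elements(by, value)` method.

```python
def find_unique(scope, by, value):
    """Return the one element matching (by, value).

    Raises LookupError when the locator matches nothing or more than one
    element, so a vague or wrong selector fails loudly instead of clicking
    the wrong thing.
    """
    matches = scope.find_elements(by, value)
    if len(matches) != 1:
        raise LookupError(
            f"expected 1 match for {by}={value!r}, found {len(matches)}"
        )
    return matches[0]
```

In a script this would be used as `find_unique(driver, by, val).click()` in place of `driver.find_element(by, val).click()`.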


    🔓 Free model support (Groq)

    You can also use open-weight models such as Llama 3.3 through Groq:

    # Shell: set your Groq API key
    export GROQ_API_KEY="your_key"
    # Python: choose the model and provider per call
    by, val = get_locator(driver, "Find the search field", model="llama-3.3-70b-versatile", model_provider="groq")

    It’s fast, and Groq’s rate-limited free tier is enough to try it out.
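If the same script runs on machines with and without a Groq key, you can pick the provider at runtime. This is a rough sketch, assuming only the `model` and `model_provider` keyword arguments shown above. `locator_kwargs` is a hypothetical helper name, and an empty result means "use talk2dom's defaults".

```python
import os


def locator_kwargs(env=None):
    """Build the extra keyword arguments for get_locator.

    Uses the Groq-hosted model when GROQ_API_KEY is set. Otherwise returns
    an empty dict, so the call falls back to the library's default provider.
    """
    env = os.environ if env is None else env
    if env.get("GROQ_API_KEY"):
        return {"model": "llama-3.3-70b-versatile", "model_provider": "groq"}
    return {}
```

The call then becomes `get_locator(driver, "Find the search field", **locator_kwargs())`.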


    📦 Install

    pip install talk2dom

    🤖 Built for LLM-native workflows

    • Structured function call output (Pydantic)
    • Works with either the driver or a scoped WebElement
    • Output is constrained to a selector, so there is no free-form text to parse
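One Selenium detail matters when you scope a search to a WebElement. An XPath that starts with `//` searches the whole document, even when you call `find_element` on an element. The sketch below rewrites such a path to start with `.//` so that it stays inside the element. Both helpers are hypothetical, and `get_locator` is passed in as a parameter because its exact behavior for WebElements is an assumption here. The library may already return relative paths.

```python
def scope_xpath(by, value):
    """Turn a document-wide XPath ("//...") into an element-relative one (".//...").

    Other strategies, and XPaths that are already relative, pass through unchanged.
    """
    if by == "xpath" and value.startswith("//"):
        return by, "." + value
    return by, value


def find_in(element, description, get_locator):
    """Resolve `description` against `element` and search only inside it."""
    # Assumption: get_locator(scope, description) returns a (by, value) pair.
    by, value = scope_xpath(*get_locator(element, description))
    return element.find_element(by, value)
```

For example, with `modal = driver.find_element(By.ID, "signup-modal")` (a made-up ID), `find_in(modal, "Find the submit button", get_locator)` would only match buttons inside that modal.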

    📌 TL;DR

    If you:

    • Write automation scripts
    • Build LLM-based web agents
    • Hate writing XPath

    Try talk2dom.

    You might never inspect the DOM manually again.


    Feedback and PRs welcome 🙌