Introduction

When you build web crawlers, run competitive analysis or SEO audits, or develop AI agents, one of the first critical tasks is finding all the URLs on a website.

While traditional methods like Google search tricks, sitemap exploration, and SEO tools work, there's a faster, modern way: using the Olostep Maps API.

In this guide, we'll:

  • Introduce the challenge of URL discovery
  • Show how to build a live Streamlit app to scrape all URLs
  • Compare it with traditional techniques (like sitemap.xml and robots.txt)
  • Provide complete runnable Python code

Target Audience: Developers, Growth Engineers, Data Scientists, SEO specialists, and Founders who need structured, scalable scraping.

Why Extract All URLs?

Finding every page on a website can help you:

  • Analyze site structure (for SEO)
  • Scrape website content efficiently
  • Find hidden gems like orphan pages
  • Monitor website changes
  • Prepare data for AI agents and automation

Traditional Methods (Before Olostep)

1. Sitemaps (XML Files)

Webmasters often create XML sitemaps to help Google index their sites. Here's an example:

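<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com</loc>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>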
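
To make the sitemap approach concrete, here is a minimal sketch that pulls the page URLs out of sitemap XML using only the Python standard library. The function name `extract_sitemap_urls` and the sample document are illustrative assumptions, not part of any Olostep tooling.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_sitemap_urls(xml_text):
    """Return every <loc> value found in a sitemap (or sitemap index) document."""
    root = ET.fromstring(xml_text)
    # Regular sitemaps use <url><loc>, sitemap indexes use <sitemap><loc>;
    # searching for <loc> at any depth covers both.
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sample mirroring the example above.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in extract_sitemap_urls(SAMPLE_SITEMAP):
        print(url)
```

If the document is a sitemap index, the returned values are themselves sitemap URLs, so each one would need to be fetched and parsed in turn.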

To find sitemaps, start with the standard location, /sitemap.xml (for example, https://example.com/sitemap.xml).

Other possible sitemap locations:

  • /sitemap.xml.gz
  • /sitemap_index.xml
  • /sitemap.php
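
Checking these locations by hand gets tedious, so here is a rough sketch that builds the candidate URLs for a site and returns the first one that responds. The helper names are hypothetical, and `/sitemap.xml` is included as the conventional default alongside the paths listed above.

```python
import urllib.request
from urllib.parse import urljoin

# Paths from the list above, plus the conventional /sitemap.xml default.
COMMON_SITEMAP_PATHS = [
    "/sitemap.xml",
    "/sitemap.xml.gz",
    "/sitemap_index.xml",
    "/sitemap.php",
]


def candidate_sitemap_urls(base_url):
    """Build absolute sitemap URLs to try for a given site."""
    return [urljoin(base_url, path) for path in COMMON_SITEMAP_PATHS]


def first_reachable(urls, timeout=10):
    """Return the first URL that answers with HTTP 200, or None if none do."""
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                if response.status == 200:
                    return url
        except OSError:
            # Covers HTTP errors, DNS failures, and timeouts; try the next path.
            continue
    return None
```

For example, `first_reachable(candidate_sitemap_urls("https://example.com"))` would probe each location in order. Some sites block generic clients, so a miss here does not prove that no sitemap exists.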

You can also search Google with:

site:example.com filetype:xml

Problems:

  • Some websites don't maintain updated sitemaps.
  • Not all pages may be listed.
  • Sitemaps for dynamic, JavaScript-heavy websites often leave out many pages.

2. Robots.txt

Example:

User-agent: *
Sitemap: https://example.com/sitemap.xml
Disallow: /admin

robots.txt is good for finding disallowed paths and sitemap links, but it is not a comprehensive list of pages either.
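
As a small illustration, the sketch below collects the Sitemap and Disallow entries from robots.txt content. The function name `parse_robots_txt` is an assumption for this example, and the parser deliberately ignores which User-agent group a rule belongs to.

```python
def parse_robots_txt(text):
    """Collect Sitemap and Disallow values from robots.txt content.

    Simplification: rules are gathered regardless of their User-agent group.
    """
    sitemaps, disallowed = [], []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "sitemap" and value:
            sitemaps.append(value)
        elif field == "disallow" and value:
            disallowed.append(value)
    return sitemaps, disallowed


# Sample content matching the example above.
SAMPLE_ROBOTS = """User-agent: *
Sitemap: https://example.com/sitemap.xml
Disallow: /admin
"""

if __name__ == "__main__":
    found_sitemaps, found_disallowed = parse_robots_txt(SAMPLE_ROBOTS)
    print("Sitemaps:", found_sitemaps)
    print("Disallowed:", found_disallowed)
```

The Disallow entries are path prefixes rather than full page URLs, which is one reason robots.txt alone cannot give you a complete page list.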

The Modern Solution: Olostep Maps API

✅ Find up to 100,000 URLs in seconds.

✅ No need to manually find the sitemap or robots.txt.

✅ Simple API call.

✅ No server maintenance or IP bans.
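
To show what the "simple API call" might look like without Streamlit, here is a standard-library sketch. It assumes the endpoint, the `{"url": ...}` payload, and the `"urls"` response key are the same ones used by the app code in this guide; the helper name `build_maps_request` is made up for illustration.

```python
import json
import urllib.request

# Endpoint as used in this guide's Streamlit app.
MAPS_ENDPOINT = "https://api.olostep.com/v1/maps"


def build_maps_request(target_url, api_key):
    """Prepare (but do not send) a POST request for the Maps endpoint."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        MAPS_ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_urls(target_url, api_key, timeout=30):
    """Send the request and return the "urls" list from the JSON response."""
    request = build_maps_request(target_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumption: the response body is a JSON object with a "urls" key.
        return json.load(response).get("urls", [])
```

Keeping request construction separate from sending makes the call easy to inspect or unit test without touching the network.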

👉 Full code Gist

Let's build a full Streamlit app to demo this!

🛠️ Full Project: Website URL Extractor with Olostep Maps API + Streamlit

1. Install Requirements

pip install streamlit requests

2. Python Code

import streamlit as st
import requests

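def fetch_urls(target_url, api_key):
    """Call the Olostep Maps endpoint and return the parsed JSON, or None on failure."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    payload = {"url": target_url}
    try:
        # A timeout keeps the app from hanging if the API does not respond.
        response = requests.post(
            "https://api.olostep.com/v1/maps", headers=headers, json=payload, timeout=60
        )
    except requests.RequestException as exc:
        st.error(f"Request failed: {exc}")
        return None
    if response.status_code == 200:
        return response.json()
    st.error(f"Failed to fetch URLs. Status code: {response.status_code}")
    return None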

st.title("🔎 Website URL Scraper")

st.markdown("Use Olostep Maps API to instantly extract all discovered URLs from any website. Great for SEO, scraping, site analysis, and more!")

api_key = st.text_input("Enter your Olostep API Key", type="password")
url_to_scrape = st.text_input("Enter Website URL (e.g., https://example.com)")

if st.button("Find URLs"):
    if api_key and url_to_scrape:
        with st.spinner("Fetching URLs..."):
            data = fetch_urls(url_to_scrape, api_key)
        if data:
            urls = data.get("urls", [])
            st.success(f"✅ Found {len(urls)} URLs!")
            for idx, u in enumerate(urls, start=1):
                st.markdown(f"{idx}. [{u}]({u})")

            st.download_button(
                "📄 Download URLs as Text File",
                data="\n".join(urls),
                file_name="discovered_urls.txt",
                mime="text/plain"
            )
    else:
        # Tell the user what is missing instead of silently doing nothing.
        st.warning("Please enter both an API key and a website URL.")

📸 Example Output

✅ Found 35 URLs from https://docs.olostep.com

📥 Saved as discovered_urls.txt
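
Once the URLs are saved, one quick way to look at site structure is to count them by their first path segment. This is only a sketch of one possible analysis; the function name `count_by_section` is illustrative.

```python
from collections import Counter
from urllib.parse import urlparse


def count_by_section(urls):
    """Count URLs per first path segment, using "/" for the site root."""
    counts = Counter()
    for url in urls:
        segments = [part for part in urlparse(url).path.split("/") if part]
        section = ("/" + segments[0]) if segments else "/"
        counts[section] += 1
    return counts


if __name__ == "__main__":
    # Reads the file produced by the app's download button.
    with open("discovered_urls.txt", encoding="utf-8") as handle:
        discovered = [line.strip() for line in handle if line.strip()]
    for section, total in count_by_section(discovered).most_common():
        print(f"{section}: {total}")
```

Sections with unexpectedly few or many URLs are a reasonable starting point for an SEO review or for deciding what to scrape first.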

⚡ Why Olostep Maps API Beats Traditional Methods

Feature | Sitemap/Robots.txt | SEO Spider | Olostep Maps
Instant Response
Handles JS-heavy Sites ⚠️ (Partial)
Handles Big Sites ❌ (Limit)
No Setup Needed
Easy Pagination

📈 Conclusion

Using the Olostep Maps API and a few lines of Streamlit code, you can build powerful website discovery tools in minutes.

No more worrying about sitemaps, robots.txt, or getting blocked by firewalls.

✅ Super fast

✅ Reliable

✅ Perfect for Growth Engineering, SEO, Scraping, and Automation.

🚀 Ready to try?

Register at 👉 Olostep.com and start building your own data pipelines today!


Written by:

Mohammad Ehsan Ansari

Growth Engineer @ Olostep

Happy scraping! 🚀