Planning a trip to a new city can be exciting, but often overwhelming. Traditional guidebooks offer static information, and online searches can drown you in generic recommendations. What if you had a personal travel assistant you could chat with, one that understands your interests, helps you build a custom itinerary, remembers your preferences, and even provides directions?
That's exactly the idea behind the Sightseeing Agent I developed for my Capstone project. The goal was to move beyond static travel planning and create a dynamic, personalized experience using the power of Generative AI.
The Problem: Static Plans vs. Dynamic Exploration
Standard travel planning often involves:
- Generic Information: Guidebooks or websites list popular spots but don't tailor suggestions to your specific interests (e.g., "I love baroque architecture" or "I need wheelchair-accessible options").
- Manual Itinerary Building: Juggling maps, opening hours, and personal preferences across different sources is tedious.
- Lack of Adaptability: Plans are often rigid. What if a place is unexpectedly closed, or you suddenly feel like visiting a park instead of a museum? Static plans don't adapt easily.
- No Memory: Standard tools don't remember your preferences or the conversation context from one planning session (or even one query) to the next.
The Solution: A Conversational AI Agent
I aimed to solve these problems by building an AI agent that acts as an interactive sightseeing planner, specifically for Erlangen, Germany (though the concept is adaptable). Users can:
- Chat Naturally: Ask questions about available attractions, history, or interests in plain English.
- Get Personalized Suggestions: The agent leverages Google's Gemini LLM to understand requests and can use tools to fetch relevant, up-to-date information.
- Build an Itinerary: Users can ask the agent to add specific places to their plan.
- Specify Preferences: Tell the agent about interests (history, parks, etc.) or needs (accessibility), which it remembers for future suggestions.
- Review and Finalize: The agent helps confirm the plan before finishing.
- Get Directions: Once the plan is finalized, the agent uses the Google Maps Directions API to provide walking directions between the chosen locations.
How Gen AI Makes It Happen: The Tech Stack
The agent is built using Python and relies on a few key technologies:
- Google Gemini: The core Large Language Model (LLM) provides the natural language understanding and generation capabilities, allowing for fluid conversation. It also acts as the "brain" deciding when to call external tools.
- LangGraph: This library from LangChain is crucial for building reliable, stateful AI applications. It allows us to define the agent's workflow as a graph, explicitly managing the flow of information and actions between different steps (nodes).
- LangChain: Provides the foundational components, including tool definitions, message types, and LLM integrations.
- Google Maps APIs (Places & Directions): External tools connected to the agent provide real-world data (see the sketch after this list):
  - Places API: Fetches details about tourist attractions (summary, opening hours, accessibility hints).
  - Directions API: Calculates routes between locations in the finalized itinerary.
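To make the Maps integration concrete, here is a minimal sketch of how both APIs can be called with the official googlemaps Python client. The API key and the Erlangen place names are placeholders for illustration, not values from the actual project:

```python
import googlemaps

# Hypothetical setup; replace with your own API key.
gmaps = googlemaps.Client(key="YOUR_API_KEY")

# Places API: text search for tourist attractions in Erlangen.
places = gmaps.places(query="tourist attractions in Erlangen")
for result in places.get("results", []):
    print(result["name"], result.get("formatted_address"))

# Directions API: walking route between two chosen stops.
route = gmaps.directions(
    origin="Schlossgarten, Erlangen",
    destination="Botanischer Garten, Erlangen",
    mode="walking",
)
```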
Implementation Highlights: State, Tools, and Graphs
Let's look at a few code concepts that make this work:
1. Managing State with TypedDict and MemorySaver:
The agent needs to remember things. LangGraph manages this through a state object. We define its structure using Python's TypedDict and use LangGraph's built-in message handling and a checkpointer for persistence.
```python
from typing import Annotated, Any, Dict
from typing_extensions import TypedDict

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages
from langgraph.checkpoint.memory import MemorySaver  # For persistence


class TravelPlanState(TypedDict):
    """State representing the user's sightseeing plan conversation."""

    # Conversation history (new messages are appended automatically)
    messages: Annotated[list[BaseMessage], add_messages]
    # User's plan details
    itinerary: list[str]
    preferences: Dict[str, Any]
    # Flag to control workflow end
    finished: bool


# Checkpointer setup (done during graph compilation)
# memory = MemorySaver()
# graph = graph_builder.compile(checkpointer=memory)
```
The MemorySaver allows the agent to pick up a conversation where it left off using a unique thread_id, remembering the itinerary, preferences, and messages.
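Concretely, once the graph is compiled with the checkpointer as above, resuming a conversation only requires passing the same thread_id again. A minimal sketch (the thread ID and messages are invented for illustration):

```python
# Hypothetical usage; assumes `graph` was compiled with MemorySaver above.
config = {"configurable": {"thread_id": "user-42"}}

# First turn: preferences get stored in the checkpointed state.
graph.invoke(
    {"messages": [("user", "I love baroque architecture.")]},
    config=config,
)

# A later turn with the same thread_id resumes the same itinerary,
# preferences, and message history.
graph.invoke(
    {"messages": [("user", "Add the Schlossgarten to my plan.")]},
    config=config,
)
```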
2. Defining Tools with @tool:
We give the LLM capabilities by defining Python functions as tools. The @tool decorator helps LangChain understand the function's purpose (from its docstring) and arguments.
```python
from langchain_core.tools import tool


# Example tool: getting available places (stateless info retrieval)
@tool
def get_available_places() -> str:
    """
    Retrieves a list of tourist attractions
    in Erlangen from Google Maps Places API...
    Returns the list as a JSON string.
    """
    # ... (code using the googlemaps client to call the Places API) ...
    # Implementation omitted for brevity; returns a JSON string.
    pass


# Example tool: adding to the itinerary (modifies state)
@tool
def add_place_to_itinerary(place: str) -> str:
    """Adds the specified place to the user's itinerary."""
    # NOTE: The actual state update happens in 'plan_node';
    # this definition only lets the LLM know the tool exists.
    return f"Placeholder: Added {place}"
```
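One piece of wiring not shown above is how the LLM learns about these tools. A minimal sketch, assuming the langchain-google-genai integration (the model name here may differ from the one used in the project):

```python
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

# The LLM can now emit structured tool calls for these functions.
llm_with_tools = llm.bind_tools([get_available_places, add_place_to_itinerary])
```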
3. Orchestrating with LangGraph Nodes:
The workflow is a graph where nodes perform actions.
- The agent node calls the LLM (llm_with_tools, which knows about the defined tools).
- A route_agent routing function inspects the agent's output and steers the conditional edges (a sketch follows this list). If the agent decided to call get_available_places, it directs flow to LangGraph's built-in ToolNode. If it called add_place_to_itinerary, it routes to our custom plan_node. If the plan is finished, it routes to get_directions_node or END. Otherwise, it waits for user_input.
- The custom plan_node safely executes the state-changing logic (like appending to the itinerary list) based on the tool call requested by the agent (sketched after the graph code below).
- The get_directions_node runs only after finalization to call the Directions API.
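The routing logic itself isn't shown in the snippets above, so here is a hedged sketch of what route_agent might look like, assuming state-changing tools are distinguished by name:

```python
# Tool names that must be handled by the custom plan_node (assumed set).
PLAN_TOOLS = {"add_place_to_itinerary"}


def route_agent(state: TravelPlanState) -> str:
    """Decide where to go after the agent node, based on its last message."""
    if state.get("finished"):
        return "directions"
    last_message = state["messages"][-1]
    tool_calls = getattr(last_message, "tool_calls", None)
    if tool_calls:
        # State-changing calls go to plan_node; pure lookups go to ToolNode.
        if any(call["name"] in PLAN_TOOLS for call in tool_calls):
            return "plan"
        return "tools"
    # No tool call: hand control back to the user.
    return "user_input"
```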
```python
from langgraph.graph import StateGraph, START, END
from langgraph.prebuilt import ToolNode

# Simplified conceptual flow in the graph builder:
graph_builder = StateGraph(TravelPlanState)

# Add nodes for agent, user input, executing tools, updating plan, getting directions
graph_builder.add_node("agent", sightseeing_agent_with_tools)
graph_builder.add_node("user_input", user_input_node)
graph_builder.add_node("tools", ToolNode(info_retrieval_tools))  # Executes get_available_places
graph_builder.add_node("plan", plan_node)                        # Executes add_place_to_itinerary etc.
graph_builder.add_node("directions", get_directions_node)        # Executes Directions API call

# Define entry point and edges (some conditional, based on router output)
graph_builder.add_edge(START, "agent")
graph_builder.add_conditional_edges("agent", route_agent, {...})       # Routes to tools, plan, user_input, directions, or END
graph_builder.add_conditional_edges("user_input", should_exit, {...})  # Routes to agent or END
graph_builder.add_edge("tools", "agent")      # Return to agent after tool execution
graph_builder.add_edge("plan", "agent")       # Return to agent after plan update
graph_builder.add_edge("directions", END)     # End after showing directions
```
This structured approach ensures reliability and makes the agent's behaviour predictable and debuggable.
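To round out the picture, here is a hedged sketch of the plan_node pattern described above; the project's actual tool handling is more elaborate, but the key idea is that a node returns only the state keys it changes and answers each tool call with a ToolMessage:

```python
from langchain_core.messages import ToolMessage


def plan_node(state: TravelPlanState) -> dict:
    """Execute state-changing tool calls requested by the agent."""
    tool_message = state["messages"][-1]
    itinerary = list(state["itinerary"])
    replies = []
    for call in tool_message.tool_calls:
        if call["name"] == "add_place_to_itinerary":
            itinerary.append(call["args"]["place"])
            text = f"Added {call['args']['place']} to the itinerary."
        else:
            text = f"Unhandled tool: {call['name']}"
        # Each tool call must be answered with a matching ToolMessage.
        replies.append(ToolMessage(content=text, tool_call_id=call["id"]))
    # Return only the updated keys; add_messages appends the replies.
    return {"itinerary": itinerary, "messages": replies}
```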
Limitations and Challenges
While powerful, this approach isn't without limitations:
- LLM Imperfections: Gemini, like any LLM, might occasionally misunderstand a nuanced request, misinterpret context, or fail to call the correct tool. More sophisticated prompting or fine-tuning could help but adds complexity.
- API Reliance & Costs: The agent heavily relies on external APIs (Gemini, Maps). This incurs costs and is dependent on API availability and quotas. Error handling for API failures is essential.
- Context Window: While we use windowed memory, very long conversations might still lose early context not captured in the preferences or itinerary.
- Non-Interactive Finalization: The current simulation uses input() within the finalize_plan logic inside plan_node. This works interactively but fails in the non-interactive test setup. A real deployment would need a proper UI callback mechanism.
- No Visuals: The agent currently only describes places and directions textually; it doesn't display maps or images directly in the chat.
Future Possibilities: The Art of the Possible
This Capstone project lays a foundation. Exciting future enhancements could include:
- Multimodality: Allow users to upload images of places they like or receive map snapshots/photos directly in the chat.
- Booking Integration: Connect to hotel or event booking APIs.
- Proactive Suggestions: Have the agent suggest activities based on the time of day, weather (via another tool!), or user location.
- Deeper Preference Learning: Store and analyze user preferences over multiple trips.
- Voice Interaction: Enable users to talk to the agent instead of typing.
- Enhanced Error Handling: Implement more robust loops for clarifying ambiguous user requests or handling tool errors gracefully.
Conclusion
Building this Sightseeing Agent demonstrated the incredible potential of combining large language models like Gemini with structured workflow tools like LangGraph and real-world data from external APIs. By managing state effectively and defining clear operational steps, we can create AI applications that move beyond simple Q&A to become genuinely helpful, personalized, and dynamic assistants for complex tasks like travel planning. While challenges remain, the possibilities for more intuitive and intelligent interactions are vast.