AI for ESG Reporting Using Real-Time RAG and Live Data Streams

Why Real-Time ESG Reporting Needs a Shake-Up
In the world of finance and sustainability, Environmental, Social, and Governance (ESG) data is gold. But traditional ESG reporting is slow, static, and backward-looking.
That’s a problem.
Modern asset managers, analysts, and compliance teams need real-time insights. They can’t wait for quarterly updates or laggy data refreshes.
Enter AI + Real-Time Data Pipelines.
Imagine an AI system that not only fetches the most relevant ESG info right now, but also explains it to you in natural language. That’s what our hackathon team set out to build.
What We Built: A Real-Time ESG RAG Application
At the Generative AI Hackathon hosted by IIT Jammu and Pathway, our goal was clear:
Build a real-time Retrieval-Augmented Generation (RAG) app powered by Pathway.
💡 Key Features:
• Live ESG + news ingestion
• On-the-fly indexing and vector search
• Natural language answers using an LLM
• REST API and clean UI to tie it together
The Problem Statement We Solved
The challenge asked us to:
• Ingest real-time ESG and news data
• Build a vector store for document retrieval
• Integrate an LLM into a RAG pipeline
• Expose it via a REST API
• Add a simple UI to show results in real time
Our twist?
We focused on ESG data in the financial domain, where timely insights are critical for compliance, investor updates, and risk assessment.
System Architecture: How It All Comes Together
Here’s a quick view of our tech stack and data flow:
Data Sources (ESG + News)
→ Pathway Pipeline (Ingestion & Indexing)
→ Vector Store (Custom Embeddings)
→ RAG (LLM with retrieved context)
→ FastAPI (REST Endpoint)
→ Streamlit UI (Live Interface)
Tools We Used
• Pathway: real-time data ingestion & indexing
• FastAPI: RESTful backend
• Ollama + Phi3: lightweight, local LLM
• Streamlit: interactive frontend
We designed everything to simulate a production-ready, low-latency ESG dashboard.
Pathway in Action: Real-Time ESG Intelligence
We used Pathway to:
• Ingest JSONLines files for ESG & news
• Stream new entries with pw.io.jsonlines.read
• Parse and normalize data using custom schemas
• Generate basic embeddings (hash-based)
• Build a vector index in real time
Why it matters:
Unlike batch pipelines, which recompute on a schedule, Pathway updates its index and outputs incrementally as new records arrive, which is exactly what ESG systems need.
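To make the "hash-based" embedding step concrete, here is a minimal sketch of one common way to do it, the hashing trick. This is not the team's actual code: the function name `hash_embed`, the 64-dimension default, and the use of MD5 are assumptions for illustration only.

```python
import hashlib
import math
import re


def hash_embed(text: str, dim: int = 64) -> list[float]:
    """Map text to a fixed-size vector with the hashing trick (no learned model)."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim  # which slot the token lands in
        sign = 1.0 if digest[4] % 2 == 0 else -1.0        # signed counts soften collisions
        vec[bucket] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    # L2-normalise so vectors are comparable; an empty text stays all zeros.
    return [v / norm for v in vec] if norm else vec


a = hash_embed("Carbon emissions fell 12% year over year")
b = hash_embed("carbon EMISSIONS fell 12% year over year")
print(len(a), a == b)  # prints: 64 True
```

Vectors like these are cheap and deterministic, but they only capture token overlap. Two sentences with the same meaning and different wording end up far apart, which is the "lacked semantic depth" limitation noted under Challenges.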
Retrieval-Augmented Generation (RAG) with LLMs
Here’s how we made answers smart, fast, and grounded:

Query received via REST API
Query embedded → nearest neighbors retrieved from index
Context + query → passed to Ollama Phi3 model
Response returned with answer + context + metadata

We manually handled distance calculations due to type quirks in Pathway, a cool hack that paid off.

User Interface & API: Real-Time, User-Friendly
We kept the interface super simple:
• Streamlit dashboard with:
  o Query input
  o Styled results (Answer, Context, Metadata)
  o Real-time ESG data log
• REST API using FastAPI at /rag
  o Accepts POST queries
  o Returns full RAG output as JSON
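The four RAG steps listed earlier (embed the query, retrieve neighbors, prompt the model, return answer + context + metadata) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the project's code: the function names, the `answer`/`context`/`metadata` field layout, the toy documents, and the 2-D vectors are all made up here. The `ollama_llm` helper assumes an Ollama server running locally on its default port with the `phi3` model pulled; it is defined but not called in the demo.

```python
import json
import math
import urllib.request
from typing import Callable


def cosine_distance(u: list[float], v: list[float]) -> float:
    """1 - cosine similarity, computed by hand on plain Python lists."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat a zero vector as unrelated to everything
    return 1.0 - dot / (norm_u * norm_v)


def retrieve(query_vec: list[float], docs: list[dict], k: int = 2) -> list[dict]:
    """Return the k documents whose vectors are closest to query_vec."""
    return sorted(docs, key=lambda d: cosine_distance(query_vec, d["vector"]))[:k]


def build_prompt(question: str, context_docs: list[dict]) -> str:
    """Put the retrieved passages in front of the question."""
    context = "\n".join(f"- {d['text']}" for d in context_docs)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


def ollama_llm(prompt: str, model: str = "phi3",
               url: str = "http://localhost:11434/api/generate") -> str:
    """Send one non-streaming generate request to a local Ollama server (assumed running)."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


def answer_query(question: str, query_vec: list[float], docs: list[dict],
                 llm: Callable[[str], str], k: int = 2) -> dict:
    """Retrieve context, call the LLM, and package the result."""
    hits = retrieve(query_vec, docs, k)
    return {
        "answer": llm(build_prompt(question, hits)),
        "context": [d["text"] for d in hits],
        "metadata": [d["source"] for d in hits],
    }


# Toy documents and 2-D vectors stand in for real ESG records and embeddings.
docs = [
    {"text": "Acme cut scope 1 emissions.", "source": "esg.jsonl", "vector": [1.0, 0.0]},
    {"text": "Acme appointed a new board chair.", "source": "news.jsonl", "vector": [0.0, 1.0]},
    {"text": "Acme set a 2030 net-zero target.", "source": "esg.jsonl", "vector": [0.9, 0.1]},
]


def fake_llm(prompt: str) -> str:
    """Stand-in for the model so the sketch runs without a server."""
    return f"(model output for a {len(prompt)}-character prompt)"


result = answer_query("What is Acme doing on emissions?", [1.0, 0.0], docs, fake_llm)
print(result["context"])
# prints: ['Acme cut scope 1 emissions.', 'Acme set a 2030 net-zero target.']
```

Passing the model in as a callable keeps retrieval and prompting testable on their own; swapping `fake_llm` for `ollama_llm` would route the same prompt to a local Phi3 model.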

Streamlit app for ESG RAG showing live answers

JSON response from FastAPI RAG endpoint

Challenges We Faced

Embedding accuracy: Our hash-based method worked — but lacked semantic depth. We’re eyeing Sentence Transformers next.
Real-time simulation: We faked streaming via .jsonl updates. A real app would hook into financial APIs or Kafka.
Pathway quirks: Type issues during embedding comparison meant writing custom logic to find nearest neighbors.
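For readers curious what "faking streaming via .jsonl updates" can look like without any framework, here is a small standard-library sketch that polls a file and returns only the records appended since the last read. It is not how Pathway does it internally, and the function name `read_new_records`, the file name, and the sample records are hypothetical.

```python
import json
import os
import tempfile


def read_new_records(path: str, offset: int) -> tuple[list[dict], int]:
    """Return records appended to a .jsonl file since byte `offset`, plus the new offset."""
    records = []
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line.endswith(b"\n"):
                break  # end of file or a partially written line; retry on the next poll
            offset = f.tell()
            if line.strip():
                records.append(json.loads(line))
    return records, offset


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "esg_stream.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        f.write(json.dumps({"company": "Acme", "metric": "co2_tonnes", "value": 120}) + "\n")
    first, pos = read_new_records(path, 0)

    # Simulate a new ESG record arriving later.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"company": "Acme", "metric": "co2_tonnes", "value": 95}) + "\n")
    second, pos = read_new_records(path, pos)

    print(len(first), len(second), second[0]["value"])  # prints: 1 1 95
```

Tracking a byte offset means each poll processes only new lines, which mirrors the behavior a real streaming source such as Kafka would provide.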

Lessons Learned
• Real-time pipelines demand reactive architecture
• Pathway is killer for streaming use cases
• RAG reduces LLM hallucination by grounding answers in retrieved documents
• Simplicity in UI and architecture wins during hackathons

What’s Next?
• Upgrade to semantic embedding models
• Add ESG trend visualizations (e.g., emissions over time)
• Experiment with multi-step reasoning agents
• Try Pathway’s native serve_callable for deployment

GitHub & Resources
🔗 Repo: GitHub – Real-Time ESG RAG App

📽️ Demonstration Video of Our Project:
Intro to Retrieval-Augmented Generation

🔗 Tooling:
• Pathway GitHub
• Ollama LLM Runner
• Streamlit Docs

Frequently Asked Questions (FAQs)
What is ESG reporting in finance?
ESG reporting tracks how companies perform on Environmental, Social, and Governance criteria — key for sustainable investing.
Why use AI for ESG analysis?
AI enables faster insights, better data integration, and real-time alerting vs. traditional quarterly reports.
How does a RAG pipeline work?
RAG fetches relevant context for a user query and feeds it into an LLM to generate grounded answers.
Can I deploy this app myself?
Yes! Our GitHub has setup instructions for running locally with your own data.

Final Thoughts
We built this project to showcase what’s possible when AI meets live data. Real-time ESG insights aren’t just cool — they’re necessary in today’s fast-moving financial landscape.
This hackathon win was just the start.
Try the demo, fork the code, and help us take real-time AI for ESG to the next level.
