
Challenge
Earth Security helps organisations navigate the carbon credit market. Their research team identifies which companies are most likely to invest in specific types of carbon projects, from mangrove restoration to clean energy. The intelligence they need is buried across sustainability reports, annual filings, and regulatory disclosures. Each research cycle meant weeks of manual document review.
The brief was tight. Build something crude but useful that helps analysts identify leads and opportunities. Discovery to working MVP, under £10k.
Approach
I worked with the analysts through a discovery phase to define the data points, data sources, and queries that mattered most to their research. That produced 60+ fields per company, including climate commitments, carbon credit history, biodiversity targets, financial instruments, and SBTi status. Each extraction prompt was specified with validation criteria so the LLM pipeline could reliably pull structured data from unstructured reports.
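To give a feel for what "a prompt specified with validation criteria" can look like, here is a minimal sketch rather than the project's actual code. The field names (`sbti_status`, `net_zero_target_year`), the prompt wording, and the allowed values are all hypothetical, and the LLM call itself is left out: the function only checks a record that an extraction step has already returned.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class FieldSpec:
    """One data point: the prompt sent to the model plus a check on its answer."""
    name: str
    prompt: str
    is_valid: Callable[[Any], bool]


# Hypothetical specs for illustration; the real tool defined 60+ fields.
FIELD_SPECS = [
    FieldSpec(
        name="sbti_status",
        prompt="What is the company's SBTi status? Answer committed, targets_set, or none.",
        is_valid=lambda v: isinstance(v, str) and v in {"committed", "targets_set", "none"},
    ),
    FieldSpec(
        name="net_zero_target_year",
        prompt="In which year does the company aim to reach net zero? Answer with a year or null.",
        is_valid=lambda v: v is None or (isinstance(v, int) and 2000 <= v <= 2100),
    ),
]


def validate_extraction(raw: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Split a raw model answer into accepted fields and names of rejected fields."""
    accepted: dict[str, Any] = {}
    rejected: list[str] = []
    for spec in FIELD_SPECS:
        if spec.name in raw and spec.is_valid(raw[spec.name]):
            accepted[spec.name] = raw[spec.name]
        else:
            rejected.append(spec.name)  # missing or failed its validation rule
    return accepted, rejected
```

Rejected fields can then be retried or flagged for an analyst instead of silently landing in the dataset.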
Document Pipeline: reports, filings & disclosures
Multi-Source Enrichment: 7 sources · 3k+ companies
Prospect Matching: scored & ranked by fit
We started with the analysts’ real questions, not a polished spec. The data points, sources, and scoring logic were defined with the people who’d use the tool every day.
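Scoring by fit can be as simple as a weighted checklist. The sketch below is an assumption about the general shape, not the scoring logic the analysts actually defined: the signal names and weights are invented for illustration.

```python
# Hypothetical signals and weights (integers, summing to 100).
WEIGHTS = {
    "has_net_zero_target": 40,
    "bought_credits_before": 40,
    "mentions_project_type": 20,
}


def fit_score(signals: dict[str, bool]) -> int:
    """Add up the weights of every signal that is present and true."""
    return sum(weight for name, weight in WEIGHTS.items() if signals.get(name))


def rank(companies: dict[str, dict[str, bool]]) -> list[tuple[str, int]]:
    """Return (company, score) pairs, best fit first."""
    scored = [(name, fit_score(signals)) for name, signals in companies.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Keeping the weights in one plain dictionary means the people who use the tool can read and challenge them without touching the rest of the pipeline.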
Built from 7 sources, and growing.
Firmographics, corporate hierarchy, climate commitments, carbon credit history, and contact data — stitched together automatically so analysts never start from a blank spreadsheet.
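One way to picture the stitching step is a merge keyed on a company identifier, with a record of which source supplied each value. This is a simplified sketch under assumptions: the source names, field names, and the "earlier source wins" rule are hypothetical, and the real enrichment may resolve conflicts differently.

```python
from typing import Any

# Each source maps company_id -> {field: value}; the list order sets precedence.
Source = tuple[str, dict[str, dict[str, Any]]]


def enrich(company_id: str, sources: list[Source]) -> dict[str, Any]:
    """Merge one company's records across sources; earlier sources win conflicts."""
    fields: dict[str, Any] = {}
    provenance: dict[str, str] = {}
    for source_name, records in sources:
        for field, value in records.get(company_id, {}).items():
            # Skip empty values and fields a higher-priority source already filled.
            if value is not None and field not in fields:
                fields[field] = value
                provenance[field] = source_name
    return {"company_id": company_id, "fields": fields, "provenance": provenance}
```

Tracking provenance per field keeps the enriched record auditable: an analyst can see where any value came from.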
Sixty data points sounds like a lot. It’s the minimum. Every field maps to a question their analysts were already asking manually — we just made the answers show up before they had to go looking.
The research team needed answers from thousands of pages of unstructured reports. The tool indexes sustainability disclosures as vector embeddings so analysts can search by meaning, not keywords. LangChain orchestrates the pipeline from document ingestion through to matching output, and every result links back to the exact passage in the source document.
Analysts query the vector store in natural language through a Streamlit interface. No SQL, no filters. Just ask the question the same way you’d ask a colleague. The system returns ranked matches with citations from the source documents.
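The core idea behind searching by meaning is ranking passages by how close their embedding is to the query's embedding, while carrying the source reference along with each passage. The sketch below shows that idea with the standard library only. It is not the LangChain pipeline itself: the three-dimensional vectors are hand-written stand-ins for what an embedding model would produce, and the file names and page numbers are made up.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    text: str
    source: str                # document the passage came from
    page: int                  # location used for the citation
    vector: tuple[float, ...]  # toy embedding; a real one comes from a model


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vector: tuple[float, ...], index: list[Passage], k: int = 2) -> list[Passage]:
    """Return the k passages most similar to the query, best match first."""
    ranked = sorted(index, key=lambda p: cosine(query_vector, p.vector), reverse=True)
    return ranked[:k]


# Toy index; the dimensions loosely stand for (nature, energy, finance).
INDEX = [
    Passage("We will fund coastal wetland restoration.", "report_a.pdf", 12, (0.9, 0.1, 0.2)),
    Passage("Our wind farm portfolio doubled this year.", "report_b.pdf", 4, (0.1, 0.9, 0.1)),
    Passage("We issued a green bond in the spring.", "report_c.pdf", 31, (0.2, 0.2, 0.9)),
]

# Stand-in embedding for a question like "who is investing in mangroves?"
QUERY = (1.0, 0.0, 0.1)

for hit in search(QUERY, INDEX):
    print(f"{hit.text} [{hit.source}, p.{hit.page}]")
```

Note that the top hit mentions wetlands, not mangroves: the match comes from vector proximity rather than shared keywords, which is the behaviour the analysts rely on.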
The best tool is the one your team actually uses. Natural language search meant analysts didn’t need to learn SQL or wrestle with filters. They just asked the question.
Result
Discovery to MVP in 2 months, under £10k. What used to take weeks of manual document review now takes seconds.
Analysts now work from structured, enriched company intelligence instead of raw documents. They surface qualified prospects quickly and spend their time on outreach, not spreadsheets.