ESG / Climate · Earth Security

Climate Research Engine

Analysts spent weeks manually matching companies to carbon offset projects. This tool does it in seconds. Vector search over sustainability disclosures, 3k+ companies profiled.

LangChain, FAISS, OpenAI, Streamlit, PostgreSQL, Pydantic, Docker, AWS ECS, Terraform

Challenge

Earth Security helps organisations navigate the carbon credit market. Their research team identifies which companies are most likely to invest in specific types of carbon projects, from mangrove restoration to clean energy. The intelligence they need is buried across sustainability reports, annual filings, and regulatory disclosures. Each research cycle meant weeks of manual document review.

The brief was tight. Build something crude but useful that helps analysts identify leads and opportunities. Discovery to working MVP, under £10k.

Approach

I worked with the analysts through a discovery phase to define the data points, data sources, and queries that would actually move the needle. 60+ fields per company — climate commitments, carbon credit history, biodiversity targets, financial instruments, SBTi status. Each extraction prompt was specified with validation criteria so the LLM pipeline could reliably pull structured data from unstructured reports.
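To illustrate what "specified with validation criteria" could look like in practice, here is a minimal sketch of a validated extraction record. The field names, allowed status values, and ranges are hypothetical, not the tool's actual schema, and the sketch uses standard-library dataclasses to stay dependency-free where the real stack lists Pydantic.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical allowed values; the real criteria were agreed with the analysts.
ALLOWED_SBTI_STATUS = {"committed", "targets_set", "removed", "none"}


@dataclass
class ClimateProfile:
    # Illustrative subset of the 60+ fields per company.
    company: str
    sbti_status: str
    reduction_target_pct: Optional[float]  # e.g. 50.0 for a 50% target
    target_year: Optional[int]
    source_passage: str  # text the values were extracted from


def validate(profile: ClimateProfile) -> List[str]:
    """Return validation errors; an empty list means the record passes."""
    errors = []
    if profile.sbti_status not in ALLOWED_SBTI_STATUS:
        errors.append(f"unknown sbti_status: {profile.sbti_status!r}")
    pct = profile.reduction_target_pct
    if pct is not None and not 0 < pct <= 100:
        errors.append("reduction_target_pct must be in (0, 100]")
    year = profile.target_year
    if year is not None and not 2000 <= year <= 2100:
        errors.append("target_year out of range")
    if not profile.source_passage.strip():
        errors.append("missing source_passage")
    return errors
```

A record that fails validation can be sent back for re-extraction or flagged for analyst review rather than written to the database.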

01 Ingest
Document Pipeline: Reports, filings & disclosures
Source Documents: Annual sustainability reports, ESG filings, CDP disclosures, net-zero strategy PDFs
Chunking & Embedding: Documents split into semantic sections and embedded with OpenAI for vector search
Vector Indexing: Indexed into FAISS for sub-second similarity search across the full corpus
02 Enrich
Multi-Source Enrichment: 7 sources · 3k+ companies
Company Firmographics: Industry, HQ, employee count, revenue band, and key contacts via Apollo & HubSpot
Corporate Structure: Legal entity mapping and parent-subsidiary hierarchy from GLEIF registry
Climate Intelligence: LLM-extracted emission targets, SBTi status, credit history, green bonds, nature commitments
03 Match
Prospect Matching: Scored & ranked by fit
Vector Similarity: Matches company climate profiles against project characteristics using semantic search
Evidence Citations: Every match links back to the exact passage in the source document that supports it
Analyst-Ready Output: Ranked company profiles with match scores, ready for review and outreach decisions
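Evidence citations depend on each chunk remembering where it came from. The sketch below shows one way the ingest step could keep that provenance. It is an assumption-laden illustration: it splits on blank lines as a stand-in for the semantic sectioning described above, and the `Chunk` type and `chunk_by_paragraph` helper are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    doc_id: str  # which source document the chunk came from
    index: int   # position of the chunk within that document
    start: int   # character offset of the chunk in the original text
    text: str


def chunk_by_paragraph(doc_id: str, text: str, min_chars: int = 20) -> List[Chunk]:
    """Split on blank lines, keeping offsets so a match can cite its passage."""
    chunks: List[Chunk] = []
    offset = 0
    for para in text.split("\n\n"):
        stripped = para.strip()
        if len(stripped) >= min_chars:
            # Skip any leading whitespace inside the paragraph.
            start = offset + para.index(stripped)
            chunks.append(Chunk(doc_id, len(chunks), start, stripped))
        offset += len(para) + 2  # account for the "\n\n" separator
    return chunks
```

Because every chunk carries `doc_id` and `start`, a later match can point back to the exact passage in the source document.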

We started with the analysts’ real questions, not a polished spec. The data points, sources, and scoring logic were defined with the people who’d use the tool every day.

Built from 7 sources, and growing.

Firmographics, corporate hierarchy, climate commitments, carbon credit history, and contact data — stitched together automatically so analysts never start from a blank spreadsheet.

Apollo (Company data): Firmographics, industry classification, employee count, contacts
HubSpot (CRM): CRM records, engagement history, deal stage and pipeline status
GLEIF (Legal entities): Legal entity identifiers, corporate hierarchy, parent-subsidiary mapping
Verra Registry (Carbon registry): VCS carbon project listings, verification status, credit issuance volumes
Ocean 100 (Marine data): Top 100 ocean economy companies, marine sustainability commitments
LLM Extraction (Corp Disclosures): Emission targets, SBTi status, carbon credit usage, green bond history
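Stitching sources together comes down to two operations: merging fields for the same company, and rolling subsidiaries up to a parent. Here is a rough sketch of both, assuming records have already been matched to a shared key. The function names, source labels, and precedence rule are illustrative assumptions rather than the tool's actual logic.

```python
from typing import Dict, List


def merge_records(records_by_source: Dict[str, dict], precedence: List[str]) -> dict:
    """Merge per-source field dicts; earlier sources in `precedence` win conflicts."""
    merged: dict = {}
    # Apply lowest-priority sources first so higher-priority ones overwrite them.
    for source in reversed(precedence):
        for field, value in records_by_source.get(source, {}).items():
            if value is not None:  # never let a missing value erase a real one
                merged[field] = value
    return merged


def ultimate_parent(entity: str, parent_of: Dict[str, str]) -> str:
    """Follow child -> parent links until an entity with no recorded parent."""
    seen = set()
    while entity in parent_of and entity not in seen:  # `seen` guards against cycles
        seen.add(entity)
        entity = parent_of[entity]
    return entity
```

For example, `ultimate_parent("sub-a", {"sub-a": "holding", "holding": "group"})` walks two links and returns `"group"`.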

Sixty data points sounds like a lot. It’s the minimum. Every field maps to a question their analysts were already asking manually — we just made the answers show up before they had to go looking.

The research team needed answers from thousands of pages of unstructured reports. The tool indexes sustainability disclosures as vector embeddings so analysts can search by meaning, not keywords. LangChain orchestrates the pipeline from document ingestion through to matching output, and every result links back to the exact passage in the source document.
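Searching by meaning rather than keywords amounts to ranking passages by how close their embeddings sit to the query's embedding. The sketch below shows that idea with plain cosine similarity over toy vectors. It swaps in a brute-force scan where the tool uses FAISS, and hand-written vectors where the tool uses OpenAI embeddings; the `search` helper and its tuple layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_vec: Sequence[float],
    passages: List[Tuple[str, str, Sequence[float]]],
    top_k: int = 3,
) -> List[Tuple[float, str, str]]:
    """Rank (doc_id, passage_text, embedding) tuples by similarity to the query.

    Returns (score, doc_id, passage_text) so each hit carries its citation.
    """
    scored = [(cosine(query_vec, emb), doc_id, text) for doc_id, text, emb in passages]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]
```

Returning the passage text and document ID alongside the score is what lets each result cite the evidence behind it.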

Companies
BP plc (Energy)
Shell (Energy)
Maersk (Shipping)
Nestlé (Consumer)
Carbon Projects
REDD+
Blue Carbon
Clean Cookstoves
Renewable Energy

Analysts query the vector store in natural language through a Streamlit interface. No SQL, no filters. Just ask the question the same way you’d ask a colleague. The system returns ranked matches with citations from the source documents.

Climate Intelligence Search
Example queries
FTSE 100 companies investing in blue carbon projects (23 results)
Shipping companies with SBTi-approved reduction targets (14 results)
Companies that purchased REDD+ credits in the last 2 years (31 results)
European consumer brands with mangrove restoration investment (8 results)

The best tool is the one your team actually uses. Natural language search meant analysts didn’t need to learn SQL or wrestle with filters. They just asked the question.

Company profile snapshot
Company Profile: BP plc
BP plc · Energy · London, UK
Match score: 70%
Sector: Oil & Gas
Revenue: $164B
Employees: 67,600
Listed: LSE: BP
Science Based Targets · Nature Disclosure · Active Credit Buyer
Project Fit
82% · REDD+ (Avoided deforestation): “$50M+ committed to nature-based solutions”
71% · Blue Carbon (Coastal & marine): “Active mangrove restoration investment”
45% · Clean Cookstoves (Household energy): “Scope 3 offset strategy includes household”
Key Evidence
50% emission reduction target by 2030
$4.75B in green bonds issued
Active VCS credit buyer — Forest Protection, Mangrove, Cookstove

Result

Discovery to MVP in 2 months, under £10k. What used to take weeks of manual document review now takes seconds.

Total build cost: under £10k
Discovery to MVP: 2 months
Companies profiled: 3k+

The tool replaced weeks of manual document review with structured, enriched company intelligence. Analysts now surface qualified prospects in seconds and spend their time on outreach, not spreadsheets.