ESG / Climate · Earth Security

Climate Research Engine

Analysts spent weeks manually matching companies to carbon offset projects. This tool does it in seconds. Vector search over sustainability disclosures, 3k+ companies profiled.

LangChain, FAISS, OpenAI, Streamlit, PostgreSQL, Pydantic, Docker, AWS ECS, Terraform


Challenge

Earth Security helps organisations navigate the carbon credit market. Their research team identifies which companies are most likely to invest in specific types of carbon projects, from mangrove restoration to clean energy. The intelligence they need is buried across sustainability reports, annual filings, and regulatory disclosures. Each research cycle meant weeks of manual document review.

The brief was tight. Build something crude but useful that helps analysts identify leads and opportunities. Discovery to working MVP, under £10k.

Approach

I worked with the analysts through a discovery phase to define the data points, data sources, and queries that mattered most to their research. That produced 60+ fields per company, including climate commitments, carbon credit history, biodiversity targets, financial instruments, and SBTi status. Each extraction prompt was specified with validation criteria so the LLM pipeline could reliably pull structured data from unstructured reports.
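One way to picture an extraction prompt "specified with validation criteria" is a typed record that rejects malformed LLM output before it reaches a company profile. The sketch below is illustrative only: it uses standard-library dataclasses rather than the Pydantic models listed in the stack, and the field names (`sbti_status`, `target_year`, `reduction_pct`), allowed values, and ranges are hypothetical, not the project's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowed values; the real field list was defined with the analysts.
SBTI_STATUSES = {"committed", "targets_set", "none", "unknown"}


@dataclass(frozen=True)
class ClimateCommitment:
    """One validated slice of a company profile (illustrative fields only)."""
    company: str
    sbti_status: str
    target_year: Optional[int] = None
    reduction_pct: Optional[float] = None

    def __post_init__(self) -> None:
        # Each check mirrors a validation criterion attached to a prompt.
        if not isinstance(self.company, str) or not self.company.strip():
            raise ValueError("company must be a non-empty string")
        if self.sbti_status not in SBTI_STATUSES:
            raise ValueError(f"unexpected sbti_status: {self.sbti_status!r}")
        if self.target_year is not None and not 2000 <= self.target_year <= 2100:
            raise ValueError(f"implausible target_year: {self.target_year}")
        if self.reduction_pct is not None and not 0 <= self.reduction_pct <= 100:
            raise ValueError(f"reduction_pct out of range: {self.reduction_pct}")


def parse_extraction(raw: dict) -> Optional[ClimateCommitment]:
    """Return a validated record, or None so the caller can retry or flag it."""
    try:
        return ClimateCommitment(**raw)
    except (TypeError, ValueError):
        # TypeError covers missing or unexpected keys and wrongly typed values.
        return None
```

Returning `None` instead of raising keeps a batch run going: a failed extraction can be retried or flagged for an analyst rather than silently stored.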

01 Ingest

Document Pipeline: Reports, filings & disclosures

Source Documents: Annual sustainability reports, ESG filings, CDP disclosures, net-zero strategy PDFs

Chunking & Embedding: Documents split into semantic sections and embedded with OpenAI for vector search

Vector Indexing: Indexed into FAISS for sub-second similarity search across the full corpus
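The ingest stage can be sketched end to end with the standard library. This is a toy stand-in, not the production pipeline: word counts replace OpenAI embeddings, a brute-force cosine scan replaces the FAISS index, and the chunking rule (blank-line paragraphs capped at `max_words`) and the sample report text are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split on blank lines, then cap each chunk at max_words words."""
    chunks = []
    for para in re.split(r"\n\s*\n", text.strip()):
        words = para.split()
        for i in range(0, len(words), max_words):
            chunks.append(" ".join(words[i:i + max_words]))
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(index: list[tuple[str, Counter]], query: str, k: int = 3) -> list[tuple[float, str]]:
    """Brute-force scan; a vector index answers the same question at scale."""
    q = embed(query)
    scored = [(cosine(q, vec), chunk) for chunk, vec in index]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Made-up report text for illustration.
report = """We fund mangrove restoration along tropical coastlines.

Our board approved a new remuneration policy this year."""

index = [(chunk, embed(chunk)) for chunk in chunk_text(report)]
top_score, top_chunk = search(index, "mangrove restoration projects", k=1)[0]
```

Word counts only match on shared vocabulary; a learned embedding model is what lets the real tool match on meaning rather than keywords.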

02 Enrich

Multi-Source Enrichment: 7 sources · 3k+ companies

Company Firmographics: Industry, HQ, employee count, revenue band, and key contacts via Apollo & HubSpot

Corporate Structure: Legal entity mapping and parent-subsidiary hierarchy from GLEIF registry

Climate Intelligence: LLM-extracted emission targets, SBTi status, credit history, green bonds, nature commitments
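The corporate structure step in the Enrich stage amounts to rolling subsidiaries up to a single group record. A minimal sketch of that idea, assuming the hierarchy has already been reduced to direct child-to-parent links; the entity identifiers below are placeholders, not real LEIs, and the function names are hypothetical.

```python
from collections import defaultdict


def ultimate_parent(entity: str, parent_of: dict[str, str]) -> str:
    """Follow direct-parent links until an entity with no recorded parent."""
    seen = {entity}
    current = entity
    while current in parent_of:
        current = parent_of[current]
        if current in seen:
            # Guard against bad data that would otherwise loop forever.
            raise ValueError(f"cycle in hierarchy at {current!r}")
        seen.add(current)
    return current


def group_by_ultimate_parent(entities: list[str], parent_of: dict[str, str]) -> dict[str, list[str]]:
    """Bucket entities under the top of their ownership chain."""
    groups: dict[str, list[str]] = defaultdict(list)
    for entity in entities:
        groups[ultimate_parent(entity, parent_of)].append(entity)
    return dict(groups)


# Hypothetical direct-parent links (child -> parent), keyed by placeholder IDs.
parent_of = {
    "SUB-UK-RETAIL": "SUB-UK-HOLDINGS",
    "SUB-UK-HOLDINGS": "GROUP-PLC",
    "SUB-DE-TRADING": "GROUP-PLC",
}

groups = group_by_ultimate_parent(["SUB-UK-RETAIL", "SUB-DE-TRADING", "GROUP-PLC"], parent_of)
```

Grouping this way means a disclosure filed by a subsidiary can be attributed to the group profile an analyst actually searches for.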

03 Match

Prospect Matching: Scored & ranked by fit

Vector Similarity: Matches company climate profiles against project characteristics using semantic search

Evidence Citations: Every match links back to the exact passage in the source document that supports it

Analyst-Ready Output: Ranked company profiles with match scores, ready for review and outreach decisions
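To show how passage-level similarity can become a ranked, cited company list, here is a small sketch. The aggregation rule (a company's score is its best-matching passage) is an assumption; the case study does not say how match scores are computed. Company names, file names, scores, and passages are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    company: str
    score: float   # similarity between the passage and a project description
    source: str    # document the passage came from
    passage: str


@dataclass(frozen=True)
class Match:
    company: str
    score: float
    evidence: Hit  # the passage an analyst can open to verify the match


def rank_companies(hits: list[Hit]) -> list[Match]:
    """Keep each company's best passage, then sort companies by that score."""
    best: dict[str, Hit] = {}
    for hit in hits:
        if hit.company not in best or hit.score > best[hit.company].score:
            best[hit.company] = hit
    matches = [Match(company, hit.score, hit) for company, hit in best.items()]
    return sorted(matches, key=lambda m: m.score, reverse=True)


# Illustrative passage-level hits for one project type.
hits = [
    Hit("Company A", 0.82, "a_sustainability_report.pdf", "Passage about forest protection."),
    Hit("Company A", 0.40, "a_annual_report.pdf", "Passage about governance."),
    Hit("Company B", 0.71, "b_esg_filing.pdf", "Passage about mangrove restoration."),
]
ranked = rank_companies(hits)
```

Carrying the winning `Hit` inside each `Match` is what keeps the citation attached to the score, so a ranking can always be traced back to a source passage.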

We started with the analysts’ real questions, not a polished spec. The data points, sources, and scoring logic were defined with the people who’d use the tool every day.

Built from 7 sources, and growing.

Firmographics, corporate hierarchy, climate commitments, carbon credit history, and contact data — stitched together automatically so analysts never start from a blank spreadsheet.

Apollo (Company data)

Firmographics, industry classification, employee count, contacts

HubSpot (CRM)

CRM records, engagement history, deal stage and pipeline status

GLEIF (Legal entities)

Legal entity identifiers, corporate hierarchy, parent-subsidiary mapping

Verra Registry (Carbon registry)

VCS carbon project listings, verification status, credit issuance volumes

Ocean 100 (Marine data)

Top 100 ocean economy companies, marine sustainability commitments

LLM Extraction (Corporate disclosures)

Emission targets, SBTi status, carbon credit usage, green bond history

Sixty data points sounds like a lot. It’s the minimum. Every field maps to a question their analysts were already asking manually — we just made the answers show up before they had to go looking.

The research team needed answers from thousands of pages of unstructured reports. The tool indexes sustainability disclosures as vector embeddings so analysts can search by meaning, not keywords. LangChain orchestrates the pipeline from document ingestion through to matching output, and every result links back to the exact passage in the source document.

Companies

BP plc (Energy)

Shell (Energy)

Maersk (Shipping)

Nestlé (Consumer)

Carbon Projects

REDD+

Blue Carbon

Clean Cookstoves

Renewable Energy

Analysts query the vector store in natural language through a Streamlit interface. No SQL, no filters. Just ask the question the same way you’d ask a colleague. The system returns ranked matches with citations from the source documents.

Climate Intelligence Search

Oil & gas companies with nature-based offset commitments

Example queries

› FTSE 100 companies investing in blue carbon projects (23 results)

› Shipping companies with SBTi-approved reduction targets (14 results)

› Companies that purchased REDD+ credits in the last 2 years (31 results)

› European consumer brands with mangrove restoration investment (8 results)

The best tool is the one your team actually uses. Natural language search meant analysts didn’t need to learn SQL or wrestle with filters. They just asked the question.

Company profile snapshot

Company Profile: BP plc

Sector: Oil & Gas

Revenue: $164B

Employees: 67,600

Listed: LSE: BP

Science Based Targets · Nature Disclosure · Active Credit Buyer

Project Fit

82% · REDD+ (Avoided deforestation): “$50M+ committed to nature-based solutions”

71% · Blue Carbon (Coastal & marine): “Active mangrove restoration investment”

45% · Clean Cookstoves (Household energy): “Scope 3 offset strategy includes household”

Key Evidence

50% emission reduction target by 2030

$4.75B in green bonds issued

Active VCS credit buyer: Forest Protection, Mangrove, Cookstove

Result

Discovery to MVP in 2 months, under £10k. What used to take weeks of manual document review now takes seconds.

Total build cost: under £10k

Discovery to MVP: 2 months

Companies profiled: 3k+

The tool replaced weeks of manual document review with structured, enriched company intelligence. Analysts now surface qualified prospects in seconds and spend their time on outreach, not spreadsheets.

