§ 02 · 2025

Wealth-tech · prospect enrichment

Sales intelligence pipeline.

I built an end-to-end pipeline that turns a prospect's name and email into a 22-field structured profile, a factual bio, and a sales-persona tag, all synthesized from six independent sources. The system is deployed in production and called from the client's GTM workflow on every new prospect.

Client: Wealth-tech leader
Role: Solo design + build
Domain: RIA / broker-dealer
Surface: Meeting prep
Status: Production
Scope: Microservice · prompts · ops

§ Readouts

Output: 22 fields · strict-schema profile

Sources: 6+ feeds · reconciled per prospect

Latency: 12s p50 · roughly 30s p95

Cost: <10 cents · typical end-to-end

§ 03

The problem

The hierarchy is the product

The sales team needed sharper prospect context than any off-the-shelf enrichment platform could give them. Their market has a deeply layered account structure: parent firm, legal RIA or broker-dealer, brand, branch, wealth team, then the individual advisor or registered representative.

Getting that hierarchy wrong changes the sales conversation. A rep might sell under a small wealth-team brand while the regulator record points to a giant parent firm. Generic enrichment tools flatten that into one company field, which creates bad prep, wrong assumptions, and weak outreach.

The right answer lived across specialized databases, regulatory filings, LinkedIn, public registry pages, and client workflow data. I built the layer that resolved, reconciled, and compressed those signals into a profile reps could trust before a meeting.

§ 04

What I built

Six stages, strict output

01 · Identity resolution

CRD cascade

FINRA and SEC first, then site-restricted search, then LLM web lookup only when cheaper deterministic tiers miss.
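The cascade described above can be sketched as an ordered list of tiers, where the first hit wins and its source is recorded as provenance. This is a minimal illustration, not the production code: the resolver functions, tier names, and CRD values below are hypothetical stubs standing in for the regulator, site-restricted search, and LLM lookups.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A resolver takes (name, email) and returns a CRD string, or None on a miss.
Resolver = Callable[[str, str], Optional[str]]


@dataclass(frozen=True)
class CrdResult:
    crd: Optional[str]
    source: Optional[str]          # provenance: the tier that produced the CRD
    tiers_tried: Tuple[str, ...]   # every tier consulted, in order


def resolve_crd(name: str, email: str,
                tiers: List[Tuple[str, Resolver]]) -> CrdResult:
    """Try each tier in order and stop at the first one that returns a CRD."""
    tried = []
    for source, resolver in tiers:
        tried.append(source)
        crd = resolver(name, email)
        if crd:
            return CrdResult(crd, source, tuple(tried))
    return CrdResult(None, None, tuple(tried))


# Hypothetical stubs; real tiers would call regulator sources, search, and an LLM.
def regulator_lookup(name: str, email: str) -> Optional[str]:
    return None  # simulate a miss on the regulator tier


def site_search_lookup(name: str, email: str) -> Optional[str]:
    return "1234567" if name == "Jane Doe" else None


def llm_web_lookup(name: str, email: str) -> Optional[str]:
    return "7654321"  # last resort, only reached when cheaper tiers miss


TIERS = [
    ("regulator", regulator_lookup),
    ("site_search", site_search_lookup),
    ("llm_web", llm_web_lookup),
]
```

Keeping `source` and `tiers_tried` on the result makes it easy to write provenance to its own column and to audit how often the LLM tier is actually reached.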

02 · Authenticated enrichment

FastAPI microservice

A self-authenticating Python service wraps undocumented internal endpoints behind a clean API protected by X-API-Key.
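One way to picture the self-authenticating behavior is a small client that logs in lazily, reuses its headers, and re-authenticates exactly once when the upstream answers 401. The sketch below is an assumption-laden illustration: `AuthedClient`, `AuthError`, and the injected `login` and `send` callables are hypothetical names, and the transport is faked so the example runs without a network.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class AuthError(Exception):
    """Raised when the upstream still returns 401 after one re-login."""


class AuthedClient:
    def __init__(self,
                 login: Callable[[], Dict[str, str]],
                 send: Callable[[str, Dict[str, str]], Tuple[int, Any]]):
        self._login = login    # returns fresh auth headers
        self._send = send      # (path, headers) -> (status_code, body)
        self._headers: Optional[Dict[str, str]] = None

    def get(self, path: str) -> Tuple[int, Any]:
        if self._headers is None:
            self._headers = self._login()
        status, body = self._send(path, self._headers)
        if status == 401:
            # Headers are short-lived: log in again and retry exactly once.
            self._headers = self._login()
            status, body = self._send(path, self._headers)
            if status == 401:
                raise AuthError(path)
        return status, body


# Fake transport for illustration: the first login yields a stale token.
_tokens = iter(["stale", "fresh"])


def fake_login() -> Dict[str, str]:
    return {"Authorization": next(_tokens)}


def fake_send(path: str, headers: Dict[str, str]) -> Tuple[int, Any]:
    if headers["Authorization"] == "stale":
        return 401, None
    return 200, {"path": path}


client = AuthedClient(fake_login, fake_send)
```

Injecting `login` and `send` keeps the retry logic testable in isolation from whatever login flow the real service uses.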

03 · Search lanes

Email · name · CRD

Each input path uses a different scoring and verification strategy so common advisor names do not silently resolve to the wrong person.
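To make the name lane concrete, here is a minimal sketch of two of the signals named in the engineer's notes: token overlap between names, and a comparison of the email domain against the candidate's firm. The function names, the naive domain parsing, and the 0.75 / 0.25 weights are illustrative assumptions, not the production scoring.

```python
import re
from typing import Set


def tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens, so 'Jane Q. Doe' -> {'jane', 'q', 'doe'}."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the two token sets, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def domain_matches_firm(email: str, firm: str) -> bool:
    """Naive check: does the label before the TLD appear in the firm name?

    Assumption: a simple two-part domain such as acmewealth.com. Multi-part
    suffixes like .co.uk would need a proper public-suffix lookup.
    """
    domain = email.rsplit("@", 1)[-1].lower()
    parts = domain.split(".")
    label = re.sub(r"[^a-z0-9]", "", parts[-2] if len(parts) >= 2 else parts[0])
    firm_key = re.sub(r"[^a-z0-9]", "", firm.lower())
    if not label or not firm_key:
        return False
    return label in firm_key or firm_key in label


def score_candidate(query_name: str, email: str,
                    cand_name: str, cand_firm: str) -> float:
    """Blend name overlap with the domain signal (weights are illustrative)."""
    name_score = token_overlap(query_name, cand_name)
    domain_score = 1.0 if domain_matches_firm(email, cand_firm) else 0.0
    return 0.75 * name_score + 0.25 * domain_score
```

With this blend, two advisors who share a name are separated by the firm signal, so a same-name candidate at an unrelated firm scores lower than the one whose firm matches the email domain.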

04 · Detail fan-out

14+ endpoints

Profile, education, work history, current firms, registered locations, AUM statistics, branch locations, news, accolades, and tags.
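The fan-out pairs naturally with the in-band endpoint errors mentioned in the engineer's notes: one failing detail endpoint should not sink the whole profile. A minimal sequential sketch follows; the endpoint names and fetcher stubs are hypothetical, and the response shape is an assumption.

```python
from typing import Any, Callable, Dict


def fan_out(crd: str, endpoints: Dict[str, Callable[[str], Any]]) -> Dict[str, Any]:
    """Call every detail endpoint for one CRD, recording failures in-band."""
    result: Dict[str, Any] = {"crd": crd, "data": {}, "errors": {}}
    for name, fetch in endpoints.items():
        try:
            result["data"][name] = fetch(crd)
        except Exception as exc:
            # Report the failure alongside the data instead of aborting.
            result["errors"][name] = f"{type(exc).__name__}: {exc}"
    return result


# Hypothetical stubs standing in for real detail endpoints.
def fetch_education(crd: str) -> list:
    return [{"school": "State University"}]


def fetch_news(crd: str) -> list:
    raise TimeoutError("upstream timed out")


ENDPOINTS = {"education": fetch_education, "news": fetch_news}
```

Downstream stages can then decide per field whether a missing section is tolerable, rather than treating the enrichment as all-or-nothing.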

05 · Reconciliation

Firm hierarchy

LinkedIn, regulatory records, registry summaries, and platform data are reconciled across team, branch, brand, RIA/BD, and parent firm.
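The precedence rule from the engineer's notes (wealth team first, then the smallest client-facing entity with a CRD, then brand plus legal entity, with the parent firm only when structurally distinct) can be sketched as two small functions. Everything here is an interpretation: the `Entity` shape, the use of headcount as a proxy for "smallest", and the name-plus-CRD test for "structurally distinct" are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entity:
    name: str
    crd: Optional[str] = None
    headcount: Optional[int] = None  # assumed proxy for entity size


def pick_display_entity(wealth_team: Optional[Entity],
                        client_facing: List[Entity],
                        brand: str,
                        legal_entity: str) -> str:
    """Apply the precedence: team, then smallest entity with a CRD, then brand."""
    if wealth_team is not None:
        return wealth_team.name
    with_crd = [e for e in client_facing if e.crd]
    if with_crd:
        smallest = min(
            with_crd,
            key=lambda e: e.headcount if e.headcount is not None else float("inf"),
        )
        return smallest.name
    return f"{brand} ({legal_entity})"


def parent_if_distinct(parent: Optional[Entity], current: Entity) -> Optional[Entity]:
    """Keep the parent only when it differs from the current firm (assumed test)."""
    if parent is None:
        return None
    same_name = parent.name.strip().lower() == current.name.strip().lower()
    same_crd = parent.crd is not None and parent.crd == current.crd
    return None if (same_name or same_crd) else parent
```

Encoding the rule as code rather than prompt text keeps the hierarchy decision deterministic and reviewable, leaving the LLM to phrase the result.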

06 · LLM synthesis

22-field JSON

One structured call emits a factual 2-4 sentence bio, firm/team fields, parent-firm fields, and a sales-persona tag.

§ 05

Engineer's notes

Where the hard parts were

CRD discovery is a safety problem, not a convenience feature. The cascade prefers regulator sources, writes provenance to separate columns, and only uses LLM lookup after cheaper deterministic paths miss.
Identity · Auditability
The richest platform had no public API. I built a self-authenticating service that logs in, maintains short-lived headers, retries once on 401, and exposes a clean enrichment API to the outer workflow.
Auth · Undocumented API
Searches by email, full name, and CRD are different matching problems. I used token overlap, domain-vs-firm comparison, normalized-name scoring, positional bonuses, and profile verification instead of a single fuzzy matcher.
Matching · Precision
The hierarchy model matters: wealth team if present, then smallest client-facing entity with a CRD, then brand plus legal RIA/BD, with parent firm only when structurally distinct.
Modeling · RIA / BD
The prompt is a specification, not a vibe. It suppresses data-source disclosure, forbids invented role enhancements, defines ownership phrasing, and requires strict JSON across all 22 fields.
Prompt spec · Structured output
Reliability work made the service operable: periodic health checks, force-refresh endpoints, headless-browser fallback auth, in-band endpoint errors, and per-key Slack alert throttling.
Ops · Self-healing
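
Per-key alert throttling, from the reliability note above, is simple to sketch: remember when each alert key last fired and suppress repeats inside a window. The class name, the key strings, and the 300-second window below are hypothetical, and the actual Slack call is left out so the example stays self-contained.

```python
import time
from typing import Callable, Dict


class AlertThrottle:
    """Allow at most one alert per key within each min_interval_s window."""

    def __init__(self, min_interval_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self._min_interval_s = min_interval_s
        self._clock = clock               # injectable so tests can control time
        self._last_sent: Dict[str, float] = {}

    def should_send(self, key: str) -> bool:
        now = self._clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self._min_interval_s:
            return False                  # same key fired too recently
        self._last_sent[key] = now
        return True


# Illustrative use with a hand-controlled clock.
fake_now = [0.0]
throttle = AlertThrottle(300, clock=lambda: fake_now[0])
```

Throttling per key rather than globally means a noisy failure on one endpoint cannot hide the first alert from a different failure.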

§ 06

Output contract

Representative prospect profile

type AdvisorProfile = {
  bio: string
  yearsOfIndustryExperience: number
  wealthTeam: {
    name: string | null
    website: string | null
    advisorCount: number | null
    address: string | null
    aum: number | null
    clientCount: number | null
  }
  currentFirm: FirmProfile
  parentFirm: FirmProfile | null
  currentFirmCrd: string | null
  persona:
    | "Experienced Solo Advisor"
    | "Growing Solo Advisor"
    | "Growing Team Investigator"
    | "Large Team Investigator"
}
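
A contract like this is only useful if it is enforced when the LLM response comes back. The sketch below checks the top-level keys and the persona enum from the type above using only the standard library. It is a stand-in for illustration, not the service's validator: `validate_profile` is a hypothetical name and nested fields are deliberately left unchecked.

```python
import json

# Allowed persona values, copied from the AdvisorProfile type above.
PERSONAS = {
    "Experienced Solo Advisor",
    "Growing Solo Advisor",
    "Growing Team Investigator",
    "Large Team Investigator",
}

# Top-level keys of AdvisorProfile; nested shapes are not validated here.
REQUIRED_KEYS = {
    "bio",
    "yearsOfIndustryExperience",
    "wealthTeam",
    "currentFirm",
    "parentFirm",
    "currentFirmCrd",
    "persona",
}


def validate_profile(raw: str) -> dict:
    """Parse the LLM's JSON text and reject anything off-contract."""
    profile = json.loads(raw)  # raises json.JSONDecodeError (a ValueError)
    if not isinstance(profile, dict):
        raise ValueError("profile must be a JSON object")
    missing = REQUIRED_KEYS - profile.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if profile["persona"] not in PERSONAS:
        raise ValueError(f"unknown persona: {profile['persona']!r}")
    if not isinstance(profile["bio"], str) or not profile["bio"].strip():
        raise ValueError("bio must be a non-empty string")
    return profile
```

Failing loudly on an unknown persona or a missing key lets the caller retry the synthesis call instead of passing a malformed profile into the GTM workflow.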

§ 07

Outcome and next.

Operational result

Meeting prep that reps stopped double-checking.

The pipeline produces a better, more consistent brief in roughly 30-60 seconds at less than ten cents per prospect in the common path. Equivalent manual research takes 15-30 minutes per prospect and still varies with the person doing it.

The next improvements are clear: true concurrency for the 14-endpoint fan-out, response caching keyed by CRD, Pydantic validation around the LLM output, and cohort-level evaluation that ties persona and bio variants to sales outcomes.

Have a system like this to ship?

For focused AI systems architecture or production workflow work, write directly. I take on limited, non-conflicting fractional builds.

hello@chrisgarre.com

Direct: hello@chrisgarre.com

Cadence: 2 business days

Engagements: Fractional · 6-12 wk

Capacity: Selective · 1-2 d/wk