CRD cascade
FINRA and SEC first, then site-restricted search, then LLM web lookup only when cheaper deterministic tiers miss.
I built an end-to-end pipeline that turns a prospect's name and email into a 22-field structured profile, factual bio, and sales-persona tag synthesized from six independent sources. The system is deployed in production and called from the client's GTM workflow on every new prospect.
The sales team needed sharper prospect context than any off-the-shelf enrichment platform could give them. Their market has deeply layered account structure: parent firm, legal RIA or broker-dealer, brand, branch, wealth team, then the individual advisor or registered representative.
Getting that hierarchy wrong changes the sales conversation. A rep might sell under a small wealth-team brand while the regulator record points to a giant parent firm. Generic enrichment tools flatten that into one company field, which creates bad prep, wrong assumptions, and weak outreach.
The right answer lived across specialized databases, regulatory filings, LinkedIn, public registry pages, and client workflow data. I built the layer that resolved, reconciled, and compressed those signals into a profile reps could trust before a meeting.
FINRA and SEC first, then site-restricted search, then LLM web lookup only when cheaper deterministic tiers miss.
A self-authenticating Python service wraps undocumented internal endpoints behind a clean API protected by X-API-Key.
Each input path uses a different scoring and verification strategy so common advisor names do not silently resolve to the wrong person.
Profile, education, work history, current firms, registered locations, AUM statistics, branch locations, news, accolades, and tags.
LinkedIn, regulatory records, registry summaries, and platform data are reconciled across team, branch, brand, RIA/BD, and parent firm.
One structured call emits a factual 2-4 sentence bio, firm/team fields, parent-firm fields, and a sales-persona tag.
CRD discovery is a safety problem, not a convenience feature. The cascade prefers regulator sources, writes provenance to separate columns, and only uses LLM lookup after cheaper deterministic paths miss.
The richest platform had no public API. I built a self-authenticating service that logs in, maintains short-lived headers, retries once on 401, and exposes a clean enrichment API to the outer workflow.
Search by email, full name, and CRD are different matching problems. I used token overlap, domain-vs-firm comparison, normalized-name scoring, positional bonuses, and profile verification instead of a single fuzzy matcher.
The hierarchy model matters: wealth team if present, then smallest client-facing entity with a CRD, then brand plus legal RIA/BD, with parent firm only when structurally distinct.
The prompt is a specification, not a vibe. It suppresses data-source disclosure, forbids invented role enhancements, defines ownership phrasing, and requires strict JSON across all 22 fields.
Reliability work made the service operable: periodic health checks, force-refresh endpoints, headless-browser fallback auth, in-band endpoint errors, and per-key Slack alert throttling.
type AdvisorProfile = {
bio: string
yearsOfIndustryExperience: number
wealthTeam: {
name: string | null
website: string | null
advisorCount: number | null
address: string | null
aum: number | null
clientCount: number | null
}
currentFirm: FirmProfile
parentFirm: FirmProfile | null
currentFirmCrd: string | null
persona:
| "Experienced Solo Advisor"
| "Growing Solo Advisor"
| "Growing Team Investigator"
| "Large Team Investigator"
}The pipeline produces a better, more consistent brief in roughly 30-60 seconds at less than ten cents per prospect in the common path. Manual equivalent research takes 15-30 minutes per prospect and still varies by the person doing it.
The next improvements are clear: true concurrency for the 14-endpoint fan-out, response caching keyed by CRD, Pydantic validation around the LLM output, and cohort-level evaluation that ties persona and bio variants to sales outcomes.
For focused AI systems architecture or production workflow work, write directly. I take on limited, non-conflicting fractional builds.