Chicago, IL · AI Automation & GTM Engineering

Most AI projects die in production. Mine don't.

More than 80% of AI projects never reach production. (RAND, 2024)

150+ systems in production, still running. The demo is the easy part. What kills most systems is everything after: real data, real volume, the edge cases a demo never sees. I build for that part: the dedup, the guardrails, the failure handling nobody films, the work that decides whether something survives week two.

Lahjat: live and playable ↗ · context-kit: clone and run ↗ · ~60s: a sentence to a live, deduped campaign

The premise

The model stopped being the hard part a while ago. The hard part is figuring out what one person does all day, then building something that does it for them and keeps doing it long after launch. I came to this through linguistics: the study of how meaning gets built and how it quietly falls apart. That's the same problem as making an AI system behave. Precise language in, predictable behavior out. Which is why the scarce skill isn't writing code. Plenty of people write code. The scarce skill is taste: knowing what to build, what to keep boring and deterministic, and what to leave alone. Most of my best decisions were things I talked someone out of.

What I work on

Three currents, often running at once

AI Automation

Workflows that triage, enrich, decide, and act on real data. I build deterministic systems where reliability is the point, and agentic ones where genuine reasoning earns its cost. Choosing correctly between the two is most of the work.

n8n · LangGraph · Python · Claude / OpenRouter · RAG · Supabase

Marketing & RevOps

I close the distance between marketing activity and revenue. I wire the GTM stack end to end and report on what moves the number, not just what's easy to count. Direct-response instincts in the Halbert/Kennedy tradition, applied to systems instead of single letters.

Salesforce · ActiveCampaign · Apollo · Instantly · Direct response

Language & Structure

The quality of what comes out of an AI system is bounded by the precision of what goes in. That's a linguistics problem before it's an engineering one — and it's where most implementations quietly bleed. (Trained linguist. Phonetics specialization. Fluent in German, C1 Arabic in progress.)

Linguistics · Phonetics / IPA · Prompt architecture

العربية · Arabic → C1 · /ˈʕarabiː/

Deutsch · German, fluent · /dɔʏtʃ/

Español · Spanish, conversational · /espaˈɲol/

Patterns, not promises

What "production-grade" actually means

A demo runs once, on stage. A system runs every day, on real data, when no one's watching. The difference is in the details: dedup logic, hallucination checks, brand-voice enforcement, false-positive reduction.

Time to live

~60s

From a free-text customer description to a live, deduplicated outbound campaign

Pipeline depth

20+

Discrete nodes in a single pipeline — parse, enrich, generate, validate, dedup, launch

Volume shipped

150+

Systems shipped to production, not prototypes left in a sandbox

The stack I build with

How I think about it

Principles I build by

→ Specificity beats generality

General-purpose AI rollouts look impressive and quietly fail. Systems built around one real workflow get used. I'd rather solve one person's task completely than gesture at solving everyone's.

→ The boring parts are the product

Deduplication, validation, error handling, voice enforcement. Nobody demos these. They're the entire reason a system survives contact with reality.

→ Deterministic until proven otherwise

Agentic architecture is a tool, not a default. Most problems want a reliable pipeline. Knowing when not to reach for autonomy is a senior signal.

→ Language is the interface

The quality of what comes out of an AI system is bounded by the precision of what goes in. That's a linguistics problem before it's an engineering one.

Selected systems

A few things I've built, described by what they decide

Anonymized on purpose. No clients, no revenue figures. The logo behind a system is the least interesting thing about it. What's worth your time is the architecture and the decisions inside it.

Account intelligence → sales floor

One Slack command returns an account brief that already knows what you sell them

Slack slash command (async webhook)
n8n parallel fan-out / merge
Version-controlled context, fetched at runtime
Open-web research + CRM state, fused
LLM synthesis, Claude via OpenRouter

A rep types a company name into the channel they already live in and gets back a structured brief seconds later: ICP fit, likely buyers, application match, recommended products, and the field most research tools skip entirely: the relationship. Net-new or existing account? What do they already own? Is there open pipeline a teammate is quietly working? The command fans out in parallel, reads the company's public footprint from a live research API, pulls demand-side truth from the CRM, composes both against a version-controlled context layer, and posts the answer back to the same thread.

Most "research bots" summarize a website. The work here was the join: supply-side signal from the open web against demand-side reality from the CRM, so the brief can tell a rep they're about to walk into a deal a colleague already owns. The context files are read from a versioned repo at request time, never pasted into a prompt, so one edit propagates to every brief the team will ever run. And customer data is walled off from model training by default. That last constraint is the one nobody asks for. It's also the reason a system like this is allowed near a CRM at all.
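A minimal sketch of the fan-out and merge described above, assuming Python's asyncio in place of the n8n nodes. The three fetch functions, their return fields, and the context path are hypothetical stand-ins, not the real research API, CRM, or repo layout.

```python
import asyncio


async def fetch_web_research(company: str) -> dict:
    # Hypothetical stand-in for the live research API call.
    await asyncio.sleep(0)
    return {"industry": "industrial automation", "employees": 420}


async def fetch_crm_state(company: str) -> dict:
    # Hypothetical stand-in for the CRM lookup.
    await asyncio.sleep(0)
    return {
        "is_customer": True,
        "owned_products": ["Widget A"],
        "open_opportunity_owner": "j.doe",
    }


async def fetch_context(path: str) -> str:
    # Hypothetical stand-in for reading versioned context at request time.
    await asyncio.sleep(0)
    return "ICP: mid-market manufacturers"


async def build_brief_inputs(company: str) -> dict:
    # Fan out: the three lookups run concurrently.
    # gather() returns results in the order the calls were passed in.
    web, crm, context = await asyncio.gather(
        fetch_web_research(company),
        fetch_crm_state(company),
        fetch_context("context/core.md"),
    )
    # Merge: the relationship field comes from CRM state, not from the web.
    return {
        "company": company,
        "relationship": "existing" if crm["is_customer"] else "net-new",
        "owned_products": crm["owned_products"],
        "open_opportunity_owner": crm["open_opportunity_owner"],
        "web": web,
        "context": context,
    }


inputs = asyncio.run(build_brief_inputs("Acme GmbH"))
```

The merged dictionary is what a synthesis step would receive. The point of the join is that `relationship` and `open_opportunity_owner` are settled by code before any model sees the request.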

Free-text → live campaign engine

A sentence becomes a deduplicated outbound sequence in about a minute

Multi-stage n8n pipeline
Web frontend
LLM for language
Enrichment + CRM dedup

You type a plain-English description of who you want to reach. The system parses it into structured targeting, pulls and enriches matching prospects, writes a multi-step sequence in a defined brand voice, checks its own output for hallucinations, dedupes against the CRM, and pushes it live. Sentence in, campaign out.

What I'm proud of isn't the speed. It's that it's not an agent. The LLM does the language; the orchestration does the thinking. Autonomy would have demoed better and broken more. Choosing the reliable architecture over the impressive one is the whole job.
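One way to picture "the LLM does the language; the orchestration does the thinking" is a pipeline where deduplication is plain code and the model call is a single, replaceable step. This is a sketch under assumptions: the `Prospect` type, the email-based dedup key, and the `write_sequence` placeholder are illustrative, not the production pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prospect:
    name: str
    email: str


def normalize_email(email: str) -> str:
    # Lowercase and trim so "Ana@Example.com " and "ana@example.com" compare equal.
    return email.strip().lower()


def dedupe_against_crm(prospects, crm_emails):
    # Drop anyone already in the CRM, plus repeats inside the batch itself.
    seen = {normalize_email(e) for e in crm_emails}
    fresh = []
    for p in prospects:
        key = normalize_email(p.email)
        if key not in seen:
            seen.add(key)
            fresh.append(Prospect(p.name, key))
    return fresh


def write_sequence(prospect, voice):
    # Placeholder for the one LLM step; everything around it is deterministic.
    return f"[{voice}] Hi {prospect.name}, ..."


def run_pipeline(prospects, crm_emails, voice="plainspoken"):
    fresh = dedupe_against_crm(prospects, crm_emails)
    return [(p.email, write_sequence(p, voice)) for p in fresh]


batch = [
    Prospect("Ana", "Ana@Example.com "),
    Prospect("Ben", "ben@example.com"),
    Prospect("Ana", "ana@example.com"),
]
campaign = run_pipeline(batch, crm_emails=["BEN@example.com"])
```

Ben is already in the CRM and the second Ana is a repeat, so only one message is produced. Nothing about that outcome depends on what the model writes.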

Context infrastructure

A version-controlled context layer injected into every AI call a company makes

GitHub repo, structured markdown
Composable core / catalog / department layers
Slack-native slash command surface
Reviewed like code, because it is

Prompt libraries collect answers; this maintains truth. One repo of structured markdown — ICP, personas, voice, the claims the company never makes — composed in layers and injected as context into every AI workflow in the org. Anyone can pull it from Slack with a slash command. Nobody can quietly fork it.

Hallucination is a context problem before it's a prompt problem. A model fills gaps with plausible fiction when nobody hands it the source of truth. Treating context like code — versioned, composable, reviewed — turned drafts that needed three rounds of correction into drafts that come back clean.

The production system stays behind the curtain. So I rebuilt the architecture in the open — a runnable reference with an assembly engine, CI budget enforcement, and a fictional company in place of any client. Clone it and run it.

Inspect the reference build
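A small sketch of layered composition with a budget check, loosely following the core / catalog / department split named above. The layer contents, the fixed ordering, and the four-characters-per-token estimate are assumptions made for illustration. The reference build's actual assembly engine and budget rules may differ.

```python
LAYER_ORDER = ["core", "catalog", "department"]


def compose_context(layers: dict, wanted: list) -> str:
    # Join the requested layers in a fixed order so the output is reproducible.
    parts = [layers[name].strip() for name in LAYER_ORDER if name in wanted]
    return "\n\n".join(parts)


def estimate_tokens(text: str) -> int:
    # Crude heuristic (about 4 characters per token); not a real tokenizer.
    return (len(text) + 3) // 4


def check_budget(text: str, max_tokens: int) -> None:
    # Meant for CI: fail loudly when the composed context outgrows its budget.
    used = estimate_tokens(text)
    if used > max_tokens:
        raise ValueError(f"context uses ~{used} tokens, budget is {max_tokens}")


layers = {
    "core": "# ICP\nMid-market manufacturers",
    "catalog": "# Products\nWidget A",
    "department": "# Sales\nNever promise delivery dates",
}
sales_context = compose_context(layers, ["department", "core"])
check_budget(sales_context, max_tokens=100)
```

Because the order comes from `LAYER_ORDER` rather than from the caller, two workflows asking for the same layers always get byte-identical context.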

Sales-signal triage

An LLM-augmented pipeline that surfaces upsell signals buried in correspondence

n8n + LLM classification
CRM-integrated
Tuned for false-positive reduction

Reads inbound correspondence and flags genuine expansion signals before anyone has to go looking for them. The hard part isn't detection, it's restraint. A classifier that fires on everything is noise wearing a useful costume, and a team learns to ignore it within a week. So the whole system is tuned around what it chooses not to surface.

The temptation was to let a model "read everything and decide." I built it as classification with tight thresholds instead, on the premise that one flag a human trusts is worth more than ten the model finds interesting. Precision is the product; the cleverness is invisible on purpose.
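The trade-off can be shown with a few lines: raise the threshold, surface fewer flags, and watch precision climb. The scores, message ids, and the 0.9 cutoff below are invented for the example, and the classifier itself is assumed to exist upstream.

```python
def surface_flags(scored, threshold=0.9):
    # Keep only messages whose classifier score clears a deliberately high bar.
    return [msg_id for msg_id, score in scored if score >= threshold]


def precision(flagged, true_signals):
    # Share of surfaced flags that were genuine expansion signals.
    if not flagged:
        return 1.0  # nothing surfaced, so nothing to distrust
    hits = sum(1 for msg_id in flagged if msg_id in true_signals)
    return hits / len(flagged)


# (message id, classifier score) pairs; "b" is the only false lead.
scored = [("a", 0.95), ("b", 0.60), ("c", 0.92), ("d", 0.70)]
true_signals = {"a", "c", "d"}

strict = surface_flags(scored, threshold=0.9)
loose = surface_flags(scored, threshold=0.5)
```

The strict setting misses "d", a real signal, and that is the accepted cost: every flag it does raise is genuine, while the loose setting surfaces the false lead "b" alongside the real ones.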

Retrieval & content

RAG assistant + AEO/GEO content pipeline

Retrieval-augmented generation
Web frontend
Answer-engine-optimized output

Two halves of the same problem: getting grounded answers out of a body of source material, and structuring published material so answer engines surface and cite it correctly. One faces inward, one faces the open web, but both come down to the same discipline, making sure the words map cleanly to something real. The public-facing half turned answer engines into a measurable, compounding referral channel, year over year.

Retrieval is where most RAG quietly fails: the model sounds confident over the wrong chunk. I spent the effort on grounding and citation rather than the generation layer, because a fluent answer to the wrong question is the most expensive kind of wrong. Disciplined retrieval beats clever generation every time.
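A toy illustration of retrieval that abstains instead of answering over a weak match, and that always returns the chunk id so the answer can be cited. The word-overlap scorer and the 0.5 cutoff are simplifying assumptions; a real system would more likely use embeddings or a search index.

```python
def tokenize(text):
    return set(text.lower().split())


def retrieve(question, chunks, min_overlap=0.5):
    # Score each chunk by the share of question words it contains (toy scorer).
    q = tokenize(question)
    if not q:
        return None
    best_id, best_score = None, 0.0
    for chunk_id, text in chunks.items():
        score = len(q & tokenize(text)) / len(q)
        if score > best_score:
            best_id, best_score = chunk_id, score
    # Abstain rather than let the generator sound confident over the wrong chunk.
    if best_id is None or best_score < min_overlap:
        return None
    return {"chunk_id": best_id, "text": chunks[best_id], "score": best_score}


chunks = {
    "doc1#2": "the warranty period is two years",
    "doc2#1": "shipping takes five days",
}
grounded = retrieve("warranty period", chunks)
ungrounded = retrieve("refund policy", chunks)
```

A `None` result is the signal to say "not found in the source material" rather than to generate anyway.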

Inbound qualification → sales routing

A high-intent signal gets checked against CRM history before a rep is ever pinged

Tag-based signal capture
n8n orchestration, multi-stage
LLM qualification gate, structured output
Enrichment + research, gated on verification

Every tagged high-value signal looks the same from the outside: someone showed interest. Most of it is noise: an existing customer re-engaging, a deal already being worked, the same contact tagged twice. The pipeline reads each lead against live CRM history first, checking for an open opportunity, a recent purchase, or a real conversation about that specific product, before anything reaches a human. Only the genuinely net-new ones get enriched and routed.

The expensive failure mode here isn't a missed lead, it's a false alarm: a rep gets pinged about an account a colleague already owns, and stops trusting the system within a week. So qualification fails closed, surfacing only on real evidence of net-new interest, while enrichment fails open: a lookup hiccup never blocks a genuine lead. Research only runs once a contact is independently verified; an unmatched name skips straight to notification instead of feeding a guess to a model that would happily invent one.
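The fail-closed and fail-open split can be sketched as two small wrappers with opposite error handling. The lookup functions, field names, and lead shape are hypothetical; only the asymmetry is the point.

```python
def qualifies(lead, crm_lookup):
    # Fail closed: a CRM error or any sign of an existing relationship means no ping.
    try:
        history = crm_lookup(lead["email"])
    except Exception:
        return False
    return not (history["open_opportunity"] or history["recent_purchase"])


def enrich(lead, enrich_lookup):
    # Fail open: an enrichment error leaves the lead as-is but still routable.
    try:
        extra = enrich_lookup(lead["email"])
    except Exception:
        extra = {}
    return {**lead, **extra}


def route(lead, crm_lookup, enrich_lookup):
    if not qualifies(lead, crm_lookup):
        return None
    return enrich(lead, enrich_lookup)


# Hypothetical stubs standing in for real integrations.
def crm_clean(email):
    return {"open_opportunity": False, "recent_purchase": False}


def crm_down(email):
    raise TimeoutError("CRM unavailable")


def enrich_down(email):
    raise TimeoutError("enrichment unavailable")


lead = {"email": "dana@example.com"}
blocked = route(lead, crm_down, enrich_down)   # CRM state unknown, so stay silent
routed = route(lead, crm_clean, enrich_down)   # enrichment failed, lead still goes out
```

The same kind of outage produces opposite outcomes depending on where it happens, which is the design choice the paragraph above describes.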

Work with me

Working together

I take a small number of engagements. The work is project-based: a fixed scope, a shipped system, and full documentation so you own what we build. If you're running a B2B sales or marketing motion and the top of your funnel is manual, expensive, or both — that's the problem I solve.

Start with the pipeline diagnostic

A build with my name on it

Everything above is anonymized. This one you can play.

Client work stays behind the curtain. Lahjat doesn't. It's a side project I designed and shipped end to end, and it's the clearest thing I can point to when I say the linguistics training is the edge, not the trivia.

A GeoGuessr for Arabic speech. Hear a clip, drop a pin, find out where the voice is from.

The interesting decision is in the scoring. Arabic dialects don't follow national borders, so the data model doesn't either. Cities sit in roughly 37 linguistically motivated clusters, so guessing Aleppo for a Mosul clip scores better than guessing Baghdad, even though Baghdad is closer on the map, because Mosul and Aleppo share qeltu features Baghdad lacks. Geography is the obvious model. It's also the wrong one.
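A sketch of what cluster-based scoring could look like for the Mosul, Aleppo, and Baghdad example. The cluster labels, the affinity percentages, and the 1000-point scale are all invented for illustration; the real game's clusters and weights are not given here.

```python
MAX_POINTS = 1000

# Hypothetical cluster assignments (three of the roughly 37 clusters).
CITY_CLUSTER = {
    "Mosul": "cluster_a",
    "Aleppo": "cluster_b",
    "Baghdad": "cluster_c",
}

# Hypothetical linguistic affinity between clusters, as a percentage.
# Keys are unordered pairs, so the lookup is symmetric.
CLUSTER_AFFINITY = {
    frozenset({"cluster_a", "cluster_b"}): 60,
    frozenset({"cluster_a", "cluster_c"}): 30,
    frozenset({"cluster_b", "cluster_c"}): 30,
}


def score_guess(guess_city: str, answer_city: str) -> int:
    guess = CITY_CLUSTER[guess_city]
    answer = CITY_CLUSTER[answer_city]
    if guess == answer:
        return MAX_POINTS
    # Score by how much the two clusters share, not by distance on the map.
    affinity = CLUSTER_AFFINITY.get(frozenset({guess, answer}), 0)
    return MAX_POINTS * affinity // 100
```

Under these made-up weights the Mosul and Aleppo pairing scores 600 in either direction, while any pairing with Baghdad scores 300. Map distance never enters the calculation.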

Every guess feeds an accuracy-weighted, crowd-tagged corpus of dialect audio as a byproduct. The thing teaches while it collects.

Next.js · Supabase · Mapbox · Vercel

Play Lahjat ↗

lahjat.app — live, playable, and mine.

The newsletter

Signal, Not Noise

One real system per issue: what it does, the architecture decision behind it, and what broke before it worked. If you build automation for a living — or want to — this is the tuition I already paid, free. No hype, no headline roundups.
