Rishabh Pandey
I'm a production engineer at Meta, working on the reliability of large distributed systems — observability, capacity, and incident response on a database tier powering an ads platform. Before that: event pipelines and cloud infrastructure at Geico, and internships at Geico and Texas Instruments.
These days the problems that pull me in sit at the intersection of backend systems and applied AI — I've shipped LLM pipelines at Meta and agentic projects on the side, and I'm looking to go deeper on both.
Meta — Production Engineer
Reliability of a distributed database tier powering an ads platform.
Owned the investigation that root-caused recurring database-throttling SEVs (~5–6 per half, ~30–40 unactioned alerts) to cross-team callers reusing shared functions and exhausting the owning team's QPS budget.
Instrumented per-callsite QPS observability via 70%/90% threshold detectors and method-signature attribution; designed and drove a callsite-forking remediation that segments ownership and isolates traffic per caller.
Built an applied ML pipeline on Llama 3.3 — structured summarization, embeddings, cosine-similarity clustering — grouping 8,000+ SEV follow-up tasks into actionable clusters for batch resolution.
Executed a subdomain migration to a dedicated tenant for a platform serving thousands of DAU; cut p95 latency 57% and shrank the fault-isolation domain.
Geico — Software Engineer
Serverless event pipelines and contact-center observability on AWS.
Authored Python Lambda services that normalize contact events, batch-write to DynamoDB, and fan out via EventBridge with DLQs and exponential backoff; they handle 500–700K events/day, with ~90% test coverage, on least-privilege Terraform modules.
Rolled out Amazon Connect queue observability across 800+ queues with SNS + Slack alerting; cut batch metrics latency 12× (120 s → 10 s) via request coalescing and concurrent SDK fan-out.
Software Engineering Intern
Deployed JupyterHub on Azure Kubernetes Service with Azure AD SSO/RBAC and Spark/ADLS connectivity; the lake-first analytics workflow contributed to a ~25% reduction in team Snowflake spend.
Texas Instruments — SWE Intern
Internal tooling for licensing operations.
Engineered an Oracle APEX + PL/SQL internal tool that replaced a 4–6 hour manual lookup with a ~5-second self-serve flow, used daily by licensing ops across 250+ products.
Local Paste Service
A zero-dependency pastebin that lives entirely on your machine: a single 20 MB Go binary backed by SQLite in WAL mode for concurrent local storage. CLI-first workflow, automatic TTL expiration via a background garbage collector.
LinkedIn Interview Agent
An agentic RAG interview platform: ingests a job description and resume, maps skills semantically with FAISS, and generates adaptive questions conditioned on the candidate's gaps. Instrumented end to end (p50/p95 latency and $/session tracking), with question generation at ~2 s p95.
Full detail lives in the individual files: experience/ and projects/ in the tree.
Email is fastest. Happy to talk about production engineering, reliability, infra tooling — or anything you've found on this site.