Available · Software Developer

Sushil Dalavi.

Scalable Backend Systems for AI, Data & Reliability

Software engineer building backend platforms, data systems, and AI tooling that hold up under real traffic.

LA · USC '26
About

Software engineer focused on backend, data, and AI systems.

APIs, data pipelines, async workers, validation layers, and observability.

I like building systems that stay calm when traffic gets messy.

My work usually sits where backend and AI meet: API orchestration, data pipelines, async workers, validation layers, and observability that actually helps during incidents.

At USC Annenberg, I helped turn a noisy multilingual ingestion flow into a reliable pipeline: 1M+ validated records across 3 country datasets, plus an 85% drop in manual validation through schema-checked FastAPI routing and model fallback logic.

Before that, at Reliance Jio, I focused on backend speed and stability: p99 latency dropped from 2.8s to 480ms at 1,500+ RPS, and a failure-prone document pipeline was rebuilt as a Kafka-backed chunked streaming flow to S3, which removed its OOM crashes.

Focus

Backend Engineering · FastAPI · Spring Boot
Distributed Systems · Async Workflows · Replay
Data Infrastructure · AWS · Streaming
API Contracts · Reliability · Observability
AI Infrastructure · Inference Platforms
Performance Engineering · Load & Latency
Education
University of Southern California

Master of Science in Computer Science

Los Angeles, CA
Aug 2024 to May 2026
University of Mumbai

Bachelor of Engineering in Computer Engineering

Mumbai, India
Aug 2019 to May 2023

Relevant Coursework

Analysis of Algorithms · Database Systems · Information Retrieval · Applied Cryptography
The short version

Current work, summarized.

sushil@sushildalavi.dev
$ cat ~/work/manifest.yaml
role: Software Engineer · AI Infrastructure · USC Annenberg NLC
current_work:
- multi_modal_alignment: F1=0.993 · cov=0.999
- aws_data_platform: records=1M+ · countries=3
- distributed_workflows: tools=12 · retries=auto
- hybrid_retrieval: MRR+21.8% · nDCG+18.0%
stack:
- python · fastapi · pydantic · SQLAlchemy
- aws (s3, glue) · graphql · playwright
- postgresql · redis · kafka · prometheus
availability: open_to_sde_swe_ai_ml_roles
 
$ _
Selected Work

Selected engineering work

Backend, observability, workflow, and AI infrastructure projects built end-to-end, with clear reliability, latency, and throughput outcomes.

01 · Observability · Databases

QueryLens

PostgreSQL Query Performance Monitor

Production-style observability pipeline for PostgreSQL query behavior with C++ telemetry collection, Kafka/Redpanda transport, FastAPI ingestion, deterministic regression detection, and dashboard triage workflows.

  • Captured PostgreSQL query telemetry with a low-overhead C++ collector and normalized SQL fingerprinting.
  • Hardened ingestion with idempotent event IDs, retry/backoff, and DLQ persistence.
  • Detected query regressions deterministically across latency spikes, scan fallbacks, temp spills, and vector-search issues.
  • Streamed telemetry through Kafka/Redpanda into FastAPI and PostgreSQL with Prometheus/Grafana operational visibility.
C++ · Python · FastAPI · PostgreSQL · pg_stat_statements · Kafka/Redpanda · Prometheus · Grafana · +1 more
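
To make "normalized SQL fingerprinting" concrete, here is a minimal Python sketch of the idea. It is an illustration only, not QueryLens code (the project's collector is C++). The function names `normalize_sql` and `fingerprint`, the regex rules, and the 16-character digest length are all assumptions made for this example.

```python
import hashlib
import re


def normalize_sql(sql: str) -> str:
    """Reduce a statement to its shape so equivalent queries compare equal."""
    text = sql.strip().rstrip(";")
    text = re.sub(r"'(?:[^']|'')*'", "?", text)       # string literals -> ?
    text = re.sub(r"\b\d+(?:\.\d+)?\b", "?", text)    # numeric literals -> ?
    # Collapse IN-lists of any length: (?, ?, ?) -> (?)
    text = re.sub(r"\(\s*\?(?:\s*,\s*\?)+\s*\)", "(?)", text)
    text = re.sub(r"\s+", " ", text)                  # collapse whitespace
    # Simplification: lowercasing also folds quoted identifiers.
    return text.lower()


def fingerprint(sql: str) -> str:
    """Short, stable identifier for a normalized statement."""
    digest = hashlib.sha256(normalize_sql(sql).encode("utf-8"))
    return digest.hexdigest()[:16]


if __name__ == "__main__":
    a = fingerprint("SELECT * FROM users WHERE id = 42")
    b = fingerprint("select *  from users where id = 7;")
    print(a == b)  # both statements share one fingerprint
```

Grouping by fingerprint rather than raw text is what lets latency for "the same query" be tracked over time even though every call carries different literals.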

Shipped Metrics

Event Volume: 100K events
Throughput: 8,938 events/sec
Reliability: 0 DLQ/persistence failures

02 · Distributed Systems · Workflows

ReplayForge

Async Workflow Replay & Failure Debugging Platform

Crash-tolerant workflow replay platform with a FastAPI control plane, Go worker execution path, Redis Streams transport, PostgreSQL persistence, and replay-safe recovery automation.

  • Built FastAPI operational APIs for workflow visibility, replay controls, and failure debugging.
  • Implemented idempotency-keyed completion paths with retry/backoff and durable PostgreSQL writes.
  • Recovered orphaned pending entries using XAUTOCLAIM and replay-safe Redis Streams reprocessing.
  • Integrated DLQ and replay tooling to isolate poison messages without stalling main flow throughput.
Go · Python · FastAPI · Redis Streams · PostgreSQL · SQLAlchemy · Docker Compose · React
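
The idempotency-keyed completion and retry/backoff ideas can be sketched in a few lines of Python. This is a simplified illustration, not ReplayForge code: `CompletionStore` is a hypothetical in-memory stand-in for a PostgreSQL table with a unique key column, and `retry_with_backoff` with its default delays is an assumption made for the example.

```python
import random
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class CompletionStore:
    """In-memory stand-in for a table with a unique idempotency key."""

    def __init__(self) -> None:
        self._completed: Dict[str, str] = {}

    def complete_once(self, idempotency_key: str, result: str) -> bool:
        """Record a completion; return False if the key was already recorded."""
        if idempotency_key in self._completed:
            return False  # replayed message: keep the first result
        self._completed[idempotency_key] = result
        return True

    def result_for(self, idempotency_key: str) -> str:
        return self._completed[idempotency_key]


def retry_with_backoff(
    task: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run task, retrying with exponential backoff plus random jitter."""
    for attempt in range(attempts):
        try:
            return task()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))
    raise ValueError("attempts must be at least 1")
```

Because a replayed message carries the same key, `complete_once` turns a second delivery into a no-op, which is what makes reprocessing after a crash safe.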

Shipped Metrics

Transport: Redis Streams
Recovery: XAUTOCLAIM
Safety: Replay-safe ingestion

03 · APIs · Schema Governance

SchemaPilot

API Contract Drift Monitor

Contract observability service that tracks API payload drift through FastAPI ingestion, deterministic schema fingerprints, endpoint-scoped uniqueness guards, and severity-aware classification.

  • Tracked live traffic through a FastAPI /track ingestion path for runtime contract visibility.
  • Generated deterministic schema fingerprints and endpoint-scoped uniqueness keys for stable drift detection.
  • Used PostgreSQL advisory locks to coordinate concurrent updates safely across endpoint baselines.
  • Classified drift as SAFE, RISKY, or BREAKING and routed alerts via retry/backoff and DLQ flows.
Python · FastAPI · PostgreSQL · SQLAlchemy · Prometheus · Grafana · Kafka · React · +1 more
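
A rough sketch of how a deterministic schema fingerprint and a SAFE/RISKY/BREAKING classification could work. The severity rules here (removed field is BREAKING, changed type is RISKY, additions are SAFE) are assumptions chosen for illustration, as are the function names; SchemaPilot's actual rules are not described on this page.

```python
import hashlib
import json
from typing import Any


def infer_schema(payload: Any) -> Any:
    """Replace every value with its type name, keeping the structure."""
    if isinstance(payload, dict):
        return {key: infer_schema(value) for key, value in payload.items()}
    if isinstance(payload, list):
        # Simplification: a list is described by its first element.
        return [infer_schema(payload[0])] if payload else []
    return type(payload).__name__


def schema_fingerprint(payload: Any) -> str:
    """Hash the canonical (key-sorted) schema so key order does not matter."""
    canonical = json.dumps(infer_schema(payload), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify_drift(baseline: dict, observed: dict) -> str:
    """Compare top-level fields of two payloads (assumed severity rules)."""
    base, seen = infer_schema(baseline), infer_schema(observed)
    if base.keys() - seen.keys():
        return "BREAKING"  # a field consumers may rely on disappeared
    if any(base[key] != seen[key] for key in base):
        return "RISKY"     # same field, different type or nested shape
    return "SAFE"          # identical, or only new fields were added
```

Sorting keys before hashing is what makes the fingerprint deterministic: two payloads with the same fields in a different order produce the same digest.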

Shipped Metrics

Load: 5,000 requests
Concurrency: 200
Failures: 0

04 · LLM Systems · Inference

nanoserve

OpenAI-Compatible LLM Serving Engine

From-scratch local inference server on Apple Silicon with continuous batching, prefix KV reuse, quantization paths, streaming API compatibility, and integrated observability for throughput and latency profiling.

  • Implements a continuous-batching scheduler and prefix-cache reuse for serving efficiency.
  • Supports OpenAI-style chat completions with streaming and non-streaming paths.
  • Includes INT8/INT4 quantization experiments and reproducible ablation benchmarking.
  • Ships with Prometheus metrics and a provisioned Grafana dashboard.
Python · FastAPI · PyTorch · Transformers · TorchAO · Uvicorn · Prometheus · Grafana · +1 more
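
For readers unfamiliar with continuous batching, the toy scheduler below shows the core idea: requests join and leave the running batch at every decode step instead of waiting for a whole batch to finish. It is a sketch only, with no model, KV cache, or prefix reuse; `Request`, `ContinuousBatcher`, and `max_batch_size` are hypothetical names, not nanoserve's API.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Request:
    request_id: str
    tokens_remaining: int  # how many tokens this request still needs
    generated: int = 0


class ContinuousBatcher:
    """Toy scheduler: admits waiting requests whenever a slot frees up."""

    def __init__(self, max_batch_size: int) -> None:
        self.max_batch_size = max_batch_size
        self.waiting: Deque[Request] = deque()
        self.running: List[Request] = []

    def submit(self, request: Request) -> None:
        self.waiting.append(request)

    def step(self) -> List[str]:
        """Run one decode iteration; return ids of requests that finished."""
        # Fill free slots before decoding, in arrival order.
        while self.waiting and len(self.running) < self.max_batch_size:
            self.running.append(self.waiting.popleft())

        finished: List[str] = []
        still_running: List[Request] = []
        for request in self.running:
            request.generated += 1          # stand-in for decoding one token
            request.tokens_remaining -= 1
            if request.tokens_remaining <= 0:
                finished.append(request.request_id)
            else:
                still_running.append(request)
        self.running = still_running
        return finished
```

With a batch size of 2 and requests needing 1, 3, and 2 tokens, the third request is admitted on the second step, as soon as the first one completes, rather than waiting for the 3-token request to drain.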

Shipped Metrics

API: OpenAI-Compatible
Batching: Continuous
Quantization: INT8/INT4

05 · RAG · Research

sourcery

Scholarly RAG with Citation Grounding

Scholarly retrieval-augmented generation app combining uploaded PDFs and live academic sources, with hybrid retrieval, evidence-linked generation, and calibrated claim-support confidence scoring.

  • Hybrid dense+sparse retrieval over local documents and scholarly APIs
  • Evidence-constrained generation that cites supporting source chunks
  • Confidence calibration over claim-evidence pairs for support scoring
  • Offline-first runtime path with local model fallback support
Python · FastAPI · PostgreSQL · pgvector · React · Vite · Ollama · OpenAI API
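
One common way to combine a dense ranking and a sparse ranking is reciprocal rank fusion (RRF). The sketch below uses RRF purely as an example of hybrid fusion; this page does not say which fusion method sourcery uses, and the function name and the constant `k = 60` (a conventional default) are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) in every list where it appears;
    scores are summed, so documents ranked well by both retrievers rise.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, breaking ties by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    dense = ["a", "b", "c"]    # e.g. vector-similarity order
    sparse = ["b", "d", "a"]   # e.g. keyword-match order
    print(reciprocal_rank_fusion([dense, sparse]))
```

RRF only needs ranks, not raw scores, which sidesteps the problem that cosine similarities and keyword scores live on different scales.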

Shipped Metrics

Tests: 198
Routes: 32 typed
Retrieval: Hybrid

06 · Healthcare AI · NLP

SOAPFlow

Clinical Transcript to SOAP Note Platform

Clinical documentation platform that converts transcripts into structured SOAP notes with multi-provider generation routing, PHI de-identification, evaluation tooling, vector search history, and audio transcription support.

  • Supports six generation backends including hosted and local providers
  • PII/PHI de-identification pipeline with Presidio and spaCy components
  • SOAP quality validation with structured warning and severity output
  • Vector-indexed note history and transcript-to-note workflow persistence
Python · FastAPI · React · TypeScript · Qdrant · Redis · SQLAlchemy · MLflow
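
As an illustration of validation with structured warning and severity output, the sketch below checks a note for the four SOAP sections. The rules (an empty section is an error, a very short one is a warning), the `MIN_WORDS` threshold, and the `NoteWarning` type are assumptions for this example, not SOAPFlow's actual checks.

```python
from dataclasses import dataclass
from typing import Dict, List

SECTIONS = ("subjective", "objective", "assessment", "plan")
MIN_WORDS = 3  # hypothetical threshold for a "too short" section


@dataclass(frozen=True)
class NoteWarning:
    section: str
    severity: str  # "error" or "warning"
    message: str


def validate_soap(note: Dict[str, str]) -> List[NoteWarning]:
    """Return one finding per problematic section, in SOAP order."""
    findings: List[NoteWarning] = []
    for section in SECTIONS:
        text = (note.get(section) or "").strip()
        if not text:
            findings.append(NoteWarning(section, "error", "section is missing or empty"))
        elif len(text.split()) < MIN_WORDS:
            findings.append(
                NoteWarning(section, "warning", f"section has fewer than {MIN_WORDS} words")
            )
    return findings
```

Returning typed findings instead of raising on the first problem lets a caller show every issue at once and decide per severity whether to block the note.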

Shipped Metrics

Backends: 6 modes
Format: SOAP
Search: Vector history

07 · Security · Healthcare

MedLedger

Secure Decentralized EHR Audit Logging

Privacy-first audit logging system for EHR access with encrypted records, hash-chained ledgers across independent nodes, and TLS-pinned inter-service communication for tamper detection and integrity verification.

  • Encrypted audit records with per-patient key derivation and authenticated metadata binding
  • Cross-node majority verification detects tampering and chain inconsistency
  • Mutual service trust via pinned cert bundles and strict TLS verification
  • Role-based access controls for patient, doctor, audit, and admin personas
Python · Flask · Cryptography · AES-GCM · HKDF · Ed25519 · HMAC-SHA256 · PBKDF2 · +1 more
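
The hash-chain and majority-verification ideas can be shown with the standard library alone. This sketch leaves out encryption, signatures, and TLS pinning entirely, and the record fields, `GENESIS` value, and function names are assumptions for illustration rather than MedLedger's actual format.

```python
import hashlib
import json
from collections import Counter
from typing import Dict, List, Optional

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(prev_hash: str, record: Dict[str, str]) -> str:
    """Hash a record together with the hash of the entry before it."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def build_chain(records: List[Dict[str, str]]) -> List[dict]:
    chain, prev = [], GENESIS
    for record in records:
        prev = entry_hash(prev, record)
        chain.append({"record": record, "hash": prev})
    return chain


def chain_is_valid(chain: List[dict]) -> bool:
    """Recompute every link; any edited record breaks the chain from there on."""
    prev = GENESIS
    for entry in chain:
        if entry_hash(prev, entry["record"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


def majority_head(chains: List[List[dict]]) -> Optional[str]:
    """Return the head hash held by a strict majority of nodes, if any."""
    if not chains:
        return None
    heads = Counter(chain[-1]["hash"] if chain else GENESIS for chain in chains)
    head, votes = heads.most_common(1)[0]
    return head if votes * 2 > len(chains) else None
```

A node whose head hash differs from the majority head has either been tampered with or has fallen behind, and `chain_is_valid` tells which entry first stopped matching.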

Shipped Metrics

Audit Nodes: 3
Transport: TLS pinned
Ledger: Hash-chained

Experience

Where I've built

Building backend, data, and AI infrastructure systems across research and telecom platforms.

Jun 2025 — May 2026
USC Annenberg Norman Lear Center

Software Engineer - AI Infrastructure

Los Angeles, CA

Built production-style AI data infrastructure, ingestion automation, and validation services for large-scale multilingual social and media datasets.

  • Delivered 1M+ validated multilingual records across 3 country datasets through an AWS pipeline (S3, Glue, Athena, Lambda) with schema-versioned outputs
  • Reduced manual validation workload by 85% by automating LLM-output checks via FastAPI orchestration, OpenAI/Gemini fallback routing, Pydantic schema validation, JSON repair, and retry logic
  • Engineered an authenticated GraphQL ingestion workflow with Playwright, session refresh, structured retries, and parse validation for long-running social data pulls
  • Shortened media-to-transcript turnaround by 40% by parallelizing async stages for download, speech-to-text, speaker diarization, and structured transcript export
Python · FastAPI · AWS · S3 · Glue · Athena · Lambda · GraphQL · Playwright · Pydantic
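
The validation-plus-fallback pattern from the second bullet can be sketched as follows. To stay dependency-free, this version swaps Pydantic for a hand-written field check and replaces real OpenAI/Gemini clients with plain callables; `REQUIRED_FIELDS`, the provider names, and every function name are hypothetical.

```python
import json
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical schema: field name -> required Python type.
REQUIRED_FIELDS = {"language": str, "text": str}


def extract_json_object(raw: str) -> str:
    """Minimal JSON repair: drop code fences or chatter around the object."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return raw[start : end + 1]


def parse_and_validate(raw: str) -> Dict[str, str]:
    data = json.loads(extract_json_object(raw))  # JSONDecodeError is a ValueError
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected_type):
            raise ValueError(f"field {name!r} is missing or not {expected_type.__name__}")
    return data


def generate_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
    attempts_per_provider: int = 2,
) -> dict:
    """Try each provider in order until one returns schema-valid output."""
    errors: List[str] = []
    for name, call in providers:
        for attempt in range(1, attempts_per_provider + 1):
            try:
                return {"provider": name, "record": parse_and_validate(call(prompt))}
            except (ValueError, ConnectionError, TimeoutError) as exc:
                errors.append(f"{name} attempt {attempt}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

The key property is that invalid output is treated the same as a failed call: it triggers a retry or a fallback instead of reaching a human reviewer.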
Dec 2023 — Jul 2024
Reliance Jio Platforms

Software Engineer

Navi Mumbai, India

Built and optimized high-throughput backend services and distributed data flows across inventory and service-management systems.

  • Cut p99 latency from 2.8s to 480ms at 1,500+ RPS by profiling hot paths, adding targeted MySQL composite indexes, and isolating high-frequency reads with Redis look-aside caching
  • Eliminated recurring OOM failures in a 25K+ file document pipeline by redesigning uploads into a Kafka-orchestrated chunked streaming flow to AWS S3
  • Standardized Java/Kotlin service contracts and shared domain models across 6 Spring Boot microservices for inventory, operations, and service-management domains
  • De-risked 3 major upgrades with NGINX traffic shadowing and canary rollouts, catching 2 high-severity regressions before customer impact
Java · Kotlin · Spring Boot · Python · MySQL · Redis · Kafka · AWS S3 · NGINX
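
Look-aside caching, mentioned in the first bullet, follows a simple read path: check the cache, fall back to the database on a miss, then populate the cache. The Python sketch below uses a dictionary as a stand-in for Redis and a callable as a stand-in for the MySQL query; the class name, the TTL handling, and the injectable clock are assumptions for illustration, and the original services were Java/Kotlin.

```python
import time
from typing import Any, Callable, Dict, Tuple


class LookAsideCache:
    """Read-through helper: the caller's loader is only hit on a miss."""

    def __init__(
        self,
        loader: Callable[[str], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader          # stand-in for the database read
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]            # fresh hit: no database work
        value = self._loader(key)      # miss or expired: go to the source
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call after a write so the next read reloads from the source."""
        self._entries.pop(key, None)
```

The TTL bounds how stale a cached read can be, while explicit invalidation on writes keeps hot keys correct without waiting for expiry.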
Toolchain

What I build with

Stacks and tools I reach for when shipping production systems.

01 · Languages

Go · C++ · Java · Kotlin · Python · SQL · Bash · TypeScript · JavaScript

02 · Backend & APIs

Spring Boot · FastAPI · REST · GraphQL · gRPC · Pydantic · SQLAlchemy · Hibernate

03 · Databases & Streaming

PostgreSQL · MySQL · MongoDB · Redis · Kafka · Redis Streams · pgvector

04 · Cloud & DevOps

AWS · S3 · Glue · Docker · Linux · NGINX · GitHub Actions · CI/CD

05 · Testing, Observability & Frontend

Pytest · JUnit · Playwright · Prometheus · Grafana · React · Vite · Tailwind CSS
Off Hours

Beyond the terminal

What keeps things in perspective: the things worth stepping away for.

Interests

Football · 👑 Real Madrid · 🏊 Swimming · 🏓 Table Tennis · 🎬 Netflix · 🎧 Spotify · 🎮 Game Dev

Hala Madrid

Real Madrid CF

A proud Real Madrid supporter through and through: the mentality, history, and winning culture are unmatched.

Hala Madrid y nada más.

You miss 100% of the shots you don't take.

Wayne Gretzky
— Michael Scott
Contact

Let's talk engineering.

I'm open to 2026 New Grad Software Engineer, Backend Engineer, Platform Engineer, and AI Infrastructure roles focused on reliable backend systems, data infrastructure, and practical AI tooling.

Say hello
Backend Engineering · Data Platforms · AI Infrastructure · Open to 2026 roles