AI & Data Science Leader with a Ph.D. in Computer Engineering. Built and deployed the iTAAP production data platform across 25+ California school districts, spanning agentic AI orchestration, multi-model ML prediction, browser automation, geospatial visualization, and full-lifecycle ETL. Published in IEEE INFOCOM, ACM SIGCOMM Computer Communication Review, and Computer Networks (Elsevier).
School Districts
Power BI Dashboards
Expatiate Communications · Pasadena, CA
University of Delaware · Newark, DE
7+ years across research and industry positions
First-principles thinking — designing novel systems when off-the-shelf tools fall short, and choosing the right technology for each problem rather than defaulting to the familiar.
Production systems designed to fail gracefully, recover automatically, and be safe for non-technical operators — with test modes, audit trails, and idempotent reruns.
Deep knowledge across two very different fields — California K-12 education compliance and internet-scale network measurement — enabling contributions that generalist engineers cannot make.
Solo architect, builder, and owner of a 32-project production platform serving 25+ independent organizations — from design to deployment to ongoing support.
School districts lacked predictive visibility into IEP compliance, academic progress, and operational risk — relying on slow, fragmented manual data aggregation across disparate assessment platforms.
Platform deployed across 25+ California school districts. LangGraph agentic automation achieved a 75–90% reduction in pipeline processing time. Replaced dozens of hours of weekly manual work — data collection, compliance tracking, report generation, and stakeholder communication — with single-command automated pipelines.
Transparent proxies silently intercept and modify web traffic without user awareness, but their true prevalence, behavior, and network-wide impact were poorly understood at scale.
Published at IEEE INFOCOM 2024, one of the top-ranked venues in computer networking. The study revealed the significant hidden influence of transparent proxies on internet traffic integrity.
The open proxy landscape — used for anonymization, censorship circumvention, and malicious activity — had never been comprehensively characterized in terms of scale, geography, and behavior.
Published in Computer Networks (Elsevier), 2022 — delivering the first comprehensive analysis of the open proxy ecosystem and its security implications at internet scale.
Remote peering in BGP networks was known to distort anycast routing decisions, but the extent of this unintended impact on global traffic distribution — including for major cloud providers — had not been passively quantified.
Published in ACM SIGCOMM Computer Communication Review, 2019 — a flagship networking venue — establishing foundational methodology for passive anycast analysis used in subsequent internet measurement research.
As AI crawlers become ubiquitous, websites are moving beyond binary blocking (robots.txt) to a more sophisticated, unmeasured tactic: returning HTTP 200 responses to both humans and AI bots, but serving degraded, watermarked, or "poisoned" content specifically to crawlers like GPTBot.
Unlike prior work that measures blocking, this study measures deception, filling a critical gap in understanding how the web's content landscape diverges between human and AI readers.
LLMs are widely used to generate Infrastructure-as-Code (Terraform, Kubernetes YAML, Nginx configs). If a model hallucinates a plausible but unregistered domain endpoint, an attacker could register that domain to intercept live API traffic or credentials from deployed systems.
Distinct from package hallucination studies, this work targets DNS-level infrastructure interception, a critical supply chain risk not previously measured in the LLM security literature.
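One way such a measurement could start is sketched below: pull hostnames out of generated configuration text and flag the ones that do not resolve. The function names, the regex, and the sample hostnames are illustrative assumptions, not the study's actual tooling. A failed DNS lookup is only a first signal; confirming that a domain is unregistered would need a registry (WHOIS/RDAP) check, which this sketch does not do.

```python
import re
import socket

# Rough hostname pattern (an assumption for illustration). It can also match
# things that are not domains, such as file names like "main.tf".
HOST_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b",
    re.IGNORECASE,
)


def extract_hosts(config_text):
    """Return the sorted, de-duplicated hostnames found in config text."""
    return sorted({h.lower() for h in HOST_RE.findall(config_text)})


def dns_resolves(host):
    """Default resolver check; requires network access."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def unresolved_hosts(config_text, resolves=dns_resolves):
    """Hostnames in the text for which the resolver callable returns False."""
    return [h for h in extract_hosts(config_text) if not resolves(h)]
```

Passing the resolver in as a callable keeps the extraction logic testable offline; a real run would use the default `dns_resolves`.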
Selected work from the iTAAP production platform — agentic AI orchestration, multi-model ML forecasting, full-lifecycle ETL, compliance automation, geospatial visualization, and systems engineering across 25+ California school districts.
Agentic LangGraph state machine with gate-node conditional routing, 4-stage fan-out parallelism, and a local Ollama LLM that diagnoses failures and suggests fixes. Student data never leaves the server.
Per-district ML system predicting CAASPP ELA/Math, ELPAC, Chronic Absenteeism, College/Career, and Suspension. Per-district model selection across RF / Linear / ARIMA / Holt-Winters with OpenAI + Gemini narrative generation.
5 models trained per school (ETS, ARIMA, Prophet, Random Forest, linear baseline). Ensemble of top 3 weighted by inverse validation error. Per-school comparison charts and ranked summary CSV.
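The weighting step can be illustrated with a minimal sketch, assuming each model exposes a forecast list and a single validation error such as MAE. The function name, the epsilon guard, and the dictionary layout are assumptions made for this example rather than details taken from the platform.

```python
def inverse_error_ensemble(forecasts, val_errors, top_k=3):
    """Blend the top_k models with the lowest validation error.

    forecasts:  model name -> list of predicted values (same horizon for all)
    val_errors: model name -> validation error, lower is better
    """
    # Rank models by validation error and keep the best top_k.
    best = sorted(val_errors, key=val_errors.get)[:top_k]
    # Small epsilon (an assumption) avoids division by zero on a perfect fit.
    eps = 1e-9
    raw = {name: 1.0 / (val_errors[name] + eps) for name in best}
    total = sum(raw.values())
    # Normalize so the weights sum to 1.
    weights = {name: w / total for name, w in raw.items()}
    horizon = len(forecasts[best[0]])
    blended = [
        sum(weights[name] * forecasts[name][t] for name in best)
        for t in range(horizon)
    ]
    return weights, blended
```

With validation errors of 1, 2, and 4 for the three best models, the normalized weights come out to roughly 4/7, 2/7, and 1/7, so the most accurate model dominates without silencing the others.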
Progressive multi-indicator filtering algorithm that finds nearby California schools outperforming a given school. Handles data gaps correctly — "no data" is not treated as "underperforming." Haversine radius search + Folium map output.
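A minimal sketch of the idea follows, assuming "progressive" means that indicators are applied one at a time to a shrinking candidate set and that a missing value on either side skips the comparison. The record layout (`id`, `lat`, `lon`, `scores`) and the function names are hypothetical, and the actual algorithm may handle gaps differently.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def outperforming_nearby(target, schools, indicators, radius_km):
    """Schools within radius_km that beat the target on every comparable indicator."""
    candidates = [
        s for s in schools
        if s["id"] != target["id"]
        and haversine_km(target["lat"], target["lon"], s["lat"], s["lon"]) <= radius_km
    ]
    for ind in indicators:
        base = target["scores"].get(ind)
        if base is None:
            continue  # target has no data for this indicator: nothing to compare
        # A candidate with no data is kept: a gap is not evidence of underperformance.
        candidates = [
            s for s in candidates
            if s["scores"].get(ind) is None or s["scores"][ind] > base
        ]
    return candidates
```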
4-step pipeline encoding federal IDEA law as computable thresholds: SQL query → Playwright PDF download with MFA handling → PDF date extraction → green/yellow/orange/red risk scoring published to Power BI dashboards.
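The final scoring step might look like the sketch below. The 30-day and 60-day cutoffs and the function name are hypothetical placeholders; the source does not state the real thresholds, which would come from the applicable IDEA timelines and district policy.

```python
from datetime import date

# Hypothetical cutoffs in days; the real thresholds are not given in the source.
ORANGE_WITHIN_DAYS = 30
YELLOW_WITHIN_DAYS = 60


def risk_color(due: date, today: date) -> str:
    """Map a compliance due date to a green/yellow/orange/red bucket."""
    days_left = (due - today).days
    if days_left < 0:
        return "red"      # already overdue
    if days_left <= ORANGE_WITHIN_DAYS:
        return "orange"   # due very soon
    if days_left <= YELLOW_WITHIN_DAYS:
        return "yellow"   # approaching
    return "green"        # comfortably ahead of the deadline
```

Taking `today` as a parameter instead of calling `date.today()` inside the function keeps the scoring deterministic, which makes reruns and tests reproducible.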
Cross-domain risk flagging across 14 LEAs: joins CAASPP, ELPAC, discipline, and SPED data to assign SST, 504, and SEL flags. Color-coded Excel reports via openpyxl + SQL Server write-back for Power BI dashboards.
Streamlit app with Plotly Mapbox scatter map, vectorized Haversine radius search, zip-code geocoding, click-to-move centering, and radar chart comparison across 6 performance indicators for any two schools side by side.
Dual-mode pipeline: REST API with token auth for 5 report types, Selenium browser automation for portal-only reports. Dual-credential routing maps each LEA to its assigned CALPADS account automatically.
Async two-phase Playwright scraper: Phase 1 paginates Coveo search API to collect profile IDs; Phase 2 scrapes each profile with BeautifulSoup regex ID matching. Per-record error isolation — failures log and continue without aborting.
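Per-record error isolation can be sketched as follows, assuming a coroutine `scrape_one(profile_id)` that stands in for the Phase 2 scraper. Both names are hypothetical, and the real pipeline may run profiles concurrently rather than one after another.

```python
import asyncio
import logging

logger = logging.getLogger("scraper")


async def scrape_all(profile_ids, scrape_one):
    """Scrape each profile; a failure is logged and recorded, never re-raised."""
    results, failures = {}, {}
    for pid in profile_ids:
        try:
            results[pid] = await scrape_one(pid)
        except Exception as exc:  # isolate the failure to this one record
            logger.warning("profile %s failed: %s", pid, exc)
            failures[pid] = str(exc)
    return results, failures
```

Returning the failures alongside the results lets a later run retry only the records that failed.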
Validation-first ETL: pre-flight LEA code check across 25+ districts, auto-fix for trailing delimiter errors (dry-run + apply-fix flags), idempotent truncate+reload, freshness tracking upserted to SQL Server for Power BI staleness indicators.
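The trailing-delimiter fix with a dry-run default could be sketched like this. The function name and the `expected_fields` parameter are assumptions, and the naive `split` ignores quoted fields, so a production version would need a real CSV parser.

```python
def fix_trailing_delimiters(lines, expected_fields, delimiter=",", apply_fix=False):
    """Report rows with one extra trailing delimiter; strip it only if apply_fix.

    Returns (rows, flagged_line_numbers). With apply_fix=False this is a dry
    run: rows come back unchanged apart from stripped line endings.
    """
    rows, flagged = [], []
    for lineno, line in enumerate(lines, start=1):
        row = line.rstrip("\r\n")
        fields = row.split(delimiter)
        # One extra, empty final field means the row ends in a stray delimiter.
        if len(fields) == expected_fields + 1 and fields[-1] == "":
            flagged.append(lineno)
            if apply_fix:
                row = row[: -len(delimiter)]
        rows.append(row)
    return rows, flagged
```

A row such as `4,5,` with three expected fields is left alone, because its empty last field is legitimate data rather than a stray delimiter.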
Go CLI fetching 12 Aeries SIS dataset types into MongoDB. Dynamically discovers high schools by HighGradeLevel field — no hardcoded lists. Per-school error recovery continues on API failure. `init()` guard validates config before any network call.
OAuth2 Gmail API pipeline querying SQL Server and dynamically selecting metrics from a configurable pool of 12 indicators. Constructs HTML-formatted tables and delivers weekly performance summaries to 18 school sites. Test mode prevents accidental sends.
4 Active Certifications · Issued 2026 · Valid through 2028
Coursera · Issued Aug 2023
Extensive peer review contributions ensuring the integrity and quality of high-tier network science and security venues.