Open to Opportunities

Scaling AI Capabilities from Research to Production.

AI & Data Science Leader with a Ph.D. in Computer Engineering. 14+ years bridging rigorous academic research and production-grade AI systems — from internet-scale network measurement to Agentic AI pipelines that cut processing time by 90%.

14+ Years Engineering

90% Reduction in Data Processing Time

8+ Top-Tier Publications

Rui Bian, PhD — AI & Data Science Leader

Experience

2023 — Present
Lead Data Scientist

Expatiate Communications · Pasadena, CA

  • Architected the iTAAP data platform on Microsoft Fabric, serving nearly 30 school districts
  • Designed an Agentic AI ETL system that cut data processing time by 90%
  • Built Assessment Integration Engine (iReady, NWEA, IXL → State Testing levels) and Power BI/DAX compliance dashboards
  • Developed iTAAP Insights Email Alerts and secure cross-team data endpoints
  • Mentored data science interns on internal tooling and automation workflows
2017 — 2022
Ph.D. Researcher — Computer Engineering

University of Delaware · Newark, DE

  • Conducted internet-scale measurement research on BGP routing, transparent proxies, and open proxy ecosystems
  • Published at IEEE INFOCOM 2024, Computer Networks (Elsevier) 2022, and ACM SIGCOMM CCR 2019
  • Built large-scale Python/AWS data collection pipelines processing millions of network probes
  • TPC member and reviewer: IEEE INFOCOM, DSN, TNSE, Computer Networks, and others
  • GPA: 3.96 / 4.0 — graduated December 2022
Prior Experience
Engineering & Research Roles

7+ years across research and industry positions

  • Broad engineering background spanning systems, data, and applied research prior to doctoral studies
  • B.S. in Engineering — University of Science and Technology of China (USTC)

High-Impact Engineering

Transforming EdTech Intelligence at Scale

Expatiate Communications · Lead Data Scientist

The Challenge

School districts lacked predictive visibility into IEP compliance, academic progress, and operational risk. They relied on slow, fragmented manual data aggregation across disparate assessment platforms.

Architecture

  • Modernized core data pipelines using Microsoft Fabric, Python, and optimized SQL stored procedures
  • Designed an Agentic AI system to fully automate complex ETL processes and intelligent data routing
  • Built an Assessment Integration Engine standardizing iReady, NWEA, and IXL data against State Testing levels for real-time academic progress prediction
  • Engineered compliance dashboards in Power BI / DAX: FAPE Compliance Prediction, IEP Risk Projection, Student Services Tracking, Achievement Gap, and State Performance Plan Indicators
  • Developed iTAAP Insights Email Alerts to automatically notify districts of at-risk students requiring early intervention
  • Architected secure data endpoints enabling cross-team consumption of centralized iTAAP data for other company products

Business Impact

Deployed across nearly 30 school districts, the platform cut data processing time by 90% through Agentic AI automation. Ongoing mentoring of data science interns on internal tooling continues to improve the iTAAP operational lifecycle.

Internet-Scale Transparent Proxy Analysis

University of Delaware · Ph.D. Research · IEEE INFOCOM 2024

The Challenge

Transparent proxies silently intercept and modify web traffic without user awareness, but their true prevalence, behavior, and network-wide impact were poorly understood at scale.

Methodology

  • Designed a large-scale active measurement system to detect and fingerprint transparent proxies across global internet paths
  • Built Python-based data collection and analysis pipelines processing millions of network probes
  • Developed novel detection heuristics combining HTTP header analysis and TCP-level signals
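
The idea of combining header-level and TCP-level evidence can be illustrated with a minimal sketch. It assumes a controlled setup in which a probe client records what it sent and a cooperating server records what it received. The header list, the TCP field names, and the "one signal from each layer" rule are illustrative assumptions, not the heuristics from the published study.

```python
# Request headers that forwarding proxies commonly add (illustrative list).
PROXY_HEADERS = {"via", "x-forwarded-for", "forwarded"}


def header_signals(sent, received):
    """Compare request headers as sent by the client and as seen by the server."""
    sent_l = {k.lower(): v for k, v in sent.items()}
    recv_l = {k.lower(): v for k, v in received.items()}
    signals = []
    # Headers that appeared in transit and are typical of proxies.
    for name in sorted(recv_l.keys() - sent_l.keys()):
        if name in PROXY_HEADERS:
            signals.append(f"added:{name}")
    # Headers whose values were rewritten in transit.
    for name in sorted(sent_l.keys() & recv_l.keys()):
        if sent_l[name] != recv_l[name]:
            signals.append(f"modified:{name}")
    # Headers that were stripped in transit.
    for name in sorted(sent_l.keys() - recv_l.keys()):
        signals.append(f"removed:{name}")
    return signals


def tcp_signals(sent_tcp, seen_tcp):
    """Flag TCP handshake fields (hypothetical names) that differ end to end."""
    return [f"tcp:{f}" for f in sorted(sent_tcp) if seen_tcp.get(f) != sent_tcp[f]]


def looks_intercepted(sent, received, sent_tcp, seen_tcp):
    """Assumed rule: require at least one signal from each layer."""
    h = header_signals(sent, received)
    t = tcp_signals(sent_tcp, seen_tcp)
    return {"header": h, "tcp": t, "intercepted": bool(h) and bool(t)}
```

Requiring agreement between the two layers is a conservative choice for this sketch: a single changed TCP option can come from other middleboxes, so it is treated as supporting evidence rather than proof.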

Academic Impact

Published at IEEE INFOCOM 2024, one of the top-ranked venues in computer networking. The study revealed the significant hidden influence of transparent proxies on internet traffic integrity.

Mapping the Open Proxy Ecosystem

University of Delaware · Ph.D. Research · Computer Networks 2022

The Challenge

The open proxy landscape — used for anonymization, censorship circumvention, and malicious activity — had never been comprehensively characterized in terms of scale, geography, and behavior.

Methodology

  • Crawled, scanned, and analyzed 436,000+ open proxies across the global internet
  • Built large-scale data collection infrastructure using Python and AWS for distributed scanning
  • Applied statistical modeling and traffic analysis to characterize proxy behavior, uptime, and abuse patterns

Academic Impact

Published in Computer Networks (Elsevier), 2022 — delivering the first comprehensive analysis of the open proxy ecosystem and its security implications at internet scale.

Anycast Routing & Remote Peering Effects

University of Delaware · Ph.D. Research · ACM SIGCOMM CCR 2019

The Challenge

Remote peering in BGP networks was known to distort anycast routing decisions, but the extent of this unintended impact on global traffic distribution — including for major cloud providers — had not been passively quantified.

Methodology

  • Developed a passive BGP measurement methodology to infer anycast catchment boundaries without active probing
  • Analyzed global BGP routing tables and AS-path data across hundreds of vantage points
  • Correlated routing anomalies with remote peering relationships at internet exchange points (IXPs)
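
A simplified sketch of the kind of AS-path bookkeeping these steps involve: group vantage points by the AS adjacent to the anycast origin, then flag paths that cross a known remote-peering adjacency. The data layout, function names, and the use of the origin-adjacent AS as a stand-in for a catchment are assumptions made for illustration, not the paper's methodology. AS numbers in any example input should come from the documentation range.

```python
from collections import defaultdict


def dedupe_prepending(path):
    """Collapse consecutive repeats of an ASN caused by AS-path prepending."""
    out = []
    for asn in path:
        if not out or out[-1] != asn:
            out.append(asn)
    return out


def group_by_origin_neighbor(paths):
    """Group vantage points by the AS adjacent to the origin.

    `paths` maps a vantage-point name to its AS path, origin AS last.
    """
    groups = defaultdict(list)
    for vantage, path in paths.items():
        p = dedupe_prepending(path)
        if len(p) >= 2:
            groups[p[-2]].append(vantage)
    return {neighbor: sorted(vps) for neighbor, vps in groups.items()}


def crosses_remote_peering(path, remote_links):
    """True if any adjacent AS pair on the path is a known remote-peering link.

    `remote_links` is a set of frozensets, each holding the two ASNs of a link.
    """
    p = dedupe_prepending(path)
    return any(frozenset(pair) in remote_links for pair in zip(p, p[1:]))
```

Storing links as frozensets makes the lookup direction-independent, which suits peering relationships where either AS may appear first on a path.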

Academic Impact

Published in ACM SIGCOMM Computer Communication Review, 2019 — a flagship networking venue — establishing foundational methodology for passive anycast analysis used in subsequent internet measurement research.

In Progress

AI Cloaking & Content Differentiation on the Open Web

Independent Research · Target: IMC / WWW / USENIX Security

The Challenge

As AI crawlers become ubiquitous, websites are moving beyond binary blocking (robots.txt) to a more sophisticated, unmeasured tactic: returning HTTP 200 responses to both humans and AI bots, but serving degraded, watermarked, or "poisoned" content specifically to crawlers like GPTBot.

Methodology

  • Twin-crawler framework (Playwright) visiting Tranco Top 10,000 domains — once as a standard browser UA, once as GPTBot
  • DOM tree structural comparison and text similarity scoring via Jaccard & TF-IDF cosine similarity
  • Sector-level taxonomy: paywall injection, text truncation, gibberish poisoning, visual watermarking
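
The text-similarity step can be sketched with the standard library alone. The tokenizer, the smoothed IDF formula, and the function names below are assumptions chosen for brevity. A real pipeline would more likely use an established vectorizer and would first extract visible text from each rendered page.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    """Jaccard similarity between the token sets of two strings."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0  # two empty pages are treated as identical
    return len(sa & sb) / len(sa | sb)


def tfidf_cosine(a, b, corpus):
    """Cosine similarity of TF-IDF vectors, with IDF estimated from `corpus`."""
    docs = [set(tokenize(d)) for d in corpus]
    n = len(docs)

    def idf(term):
        df = sum(1 for d in docs if term in d)
        return math.log((1 + n) / (1 + df)) + 1.0  # smoothed IDF

    def vec(text):
        tf = Counter(tokenize(text))
        return {t: c * idf(t) for t, c in tf.items()}

    va, vb = vec(a), vec(b)
    dot = sum(w * vb.get(t, 0.0) for t, w in va.items())
    na = math.sqrt(sum(w * w for w in va.values()))
    nb = math.sqrt(sum(w * w for w in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

A low score between the browser-UA response and the crawler-UA response for the same URL would mark the page as a candidate for manual review. The score alone is not proof of cloaking, since dynamic content also lowers it.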

Novelty

Unlike prior work measuring blocking, this measures deception — filling a critical gap in understanding how the web's content landscape diverges between human and AI readers.

In Progress

LLM-Hallucinated Infrastructure Domains as an Attack Surface

Independent Research · Target: NDSS / CCS / USENIX Security

The Challenge

LLMs are widely used to generate Infrastructure-as-Code (Terraform, Kubernetes YAML, Nginx configs). If a model hallucinates a plausible but unregistered domain endpoint, an attacker could register that domain to intercept live API traffic or credentials from deployed systems.

Methodology

  • 1,000+ DevOps-focused prompts submitted to GPT-4o, Claude 3.5 Sonnet, and Llama-3-70B
  • Regex extraction of all generated domains, filtered against known public registries
  • DNS resolution + Registrar API queries to quantify hallucination rate and live registrability of phantom endpoints
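
A minimal sketch of the extraction and filtering steps, under stated assumptions: the regular expression is a simplified hostname pattern, `KNOWN_DOMAINS` is a hypothetical stand-in for the registries mentioned above, and `resolves` uses a plain system DNS lookup. The registrar query is omitted because it depends on a specific provider's API. The pattern will also match file names such as `main.tf`, so a real pipeline would validate suffixes against a public suffix list.

```python
import re
import socket

# Simplified hostname pattern (assumption): dot-separated labels ending in an
# alphabetic TLD of two or more letters.
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b",
    re.IGNORECASE,
)

# Hypothetical allowlist standing in for known public registries.
KNOWN_DOMAINS = {"example.com", "amazonaws.com"}


def extract_domains(text):
    """Return the sorted, lowercased set of domain-like strings in `text`."""
    return sorted({m.group(0).lower() for m in DOMAIN_RE.finditer(text)})


def is_known(domain, known=KNOWN_DOMAINS):
    """True if `domain` equals a known domain or is a subdomain of one."""
    return any(domain == k or domain.endswith("." + k) for k in known)


def candidate_phantoms(text, known=KNOWN_DOMAINS):
    """Domains in generated output that are not covered by the allowlist."""
    return [d for d in extract_domains(text) if not is_known(d, known)]


def resolves(domain):
    """True if the system resolver returns any address for `domain`."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except socket.gaierror:
        return False
```

Candidates that fail to resolve are only the starting point: a non-resolving name may still be registered, which is why the methodology pairs DNS resolution with registrar queries.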

Novelty

Distinct from package hallucination studies — this targets DNS-level infrastructure interception, a critical supply chain risk not previously measured in the LLM security literature.

Tech Ecosystem

LLM & AI Systems

Prompt Engineering · Context Engineering · Harness Engineering · RAG · MCP · Agentic AI · AI Security · AI Guardrails · AI-Coding CLI

Core ML & Modeling

AI Model Training · Model Fine-Tuning · Model Evaluation · PyTorch · HuggingFace · Scikit-Learn · NLP · Predictive Modeling

Data Engineering

Microsoft Fabric · Apache Airflow · Spark · Snowflake · ETL Pipelines · SQL / Stored Procedures · Big Data Analytics

Cloud & Architecture

AWS (EC2, S3, RDS) · Docker · Kubernetes · System Design · Python · PostgreSQL · Power BI / DAX · MLOps

Professional Credentials

DataCamp

4 Active Certifications · Issued 2026 · Valid through 2028

AI Engineer for Developers Associate
AI Engineer for Data Scientists Associate
Data Scientist Associate
Data Engineer Associate

Google Cybersecurity Professional Certificate

Coursera · Issued Aug 2023

Thought Leadership

Selected Publications & Patents

Silent Observers Make a Difference: A Large-scale Analysis of Transparent Proxies on the Internet.

Rui Bian et al. | IEEE INFOCOM, 2024.

Shining a Light on Dark Places: A Comprehensive Analysis of Open Proxy Ecosystem.

Rui Bian et al. | Computer Networks, 2022.

Towards Passive Analysis of Anycast in Global Routing: Unintended Impact of Remote Peering.

Rui Bian et al. | ACM SIGCOMM CCR, 2019.

Patent: Manufacturing method of micro lens / 一种微透镜的制作方法.

Gang Liu, Ying Xiong, Rui Bian et al. | CN104614936B.

Academic Service

Extensive peer review contributions ensuring the integrity and quality of high-tier network science and security venues.

Key TPC / Reviewer Roles:

  • IEEE INFOCOM ('17, '18, '19, '20, '21)
  • IEEE/IFIP DSN ('19, '21, '22)
  • IEEE Transactions on Network Science and Engineering (TNSE)
  • Computer Networks
  • IEEE ITEC, IEEE RTC, IEEE SmartSys

Let's Connect

Open to Senior Data Scientist, AI/ML Engineer, and leadership roles. Based in Los Angeles — open to hybrid and remote.
