AI & Machine Learning
01 · AI / Orchestration

LangGraph Agentic Pipeline Orchestrator

A multi-step SPED data pipeline required manual, sequential execution every week — steps were skipped, run out of order, and failures were silent.

  • LangGraph StateGraph with gate node + conditional routing: downstream stages only run when ingestion succeeds
  • Fan-out parallelism to 4 concurrent stages (LRE, IEP compliance, predictions, LEA reports) with a custom TypedDict reducer for safe concurrent state writes
  • Local Ollama LLM (qwen2.5:7b) analyzes failure logs and suggests fixes — student data never leaves the server
  • Tkinter GUI monitor with real-time streaming log output for non-technical operators
LangGraph · Ollama · qwen2.5:7b · Python · subprocess · Tkinter
75–90% runtime reduction · replaced 4–6 hr manual weekly process with one command
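The fan-out stage above depends on a reducer that merges concurrent state writes safely. A minimal sketch of that idea in plain Python — LangGraph attaches such a reducer to a state channel via `Annotated`; the state shape, field name, and stage names here are illustrative, not the project's actual schema:

```python
import operator
from typing import Annotated, TypedDict

class PipelineState(TypedDict):
    # List-valued channel: parallel stages append partial updates and the
    # Annotated reducer (operator.add) concatenates them deterministically.
    stage_logs: Annotated[list[str], operator.add]

def merge_updates(current: PipelineState, updates: list[PipelineState]) -> PipelineState:
    """Apply each parallel stage's partial update using the channel reducer."""
    logs = list(current["stage_logs"])
    for update in updates:
        logs = operator.add(logs, update["stage_logs"])  # the declared reducer
    return {"stage_logs": logs}

state: PipelineState = {"stage_logs": ["ingest: ok"]}
fan_out = [
    {"stage_logs": ["LRE: done"]},
    {"stage_logs": ["IEP compliance: done"]},
]
merged = merge_updates(state, fan_out)
```

Without a reducer, two concurrent writers to the same key would race; declaring the merge function makes fan-out order-independent for list channels.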
02 · Machine Learning

6-Domain Student Outcome Prediction System

Districts only learned about poor student outcomes after the fact — no forward-looking indicators existed for proactive intervention.

  • 6-domain prediction: CAASPP ELA, CAASPP Math, ELPAC, Chronic Absenteeism, College/Career, Suspension — each with a dedicated analyzer class
  • Feature engineering from multi-year SQL Server data: prior year scores, year-over-year deltas, 3-year rolling means, trend slopes
  • Per-district model selection across 4 candidate models (Random Forest, Linear, ARIMA, Holt-Winters) ranked by R² on held-out validation data
  • OpenAI (GPT-4o-mini) + Gemini (gemini-1.5-flash) fallback chain generates plain-language administrative narratives with hallucination validation
scikit-learn · statsmodels · ARIMA · Holt-Winters · OpenAI API · Gemini API · matplotlib
Deployed 20+ districts · proactive intervention 4–6 months before year-end outcomes
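The per-district model selection step can be sketched as a held-out R² ranking. A minimal illustration in plain Python, using invented validation numbers and only two of the four candidate model names:

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def select_best_model(validation_actuals, candidate_predictions):
    """Rank candidates by R^2 on held-out validation data; return the winner."""
    scores = {name: r_squared(validation_actuals, preds)
              for name, preds in candidate_predictions.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical held-out scale scores for one district
actual = [2480, 2495, 2510, 2530]
candidates = {
    "random_forest": [2482, 2493, 2512, 2528],
    "linear":        [2470, 2500, 2505, 2545],
}
best, scores = select_best_model(actual, candidates)
```

Running the selection per district, rather than globally, lets each district keep whichever model its own history favors.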
03 · ML Forecasting

Per-School Suspension Rate Forecasting

Suspension rates varied wildly by school and had different seasonal patterns — a single global model was unreliable for planning intervention resources.

  • 5 models trained per school: ETS (Holt-Winters), ARIMA, Prophet, Random Forest, linear baseline — with MAE evaluation against 2024–2025 actuals
  • Ensemble of top 3 (ETS + ARIMA + Prophet) weighted by inverse validation error
  • Per-school comparison charts showing all model projections vs. actuals; summary CSV ranked by RMSE
  • ARIMA exception handling for schools with too-short history to converge
Prophet · ARIMA · ETS · scikit-learn · statsmodels · matplotlib · seaborn
Ensemble outperformed any single model in 70%+ of schools tested
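The inverse-validation-error weighting described above can be sketched as follows: each model's weight is proportional to 1/MAE, normalized to sum to one. The MAE values and forecast numbers below are invented for illustration:

```python
def inverse_error_weights(mae_by_model):
    """Weight each model by 1/MAE, normalized so weights sum to 1."""
    inv = {m: 1.0 / e for m, e in mae_by_model.items()}
    total = sum(inv.values())
    return {m: v / total for m, v in inv.items()}

def ensemble_forecast(forecasts, weights):
    """Weighted average of aligned per-model forecast horizons."""
    horizon = len(next(iter(forecasts.values())))
    return [sum(weights[m] * forecasts[m][t] for m in forecasts)
            for t in range(horizon)]

mae = {"ets": 0.5, "arima": 1.0, "prophet": 2.0}   # illustrative validation errors
w = inverse_error_weights(mae)
combined = ensemble_forecast(
    {"ets": [4.0, 4.2], "arima": [4.4, 4.6], "prophet": [5.0, 5.2]}, w)
```

The better-validated model dominates the blend but never silences the others, which is what lets the ensemble hedge against any single model's seasonal blind spot.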
04 · AI / Algorithm

School Recommendation Engine

No systematic way existed to find nearby California schools that consistently outperform a given school across multiple indicators, while handling real-world data gaps correctly.

  • Haversine 30-mile radius search narrows a statewide dataset to geographically relevant candidates
  • Progressive multi-indicator filter: applies each available metric sequentially — schools without data for an indicator are excluded from that step only, not globally
  • Correctly distinguishes "no data" from "underperforming" — a critical design decision that naive threshold filtering misses
  • Folium interactive map output + ranked summary CSV by number of indicators outperformed
Python · pandas · Haversine · Folium · geopy · googlemaps
6 indicators · statewide coverage · data-gap-aware recommendation algorithm
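The 30-mile radius narrowing step can be sketched with the standard Haversine formula. The school records and coordinates below are hypothetical examples, not project data:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))

def schools_within_radius(origin, schools, radius_miles=30.0):
    """Keep only schools whose coordinates fall inside the radius."""
    lat0, lon0 = origin
    return [s for s in schools
            if haversine_miles(lat0, lon0, s["lat"], s["lon"]) <= radius_miles]

origin = (32.7157, -117.1611)  # San Diego
schools = [
    {"name": "nearby",  "lat": 32.80, "lon": -117.10},   # a few miles away
    {"name": "distant", "lat": 34.05, "lon": -118.24},   # Los Angeles
]
nearby = schools_within_radius(origin, schools)
```

Filtering on distance first keeps the subsequent multi-indicator comparison small enough to run interactively on a statewide dataset.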
Data Engineering & ETL
05 · Data Engineering

SEIS Input Data Pipeline

SEIS export files from 25 districts required weekly manual combining, normalizing, and routing into a single warehouse-ready format.

  • Multi-source ingestion combining per-district SEIS CSV exports with configurable column normalization
  • JCCS merge: 4 sub-entities aggregated into a single district record before routing
  • File routing by LEA code to district-specific destinations; CLI stage flags for selective re-runs
Python · pandas · pathlib · argparse
Eliminated weekly manual file processing across 25 LEAs
06 · ETL / Validation

CALPADS Extract Uploader

25+ districts' CALPADS extract files arrived with delimiter errors, wrong LEA codes, and encoding issues — manual upload was silently corrupting warehouse data.

  • Pre-flight LEA code validation in column 3 against expected values before any upload begins
  • Auto-fix: trailing delimiter detection with `--fix` (preview) and `--apply-fix` (correct in-place) CLI flags
  • Idempotent truncate+reload pattern: safe to rerun after partial failure; SQLAlchemy fast_executemany for bulk performance
  • Freshness upsert to SQL Server DataDateTime table after each successful load — visible in Power BI staleness indicators
Python · pandas · SQLAlchemy · pyodbc · argparse
Auto-fix resolved 80% of errors · eliminated silent data corruption
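The trailing-delimiter auto-fix can be sketched as follows, assuming caret-delimited CALPADS rows and the behavior described above: detect rows carrying exactly one spurious trailing delimiter and strip it, leaving conforming rows untouched. Field counts and row contents below are invented:

```python
def fix_trailing_delimiters(lines, expected_fields, delimiter="^"):
    """Strip a single spurious trailing delimiter from over-long rows.

    Returns (fixed_lines, number_of_rows_fixed) so a --fix preview mode
    can report counts without writing anything back.
    """
    fixed, n_fixed = [], 0
    for line in lines:
        fields = line.split(delimiter)
        if len(fields) == expected_fields + 1 and fields[-1] == "":
            line = line[: -len(delimiter)]
            n_fixed += 1
        fixed.append(line)
    return fixed, n_fixed

rows = ["SENR^0612345^1234567^A^",   # trailing caret: 5 fields, last empty
        "SENR^0612345^7654321^B"]    # clean row: left as-is
fixed, n = fix_trailing_delimiters(rows, expected_fields=4)
```

Keeping detection separate from writing is what allows the preview (`--fix`) versus in-place correction (`--apply-fix`) split described above.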
07 · ETL / Automation

CALPADS ODS Download Pipeline

CALPADS ODS reports needed regular download for 25+ LEAs across both REST API-accessible and portal-only formats, using two different credential sets.

  • REST API with token auth for 5 report types; Selenium browser automation for 2 portal-only SSRS reports
  • Dual-account LEA routing: `lea_data.py` maps each district to its assigned CALPADS account automatically
  • `--download` / `--upload` mode flags: re-run uploads without re-downloading already-retrieved files
Python · requests · Selenium · pandas · SQLAlchemy
25+ districts · 7 report types · replaced 45-min manual weekly session
08 · ETL / SFTP

SouthFork SIRAS Pipeline

A rural LEA's SEIS export schema was entirely different from the county's required 80+ column SIRAS format — requiring weekly manual field mapping, date normalization, and disability code translation.

  • Playwright downloads the raw SEIS export in a single unattended session
  • 80+ column pandas transform: disability code lookup dicts (30+ entries), grade level code translation, multi-format date normalization to SIRAS required format
  • Column count validation before output; Paramiko SFTP upload to county server
  • subprocess orchestrator chains all 3 stages with stage-level error logging
Python · Playwright · pandas · paramiko
Automated weekly SIRAS delivery for rural LEA · eliminated manual schema translation
09 · ETL / Automation

IVA SIS Extract Pipeline

A charter school's Aeries SIS CALPADS extracts required navigating a complex UI, unzipping downloads, renaming files to a standard convention, and uploading via SFTP each cycle.

  • Playwright automation with Kendo UI-aware selectors navigates Aeries SIS, selects each CALPADS extract type, and downloads the generated ZIP
  • Subprocess unzip + regex-based rename to `IVA_<type>.txt` naming convention
  • Paramiko SFTP uploads all `IVA_*.txt` files to staging server; each stage independently runnable for reruns
Python · Playwright · paramiko · zipfile
End-to-end automation replacing multi-step manual portal + upload workflow
10 · ETL / Format Engineering

BEST Academy STAS Attendance Pipeline

A charter school's SIS generated attendance in a proprietary CSV format entirely incompatible with the CALPADS STAS 23-field caret-delimited fixed format required for state submission.

  • Playwright downloads the attendance CSV from the SIS portal in an unattended session
  • 23-field STAS converter: maps source columns, derives `other_days` (total minus present minus absent), calculates dynamic school-year string
  • Paramiko SFTP delivers the formatted file; python-dotenv for credential management
Python · Playwright · pandas · paramiko · python-dotenv
Automated weekly CALPADS attendance submission · replaced manual format conversion
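The converter's core derivations can be sketched in a few lines: `other_days` as total minus present minus absent, the dynamic school-year string, and the caret-joined 23-field output row. The field layout below is a placeholder, not the actual STAS field order:

```python
def derive_other_days(total_days, present_days, absent_days):
    """STAS `other_days` as described: total minus present minus absent."""
    return total_days - present_days - absent_days

def school_year_string(year_start):
    """Dynamic school-year string, e.g. 2024 -> '2024-2025'."""
    return f"{year_start}-{year_start + 1}"

def to_stas_line(fields):
    """Join a 23-field record with the caret delimiter STAS requires."""
    if len(fields) != 23:
        raise ValueError(f"STAS requires 23 fields, got {len(fields)}")
    return "^".join(str(f) for f in fields)

other = derive_other_days(total_days=180, present_days=168, absent_days=9)
year = school_year_string(2024)
# Placeholder layout: record type, school year, 20 unshown fields, other_days
line = to_stas_line(["STAS", year] + [""] * 20 + [other])
```

Validating the field count before joining catches mapping bugs at conversion time rather than at state submission.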
11 · Data Quality

SEIS Data Corrector

SEIS exports contained incorrect district name assignments caused by student transfer timing, propagating errors into downstream compliance reports.

  • Modular pipeline: combine raw exports → correct via reference mapping → organize into per-district CSVs
  • Backup of originals before any correction; slugified output filenames for cross-platform compatibility
  • Reference mapping table updated independently of the pipeline code
Python · pandas · python-slugify
Correct district assignments in all downstream compliance reporting
12 · Data Engineering

File Format Conversion Utilities

Education data files arrived in varying formats and encodings (CSV/TXT, UTF-8/CP1252) incompatible with legacy CALPADS upload systems.

  • CSV-to-TXT streaming converter: memory-efficient row-by-row processing for large files
  • UTF-8 to CP1252 re-encoder: handles characters outside the CP1252 range with configurable replacement
  • Multi-year historical CSV concatenator: combines annual exports with consistent header management
Python · pandas
All file types uploadable to legacy CALPADS systems without manual intervention
13 · Data Quality

Extract Checker — Pre-Upload Validation

Mis-routed or misconfigured CALPADS extract files would silently overwrite correct warehouse data with no error raised during upload.

  • Pre-flight scan checks the LEA code in column 3 of every extract file against expected values for each of 17 districts
  • Generates a pass/fail report before any upload begins — bad files never reach the database
  • Config-driven: expected LEA codes maintained in a separate mapping file, not hardcoded
Python · pandas
Prevented warehouse corruption · all bad files caught at validation stage
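The pre-flight LEA code check can be sketched as a scan over each file's rows, comparing column 3 against the expected value from the mapping file. The delimiter, row contents, and LEA codes below are invented for illustration:

```python
def validate_lea_codes(lines, expected_lea, delimiter="^", column=3):
    """Pre-flight check: every row's LEA code (1-based column index)
    must match the expected value for this district.

    Returns (passed, bad_row_numbers) so a report can point at exact rows.
    """
    bad = []
    for i, line in enumerate(lines, start=1):
        fields = line.split(delimiter)
        if len(fields) < column or fields[column - 1] != expected_lea:
            bad.append(i)
    return (not bad), bad

rows = [
    "SENR^2024^3712345^...",   # matches the expected code (hypothetical)
    "SENR^2024^3799999^...",   # mis-routed content from another district
]
passed, bad_rows = validate_lea_codes(rows, expected_lea="3712345")
```

Because the scan runs before any upload, a mis-routed file produces a failed report instead of a silent overwrite.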
14 · Data Quality

Verify Dropouts — Enrollment Verification

Students exit-coded as dropouts may have re-enrolled elsewhere, inflating dropout rates in federal compliance reports — verifying each SSID manually against live CALPADS was taking 3–4 hours per LEA.

  • ThreadPoolExecutor parallel Selenium workers: each worker has its own ChromeDriver instance, logs to a per-worker queue
  • Deduplication runs before workers start; batch REST API write-back with chunked requests and single-item fallback on 404
  • Worker count and batch size configurable via environment variables; JSON audit logs with timing per worker
Python · Selenium · ThreadPoolExecutor · requests
3–4 hr → 25–35 min per LEA · 8× speedup with 8 parallel workers
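The parallel verification design can be sketched with the stdlib `concurrent.futures` module. Here the real Selenium lookup is replaced by a placeholder function, and each thread logs to its own queue keyed by thread name, mirroring the per-worker ChromeDriver + log design described above (SSIDs and the re-enrollment rule are invented):

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

def verify_ssid(ssid):
    """Placeholder for the real Selenium lookup against live CALPADS."""
    return {"ssid": ssid, "re_enrolled": ssid.endswith("7")}

def make_worker(log_queues):
    def worker(ssid):
        # Each thread writes to its own queue so worker logs never interleave.
        q = log_queues.setdefault(threading.current_thread().name, queue.Queue())
        result = verify_ssid(ssid)
        q.put(f"verified {ssid}")
        return result
    return worker

# Deduplicate before dispatch, as described above
ssids = sorted({"1000000001", "1000000007", "1000000007"})
log_queues: dict[str, queue.Queue] = {}
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(make_worker(log_queues), ssids))
```

Deduplicating first means no two workers ever hit the portal for the same SSID, which matters when each lookup costs tens of seconds of browser time.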
Compliance & Analytics
15 · Compliance

IEP Compliance Pipeline

21 LEAs were manually tracking IEP annual review and triennial re-evaluation deadlines in spreadsheets — missing a federal IDEA deadline is a compliance violation.

  • SQL query identifies students with IEPs due within configurable windows
  • Playwright downloads invitation and assessment plan PDFs from SEIS with MFA detection and handling
  • PDF date parser extracts meeting dates, assessment dates, and invitation dates across multiple form layouts
  • Computes days until/past deadline per student; applies green/yellow/orange/red/grey risk scoring based on encoded California IDEA rules (annual ≤365 days, triennial ≤3 years, assessment plan ≤60 days)
Python · Playwright · pdfplumber · openpyxl · SQLAlchemy · Power BI
21 districts · zero missed IEP deadlines after deployment
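The deadline computation and risk banding can be sketched as below. Only the annual-review-within-365-days rule comes from the description above; the band widths (14 and 45 days) are illustrative, not the production thresholds:

```python
from datetime import date, timedelta

ANNUAL_REVIEW_DAYS = 365  # encoded rule: annual IEP review within 365 days

def days_until_due(last_annual_meeting, today):
    """Days remaining before the annual review deadline (negative = past due)."""
    due = last_annual_meeting + timedelta(days=ANNUAL_REVIEW_DAYS)
    return (due - today).days

def risk_tier(days_left):
    """Green/yellow/orange/red banding; grey means no date could be parsed.
    Band widths here are illustrative only."""
    if days_left is None:
        return "grey"
    if days_left < 0:
        return "red"
    if days_left <= 14:
        return "orange"
    if days_left <= 45:
        return "yellow"
    return "green"

tier = risk_tier(days_until_due(date(2024, 9, 1), today=date(2025, 8, 20)))
```

Treating an unparseable PDF date as grey rather than green is the same "no data is not good news" principle the recommendation engine applies.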
16 · Compliance

LRE Compliance Reporting

15 LEAs needed IDEA Least Restrictive Environment placement percentages calculated from SEIS data and compared against California state benchmarks — a calculation that required deep understanding of federal LRE category definitions.

  • 4 federal LRE placement categories (≥80% GE time, 40–79%, <40%, separate school) correctly encoded from IDEA regulation, validated against known state reports
  • Preschool LRE uses different category definitions; JCCS 4 sub-entities aggregate separately before district rollup
  • Per-LEA Jupyter notebooks run headlessly via nbconvert runner; SQL Server write-back for Power BI
Python · pandas · Jupyter · nbconvert · SQLAlchemy
15 districts · automated LRE reporting · replaced days-long manual Excel process
17 · Analytics

EAP At-Risk Student Identification

14 LEAs had no consistent method to identify students needing SST referrals, 504 plans, or SEL support — each coordinator used a different ad-hoc process.

  • Multi-source join: CAASPP assessment scores, ELPAC results, discipline records (suspension days), and SPED indicators at the student level
  • 3 policy-encoded risk flags: SST (academic performance thresholds), 504 (eligibility indicators), SEL (suspension ≥5 days)
  • Color-coded Excel reports via openpyxl; SQL Server write-back for Power BI weekly review dashboards
Python · pandas · openpyxl · SQLAlchemy · pyodbc
14 LEAs · replaced 14 ad-hoc processes with one consistent system
18 · Analytics

Student Service Calculations & Predictions

15+ LEAs had no automated way to calculate current special education service hours by type (SLP, OT, PT) or project future service demand for resource planning.

  • Per-LEA Jupyter notebooks calculate service hours by type from SEIS data; nbconvert runner executes all notebooks headlessly in batch
  • Service hour projections using trend analysis for resource planning across the school year
  • openpyxl formatted reports; Selenium assists with data extraction from web portals where needed
Python · pandas · Jupyter · nbconvert · openpyxl · Selenium
15+ LEAs · weekly automated service calculation replacing manual Excel aggregation
19 · Demo / BI

IEP Compliance Demo Environment

Client prospects needed to experience the full iTAAP platform without accessing production student data — a demo environment required matching compliance logic without real records.

  • Complete sanitized Power BI demo suite (10+ .pbix files) across all dashboard types, using synthetic data
  • Python script exports 60+ IEP compliance color-coding rules (Initial eval, Annual IEP, Tri-Annual) across 3 compliance dimensions to machine-readable legend CSV
  • Documentation generation: rule descriptions formatted for both human review and system reference
Power BI Desktop · Python · pandas · openpyxl
Enabled client demos across all iTAAP dashboard types without production data exposure
Automation & Web Scraping
20 · Automation

SEIS Column Selection Automation

Each SEIS data export required manually clicking ~103 checkboxes in the SEIS portal per LEA, taking 30–60 minutes per cycle — and any missed column required starting over for that LEA.

  • 103 required columns externalized to `columns_needed.txt`; per-LEA credentials in git-ignored `credentials.json`
  • Selenium iterates through the column list, selects each checkbox, triggers export download, and saves with timestamps
  • Explicit waits (not fixed sleeps), retry logic for transient failures, and per-run timestamped log file
Python · Selenium · ChromeDriver
30–60 min → ~8 min unattended · eliminated missed-column errors entirely
21 · Automation

CALPADS 8.1a ODS Report Scraper

The CALPADS 8.1a Student Profile Exits report is embedded inside a nested SSRS iframe and cannot be accessed via API — downloading it required manual portal navigation every week.

  • Playwright handles CALPADS login, outer page navigation, iframe frame-switching (nested context manager)
  • Dynamic date parameterization injected into SSRS report viewer; nested popup + download event capture
  • Custom date-stamped filename written to output directory on success
Python · Playwright
Weekly CALPADS SSRS report automated · consistent date-stamped output
22 · Automation

SBCUSD SEIS Multi-Report Automation

A large school district needed 4 different SEIS report types downloaded weekly from different portal sections, with optional MFA and a cross-month datepicker that reset on navigation.

  • Single-session Playwright automation: MFA detection via regex match on page text before attempting bypass
  • Saved search URL execution for 2 report types; Service Tracker navigation for 1; MFA-aware general download for 1
  • Cross-month datepicker traversal: detects current displayed month and clicks navigation arrows as needed
  • Paramiko SFTP upload after all 4 downloads complete in one session
Python · Playwright · paramiko
4 SEIS report types automated in a single unattended session for a large district
23 · Automation

Automated Weekly Email Alert System

18 school sites had no regular visibility into their performance metrics — administrators only saw data when they logged into Power BI, which many never did.

  • Gmail API OAuth2 pipeline (credentials.json → token.json) with scoped send-only permissions
  • Dynamic metric selection: rotates 1 General Ed + 1 SPED metric from a configurable pool of 12 indicators each week
  • SQL Server queries build HTML-formatted data tables with inline styling for email clients
  • Test mode: single flag sends to internal addresses only — safe for non-technical staff to test
Python · Gmail API · OAuth2 · pandas · SQLAlchemy
18 school sites · 12 metric pool · zero-touch weekly delivery
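The weekly metric rotation can be sketched as a deterministic pick from two pools keyed on the ISO week number, so the same week always selects the same 1 General Ed + 1 SPED pair. The metric names below are invented placeholders, not the actual indicator pool:

```python
# Hypothetical pools: 6 General Ed + 6 SPED = the 12-indicator pool described
GENERAL_ED_POOL = ["chronic_absenteeism", "suspension_rate", "caaspp_ela",
                   "caaspp_math", "elpac", "graduation_rate"]
SPED_POOL = ["iep_compliance", "lre_placement", "service_hours",
             "triennial_due", "assessment_plans", "dropout_verification"]

def metrics_for_week(iso_week):
    """Deterministic rotation: 1 General Ed + 1 SPED metric per week,
    cycling through each pool by week number."""
    return (GENERAL_ED_POOL[iso_week % len(GENERAL_ED_POOL)],
            SPED_POOL[iso_week % len(SPED_POOL)])

ge, sped = metrics_for_week(14)
```

Keying on the week number rather than random choice makes reruns idempotent: resending a failed week's email picks the same metrics.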
24 · Automation

SQL Stored Procedure Runner

14 stored procedures per district database needed sequential execution with a validation checkpoint in the middle — manual runs caused missed steps, wrong order, and no audit record.

  • Python CLI + Tkinter GUI runner executes 3 SP groups per database in correct sequence
  • VerifyDropOuts checkpoint between groups: if it fails, downstream SPs are blocked (prevents calculations on unverified dropout data)
  • Each SP execution is timed and logged to a timestamped CSV with stdout/stderr capture
Python · pyodbc · Tkinter
Zero missed SP steps or wrong-order executions since deployment
25 · Automation

Power BI Bulk Refresh Automation

150+ .pbix files across 6 directories needed regular refresh after each pipeline run — manually opening and refreshing each file in Power BI Desktop was taking hours.

  • PowerShell WinForms GUI with folder picker and progress display
  • Smart completion detection: monitors Power BI Desktop UI state (title bar text) + CPU-idle fallback for files that don't update the title
  • Resume-from-log: skips files already refreshed in a previous run; configurable concurrency limits to avoid memory exhaustion
PowerShell · WinForms · Windows Process API
150+ .pbix files · single-click automated bulk refresh with audit log
26 · Web Scraping

ASHA Pro Finder Async Scraper

Districts needed to identify California SLP and audiologist contractors from the ASHA directory — no bulk export existed, requiring individual profile page visits for each of thousands of providers.

  • Phase 1: async Playwright paginates the Coveo search API with POST requests (up to 6,000 profiles), extracting IDs and metadata via BeautifulSoup regex matching on inconsistent HTML attributes
  • Phase 2: async profile scraping extracts 18 fields (name, certifications, phone, address, specialty, education) per page
  • Per-record error isolation: failed pages log a warning and continue — the full run never aborts on a single bad page
  • Results streamed to UTF-8-sig CSV incrementally; configurable concurrency limits to avoid rate limiting
asyncio · Playwright · BeautifulSoup · Python · pandas
~6,000 SLP & audiologist profiles · 18 fields per record · complete California directory
Business Intelligence & Reporting
27 · Business Intelligence

Power BI Reporting Suite

25+ districts needed consistent, self-service reporting across 9 analytics domains — building and maintaining dashboards manually for each district was not scalable.

  • 9 dashboard types × 20+ districts = 150+ production .pbix files: iTAAP main, Mini, EAP, IEP Compliance, SPP, Student Service, Service Projection, SST/504, SchoolCard
  • Template-based deployment: changes propagate to all districts by updating the template and redistributing
  • All dashboards connect live to SQL Server; automated refresh via PowerShell automation after each pipeline run
Power BI Desktop · SQL Server (live) · DAX · Python · PowerShell
150+ production dashboards · 25+ districts · 9 dashboard types · weekly automated refresh
28 · SQL / Analytics

SQL Server Stored Procedure Library

Complex compliance metrics needed consistent calculation logic across 94 district databases — ad-hoc queries per district produced inconsistent results that broke reporting.

  • 14 SP types replicated per district: chronic absenteeism with day-multiplier projection, IDEA Indicators 9–10 disproportionality (age filtering, ethnicity normalization, LRE bucketing, ~320KB), LRE calculation, geospatial comparable schools (geography data type), ELPAC, graduation cohorts, CCR
  • Window functions, dynamic SQL, statistical mode calculation for ethnicity majority determination
  • Geospatial comparable schools SP uses SQL Server geography type for radius-based school matching
T-SQL · SQL Server · geography data type
~94 district SP instances · powers all Power BI dashboards · consistent compliance metrics
29 · Observability

Data Freshness Tracker

Power BI dashboard consumers had no way to know if they were looking at fresh or stale data — two silent pipeline failures went unnoticed until decisions were already made.

  • Python monitors file timestamps for all pipeline output directories using pathlib
  • Upserts last-modified timestamp to SQL Server DataDateTime table after each successful pipeline stage
  • Power BI reads the DataDateTime table live — each dashboard shows a "last updated" indicator in the corner
Python · pathlib · pandas · SQLAlchemy
Caught 2 silent pipeline failures before decisions were made on stale data
30 · Tooling

Power BI File Data Extractor

Power BI .pbix files are opaque ZIP containers — auditing embedded data models or dashboard structure required opening each file in Power BI Desktop manually, one at a time.

  • pbix_extractor.py treats .pbix as ZIP archive and uses pbixray to extract all embedded data model tables to CSV for programmatic audit
  • pbix_analyzer.py reads the Report/Layout JSON file inside .pbix — encoded as UTF-16-LE with a binary preamble — to extract visual structure, DAX measures, and report metadata
  • CLI design: directory-level batch processing across all .pbix files in a folder
Python · pbixray · zipfile · pandas
Programmatic .pbix audit without Power BI Desktop · supports 150+ file reporting suite
Software Engineering
31 · Go / Systems

SBCUSD Aeries API Go Client

12 Aeries SIS dataset types needed to be ingested into MongoDB for analytics — high-volume iteration over all schools required reliable per-entity error recovery that Python threading made complex.

  • Go CLI with `-fetch=all` and comma-separated selective mode; `flag` package with `regexp` validation of year format
  • Dynamic high school discovery: queries the loaded schools collection for `HighGradeLevel ≥ 9` — no hardcoded school lists, no maintenance when schools change
  • Per-school error recovery with idiomatic Go `(value, error)` returns — a single API failure logs and continues without aborting the full run
  • `init()` guard validates config placeholders before any network call; MongoDB ping confirms connectivity at startup
  • Idempotent `clearCollection` + `InsertMany` pattern safe to rerun without duplicate data
Go · MongoDB · net/http · go.mongodb.org/mongo-driver · encoding/json
12 dataset types · dynamic school classification · zero maintenance · compiled single binary
32 · Web App

California School Radar Map

Administrators had no interactive way to find and compare nearby schools on multiple performance indicators geographically — static reports required a GIS analyst to produce and were instantly outdated.

  • Streamlit app backed by SQL Server CTE view joining 12 indicators (CAASPP, ELPAC, chronic absenteeism, suspension, graduation, CCR) across 10,000+ schools
  • Vectorized Haversine distance search using NumPy broadcasting: sub-second proximity filtering on the full dataset without a spatial index
  • UI: zip-code geocoding jump, click-to-move map centering, dynamic indicator selection from sidebar, Streamlit session state preserving position/selections across interactions
  • Radar chart renders any two selected schools side-by-side across all 6 performance indicators for direct comparison
Python · Streamlit · Plotly Mapbox · NumPy · pandas · SQL Server
10,000+ California schools mapped · live proximity filtering · sub-second performance
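The vectorized Haversine search can be sketched with NumPy broadcasting: one origin against arrays of every school's latitude and longitude, computed in a single pass with no Python loop. Coordinates below are hypothetical examples:

```python
import numpy as np

EARTH_RADIUS_MILES = 3958.8

def haversine_miles_vec(origin, lats, lons):
    """Distance in miles from one origin to every school at once.

    lats/lons are NumPy arrays; broadcasting computes all distances in
    a single vectorized expression, which is why the proximity filter
    stays sub-second even over 10,000+ rows.
    """
    lat0, lon0 = np.radians(origin[0]), np.radians(origin[1])
    lat, lon = np.radians(lats), np.radians(lons)
    a = (np.sin((lat - lat0) / 2) ** 2
         + np.cos(lat0) * np.cos(lat) * np.sin((lon - lon0) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))

# Origin in San Diego; one nearby school and one in Los Angeles
lats = np.array([32.80, 34.05])
lons = np.array([-117.10, -118.24])
dist = haversine_miles_vec((32.7157, -117.1611), lats, lons)
within_30 = dist <= 30.0
```

The boolean mask `within_30` then indexes straight back into the school DataFrame, so no spatial index is needed for interactive filtering.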