Data Science · Etihad Airways Pilot Training

Professional Work.

Turning training operations data into decisions at Etihad Airways pilot training — from EBT/SBT analytics and predictive ML to full-stack platforms, NLP and compliance automation, across a fleet of 106+ aircraft. These are case-study overviews of internal work.

🔒 Overview & methodology only — no proprietary data, systems, or pilot information shown
544 pilots analysed
15,351 training records modelled
8 internal tools built
32-table TMS Oracle schema
01 · Training Analytics
🔒 Production · internal
TMS KPI Pipeline
Problem: New-Joiner training outcomes were known only retrospectively, scattered across raw Oracle TMS exports.
Approach: A Python pipeline over the TMS Oracle schema that computes training KPIs on a rolling 24-month window and renders VP-level HTML + Excel dashboards, with root causes mapped to ICAO competency codes.
Impact: Turned ad-hoc exports into a repeatable management reporting pipeline with root-cause analysis across the New-Joiner cohort.
544 pilots · 15,351 records · 24-month rolling window
Python · Oracle · pandas · Power BI
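A minimal sketch of the rolling-window KPI idea from the TMS KPI Pipeline card, using only the Python standard library so that it runs anywhere. The record keys (`event_date`, `passed`, `root_cause`), the sample rows and the competency code strings are hypothetical placeholders, not the TMS schema or real data, and the production pipeline may compute its KPIs quite differently.

```python
from collections import Counter
from datetime import date


def months_back(anchor: date, months: int) -> date:
    """Return the date `months` calendar months before `anchor` (day clamped to 28)."""
    total = anchor.year * 12 + (anchor.month - 1) - months
    return date(total // 12, total % 12 + 1, min(anchor.day, 28))


def rolling_kpis(records, as_of: date, window_months: int = 24) -> dict:
    """Pass rate and root-cause counts for records inside the rolling window.

    `records` is an iterable of dicts with hypothetical keys:
    'event_date' (date), 'passed' (bool), 'root_cause' (competency code or None).
    """
    start = months_back(as_of, window_months)
    in_window = [r for r in records if start < r["event_date"] <= as_of]
    if not in_window:
        return {"n": 0, "pass_rate": None, "root_causes": {}}
    passed = sum(1 for r in in_window if r["passed"])
    # Only unsuccessful events contribute a root cause.
    causes = Counter(
        r["root_cause"] for r in in_window if not r["passed"] and r["root_cause"]
    )
    return {
        "n": len(in_window),
        "pass_rate": passed / len(in_window),
        "root_causes": dict(causes),
    }


# Invented sample rows for illustration only.
records = [
    {"event_date": date(2022, 1, 10), "passed": False, "root_cause": "SAW"},  # outside window
    {"event_date": date(2023, 6, 1), "passed": True, "root_cause": None},
    {"event_date": date(2024, 2, 15), "passed": False, "root_cause": "WLM"},
    {"event_date": date(2024, 5, 20), "passed": True, "root_cause": None},
]
kpis = rolling_kpis(records, as_of=date(2024, 6, 30))
```

Re-running `rolling_kpis` with a later `as_of` date slides the window forward, which is what makes the report repeatable rather than a one-off export.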
🔒 Production · internal
EBT Analytics — IRR & Taxonomy
Problem: Instructor grading consistency was hard to measure, and debrief data wasn't machine-readable against competencies.
Approach: A JSON taxonomy mapping task rows to the 9 ICAO competency codes, with inter-rater reliability measured by Gwet's AC1 (chosen over Cohen's kappa to avoid the kappa paradox); an AC1 below threshold auto-flags calibration.
Impact: Made instructor consistency measurable and auditable, with calibration flags built into VP dashboards and GCAA audit prep.
9 ICAO competencies · Gwet AC1 · ICAP controlled event
Python · Statistics · ICAO EBT · IRR
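To illustrate why AC1 can be preferred over Cohen's kappa, the sketch below computes both for two raters on invented grades. It uses the standard two-rater formulas: kappa takes chance agreement from the product of each rater's marginals, while AC1 uses the sum of pi_k(1 - pi_k) over categories divided by (K - 1), where pi_k is the average marginal for category k. The function names and the `AC1_THRESHOLD` value are assumptions for illustration; the card does not state the threshold actually used.

```python
from collections import Counter

AC1_THRESHOLD = 0.6  # hypothetical calibration threshold, not from the source


def agreement_stats(ratings_a, ratings_b, categories):
    """Return (observed agreement, Cohen's kappa, Gwet's AC1) for two raters."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if len(categories) < 2:
        raise ValueError("the grading scale needs at least two categories")
    n = len(ratings_a)
    p_obs = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)

    # Cohen's kappa: chance agreement from the product of the raters' marginals.
    pe_kappa = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    kappa = (p_obs - pe_kappa) / (1 - pe_kappa) if pe_kappa < 1 else float("nan")

    # Gwet's AC1: chance agreement from the average marginal per category.
    pi = [(count_a[c] + count_b[c]) / (2 * n) for c in categories]
    pe_ac1 = sum(p * (1 - p) for p in pi) / (len(categories) - 1)
    ac1 = (p_obs - pe_ac1) / (1 - pe_ac1)
    return p_obs, kappa, ac1


def needs_calibration(ac1, threshold=AC1_THRESHOLD):
    """Flag a rater pair for calibration when AC1 falls below the threshold."""
    return ac1 < threshold


# Invented grades: 90 joint passes and 10 disagreements, so prevalence is skewed.
rater_a = ["pass"] * 95 + ["fail"] * 5
rater_b = ["pass"] * 90 + ["fail"] * 5 + ["pass"] * 5
p_obs, kappa, ac1 = agreement_stats(rater_a, rater_b, ["pass", "fail"])
```

With 90% raw agreement on this skewed sample, kappa comes out slightly negative while AC1 is roughly 0.89. That gap is the kappa paradox the card refers to.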
02 · Predictive & Adaptive
🔒 Research
Predictive Training
Problem: Training risk is only visible after a check, which is too late to intervene.
Approach: An ML pipeline that forecasts at-risk pilots before events, progressing from a logistic baseline to gradient-boosted trees to sequence models, on engineered competency-trajectory, recency and fleet-history features.
Schematic: Oracle TMS → feature pipeline → model → risk dashboard
Impact: Shifts the organisation from retrospective review to proactive, early and targeted instructor intervention.
scikit-learn baseline → XGBoost → sequence models
scikit-learn · XGBoost · FastAPI · Feature eng.
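The feature families named in the Predictive Training card (competency trajectory, recency, fleet history) could be derived along these lines. This is a sketch under stated assumptions: the `(event_date, fleet, score)` tuple layout, the feature names and the 1 to 5 score scale are hypothetical, and the models that would consume these features are left out.

```python
from datetime import date


def trajectory_slope(scores):
    """Least-squares slope of scores against event index (0, 1, 2, ...)."""
    n = len(scores)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(scores) / n
    cov = sum((i - mean_x) * (s - mean_y) for i, s in enumerate(scores))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


def build_features(history, as_of, current_fleet):
    """Build one feature row from a pilot's event history.

    `history` is a non-empty list of (event_date, fleet, score) tuples,
    oldest first. The layout is an assumption made for this sketch.
    """
    scores = [score for _, _, score in history]
    last_date = history[-1][0]
    return {
        # Trajectory: negative slope means scores are trending down.
        "score_slope": trajectory_slope(scores),
        # Recency: days since the most recent training event.
        "days_since_last_event": (as_of - last_date).days,
        # Fleet history: experience on the current type and number of types flown.
        "events_on_current_fleet": sum(1 for _, f, _ in history if f == current_fleet),
        "fleets_flown": len({f for _, f, _ in history}),
    }


# Invented history for illustration only.
history = [
    (date(2023, 1, 10), "A320", 4.0),
    (date(2023, 7, 5), "A320", 3.5),
    (date(2024, 1, 20), "B787", 3.0),
]
features = build_features(history, as_of=date(2024, 3, 1), current_fleet="B787")
```

A row like `features` is the kind of input a logistic baseline or gradient-boosted model would be trained on, one row per pilot and forecast date.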
🔒 Design
Adaptive Training Lifecycle
Problem: Fixed annual/biannual EBT cycles train a steady high-performer and an at-risk pilot identically.
Approach: A lifecycle decision engine that adjusts training frequency and content from performance signals and predictive risk; triggers include score decline, competency gaps, recency and fleet transition. Rule-based v1, GCAA-aligned.
Impact: Replaces calendar-driven training with a performance-indexed cadence, adding touchpoints where they matter.
Rule-based engine · feeds the EBT Platform
Rules engine · EBT · Regulatory
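A rule-based first version of such a decision engine might look like the sketch below. The four trigger types come from the card; every threshold, interval and field name is a hypothetical placeholder, and the sketch only ever shortens the interval relative to the base cycle so that it never relaxes a calendar requirement.

```python
from dataclasses import dataclass, field

# All thresholds and intervals below are hypothetical placeholders.
BASE_INTERVAL_MONTHS = 12
SCORE_DECLINE_THRESHOLD = -0.5
RECENCY_LIMIT_DAYS = 270
RISK_THRESHOLD = 0.7


@dataclass
class PilotSignals:
    score_delta: float                                  # change vs previous cycle
    competency_gaps: list = field(default_factory=list)
    days_since_last_event: int = 0
    fleet_transition: bool = False
    predicted_risk: float = 0.0                         # 0..1 from a risk model


def recommend(signals: PilotSignals) -> dict:
    """Map performance signals to a training interval and focus areas."""
    triggers = []
    if signals.score_delta <= SCORE_DECLINE_THRESHOLD:
        triggers.append("score_decline")
    if signals.competency_gaps:
        triggers.append("competency_gap")
    if signals.days_since_last_event > RECENCY_LIMIT_DAYS:
        triggers.append("recency")
    if signals.fleet_transition:
        triggers.append("fleet_transition")
    if signals.predicted_risk >= RISK_THRESHOLD:
        triggers.append("predicted_risk")

    # More triggers means a shorter interval; never longer than the base cycle.
    if not triggers:
        interval = BASE_INTERVAL_MONTHS
    elif len(triggers) == 1:
        interval = 6
    else:
        interval = 3
    return {
        "interval_months": interval,
        "triggers": triggers,
        "focus": sorted(signals.competency_gaps),
    }


steady = recommend(PilotSignals(score_delta=0.1))
at_risk = recommend(
    PilotSignals(score_delta=-0.8, competency_gaps=["WLM", "SAW"], predicted_risk=0.75)
)
```

Returning the list of fired triggers alongside the interval keeps each recommendation explainable, which matters for a regulator-facing process.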
03 · Platforms & Engineering
🔒 Production-oriented · internal
EBT Platform
Problem: Competency-based assessment needs a real system, not spreadsheets and PDFs.
Approach: A full-stack platform: React + Vite + Tailwind (TypeScript) front end, FastAPI + Alembic back end, Dockerised with GitHub Actions CI, plus a scenario subsystem wired to a regulatory compliance engine.
Impact: The intended host application for analytics, training recommendations, document management and the adaptive lifecycle view.
React + FastAPI · Docker · CI · compliance engine
React · FastAPI · Docker · CI/CD
🔒 Design
Technical Writing Platform
Problem: Training content (SOPs, scenarios, OCC/STC docs) lives in scattered Word, PDF and SharePoint files with no version control or competency linking.
Approach: A docs-as-code authoring model: Markdown + metadata, version control, competency-code tagging and review workflows that render and export into the EBT Platform.
Impact: A single, versioned, competency-tagged source of training content feeding the platform.
Markdown + metadata · Git-versioned · competency-tagged
Docs-as-code · Markdown · Git · Schema
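As a sketch of what "Markdown + metadata" with competency tagging could mean in practice, the example below parses a simple `key: value` metadata block delimited by `---` lines and checks the competency tags against a known set. The metadata field names, the sample document and the nine code strings in `KNOWN_CODES` are assumptions for illustration, and the parser is deliberately a small stand-in rather than a full YAML implementation.

```python
# Assumed code strings for the nine competencies; treat these as placeholders.
KNOWN_CODES = {"PRO", "COM", "FPA", "FPM", "LTW", "PSD", "SAW", "WLM", "KNO"}
REQUIRED_FIELDS = ("title", "version", "competencies")  # hypothetical schema


def parse_document(text):
    """Split a document into (metadata dict, Markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing metadata block")
    end = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if end is None:
        raise ValueError("unterminated metadata block")
    meta = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed metadata line: {line!r}")
        meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


def validate(meta):
    """Return a list of human-readable problems; empty means the metadata is valid."""
    errors = [f"missing field: {key}" for key in REQUIRED_FIELDS if key not in meta]
    tags = [t.strip() for t in meta.get("competencies", "").split(",") if t.strip()]
    unknown = sorted(set(tags) - KNOWN_CODES)
    if unknown:
        errors.append("unknown competency codes: " + ", ".join(unknown))
    return errors


# Invented sample document for illustration only.
DOC = """---
title: Example scenario
version: 1.2.0
competencies: PRO, FPM, XYZ
---
# Scenario

Brief text.
"""
meta, body = parse_document(DOC)
problems = validate(meta)
```

A check like `validate` is the sort of step that could run in a review workflow, rejecting a change before untagged or mistagged content reaches the platform.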
04 · NLP & Compliance Automation
🔒 Production · internal
NLP Document Intelligence
Problem: Surfacing topic patterns and unmapped competencies across large training-document corpora by hand is impractical.
Approach: TF-IDF-based NLP over A320 OCC and B787 STC documents: topic-frequency analysis, taxonomy building and detection of topics not yet mapped to competencies, output as HTML and Excel analytics.
Impact: Turned dense procedure documents into a structured, queryable competency taxonomy.
A320 OCC + B787 STC corpora · taxonomy + gap detection
scikit-learn · TF-IDF · pandas · NLP
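The gap-detection step can be sketched with a hand-rolled TF-IDF, written here with the standard library only rather than scikit-learn, so the weights use the plain `tf * log(N / df)` form and will not match a smoothed implementation. The sample texts, the taxonomy entries and the function names are invented for illustration and do not reflect the real corpora.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        weights.append(
            {t: (c / total) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return weights


def unmapped_terms(docs, taxonomy, top_k=3):
    """Top-weighted terms per document that no taxonomy entry covers."""
    mapped = {term for terms in taxonomy.values() for term in terms}
    result = []
    for doc_weights in tfidf(docs):
        # Sort by descending weight, then alphabetically for stable output.
        ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
        result.append([t for t, w in ranked if w > 0 and t not in mapped][:top_k])
    return result


# Invented mini-corpus and taxonomy for illustration only.
docs = [
    "checklist checklist callout windshear",
    "checklist callout workload workload",
    "checklist callout callout",
]
taxonomy = {"PRO": ["checklist"], "COM": ["callout"]}
gaps = unmapped_terms(docs, taxonomy)
```

Terms that appear in every document get a weight of zero, so only distinctive, unmapped terms such as `windshear` and `workload` surface as candidates for a new taxonomy entry.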
🔒 Production · internal
GCAA Licensing Dashboard
Problem: Crew-licensing attestation was a manual download-and-track chore prone to expiry gaps.
Approach: A Streamlit dashboard and automation toolkit: automated attestation downloads, a SharePoint pipeline, status tracking and PDF report generation. The visual system was reused for IRR/ICAP audit dashboards.
Impact: Automated a manual compliance workflow end-to-end, with a reusable audit-grade reporting layer.
Streamlit · SharePoint automation · PDF reporting
Streamlit · Python · SharePoint · Automation
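The status-tracking part of such a dashboard reduces to classifying each attestation by how close it is to expiry. The sketch below assumes a hypothetical 60-day warning window, invented crew identifiers and a plain mapping of identifier to expiry date; the download, SharePoint and PDF steps are out of scope here.

```python
from datetime import date

WARN_DAYS = 60  # hypothetical warning window, not from the source


def licence_status(expiry: date, today: date, warn_days: int = WARN_DAYS) -> str:
    """Classify one attestation as 'expired', 'expiring' or 'valid'."""
    days_left = (expiry - today).days
    if days_left < 0:
        return "expired"
    if days_left <= warn_days:
        return "expiring"
    return "valid"


def status_summary(licences, today: date, warn_days: int = WARN_DAYS) -> dict:
    """Group crew identifiers by status so that gaps are visible at a glance."""
    summary = {"valid": [], "expiring": [], "expired": []}
    for crew_id, expiry in licences.items():
        summary[licence_status(expiry, today, warn_days)].append(crew_id)
    return {status: sorted(ids) for status, ids in summary.items()}


# Invented identifiers and dates for illustration only.
licences = {
    "C001": date(2024, 5, 15),
    "C002": date(2024, 7, 1),
    "C003": date(2025, 1, 1),
}
summary = status_summary(licences, today=date(2024, 6, 1))
```

Passing `today` explicitly instead of reading the system clock keeps the classification reproducible, which helps when a report has to be regenerated for an audit.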
Confidentiality — This page reflects professional data-science and engineering work carried out for Etihad Airways pilot training. It presents methodology and overviews only: no pilot information, proprietary systems, live links, or training outcome data are shown. Scale figures denote analysis volume, not results. Not affiliated with, or endorsed by, Etihad Airways.