Data Engineer @ Tredence Inc.
- 98% pipeline success across 150+ curated tables; 100+ Azure Data Factory (ADF) pipelines moving 100+ GB/day of retail data from Oracle and cloud sources into Databricks + Delta Lake.
- Built Bronze / Silver / Gold Delta layers in PySpark + Spark SQL, standardizing customer, transaction, tender, discount, and store datasets for reporting.
- Implemented CDC-aware incremental ingestion with Delta MERGE/upsert patterns, keeping high-volume curated tables synchronized without costly full reloads.
- Implemented SCD Type 2 dimensional logic that preserves historical attribute values while exposing current-state records to downstream analytics and BI.
- Cut critical ETL runtime 89% (3h → 20m) and pushed throughput 4 → 5 GB/min via join tuning, partitioning, predicate pushdown, shuffle optimization, and right-sizing.
- +25% query performance on a 20 TB Azure Synapse warehouse via dimensional modeling, partitioning, strategic indexing, and statistics maintenance.
- Built validation and reconciliation checks in SQL/PySpark with Soda-style rules, catching schema, null, duplicate, referential-integrity, and row-count issues before data reaches BI.
- −45% audit effort via Unity Catalog automated runtime lineage and RBAC across raw → reporting layers.
- −40% production deploy errors by automating Databricks + ADF CI/CD on GitHub Actions and Azure DevOps.
- Partnered with 30+ analytics and ML stakeholders to define data contracts and sustain strict SLA-bound reporting.
- Delivered 20+ Power BI and Tableau dashboards on the customer-360 gold layer, tracking in-store, online, and delivery-partner activity across departments.

