// intro.reel

Shreyas Mysore Sudesh

Data Engineer.

Cloud-native ELT pipelines, CDC + SCD Type 2 lakehouse models, and governed analytics across Azure, AWS, and GCP.

View Projects → · Download Resume

LinkedIn · GitHub

› Azure Databricks › Delta Lake › Snowflake · dbt › Kafka Streaming

lakehouse · live · v2026.06

streaming

10k/min

batch

100+ GB/day

warehouse

20 TB

3+

Years Experience

100+

ADF Pipelines

150+

Curated Tables

98%

Pipeline Success

100+ GB

Daily Processing

20 TB

Synapse Tuned

// 01 · about

Engineer of cloud-native data platforms.

At Tredence, I owned a 100+ GB/day Azure pipeline end-to-end across ingestion, transformation, data quality, and CI/CD — built on Azure Databricks and Delta Lake — and tuned a 20 TB Synapse warehouse to 25% faster queries. Experienced with dbt for modular SQL transformations, Kafka streaming ingest, Terraform provisioning, and Snowflake modeling across personal and open-source projects.

Location

United States

Email

shreyasmsudesh28@gmail.com

Phone

518-956-5361

Education

M.S. BA · UAlbany · 3.8

Certifications

✓ Databricks Certified Data Engineer Associate (Earned · 2024)
✓ Great Learning: Data Analytics Program (Earned · 2022–2023)
○ Microsoft DP-203: Azure Data Engineer (Status · Planned)
○ Google Professional Data Engineer (Status · Planned)

// 02 · stack

Skill matrix.

Production-grade tooling I reach for across ingestion, transformation, storage, and serving.

Data Engineering

Azure Data Factory · Databricks · Delta Lake · PySpark · Spark SQL · AWS Glue · dbt · Apache Airflow · CDC · SCD Type 2 · Delta MERGE

Streaming & Integration

Apache Kafka · Google Pub/Sub · Spark Structured Streaming · Event-Driven Architecture · Stream-to-Batch

Cloud & Infrastructure

Azure · AWS · GCP · Terraform · Azure DevOps · GitHub Actions · AWS Lambda · Step Functions

Data Warehousing

Azure Synapse · Snowflake · BigQuery · Redshift · PostgreSQL · Oracle

Programming & SQL

Python · Spark SQL · T-SQL · PostgreSQL · Oracle SQL · MySQL · Git · Jira

ML & BI

XGBoost · BigQuery ML · Scikit-learn · Power BI · Tableau · Looker Studio

Governance & Quality

Unity Catalog · Data Lineage · RBAC · Data Contracts · Soda · dbt tests · Reconciliation · DQ Monitoring

// 03 · experience

Production deployments, not demos.

Three roles, one through-line: cloud-native pipelines that scale and stay reliable in production.

Data Engineer @ Tredence Inc.

Bengaluru, India · Mar 2023 — Aug 2024

98% pipeline success across 150+ curated tables; 100+ ADF pipelines moving 100+ GB/day of retail data from Oracle and cloud sources into Databricks + Delta Lake.
Built Bronze / Silver / Gold Delta layers in PySpark + Spark SQL standardizing customer, transaction, tender, discount, and store datasets for reporting.
Implemented CDC-aware incremental ingestion with Delta MERGE/upsert patterns, keeping high-volume curated tables synchronized without costly full reloads.
Implemented SCD Type 2 dimensional logic that preserves historical attribute values while exposing current-state records to downstream analytics and BI.
Cut critical ETL runtime 89% (3h → 20m) and pushed throughput 4 → 5 GB/min via join tuning, partitioning, predicate pushdown, shuffle optimization, and right-sizing.
+25% query performance on a 20 TB Azure Synapse warehouse via dimensional modeling, partitioning, strategic indexing, and statistics maintenance.
Built validation and reconciliation checks in SQL/PySpark with Soda-style rules, catching schema, null, duplicate, referential-integrity, and row-count issues before data reached BI.
−45% audit effort via Unity Catalog automated runtime lineage and RBAC across raw → reporting layers.
−40% production deploy errors by automating Databricks + ADF CI/CD on GitHub Actions and Azure DevOps.
Partnered with 30+ analytics and ML stakeholders to define data contracts and sustain strict SLA-bound reporting.
Delivered 20+ Power BI and Tableau dashboards on the customer-360 gold layer, tracking in-store, online, and delivery-partner activity across departments.
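
The CDC upsert and SCD Type 2 bullets above can be illustrated with a minimal pure-Python sketch. It is not the production Delta MERGE logic: the `apply_scd2` function, the `DimRow` shape, the `customer_id` / `city` fields, and the dates are all hypothetical, chosen only to show how a changed attribute closes the current row and opens a new one while unchanged rows are left alone.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    # Hypothetical dimension row; real tables would carry more attributes.
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date]  # None means the row is still open
    is_current: bool


def apply_scd2(dim: List[DimRow], changes: List[dict], as_of: date) -> List[DimRow]:
    """Merge a batch of CDC records into dimension rows, SCD Type 2 style."""
    current: Dict[int, DimRow] = {r.customer_id: r for r in dim if r.is_current}
    result = [r for r in dim if not r.is_current]  # closed history is never rewritten
    for change in changes:
        key = change["customer_id"]
        existing = current.get(key)
        if existing is not None and existing.city == change["city"]:
            continue  # attribute unchanged: nothing to do
        if existing is not None:
            # Close the old version instead of overwriting it.
            result.append(replace(existing, valid_to=as_of, is_current=False))
        # Open a new current version (also covers brand-new keys).
        current[key] = DimRow(key, change["city"], as_of, None, True)
    result.extend(current.values())
    return sorted(result, key=lambda r: (r.customer_id, r.valid_from))


dim = [DimRow(1, "Albany", date(2024, 1, 1), None, True)]
changes = [{"customer_id": 1, "city": "Boston"}, {"customer_id": 2, "city": "Mysore"}]
updated = apply_scd2(dim, changes, as_of=date(2024, 6, 1))
```

After the call, customer 1 has a closed "Albany" row plus a current "Boston" row, and customer 2 has a single current row. In a Delta Lake setting the same close-and-insert step would typically be expressed as a MERGE rather than in-memory lists.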

98%

Success Rate

100+

Pipelines

150+

Tables

20 TB

Synapse

20+

Dashboards

3h → 20m

Runtime

Research Project Assistant @ SUNY Research Foundation

Albany, NY · Apr 2025 — May 2026

Accelerated pipeline runtime 75% (4h → 1h) for 70 GB/week of research data on ADF + Databricks with Python, PySpark, Spark SQL.
Cut data delivery turnaround 50% — next-business-day to same-day — by streamlining requirements with faculty researchers.
100% pipeline reproducibility & auditability for 90+ researchers and students via end-to-end lineage and documented workflows.

70 GB

Weekly

4h → 1h

Runtime

90+

Researchers

100%

Reproducible

Associate Data Engineer @ Askar Microns Pvt Ltd

Mysore, India · Sep 2021 — Feb 2023

+25% data availability and −30% Redshift load delay across 30+ AWS Glue ETL pipelines unifying ERP, production, procurement, finance, sales, and marketing data.
Report prep 1 week → < 3h (−95%) via self-service Tableau and Power BI KPI dashboards adopted across 7+ business functions.
−35% ETL infra overhead by migrating cron jobs to serverless Lambda + Step Functions orchestrating Glue.
+40% faster ad-hoc analysis by exposing the S3 data lake through AWS Athena ahead of Redshift loads.
Authored SQL transformation logic and reporting datasets for procurement, production, inventory, finance, sales, marketing, and executive KPI tracking.

30+

Glue Pipelines

7+

Functions

1wk → 3h

Reporting

−35%

ETL Overhead

// 04 · projects

Featured platforms.

Open-source and personal builds that mirror enterprise patterns — Medallion, dbt, exactly-once streaming.

PROJECT · 01

Medallion · ML · GCP

NYC 311 Active Intelligence Pipeline

Event-driven Medallion architecture processing 4.4M+ service requests in BigQuery, with Census ACS geospatial joins and an XGBoost model.

4.4M+

Records

R² 0.786

Model Fit

ACS

Geospatial

PROJECT · 02

dbt · Snowflake · Airflow

Snowflake Analytics Engineering

dbt pipeline with staging / intermediate / mart layers and 40+ automated quality tests across 5M+ subscription and attribution records.

5M+

Records

dbt Models

40+

DQ Tests

PROJECT · 03

Kafka · Spark · Delta

Real-Time Streaming Platform

Exactly-once Kafka-to-Delta streaming with sub-5s end-to-end latency, processing 10,000 events/minute across 3 topics.

10k/min

Events

< 5s

Latency

3

Topics
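
One common way to reach exactly-once results is at-least-once delivery combined with an idempotent sink that remembers the last committed offset per topic-partition. The project's actual mechanism is not described on this page, so the sketch below is an assumption: `IdempotentSink`, its in-memory storage, and the `orders` topic name are hypothetical stand-ins for a real Kafka source and Delta table.

```python
from typing import Any, Dict, List, Tuple


class IdempotentSink:
    """Hypothetical in-memory sink that ignores replayed offsets."""

    def __init__(self) -> None:
        self.rows: List[dict] = []
        # Highest offset already written, per (topic, partition).
        self.committed: Dict[Tuple[str, int], int] = {}

    def write_batch(self, topic: str, partition: int,
                    records: List[Tuple[int, Any]]) -> int:
        """Write (offset, payload) records; return how many were new."""
        key = (topic, partition)
        last = self.committed.get(key, -1)
        ordered = sorted(records, key=lambda rec: rec[0])
        fresh = [(off, payload) for off, payload in ordered if off > last]
        if not fresh:
            return 0  # whole batch was a replay
        # A real sink must persist rows and the new offset atomically;
        # here both updates simply happen in the same call.
        self.rows.extend(
            {"topic": topic, "partition": partition, "offset": off, "payload": payload}
            for off, payload in fresh
        )
        self.committed[key] = fresh[-1][0]
        return len(fresh)


sink = IdempotentSink()
first = sink.write_batch("orders", 0, [(0, "a"), (1, "b")])
# Simulate a retry after a failure: offset 1 is delivered a second time.
retry = sink.write_batch("orders", 0, [(1, "b"), (2, "c")])
```

The retried batch only adds offset 2, so the sink ends with three rows even though four records were delivered.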

// 05 · architecture

Reference architectures.

Switch between the three platforms I've shipped most. Hover any node for the spec.

Azure Lakehouse

// 06 · resume

One PDF, full archive.

latest resume

Shreyas Mysore Sudesh — Data Engineer

Full role history, project metrics, certifications, and the complete tech stack in one PDF.

Download Resume ↓ · LinkedIn · GitHub · Email

FORMAT: PDF · A4
UPDATED: 2026 · Q2
ROLES: 3 · 3+ years
CERTS: 1 active · 2 planned
EDU: M.S. BA · UAlbany

// contact

Let's build something data-driven.

available for opportunities

Shreyas Mysore Sudesh

Data Engineer

United States

direct channels

Prefer email? I usually respond within 24 hours.

Email Me · Download Resume