SMS · Data Platform

Shreyas Mysore Sudesh

Data Engineer.

Cloud-native ELT pipelines, CDC + SCD Type 2 lakehouse models, and governed analytics across Azure, AWS, and GCP.

Azure Databricks · Delta Lake · Snowflake · dbt · Kafka Streaming
lakehouse · live · v2026.06
SOURCE → INGEST → LAKE → TRANSFORM → WAREHOUSE → BI
streaming: 10k/min
batch: 100+ GB/day
warehouse: 20 TB
3+ Years Experience
100+ ADF Pipelines
150+ Curated Tables
98% Pipeline Success
100+ GB Daily Processing
20 TB Synapse Tuned
// 01 · about

Engineer of cloud-native data platforms.

At Tredence, I owned a 100+ GB/day Azure pipeline end-to-end across ingestion, transformation, data quality, and CI/CD, built on Azure Databricks and Delta Lake, and tuned a 20 TB Synapse warehouse for 25% faster queries. I also work with dbt for modular SQL transformations, Kafka streaming ingest, Terraform provisioning, and Snowflake modeling across personal and open-source projects.

Location
United States
Email
shreyasmsudesh28@gmail.com
Phone
518-956-5361
Education
M.S. BA · UAlbany · 3.8
Certifications
  • Databricks Certified Data Engineer Associate
    Earned · 2024
  • Great Learning — Data Analytics Program
    Earned · 2022–2023
  • Microsoft DP-203 — Azure Data Engineer
    Status · Planned
  • Google Professional Data Engineer
    Status · Planned
// 02 · stack

Skill matrix.

Production-grade tooling I reach for across ingestion, transformation, storage, and serving.

Data Engineering
11
Azure Data Factory · Databricks · Delta Lake · PySpark · Spark SQL · AWS Glue · dbt · Apache Airflow · CDC · SCD Type 2 · Delta MERGE
Streaming & Integration
05
Apache Kafka · Google Pub/Sub · Spark Structured Streaming · Event-Driven Arch · Stream-to-Batch
Cloud & Infrastructure
08
Azure · AWS · GCP · Terraform · Azure DevOps · GitHub Actions · AWS Lambda · Step Functions
Data Warehousing
06
Azure Synapse · Snowflake · BigQuery · Redshift · PostgreSQL · Oracle
Programming & SQL
08
Python · Spark SQL · T-SQL · PostgreSQL · Oracle SQL · MySQL · Git · Jira
ML & BI
06
XGBoost · BigQuery ML · Scikit-learn · Power BI · Tableau · Looker Studio
Governance & Quality
08
Unity Catalog · Data Lineage · RBAC · Data Contracts · Soda · dbt tests · Reconciliation · DQ Monitoring
// 03 · experience

Production deployments, not demos.

Three roles, one through-line: cloud-native pipelines that scale and stay reliable in production.

01

Data Engineer @ Tredence Inc.

Bengaluru, India · Mar 2023 — Aug 2024
ORACLE → ADF → DATABRICKS → DELTA LAKE → SYNAPSE → POWER BI
  • 98% pipeline success across 150+ curated tables; 100+ ADF pipelines moving 100+ GB/day of retail data from Oracle and cloud sources into Databricks + Delta Lake.
  • Built Bronze / Silver / Gold Delta layers in PySpark + Spark SQL standardizing customer, transaction, tender, discount, and store datasets for reporting.
  • CDC-aware incremental ingestion with Delta MERGE/upsert patterns to keep high-volume curated tables synchronized without costly full reloads.
  • SCD Type 2 dimensional logic preserving historical attribute values while exposing current-state records to downstream analytics and BI.
  • Cut critical ETL runtime 89% (3h → 20m) and pushed throughput 4 → 5 GB/min via join tuning, partitioning, predicate pushdown, shuffle optimization, and right-sizing.
  • +25% query performance on a 20 TB Azure Synapse warehouse via dimensional modeling, partitioning, strategic indexing, and statistics maintenance.
  • Validation & reconciliation with SQL/PySpark and Soda-style rules catching schema, null, duplicate, referential-integrity, and row-count issues pre-BI.
  • −45% audit effort via Unity Catalog automated runtime lineage and RBAC across raw → reporting layers.
  • −40% production deploy errors by automating Databricks + ADF CI/CD on GitHub Actions and Azure DevOps.
  • Partnered with 30+ analytics and ML stakeholders to define data contracts and sustain strict SLA-bound reporting.
  • Delivered 20+ Power BI and Tableau dashboards on the customer-360 gold layer, tracking in-store, online, and delivery-partner activity across departments.
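
The CDC and SCD Type 2 bullets above can be illustrated with a minimal pure-Python sketch. It is not the production Delta MERGE code: the `DimRow` schema, the `apply_changes` helper, and the `customer_id` / `city` columns are hypothetical names chosen for illustration. It only shows the general pattern of closing out the old version of a record and appending a new current one.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DimRow:
    """One version of a dimension record (hypothetical schema)."""
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still current"
    is_current: bool = True


def apply_changes(dim: list[DimRow], changes: list[dict], as_of: date) -> None:
    """Apply a CDC batch to the dimension in place, SCD Type 2 style."""
    current = {r.customer_id: r for r in dim if r.is_current}
    for change in changes:
        existing = current.get(change["customer_id"])
        if existing is not None and existing.city == change["city"]:
            continue  # no attribute change, so nothing to version
        if existing is not None:
            # Close out the old version instead of overwriting it.
            existing.valid_to = as_of
            existing.is_current = False
        new_row = DimRow(change["customer_id"], change["city"], valid_from=as_of)
        dim.append(new_row)
        current[new_row.customer_id] = new_row


dim = [DimRow(1, "Albany", date(2024, 1, 1))]
apply_changes(
    dim,
    [{"customer_id": 1, "city": "Boston"}, {"customer_id": 2, "city": "Mysore"}],
    as_of=date(2024, 6, 1),
)
# Downstream BI reads only the current-state rows; history stays in `dim`.
current_view = [r for r in dim if r.is_current]
```

Only changed or new keys are touched, which is why an incremental merge of this kind avoids a full reload.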
98% Success Rate
100+ Pipelines
150+ Tables
20 TB Synapse
20+ Dashboards
3h → 20m Runtime
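
As a rough illustration of the validation and reconciliation checks listed for this role (row counts, duplicates, nulls, referential gaps), here is a small pure-Python sketch. The `reconcile` function, its parameters, and the sample rows are assumptions made for the example, not the SQL/PySpark or Soda rules used on the project.

```python
from collections import Counter


def reconcile(source: list[dict], target: list[dict], key: str, required: list[str]) -> dict:
    """Return issue counts; all zeros means the load reconciles."""
    target_keys = Counter(row[key] for row in target)
    source_keys = {row[key] for row in source}
    return {
        # Difference in row counts between source and target.
        "row_count_diff": len(source) - len(target),
        # Number of key values that appear more than once in the target.
        "duplicate_keys": sum(1 for n in target_keys.values() if n > 1),
        # Number of required cells in the target that are null.
        "null_required": sum(
            1 for row in target for col in required if row.get(col) is None
        ),
        # Source keys that never arrived in the target.
        "missing_in_target": len(source_keys - set(target_keys)),
    }


source = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.5}, {"id": 3, "amount": 7.0}]
target = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}, {"id": 2, "amount": 5.5}]
report = reconcile(source, target, key="id", required=["amount"])
```

Note that the row counts match here even though the load is wrong, which is why a count check alone is not enough before data reaches BI.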
02

Research Project Assistant @ SUNY Research Foundation

Albany, NY · Apr 2025 — May 2026
SOURCES → ADF → DATABRICKS → CURATED → RESEARCHERS
  • Accelerated pipeline runtime 75% (4h → 1h) for 70 GB/week of research data on ADF + Databricks with Python, PySpark, Spark SQL.
  • Cut data delivery turnaround 50% — next-business-day to same-day — by streamlining requirements with faculty researchers.
  • 100% pipeline reproducibility & auditability for 90+ researchers and students via end-to-end lineage and documented workflows.
70 GB Weekly
4h → 1h Runtime
90+ Researchers
100% Reproducible
03

Associate Data Engineer @ Askar Microns Pvt Ltd

Mysore, India · Sep 2021 — Feb 2023
ERP → GLUE → S3 → ATHENA → REDSHIFT → TABLEAU
  • +25% data availability and −30% Redshift load delay across 30+ AWS Glue ETL pipelines unifying ERP, production, procurement, finance, sales, and marketing data.
  • Report prep 1 week → < 3h (−95%) via self-service Tableau and Power BI KPI dashboards adopted across 7+ business functions.
  • −35% ETL infra overhead by migrating cron jobs to serverless Lambda + Step Functions orchestrating Glue.
  • +40% faster ad-hoc analysis by exposing the S3 data lake through AWS Athena ahead of Redshift loads.
  • Authored SQL transformation logic and reporting datasets for procurement, production, inventory, finance, sales, marketing, and executive KPI tracking.
30+ Glue Pipelines
7+ Functions
1wk → 3h Reporting
−35% ETL Overhead
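
The cron-to-serverless migration mentioned for this role could look roughly like the sketch below, which builds an Amazon States Language definition that runs Glue jobs in sequence. The job names, retry settings, and the `glue_state_machine` helper are hypothetical; the actual state machine is not shown in this portfolio.

```python
import json


def glue_state_machine(job_names: list[str]) -> dict:
    """Build a Step Functions definition that runs Glue jobs in order."""
    states = {}
    for i, job in enumerate(job_names):
        state = {
            "Type": "Task",
            # Service integration that waits for the Glue job run to finish.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": job},
            # Assumed retry policy, for illustration only.
            "Retry": [
                {
                    "ErrorEquals": ["States.ALL"],
                    "IntervalSeconds": 60,
                    "MaxAttempts": 2,
                    "BackoffRate": 2.0,
                }
            ],
        }
        if i + 1 < len(job_names):
            state["Next"] = job_names[i + 1]  # chain to the following job
        else:
            state["End"] = True  # last job ends the execution
        states[job] = state
    return {"StartAt": job_names[0], "States": states}


# Hypothetical job names standing in for the real Glue jobs.
definition = glue_state_machine(["extract_erp", "load_redshift"])
print(json.dumps(definition, indent=2))
```

Compared with cron, the dependency order and retries live in the definition itself, so no always-on scheduler host is needed.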
// 04 · projects

Featured platforms.

Open-source and personal builds that mirror enterprise patterns — Medallion, dbt, exactly-once streaming.

PROJECT · 01 · Medallion · ML · GCP

NYC 311 Active Intelligence Pipeline

Event-driven Medallion architecture processing 4.4M+ service requests in BigQuery, with Census ACS geospatial joins and an XGBoost model.

TERRAFORM → PUB/SUB → BIGQUERY → BQ ML → LOOKER
4.4M+ Records
R² 0.786 Model Fit
ACS Geospatial
PROJECT · 02 · dbt · Snowflake · Airflow

Snowflake Analytics Engineering

dbt pipeline with staging / intermediate / mart layers and 40+ automated quality tests across 5M+ subscription and attribution records.

SOURCES → DBT → SNOWFLAKE → AIRFLOW → ANALYTICS
5M+ Records
25 dbt Models
40+ DQ Tests
PROJECT · 03 · Kafka · Spark · Delta

Real-Time Streaming Platform

Exactly-once Kafka-to-Delta streaming with sub-5s end-to-end latency, processing 10,000 events/minute across 3 topics.

KAFKA → SPARK STREAMING → BRONZE → SILVER
10k/min Events
< 5s Latency
3 Topics
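
The exactly-once claim for this project rests on a general idea that can be sketched in a few lines: the sink records how far it has committed, so a redelivered batch is skipped instead of duplicated. The toy class below is an assumption-laden illustration in plain Python, not the Spark Structured Streaming or Delta code from the project; `IdempotentSink` and the `orders` topic are hypothetical names.

```python
class IdempotentSink:
    """Toy sink that stores rows and the last committed offset together."""

    def __init__(self) -> None:
        self.rows: list[str] = []
        # (topic, partition) -> highest offset already written
        self.committed: dict[tuple[str, int], int] = {}

    def write_batch(self, topic: str, partition: int, batch: list[tuple[int, str]]) -> int:
        """Append only records past the committed offset; return how many were written."""
        key = (topic, partition)
        last = self.committed.get(key, -1)
        fresh = [(off, val) for off, val in batch if off > last]
        if fresh:
            # In a real table, rows and offset would be committed in one transaction.
            self.rows.extend(val for _, val in fresh)
            self.committed[key] = max(off for off, _ in fresh)
        return len(fresh)


sink = IdempotentSink()
first = sink.write_batch("orders", 0, [(0, "a"), (1, "b")])
# Simulated retry after a crash: the same batch is redelivered with one new record.
replay = sink.write_batch("orders", 0, [(0, "a"), (1, "b"), (2, "c")])
```

Delivery from the broker is still at-least-once in this sketch; the exactly-once effect comes from the sink ignoring offsets it has already committed.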
// 05 · architecture

Reference architectures.

Diagrams of the three platforms I've shipped most.

Azure Lakehouse
ADF → DATABRICKS → DELTA LAKE → SYNAPSE → POWER BI

// 06 · resume

One PDF, full archive.

latest resume

Shreyas Mysore Sudesh · Data Engineer

Full role history, project metrics, certifications, and the complete tech stack in one PDF.

  • FORMAT: PDF · A4
  • UPDATED: 2026 · Q2
  • ROLES: 3 · 3+ years
  • CERTS: 1 active · 2 planned
  • EDU: M.S. BA · UAlbany

// contact

Let's build something data-driven.

Shreyas Mysore Sudesh — Data Engineer
