Senior Data Engineer with 6+ years delivering enterprise-scale pipelines, ETL/ELT workflows, and cloud analytics platforms in regulated financial services. Proven expertise in PySpark, Hadoop, Kafka, and AWS, from raw ingestion to ML-ready feature stores. Founder of Queuella LLC, a live multi-tenant SaaS serving customers across the US, Nepal, and India.
Georgetown, TX · dipen.p.yadav@gmail.com · 737-298-5520
Core Competencies
Professional Experience
Senior Data Engineer
Liberty Mutual Insurance Company
Remote | Austin, TX
- Architected end-to-end enterprise data pipelines using PySpark, Hadoop, and Hive on AWS EMR, processing multi-TB insurance datasets to power analytics, reporting, and ML workloads.
- Spearheaded migration of core insurance data systems to AWS (S3, EMR, Glue, Redshift), designing ETL/ELT workflows that reduced data processing errors and testing cycles by 85%.
- Implemented data quality and governance frameworks (schema validation, null-rate monitoring, row-count reconciliation, and data lineage tracking), ensuring audit compliance in regulated environments.
- Designed star-schema and dimensional data models supporting Tableau and Power BI dashboards across underwriting, claims, and finance functions.
- Partnered with data scientists to build ML-ready feature pipelines that ensure point-in-time correctness and feature consistency between training and inference environments.
- Led and mentored a cross-functional Agile team delivering monthly insurance reports and data extracts, with structured stakeholder updates.
- Optimized Spark job performance through partition tuning, broadcast joins, data skew mitigation, and strategic caching, reducing runtimes and cluster costs for mission-critical batch workloads.
Founder & Engineering Lead
Queuella | Service Operations SaaS
2024 – Present | Georgetown, TX
- Architected and directed development of a production multi-tenant SaaS platform (Next.js 16 / TypeScript / PostgreSQL) with a native Swift iOS companion app, serving paying customers across the US, Nepal, and India.
- Designed a multi-service AWS backend: S3, SQS, SNS/SES, Cognito (including a zero-data-loss migration from Clerk), ECS, Bedrock, and EventBridge.
- Integrated the Square API across five surfaces (Catalog, Team Members, Labor, Locations, Payments), enabling automatic real-time sync of a client's full business profile upon integration.
- Built a native iOS app (Swift, MVVM) with SSE streaming, APNs push notifications, Keychain auth, and deep link routing; published and serving live customers.
- Delivered end-to-end Stripe billing infrastructure: subscriptions, webhooks, usage-based billing, and multi-location pricing with a custom superadmin pricing editor.
- Established a production-grade testing strategy: Vitest unit suites across 8 domains and 20+ Playwright E2E suites covering auth, kiosk privacy, scheduling, and billing.
Engineering Lead — Business Automation
iBrows Studio | Multi-Location Beauty Brand
Georgetown, TX
- Built a fully automated multi-location payroll processing system (Python, AWS Lambda, EventBridge cron), eliminating manual payroll work across bi-monthly pay cycles.
- Engineered a Square API pipeline consuming the Labor API (timecards and break deductions), Team Members API, and Payments API, transforming raw data into fully computed payroll with pandas using per-employee rules.
- Built a timezone-aware timecard anomaly detection engine that flags clock-in/out violations, oversized shifts, and probable auto-clockouts with configurable grace windows.
- Delivered a three-tier email notification system: branded HTML pay stubs, timecard exception reports, and a full manager summary with payroll totals and outstanding issues.
Data Engineer
Cognizant Technology Solutions
Austin, TX
- Built a real-time pipeline for high-volume social media stream analysis using Apache Kafka and Spark Streaming, end-to-end from ingestion to output sink.
- Conducted large-scale COVID-19 data analysis on AWS EMR, producing statistical insights and predictive model inputs from multi-source healthcare datasets.
- Developed Power BI dashboards for call center analytics, integrating multiple data sources and delivering BI reporting for business stakeholders.
Student Research Assistant
Texas Tech University
Lubbock, TX
- Automated data extraction and processing pipelines for large-scale system log files using Python and Bash scripting, improving data accuracy and enabling structured analytical workflows.
Technical Skills
Languages
Big Data
Cloud / AWS
Databases
APIs / Integration
AI / LLM
ETL / ELT
Data Quality
ML Frameworks
BI / Viz
DevOps / Other
Education
M.S. Data Science
University of Texas at Austin
Aug 2025
B.S. Computer Engineering
Texas Tech University
May 2020
Licenses & Certifications
Academy Accreditation — Databricks Lakehouse Fundamentals
Issued Oct 2022
Advances in Deep Learning
Issued Jan 2026
Design Principles & Causal Inference
Issued Dec 2024
Data Structures & Algorithms
Issued Dec 2024
Optimization
Issued May 2025
Deep Learning
Issued Aug 2024
Data Science for Health Discovery and Innovation
Issued May 2024
Foundations of Regression and Predictive Modeling
Issued May 2024
Machine Learning
Issued Dec 2021
Probability and Simulation-Based Inference
Issued Nov 2021
Get In Touch
Open to New Opportunities
Available for senior data engineering, staff engineering, and founding engineer roles.


