available_for_hire == true

> From raw data to decisions.

Data engineer based in Germany. I own the full data stack — ingest, warehouse, pipelines, LLM automation, and BI — so small teams get numbers they can actually trust.

see the work

about.

I'm Utkarsh, a data engineer who likes the messy middle: the place where business problems become schemas, schemas become pipelines, and pipelines become decisions.

Right now I'm the entire data & automation function at Noritual Lab, a matcha e-commerce startup in Berlin. Solo DE reporting to the founder. I picked the stack, stood up the warehouse, and own everything from CREATE SCHEMA to the KPI dashboard founders open every Monday. Before that I worked on agentic AI research at BioMed X (Heidelberg), and I hold an MS in Applied Data Science & Analytics from SRH Heidelberg (GPA 1.9).

I care about systems that are actually used: small, debuggable, documented, and cheap to run. In a typical week I'm writing DAGs, shipping Django, generating the monthly P&L, and tuning a Claude-API invoice classifier — all in the same repo.

// now running Noritual's monthly P&L · scaling the invoice pipeline · open to data eng & AI eng roles.

stack.

tools I use daily, weekly, or know well enough to ship with.

Python PostgreSQL BigQuery SQL Cloud Run Cloud Scheduler GCS Docker GitHub Actions Secret Manager Cloud SQL Pub/Sub

Claude API LangChain LangGraph Milvus LangSmith RAG HuggingFace Whisper PyTorch XGBoost scikit-learn

Django Streamlit Power BI Tableau Apps Script Notion API Amazon SP-API Shopify API Airflow dbt pandas Pydantic FastAPI React Electron TypeScript

how I work.

a few things I believe about building data systems.

i.

boring beats clever.

An ugly cron job in production beats an elegant DAG in a notebook. Ship small, iterate, log everything.

ii.

pipelines serve a question.

Every pipeline exists to answer something someone actually asks, not to look good in a diagram.

iii.

docs are part of done.

A README, a runbook, a diagram. Every project. If the next engineer can't find it, it doesn't exist.

selected work.

seven pieces. from production pipelines to research code.

noritual lab · production · 2025

kpi dashboard

case study →

Full internal web app, built end-to-end with one teammate in ~2 months. Scoped requirements with the team, designed a star-schema Postgres backend, and built a Django UI with CRUD, role-based access (user / admin / founder), and a forgot-password flow. Ingests from the Notion API, Google Sheets, and direct UI inputs.

→ single source of truth for founders' weekly review meetings. 10–12 active users.
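
As an illustration of the three-tier access model, here is a minimal sketch in plain Python. It assumes the roles are strictly hierarchical (founder ⊇ admin ⊇ user) and uses hypothetical action names; it is not the dashboard's actual Django permission code.

```python
from enum import IntEnum


class Role(IntEnum):
    """Roles ordered by privilege; a higher value includes the lower ones."""
    USER = 1
    ADMIN = 2
    FOUNDER = 3


# Hypothetical mapping from action to the minimum role allowed to perform it.
MIN_ROLE = {
    "view_kpis": Role.USER,
    "edit_kpis": Role.ADMIN,
    "manage_users": Role.FOUNDER,
}


def can(role: Role, action: str) -> bool:
    """Allow an action if the role meets its minimum; deny unknown actions."""
    required = MIN_ROLE.get(action)
    return required is not None and role >= required


print(can(Role.ADMIN, "edit_kpis"))     # True
print(can(Role.ADMIN, "manage_users"))  # False
```

Denying unknown actions by default keeps a typo in an action name from silently granting access.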

Django
PostgreSQL
Power BI
Notion API
GCP VM
RBAC

// star schema · postgres

noritual lab · production · 2026

invoice automation

case study →

Processes ~1,000 transactions/month: classifies which need invoices (~25% do), then matches each one to its PDF with confidence-scored Claude API screening. Hive-style partitioned GCS layout (raw → accepted → trash) for clean backfills and audit. Eliminated the accountant fees previously spent on transaction linking.

→ the hardest problem I've solved, and the one I'm proudest of: invoice/PO matching at scale.
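
The routing idea can be sketched roughly as follows. The threshold value, the `MatchResult` fields, and the exact partition keys (`stage=`, `year=`, `month=`) are assumptions made for illustration; the case study does not state them, and no Claude or GCS calls are made here.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical cut-off; the real threshold is not stated in the case study.
ACCEPT_THRESHOLD = 0.85


@dataclass(frozen=True)
class MatchResult:
    transaction_id: str
    pdf_name: str
    confidence: float  # 0.0 to 1.0, e.g. from an LLM screening step


def stage_for(result: MatchResult, threshold: float = ACCEPT_THRESHOLD) -> str:
    """Route a screened match to the 'accepted' or 'trash' stage."""
    return "accepted" if result.confidence >= threshold else "trash"


def object_key(stage: str, booked_on: date, pdf_name: str) -> str:
    """Build a Hive-style partitioned object key (key=value path segments)."""
    return (
        f"stage={stage}/year={booked_on.year:04d}/"
        f"month={booked_on.month:02d}/{pdf_name}"
    )


result = MatchResult("tx-001", "invoice_001.pdf", confidence=0.92)
key = object_key(stage_for(result), date(2026, 1, 15), result.pdf_name)
print(key)  # stage=accepted/year=2026/month=01/invoice_001.pdf
```

Because the partition values live in the path, a backfill for one month only needs to list and rewrite a single prefix.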

Claude API
Python
GCS
Cloud Scheduler
Gmail API
pdfplumber

// pipeline · ~1k tx/mo

noritual lab · production · 2025

p&l platform

case study →

Have owned the company's monthly P&L process since Aug 2025. Designed the P&L data model end-to-end and built a dedicated Postgres DB ingesting from Amazon SP-API, Shopify, and Sheets. Handles multi-currency FX conversion, three-way matching (PO ↔ goods receipt ↔ invoice), and month-end close support.

→ replaced the founder's prior (inaccurate) self-built P&L with an owned, defensible monthly cadence.
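
For readers unfamiliar with three-way matching, a minimal sketch follows. The `Line` shape, the one-cent tolerance, the SKU, and the FX rate are all hypothetical; the production data model is not shown in the case study.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Line:
    sku: str
    quantity: int
    amount_eur: Decimal  # already converted to the reporting currency


def to_eur(amount: Decimal, rate_to_eur: Decimal) -> Decimal:
    """Convert a foreign-currency amount to EUR, rounded to cents."""
    return (amount * rate_to_eur).quantize(Decimal("0.01"))


def three_way_match(po: Line, receipt: Line, invoice: Line,
                    tolerance: Decimal = Decimal("0.01")) -> list[str]:
    """Return the discrepancies found; an empty list means all three agree."""
    issues = []
    if not (po.sku == receipt.sku == invoice.sku):
        issues.append("sku mismatch")
    if receipt.quantity != po.quantity:
        issues.append("received quantity differs from PO")
    if invoice.quantity != receipt.quantity:
        issues.append("invoiced quantity differs from goods receipt")
    if abs(invoice.amount_eur - po.amount_eur) > tolerance:
        issues.append("invoiced amount differs from PO")
    return issues


po = Line("MATCHA-30G", 100, Decimal("500.00"))
receipt = Line("MATCHA-30G", 100, Decimal("500.00"))
# Invoice issued in a foreign currency; 0.9259 is a made-up example rate.
invoice = Line("MATCHA-30G", 100, to_eur(Decimal("540.00"), Decimal("0.9259")))
print(three_way_match(po, receipt, invoice))  # []
```

Using `Decimal` rather than floats keeps cent-level rounding explicit, which matters once converted amounts are compared against a tolerance.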

PostgreSQL
BigQuery
Amazon SP-API
Shopify API
Python
dbt-style layering

// monthly P&L · since aug 2025

noritual lab · v1.0 · 2026

pricing & margin app

case study →

Electron + React desktop tool for multi-channel pricing analysis. Three calculation modes, channel-specific COGS, editable fee configuration, and a real-time margin dashboard backed by Postgres.

→ real margin visibility per channel in one click.
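
A per-channel margin calculation of this kind can be sketched as below. The formula (price minus COGS minus a percentage fee and a fixed fee) and the sample numbers are assumptions for illustration; the app itself is written in Electron and React, and its three calculation modes are not reproduced here.

```python
from decimal import Decimal


def margin_pct(price: Decimal, cogs: Decimal, fee_rate: Decimal,
               fixed_fee: Decimal = Decimal("0")) -> Decimal:
    """Margin as a percentage of the selling price, rounded to one decimal."""
    if price <= 0:
        raise ValueError("price must be positive")
    fees = price * fee_rate + fixed_fee
    profit = price - cogs - fees
    return (profit / price * 100).quantize(Decimal("0.1"))


# Hypothetical channel: 15% referral fee plus a 0.50 fixed fee per order.
print(margin_pct(Decimal("20.00"), Decimal("6.00"),
                 Decimal("0.15"), Decimal("0.50")))  # 52.5
```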

Electron
React 19
MUI
PostgreSQL
Node
Jest

// margin % by channel

thesis · research · 2026

seoul bike forecasting

case study →

Master's thesis comparing 7 deep architectures (LSTM, TCN, TCN-LSTM, attention variants) for hourly demand forecasting. Best model: Multi-Scale TCN+LSTM at 88.83% R². Identified and corrected a data-leakage flaw in the original published benchmark.

→ academic rigor: a research finding, not just a modeling exercise.
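
The specific leakage flaw is not described on this page, so the sketch below only shows one standard guard for time-series work, a chronological train/test split, together with the R² metric quoted above. Function names and the 20% test fraction are illustrative assumptions.

```python
def chronological_split(rows, test_fraction=0.2):
    """Split time-ordered rows so every test row is later than all training rows."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[:-n_test], rows[-n_test:]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


hours = list(range(10))  # stand-in for ten hourly observations, oldest first
train, test = chronological_split(hours)
print(train, test)  # [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```

Shuffling before splitting would let the model train on hours that come after the ones it is tested on, which tends to inflate scores like R².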

PyTorch
TensorFlow
XGBoost
scikit-learn
pandas

// hourly demand · TCN+LSTM

side project · scoped · 2026

bollywood pulse

case study →

12-week flagship build: Airflow, dbt, multilingual NLP (Hindi, Punjabi, English), XGBoost trend prediction, Streamlit dashboard. Cross-platform fuzzy matching across Spotify, YouTube, Genius, with MLflow tracking and a silent GitHub Actions collector.

→ early-signal layer for trend bets, not post-hoc charts.
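
Cross-platform title matching could look roughly like this, using only the standard library's `difflib`. The normalization rules, the 0.85 threshold, and the sample titles are assumptions; the project's actual matcher may use a different library or scoring method.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Casefold, drop bracketed qualifiers and punctuation, collapse spaces."""
    text = unicodedata.normalize("NFKC", title).casefold()
    text = re.sub(r"[\(\[].*?[\)\]]", " ", text)  # e.g. "(Lyric Video)"
    # Replace punctuation and symbols only, so combining vowel signs in
    # scripts such as Devanagari or Gurmukhi are left intact.
    text = "".join(
        " " if unicodedata.category(ch)[0] in "PS" else ch for ch in text
    )
    return " ".join(text.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(title, candidates, threshold=0.85):
    """Return the closest candidate at or above the threshold, else None."""
    scored = [(similarity(title, c), c) for c in candidates]
    score, match = max(scored, default=(0.0, None))
    return match if score >= threshold else None


print(best_match("Kesariya (Lyric Video)", ["Kesariya", "Deva Deva"]))  # Kesariya
```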

Airflow
dbt
HuggingFace
XGBoost
AWS Athena
Streamlit

// ingest · model · serve

open source · 2025

speech2insight

case study →

5-stage pipeline taking audio to insight: Whisper transcription, sentiment (TextBlob + DistilRoBERTa), LSA topic modeling, T5 chunked summarization with BLEU/ROUGE eval. Dockerised with ffmpeg bundled, full CI/CD on GHCR.

→ open source · evaluated · reproducible.
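
Chunked summarization depends on splitting a long transcript into pieces that fit the model's input limit. A minimal sketch, assuming word-based chunks with a small overlap (a real pipeline would more likely count tokenizer tokens, and the sizes here are arbitrary):

```python
def chunk_words(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping word windows for per-chunk summarization."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


transcript = " ".join(f"w{i}" for i in range(10))
print(chunk_words(transcript, max_words=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Each chunk would then be summarized separately and the partial summaries joined or summarized again; the overlap reduces the chance of cutting a sentence's context in half.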

Whisper
HuggingFace
Streamlit
Docker
GitHub Actions

// 5-stage pipeline · audio → insight

experience.

May 2025 — current

Data Engineer · Noritual Lab (Berlin)

Solo data & automation engineer at a matcha e-commerce startup. Reports to the founder. Own the GCP/Postgres warehouse, the KPI dashboard (Django, 10–12 users), monthly P&L close, and a Claude-API invoice pipeline processing ~1,000 transactions/month. Vendor migrations saved ~€1,250/year.

Oct 2024 — Mar 2025

Research Student, Agentic AI · BioMed X (Heidelberg)

Built a multi-agent system with LangGraph + LangChain for academic paper discovery and retrieval. RAG pipeline on Milvus, Streamlit front-end, Docker, CI/CD, LangSmith observability.

Apr 2024 — Mar 2026

MS Applied Data Science & Analytics · SRH Heidelberg

GPA 1.9 (German scale · high distinction). Thesis on hybrid TCN-LSTM architectures for demand forecasting (best model 88.83% R²); identified a data-leakage flaw in the published CUBIST benchmark paper.

Jul 2019 — May 2023

BE Computer Engineering · Mumbai University

GPA 8.39 / 10.

contact.

Looking for a data engineer who ships? Or just want to swap notes on Airflow, LLMs, or German bureaucracy?

utkarsh.sawant21@gmail.com
