available_for_hire == true

> From raw data to decisions.

Data engineer based in Germany. I own the full data stack — ingest, warehouse, pipelines, LLM automation, and BI — so small teams get numbers they can actually trust.

01

about.

I'm Utkarsh, a data engineer who likes the messy middle: the place where business problems become schemas, schemas become pipelines, and pipelines become decisions.

Right now I'm the entire data & automation function at Noritual Lab, a matcha e-commerce startup in Berlin: a solo data engineer reporting to the founder. I picked the stack, stood up the warehouse, and own everything from CREATE SCHEMA to the KPI dashboard the founders open every Monday. Before that I worked on agentic AI research at BioMed X (Heidelberg), and I hold an MS in Applied Data Science & Analytics from SRH Heidelberg (GPA 1.9).

I care about systems that are actually used: small, debuggable, documented, and cheap to run. In a typical week I'm writing DAGs, shipping Django features, generating the monthly P&L, and tuning a Claude-API invoice classifier, all in the same repo.

// now running Noritual's monthly P&L · scaling the invoice pipeline · open to data eng & AI eng roles.

02

stack.

tools I use daily, weekly, or know well enough to ship with.

Python PostgreSQL BigQuery SQL Cloud Run Cloud Scheduler GCS Docker GitHub Actions Secret Manager Cloud SQL Pub/Sub
Claude API LangChain LangGraph Milvus LangSmith RAG HuggingFace Whisper PyTorch XGBoost scikit-learn
Django Streamlit Power BI Tableau Apps Script Notion API Amazon SP-API Shopify API Airflow dbt pandas Pydantic FastAPI React Electron TypeScript

03

how I work.

a few things I believe about building data systems.

i.

boring beats clever.

An ugly cron job in production beats an elegant DAG in a notebook. Ship small, iterate, log everything.

ii.

pipelines serve a question.

Every pipeline exists to answer something someone actually asks, not to look good in a diagram.

iii.

docs are part of done.

A README, a runbook, a diagram. Every project. If the next engineer can't find it, it doesn't exist.

04

selected work.

seven pieces. from production pipelines to research code.

01
noritual lab · production · 2025

kpi dashboard

Built a full internal web app end-to-end with one teammate in ~2 months. Scoped requirements with the team, designed a star-schema Postgres backend, and built a Django UI with CRUD, role-based access (user / admin / founder), and a forgot-password flow. Ingests data from the Notion API, Google Sheets, and direct UI inputs.

→ single source of truth for founders' weekly review meetings. 10–12 active users.

  • Django
  • PostgreSQL
  • Power BI
  • Notion API
  • GCP VM
  • RBAC
// star schema · postgres: fct_kpi (key_result · value · period) joined to dim_date (day · month · qtr), dim_org (team · level), dim_objective (id · name · target), dim_user (role: u / a / f)
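
A minimal sketch of the star schema described in this card, assuming Python's built-in sqlite3 as a stand-in for Postgres so it runs without a server. Table and column names come from the schema diagram; the column types, the foreign keys, the `fct_kpi.user_id` link, the `attainment_by_team` query and the toy rows are all hypothetical.

```python
import sqlite3

# Sketch only: sqlite3 stands in for Postgres so this runs anywhere.
# Table and column names follow the schema diagram; types, keys and the
# fct_kpi.user_id link are assumptions, not the production DDL.
DDL = """
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    day     TEXT    NOT NULL,   -- ISO date, e.g. '2025-06-02'
    month   INTEGER NOT NULL,
    qtr     INTEGER NOT NULL
);
CREATE TABLE dim_org (
    org_id INTEGER PRIMARY KEY,
    team   TEXT NOT NULL,
    level  TEXT NOT NULL
);
CREATE TABLE dim_objective (
    objective_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    target       REAL NOT NULL
);
CREATE TABLE dim_user (
    user_id INTEGER PRIMARY KEY,
    role    TEXT NOT NULL CHECK (role IN ('user', 'admin', 'founder'))
);
CREATE TABLE fct_kpi (
    kpi_id       INTEGER PRIMARY KEY,
    date_id      INTEGER NOT NULL REFERENCES dim_date (date_id),
    org_id       INTEGER NOT NULL REFERENCES dim_org (org_id),
    objective_id INTEGER NOT NULL REFERENCES dim_objective (objective_id),
    user_id      INTEGER REFERENCES dim_user (user_id),  -- who entered the value
    key_result   TEXT NOT NULL,
    value        REAL NOT NULL,
    period       TEXT NOT NULL
);
"""

# Typical dashboard query: join the fact to its dimensions, then aggregate.
ATTAINMENT_SQL = """
SELECT o.team, ROUND(AVG(CAST(f.value AS REAL) / obj.target), 2) AS attainment
FROM fct_kpi AS f
JOIN dim_date AS d        ON d.date_id = f.date_id
JOIN dim_org AS o         ON o.org_id = f.org_id
JOIN dim_objective AS obj ON obj.objective_id = f.objective_id
WHERE d.qtr = ?
GROUP BY o.team
ORDER BY o.team
"""


def build(conn: sqlite3.Connection) -> None:
    """Create the star schema in an open connection."""
    conn.executescript(DDL)


def attainment_by_team(conn: sqlite3.Connection, qtr: int) -> list[tuple[str, float]]:
    """Average value/target ratio per team for one quarter."""
    return conn.execute(ATTAINMENT_SQL, (qtr,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn)
    # Toy rows for illustration only.
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)",
                     [(1, "2025-06-02", 6, 2), (2, "2025-07-07", 7, 3)])
    conn.execute("INSERT INTO dim_org VALUES (1, 'growth', 'team')")
    conn.execute("INSERT INTO dim_objective VALUES (1, 'weekly orders', 200)")
    conn.executemany(
        "INSERT INTO fct_kpi (date_id, org_id, objective_id, key_result, value, period)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        [(1, 1, 1, "orders", 150, "week"), (2, 1, 1, "orders", 210, "week")],
    )
    print(attainment_by_team(conn, qtr=2))  # [('growth', 0.75)]
```

Keeping measures in one narrow fact table and descriptive attributes in the dimensions means a new key result is a new row rather than a schema change.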
02
noritual lab · production · 2026

invoice automation

Processes ~1,000 transactions/month: classifies which ones need invoices (~25% do), then matches each to its PDF using confidence-scored Claude API screening. A Hive-style partitioned GCS layout (raw → accepted → trash) keeps backfills clean and auditable. Replaces the accountant fees previously paid for transaction linking.

→ the hardest problem I'm proud to have solved: invoice/PO matching at scale.

  • Claude API
  • Python
  • GCS
  • Cloud Scheduler
  • Gmail API
  • pdfplumber
// pipeline · ~1k tx/mo: Gmail PDFs + txn ledger → Claude screen + score → matcher (conf ≥ 0.92) → accepted/ | trash/ → P&L
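
The accept/trash decision and the Hive-style layout can be sketched as below. This is a sketch under stated assumptions: the 0.92 threshold is taken from the pipeline diagram, while the `Match` record, the bucket name, the `stage=/year=/month=/txn_id=` partition keys and the `route` function are hypothetical. It only routes an already-scored match; the raw stage and the Claude API call are out of scope.

```python
from dataclasses import dataclass
from datetime import date

# Threshold taken from the pipeline diagram (conf >= 0.92).
# The bucket name and the partition keys are hypothetical.
ACCEPT_THRESHOLD = 0.92
BUCKET = "gs://example-invoices"


@dataclass(frozen=True)
class Match:
    txn_id: str
    pdf_name: str
    confidence: float  # score from the LLM screening step, expected in [0, 1]
    booked_on: date


def route(match: Match) -> str:
    """Return the Hive-style object path for a scored transaction/PDF match."""
    if not 0.0 <= match.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {match.confidence}")
    stage = "accepted" if match.confidence >= ACCEPT_THRESHOLD else "trash"
    d = match.booked_on
    # key=value segments let a backfill rewrite a single month and let
    # query engines prune partitions by stage, year or month.
    return (
        f"{BUCKET}/stage={stage}/year={d.year}/month={d.month:02d}/"
        f"txn_id={match.txn_id}/{match.pdf_name}"
    )


if __name__ == "__main__":
    m = Match("T-1001", "invoice.pdf", 0.95, date(2026, 1, 15))
    print(route(m))
    # gs://example-invoices/stage=accepted/year=2026/month=01/txn_id=T-1001/invoice.pdf
```

Because each partition is a plain `key=value` prefix, re-running one month only rewrites objects under that month's prefix, and anything under `stage=trash` stays available for audit.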
03
noritual lab · production · 2025

p&l platform

Owner of the company's monthly P&L process since Aug 2025. Designed the P&L data model end-to-end and built a dedicated Postgres DB that ingests from Amazon SP-API, Shopify, and Sheets. Covers multi-currency FX conversion, three-way matching (PO ↔ goods receipt ↔ invoice), and month-end close support.

→ replaced the founder's prior (inaccurate) self-built P&L with an owned, defensible monthly cadence.

  • PostgreSQL
  • BigQuery
  • Amazon SP-API
  • Shopify API
  • Python
  • dbt-style layering
// monthly P&L chart (revenue · cogs · net, in €) · since aug 2025
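
Three-way matching compares what was ordered, what arrived, and what was billed. A minimal sketch, assuming amounts are already converted to EUR and a 1% tolerance on the invoice total; the dataclasses, field names, exception strings and tolerance are hypothetical, not the production rules.

```python
from dataclasses import dataclass
from decimal import Decimal

# Field names and the 1% amount tolerance are assumptions for illustration.
AMOUNT_TOLERANCE = Decimal("0.01")


@dataclass(frozen=True)
class PurchaseOrder:
    po_id: str
    qty: int
    unit_price: Decimal  # in EUR, after FX conversion


@dataclass(frozen=True)
class GoodsReceipt:
    po_id: str
    qty_received: int


@dataclass(frozen=True)
class Invoice:
    po_id: str
    qty_billed: int
    amount: Decimal  # invoice total in EUR, after FX conversion


def three_way_match(po: PurchaseOrder, gr: GoodsReceipt, inv: Invoice) -> list[str]:
    """Return the exceptions found; an empty list means all three documents agree."""
    if not (po.po_id == gr.po_id == inv.po_id):
        return ["po_id mismatch"]
    issues: list[str] = []
    if gr.qty_received > po.qty:
        issues.append("received more than ordered")
    if inv.qty_billed != gr.qty_received:
        issues.append("billed qty != received qty")
    # Compare the billed total with what the PO price implies for the billed qty.
    expected = po.unit_price * inv.qty_billed
    if abs(inv.amount - expected) > expected * AMOUNT_TOLERANCE:
        issues.append("amount outside tolerance")
    return issues


if __name__ == "__main__":
    po = PurchaseOrder("PO-7", 10, Decimal("12.50"))
    gr = GoodsReceipt("PO-7", 10)
    inv = Invoice("PO-7", 10, Decimal("125.00"))
    print(three_way_match(po, gr, inv))  # []
```

Using `Decimal` rather than `float` keeps euro amounts exact, so the tolerance check never trips on binary rounding noise.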
04
noritual lab · v1.0 · 2026

pricing & margin app

Electron + React desktop tool for multi-channel pricing analysis. Three calculation modes, channel-specific COGS, editable fee configuration, and a real-time margin dashboard backed by Postgres.

→ real margin visibility per channel in one click.

  • Electron
  • React 19
  • MUI
  • PostgreSQL
  • Node
  • Jest
// margin % by channel: amzn.de 38% · amzn.uk 29% · shopify 33% · b2b 21%
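
A per-channel margin calculation with channel-specific COGS and editable fees might look like the sketch below. The fee model (a percentage referral fee, a fixed per-order fee and a per-unit fulfilment cost), the function names and the placeholder numbers are assumptions; the app itself is Electron + React, and this Python version only illustrates the arithmetic, not its three calculation modes.

```python
from dataclasses import dataclass


# Hypothetical fee model standing in for the app's editable fee configuration.
# Real channel fee schedules differ; all numbers used below are placeholders.
@dataclass(frozen=True)
class ChannelFees:
    referral_pct: float  # share of the gross price, e.g. 0.15 for 15%
    fixed_fee: float     # per-order fee in EUR
    fulfilment: float    # per-unit shipping/fulfilment cost in EUR


def margin(price: float, cogs: float, fees: ChannelFees) -> float:
    """Margin as a fraction of price after channel-specific COGS and fees."""
    if price <= 0:
        raise ValueError("price must be positive")
    costs = cogs + fees.fixed_fee + fees.fulfilment + price * fees.referral_pct
    return (price - costs) / price


def price_for_margin(cogs: float, fees: ChannelFees, target: float) -> float:
    """Invert margin(): the price at which margin(price, cogs, fees) == target."""
    denom = 1.0 - fees.referral_pct - target
    if denom <= 0:
        raise ValueError("target margin unreachable with this referral fee")
    return (cogs + fees.fixed_fee + fees.fulfilment) / denom


if __name__ == "__main__":
    fees = ChannelFees(referral_pct=0.15, fixed_fee=0.0, fulfilment=3.0)
    print(round(margin(20.0, 6.0, fees), 4))            # 0.4
    print(round(price_for_margin(6.0, fees, 0.4), 2))  # 20.0
```

The inverse follows from margin = 1 - referral_pct - (cogs + fixed_fee + fulfilment) / price, so a percentage fee caps the reachable margin at 1 - referral_pct.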
05
thesis · research · 2026

seoul bike forecasting

Master's thesis comparing 7 deep architectures (LSTM, TCN, TCN-LSTM, attention variants) for hourly demand forecasting. Best model: Multi-Scale TCN+LSTM at 88.83% R². Identified and corrected a data-leakage flaw in the original published benchmark.

→ academic rigor: a research finding, not just a modeling exercise.

  • PyTorch
  • TensorFlow
  • XGBoost
  • scikit-learn
  • pandas
// hourly demand · actual vs predicted · TCN+LSTM · R² = 88.83%
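
The source does not say what the leakage flaw in the benchmark was, so this is not a reconstruction of it. As a hedged illustration of the general problem, the sketch shows two common guards for hourly demand data: a chronological (unshuffled) split, and normalization statistics fitted on the training window only. The function names, the 80/20 split and the 24-hour lookback are assumptions.

```python
from statistics import mean, pstdev

# Illustrative guards against two common leakage sources in hourly forecasting.


def chronological_split(series: list[float], train_frac: float = 0.8):
    """Split without shuffling, so every test hour comes after every train hour."""
    if not 0 < train_frac < 1:
        raise ValueError("train_frac must be between 0 and 1")
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]


def fit_scaler(train: list[float]) -> tuple[float, float]:
    """Normalization statistics computed from the training window only."""
    mu = mean(train)
    sigma = pstdev(train) or 1.0  # avoid dividing by zero on a constant series
    return mu, sigma


def make_windows(series: list[float], lookback: int):
    """(previous `lookback` values, next value) pairs built inside one split."""
    return [(series[i - lookback:i], series[i]) for i in range(lookback, len(series))]


if __name__ == "__main__":
    demand = [float(h % 24) for h in range(240)]  # toy signal: 10 days of hours
    train, test = chronological_split(demand)
    mu, sigma = fit_scaler(train)  # test hours never influence mu or sigma

    def scale(xs: list[float]) -> list[float]:
        return [(x - mu) / sigma for x in xs]

    train_xy = make_windows(scale(train), lookback=24)
    test_xy = make_windows(scale(test), lookback=24)
    print(len(train), len(test), len(train_xy), len(test_xy))  # 192 48 168 24
```

Building test windows only from test data drops the first `lookback` test hours; prepending the last `lookback` training hours is also leak-free, since those hours precede the test period.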
06
side project · scoped · 2026

bollywood pulse

12-week flagship build: Airflow, dbt, multilingual NLP (Hindi, Punjabi, English), XGBoost trend prediction, and a Streamlit dashboard. Cross-platform fuzzy matching across Spotify, YouTube, and Genius, with MLflow tracking and a silent GitHub Actions collector.

→ early-signal layer for trend bets, not post-hoc charts.

  • Airflow
  • dbt
  • HuggingFace
  • XGBoost
  • AWS Athena
  • Streamlit
// ingest · model · serve: Spotify API + YouTube API + Genius lyrics → dbt → fuzzy match · NLP → XGB trend
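
Cross-platform fuzzy matching comes down to normalizing titles and scoring string similarity. A minimal sketch using the standard library's `difflib.SequenceMatcher` as a stand-in (the source does not name the matching library); the noise-word list, the regexes, the function names and the 0.85 threshold are assumptions. Punctuation stripping is ASCII-only, so Hindi and Punjabi script titles pass through unchanged.

```python
import re
from difflib import SequenceMatcher

# difflib is a stand-in; the noise list and 0.85 threshold are assumptions.
BRACKETS = re.compile(r"\(.*?\)|\[.*?\]")  # "(Official Video)", "[Lyrics]"
PUNCT = re.compile(r"[-_.,:;!?'\"|/]+")     # ASCII only, leaves non-Latin scripts intact
NOISE_WORDS = {"official", "video", "audio", "lyrics", "lyrical", "full", "song"}


def normalise(title: str) -> str:
    """Case-fold, then drop bracketed tags, punctuation and common upload noise."""
    t = BRACKETS.sub(" ", title.casefold())
    t = PUNCT.sub(" ", t)
    return " ".join(w for w in t.split() if w not in NOISE_WORDS)


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def best_match(title: str, candidates: list[str], threshold: float = 0.85):
    """Closest candidate as (candidate, score), or None if nothing clears the threshold."""
    if not candidates:
        return None
    scored = [(c, similarity(title, c)) for c in candidates]
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= threshold else None


if __name__ == "__main__":
    youtube_title = "Kesariya (Official Video)"
    spotify_titles = ["Kesariya", "Deva Deva"]
    print(best_match(youtube_title, spotify_titles))  # ('Kesariya', 1.0)
```

Returning `None` below the threshold keeps ambiguous pairs out of the joined table instead of silently linking the wrong track.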
07
open source · 2025

speech2insight

5-stage pipeline from audio to insight: Whisper transcription, sentiment (TextBlob + DistilRoBERTa), LSA topic modeling, and chunked T5 summarization with BLEU/ROUGE evaluation. Dockerised with ffmpeg bundled; full CI/CD on GitHub Actions, with images published to GHCR.

→ open source · evaluated · reproducible.

  • Whisper
  • HuggingFace
  • Streamlit
  • Docker
  • GitHub Actions
// 5-stage pipeline · audio → insight: audio (.mp3) → whisper (stt) → sentiment (distil) → topic (LSA) → summary (T5) → insight (.json)
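
Chunked summarization splits a transcript that is too long for the summarizer's input into overlapping windows, summarizes each one, then joins the results. A minimal sketch of the splitting step only, assuming word counts as a rough proxy for tokens; `chunk_words`, the 400-word window and the 50-word overlap are hypothetical, not the project's settings.

```python
def chunk_words(text: str, max_words: int = 400, overlap: int = 50) -> list[str]:
    """Split text into overlapping word windows of at most max_words words.

    Word counts are a rough proxy; a real pipeline would count tokenizer tokens.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the transcript
    return chunks


if __name__ == "__main__":
    transcript = " ".join(f"w{i}" for i in range(1000))  # stand-in for Whisper output
    parts = chunk_words(transcript)
    print(len(parts), [len(p.split()) for p in parts])  # 3 [400, 400, 300]
```

The overlap gives each chunk some context from the previous one, so a sentence cut at a boundary still appears whole in at least one window.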
05

experience.

  1. May 2025 — current

    Data Engineer · Noritual Lab (Berlin)

    Solo data & automation engineer at a matcha e-commerce startup. Reports to the founder. Own the GCP/Postgres warehouse, the KPI dashboard (Django, 10–12 users), monthly P&L close, and a Claude-API invoice pipeline processing ~1,000 transactions/month. Vendor migrations saved ~€1,250/year.

  2. Oct 2024 — Mar 2025

    Research Student, Agentic AI · BioMed X (Heidelberg)

    Built a multi-agent system with LangGraph + LangChain for academic paper discovery and retrieval. RAG pipeline on Milvus, Streamlit front-end, Docker, CI/CD, LangSmith observability.

  3. Apr 2024 — Mar 2026

    MS Applied Data Science & Analytics · SRH Heidelberg

    GPA 1.9 (German scale · high distinction). Thesis on hybrid TCN-LSTM architectures for demand forecasting (best model 88.83% R²) — identified a data-leakage flaw in the published CUBIST benchmark paper.

  4. Jul 2019 — May 2023

    BE Computer Engineering · Mumbai University

    GPA 8.39 / 10.

06

contact.

Looking for a data engineer who ships? Or just want to swap notes on Airflow, LLMs, or German bureaucracy?

utkarsh.sawant21@gmail.com