Waseem Khan

Full-Stack Engineer & AI Specialist

Fulbright Scholar and Carnegie Mellon graduate building intelligent systems at the intersection of AI, law, and public policy.

01

About

I have spent over fifteen years building web applications, machine learning pipelines, and intelligent document processing systems across four countries and three continents.

My work ranges from SaaS platforms that secured multimillion-dollar contracts to AI legal research systems, Android apps with over a million downloads, and training data for ChatGPT and Gemini. I care about building systems that make a measurable difference.

At Pakistan's Inland Revenue Service, I built tools that increased audit throughput from 20 to 200 cases per month and used data analytics to identify PKR 800 million in potential tax revenue.

02

Experience

AI Engineer

Local Law Firm, Islamabad

2025 — Present

Building an AI-powered legal research platform for Pakistani court judgments. Combines dense vector search, BM25 sparse retrieval, and Neo4j graph traversal.

  • Built intelligent PDF extraction pipeline reducing LLM OCR costs by 80-90%
  • Implemented Reciprocal Rank Fusion with Jina Reranker for hybrid retrieval
  • Built citation network in Neo4j for automatic precedent discovery
Python Neo4j Qdrant FastAPI Gemini
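
For readers unfamiliar with the fusion step named above, here is a minimal sketch of Reciprocal Rank Fusion. It is illustrative only and not the platform's code: the function name, the case IDs, and the constant k = 60 (a commonly used default) are assumptions, and the reranking stage is omitted.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    where rank starts at 1. Higher total score means a better fused rank.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a dense (vector) and a sparse (BM25) retriever.
dense_hits = ["case_a", "case_b", "case_c"]
sparse_hits = ["case_b", "case_d", "case_a"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is why rank-based fusion needs no score normalization across retrievers.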

Senior LLM Trainer & Reviewer

Mercor.com, USA

2024 — Present

Contributing to training data for frontier models including ChatGPT 5.2 Deep Research and Gemini 3.0 Deep Research.

  • Crafted 100+ PhD-level CS research questions with golden solutions and rubrics
  • Refined data science training data and code optimizations for state-of-the-art models
LLMs Data Science Python

ML Engineer & Back-end Developer

Turing.com, Palo Alto, CA

2024 — 2025

Built training data pipelines and agentic tools for Google and Apple.

  • Crafted 500+ Python training samples for Google Gemini 2.0
  • Built backend pipeline for training data verification and automated ingestion
  • Created agentic workflow tools for Apple's internal LLM platform
Python LLMs Agentic AI

Full-Stack Developer

Off The Line, Pittsburgh, USA

2023 — 2024

Built a SaaS chef management portal from scratch that helped secure a multimillion-dollar UPMC contract.

  • Full-stack MVP using React, TailwindCSS, Laravel, Django, MySQL
  • CI/CD pipeline via GitHub Actions reducing deployment time by 50%+
  • Fine-tuned Llama-2-7b for medically tailored recipe suggestions using RAG
React Laravel Django AWS

Deputy Commissioner

Inland Revenue Service, Pakistan

2018 — 2025

Applied data analytics and automation to transform government tax audit processes.

  • Built automated sales tax auditor using OCR and PDF extraction — 10x audit throughput
  • Identified 30,000+ tax evasion cases, PKR 800M estimated revenue increase
Python AWS Glue OCR

Web Developer & Team Leader

Bureau of Emigration, Pakistan

2016 — 2018

  • Led PKR 120M federal project, completed 30% ahead of schedule
  • Automated registration system increasing processing efficiency by 60%
  • Led dev team, conducted code reviews, mentored junior developers
Laravel MySQL Leadership

Web Engineer

Freelancer.com

2008 — 2016

  • Built 100+ web applications with 95% client satisfaction and a 5/5-star rating
  • 40%+ repeat client rate; ranked Top 100 Coders of Pakistan
PHP Laravel Vue.js WordPress

Android Developer

Google Play Store, Freelance

2017 — 2022

  • Document Scanner with deep learning — 100K downloads in first year
  • Vehicle Verification app ranked #2 in category — 1M+ downloads
Java TensorFlow OCR Android

03

Skills

Web Development

HTML5, CSS3, TailwindCSS, JavaScript, TypeScript, React + Redux, Vue.js, Laravel, Django, Flask, FastAPI, Nginx

Cloud & DevOps

AWS, GCP, Cloudflare, GitHub Actions CI/CD, Docker, Debian

Machine Learning & AI

LLM Fine-tuning, RAG, Prompt Engineering, Scikit-learn, TensorFlow, Deep Learning, CNN, Agentic Workflows

OCR & Data Extraction

Tesseract, Google Vision, docTR, PyMuPDF, pdfminer.six, BeautifulSoup, Scrapy, Selenium

Languages

Python, JavaScript, TypeScript, PHP, Java, C++, SQL

Databases

PostgreSQL, MySQL, Neo4j, Qdrant, NoSQL, SQLite3, NumPy, Pandas

Tools & Workflow

Jira, Asana, Figma, Slack, Jupyter, Android Studio, UI/UX Design, Agile

Communication

Proficient English (written & verbal), Stakeholder Management, Requirements Gathering

04

Projects

2025

AI Legal Research Platform

Hybrid retrieval system for Pakistani court judgments combining vector search, BM25, and graph-based citation traversal. Selective LLM OCR reduces costs by 80-90%.

Python · Neo4j · Qdrant · FastAPI · Gemini

Off The Line — Chef SaaS

Full-stack SaaS for medically tailored meal management. MVP helped secure a multimillion-dollar UPMC pilot. Includes fine-tuned Llama-2-7b for recipe suggestions.

React · Laravel · Django · MySQL · AWS

Asaan Business

SaaS accounting solution for Pakistani SMEs. Streamlined accounting for 100+ businesses with inventory, purchase/sales, and employee management.

Vue.js · Laravel · MySQL

CMU 2023

MeloMerge

Platform that lets music enthusiasts browse and join local jams. Final project for CMU's #1-ranked web development course.

Flask · React · MySQL · TailwindCSS

100K+ downloads

Document Scanner App

Android app with deep learning edge detection trained on 1M+ images. Integrated Tesseract and Google Vision OCR for text extraction.

Java · TensorFlow · OCR · Android

Bureau of Emigration Portal

PKR 120M federal government project. Automated registration system increasing processing efficiency by 60%, delivered 30% ahead of schedule.

Laravel · MySQL

CMU 2022

Air Quality Prediction — Allegheny County

Predicted air quality trends and respiratory disease correlations using supervised ML, with data from APIs, CSV, and web scraping. Visualized insights via Django dashboard.

Python · Scikit-learn · Django · PostgreSQL

05

Education

Carnegie Mellon University

MS in Public Policy with Data Analytics

2022 — 2024

Fulbright Scholar. ML for Public Policy, Web Development, Database Management, Decision Analytics, Data Visualization, Statistics.

NUST, Islamabad

MS in Computer Science

2013 — 2015

Intelligent Information Systems. Neural Networks, Data Mining, Advanced Operating Systems.

UET Taxila

BS in Electrical Engineering

2007 — 2011

Computer Architecture, Circuit Analysis, Electronics, Power Systems.

Recognition

US Fulbright Scholarship

2022 — 2024

Heinz Dean's List, Carnegie Mellon University

2023

Top 100 Coders of Pakistan

2012 — 2018

Best Mathematical Modelling Project

2015

06

Contact

Interested in working together? I am always open to discussing new projects, technical challenges, or opportunities.