Waseem Khan
Full-Stack Engineer & AI Specialist
Fulbright Scholar and Carnegie Mellon graduate building intelligent systems at the intersection of AI, law, and public policy.
About
I have spent over fifteen years building web applications, machine learning pipelines, and intelligent document processing systems across four countries and three continents.
My work ranges from SaaS platforms that secured multimillion-dollar contracts to AI legal research systems, Android apps with over a million downloads, and training data for ChatGPT and Gemini. I care about building systems that make a measurable difference.
At Pakistan's Inland Revenue Service, I built tools that increased audit throughput from 20 to 200 cases per month and identified PKR 800 million in potential tax revenue through data analytics.
Experience
AI Engineer
Local Law Firm, Islamabad
2025 — Present
Building an AI-powered legal research platform for Pakistani court judgments. Combines dense vector search, BM25 sparse retrieval, and Neo4j graph traversal.
- Built intelligent PDF extraction pipeline reducing LLM OCR costs by 80-90%
- Implemented Reciprocal Rank Fusion with Jina Reranker for hybrid retrieval
- Built citation network in Neo4j for automatic precedent discovery
Senior LLM Trainer & Reviewer
Mercor.com, USA
2024 — Present
Contributing to training data for frontier models including ChatGPT 5.2 Deep Research and Gemini 3.0 Deep Research.
- Crafted 100+ PhD-level CS research questions with golden solutions and rubrics
- Refined data science training data and code optimizations for state-of-the-art (SOTA) models
ML Engineer & Back-end Developer
Turing.com, Palo Alto, CA
2024 — 2025
Built training data pipelines and agentic tools for Google and Apple.
- Crafted 500+ Python training samples for Google Gemini 2.0
- Built backend pipeline for training data verification and automated ingestion
- Created agentic workflow tools for Apple's internal LLM platform
Full-Stack Developer
Off The Line, Pittsburgh, USA
2023 — 2024
Built a SaaS chef management portal from scratch that helped secure a multimillion-dollar UPMC contract.
- Full-stack MVP using React, TailwindCSS, Laravel, Django, MySQL
- CI/CD pipeline via GitHub Actions reducing deployment time by 50%+
- Fine-tuned Llama-2-7b for medically tailored recipe suggestions using RAG
Deputy Commissioner
Inland Revenue Service, Pakistan
2018 — 2025
Applied data analytics and automation to transform government tax audit processes.
- Built automated sales tax auditor using OCR and PDF extraction — 10x audit throughput
- Identified 30,000+ tax evasion cases, PKR 800M estimated revenue increase
Web Developer & Team Leader
Bureau of Emigration, Pakistan
2016 — 2018
- Led PKR 120M federal project, completed 30% ahead of schedule
- Automated registration system increasing processing efficiency by 60%
- Led dev team, conducted code reviews, mentored junior developers
Web Engineer
Freelancer.com
2008 — 2016
- Built 100+ web applications with a 95% client satisfaction rate and a 5/5-star rating
- 40%+ repeat client rate; ranked among the Top 100 Coders of Pakistan
Android Developer
Google Play Store, Freelance
2017 — 2022
- Document Scanner with deep learning — 100K downloads in first year
- Vehicle Verification app ranked #2 in category — 1M+ downloads
Skills
Web Development
HTML5, CSS3, TailwindCSS, JavaScript, TypeScript, React + Redux, Vue.js, Laravel, Django, Flask, FastAPI, Nginx
Cloud & DevOps
AWS, GCP, Cloudflare, GitHub Actions CI/CD, Docker, Debian
Machine Learning & AI
LLM Fine-tuning, RAG, Prompt Engineering, Scikit-learn, TensorFlow, Deep Learning, CNN, Agentic Workflows
OCR & Data Extraction
Tesseract, Google Vision, docTR, PyMuPDF, pdfminer.six, BeautifulSoup, Scrapy, Selenium
Languages
Python, JavaScript, TypeScript, PHP, Java, C++, SQL
Databases
PostgreSQL, MySQL, Neo4j, Qdrant, NoSQL, SQLite3, NumPy, Pandas
Tools & Workflow
Jira, Asana, Figma, Slack, Jupyter, Android Studio, UI/UX Design, Agile
Communication
Proficient English (written & verbal), Stakeholder Management, Requirements Gathering
Projects
2025
AI Legal Research Platform
Hybrid retrieval system for Pakistani court judgments combining vector search, BM25, and graph-based citation traversal. Selective LLM OCR reduces costs by 80-90%.
Python · Neo4j · Qdrant · FastAPI · Gemini
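A minimal sketch of how ranked results from the three retrievers could be merged with Reciprocal Rank Fusion, the fusion method named in the Experience section. The function name, the document IDs, and the constant k = 60 (a commonly used default) are illustrative assumptions, not the platform's actual code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k is an assumed smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from dense search, BM25, and graph traversal.
dense = ["case_12", "case_07", "case_31"]
bm25 = ["case_07", "case_12", "case_44"]
graph = ["case_07", "case_31", "case_90"]

fused = reciprocal_rank_fusion([dense, bm25, graph])
print(fused)
```

Documents that rank well in several lists rise to the top even when no single retriever placed them first everywhere; the fused list could then be passed to a reranker.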
2023
meetotl.com
Off The Line — Chef SaaS
Full-stack SaaS for medically tailored meal management. MVP helped secure a multimillion-dollar UPMC pilot. Includes fine-tuned Llama-2-7b for recipe suggestions.
React · Laravel · Django · MySQL · AWS
2021
asaan.business
Asaan Business
SaaS accounting solution for Pakistani SMEs. Streamlined accounting for 100+ businesses with inventory, purchase/sales, and employee management.
Vue.js · Laravel · MySQL
CMU 2023
MeloMerge
Platform connecting music enthusiasts to browse and join local jams. Final project for CMU's #1 ranked web development course.
Flask · React · MySQL · TailwindCSS
100K+ downloads
Document Scanner App
Android app with deep learning edge detection trained on 1M+ images. Integrated Tesseract and Google Vision OCR for text extraction.
Java · TensorFlow · OCR · Android
2016
beoe.gov.pk
Bureau of Emigration Portal
PKR 120M federal government project. Automated registration system increasing processing efficiency by 60%, delivered 30% ahead of schedule.
Laravel · MySQL
CMU 2022
Air Quality Prediction — Allegheny County
Predicted air quality trends and respiratory disease correlations using supervised ML, with data from APIs, CSV, and web scraping. Visualized insights via Django dashboard.
Python · Scikit-learn · Django · PostgreSQL
Education
Carnegie Mellon University
MS in Public Policy with Data Analytics
2022 — 2024
Fulbright Scholar. ML for Public Policy, Web Development, Database Management, Decision Analytics, Data Visualization, Statistics.
NUST, Islamabad
MS in Computer Science
2013 — 2015
Intelligent Information Systems. Neural Networks, Data Mining, Advanced Operating Systems.
UET Taxila
BS in Electrical Engineering
2007 — 2011
Computer Architecture, Circuit Analysis, Electronics, Power Systems.
Recognition
US Fulbright Scholarship
2022 — 2024
Heinz Dean's List, Carnegie Mellon University
2023
Top 100 Coders of Pakistan
2012 — 2018
Best Mathematical Modelling Project
2015
Contact
Interested in working together? I am always open to discussing new projects, technical challenges, or opportunities.