A 3-week pipeline: ETL → LLM tagging → Full-stack chat app. Built with Python, FastAPI, Ollama, and Gemini.
The Pipeline
Three weeks, one cohesive AI system. Each week builds on the last.
Job Listings ETL
Extract .mhtml archives → Bronze → Silver → Gold SQLite
AI Skill Tagger
LLM batch tagging + resume skill gap detection
KYouth Chat
Full-stack chat app with PDF resume analysis
# Run full ETL pipeline end-to-endpython main.py all # Or run individual stagespython main.py ingest # .mhtml → bronze HTMLpython main.py process # HTML → silver JSONpython main.py load # JSON → gold SQLitepython main.py profile # data quality report 🥉 Bronze: Extracted 100 files🥈 Silver: Processed 84 / Skipped 16🥇 Gold: Inserted 84 records --- DATA QUALITY REPORT ---Total Records: 84Missing Values → job_title: 0, company: 0Avg Description Length: 2654 charsJob Listings ETL
Parses raw .mhtml web archives through a 4-layer Medallion Architecture: Source → Bronze → Silver → Gold into SQLite. This ensures data integrity and high-fidelity extraction from non-standard formats.
AI-Powered Skill Gap Analysis
Sends job descriptions to Gemini/Ollama in batches of 3, extracts tech stacks with strict format validation, then cross-references your resume to surface missing skills.
uv run tag_data.py Analyzed Job 91347112: Java, Spring Boot, Python, REST APIs, CI/CDAnalyzed Job 91533584: PHP, Python, Node.js, MySQL, Docker, AWSAnalyzed Job 91554915: Python, Docker, GitHub Actions, PrometheusAnalyzed Job 91597624: Python, SQL, Google Cloud, AWS, PostgreSQLTotal tokens used: 2433, took 10486.325ms uv run find_skill_gaps.py gaps=['aws', 'docker', 'github actions', 'java', 'postgresql', 'prometheus', 'spring boot', 'sql', 'rest apis']Resume Helper Chat App
FastAPI backend with Jinja2 frontend. Upload a resume PDF to trigger real-time skill gap analysis. Switch between local Ollama models and cloud Gemini mid-conversation.
# Option A — Docker (recommended)docker compose up --build -ddocker exec -it week_3-ollama-1 ollama pull llama3.1 # Option B — Local devcd week_3/backenduv run uvicorn --app-dir src --host 0.0.0.0 --port 8001 app:app cd week_3/frontenduv run uvicorn --app-dir src --host 0.0.0.0 --port 8000 app:appHow to Run
Docker is the fastest path. Local dev for machines without enough resources for the Ollama container.
Docker
Recommendedcp .env.example .env# edit .env — add GEMINI_API if using Geminidocker compose up --build -ddocker exec -it week_3-ollama-1 ollama pull llama3.1# optionally pull more:# ollama pull gemma3 phi3 deepseek-r1:1.5bLocal Dev
ollama pull llama3.1cd week_3/backenduv syncuv run uvicorn --app-dir src --host 0.0.0.0 --port 8001 app:appcd week_3/frontenduv syncuv run uvicorn --app-dir src --host 0.0.0.0 --port 8000 app:appkyouth-project/|-- week_1/ # ETL pipeline| |-- main.py # entry point| |-- src/| | |-- ingestor.py # .mhtml → bronze| | |-- processor.py # bronze → silver| | |-- loader.py # silver → gold SQLite| | `-- profiler.py # data quality report| `-- data/ # source / bronze / silver / gold||-- week_2/ # LLM skill tagger| |-- tag_data.py # batch LLM tagging| |-- find_skill_gaps.py| `-- prompt_model.py # Gemini / Ollama adapter|`-- week_3/ # full-stack chat app |-- backend/ # FastAPI :8001 |-- frontend/ # Jinja2 :8000 |-- landing/ # Next.js :3000 `-- docker-compose.yml| Variable | Service | Default | Description |
|---|---|---|---|
| CHAT_MODEL | backend | llama3.1 | Fallback model if none selected in UI |
| GEMINI_API | backend | — | Google Gemini API key (cloud models only) |
| OLLAMA_HOST | backend | http://ollama:11434 | Ollama server URL |
| BACKEND_URL | frontend | http://backend:8000 | Backend service URL |
| DB_PATH | backend | data/jobs_d1.db | SQLite jobs database path |
Tech Stack
Every tool chosen for a reason. No framework soup.