AI Pipeline · Weeks 1–3

From Raw Data toAI-Powered Skill Gap Analysis

A 3-week pipeline: ETL → LLM tagging → Full-stack chat app. Built with Python, FastAPI, Ollama, and Gemini.

View on GitHub See the Pipeline

scroll

Overview

The Pipeline

Three weeks, one cohesive AI system. Each week builds on the last.

WEEK 1

Job Listings ETL

Extract .mhtml archives → Bronze → Silver → Gold SQLite

Complete

WEEK 2

AI Skill Tagger

LLM batch tagging + resume skill gap detection

Complete

WEEK 3

KYouth Chat

Full-stack chat app with PDF resume analysis

Complete

Week 1 — ETL Pipeline

main.pybash

# Run full ETL pipeline end-to-end
python main.py all
 
# Or run individual stages
python main.py ingest    # .mhtml → bronze HTML
python main.py process   # HTML → silver JSON
python main.py load      # JSON → gold SQLite
python main.py profile   # data quality report
 
🥉 Bronze: Extracted 100 files
🥈 Silver: Processed 84 / Skipped 16
🥇 Gold:   Inserted 84 records
 
--- DATA QUALITY REPORT ---
Total Records: 84
Missing Values → job_title: 0, company: 0
Avg Description Length: 2654 chars

Job Listings ETL

Parses raw .mhtml web archives through a 4-layer Medallion Architecture: Source → Bronze → Silver → Gold into SQLite. This ensures data integrity and high-fidelity extraction from non-standard formats.

0Jobs Processed

0%Schema Validated

Week 2 — LLM Analysis

AI-Powered Skill Gap Analysis

Sends job descriptions to Gemini/Ollama in batches of 3, extracts tech stacks with strict format validation, then cross-references your resume to surface missing skills.

find_skill_gaps.pybash

uv run tag_data.py
 
Analyzed Job 91347112: Java, Spring Boot, Python, REST APIs, CI/CD
Analyzed Job 91533584: PHP, Python, Node.js, MySQL, Docker, AWS
Analyzed Job 91554915: Python, Docker, GitHub Actions, Prometheus
Analyzed Job 91597624: Python, SQL, Google Cloud, AWS, PostgreSQL
Total tokens used: 2433, took 10486.325ms
 
uv run find_skill_gaps.py
 
gaps=['aws', 'docker', 'github actions', 'java', 'postgresql',
      'prometheus', 'spring boot', 'sql', 'rest apis']

0Jobs Tagged

0LLM Models

0Skill Gaps Found

Models

Geminillama3.1phi3deepseek-r1

Skill Gaps Found

awsdockergithub actionsjavapostgresqlprometheusspring bootsqlrest apis

Week 3 — Full-Stack App

Resume Helper Chat App

FastAPI backend with Jinja2 frontend. Upload a resume PDF to trigger real-time skill gap analysis. Switch between local Ollama models and cloud Gemini mid-conversation.

docker-composebash

# Option A — Docker (recommended)
docker compose up --build -d
docker exec -it week_3-ollama-1 ollama pull llama3.1
 
# Option B — Local dev
cd week_3/backend
uv run uvicorn --app-dir src --host 0.0.0.0 --port 8001 app:app
 
cd week_3/frontend
uv run uvicorn --app-dir src --host 0.0.0.0 --port 8000 app:app

Architecture

Browser
  └── Frontend :8000
        └── Backend :8001
              ├── Ollama :11434
              ├── Gemini API
              └── SQLite DB

PDF Upload

Attach any resume PDF — skills extracted automatically

Multi-Model

Switch llama3.1, gemma3, phi3, deepseek, Gemini mid-chat

Skill Gap Report

Matched vs missing skills returned inline in chat

Deployment

How to Run

Docker is the fastest path. Local dev for machines without enough resources for the Ollama container.

Prerequisites:Python 3.14DockerOllamauv8 GB RAMGemini API Key (optional)

Docker

Recommended

1Configure environment

cp .env.example .env
# edit .env — add GEMINI_API if using Gemini

2Build and start all containers

docker compose up --build -d

3Pull Ollama model (one-time)

docker exec -it week_3-ollama-1 ollama pull llama3.1
# optionally pull more:
# ollama pull gemma3 phi3 deepseek-r1:1.5b

4Open the app

http://localhost:8000

Local Dev

1Install prerequisites & pull model

ollama pull llama3.1

2Start backend

cd week_3/backend
uv sync
uv run uvicorn --app-dir src --host 0.0.0.0 --port 8001 app:app

3Start frontend

cd week_3/frontend
uv sync
uv run uvicorn --app-dir src --host 0.0.0.0 --port 8000 app:app

4Open the app

http://localhost:8000

Repository Structure

kyouth-project/
|-- week_1/               # ETL pipeline
|   |-- main.py           # entry point
|   |-- src/
|   |   |-- ingestor.py   # .mhtml → bronze
|   |   |-- processor.py  # bronze → silver
|   |   |-- loader.py     # silver → gold SQLite
|   |   `-- profiler.py   # data quality report
|   `-- data/             # source / bronze / silver / gold
|
|-- week_2/               # LLM skill tagger
|   |-- tag_data.py       # batch LLM tagging
|   |-- find_skill_gaps.py
|   `-- prompt_model.py   # Gemini / Ollama adapter
|
`-- week_3/               # full-stack chat app
    |-- backend/          # FastAPI  :8001
    |-- frontend/         # Jinja2   :8000
    |-- landing/          # Next.js  :3000
    `-- docker-compose.yml

Environment Variables

Variable	Service	Default	Description
CHAT_MODEL	backend	llama3.1	Fallback model if none selected in UI
GEMINI_API	backend	—	Google Gemini API key (cloud models only)
OLLAMA_HOST	backend	http://ollama:11434	Ollama server URL
BACKEND_URL	frontend	http://backend:8000	Backend service URL
DB_PATH	backend	data/jobs_d1.db	SQLite jobs database path