How I scraped and analyzed 5.1 million jobs using LLaMA 7B

After graduating in Computer Science from the University of Genoa, I moved to Dublin and quickly realized how broken the job hunt had become. Ghost jobs, reposted listings, shady recruiters… it was chaos.

So I decided to fix it. I built a scraper that pulls fresh jobs directly from 100k+ verified company career pages, and fine-tuned a LLaMA 7B model (trained on synthetic data generated by LLaMA 70B) to extract structured info from job posts: salary, remote policy, visa sponsorship, required skills, etc.
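For anyone curious what the extraction step might look like in code: the post doesn't include any, so here's a minimal sketch of prompting a fine-tuned model to emit one JSON object per posting. The `llm_generate` wrapper, prompt wording, and field names are illustrative stand-ins, not the actual pipeline.

```python
import json

# Hypothetical wrapper around whatever inference stack serves the fine-tuned
# 7B model (llama.cpp, vLLM, ...) -- a stand-in, not the author's real code.
def llm_generate(prompt: str) -> str:
    raise NotImplementedError("plug in your inference backend here")

# Per the post, the 7B model was fine-tuned on synthetic labels produced by
# LLaMA 70B; the exact prompt and schema below are assumptions.
EXTRACTION_PROMPT = """Extract these fields from the job post below.
Answer with a single JSON object with keys: salary, remote,
visa_sponsorship, required_skills. Use null for anything not mentioned.

Job post:
{post}

JSON:"""

def extract_fields(post_text: str) -> dict:
    raw = llm_generate(EXTRACTION_PROMPT.format(post=post_text))
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {}  # in practice you'd retry or repair malformed output
```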

The result? A clean, up-to-date database of 5.1M+ real jobs, and a platform designed to help you skip the spam and get straight to the point: applying to jobs that actually fit you.

I also built a CV-to-job matching tool: just upload your CV, and it finds the most relevant jobs instantly. It’s 100% free and live now here.

(If you’re still skeptical but curious to test it, you can just upload a CV with fake personal information; those fields aren’t used in the matching anyway.)
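On the matching side, the post doesn't say which algorithm is used; a common baseline is embedding similarity between the CV text and each job description. Here's a sketch assuming an off-the-shelf sentence-transformers encoder (model name illustrative). It also shows why fake personal details don't hurt the match: those fields can simply be left out of the text that gets embedded.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Illustrative baseline only -- not the platform's actual matching algorithm.
# Embed the CV and the jobs, then rank jobs by cosine similarity.
model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed generic encoder

def top_matches(cv_text: str, job_texts: list[str], k: int = 10) -> list[int]:
    # Only the CV's content is embedded; name/email/phone fields can be
    # stripped beforehand, so fake personal details don't affect the ranking.
    cv_vec = model.encode([cv_text], normalize_embeddings=True)[0]
    job_vecs = model.encode(job_texts, normalize_embeddings=True)
    scores = job_vecs @ cv_vec                # cosine sim (unit vectors)
    return np.argsort(-scores)[:k].tolist()   # indices of the best k jobs
```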

💬 Do you have any ideas or feedback on this project? I’d love to hear them!

💡 Got questions about how I built the agent, the matching algorithms, or the scraper? Ask away, I’m happy to share everything I’ve learned.

submitted by /u/Separate-Breath2267 to r/learnmachinelearning

