How to handle queries without obvious keywords?

Hello r/learnmachinelearning,

I’m working on a legal QA app and I’ve hit a bit of a roadblock. I generated embeddings using LegalBERT and set up retrieval, but I’m running into issues when testing.

Here’s the situation:
When I test retrieval quality, I run a question and check the top-5 retrieved results. If the query includes clear keywords, the system works well. But if the query is less explicit, the results are far off.
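
A minimal sketch of the kind of top-k check described above, assuming the embeddings are plain vectors compared by cosine similarity. The function names (`cosine`, `top_k`, `hit_at_k`) and the toy vectors are hypothetical; they stand in for real LegalBERT embeddings and are not the actual pipeline.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=5):
    """Return indices of the k documents most similar to the query."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]

def hit_at_k(query_vec, doc_vecs, expected_idx, k=5):
    """True if the expected document appears among the top-k results."""
    return expected_idx in top_k(query_vec, doc_vecs, k)

# Toy 3-dimensional vectors (assumption: real embeddings would be much larger).
docs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
]
query = [0.9, 0.1, 0.0]

ranked = top_k(query, docs, k=2)
found = hit_at_k(query, docs, expected_idx=2, k=2)
```

Running `hit_at_k` over a small set of (question, expected passage) pairs, split into keyword-heavy and paraphrased questions, would show how large the gap between the two groups actually is.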

For example, suppose I ask:

The correct retrieval should be the Second Amendment, but unless I explicitly include the word “firearm” or “weapon”, my model doesn’t find it. Adding keywords makes it work (which makes sense), but this limits usability.

How can I handle cases where the user query doesn’t share an obvious keyword overlap with the underlying text? Are there effective techniques for this type of embedding problem?

submitted by /u/Interesting_Good8181 to r/learnmachinelearning