Why does every ML paper feel impossible to read at the start?

I open a new paper, and the first page already feels like a wall. Not the equations, but the language: "without loss of generality", "convergence in distribution", …

I spend more time googling terms than reading the actual idea.

Some say to just push through, that it's just how it works, but then I spend three hours just to end up with basic annotations.

Others say to only read the intro and conclusion. But how are you supposed to get value out of that when 80 percent of the words are unclear?

And then the dependencies explode: citations depend on other citations, context depends on more context. We all know that.

Curious how people here actually read papers without drowning 🙂

More thoughts and work to be posted in r/mentiforce.

Edit: Take an example. In Attention Is All You Need, there's the expression Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. But the actual tensor computation isn't just that: there are batch and head dimensions (and a stack of layers) wrapped around these matrix multiplications.
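For concreteness, here's roughly what that one line expands into once the batch and head dimensions are in play. This is a minimal NumPy sketch of scaled dot-product attention; the function name and the shapes are just my illustration, not the paper's actual code:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, computed over the
    last two axes so leading batch/head dimensions broadcast through."""
    d_k = Q.shape[-1]
    # (batch, heads, seq_q, d_k) @ (batch, heads, d_k, seq_k)
    #   -> (batch, heads, seq_q, seq_k)
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    # softmax over the key axis (numerically stabilized)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # (batch, heads, seq_q, seq_k) @ (batch, heads, seq_k, d_v)
    #   -> (batch, heads, seq_q, d_v)
    return weights @ V

# hypothetical shapes: batch=2, heads=8, seq=10, d_k=d_v=64
Q = np.random.randn(2, 8, 10, 64)
K = np.random.randn(2, 8, 10, 64)
V = np.random.randn(2, 8, 10, 64)
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 8, 10, 64)
```

The paper's equation only describes the last two axes; everything about batching, heads, and layer stacking lives outside it.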

So do you, or the domain experts around you, really know that level of detail? Or is it that people have to read the code, even the experts?

The visual diagram doesn't make it better. I know the authors tried their best to express it to me, but the fact that I still don't clearly get it makes me feel even worse.


