Most often, f is the softmax function:
$$ \text{softmax}(\mathbf{y})_{i,t} = \frac{e^{y_{i,t}}}{\sum_{j=1}^K e^{y_{j,t}}} $$
Source: "Language Models are Unsupervised Multitask Learners", A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever
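A minimal NumPy sketch of this formula (the `(K, T)` logit layout and variable names are illustrative choices, not taken from the paper): softmax normalizes the logits over the vocabulary axis independently at each time step.

```python
import numpy as np

def softmax(y, axis=0):
    """Numerically stable softmax over logits y of shape (K, T):
    K vocabulary entries, T time steps; normalizes along `axis`."""
    # Subtracting the per-column max does not change the result
    # but prevents overflow in exp for large logits.
    z = y - y.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Example: logits for K=4 tokens at T=2 time steps.
logits = np.array([[2.0, 0.5],
                   [1.0, 0.5],
                   [0.1, 3.0],
                   [0.1, 0.5]])
probs = softmax(logits, axis=0)
print(probs)              # each column is a distribution over the vocabulary
print(probs.sum(axis=0))  # -> [1. 1.]
```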
%%html
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">This is mind blowing.<br><br>With GPT-3, I built a layout generator where you just describe any layout you want, and it generates the JSX code for you.<br><br>W H A T <a href="https://t.co/w8JkrZO4lk">pic.twitter.com/w8JkrZO4lk</a></p>— Sharif Shameem (@sharifshameem) <a href="https://twitter.com/sharifshameem/status/1282676454690451457?ref_src=twsrc%5Etfw">July 13, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
%%html
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">A demo of the attention mechanism of DeepMind's AlphaCode as it completes a coding question.<br><br>Now consider having 100s of browser tabs open and the attention corresponded to clicking on buttons and keyboard keys. <a href="https://t.co/mU0Cywm9N3">pic.twitter.com/mU0Cywm9N3</a></p>— dave (@dmvaldman) <a href="https://twitter.com/dmvaldman/status/1602326660220600324?ref_src=twsrc%5Etfw">December 12, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
Implementation with an explanation: https://threadreaderapp.com/thread/1470406419786698761.html
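For reference, a minimal NumPy sketch of single-head, unmasked scaled dot-product attention, the mechanism the visualization above animates; the shapes and names here are illustrative assumptions, not details of AlphaCode itself:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (T_q, T_k) similarity scores
    # Row-wise softmax: each query position gets a probability
    # distribution over the key positions.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights                # output and the attention map

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))   # 3 query positions, d_k = 8
K = rng.normal(size=(5, 8))   # 5 key positions
V = rng.normal(size=(5, 8))
out, attn = scaled_dot_product_attention(Q, K, V)
print(attn.round(2))          # rows sum to 1: what each query "looks at"
```

The `attn` matrix is exactly the kind of map rendered in the demo: each row shows how strongly one position attends to every other position.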