![[ICO]](/icons/blank.gif) | Name | Last modified | Size | Description |
![[DIR]](/icons/folder.gif) | also/ | 2025-03-24 11:02 | - | |
![[DIR]](/icons/folder.gif) | knowledge/ | 2022-10-10 22:13 | - | |
![[ ]](/icons/layout.gif) | Bahdanau-2015.pdf | 2015-07-01 00:00 | 434K | Attention |
![[ ]](/icons/layout.gif) | Bengio-2003.pdf | 2003-07-01 00:00 | 137K | Language Models |
![[ ]](/icons/layout.gif) | Bottou-1990.pdf | 1990-07-01 00:00 | 1.4M | Modular Models |
![[ ]](/icons/layout.gif) | Bottou-1991.pdf | 1991-07-01 00:00 | 128K | Stochastic Gradient Descent (SGD) |
![[ ]](/icons/layout.gif) | Collobert-2011.pdf | 2011-07-01 00:00 | 415K | Language Models |
![[ ]](/icons/layout.gif) | Cybenko-1989.pdf | 1989-07-01 00:00 | 598K | Universal approximation |
![[ ]](/icons/layout.gif) | DeepSeek-2025.pdf | 2025-07-01 00:00 | 1.3M | |
![[ ]](/icons/layout.gif) | Devlin-2019.pdf | 2019-07-01 00:00 | 757K | Bidirectional Encoder Representations from Transformers (BERT) |
![[ ]](/icons/layout.gif) | Dinh-2015.pdf | 2015-07-01 00:00 | 1.6M | Normalizing Flows, Invertible Neural Networks (INN) |
![[ ]](/icons/layout.gif) | Elman-1990.pdf | 1990-07-01 00:00 | 2.8M | Recurrent Networks (RNN) |
![[ ]](/icons/layout.gif) | Girshick-2014.pdf | 2014-07-01 00:00 | 6.2M | Object Localization, R-CNN |
![[ ]](/icons/layout.gif) | Glorot-2010.pdf | 2010-07-01 00:00 | 1.6M | Rectified Linear Units (ReLU), Xavier Initialization |
![[ ]](/icons/layout.gif) | Goodfellow-2014.pdf | 2014-07-01 00:00 | 518K | Generative Adversarial Networks (GAN) |
![[ ]](/icons/layout.gif) | He-2015.pdf | 2015-07-01 00:00 | 800K | Residual Networks (ResNet), Kaiming Initialization |
![[ ]](/icons/layout.gif) | Hinton-1981.pdf | 1981-07-01 00:00 | 2.0M | Parallel Distributed Processing (PDP) |
![[ ]](/icons/layout.gif) | Hinton-2005.pdf | 2005-07-01 00:00 | 827K | Deep Belief Networks |
![[ ]](/icons/layout.gif) | Hinton-2006.pdf | 2006-07-01 00:00 | 361K | Restricted Boltzmann Machines (RBM) |
![[ ]](/icons/layout.gif) | Hochreiter-1997.pdf | 1997-07-01 00:00 | 388K | Long Short-Term Memory (LSTM) |
![[ ]](/icons/layout.gif) | HubelWiesel-1959.pdf | 1959-07-01 00:00 | 1.8M | Biological Vision |
![[ ]](/icons/layout.gif) | Ioffe-2015.pdf | 2015-07-01 00:00 | 169K | Batch Normalization (Batchnorm) |
![[ ]](/icons/layout.gif) | Isola-2017.pdf | 2017-07-01 00:00 | 1.9M | Image-to-Image translation (Pix2Pix) |
![[ ]](/icons/layout.gif) | Kaplan-2020.pdf | 2020-07-01 00:00 | 2.4M | |
![[ ]](/icons/layout.gif) | Kingma-2013.pdf | 2013-07-01 00:00 | 3.7M | Variational Autoencoders (VAE) |
![[ ]](/icons/layout.gif) | Kingma-2015.pdf | 2015-07-01 00:00 | 571K | The Adam optimizer |
![[ ]](/icons/layout.gif) | Kipf-2017.pdf | 2017-07-01 00:00 | 853K | Graph Neural Networks (GNN) |
![[ ]](/icons/layout.gif) | Kohonen-1972.pdf | 1972-07-01 00:00 | 1.1M | Associative Memory |
![[ ]](/icons/layout.gif) | Krizhevsky-2012.pdf | 2012-07-01 00:00 | 1.4M | AlexNet, GPU, ImageNet, Dropout |
![[ ]](/icons/layout.gif) | Krogh-1991.pdf | 1991-07-01 00:00 | 1.5M | Weight Decay |
![[ ]](/icons/layout.gif) | LeCun-1989.pdf | 1989-07-01 00:00 | 1.9M | Convolutional Networks (CNN) |
![[ ]](/icons/layout.gif) | Lettvin-1959.pdf | 1959-07-01 00:00 | 2.5M | Frog Neuron Codes |
![[ ]](/icons/layout.gif) | McCullochPitts-1943.pdf | 1943-07-01 00:00 | 1.2M | Logical Neurons |
![[ ]](/icons/layout.gif) | Meng-2022.pdf | 2022-07-01 00:00 | 9.9M | |
![[ ]](/icons/layout.gif) | Mikolov-2013.pdf | 2013-07-01 00:00 | 109K | Word2Vec, Semantic Vector Composition |
![[ ]](/icons/layout.gif) | Mildenhall-2020.pdf | 2020-07-01 00:00 | 7.9M | Neural Radiance Fields (NeRF) |
![[ ]](/icons/layout.gif) | Minh-2013.pdf | 2013-07-01 00:00 | 472K | Atari, Deep Reinforcement Learning |
![[ ]](/icons/layout.gif) | Minsky-1969.pdf | 1969-07-01 00:00 | 1.0M | Limitations of Perceptrons |
![[ ]](/icons/layout.gif) | Ouyang-2022.pdf | 2022-07-01 00:00 | 1.7M | |
![[ ]](/icons/layout.gif) | Quiroga-2005.pdf | 2005-07-01 00:00 | 403K | Single Neurons in Humans |
![[ ]](/icons/layout.gif) | Radford-2018.pdf | 2018-07-01 00:00 | 569K | Autoregressive Generative Pretrained Transformer (GPT) |
![[ ]](/icons/layout.gif) | Radford-2021.pdf | 2021-07-01 00:00 | 6.5M | Contrastive Language-Image Pre-training (CLIP) |
![[ ]](/icons/layout.gif) | Rombach-2022.pdf | 2022-07-01 00:00 | 2.3M | |
![[ ]](/icons/layout.gif) | Rosenblatt-1958.pdf | 1958-07-01 00:00 | 1.6M | Perceptron Model |
![[ ]](/icons/layout.gif) | Rumelhart-1986.pdf | 1986-07-01 00:00 | 1.3M | Backpropagation |
![[ ]](/icons/layout.gif) | Scarselli-2009.pdf | 2009-07-01 00:00 | 1.4M | Graph Neural Networks (GNN) |
![[ ]](/icons/layout.gif) | Senior-2020.pdf | 2020-07-01 00:00 | 7.6M | AlphaFold, Deep Learning Computational Chemistry |
![[ ]](/icons/layout.gif) | Shao-2024.pdf | 2025-03-24 11:10 | 1.8M | |
![[ ]](/icons/layout.gif) | Silver-2016.pdf | 2016-07-01 00:00 | 1.6M | AlphaGo, Neural Tree Search |
![[ ]](/icons/layout.gif) | Sohl-Dickstein-2015.pdf | 2015-07-01 00:00 | 4.2M | Diffusion Models |
![[ ]](/icons/layout.gif) | Solla-1988.pdf | 1988-07-01 00:00 | 2.4M | Cross Entropy Loss |
![[ ]](/icons/layout.gif) | Sutskever-2014.pdf | 2014-07-01 00:00 | 140K | Neural Machine Translation, Seq2Seq |
![[ ]](/icons/layout.gif) | Szegedy-2013.pdf | 2013-07-01 00:00 | 6.3M | Adversarial Examples |
![[ ]](/icons/layout.gif) | Vaswani-2017.pdf | 2017-07-01 00:00 | 2.1M | Attention is All You Need: Transformers |
![[ ]](/icons/layout.gif) | Vincent-2010.pdf | 2010-07-01 00:00 | 1.3M | Denoising Autoencoders (DAE) |
![[ ]](/icons/layout.gif) | Vinyals-2015.pdf | 2015-07-01 00:00 | 752K | Show and Tell, Image Captioning |
![[ ]](/icons/layout.gif) | Yosinksi-2014.pdf | 2014-07-01 00:00 | 481K | Transfer Learning |
![[ ]](/icons/layout.gif) | Zeiler-2014.pdf | 2014-07-01 00:00 | 35M | Visualization, Salience Maps |
![[ ]](/icons/layout.gif) | Zhang-2016.pdf | 2016-07-01 00:00 | 394K | Rethinking Generalization, Memorization |