Papers & Articles
- (2025) Peri-LN: Revisiting Layer Normalization in the Transformer Architecture — Kim et al.
- (2019) Deep Equilibrium Models — Bai et al.
- (2023) Compact and Optimal Deep Learning with Recurrent Parameter Generators — Wang et al. (WACV)
- (2024) LLMCarbon: Modeling the End-to-End Carbon Footprint of Large Language Models — Faiz et al. (ICLR)
- (2024) The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale — Penedo et al. (NeurIPS)
- (2025) SmolLM2: When Smol Goes Big - Data-Centric Training of a Small Language Model — Allal et al. (arXiv)
- (2024) DeepSeek-V3 Technical Report — DeepSeek-AI et al. (arXiv)
- (2020) Scaling Laws for Neural Language Models — Kaplan et al. (arXiv)
- (2022) Training Compute-Optimal Large Language Models — Hoffmann et al. (arXiv)
- (2021) Highly accurate protein structure prediction with AlphaFold — Jumper et al. (Nature)
- (2020) Denoising Diffusion Probabilistic Models — Ho et al. (NeurIPS)
- (2017) Attention Is All You Need — Vaswani et al. (NeurIPS)
- (2022) Sliced Recursive Transformer — Shen et al. (ECCV)
- (2023) Looped Transformers as Programmable Computers — Giannou et al. (ICML)
- (2019) Universal Transformers — Dehghani et al. (ICLR)
- (2024) Looped Transformers are Better at Learning Learning Algorithms — Yang et al. (ICLR)
- (2019) Language Models are Unsupervised Multitask Learners — Radford et al. (OpenAI)
Books
- (2013) The Meaning of Marriage: Facing the Complexities of Commitment with the Wisdom of God — Timothy Keller and Kathy Keller