Papers & Articles

  • (2025) Peri-LN: Revisiting Layer Normalization in the Transformer Architecture — Kim et al.
    arXiv
  • (2019) Deep Equilibrium Models — Bai et al.
    NeurIPS
  • (2023) Compact and Optimal Deep Learning with Recurrent Parameter Generators — Wang et al.
    WACV
  • (2024) LLMCarbon: Modeling the End-to-End Carbon Footprint of Large Language Models — Faiz et al.
    ICLR
  • (2024) The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale — Penedo et al.
    NeurIPS
  • (2025) SmolLM2: When Smol Goes Big - Data-Centric Training of a Small Language Model — Allal et al.
    arXiv
  • (2024) DeepSeek-V3 Technical Report — DeepSeek-AI et al.
    arXiv
  • (2020) Scaling Laws for Neural Language Models — Kaplan et al.
    arXiv
  • (2022) Training Compute-Optimal Large Language Models — Hoffmann et al.
    arXiv
  • (2021) Highly accurate protein structure prediction with AlphaFold — Jumper et al.
    Nature
  • (2020) Denoising Diffusion Probabilistic Models — Ho et al.
    NeurIPS
  • (2017) Attention Is All You Need — Vaswani et al.
    NeurIPS
  • (2022) Sliced Recursive Transformer — Shen et al.
    ECCV
  • (2023) Looped Transformers as Programmable Computers — Giannou et al.
    ICML
  • (2019) Universal Transformers — Dehghani et al.
    ICLR
  • (2024) Looped Transformers are Better at Learning Learning Algorithms — Yang et al.
    ICLR
  • (2019) Language Models are Unsupervised Multitask Learners — Radford et al.
    OpenAI

Books

  • (2013) The Meaning of Marriage: Facing the Complexities of Commitment with the Wisdom of God — Timothy Keller and Kathy Keller