- Latency vs Throughput oriented design
Resources
- README | GPU Glossary
- CUDA C++ Programming Guide
- The CUDA Parallel Programming Model - 1. Concepts - Fang’s Notebook
- A history of NVidia Stream Multiprocessor
- GPU Programming: When, Why and How?
- Programming Massively Parallel Processors
- Is Parallel Programming Hard, And, If So, What Can You Do About It?
- CUDA by Example
- CUDA Books: Self taught · GitHub
- Cornell Virtual Workshop: Understanding GPU Architecture
- GPU Programming - YouTube
- The AI/ML Engineer’s starter guide to GPU Programming
- Basic facts about GPUs | Damek Davis’ Website
- Introduction to CUDA Programming With GPU Puzzles
- Introduction to CUDA Programming for Python Developers | PySpur | AI Agent Builder
- gfxcourses.stanford.edu/cs149/fall23/courseinfo
- GitHub - NVIDIA/accelerated-computing-hub: NVIDIA curated collection of educational resources related to general purpose GPU programming.
- GitHub - AlphaGPU/leetgpu-challenges: LeetGPU Challenges
- GitHub - Infatoshi/cuda-course
- GitHub - Maharshi-Pandya/cudacodes: Learnings and programs related to CUDA
- GitHub - gpu-mode/lectures: Material for gpu-mode lectures
- A Meticulous Guide to Advances in Deep Learning Efficiency over the Years | Alex L. Zhang
- Writing CPU ML Kernels with XNNPACK
- Making GPUs Actually Fast: A Deep Dive into Training Performance - YouTube
- supplement to 0.420