ai – @ the Bach

Where LLMs and Cosine Similarity Fit Into the Stack

8 May 2026 / 6 May 2026 by grant

Large Language Models (LLMs) sit at the top of this entire stack: enormous neural networks built from layers of tensor operations running on GPUs. At their core, LLMs are prediction systems trained to model relationships between tokens, concepts, and patterns in language. Every word, sentence, or document is converted into high-dimensional numerical representations …
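As a rough illustration of the cosine similarity named in the post title, here is a minimal sketch in plain Python. The three-dimensional vectors and their labels are made-up toy values, not real model embeddings, which typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" (assumed values for illustration only).
cat = [0.9, 0.1, 0.3]
kitten = [0.8, 0.2, 0.35]
invoice = [0.1, 0.9, 0.0]

sim_related = cosine_similarity(cat, kitten)
sim_unrelated = cosine_similarity(cat, invoice)
print(sim_related > sim_unrelated)
```

Vectors that point in nearly the same direction score close to 1, while vectors with little overlap score closer to 0, which is why the measure is commonly used to compare embeddings regardless of their magnitude.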

Matrices, Tensors, TensorFlow, and the CUDA Stack — The Mathematics and Infrastructure Behind Modern AI

8 May 2026 / 6 May 2026 by grant

Modern AI Runs on Mathematics

Modern AI looks magical from the outside. You type a prompt into ChatGPT, an image appears from a diffusion model, or a voice assistant responds naturally in real time. Underneath all of it is something surprisingly fundamental: massive amounts of matrix multiplication. Modern AI is built on layers that stack …
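To make the matrix multiplication claim concrete, the following is a minimal sketch of the product that sits underneath a single dense layer, written in plain Python for readability. The input batch `x` and weight matrix `w` are arbitrary example values; real frameworks hand this same operation to optimized GPU kernels rather than Python loops.

```python
def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix given as lists of rows."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("columns of a must match rows of b")
    cols = len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(len(a))
    ]


# Assumed example values: 2 inputs with 2 features each, mapped to 3 outputs.
x = [[1.0, 2.0],
     [3.0, 4.0]]
w = [[0.5, -1.0, 0.0],
     [0.25, 0.0, 1.0]]

y = matmul(x, w)
print(y)  # [[1.0, -1.0, 2.0], [2.5, -3.0, 4.0]]
```

Each output row is a weighted combination of one input row, so stacking layers amounts to chaining products like this one, usually with a nonlinearity in between.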

NVIDIA GPU Architecture, CUDA, and PTX — How Modern GPU Computing Actually Works

8 May 2026 / 6 May 2026 by grant

When people talk about modern AI, high-performance computing, or accelerated graphics, the conversation almost always arrives at NVIDIA. But the real story is not just the hardware. It’s the layered software and execution model built around the GPU: Together, these form one of the most influential computing stacks of the last two decades.

From Graphics Card …

AI Assistants Compared — Architecture vs Marketecture

8 May 2026 / 11 April 2026 by grant

Executive Summary

The current wave of “AI comparison charts” (ChatGPT vs Gemini vs Claude vs others) is not wrong, but the charts are not reliable. They conflate: This article reframes the comparison using:

The Core Problem

Most comparisons: 👉 Example flaw: “Perplexity = best for research” → In reality, it is a retrieval + UX layer over models, not a …