Where LLMs and Cosine Similarity Fit Into the Stack

Large Language Models (LLMs) sit at the top of this entire stack as enormous neural networks built from layers of tensor operations running on GPUs. At their core, LLMs are prediction systems trained to model relationships between tokens, concepts, and patterns in language. Every word, sentence, or document is converted into high-dimensional numerical representations … Read more

Matrices, Tensors, TensorFlow, and the CUDA Stack: The Mathematics and Infrastructure Behind Modern AI

Modern AI Runs on Mathematics: Modern AI looks magical from the outside. You type a prompt into ChatGPT, an image appears from a diffusion model, or a voice assistant responds naturally in real time. Underneath all of it is something surprisingly fundamental: massive amounts of matrix multiplication. Modern AI is built on layers that stack … Read more

NVIDIA GPU Architecture, CUDA, and PTX: How Modern GPU Computing Actually Works

When people talk about modern AI, high-performance computing, or accelerated graphics, the conversation almost always arrives at NVIDIA. But the real story is not just the hardware. It's the layered software and execution model built around the GPU: Together, these form one of the most influential computing stacks of the last two decades. From Graphics Card … Read more

AI Assistants Compared: Architecture vs Marketecture

Executive Summary: The current wave of "AI comparison charts" (ChatGPT vs Gemini vs Claude vs others) are not wrong, but they are not reliable. They conflate: This article reframes the comparison using: The Core Problem Most comparisons: 👉 Example flaw: "Perplexity = best for research" → In reality, it is a retrieval + UX layer over models, not a … Read more