{"id":405,"date":"2026-05-06T19:59:50","date_gmt":"2026-05-06T09:59:50","guid":{"rendered":"https:\/\/www.the-bach.kiwi\/?p=405"},"modified":"2026-05-08T12:13:57","modified_gmt":"2026-05-08T02:13:57","slug":"where-llms-and-cosine-similarity-fit-into-the-stack","status":"publish","type":"post","link":"https:\/\/www.the-bach.kiwi\/index.php\/2026\/05\/06\/where-llms-and-cosine-similarity-fit-into-the-stack\/","title":{"rendered":"Where LLMs and Cosine Similarity Fit Into the Stack"},"content":{"rendered":"\n<p>Large Language Models (LLMs) sit at the top of this entire stack as enormous neural networks built from layers of tensor operations running on GPUs. At their core, LLMs are prediction systems trained to model relationships between tokens, concepts, and patterns in language. Every word, sentence, or document is converted into high-dimensional numerical representations called <strong>embeddings<\/strong>, which are essentially long vectors of numbers. During training and inference, the model repeatedly performs matrix multiplications, attention calculations, and tensor transformations across billions or trillions of parameters using CUDA-accelerated GPU hardware.<\/p>\n\n\n\n<p>One of the key mathematical tools used in this embedding space is <strong>cosine similarity<\/strong>. Cosine similarity measures how closely two vectors point in the same direction, regardless of their magnitude. This is extremely useful in AI systems because semantically similar concepts tend to lie close together in vector space. 
For example, embeddings for \u201cdog\u201d and \u201cpuppy\u201d may point in similar directions, while those for \u201cdog\u201d and \u201cspaceship\u201d point in clearly different ones.<\/p>\n\n\n\n<p>Mathematically, cosine similarity is the cosine of the angle between two vectors:<\/p>\n\n\n\n<div class=\"wp-block-katex-display-block katex-eq\" data-katex-display=\"true\"><pre>\\cos(\\theta)=\\frac{A\\cdot B}{\\|A\\|\\,\\|B\\|}<\/pre><\/div>\n\n\n\n<p class=\"has-text-align-center\"><code>cos(\u03b8) = (A \u00b7 B) \/ (||A|| ||B||)<\/code> <\/p>\n\n\n\n<p>Where:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>A \u00b7 B<\/code> is the dot product<\/li>\n\n\n\n<li><code>||A||<\/code> and <code>||B||<\/code> are the vector magnitudes (Euclidean norms)<\/li>\n\n\n\n<li>the result ranges from:\n<ul class=\"wp-block-list\">\n<li><code>1<\/code> \u2192 same direction (highly similar)<\/li>\n\n\n\n<li><code>0<\/code> \u2192 orthogonal (unrelated)<\/li>\n\n\n\n<li><code>-1<\/code> \u2192 opposite directions<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p>In practical AI systems, cosine similarity is heavily used in:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>semantic search<\/li>\n\n\n\n<li>vector databases<\/li>\n\n\n\n<li>retrieval-augmented generation (RAG)<\/li>\n\n\n\n<li>recommendation systems<\/li>\n\n\n\n<li>embedding clustering<\/li>\n\n\n\n<li>memory retrieval<\/li>\n\n\n\n<li>context ranking<\/li>\n<\/ul>\n\n\n\n<p>This is why discussions of modern AI systems so often mention:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>vector stores<\/li>\n\n\n\n<li>embedding search<\/li>\n\n\n\n<li>nearest neighbour retrieval<\/li>\n<\/ul>\n\n\n\n<p>A RAG system, for example, converts documents into embeddings and stores them as vectors. It then compares the embedding of a user query against the stored vectors, potentially millions of them, using cosine similarity to find the most semantically relevant passages, which are passed to the LLM as context.<\/p>\n\n\n\n<p>Conceptually, the modern AI stack now looks something like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Human Language\n      
\u2193\nTokenisation\n      \u2193\nEmbeddings (Vectors)\n      \u2193\nTensor Operations\n      \u2193\nTransformer Layers\n      \u2193\nMatrix Multiplication\n      \u2193\nCUDA \/ Tensor Cores\n      \u2193\nGPU Hardware\n<\/code><\/pre>\n\n\n\n<p>So while CUDA, tensors, and GPUs provide the computational machinery, embeddings and cosine similarity provide much of the semantic geometry that allows LLMs to model meaning, relationships, and context within high-dimensional vector spaces.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><sub><sup>Note: This article was developed using AI-assisted drafting and editing tools, including ChatGPT, with human direction, review, and refinement.<\/sup><\/sub><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Large Language Models (LLMs) sit at the top of this entire stack as enormous neural networks built from layers of tensor operations running on GPUs. At their core, LLMs are prediction systems trained to model relationships between tokens, concepts, and patterns in language. 
Every word, sentence, or document is converted into high-dimensional numerical representations &#8230; <a title=\"Where LLMs and Cosine Similarity Fit Into the Stack\" class=\"read-more\" href=\"https:\/\/www.the-bach.kiwi\/index.php\/2026\/05\/06\/where-llms-and-cosine-similarity-fit-into-the-stack\/\" aria-label=\"Read more about Where LLMs and Cosine Similarity Fit Into the Stack\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16],"tags":[17,20,21,18,22,19],"class_list":["post-405","post","type-post","status-publish","format-standard","hentry","category-skunkworks","tag-ai","tag-cosine-similarity","tag-embeddings","tag-llm","tag-machine-learning","tag-vectors"],"_links":{"self":[{"href":"https:\/\/www.the-bach.kiwi\/index.php\/wp-json\/wp\/v2\/posts\/405","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.the-bach.kiwi\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.the-bach.kiwi\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.the-bach.kiwi\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.the-bach.kiwi\/index.php\/wp-json\/wp\/v2\/comments?post=405"}],"version-history":[{"count":4,"href":"https:\/\/www.the-bach.kiwi\/index.php\/wp-json\/wp\/v2\/posts\/405\/revisions"}],"predecessor-version":[{"id":415,"href":"https:\/\/www.the-bach.kiwi\/index.php\/wp-json\/wp\/v2\/posts\/405\/revisions\/415"}],"wp:attachment":[{"href":"https:\/\/www.the-bach.kiwi\/index.php\/wp-json\/wp\/v2\/media?parent=405"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.the-bach.kiwi\/index.php\/wp-json\/wp\/v2\/categories?post=405"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.the-bach.kiwi\/index.php\/wp-json\/wp\/v2\/tags?post=405"}],"curies":[{"name":"wp","h
ref":"https:\/\/api.w.org\/{rel}","templated":true}]}}