Welcome to Grind Engineer, your guide to becoming a better software engineer! No fluff. Pure engineering insights.

Job openings added at the end of the article!

Search Google for "best running shoes 2026" and you get exactly what you expected. Search your company's internal Slack for "deployment failed on prod" and you may get nothing, even though 50 messages discuss the problem, just phrased differently.

The difference? Google understands meaning. Slack, by default, matches keywords.

Embeddings and vector databases are the technology that lets machines work with meaning rather than exact words. They power most of the AI search, recommendation, and RAG systems in use today. Yet most engineers treat them as a black box.

This article opens the box.

TL;DR: An embedding converts a piece of content (text, image, code) into a list of numbers where similar meanings land close together in space. A vector database stores and searches those numbers at scale. Together they are the foundation of most AI-powered search systems built in the last few years.

What Is an Embedding?

An embedding is a list of floating point numbers that represents the meaning of a piece of content.

# The word "king" becomes something like this
embedding = [0.23, -0.41, 0.87, 0.12, -0.66, ... ]  # illustrative values; 1536 numbers total with text-embedding-3-small

That list of numbers is not random. It is the result of an embedding model that was trained to capture semantic meaning numerically. The key property: content with similar meaning produces similar numbers.

King and Queen produce vectors close to each other in 1536-dimensional space. King and Potato produce vectors far apart. Cat and Feline produce nearly identical vectors.

This is the entire concept. Everything else is implementation.

Vector Space Intuition

Imagine a 2D map where words are plotted as points. Synonyms cluster together. Related concepts cluster in the same neighborhood. Unrelated words are far apart.

Real embeddings work the same way, just in 1536 dimensions instead of 2. You cannot visualize 1536 dimensions, but the math works identically. Distance between two points in this space represents semantic distance between their meanings.

Three common ways to measure how close two vectors are:

| Metric | What it measures | Best for |
|---|---|---|
| Cosine similarity | Cosine of the angle between two vectors (-1 to 1) | Most text embeddings. Direction matters, not magnitude. |
| Euclidean distance | Straight-line distance between points | Image embeddings, spatial data |
| Dot product | Projection of one vector onto another (sensitive to magnitude) | Same ranking as cosine on L2-normalized vectors, and cheaper to compute |

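💡 Key Insight: In practice you do not pick the distance metric freely; the embedding model's training determines which one works. OpenAI recommends cosine similarity for text-embedding-3-small, and its embeddings are normalized to length 1, so dot product and Euclidean distance produce the same ranking. For a model that does not normalize its output, using a metric other than the one it was trained with can noticeably degrade results.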
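A minimal sketch of the three metrics from the table, in plain Python with toy 2D vectors (not real embeddings). It shows that cosine similarity ignores magnitude while Euclidean distance does not, and that after L2 normalization the dot product equals cosine similarity.

```python
import math

def dot(a, b):
    # Sum of element-wise products
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    # L2 (Euclidean) length of the vector
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    # Cosine of the angle between a and b: ranges from -1 to 1
    return dot(a, b) / (norm(a) * norm(b))

def euclidean_distance(a, b):
    # Straight-line distance between the two points
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def l2_normalize(a):
    # Scale the vector to length 1, keeping its direction
    n = norm(a)
    return [x / n for x in a]

a = [3.0, 4.0]
b = [6.0, 8.0]    # same direction as a, twice the magnitude
c = [-4.0, 3.0]   # perpendicular to a

print(cosine_similarity(a, b))   # 1.0: same direction, magnitude ignored
print(cosine_similarity(a, c))   # 0.0: perpendicular, unrelated
print(euclidean_distance(a, b))  # 5.0: "far apart" despite identical direction

# On unit-length vectors, a plain dot product gives the cosine similarity
na, nb = l2_normalize(a), l2_normalize(b)
print(math.isclose(dot(na, nb), cosine_similarity(a, b)))  # True
```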

Semantic Similarity vs Keyword Match

This is the most practical distinction in AI search.

Keyword match (traditional SQL, Elasticsearch BM25): Find documents that contain the exact words in the query. Fast, predictable, brittle.

-- Matches only the exact phrase "deployment failed"; misses "release", "rollout", "push to prod"
SELECT * FROM messages WHERE content LIKE '%deployment failed%';

Semantic search (embeddings + vector search): Find documents whose meaning is similar to the query, regardless of exact words used.

# Can also find "release", "rollout", "push to prod"
# because their embeddings are close in vector space.
# Assumes `vectorstore` is an already populated LangChain-style vector store.
results = vectorstore.similarity_search("deployment failed on prod", k=5)

Real example: the query "heart attack" semantically matches documents containing "myocardial infarction," "cardiac arrest," or "chest pain with ECG changes," even though none of those documents contain the phrase "heart attack." A keyword search returns zero results for them. A semantic search returns the most relevant medical records.

| | Keyword Match | Semantic Search |
|---|---|---|
| Query "heart attack" finds | Documents with "heart attack" | Documents about cardiac events (any phrasing) |
| Query "cheap hotels" finds | Documents with those exact words | Documents about "affordable accommodation," "budget stays," "low cost lodging" |
| Speed | Very fast (inverted index) | Fast with ANN index, slower without |
| Best for | Exact ID lookups, known keywords | Natural language queries, meaning based retrieval |

Many production systems now use hybrid search: keyword match (BM25) and semantic search combined, then reranked. This gives you the precision of keyword search with the recall of semantic search.
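The text does not say how the two result lists are combined. One common, simple option is Reciprocal Rank Fusion (RRF), swapped in here as an assumption: each document earns 1 / (k + rank) from every list it appears in, and those scores are summed. The document IDs below are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several best-first lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in (rank
    starts at 1); the sums decide the final order. k=60 is a commonly used default.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical top results from each retriever for the same query
bm25_results = ["doc_7", "doc_2", "doc_9"]
vector_results = ["doc_2", "doc_4", "doc_7"]
print(reciprocal_rank_fusion([bm25_results, vector_results]))
# ['doc_2', 'doc_7', 'doc_4', 'doc_9']
```

Because RRF only looks at ranks, BM25 scores and cosine similarities never need to be normalized onto the same scale. A reranking model can then reorder the fused top results.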

How Embeddings Are Generated

An embedding model is a neural network trained to compress meaning into a fixed-size vector. The training process teaches it that "I love dogs" and "I adore canines" should produce similar vectors, while "stock market crash" and "I love dogs" should produce distant vectors.
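The text does not name a training objective. A common choice for embedding models is a contrastive loss such as InfoNCE, sketched below with NumPy purely as an illustration: the loss is low when an anchor's vector is much closer to its paraphrase (the positive) than to unrelated texts (the negatives). The random vectors stand in for model outputs, and the temperature of 0.05 is an assumed value.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.05):
    """InfoNCE-style contrastive loss for a single anchor.

    Low when the anchor is far more cosine-similar to its positive (a paraphrase)
    than to any negative (unrelated text); high otherwise.
    """
    def normalize(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    a = normalize(np.asarray(anchor, dtype=float))
    # Row 0 is the positive, the remaining rows are negatives
    candidates = normalize(np.vstack([positive, negatives]).astype(float))
    logits = candidates @ a / temperature  # cosine similarities, sharpened by temperature
    logits = logits - logits.max()         # subtract the max for a numerically stable softmax
    # Cross-entropy with the positive as the correct "class": -log softmax(logits)[0]
    return float(np.log(np.exp(logits).sum()) - logits[0])

rng = np.random.default_rng(0)
anchor = rng.normal(size=8)                     # stand-in for "I love dogs"
paraphrase = anchor + 0.1 * rng.normal(size=8)  # stand-in for "I adore canines"
unrelated = rng.normal(size=(4, 8))             # stand-ins for unrelated sentences

aligned = info_nce_loss(anchor, paraphrase, unrelated)
# Same vectors, but pretend an unrelated sentence is the "paraphrase"
misaligned = info_nce_loss(anchor, unrelated[0], np.vstack([paraphrase, unrelated[1:]]))
print(aligned < misaligned)  # True
```

Training lowers this loss over many such pairs, which is what pulls paraphrases together in vector space. In practice you usually call an already trained model rather than training one yourself: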

import openai

client = openai.OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

# Generate an embedding for any text
response = client.embeddings.create(
    input="How does database indexing work?",
    model="text-embedding-3-small"
)

vector = response.data[0].embedding
print(f"Dimensions: {len(vector)}")   # 1536
print(f"First 5 values: {vector[:5]}")  # five small floats; exact values vary

You can embed anything: text, code, images, audio, user behavior logs. The embedding model determines what the vector captures and how many dimensions it uses. More dimensions generally means more nuance, but more storage and slower search.
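To make the storage side of that trade-off concrete, here is the back-of-the-envelope arithmetic for raw vector storage, assuming 4-byte float32 values and ignoring index overhead and metadata. (OpenAI's text-embedding-3 models also accept a `dimensions` parameter that returns shorter vectors, one way to trade a little accuracy for space.)

```python
def raw_vector_storage_gb(num_vectors, dimensions, bytes_per_value=4):
    """Raw size of the vectors alone in GB (10**9 bytes), assuming float32 values.

    Ignores index structures (e.g. HNSW graph links), metadata, and the source text.
    """
    return num_vectors * dimensions * bytes_per_value / 1e9

for dims in (768, 1024, 1536, 3072):
    print(f"{dims} dims x 10M vectors: {raw_vector_storage_gb(10_000_000, dims)} GB")
# 768 -> 30.72 GB, 1024 -> 40.96 GB, 1536 -> 61.44 GB, 3072 -> 122.88 GB
```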

Embedding Model Options

| Model | Dimensions | Best for | Cost |
|---|---|---|---|
| text-embedding-3-small (OpenAI) | 1536 | General purpose text, fast, cheap | $0.02 per 1M tokens |
| text-embedding-3-large (OpenAI) | 3072 | Higher accuracy, nuanced retrieval | $0.13 per 1M tokens |
| embed-english-v3.0 (Cohere) | 1024 | English text, strong reranking support | Paid API |
| bge-large-en-v1.5 (BAAI) | 1024 | Open source, runs locally, strong benchmark performance | Free |
| jina-embeddings-v3 (Jina) | 1024 | Multilingual, long document support | Free tier + paid |
| nomic-embed-text (Nomic) | 768 | Open source, low memory, fast | Free |

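Rule of thumb: Start with text-embedding-3-small. It handles most general-purpose use cases, costs very little, and you can always upgrade later (upgrading means re-embedding your data, because vectors from different models are not comparable). Use bge-large if you need to run locally (no API costs, full data privacy).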

How Vector Databases Work Under the Hood

Storing embeddings in a normal database and running a query like SELECT * FROM documents WHERE similarity > 0.8 would require comparing your query vector against every stored vector. That is O(n) linear search. With 10 million documents, that is 10 million comparisons per query. At 100ms per million comparisons, every query takes 1 second, far too slow for interactive search.
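A sketch of that exact O(n) search using NumPy on random data (50,000 vectors of 64 dimensions, scaled down so it runs in a moment). Every stored vector is scored against the query. At the paragraph's assumed rate of 100ms per million comparisons, 10 million vectors would take 10 × 100ms = 1 second per query.

```python
import numpy as np

def exact_top_k(query, vectors, k=5):
    """Exact k nearest neighbors by cosine similarity: scores every stored vector (O(n))."""
    # L2-normalize so a plain dot product equals cosine similarity
    q = query / np.linalg.norm(query)
    m = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = m @ q                             # one similarity per stored vector
    top = np.argpartition(-scores, k - 1)[:k]  # the k best, in no particular order
    return top[np.argsort(-scores[top])]       # order those k best-first

rng = np.random.default_rng(42)
vectors = rng.normal(size=(50_000, 64)).astype(np.float32)
# A slightly perturbed copy of vector 123, so 123 should be its own nearest neighbor
query = vectors[123] + 0.01 * rng.normal(size=64).astype(np.float32)
print(exact_top_k(query, vectors, k=3))
```

Vectorization makes this fast for small collections, but the cost still grows linearly with n, which is exactly what ANN indexes avoid.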

Vector databases solve this with Approximate Nearest Neighbor (ANN) indexing. One of the most widely used ANN algorithms is HNSW (Hierarchical Navigable Small World graphs).

HNSW builds a multi-layer graph where:

  1. The top layer has few nodes with long-range connections (like highways)

  2. Lower layers have more nodes with shorter connections (like local roads)

  3. A query starts at the top, zooms toward the right neighborhood, then descends layer by layer to search locally

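This turns O(n) linear search into a graph traversal whose cost grows roughly logarithmically with n. With 10 million vectors, instead of 10 million comparisons, HNSW typically reaches the nearest neighbors in roughly 20 to 30 graph hops, comparing the query only against the neighbors of each node it visits: a tiny fraction of the collection.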
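The core move HNSW makes within each layer is a best-first beam search over a proximity graph. Below is a simplified, single-layer sketch of that routine, not full HNSW: there is no hierarchy, the graph simply links each vector to its m nearest neighbors (built by brute force, for illustration only), and `ef` is the candidate list size, mirroring HNSW's parameter of the same name. Squared Euclidean distance is used for simplicity.

```python
import heapq
import numpy as np

def build_knn_graph(vectors, m=8):
    """Link each vector to its m nearest neighbors (brute force; illustration only)."""
    sq = (vectors ** 2).sum(axis=1)
    dists = sq[:, None] + sq[None, :] - 2 * vectors @ vectors.T  # squared Euclidean
    np.fill_diagonal(dists, np.inf)                              # no self-links
    return [[int(j) for j in np.argsort(row)[:m]] for row in dists]

def greedy_search(query, vectors, graph, entry, ef=16, k=5):
    """Best-first beam search over a proximity graph, as HNSW does within one layer.

    Returns (ids of the k closest nodes found, number of nodes examined).
    """
    def dist(i):
        return float(np.sum((vectors[i] - query) ** 2))

    visited = {entry}
    d0 = dist(entry)
    candidates = [(d0, entry)]  # min-heap: closest unexpanded node on top
    results = [(-d0, entry)]    # max-heap via negation: the ef closest found so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:  # nearest candidate is farther than our worst result
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # evict the farthest result
    ranked = sorted((-neg_d, i) for neg_d, i in results)
    return [i for _, i in ranked[:k]], len(visited)

rng = np.random.default_rng(7)
vectors = rng.normal(size=(2_000, 16))
graph = build_knn_graph(vectors, m=8)
query = rng.normal(size=16)
ids, examined = greedy_search(query, vectors, graph, entry=0, ef=32, k=5)
print(ids, f"examined {examined} of {len(vectors)} vectors")
```

On random data the returned neighbors are approximate, and the `examined` count shows how few vectors were touched. Real HNSW adds the upper layers to pick a good entry point and uses smarter neighbor selection when building the graph.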

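The trade-off: HNSW returns approximate nearest neighbors, not exact ones. You might miss the 4th most similar document. In practice, with reasonable index settings the top few results are usually correct, and parameters such as M (links per node) and ef (candidate list size) let you trade speed for accuracy. For most search use cases, the small approximation error does not matter.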
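The usual way to quantify that approximation is recall@k: run an exact search (such as a brute-force scan) alongside the ANN index and measure what fraction of the true top k the index returned. The result IDs below are hypothetical.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

def mean_recall_at_k(approx_results, exact_results, k):
    """Average recall@k over many queries (one list of IDs per query)."""
    pairs = list(zip(approx_results, exact_results))
    return sum(recall_at_k(a, e, k) for a, e in pairs) / len(pairs)

# Hypothetical results for one query: the index missed the true 4th neighbor (ID 17)
exact = [3, 8, 1, 17, 42]
approx = [3, 8, 1, 42, 99]
print(recall_at_k(approx, exact, k=3))  # 1.0: the top 3 are all correct
print(recall_at_k(approx, exact, k=5))  # 0.8: 4 of the true top 5 were found
```

In practice you average recall over a sample of real queries and raise parameters like ef until recall is acceptable within your latency budget.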

Vector Database Options

| Database | Best for | Runs locally | Managed cloud |
|---|---|---|---|
| Pinecone | Production, no infra management, serverless | No | Yes |
| Weaviate | Keyword (BM25) + vector hybrid search, rich schema | Yes | Yes |
| Qdrant | High performance, Rust core, low memory | Yes | Yes |
| Chroma | Local development, prototyping, simple API | Yes | No |
| pgvector | Already use PostgreSQL, simple setup | Yes | Yes (via RDS, Supabase) |
| Redis Stack | Already use Redis, sub millisecond search | Yes | Yes |

For most engineers: Start with pgvector if you already run PostgreSQL. Zero new infrastructure: run CREATE EXTENSION vector; (the pgvector extension must be available on the server) and you have vector search. Migrate to Pinecone or Qdrant if you hit performance limits at scale (tens of millions of vectors).

Real Use Cases

1. Semantic Search: A user types "how to reset password" into a support portal. Keyword search finds documents with those exact words. Semantic search also finds "forgot credentials," "account recovery," "login assistance," and "2FA troubleshooting," all of which may be relevant.

2. RAG (Retrieval Augmented Generation): As covered in our previous article, embed your documents, store them in a vector DB, embed the user's question, find the closest document chunks, and paste them into the LLM prompt. The vector database is the retrieval engine powering RAG.

3. Recommendation Systems: Embed product descriptions. Embed the user's purchase history (for example, as a combined embedding of the products they bought). Find products whose embeddings are close to the user embedding. Recommenders such as Spotify's "Discover Weekly" and Amazon's "Customers also bought" build on similar similarity-based ideas at a high level, though their production systems combine many more signals.

4. Duplicate and Near Duplicate Detection: Embed every document in your database. If two documents produce vectors with cosine similarity above a high threshold (for example 0.97, tuned per model and dataset), they are likely near duplicates. This is used in spam detection, plagiarism detection, and deduplicating support tickets.

5. Code Search: Code assistants such as GitHub Copilot can embed your codebase for retrieval. When you type a comment like "// get user by email", such a system can find the semantically closest function in your codebase, even if the function is named fetchUserRecord(emailAddress).
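The RAG retrieval step from item 2 can be sketched as: score every chunk against the question's embedding, keep the top k, and paste them into the prompt. The 3-dimensional vectors are toy stand-ins for real embedding-model output, and the prompt wording is an assumption.

```python
import numpy as np

def retrieve(query_vec, chunk_vecs, chunks, k=2):
    """Return the k chunks whose embeddings are most cosine-similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    m = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    best_first = np.argsort(-(m @ q))[:k]
    return [chunks[i] for i in best_first]

def build_prompt(question, context_chunks):
    """Paste the retrieved chunks into the prompt ahead of the question."""
    context = "\n\n".join(f"[{n}] {text}" for n, text in enumerate(context_chunks, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

chunks = [
    "Rollbacks are triggered from the deploy dashboard.",
    "Lunch is served at noon on Fridays.",
    "A failed release pages the on-call engineer.",
]
# Toy stand-ins for embed(chunk); a real pipeline would call an embedding model
chunk_vecs = np.array([[0.9, 0.1, 0.0], [0.0, 0.1, 0.9], [0.8, 0.3, 0.1]])
query_vec = np.array([1.0, 0.2, 0.0])  # stand-in for embed("deployment failed on prod")

print(build_prompt("What happens when a deploy fails?", retrieve(query_vec, chunk_vecs, chunks)))
```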
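The recommendation idea from item 3, as a toy sketch: average the normalized embeddings of what a user bought into a single user vector, then rank unpurchased products by cosine similarity to it. Averaging is one simple way to combine purchases (an assumption; the text only says "combined"), and the product vectors are hand-made stand-ins.

```python
import numpy as np

def recommend(product_vecs, purchased_ids, k=1):
    """Average the purchased products' normalized embeddings into a user vector,
    then return the k unpurchased products most cosine-similar to it."""
    m = product_vecs / np.linalg.norm(product_vecs, axis=1, keepdims=True)
    user = m[purchased_ids].mean(axis=0)
    user = user / np.linalg.norm(user)
    scores = m @ user
    scores[purchased_ids] = -np.inf  # never recommend what they already bought
    return [int(i) for i in np.argsort(-scores)[:k]]

product_vecs = np.array([
    [1.0, 0.0, 0.0],  # 0: road running shoes
    [0.9, 0.2, 0.0],  # 1: running socks
    [0.8, 0.1, 0.3],  # 2: trail running shoes
    [0.0, 1.0, 0.0],  # 3: blender
    [0.1, 0.0, 1.0],  # 4: camping tent
])
print(recommend(product_vecs, purchased_ids=[0, 1]))  # [2]
```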
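The near-duplicate check from item 4, sketched with NumPy: compute all pairwise cosine similarities and keep pairs above the 0.97 threshold from the text. The ticket vectors are hand-made stand-ins. This all-pairs approach is O(n²), so at scale you would instead query an ANN index for each document's nearest neighbors.

```python
import numpy as np

def near_duplicate_pairs(vectors, threshold=0.97):
    """Return index pairs (i, j), i < j, whose cosine similarity exceeds threshold."""
    m = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = m @ m.T                               # all pairwise cosine similarities
    i, j = np.triu_indices(len(vectors), k=1)    # each unordered pair once, no self-pairs
    mask = sims[i, j] > threshold
    return [(int(a), int(b)) for a, b in zip(i[mask], j[mask])]

tickets = np.array([
    [0.90, 0.10, 0.00],  # "Can't log in after password reset"
    [0.89, 0.12, 0.01],  # "Login fails since I reset my password"
    [0.00, 0.20, 0.95],  # "Invoice shows the wrong amount"
])
print(near_duplicate_pairs(tickets))  # [(0, 1)]
```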

What This Means For Engineers

  1. Embeddings are not magic. They are compression. An embedding model compresses a paragraph of text into 1536 numbers. Information is lost. But the most important information, semantic meaning, is preserved well enough to be useful for search and retrieval tasks.

  2. pgvector first, then a dedicated vector DB. You do not need Pinecone to get started. If you already run PostgreSQL, add pgvector, embed your data, and start querying. Many applications never need to migrate beyond pgvector. Consider a dedicated vector database when you reach tens of millions of vectors or have strict latency requirements (for example, sub-10ms).

  3. Semantic search does not replace keyword search. It complements it. For user queries that are natural language questions, semantic search wins. For queries that are exact IDs, codes, or known keywords, BM25 keyword search wins. Build hybrid search from day one and you get the best of both.

Job Openings

Follow me on YouTube · LinkedIn · X · Instagram to stay updated.

See you in the next one!
Signing Off, Scortier
