VectorDB and Embeddings: The Most Underrated Concept in AI Engineering

Welcome to Grind Engineer, your guide to becoming a better software engineer! No fluff. Pure engineering insights.

Job openings are listed at the end of the article!

Search Google for "best running shoes 2026" and it returns exactly what you expect. Search your company's internal Slack for "deployment failed on prod" and you get nothing, even though 50 messages about it exist, just phrased differently.

The difference? Google understands meaning. Slack, by default, matches keywords.

Embeddings and vector databases are the technology that lets machines compare content by meaning. They power most of the AI search, recommendation, and RAG systems you use today. Yet most engineers treat them as a black box.

This article opens the box.

TL;DR: An embedding converts any piece of content (text, image, code) into a list of numbers where similar meanings land close together in space. A vector database stores and searches those numbers at scale. Together they are the foundation of most AI powered search systems built in the last 3 years.

What Is an Embedding?

An embedding is a list of floating point numbers that represents the meaning of a piece of content.

# The word "king" becomes something like this
embedding = [0.23, -0.41, 0.87, 0.12, -0.66, ... ]  # 1536 numbers total

That list of numbers is not random. It is the result of an embedding model that was trained to capture semantic meaning numerically. The key property: content with similar meaning produces similar numbers.

King and Queen produce vectors close to each other in 1536 dimensional space. King and Potato produce vectors far apart. Cat and Feline produce vectors almost identical.

This is the entire concept. Everything else is implementation.

Vector Space Intuition

Imagine a 2D map where words are plotted as points. Synonyms cluster together. Related concepts cluster in the same neighborhood. Unrelated words are far apart.

Real embeddings work the same way, just in 1536 dimensions instead of 2. You cannot visualize 1536 dimensions, but the math works identically. Distance between two points in this space represents semantic distance between their meanings.

Three ways to measure closeness:

Metric	What it measures	Best for
Cosine similarity	Angle between two vectors (-1 to 1; 1 means same direction)	Most text embeddings. Direction matters, not magnitude.
Euclidean distance	Straight line distance between points	Image embeddings, spatial data
Dot product	Projection of one vector onto another, scaled by magnitude	L2 normalized vectors, where it equals cosine similarity but is cheaper to compute
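To make the three metrics in the table concrete, here is a minimal sketch in plain Python. The vectors are made-up 3-dimensional stand-ins, not output from any real embedding model, and the helper names are chosen only for this illustration.

```python
import math

def dot(a, b):
    """Dot product: sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ranges from -1 to 1."""
    return dot(a, b) / (norm(a) * norm(b))

def euclidean_distance(a, b):
    """Straight line distance between the points a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(a):
    """Scale a vector to unit length (L2 normalization)."""
    n = norm(a)
    return [x / n for x in a]

# Hypothetical 3-dimensional "embeddings"; real ones have hundreds of dimensions.
a = [1.0, 2.0, 2.0]
b = [2.0, 4.0, 4.0]   # same direction as a, twice the length
c = [2.0, -2.0, 1.0]  # perpendicular to a
d = [2.0, 1.0, 2.0]   # somewhat similar to a

print(cosine_similarity(a, b))   # same direction, so the maximum similarity
print(euclidean_distance(a, b))  # yet the two points are not in the same place
print(cosine_similarity(a, c))   # perpendicular vectors score zero

# On unit length vectors, dot product and cosine similarity coincide.
print(dot(normalize(a), normalize(d)), cosine_similarity(a, d))
```

The first two prints show why the choice matters: `a` and `b` point the same way, so cosine similarity treats them as identical, while Euclidean distance still sees a gap between them.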

💡 Key Insight: In practice you do not choose the distance metric. The embedding model does. Use the metric the model's documentation recommends. For OpenAI's text-embedding-3-small that is cosine similarity, and because its vectors are normalized to length 1, dot product produces the same ranking. Using a metric the model was not trained for can quietly degrade your results.

Semantic Similarity vs Keyword Match

This is the most practical distinction in AI search.

Keyword match (traditional SQL, Elasticsearch BM25): Find documents that contain the exact words in the query. Fast, predictable, brittle.

-- This finds messages containing "deployment failed" but misses "release", "rollout", "push to prod"
SELECT * FROM messages WHERE content LIKE '%deployment failed%';

Semantic search (embeddings + vector search): Find documents whose meaning is similar to the query, regardless of exact words used.

# Assumes `vectorstore` is an already populated LangChain style vector store
# This finds "deployment", "release", "rollout", "push to prod"
# because their embeddings are close in vector space
results = vectorstore.similarity_search("deployment failed on prod", k=5)

For example, the query "heart attack" semantically matches documents containing "myocardial infarction," "cardiac arrest," or "chest pain with ECG changes," even though none of those documents contain the phrase "heart attack." A keyword search returns zero results. A semantic search returns the most relevant medical records.

	Keyword Match	Semantic Search
Query: "heart attack" finds	Documents with "heart attack"	Documents about cardiac events (any phrasing)
Query: "cheap hotels" finds	Documents with those exact words	Documents about "affordable accommodation," "budget stays," "low cost lodging"
Speed	Very fast (inverted index)	Fast with ANN index, slower without
Best for	Exact ID lookups, known keywords	Natural language queries, meaning based retrieval

Many production systems in 2026 use hybrid search: they run keyword match (BM25) and semantic search side by side, merge the two result lists, then rerank. This gives you the precision of keyword search with the recall of semantic search.
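One common way to merge a keyword result list with a semantic one is Reciprocal Rank Fusion (RRF). The sketch below is an illustration under assumptions, not the method any particular system uses: the message IDs and both result lists are invented, and k=60 is simply a widely used default. A separate reranking model could then reorder the fused list.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical result lists for the query "deployment failed on prod".
bm25_hits = ["msg_12", "msg_07", "msg_33"]      # exact keyword matches
semantic_hits = ["msg_07", "msg_41", "msg_12"]  # nearest embeddings

fused = reciprocal_rank_fusion([bm25_hits, semantic_hits])
print(fused)  # ['msg_07', 'msg_12', 'msg_41', 'msg_33']
```

Documents that rank well in both lists rise to the top, while a document found by only one method still makes the cut. RRF only needs ranks, so you never have to put BM25 scores and cosine similarities on the same scale.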

How Embeddings Are Generated

An embedding model is a neural network trained to compress meaning into a fixed size vector. The training process teaches it that "I love dogs" and "I adore canines" should produce similar vectors, while "stock market crash" and "I love dogs" should produce distant vectors.

import openai

# The client reads the API key from the OPENAI_API_KEY environment variable
client = openai.OpenAI()

# Generate an embedding for any text
response = client.embeddings.create(
    input="How does database indexing work?",
    model="text-embedding-3-small"
)

vector = response.data[0].embedding
print(f"Dimensions: {len(vector)}")   # 1536
print(f"First 5 values: {vector[:5]}")  # five floats; exact values depend on the model

You can embed anything: text, code, images, audio, user behavior logs, as long as you use a model trained for that kind of content. The embedding model determines what the vector captures and how many dimensions it uses. More dimensions generally means more nuance, but more storage and slower search.
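The storage side of that trade-off is easy to estimate. This back-of-the-envelope sketch assumes vectors are stored as 32-bit floats and ignores index and metadata overhead, so treat the numbers as a lower bound. The function name and the 10 million vector corpus are illustrative.

```python
BYTES_PER_FLOAT32 = 4  # assumption: each dimension is stored as a 32-bit float

def raw_vector_storage_bytes(num_vectors, dimensions):
    """Raw size of the vectors alone, excluding index and metadata overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32

for dims in (768, 1536, 3072):
    size_gb = raw_vector_storage_bytes(10_000_000, dims) / 1e9
    print(f"{dims} dims x 10M vectors = {size_gb:.1f} GB")
```

Doubling the dimensions doubles the raw storage, and the per-comparison cost during search grows with it.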

Embedding Model Options

Model	Dimensions	Best for	Cost
`text-embedding-3-small` (OpenAI)	1536	General purpose text, fast, cheap	$0.02 per 1M tokens
`text-embedding-3-large` (OpenAI)	3072	Higher accuracy, nuanced retrieval	$0.13 per 1M tokens
`embed-english-v3.0` (Cohere)	1024	English text, strong reranking support	Paid API
`bge-large-en-v1.5` (BAAI)	1024	Open source, runs locally, strong benchmark performance	Free
`jina-embeddings-v3` (Jina)	1024	Multilingual, long document support	Free tier + paid
`nomic-embed-text` (Nomic)	768	Open source, low memory, fast	Free

Rule of thumb: Start with text-embedding-3-small. It handles most use cases, costs almost nothing, and you can always upgrade later. Use bge-large if you need to run locally (no API costs, full data privacy).
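To see what "costs almost nothing" means for your own data, a rough estimate is enough. The prices below are the ones listed in the table above and may have changed since. The corpus size and average document length are made-up inputs.

```python
# Prices in USD per 1M tokens, copied from the table above; verify before budgeting.
PRICE_PER_MILLION_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}

def embedding_cost_usd(num_documents, avg_tokens_per_document, model):
    """One-time cost of embedding a corpus with a paid API model."""
    total_tokens = num_documents * avg_tokens_per_document
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS[model]

# Hypothetical corpus: 1 million documents of about 500 tokens each.
for model in PRICE_PER_MILLION_TOKENS:
    print(f"{model}: ${embedding_cost_usd(1_000_000, 500, model):.2f}")
```

Remember that switching models later means embedding the whole corpus again, since vectors from different models are not comparable.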

How Vector Databases Work Under the Hood

Storing embeddings in a normal database and running something like SELECT * WHERE similarity > 0.8 would require comparing your query vector against every stored vector. That is O(n) linear search. With 10 million documents, that is 10 million comparisons per query. Assuming 100ms per million comparisons, every query takes 1 second. That is too slow for interactive search.
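Here is what that linear scan looks like as a minimal sketch. The document IDs and 3-dimensional vectors are invented for illustration. The point is that the loop touches every stored vector once per query.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def brute_force_search(query, vectors, k=2):
    """Exact top-k search: one similarity computation per stored vector, O(n)."""
    scored = ((cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items())
    return heapq.nlargest(k, scored)

# Hypothetical 3-dimensional vectors standing in for real embeddings.
vectors = {
    "deploy_failed": [0.9, 0.1, 0.0],
    "rollout_broke": [0.8, 0.2, 0.1],
    "lunch_menu": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

for score, doc_id in brute_force_search(query, vectors):
    print(f"{doc_id}: {score:.3f}")
```

For a few thousand vectors this is perfectly fine, and it has the advantage of being exact. The cost only becomes a problem as n grows.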

Vector databases solve this with Approximate Nearest Neighbor (ANN) indexing. The most widely used algorithm is HNSW (Hierarchical Navigable Small World graphs).

HNSW builds a multi layer graph where:

The top layer has few nodes with long range connections (like highways)
Lower layers have more nodes with shorter connections (like local roads), and the bottom layer contains every vector
A query starts at the top, moves toward the right neighborhood, then drops down layer by layer to refine the result locally

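This turns O(n) linear search into roughly O(log n) graph traversal. With 10 million vectors, instead of 10 million comparisons, HNSW finds the nearest neighbors in roughly 20 to 30 graph hops. Each hop compares the query against the current node's neighbors, so a query typically costs hundreds to a few thousand distance computations in total, still a tiny fraction of a full scan.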
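The layered idea can be sketched with a deliberately simplified, one-dimensional analogue. This is closer to a skip list than to a real HNSW: the "vectors" are just the numbers 0 to 4095, layer membership is fixed by a stride instead of being assigned randomly, and each node has only a left and a right neighbor per layer. All names and constants are assumptions made for this illustration.

```python
STRIDE = 4         # each layer keeps every 4th node of the layer below (assumption)
NUM_POINTS = 4096  # the stored "vectors" are just the numbers 0..4095
TOP_LAYER = 5      # layer 5 holds only the nodes 0, 1024, 2048, 3072

def neighbors(node, layer):
    """Left and right neighbors of `node` within `layer`."""
    step = STRIDE ** layer
    return [n for n in (node - step, node + step) if 0 <= n < NUM_POINTS]

def layered_greedy_search(query):
    """Greedy descent: long hops in the top layer, short hops at the bottom."""
    current = 0  # fixed entry point in the top layer
    current_dist = abs(current - query)
    comparisons = 1
    for layer in range(TOP_LAYER, -1, -1):
        improved = True
        while improved:  # keep hopping while some neighbor is closer
            improved = False
            for candidate in neighbors(current, layer):
                comparisons += 1
                dist = abs(candidate - query)
                if dist < current_dist:
                    current, current_dist = candidate, dist
                    improved = True
    return current, comparisons

nearest, comparisons = layered_greedy_search(3000.3)
print(nearest, comparisons)  # nearest stored point and distance computations used
```

The search reaches the nearest of 4096 points with a couple dozen distance computations instead of 4096. In one dimension this greedy walk is always exact. In hundreds of dimensions it can get stuck in a good but not best neighborhood, which is where the "approximate" in ANN comes from.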

The trade off: HNSW returns approximate nearest neighbors, not exact. You might miss the 4th most similar document. In practice, the top few results are almost always correct, recall can be tuned upward by spending more time per query, and the remaining approximation error rarely matters for search use cases.

Vector Database Options

Database	Best for	Runs locally	Managed cloud
Pinecone	Production, no infra management, serverless	No	Yes
Weaviate	Full search + vector hybrid, rich schema	Yes	Yes
Qdrant	High performance, Rust core, low memory	Yes	Yes
Chroma	Local development, prototyping, simple API	Yes	No
pgvector	Already use PostgreSQL, simple setup	Yes	Yes (via RDS, Supabase)
Redis Stack	Already use Redis, sub millisecond search	Yes	Yes

For most engineers: Start with pgvector if you already run PostgreSQL. Zero new infrastructure. Once the extension is installed on the server, run CREATE EXTENSION vector; and you have vector search. Migrate to Pinecone or Qdrant if you hit performance limits at scale (tens of millions of vectors).
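A minimal sketch of what the pgvector route can look like from Python. The `messages` table, its columns, and the index name are hypothetical. The SQL uses pgvector's `vector` type, its HNSW index, and its `<=>` cosine distance operator. Nothing here connects to a database, so you would pass these strings to your own driver (the `%s` placeholder follows psycopg's convention).

```python
def to_pgvector_literal(vector):
    """Format a Python list as pgvector's text representation, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(str(float(x)) for x in vector) + "]"

# Hypothetical schema: a `messages` table with 1536-dimensional embeddings.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS messages (
    id bigserial PRIMARY KEY,
    content text NOT NULL,
    embedding vector(1536)
);
-- Optional ANN index; without it pgvector scans every row (exact search).
CREATE INDEX IF NOT EXISTS messages_embedding_idx
    ON messages USING hnsw (embedding vector_cosine_ops);
"""

# <=> is cosine distance, so smaller means more similar.
SEARCH_SQL = """
SELECT id, content
FROM messages
ORDER BY embedding <=> %s::vector
LIMIT 5;
"""

# The query embedding is passed as a single text parameter.
query_param = to_pgvector_literal([0.1, 0.2, 0.3])
print(query_param)  # [0.1,0.2,0.3]
```

Because the vectors live in an ordinary table, you can combine the ORDER BY with regular WHERE filters on other columns in the same query.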

Real Use Cases

1. Semantic Search: A user types "how to reset password" into a support portal. Keyword search finds documents with those exact words. Semantic search also finds "forgot credentials," "account recovery," "login assistance," and "2FA troubleshooting," all of which are relevant.

2. RAG (Retrieval Augmented Generation): As covered in our previous article: embed your documents, store them in a vector DB, embed the user's question, find the closest document chunks, and paste them into the LLM prompt. The vector database is the retrieval engine powering RAG.

3. Recommendation Systems: Embed product descriptions. Represent each user as a combined embedding of the products they bought. Find products whose embeddings are close to the user embedding. At a high level, this is the idea behind features like Spotify's "Discover Weekly" and Amazon's "Customers also bought."

4. Duplicate and Near Duplicate Detection: Embed every document in your database. If two documents produce vectors with cosine similarity above a high threshold (0.97 is a reasonable starting point, tune it on your own data), treat them as near duplicates. Used in spam detection, plagiarism detection, and deduplicating support tickets.

5. Code Search: AI coding tools such as GitHub Copilot can embed your codebase. When you type a comment like "// get user by email", the system finds the semantically closest function in your codebase, even if the function is named fetchUserRecord(emailAddress).
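As one concrete instance of these use cases, here is a minimal sketch of near duplicate detection with a cosine similarity threshold. The ticket IDs and 3-dimensional vectors are invented, and the function name is only for illustration.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def find_near_duplicates(embeddings, threshold=0.97):
    """Return pairs of IDs whose cosine similarity is above the threshold.

    Compares every pair, which is O(n^2). At scale you would instead ask the
    vector index for each document's nearest neighbors.
    """
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        if cosine_similarity(vec_a, vec_b) > threshold:
            pairs.append((id_a, id_b))
    return pairs

# Hypothetical support ticket embeddings.
tickets = {
    "ticket_1": [0.90, 0.10, 0.05],
    "ticket_2": [0.91, 0.09, 0.06],  # nearly the same direction as ticket_1
    "ticket_3": [0.10, 0.95, 0.20],
}
print(find_near_duplicates(tickets))  # [('ticket_1', 'ticket_2')]
```

The right threshold depends on the embedding model and on your content, so check a sample of flagged pairs by hand before deleting or merging anything automatically.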

What This Means For Engineers

Embeddings are not magic. They are compression. An embedding model compresses a paragraph of text into 1536 numbers. Information is lost. But the most important information, semantic meaning, is preserved well enough to be useful for search and retrieval tasks.
pgvector first, then dedicated vector DB. You do not need Pinecone to get started. If you already run PostgreSQL, add pgvector, embed your data, and start querying. Many applications never need to migrate beyond pgvector. Move to a dedicated vector database only when you reach tens of millions of vectors or have strict sub 10ms latency requirements.
Semantic search does not replace keyword search. It complements it. For user queries that are natural language questions, semantic search wins. For queries that are exact IDs, codes, or known keywords, BM25 keyword search wins. Build hybrid search from day one and you get the best of both.

Job Openings

Software Engineer, New Grad @Stripe: Apply Here
Software Engineer, Payments and Risk @Stripe: Apply Here
Software Engineer, Data & AI @Stripe: Apply Here
Software Engineer (1+ YOE) @Stripe: Apply Here
Software Engineer 2, iOS @Uber: Apply Here

Follow me on Youtube · LinkedIn · X · Instagram to stay updated.

See you in the next one!
Signing Off, Scortier
