Welcome to Grind Engineer, your guide to becoming a better software engineer! No fluff. Pure engineering insights.
Job openings are listed at the end of the article!
Search on Google for "best running shoes 2026" and it returns exactly what you expected. Search your company's internal Slack for "deployment failed on prod" and you get nothing, even though 50 messages exist about it, just phrased differently.
The difference? Google understands meaning. Slack, by default, matches keywords.
Embeddings and vector databases are the technology that lets machines work with meaning instead of exact words. They power most of the AI search, recommendation, and RAG systems you use today. Yet most engineers treat them as a black box.
This article opens the box.
TL;DR: An embedding converts any piece of content (text, image, code) into a list of numbers where similar meanings land close together in space. A vector database stores and searches those numbers at scale. Together they underpin most AI-powered search systems built in the last few years.

What Is an Embedding?
An embedding is a list of floating point numbers that represents the meaning of a piece of content.
# The word "king" becomes something like this
embedding = [0.23, -0.41, 0.87, 0.12, -0.66, ...]  # 1536 numbers total

That list of numbers is not random. It is the result of an embedding model that was trained to capture semantic meaning numerically. The key property: content with similar meaning produces similar numbers.
King and Queen produce vectors close to each other in 1536 dimensional space. King and Potato produce vectors far apart. Cat and Feline produce vectors almost identical.
This is the entire concept. Everything else is implementation.
Vector Space Intuition
Imagine a 2D map where words are plotted as points. Synonyms cluster together. Related concepts cluster in the same neighborhood. Unrelated words are far apart.
Real embeddings work the same way, just in 1536 dimensions instead of 2. You cannot visualize 1536 dimensions, but the math works identically. Distance between two points in this space represents semantic distance between their meanings.
Three ways to measure distance:
| Metric | What it measures | Best for |
|---|---|---|
| Cosine similarity | Angle between two vectors (-1 to 1) | Most text embeddings. Direction matters, not magnitude. |
| Euclidean distance | Straight line distance between points | Image embeddings, spatial data |
| Dot product | Projection of one vector onto another | L2 normalized vectors, where it ranks results the same as cosine and is cheaper to compute |
💡 Key Insight: You mostly do not choose the distance metric. The embedding model does. Use the metric the model was trained and evaluated with. For OpenAI's text-embedding-3-small, the recommended metric is cosine similarity, and because its vectors are normalized to unit length, dot product gives the same ranking. Using a metric the model was not built for can quietly degrade your results.
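To make the three metrics concrete, here is a minimal sketch in plain Python. The 3-dimensional vectors are made up for illustration (real embeddings have hundreds or thousands of dimensions), and the helper names are hypothetical.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """L2 length of a vector."""
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b, ranging from -1 to 1."""
    return dot(a, b) / (norm(a) * norm(b))

def euclidean_distance(a, b):
    """Straight line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def l2_normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]

# Made-up toy vectors, NOT real model output
king = [0.9, 0.8, 0.1]
queen = [0.8, 0.9, 0.2]
potato = [0.1, 0.0, 0.9]

print("king vs queen  (cosine):", cosine_similarity(king, queen))
print("king vs potato (cosine):", cosine_similarity(king, potato))
print("king vs queen  (euclidean):", euclidean_distance(king, queen))

# On unit-length vectors, dot product and cosine similarity agree
print("normalized dot:", dot(l2_normalize(king), l2_normalize(queen)))
```

The last line is why dot product is a common shortcut: once vectors are normalized, it skips the two length computations and still ranks results the same way.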
Semantic Similarity vs Keyword Match
This is the most practical distinction in AI search.
Keyword match (traditional SQL, Elasticsearch BM25): Find documents that contain the exact words in the query. Fast, predictable, brittle.
-- This finds "deployment" but misses "release", "rollout", "push to prod"
SELECT * FROM messages WHERE content LIKE '%deployment failed%';

Semantic search (embeddings + vector search): Find documents whose meaning is similar to the query, regardless of exact words used.
# This finds "deployment", "release", "rollout", "push to prod"
# because their embeddings are close in vector space
results = vectorstore.similarity_search("deployment failed on prod", k=5)

Real example. The query "heart attack" semantically matches documents containing "myocardial infarction," "cardiac arrest," "chest pain with ECG changes," even though none of those documents contain the phrase "heart attack." A keyword search returns zero results. A semantic search returns the most relevant medical records.
| | Keyword Match | Semantic Search |
|---|---|---|
| Query: "heart attack" finds | Documents with "heart attack" | Documents about cardiac events (any phrasing) |
| Query: "cheap hotels" finds | Documents with those exact words | Documents about "affordable accommodation," "budget stays," "low cost lodging" |
| Speed | Very fast (inverted index) | Fast with ANN index, slower without |
| Best for | Exact ID lookups, known keywords | Natural language queries, meaning based retrieval |
Many production systems in 2026 use hybrid search: keyword match (BM25) and semantic search run side by side, their results are merged, and the merged list is reranked. This gives you the precision of keyword search with the recall of semantic search.
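How the two result lists get merged is a design choice. One common option is reciprocal rank fusion (RRF), used here as an assumption rather than a recommendation from any specific system. The sketch below takes two already-ranked lists of hypothetical document IDs and fuses them; `k=60` is a commonly used default constant.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical outputs of a BM25 search and a vector search
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
semantic_ranking = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_ranking, semantic_ranking])
print(fused)
```

RRF only needs ranks, not raw scores, which sidesteps the problem that BM25 scores and cosine similarities live on different scales. A separate reranking model can then reorder the fused list.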
How Embeddings Are Generated
An embedding model is a neural network trained to compress meaning into a fixed size vector. The training process teaches it that "I love dogs" and "I adore canines" should produce similar vectors, while "stock market crash" and "I love dogs" should produce distant vectors.
import openai

client = openai.OpenAI()  # reads OPENAI_API_KEY from the environment

# Generate an embedding for any text
response = client.embeddings.create(
    input="How does database indexing work?",
    model="text-embedding-3-small",
)
vector = response.data[0].embedding
print(f"Dimensions: {len(vector)}")  # 1536
print(f"First 5 values: {vector[:5]}")  # five floats; exact values vary

You can embed almost anything, given a model built for that kind of input: text, code, images, audio, user behavior logs. The embedding model determines what the vector captures and how many dimensions it uses. More dimensions generally means more nuance, but more storage and slower search.
Embedding Model Options
| Model | Dimensions | Best for | Cost |
|---|---|---|---|
| | 1536 | General purpose text, fast, cheap | $0.02 per 1M tokens |
| | 3072 | Higher accuracy, nuanced retrieval | $0.13 per 1M tokens |
| | 1024 | English text, strong reranking support | Paid API |
| | 1024 | Open source, runs locally, strong benchmark performance | Free |
| | 1024 | Multilingual, long document support | Free tier + paid |
| | 768 | Open source, low memory, fast | Free |
Rule of thumb: Start with text-embedding-3-small. It handles most use cases, costs almost nothing, and you can always upgrade later. Use bge-large if you need to run locally (no API costs, full data privacy).
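The "more dimensions, more storage" trade off is easy to put numbers on. This back-of-the-envelope sketch assumes vectors are stored as 32-bit floats and ignores index overhead; the corpus size and the 500 tokens per document are hypothetical, and the prices are the ones in the table above.

```python
BYTES_PER_FLOAT32 = 4  # assumption: each dimension stored as a 32-bit float

def storage_gb(num_vectors, dimensions):
    """Raw vector storage in decimal gigabytes (index overhead not included)."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32 / 1e9

def embedding_cost_usd(num_docs, avg_tokens_per_doc, usd_per_million_tokens):
    """One-time API cost to embed a corpus."""
    return num_docs * avg_tokens_per_doc / 1_000_000 * usd_per_million_tokens

NUM_DOCS = 10_000_000   # hypothetical corpus size
AVG_TOKENS = 500        # hypothetical average document length

small_gb = storage_gb(NUM_DOCS, 1536)
large_gb = storage_gb(NUM_DOCS, 3072)
small_cost = embedding_cost_usd(NUM_DOCS, AVG_TOKENS, 0.02)
large_cost = embedding_cost_usd(NUM_DOCS, AVG_TOKENS, 0.13)

print(f"1536 dims: {small_gb:.2f} GB, ${small_cost:.2f} to embed")
print(f"3072 dims: {large_gb:.2f} GB, ${large_cost:.2f} to embed")
```

Doubling the dimensions doubles the raw storage, and every similarity computation touches twice as many numbers, which is why starting small and upgrading later is a reasonable default.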
How Vector Databases Work Under the Hood
Storing embeddings in a normal database and running SELECT * WHERE similarity > 0.8 would require comparing your query vector against every stored vector. That is O(n) linear search. With 10 million documents, that is 10 million comparisons per query. At 100ms per million comparisons, every query takes 1 second. Unusable.
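Here is what that linear scan looks like as a minimal sketch. The store, the message texts, and the 3-dimensional vectors are all made up for illustration; the point is that every query touches every stored vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def brute_force_search(query, store, k=2):
    """Exact nearest neighbors: one comparison per stored vector, O(n)."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in store.items()]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]

# Toy store with made-up vectors, NOT real embeddings
store = {
    "deploy failed on prod": [0.9, 0.1, 0.0],
    "release rollback last night": [0.8, 0.3, 0.1],
    "lunch menu for friday": [0.0, 0.2, 0.9],
}
query = [0.85, 0.2, 0.05]  # stands in for the embedding of the user's query

top = brute_force_search(query, store, k=2)
for score, doc_id in top:
    print(f"{score:.3f}  {doc_id}")
```

This is perfectly fine for a few thousand vectors. The cost only becomes a problem when `store` grows into the millions, because the loop never gets to skip anything.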
Vector databases solve this with Approximate Nearest Neighbor (ANN) indexing. The most widely used algorithm is HNSW (Hierarchical Navigable Small World graphs).
HNSW builds a multi layer graph where:
The top layer has few nodes with long range connections (like highways)
Lower layers have more nodes with shorter connections (like local roads)
A query starts at the top, zooms toward the right neighborhood, then zooms in locally
This turns O(n) linear search into roughly O(log n) graph traversal. With 10 million vectors, instead of 10 million comparisons, HNSW reaches the nearest neighbors in roughly 20 to 30 graph hops, comparing the query against only a handful of neighbors at each hop.
The trade off: HNSW returns approximate nearest neighbors, not exact. You might miss the 4th most similar document. In practice, a well tuned index almost always gets the top few results right, and the remaining approximation error rarely matters for search use cases.
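The highway and local road idea can be shown with a deliberately tiny toy. This is not real HNSW (which assigns layers randomly, keeps a candidate list, and works in high dimensions). It is a hand-built two-layer graph over one-dimensional points, with hypothetical function names, that only illustrates the greedy top-down descent.

```python
def greedy_search(query, entry, neighbors):
    """Hop to the neighbor closest to the query until no neighbor is closer."""
    current, hops = entry, 0
    while True:
        best = min(neighbors[current], key=lambda n: abs(n - query))
        if abs(best - query) >= abs(current - query):
            return current, hops  # local minimum: nothing nearby is closer
        current, hops = best, hops + 1

# Toy data: 100 one-dimensional "vectors", where node id == its value
# Bottom layer: every point linked to its immediate neighbors (local roads)
bottom = {p: [q for q in (p - 1, p + 1) if 0 <= q < 100] for p in range(100)}
# Top layer: every 10th point, linked 10 apart (highways)
top = {p: [q for q in (p - 10, p + 10) if 0 <= q < 100] for p in range(0, 100, 10)}

def layered_search(query):
    """Descend from the sparse layer to the dense one, reusing the best node found."""
    coarse, top_hops = greedy_search(query, 0, top)
    nearest, bottom_hops = greedy_search(query, coarse, bottom)
    return nearest, top_hops + bottom_hops

nearest, hops = layered_search(42.3)
print(f"nearest={nearest}, hops={hops}, versus 100 comparisons for a full scan")
```

The top layer covers most of the distance in a few long jumps, and the bottom layer only has to refine the last stretch. Real indexes add more layers and more links per node, which is where the logarithmic behavior comes from.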

Vector Database Options
| Database | Best for | Runs locally | Managed cloud |
|---|---|---|---|
| Pinecone | Production, no infra management, serverless | No | Yes |
| Weaviate | Full search + vector hybrid, rich schema | Yes | Yes |
| Qdrant | High performance, Rust core, low memory | Yes | Yes |
| Chroma | Local development, prototyping, simple API | Yes | No |
| pgvector | Already use PostgreSQL, simple setup | Yes | Yes (via RDS, Supabase) |
| Redis Stack | Already use Redis, sub millisecond search | Yes | Yes |
For most engineers: Start with pgvector if you already run PostgreSQL. Zero new infrastructure. Add CREATE EXTENSION vector; and you have vector search. Migrate to Pinecone or Qdrant if you hit performance limits at scale (tens of millions of vectors).
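A rough sketch of what the pgvector path can look like from Python. The table name, column names, and helper functions are hypothetical. The SQL relies on pgvector's `vector` type, its `<=>` cosine distance operator, and its HNSW index with `vector_cosine_ops`; the `%s` placeholders assume a psycopg-style driver. Nothing here connects to a database, it only builds the statements and parameters.

```python
def to_pgvector_literal(vector):
    """Format a Python list in pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(str(float(x)) for x in vector) + "]"

# Hypothetical schema: one row per document chunk
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS documents (
    id bigserial PRIMARY KEY,
    content text NOT NULL,
    embedding vector(1536)
);
CREATE INDEX IF NOT EXISTS documents_embedding_idx
    ON documents USING hnsw (embedding vector_cosine_ops);
"""

# <=> is cosine distance, so similarity = 1 - distance
SEARCH_SQL = """
SELECT id, content, 1 - (embedding <=> %s::vector) AS cosine_similarity
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""

def search_params(query_vector, k=5):
    """Parameters for SEARCH_SQL: the query vector (twice) and the result count."""
    literal = to_pgvector_literal(query_vector)
    return (literal, literal, k)

print(to_pgvector_literal([0.1, -0.5, 1]))
print(search_params([0.1, 0.2], k=3))
```

With a driver such as psycopg, the query would then run as `cursor.execute(SEARCH_SQL, search_params(query_vector))`, where `query_vector` comes from the same embedding model used to populate the table.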
Real Use Cases
1. Semantic Search
A user types "how to reset password" into a support portal. Keyword search finds documents with those exact words. Semantic search also finds "forgot credentials," "account recovery," "login assistance," and "2FA troubleshooting," all of which are relevant.
2. RAG (Retrieval Augmented Generation)
As covered in our previous article: embed your documents, store them in a vector DB, embed the user's question, find the closest document chunks, and paste them into the LLM prompt. The vector database is the retrieval engine powering RAG.
3. Recommendation Systems
Embed product descriptions. Embed user purchase history (as a combined embedding of the products they bought). Find products whose embeddings are close to the user embedding. At a high level, features like Spotify's "Discover Weekly" and Amazon's "Customers also bought" rest on a similar idea.
4. Duplicate and Near Duplicate Detection
Embed every document in your database. If two documents produce vectors with cosine similarity above 0.97, treat them as near duplicates. Used in spam detection, plagiarism detection, and deduplicating support tickets.
5. Code Search
AI coding tools such as GitHub Copilot can index your codebase with embeddings. When you type a comment like "// get user by email", the system finds the semantically closest function in your codebase, even if the function is named fetchUserRecord(emailAddress).
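Two of these use cases, recommendations and near duplicate detection, fit in a short sketch. All vectors, product names, and ticket IDs below are made up for illustration, and averaging purchase vectors is just one simple way to build a "combined embedding".

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def mean_vector(vectors):
    """Element-wise average: one simple way to combine several embeddings."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def recommend(purchased, catalog, k=1):
    """Rank products the user has not bought by closeness to their average purchase."""
    user_vec = mean_vector([catalog[p] for p in purchased])
    candidates = [p for p in catalog if p not in purchased]
    candidates.sort(key=lambda p: cosine_similarity(user_vec, catalog[p]), reverse=True)
    return candidates[:k]

def near_duplicates(docs, threshold=0.97):
    """All pairs of documents whose cosine similarity is above the threshold."""
    return [(a, b) for a, b in combinations(sorted(docs), 2)
            if cosine_similarity(docs[a], docs[b]) > threshold]

# Made-up product embeddings
catalog = {
    "trail shoes": [0.9, 0.1, 0.0],
    "running socks": [0.8, 0.2, 0.1],
    "road shoes": [0.85, 0.15, 0.05],
    "blender": [0.0, 0.1, 0.9],
}
picks = recommend(["trail shoes", "running socks"], catalog, k=1)

# Made-up support ticket embeddings
tickets = {"t1": [0.5, 0.5, 0.1], "t2": [0.51, 0.49, 0.1], "t3": [0.1, 0.0, 0.9]}
dupes = near_duplicates(tickets)

print("recommended:", picks)
print("near duplicates:", dupes)
```

The pairwise loop in `near_duplicates` is quadratic, so at real scale you would query a vector index for each document's nearest neighbors instead of comparing every pair.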
What This Means For Engineers
Embeddings are not magic. They are compression. An embedding model compresses a paragraph of text into 1536 numbers. Information is lost. But the most important information, semantic meaning, is preserved well enough to be useful for search and retrieval tasks.
pgvector first, then dedicated vector DB. You do not need Pinecone to get started. If you already run PostgreSQL, add pgvector, embed your data, and start querying. Most applications never need to migrate beyond pgvector. Scale to a dedicated vector database only when you have millions of vectors and sub 10ms latency requirements.
Semantic search does not replace keyword search. It complements it. For user queries that are natural language questions, semantic search wins. For queries that are exact IDs, codes, or known keywords, BM25 keyword search wins. Build hybrid search from day one and you get the best of both.
Job Openings
Software Engineer, New Grad @Stripe: Apply Here
Software Engineer, Payments and Risk @Stripe: Apply Here
Software Engineer, Data & AI @Stripe: Apply Here
Software Engineer (1+ YOE) @Stripe: Apply Here
Software Engineer 2, iOS @Uber: Apply Here
See you in the next one!
Signing Off, Scortier



