Embedding Space Explorer
Visualizing how neural networks understand document similarity using the Enron email corpus.
Click on any point to see email details and nearest neighbors.
How It Works
This visualization uses UMAP to project high-dimensional document embeddings into 3D space. Each point represents one email from the Enron corpus, a dataset of roughly 500,000 emails made public by the Federal Energy Regulatory Commission during its investigation following Enron's 2001 collapse.
Documents that appear close together are semantically similar according to the embedding model. You can compare how different neural networks organize this space, and how fine-tuning on legal text changes what "similarity" means.
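To make "close together" concrete, here is a minimal sketch of a nearest-neighbor lookup over embedding vectors. It assumes cosine similarity as the metric, which this page does not specify, and uses tiny made-up 3-dimensional vectors in place of real model output. The names cosine_similarity, nearest_neighbors, and toy_embeddings are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest_neighbors(query_id, embeddings, k=2):
    """Return the k most similar documents to query_id as (id, score) pairs."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in embeddings.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data: real embeddings have hundreds of dimensions.
toy_embeddings = {
    "email_a": [0.9, 0.1, 0.0],
    "email_b": [0.8, 0.2, 0.1],
    "email_c": [0.0, 0.1, 0.9],
}

for doc_id, score in nearest_neighbors("email_a", toy_embeddings):
    print(doc_id, round(score, 3))
```

In this toy data, email_b points in nearly the same direction as email_a, so it ranks first, while email_c ranks last. Note that neighbors are computed in the original embedding space; distances in a 3D projection are only an approximation of them.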
What You Can Do
- Hover over any point to see an email preview
- Click a point to see the full email and its nearest neighbors
- Search to find documents containing specific terms
- Switch models to see how fine-tuning reorganizes the embedding space
Why This Matters
In retrieval-augmented generation (RAG) systems, embedding quality directly determines which documents are retrieved. Understanding how embeddings cluster, and where they fail, is essential for building reliable document search in high-stakes domains such as legal e-discovery.
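One common way to quantify where retrieval fails is recall@k: the fraction of known-relevant documents that appear in the top k results. The sketch below is an illustration, not this project's evaluation method. It assumes you already have a ranked list of document IDs per model and a hand-labeled set of relevant IDs; every ID and ranking shown is made up.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant documents found in the first k ranked results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0  # nothing to find, so recall is undefined; report 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant) / len(relevant)


# Hypothetical rankings for the same query from two embedding models.
ranked_base = ["e7", "e2", "e9", "e4", "e1"]
ranked_tuned = ["e2", "e4", "e7", "e1", "e9"]
relevant = {"e2", "e4"}  # hand-labeled relevant emails (made up)

print("base  recall@2:", recall_at_k(ranked_base, relevant, 2))
print("tuned recall@2:", recall_at_k(ranked_tuned, relevant, 2))
```

Here the first ranking surfaces only one of the two relevant emails in its top 2, while the second surfaces both. Averaging this score over many labeled queries gives a single number for comparing models.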