Embedding Space Explorer

Visualizing how neural networks understand document similarity using the Enron email corpus.

Drag to rotate · Scroll to zoom · Click a point to explore

How It Works

This visualization uses UMAP to project high-dimensional document embeddings into 3D space. Each point represents one email from the Enron corpus, a dataset of roughly 500,000 emails made public by the Federal Energy Regulatory Commission during its investigation of the company's collapse in 2001.

Documents that appear close together are semantically similar according to the embedding model. You can compare how different neural networks organize this space, and how fine-tuning on legal text changes what "similarity" means.
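As a rough illustration of what "semantically similar" means in practice, the sketch below ranks documents by cosine similarity between their embedding vectors. Cosine similarity is assumed here as the measure; the source does not say which one the explorer uses. The helper names and the tiny three-dimensional vectors are hypothetical, chosen only to keep the example readable.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest_neighbors(query_id, embeddings, k=3):
    """Return the k most similar documents to query_id as (id, score) pairs."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != query_id  # a document is not its own neighbor
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real ones have hundreds of dimensions.
toy_embeddings = {
    "email_a": [0.9, 0.1, 0.0],
    "email_b": [0.8, 0.2, 0.1],
    "email_c": [0.0, 0.1, 0.9],
    "email_d": [0.1, 0.9, 0.1],
}
neighbors = nearest_neighbors("email_a", toy_embeddings, k=2)
print(neighbors)
```

Here email_b ranks first because its vector points in nearly the same direction as email_a's. A brute-force scan like this is fine for a toy example; a corpus of this size would more likely use an approximate nearest-neighbor index.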

What You Can Do

Hover over any point to see an email preview
Click a point to see the full email and its nearest neighbors
Search to find documents containing specific terms
Switch models to see how fine-tuning reorganizes the embedding space
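One way to put a number on how much fine-tuning reorganizes the space is to compare each email's nearest neighbors under two models. The sketch below is only an assumption about how such a comparison could be done, not a feature of the explorer: it takes precomputed neighbor lists and reports their average Jaccard overlap. The function names and email IDs are hypothetical.

```python
def neighbor_overlap(neighbors_a, neighbors_b):
    """Jaccard overlap between two neighbor lists for the same document."""
    set_a, set_b = set(neighbors_a), set(neighbors_b)
    if not set_a and not set_b:
        return 1.0  # two empty neighborhoods are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)


def mean_overlap(base_neighbors, tuned_neighbors):
    """Average overlap across documents present under both models."""
    shared_ids = base_neighbors.keys() & tuned_neighbors.keys()
    if not shared_ids:
        return 0.0
    total = sum(
        neighbor_overlap(base_neighbors[doc_id], tuned_neighbors[doc_id])
        for doc_id in shared_ids
    )
    return total / len(shared_ids)


# Hypothetical neighbor lists from a base model and a fine-tuned model.
base = {
    "email_1": ["email_2", "email_3", "email_4"],
    "email_2": ["email_1", "email_3", "email_5"],
}
tuned = {
    "email_1": ["email_2", "email_5", "email_6"],
    "email_2": ["email_1", "email_3", "email_5"],
}
score = mean_overlap(base, tuned)
print(round(score, 2))
```

A score near 1 means the two models largely agree on which emails are alike, while a low score means fine-tuning has substantially reshuffled the neighborhoods.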

Why This Matters

In retrieval-augmented generation (RAG) systems, embedding quality directly determines which documents are retrieved. Understanding how embeddings cluster, and where they fail, is essential for building reliable document search in high-stakes domains such as legal e-discovery.