---
title: Serendipity, or how I stopped being bored with my own photos
url: https://photostructure.com/about/random-samples/
description: The design story behind PhotoStructure's random samples and hierarchical tags.
date: 2026-05-24
keywords: library, metadata, tags
---


I made the first prototype of PhotoStructure in the early 2000s, stealing development time during my commute.

I built the home page like every photo app I'd used: a screenful of the most recent photos in my camera roll.

Within a week, I'd memorized the page without trying. The novelty had worn off. **It was boring. It didn't spark joy.**

This caused no small amount of panic: was this project a waste of time? If the whole point of PhotoStructure was to help you find and _enjoy_ a lifetime of memories, this was a big usability problem. Why scroll if you've seen it all before?

## 🐌 The naive fix that wasn't

I had a brainwave: could I show random photos instead of just recent ones? Technically, it was simple: `ORDER BY RANDOM()` instead of `ORDER BY captured_at DESC`.
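A minimal sketch of that one-line swap, using SQLite through Python's `sqlite3` module as a stand-in (the prototype ran on a different, embedded Java database). The `photos` table, its columns, and the page size are assumptions for illustration:

```python
import sqlite3

# Hypothetical schema: one row per photo, with a capture timestamp.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, captured_at TEXT)")
conn.executemany(
    "INSERT INTO photos (captured_at) VALUES (?)",
    [(f"{2000 + i % 20:04d}-01-01",) for i in range(1000)],
)

PAGE_SIZE = 20  # illustrative page size

# "Most recent" home page: the same rows on every visit until new photos arrive.
recent = conn.execute(
    "SELECT id FROM photos ORDER BY captured_at DESC, id DESC LIMIT ?",
    (PAGE_SIZE,),
).fetchall()

# "Random" home page: a fresh set on every visit, but the database has to
# visit every row in the table before it can return the first PAGE_SIZE.
shuffled = conn.execute(
    "SELECT id FROM photos ORDER BY RANDOM() LIMIT ?",
    (PAGE_SIZE,),
).fetchall()
```

The second query touches every row no matter how few it returns, so its cost grows with the size of the table.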

It was surprisingly, painfully slow. I was using an embedded Java database that has long since been abandoned, and I'll spare you the nerdy details, but _woof_ it took a long time to load the home page. Tens of seconds.

And the bigger the library, the worse it got. That was a real problem, because PhotoStructure was specifically supposed to handle libraries with hundreds of thousands (or maybe even millions!) of photos.

**But every time the page rendered, those random photos sparked joy:** photos I hadn't seen for years.

The performance problem wasn't going to be solved for another decade, because I had no idea how to solve it.

But the boredom issue was solved. I had a design that worked, even if it was slow.

## #️⃣ The folksonomy detour

At my day job, I was leading a product metadata team, working on what seemed to be a completely different problem set, but one that would turn out to help me avoid a huge design mistake in PhotoStructure.

It was the peak of Web 2.0, and much ink was spilled on the glory and promise of [folksonomies](https://en.wikipedia.org/wiki/Folksonomy): free-form, user-supplied tags with no enforced hierarchy. A few years later, the idea would be popularized by Twitter's "hashtag."

I gave talks about them. I evangelized them internally. _I felt like they could be big_.

Sometimes you have to live with something for a bit to feel the pros and cons, and it turns out that **folksonomies are mostly a mess.**

- They're inconsistent. We had tag consistency issues with even our most diligent employees. Most of us are not librarians, and we don't have the discipline to reliably apply the same appropriate tag for any given subject, location, or context.

- They're ambiguous. `#apple` is a fruit or a company. `#java` is an island, a programming language, or a cup of coffee. Without hierarchies, nothing connects `#louvre` and `#eiffeltower`.

One attempted fix was to impose a taxonomy: a curated hierarchy that the tags slot into. Or, fancier still, to project multiple _ontological trees_ from a given taxonomy. But good curation is expensive, schemas ossify, everything is debatable, there are more edge and corner cases than you expect, and the world keeps inventing categories that don't fit the existing tree. And mapping a free-form tag space onto a curated taxonomy (ontological projection) produces brittle, lossy results.

## 🧪 The hypothesis

I had a hunch. If hierarchical tagging could be _automated_, if the structure could come from the data itself instead of from a human curator, you'd get the discovery benefits of a taxonomy without the maintenance cost of one. And if you could browse hierarchies, you could also _swing across trees where the branches touch_: browse through `Where/France/Paris`, view the `When/2018/July` stream, and hop into another memory branch. Each media file could live in many streams of relevance.

## 💡 The insight

The hierarchies already exist. They're hiding in plain sight, inside the metadata.

📅 Dates: `Year`/`Month`/`Day`

📁 Files: `Folder`/`Subfolder`/`…`

📷 Cameras and lenses: `Make`/`Model`

🗺️ Locations: `Country`/`State`/`County`/`City`

👪 (Many) people's names: `Family`/`Given`

📎 Heck, even [media types](https://en.wikipedia.org/wiki/Media_type) (greybeards still call them MIME types) have a hierarchy, like `image`/`jpeg` and `video`/`av1`.

I didn't need to _build_ a taxonomy. I needed to _extract_ the ones already implicit in every photo's metadata, and then make them browseable as [hierarchical trees](/faq/whats-a-hierarchical-tag/).
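To make "extract, don't build" concrete, here's a rough sketch of pulling tag paths out of metadata a photo already carries. This is an illustration, not PhotoStructure's implementation: the `meta` dict, its keys, and the `File`, `Camera`, and `Type` root names are all assumptions (only `When` and `Where` appear elsewhere in this post).

```python
from datetime import datetime
from pathlib import PurePosixPath

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

def hierarchical_tags(meta: dict) -> list[tuple[str, ...]]:
    """Derive tag paths from a (hypothetical) per-photo metadata dict."""
    tags: list[tuple[str, ...]] = []

    captured = meta.get("captured_at")
    if isinstance(captured, datetime):
        tags.append(("When", str(captured.year),
                     MONTHS[captured.month - 1], str(captured.day)))

    path = meta.get("path")
    if path:
        # Every parent folder becomes one level; the filename is dropped.
        folders = [p for p in PurePosixPath(path).parent.parts if p != "/"]
        if folders:
            tags.append(("File", *folders))

    make, model = meta.get("make"), meta.get("model")
    if make and model:
        tags.append(("Camera", make, model))

    # Keep whichever location levels are present, from coarse to fine.
    place = [meta[k] for k in ("country", "state", "county", "city") if meta.get(k)]
    if place:
        tags.append(("Where", *place))

    media_type = meta.get("media_type")
    if media_type and "/" in media_type:
        tags.append(("Type", *media_type.split("/", 1)))

    return tags

example = hierarchical_tags({
    "captured_at": datetime(2018, 7, 14, 21, 30),
    "path": "/photos/2018/paris/IMG_0001.jpg",
    "make": "Canon",
    "model": "EOS 80D",
    "country": "France",
    "city": "Paris",
    "media_type": "image/jpeg",
})
```

One file ends up under several roots at once, which is what lets you hop from a `Where` branch to a `When` branch without anyone curating the link.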

That's where the name Photo**Structure** comes from! The structure was already there.

## 🥄 A taste of everything

Once browsing was hierarchical, the original boredom problem dissolved. I could surface a random sample within each child tag instead of from the whole library. The home page shows a few from each year, not twenty from last weekend. Click `When`, and you get a taste from every year you've ever taken a photo. Click a year, and you get a taste from every month. 

It also evens out differences in "photo velocity": the 5,000+ photos you took on your India adventure resolve to a few dozen samples, while the 50 photos you took on a weekend in your backyard resolve to a few. You get a taste of everything, not just the most recent, and not just the most prolific.
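Here's a small sketch of the idea: group photos by the child tag directly under the one you're viewing, then draw a random sample from each group, with a sample size that grows much more slowly than the group does. The `sample_size` rule and its `0.4` exponent are arbitrary choices for illustration, not PhotoStructure's actual formula, and the `(photo_id, tag_path)` pairs are a hypothetical data shape.

```python
import math
import random
from collections import defaultdict

def sample_size(count: int, exponent: float = 0.4) -> int:
    """Sublinear sample size: bigger groups get more samples, but not
    proportionally more. The exponent is an illustrative assumption."""
    if count <= 0:
        return 0
    return min(count, math.ceil(count ** exponent))

def sample_children(photos, parent, rng=random):
    """Return {child_tag: [sampled photo ids]} for each child of `parent`.

    photos: iterable of (photo_id, tag_path) pairs, where tag_path is a
            tuple such as ("When", "2018", "July").
    parent: the tag being browsed, e.g. ("When",) or ("When", "2018").
    rng:    the `random` module or a `random.Random` instance.
    """
    depth = len(parent)
    children = defaultdict(list)
    for photo_id, tag_path in photos:
        if len(tag_path) > depth and tag_path[:depth] == parent:
            children[tag_path[depth]].append(photo_id)
    return {
        child: rng.sample(ids, sample_size(len(ids)))
        for child, ids in sorted(children.items())
    }

# A big trip and a quiet weekend, both filed under `When`.
library = [(i, ("When", "2014", "July")) for i in range(5000)]
library += [(5000 + i, ("When", "2020", "May")) for i in range(50)]

by_year = sample_children(library, ("When",), random.Random())
```

Calling `sample_children(library, ("When", "2014"))` does the same thing one level down, giving a taste from every month of that year.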

For a while I worried I was wielding hierarchical tags as a golden hammer, projecting structure onto everything because it was the tool I had. After living with it for a few years, I think it's defensible: the hierarchies were already there.

Every page is different. Every visit pulls up a different set. You find yourself looking at vacation photos from 2014 you'd forgotten existed.

{{< figure src="/img/2020/11/when-year-samples.jpg" caption="Every time you visit your `When/` page, you'll see new samples from every year" >}}

## 🍀 A name for the feeling

Researchers in collaborative filtering call this _serendipity_: showing you items you wouldn't have found on your own, and didn't know you wanted. It's a real metric, distinct from accuracy. A "most recent" feed nails accuracy and bores you anyway.

Random sampling within hierarchical tags isn't the _only_ way to chase serendipity, but it's fast, local, and transparent, and it works.

## 🕰️ The proof

I used to do PhotoStructure development using my own photo library with 500k+ photos and videos, but I found I'd unexpectedly lose an hour here and there, just browsing through old memories.

I now develop and test with a library containing several hundred exemplar photos and videos from camera review websites. I validate scaling performance with my own library, but only with a timer on my desk to keep me from getting distracted by browsing.

Every page is a little different. Every visit, you rediscover something: a serendipity engine for your own life.

