What do you mean by “deduplicate”?
Browsing with PhotoStructure is designed to be fast and fun.
If your library contains duplicate photos or videos, clicking “next” or “previous” can show you the same thing again. Did the click not register? Is this a bug? Either way, these browsing stutters aren’t fun.
To avoid this, PhotoStructure automatically detects duplicate photo and video variations, and only shows you the “best” variant.
Why you may have duplicates
There are several reasons why you might have 2 or more copies or variations of any given photo or video:
RAW+JPEG pairs
Most current digital cameras and even some smartphones support “shooting raw.”
These raw files retain more of the sensor’s data than JPEGs do. This additional information lets you “post-process” files to get better dynamic range, restore highlight and shadow details, and adjust color balance, with much more flexibility than a JPEG allows.
Unfortunately, raw images are slow to process, and many image applications can’t handle these files. Most cameras allow shooting in “RAW+JPEG,” where each time you press the shutter button, both a JPEG file and a raw image file are written to your memory card. If PhotoStructure didn’t know that these are actually the same image, you’d see two (or more) copies of the same image while browsing your PhotoStructure library.
Cloud backups
Several photo cloud backup services downsample your photos and videos, and strip much of the metadata from your files, as well.
If you download a local backup from your cloud service, these photos and videos will be duplicates of your original files.
Local edits
When you make edits to your images, some software will write to a new file rather than overwriting your original.
Local backups
If you’ve used backup software, you’ll have additional copies of your photos and videos wherever the backup destination was configured.
How this relates to automatic organization
If you’ve enabled automatic organization, PhotoStructure errs on the side of caution, and copies each valid, unique image into your library.
If exactly the same file is found (i.e., precisely the same stream of bytes on disk), it won’t be copied into your library again. All other variants of the image, though, will be copied.
As an example, in the above cases, both the raw and JPEG files will be copied into your library, as well as any unique files from cloud service backups, and local edits.
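The exact-duplicate rule above can be pictured as a content-hash check. The following is a minimal sketch under stated assumptions, not PhotoStructure’s implementation: the use of SHA-256 and the helper names `file_digest` and `files_to_copy` are hypothetical and chosen only for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def files_to_copy(candidates: list[Path], library_digests: set[str]) -> list[Path]:
    """Keep only candidates whose bytes are not already in the library.

    Hypothetical helper: byte-identical files are skipped, while every
    other variant (raw, JPEG, edit, downsampled backup) is kept.
    """
    seen = set(library_digests)
    keep = []
    for path in candidates:
        digest = file_digest(path)
        if digest not in seen:
            seen.add(digest)
            keep.append(path)
    return keep
```

Under this sketch, a raw file and its JPEG sibling always hash differently, so both are kept, which matches the behavior described above.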
How files are aggregated
PhotoStructure examines a number of metadata tags in each file. If two files both have a value for a given tag, and those values substantively differ, the files are considered to be different assets.
If the captured-at time matches, but too few other metadata tags match, PhotoStructure will compare the actual image content of the files. If the images are substantively different, the files are considered to be different assets.
You can use the info tool to compare files and see if PhotoStructure considers them eligible to be associated with the same asset.
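The two aggregation rules above can be sketched as a pairwise check. Everything specific here is an assumption for illustration: the tag names, treating “substantively differ” as plain inequality, the `min_other_matches` threshold, the `images_similar` callback standing in for the image comparison, and the conservative result when there is no captured-at match.

```python
from typing import Callable, Mapping, Optional

Tags = Mapping[str, Optional[str]]


def same_asset(
    a: Tags,
    b: Tags,
    images_similar: Callable[[], bool],
    min_other_matches: int = 2,  # assumed threshold, not a documented value
) -> bool:
    """Decide whether two files' metadata lets them share one asset."""
    other_matches = 0
    captured_at_matches = False
    for tag in a.keys() & b.keys():
        va, vb = a[tag], b[tag]
        if va is None or vb is None:
            continue  # a tag only counts when both files have a value
        if va != vb:  # stand-in for "substantively differ"
            return False
        if tag == "captured_at":
            captured_at_matches = True
        else:
            other_matches += 1
    if other_matches >= min_other_matches:
        return True
    if captured_at_matches:
        # Too few corroborating tags: fall back to comparing the images.
        return images_similar()
    return False  # assumption: without corroboration, keep the files apart
```

The callback keeps the expensive image comparison lazy, so it only runs when the metadata alone is inconclusive.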
How does PhotoStructure pick which file to show?
PhotoStructure picks the “best” image or video based on:
- Coarse image resolution. Substantively larger image variants will “win” over smaller image variants. To break ties, it then looks at:
- The most recently updated file. More recently edited variants will “win” over originals. To break ties, it then looks at:
- Where the file resides. Files in your PhotoStructure library are preferred to files on other volumes (as those can be unmounted and subsequently unavailable). To break ties, it then looks at:
- The filename. If a file is part of a series, or its name includes the word “cover,” the “cover” file, or the file with the lowest count, wins. To break ties, it then looks at:
- The mimetype. If the browser directly supports the mimetype of the file, it’s easier to stream that file to the browser to support “zoom.”
- If there are still ties, the least asciibetically-valued file wins.
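The tie-breaking chain above behaves like a lexicographic sort. Here is a minimal sketch, assuming a hypothetical `Variant` record whose fields are already reduced to coarse values; the field names, and the choice of `0` for files without a series count, are illustrative assumptions rather than PhotoStructure’s data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    path: str
    megapixels: int          # coarse resolution
    mtime_bucket: int        # coarse modified time
    in_library: bool         # stands in for "where the file resides"
    is_cover: bool
    count: int               # position in a series; 0 if none (assumption)
    browser_supported: bool


def preference_key(v: Variant) -> tuple:
    """Smaller key = more preferred; each element breaks ties in the previous."""
    return (
        -v.megapixels,            # larger images first
        -v.mtime_bucket,          # more recently updated first
        not v.in_library,         # library files before other volumes
        not v.is_cover,           # "cover" files first
        v.count,                  # lowest count first
        not v.browser_supported,  # browser-renderable mimetypes first
        v.path,                   # final tie-break: code-point order
    )


def best_variant(variants: list[Variant]) -> Variant:
    """Pick the variant that would be shown under the default order."""
    return min(variants, key=preference_key)
```

Because Python compares tuples element by element, a later criterion is only consulted when all earlier ones tie, which mirrors the “to break ties, it then looks at” wording.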
Since version 1.0 (released in 2020), PhotoStructure supports modifying the order of these heuristics via the variantSortCriteria library setting. Here’s a description of the fields referenced in this setting:
- resolution: the coarse image resolution. Files with similar megapixel resolutions are considered equivalent.
- fileSize: the coarse file size. Files with similar megabyte sizes are considered equivalent. This isn’t included in the default sort criteria, as image quality and file size are only loosely correlated due to different compression codecs and recompression artifacts.
- mtime: the coarse file modified time (rounded to 2-minute resolution).
- schemeIdx: captures “where the file resides” (it references the asset file URI scheme).
- isCover: true if the filename includes the word “cover”.
- count: set if the filename has a “count” (that is, the file is part of a series).
- isBrowserSupported: true if most browsers can render the given file’s mimetype.
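To illustrate why the fields are “coarse” and what reordering the criteria changes, here is a small sketch. It does not use the actual syntax of the variantSortCriteria setting: the `(field, descending)` pairs, the dictionary records, and the helpers `coarse_mtime` and `sort_variants` are all hypothetical, and flooring is only one possible reading of “rounded to 2-minute resolution.”

```python
from functools import cmp_to_key


def coarse_mtime(mtime_seconds: float, resolution_seconds: int = 120) -> int:
    """Bucket a modified time so nearby timestamps can compare as equal."""
    return int(mtime_seconds // resolution_seconds)


def sort_variants(variants: list[dict], criteria: list[tuple[str, bool]]) -> list[dict]:
    """Sort records by an ordered list of (field, descending) pairs."""
    def compare(a: dict, b: dict) -> int:
        for field, descending in criteria:
            x, y = a[field], b[field]
            if x != y:
                result = -1 if x < y else 1
                return -result if descending else result
        return 0  # every criterion tied

    return sorted(variants, key=cmp_to_key(compare))
```

With coarse values, two files written 40 seconds apart can land in the same mtime bucket and fall through to the next criterion. Putting a field such as fileSize at the front of the list changes which variant comes first.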