What do you mean by “deduplicate”?
Browsing with PhotoStructure is designed to be fast and fun.
If your library contains duplicate photos or videos, clicking “next” or “previous” can show you the same thing you were just looking at. Did the click not register? Is this a bug? Either way: these browsing stutters aren’t fun.
To avoid this, PhotoStructure automatically detects duplicate photo and video variations, and only shows you the “best” variant.
Why you may have duplicates
There are several reasons why you might have 2 or more copies or variations of any given photo or video:
RAW+JPEG pairs
Most current digital cameras and even some smartphones support “shooting raw.”
These raw files encode higher sensor fidelity than JPEGs. This additional information can allow you to “post-process” files to get better dynamic range, restore highlight and shadow details, and adjust color balance, with much more flexibility than a JPEG.
Unfortunately, raw images are slow to process, and many image applications can’t handle these files. Most cameras allow shooting in “RAW+JPEG,” where each time you push the shutter button, both a JPEG file and a raw image file are written to your memory card. If PhotoStructure didn’t know that these are actually the same image, you’d see two (or more) photos with the same image while browsing your PhotoStructure library.
Cloud backups
Several photo cloud backup services downsample your photos and videos, and strip much of the metadata from your files, as well.
If you download a local backup from your cloud service, these photos and videos will be duplicates of your original files.
Local edits
When you make edits to your images, some software will write to a new file rather than overwriting your original.
Local backups
If you’ve used backup software, you’ll have additional copies of your photos and videos wherever the backup destination was configured.
How this relates to automatic organization
If you’ve enabled automatic organization, PhotoStructure errs on the side of caution, and copies each valid, unique image into your library.
If exactly the same file is found (i.e., precisely the same stream of bytes on disk), it won’t be copied into your library again. All other variants of the image, though, will be copied.
As an example, in the above cases, both the raw and JPEG files will be copied into your library, as well as any unique files from cloud service backups, and local edits.
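The “precisely the same stream of bytes” check can be pictured as a content-hash comparison. The sketch below is an illustration only: this page doesn’t say which hash PhotoStructure uses or how it tracks files it has already seen, so SHA-256 and the function names `file_digest` and `files_to_copy` are assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def files_to_copy(candidates: list[Path]) -> list[Path]:
    """Keep the first file seen for each distinct byte stream.

    Byte-identical files are skipped. Anything that differs by even one
    byte (a raw file, an edit, a downsampled cloud copy) is kept.
    """
    seen: set[str] = set()
    unique: list[Path] = []
    for path in candidates:
        digest = file_digest(path)
        if digest not in seen:
            seen.add(digest)
            unique.append(path)
    return unique
```

With this approach, a RAW+JPEG pair always produces two distinct digests, which is why both files end up in the library.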
How files are aggregated
When PhotoStructure compares two files, it examines a number of metadata tags in each. If both files have a value for a given tag, and those values substantively differ, the files are considered to be different assets.
If the captured-at time matches, but an insufficient number of other metadata tags match, PhotoStructure will compare the actual images of the files. If they are substantively different, the files are considered to be different assets.
You can use the info tool to compare files and see if PhotoStructure considers them eligible to be associated to the same asset.
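As a rough sketch of the comparison described in this section, the function below walks a list of tags and applies the two rules: a tag present in both files with differing values means “different assets,” and a matching captured-at time with too few other matching tags means the images themselves need to be compared. The tag names, the threshold, and the idea that “substantively differ” is a case-insensitive string comparison are all assumptions made for illustration; the actual tags and tolerances PhotoStructure uses aren’t listed here.

```python
from typing import Mapping

# Hypothetical tag names and threshold, chosen only for this example.
COMPARED_TAGS = ("Make", "Model", "LensModel", "ExposureTime")
MIN_MATCHING_TAGS = 2


def _norm(value: object) -> str:
    """Normalize a tag value so trivial formatting differences are ignored."""
    return str(value).strip().lower()


def compare_metadata(a: Mapping[str, object], b: Mapping[str, object]) -> str:
    """Return 'different', 'eligible', or 'compare-images'."""
    ca, cb = a.get("CapturedAt"), b.get("CapturedAt")
    if ca is not None and cb is not None and _norm(ca) != _norm(cb):
        return "different"

    matches = 0
    for tag in COMPARED_TAGS:
        va, vb = a.get(tag), b.get(tag)
        if va is None or vb is None:
            continue  # a tag missing from either file is not evidence either way
        if _norm(va) != _norm(vb):
            return "different"  # both files have the tag, and the values differ
        matches += 1

    if matches >= MIN_MATCHING_TAGS:
        return "eligible"
    # Too little metadata to decide (assumed to also cover a missing
    # captured-at time): fall back to comparing the image content.
    return "compare-images"
```

A cloud-backup copy with most of its metadata stripped would typically land in the “compare-images” branch, since it has few tags left to agree or disagree with the original.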
How does PhotoStructure pick which file to show?
In general, PhotoStructure picks the “best” image or video variation with the largest resolution that lives in your library.
In an effort to make PhotoStructure’s “best” pick be predictable, though, there are a number of other file metadata attributes that PhotoStructure also uses. The variantSortCriteria library setting allows you to customize how PhotoStructure picks your library’s “best”.
Here’s the list of those fields, in default priority order, as of v2023:
- resolution: the coarse image resolution. Files with similar megapixel counts are considered equivalent.
- schemeIdx: captures “where the file resides” (it references the asset file URI scheme). This prefers files stored in your library over files found outside your library.
- capturedAtPrecision: variations that contain more reliable captured-at metadata will be preferred.
- metadataCoverage: prefer files with more of the metadata fields we care about.
- isBrowserSupported: prefer files we can directly stream to the browser without re-rendering or transcoding.
- isEditOrUpdate: prefer files whose basename includes “edit” or “update”. Many editing applications will save “file-updated.jpg” instead of overwriting the original file.
- isCover: if we have burst files, prefer the “burst cover”.
- count: if there are many copies of a file (image.jpg, image (1).jpg, image (2).jpg), prefer the one with the highest number (assuming that’s the latest copy).
- mtime: prefer the newest version. Note that many backup applications don’t retain mtime correctly, so we don’t really trust this value.
- basename: this helps make sorting deterministic if all other factors are the same.
- parentBasename: this helps make sorting deterministic.
- uri: this is simply to make sorting deterministic if all other factors are the same.
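A prioritized list like this behaves like a multi-key sort: the first criterion decides, and each later one only breaks ties left by those before it. The sketch below shows the idea with a subset of the criteria, kept in the same relative order. The `Variant` class, the helper names, and the choice to bucket resolution by rounding to the nearest megapixel are assumptions for illustration, not PhotoStructure’s implementation.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one file that belongs to an asset."""
    path: str
    width: int
    height: int
    in_library: bool
    mtime: float


def coarse_megapixels(v: Variant) -> int:
    # Assumed bucketing: round to the nearest whole megapixel, so that
    # 11.97 MP and 12.0 MP count as the same resolution.
    return round(v.width * v.height / 1_000_000)


def is_edit_or_update(v: Variant) -> bool:
    stem = PurePosixPath(v.path).stem.lower()
    return "edit" in stem or "update" in stem


def copy_count(v: Variant) -> int:
    # "image (2).jpg" -> 2, "image.jpg" -> 0
    m = re.search(r"\((\d+)\)$", PurePosixPath(v.path).stem)
    return int(m.group(1)) if m else 0


def sort_key(v: Variant) -> tuple:
    # Tuples compare element by element, so earlier entries take priority.
    # Values are negated or inverted so that the smallest key is "best".
    return (
        -coarse_megapixels(v),     # resolution: larger first
        not v.in_library,          # schemeIdx: library files first
        not is_edit_or_update(v),  # isEditOrUpdate: edits first
        -copy_count(v),            # count: highest copy number first
        -v.mtime,                  # mtime: newest first
        PurePosixPath(v.path).name,  # basename: deterministic tie-break
        v.path,                    # uri: final tie-break
    )


def pick_best(variants: list[Variant]) -> Variant:
    return min(variants, key=sort_key)
```

Because resolution is bucketed before comparison, a slightly cropped edit that lives in the library can still win over an untouched original of nearly the same size.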