2024 PhotoStructure release notes
Note that these are notes for versions released in 2024.
Please see the current release notes.
v2024.3.3-beta
-prealpha released 2024-03-20 and promoted to -beta on 2024-03-24
Replaced the ExifTool health check timeout with statTimeoutMs (which defaults to 30s – prior builds timed out after 7 seconds)
🐛 Node.js v21 support: fixed DeprecationWarning: The `punycode` module is deprecated.
📦 Update Docker image to Node.js 20.11
📦 Update SQLite tooling to 3.45
v2024.3.2-beta
-prealpha released 2024-03-14 🥧 and promoted to -beta on 2024-03-20
🐛 The warning message Error: env(): failed to read .env file caused sync to fail to run. This warning is now only emitted by the main service, and only if the file exists.
🐛 Database migrations were edited to try to gracefully recover from some types of partially-applied migrations. This should remedy many issues like this.
🐛 Geolocation fields are now deleted if GPS is (0,0). Upgraded libraries will auto-resync all assets tagged with Where|Ghana|Western|Takoradi (the nearest city to Null Island).
📦 The /settings page redirected to the health check page if settings took longer than a second to fetch(!!).
📦 Added several more /System/Volume exclude globs to avoid macOS system subdirectories (thanks for the assist, AlanH!)
📦 Add tagGeoSynonyms setting: Due to EXIF and XMP specification drift, there are several ways for geolocation information to be encoded in files. When PhotoStructure applies the “tagGeoTemplate”, we’ll use these “synonyms” to build the geo tag (first synonym with a value wins). See https://exiftool.org/forum/index.php?topic=13811.msg74413#msg74413 for details.
📦 Add writeGeolocationTagsToLibraryCopies setting: When enabled, inferred geolocation tags will backfill into Country/State/City tags in the library copy (by default, into an XMP sidecar).
This defaults to false, as reverse-geo lookup results can change over time (and should be done on-demand, rather than stored statically and drift into inaccuracies).
📦 Merged commandTimeoutMs and statTimeoutMs settings–they both defaulted to 30s, and given the presence of taskTimeoutMs (which defaults to 2m) having all three timeouts was confusing.
📦 syncCronTZ now defaults to TZ if TZ is a valid IANA time zone (like “America/Los_Angeles”).
📦 info improvements: include captured-at raw EXIF values for files, and if --load-library is specified, include db library setup metadata (like libraryDbFile, libraryDbBackupDir, and useReplica).
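The “first synonym with a value wins” rule for tagGeoSynonyms can be sketched roughly as follows. This is only an illustration of the idea: the synonym lists, the `first_with_value` and `build_geo_tag` helpers, and the `|`-joined output are assumptions made for this example, not PhotoStructure's actual defaults or code.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical synonym lists, in priority order. The real defaults live in
# the tagGeoSynonyms setting; these names are illustrative only.
GEO_SYNONYMS: Mapping[str, Sequence[str]] = {
    "Country": ("Country", "Country-PrimaryLocationName", "LocationShownCountryName"),
    "State": ("State", "Province-State", "LocationShownProvinceState"),
    "City": ("City", "LocationShownCity"),
}


def first_with_value(tags: Mapping[str, object], synonyms: Sequence[str]) -> Optional[str]:
    """Return the first non-blank string value found among the synonyms."""
    for name in synonyms:
        value = tags.get(name)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None


def build_geo_tag(tags: Mapping[str, object], root: str = "Where") -> Optional[str]:
    """Build a Where|Country|State|City style tag path, skipping missing parts."""
    parts = [root]
    for field in ("Country", "State", "City"):
        value = first_with_value(tags, GEO_SYNONYMS[field])
        if value is not None:
            parts.append(value)
    # No geo information at all: emit no tag rather than a bare root.
    return "|".join(parts) if len(parts) > 1 else None
```

With this sketch, a file that only carries the hypothetical `Province-State` field still contributes a state component, because the lookup falls through the earlier, empty synonyms.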
v2024.3.1-prealpha “Zep”
Released 2024-03-08
🐛 Extended database migration timeouts to 2 minutes by default. See dbMaintenanceTimeoutMs setting for details. Should resolve this issue.
🐛 Added new migration to re-assert the Progress table schema. Should resolve this issue.
🐛 The webservice now re-writes settings.toml files from prior versions, to ensure the latest settings are visible. Thanks for reporting, @tkohhh! Older versions of settings.toml are now moved to ./archive (it had been ./old).
🐛 sync won’t be started if any health checks post fatal errors.
🐛 main renders service startup errors to stderr now and still tries to spin up the web service (in an effort to try to get the health check page to the user)
v2024.3.0-prealpha
Released 2024-03-08
Version format change
I’m adopting a simpler version format: $year.$month.$build, where $build starts at zero at the beginning of the month, and gets incremented for every prealpha, alpha, beta, or stable release. For non-stable releases, -$channel is appended to the version format.
As an example, a build might be v2024.1.7-beta. If it proves sufficiently stable, the same code may be re-released as v2024.1.7.
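For anyone scripting against release names, here is a minimal sketch of a parser for the $year.$month.$build format described above. The `Version`, `parse_version`, and `same_code` names are hypothetical helpers for this example, and the channel list is taken from the channels named in these notes.

```python
import re
from typing import NamedTuple, Optional

# $year.$month.$build, with an optional -$channel suffix for non-stable builds.
VERSION_RE = re.compile(r"^v?(\d{4})\.(\d{1,2})\.(\d+)(?:-(prealpha|alpha|beta))?$")


class Version(NamedTuple):
    year: int
    month: int
    build: int
    channel: Optional[str]  # None means a stable release


def parse_version(text: str) -> Version:
    """Parse strings like 'v2024.1.7-beta' or 'v2024.1.7'."""
    match = VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a $year.$month.$build version: {text!r}")
    year, month, build, channel = match.groups()
    if not 1 <= int(month) <= 12:
        raise ValueError(f"month out of range in {text!r}")
    return Version(int(year), int(month), int(build), channel)


def same_code(a: Version, b: Version) -> bool:
    """True when two releases share year, month, and build (channel ignored)."""
    return a[:3] == b[:3]
```

Under these assumptions, `same_code(parse_version("v2024.1.7-beta"), parse_version("v2024.1.7"))` is true, which matches the "same code re-released as stable" example.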
🗺️ New geo location tagger
PhotoStructure now adds Where/Country/Region/City tags for those photos and videos with Latitude and Longitude metadata.
Note that this feature uses an embedded geo database, so no network access is required. This initial implementation only includes cities with a population of 1000 or greater. See the new tagGeo and tagGeoTemplate settings for more details.
Sync improvements
Previous builds of PhotoStructure had two work queues: one single-threaded work queue for videos, and one multithreaded work queue for images. This was a hack workaround to prevent concurrent ffmpeg invocations. ffmpeg attempts to use all cores by default, resulting in CPU overscheduling.
We’ve since found a fairly reliable way to single-thread ffmpeg, so sync now schedules both video and image work in a single queue, which greatly simplifies the code, and results in higher parallelism (!!). Anecdotally, prior builds would sync several hundred exemplar videos and photos in roughly 3 minutes. This build now completes that same task in under 90 seconds on the same hardware.
PhotoStructure’s task queuing system was also rewritten. Previous builds used a completely separate SQLite schema and database for work scheduling, in an attempt to keep that workload partitioned from the web service. With the new taskListCap setting, task schedulers receive backpressure if the Task table is “full.” This backpressure ensures the table doesn’t grow unbounded, so it felt safe to migrate it into the models database and delete the work queue database. This also allows web to schedule work for sync reliably without socket RPC or JSON watchfile overhead (again, allowing another good chunk of code to be deleted).
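The backpressure idea can be sketched in a few lines. This is a toy model under stated assumptions: the `schedule_with_backpressure` function and the in-memory deque standing in for the Task table are hypothetical, and only the cap behavior (stop enqueuing while the table is “full”) comes from the description above.

```python
from collections import deque
from typing import Deque, Iterator


def schedule_with_backpressure(
    candidates: Iterator[str],
    task_table: Deque[str],
    task_list_cap: int,
) -> int:
    """Enqueue candidate files until the table reaches the cap.

    Returns how many tasks were enqueued. A cap of 0 disables the limit.
    Candidates are only pulled from the iterator when there is room, so the
    directory walk effectively pauses while the table is full.
    """
    enqueued = 0
    while task_list_cap <= 0 or len(task_table) < task_list_cap:
        path = next(candidates, None)
        if path is None:
            break  # nothing left to schedule
        task_table.append(path)
        enqueued += 1
    return enqueued
```

Calling the function again after workers have drained some tasks resumes the walk where it left off, because the iterator keeps its position.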
New stuck-task watchdog
For larger libraries with tens of thousands of ffmpeg transcodes, a sync could get “stuck” waiting for the completion of an ffmpeg transcode that had exited abnormally. v2 builds had a hard timeout value based on video duration, but that proved problematic for slower computers and for more advanced codecs that require more computation to decode, so video transcode timeouts were dropped in v2023. More advanced video transcode timeouts were built that adjusted dynamically based on current system performance and processed pixel count, but this implementation was difficult to test rigorously, so the least-squares interpolation has been replaced with a new stuck-task watchdog.
When users reported their sync process was “stuck,” they’d always report that their system’s CPU was idle but that things weren’t done.
So, instead of fancy-pants pixels-processed-per-mimetype least-squares timeout interpolation complexity, why can’t PhotoStructure just do what the users are doing in these situations?
So now it does!
While sync is processing, every five minutes it will check if the system load is “idle” (by default, less than 50% of one busy core, but this is adjustable). If it is, any task that has been running since before the previous check will be assumed to be stuck, and will be marked as failed. See the new stuckCheckIntervalMs and minBusyPct settings for more details.
To make this work, Tasks are now abortable externally, and know how to clean up gracefully, including killing child processes and notifying sync-report.
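A rough sketch of the watchdog check might look like the following. The `Task` class, the `find_stuck_tasks` function, and the way `busy_pct` is measured are assumptions for illustration. The five-minute interval and 50% threshold mirror the defaults described above, and "stuck" is read here as "already running at the previous check."

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Task:
    name: str
    started_at_ms: int
    failed: bool = False


def find_stuck_tasks(
    tasks: Iterable[Task],
    now_ms: int,
    busy_pct: float,
    stuck_check_interval_ms: int = 5 * 60 * 1000,
    min_busy_pct: float = 50.0,
) -> List[Task]:
    """Mark and return tasks that look stuck.

    busy_pct is the current system load as a percentage of one core
    (how it is sampled is outside this sketch).
    """
    if busy_pct >= min_busy_pct:
        return []  # the system is busy, so work is presumably progressing
    previous_check_ms = now_ms - stuck_check_interval_ms
    stuck = [
        task
        for task in tasks
        if not task.failed and task.started_at_ms <= previous_check_ms
    ]
    for task in stuck:
        task.failed = True  # a real implementation would also abort the task
    return stuck
```

Tasks that started after the previous check are left alone, so a freshly scheduled transcode is never failed just because the machine happens to be idle at that moment.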
✨ Improvements and bug fixes
✨ Sync is now scheduled by a crontab entry. Prior builds waited a static amount of time between completion of last sync and start of next sync, which resulted in unpredictable sync run times. By default PhotoStructure will now kick off sync every night at 2AM local time, but this is configurable now–and don’t worry, any scheduling overruns are automatically skipped. See syncCron and syncCronTZ settings for details – be sure to set syncCronTZ to ensure “2AM” really is in local time.
🐛 Fixed Error: cannot store REAL value in INTEGER column Progress.completePct. This could cause library upgrades from v1.1.0 to fail as well. Thanks for reporting, Alan!
🐛 Using force-sync via the nav menu in some situations would only work once, as the persistent operation wasn’t resolved after sync completed if there were any rejected tasks. This should be resolved.
🐛 .NoMedia could be ignored in some situations after initial directory scans. This should be resolved.
🐛 There were several edge cases that could prevent sync from properly no-op’ing unchanged files, which could result in sync taking a long time to process previously-scanned directories. This should be resolved.
🐛 PhotoStructure for Docker’s About > Sync Information table could show the library path twice. This should be resolved. Thanks for the report, Gijsh!
🐛/📦 Overlapping “Empty trash” and “Remove assets” actions could result in only a subset of assets actually being removed or excluded. These operations have been converted to the new task infrastructure with serial mutexes to avoid issues around concurrency.
🐛/📦 LibRAW’s support for current flagship mirrorless camera RAW file formats is lacking. PhotoStructure can still show a preview for those RAW images (most of which embed a full-resolution JPEG), so we’re changing the default setting for validateRawImages to false in this build. Future builds will probably switch to rawtherapee for RAW rasterization.
📦 Empty trash and Remove assets now write sync report records.
📦 SyncDirectory writes both scan-complete and sync-complete sync report records with elapsed time.
📦 Added excludeHidden setting: PhotoStructure may check for filesystem “hidden” metadata flags on macOS and Windows filesystems, and automatically skip importing those files.
As of v2023, this defaults to “false”, as most people don’t use this filesystem feature, and it’s expensive for PhotoStructure to check for this flag on every file it imports.
This setting is ignored on Linux systems.
📦 Added skipWriteVolumeUuidFilesWithNoMedia setting: When true, PhotoStructure will NOT write files with universally unique identifiers into the root directory of volumes that have been marked with a NoMedia file or folder. If writeVolumeUuidFiles is false this setting is ignored.
📦 Added workQueueHighWater/taskListCap setting: When PhotoStructure scans a directory, the first step is to walk the directory and search for files to import. When the work queue is larger than this value, sync will pause looking for additional files to process. This limits the size of the work queue so that sync doesn’t fall over from IOWAIT or memory oversubscription when there are hundreds of thousands or millions of files to import. Note that until the last batch of work is scheduled, ETAs will be inaccurate.
Set this to 0 to disable.
📦 Added maxValidFutureMs setting: If PhotoStructure encounters a year that is more than this value in the future, it will consider that source to be invalid and look elsewhere for the captured-at date for that given file.
Set to 0 to disable future date filtering.
📦 Added forceFilters setting: When set, all file filters will be applied to visited files. If this is false, files already in the library database will be assumed to be validly passing all import filters. This is set to true by default when rebuilding libraries.
This setting is transient and only set via environment variables.
📦 Setting cpuBusyPercent=0 now supports “single threaded” mode, which tells sync to:
- ignore system load
- consume one CPU core (roughly)
Note that we don’t do CPU pinning, so load from the single-threaded process will probably bounce across portions of different cores, depending on your OS. Expect system load to be about 0.75-1.5 (roughly 75-150% of a core) due to graphics, SQLite, and ExifTool overhead.
📦 Database maintenance tasks and relevant health check timeouts have been extended, and can be configured with the dbMaintenanceTimeoutMs setting, which defaults to 1 minute per database operation.
📦 Maintenance tasks check periodically if the service is ending, rather than running to completion (which could take several minutes, causing the library database to be left in a corrupt state).
📦 taskTimeoutMs and commandTimeoutMs can now be validly 0: previous builds would pass this value directly on to batch-cluster, which does not accept values of less than 10.
📦 AbortErrors are no longer considered “retriable” (which could cause issues under high concurrency).
📦 Tag asset count rebuilds now run non-recursive updates for leaf tags, which are dramatically faster than CTE queries. This can speed up tag asset count rebuilds for larger libraries by 5-10x or more. You can force a Tag.assetCount rebuild with ./photostructure info --reindex.
📦 Replace axios with direct node:http (fewer dependencies are always better)
📦 Volume and mountpoint watches are now only enabled if scanAllDrives is enabled. This may reduce idle CPU and disk activity.
📦 When available, volume and mountpoint metadata is read directly from /proc/ instead of forking df and mount to gather the same information.
📦 Library rebuilds now serialize only asset re-aggregation tasks. All other steps are parallelized.
📦 UV_THREADPOOL_SIZE can be overridden via the new webUvThreads and syncUvThreads settings: Higher values may allow for more concurrent requests, but may also consume more memory and CPU and overwhelm non-SSD storage. The default is 4, which should be fine for most installations. Read more about UV_THREADPOOL_SIZE: https://nodejs.org/api/cli.html#cli_uv_threadpool_size_size
🐛 Several tools didn’t respond correctly to the --no-color option.
📦 Non-retriable library database errors now force-close and reopen the database handle, improving error recovery.
📦 Most file operations now use “work-in-progress” files. These files are now unique per-call (using .WIP-${RANDOM_SHORT_UID}-${destination_basename}), which helps avoid issues from inadvertent concurrent file operations.
📦 Database models are now batch-reloaded on upsert, which can dramatically reduce db query load during sync.
📦 When sync is killed or shut down while actively importing files, those files are now marked with a new canceled state in the sync report. The next time sync is restarted, those tasks should be retried automatically. Prior builds would mark those files as failed and require another full sync run to recover gracefully.
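To illustrate why per-call “work-in-progress” names help, here is a minimal sketch of the pattern. Only the .WIP-${RANDOM_SHORT_UID}-${destination_basename} naming comes from the notes above. The `wip_path_for` and `write_via_wip` helpers, the 8-character hex uid, and the rename-into-place step are assumptions for this example, not PhotoStructure's implementation.

```python
import os
import secrets
from pathlib import Path


def wip_path_for(destination: Path) -> Path:
    """Return a unique sibling path: .WIP-<short uid>-<destination basename>."""
    uid = secrets.token_hex(4)  # hypothetical "short uid": 8 hex characters
    return destination.with_name(f".WIP-{uid}-{destination.name}")


def write_via_wip(destination: Path, data: bytes) -> None:
    """Write to a per-call WIP file, then move it into place.

    Two concurrent calls for the same destination never share a WIP file,
    so neither can observe or clobber the other's partial write.
    """
    wip = wip_path_for(destination)
    try:
        wip.write_bytes(data)
        os.replace(wip, destination)  # replaces any existing destination
    finally:
        # After a successful replace the WIP file is already gone; this only
        # cleans up after a failed write. (missing_ok needs Python 3.8+.)
        wip.unlink(missing_ok=True)
```

Because the WIP file is a sibling of the destination, the final move stays on the same filesystem, which is what lets `os.replace` swap the file in as a single step.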
📦 list improvements:
📦 The list tool has a bunch of new options:
--primary
    Only include the primary, or "best" asset file variation found for every asset. See https://phstr.com/dedupe for details.
--no-primary
    Exclude primary asset file variation for every asset. This is mutually exclusive with the --primary option, and returns all rows that option omits.
-0, --print0
    Print each full native path name to standard output, followed by a null character (instead of the newline character). This is suitable for properly handling filenames that include whitespace characters in shell pipelines using commands like xargs, which has a "--null" mode which expects filenames to be separated by the null character. This cannot be used with --json or --dump.
--todo
    List the currently enqueued files that sync is going to process next. Implies --json. Does not support --print0.
--tags
    List all tag paths along with their counts. Implies --json. Does not support --print0.
🐛 list --limit works now. Prior builds could miss adding the SQL LIMIT clause to the query.
🐛 list --json now emits a valid JSON array of objects, so you can pipe the output to tools like jq (for example, jq .). Prior builds would emit individual JSON objects separated by newlines, which most JSON-consuming tools don’t know how to deal with.
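If you consume list output from a script rather than a shell pipeline, the two output shapes can be handled roughly like this. The helper names and the `{"path": ...}` record shape in the usage note are assumptions for illustration, since the notes above do not document the JSON fields.

```python
import json
import os
from typing import List


def parse_print0(output: bytes) -> List[str]:
    """Split --print0 output on null bytes, so paths may contain newlines."""
    return [os.fsdecode(chunk) for chunk in output.split(b"\0") if chunk]


def parse_list_json(output: str) -> List[dict]:
    """Parse the newer --json output: a single JSON array of objects."""
    rows = json.loads(output)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array")
    return rows


def parse_json_lines(output: str) -> List[dict]:
    """Parse the older shape: one JSON object per line."""
    return [json.loads(line) for line in output.splitlines() if line.strip()]
```

With a hypothetical record shape, `parse_list_json('[{"path": "/a.jpg"}]')` and `parse_json_lines('{"path": "/a.jpg"}\n')` return the same list. The difference is that only the array form can be handed to a standard JSON parser in one call.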

