Checksums

A checksum is a sequence of numbers and letters produced by running a file through a cryptographic hash algorithm, giving an effectively unique identifier for that file. Even the slightest change to the original file will produce a different checksum, so checksums can be used to verify file integrity and also to prevent resource duplication.

To enable checksums, add the following to your config.php file:

$file_checksums = true;

It is possible to block duplicate files based on their checksums. This has a performance impact, so it is disabled by default. To enable it, use:

$file_upload_block_duplicates = true;

Please note that this will not work reliably with $file_checksums_offline = true unless the checksum script is run frequently.

Note that embedded metadata affects the checksum of a resource. If the same file exists with different embedded metadata, the resulting checksums will differ, so the files will not be classed as duplicates.
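As a minimal sketch, the settings above could be combined in config.php as shown below. Setting $file_checksums_offline explicitly to false is an assumption made for this example, so that checksums are generated at upload time in line with the note above; adjust it to suit your installation.

```php
// Generate a checksum for each newly uploaded resource file.
$file_checksums = true;

// Block uploads of files whose checksum matches an existing resource.
// This has a performance impact, so it is disabled by default.
$file_upload_block_duplicates = true;

// Assumption for this sketch: generate checksums at upload time rather than
// via the offline script, so duplicate detection has current checksums to compare.
$file_checksums_offline = false;
```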

Advanced settings

Once checksums are enabled, newly uploaded resources will have their checksum generated automatically. Checksums can be generated retrospectively for previously uploaded resources by running the script:

pages/tools/update_checksums.php --recreate

Important

  • This script should be run from the command line.
  • When processing a large number of resources, lower the process's scheduling priority and set its I/O scheduling class and priority (for example with nice and ionice on Linux) so that it does not negatively impact the server.
  • For the same reason, you may need to disable the maximum execution time for PHP on the command line (CLI).

Other settings

$file_checksums_50k

You may want to consider whether the checksum should cover the whole file. By default, the checksum is calculated only on the first 50k of a file. To use the whole file, set this config option to false.
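For example, to calculate checksums over the entire file rather than only the first 50k, a line like the following could be added to config.php. This is a sketch only; expect checksum generation to take longer for large files.

```php
// Calculate the checksum over the whole file instead of only the first 50k.
$file_checksums_50k = false;
```

Checksums already stored will have been calculated under the previous setting, so you may want to regenerate them by running pages/tools/update_checksums.php --recreate.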

$file_checksums_offline

You should also consider whether to generate checksums in real time. If this option is set to true, checksums are not generated at upload time, and a background cron job (scheduled task) must be used to run the pages/tools/update_checksums.php script.

Note that this option defaulted to true until version 10.4; after that release it defaults to false.
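Because the default differs between versions, it may be safest to set this option explicitly in config.php rather than rely on the default. A minimal sketch of an offline setup, assuming a cron job (scheduled task) has been configured separately to run the script:

```php
// Enable checksums for resource files.
$file_checksums = true;

// Defer checksum generation to the scheduled run of
// pages/tools/update_checksums.php instead of generating at upload time.
$file_checksums_offline = true;
```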

Viewing checksums

The stored checksums can be viewed by choosing the 'CSV export - metadata' option for any set of search results and selecting the 'Include data from all accessible fields' option.

Resource file integrity validation

Refer to this page on file integrity checking