A checksum is a sequence of numbers and letters produced by running a file through a cryptographic hash algorithm, giving an identifier that is effectively unique to that file's contents. Even the slightest change to the original file produces a different checksum, so checksums can be used to verify file integrity and also to prevent resource duplication.
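As a rough illustration of both uses, the sketch below hashes three strings standing in for file contents. MD5 is an assumption made for the example only, since the hash algorithm is not named here, and the variable names are hypothetical.

```php
<?php
// Illustrative only: MD5 is assumed here; the actual algorithm is not stated in this article.
$original = "Example file contents\n";
$modified = "Example file contents.\n"; // differs by a single character
$copy     = "Example file contents\n";  // byte-identical to the original

$checksum_original = md5($original);
$checksum_modified = md5($modified);
$checksum_copy     = md5($copy);

// Integrity: any change to the contents changes the checksum.
var_dump($checksum_original === $checksum_modified); // bool(false)

// Duplicate detection: identical contents give identical checksums.
var_dump($checksum_original === $checksum_copy); // bool(true)
```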
To enable checksums, add the following to your config.php file:
$file_checksums = true;
It is possible to block duplicate files based on checksums. This has a performance impact, so it is disabled by default. To enable it, use:
Please note that this will not work reliably with $file_checksums_offline=true unless the checksum script is run frequently.
Note that metadata will affect the checksum of a resource. If you have the same file with different embedded metadata, the resulting checksum will differ and so will not be classed as a duplicate.
Once this is enabled, newly uploaded resources will have their checksum generated automatically. Checksums can be generated retrospectively for previously uploaded resources by running the script:
This should be run from the command line. When running it against a large number of resources, lower the process's scheduling priority and set its I/O scheduling class and priority so that it does not bring down the server. Please note that you may also need to disable the PHP execution time limit for the CLI for the same reason.
You may want to consider how much of the file is used to generate the checksum. By default this is the first 50k of the file.
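To make the trade-off concrete, here is a minimal sketch of hashing only the start of a file. The function partial_checksum() is a hypothetical helper, not part of the application, and it assumes MD5 and reads "50k" as 50,000 bytes; neither detail is confirmed by this article.

```php
<?php
// Hypothetical helper for illustration; not the application's own code.
// Assumptions: MD5 as the hash, and "50k" taken to mean 50,000 bytes.
function partial_checksum(string $path, int $max_bytes = 50000): string
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    // Reads at most $max_bytes bytes, or fewer if the file is shorter.
    $head = stream_get_contents($handle, $max_bytes);
    fclose($handle);
    return md5($head);
}

// Two files that share their first 60,000 bytes but end differently.
$file_a = tempnam(sys_get_temp_dir(), 'chk');
$file_b = tempnam(sys_get_temp_dir(), 'chk');
file_put_contents($file_a, str_repeat('x', 60000) . 'A');
file_put_contents($file_b, str_repeat('x', 60000) . 'B');

var_dump(partial_checksum($file_a) === partial_checksum($file_b)); // bool(true)
var_dump(md5_file($file_a) === md5_file($file_b));                 // bool(false)

unlink($file_a);
unlink($file_b);
```

Hashing only the start of a file is faster on large files, but as the example shows, files that differ only after that point will receive the same checksum.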
You should also consider whether to generate checksums in real time or offline. For offline generation, a background cron job (scheduled task) must be used (i.e. pages/tools/update_checksums.php). This option is enabled by default, but the scheduled task MUST be set up in order for it to work.
The stored checksums can be viewed by choosing the 'CSV export - metadata' option for any set of search results and selecting the 'Include data from all accessible fields' option.