cleanse_string()

Description

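Removes characters from a string, for example full stops, prior to keyword splitting (see split_keywords()).
Also makes the string lower case ready for indexing.

Separator characters are taken from $config_separators and replaced with spaces. When $preserve_separators is false, the comma and colon used to separate multiple field/keyword pairs are replaced as well. When $for_search is true, the wildcard is preserved, and a minus (for NOT searches) or a leading exclamation mark (for special searches) is preserved where the search string uses it.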
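As a rough illustration of the description, the sketch below replaces a hand-picked separator list with spaces and lower-cases the result. The name cleanse_sketch() and its $separators argument are hypothetical; the real function reads its separators from the global $config_separators and also passes the text through allow_unicode_characters(), which is not reproduced here.

```php
<?php
// Hypothetical, simplified stand-in for cleanse_string(): no globals, no allowlist.
function cleanse_sketch(string $string, array $separators): string
{
    // Every separator becomes a space so that keyword splitting can break on whitespace
    $string = str_replace($separators, ' ', $string);

    // Lower-case in a multibyte-safe way (requires the mbstring extension)
    return mb_strtolower($string, 'UTF-8');
}

// Example separator list (an assumption, not the shipped $config_separators)
echo cleanse_sketch('Sunset.Over THE Bay', ['.', ';']), PHP_EOL; // sunset over the bay
```

Separators are replaced with a space rather than deleted, so words on either side of a full stop stay distinct keywords.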

Parameters

Parameter             Type    Default  Description
$string               string           Text value which needs to be pre-processed
$preserve_separators  bool             Set to false to separate keywords when specifying multiple field/keyword pairs (comma and colon)
$for_search           bool    false    Set to true if you need certain characters to be preserved during a search (e.g. the wildcard or minus for NOT searches)
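The effect of the two flags can be sketched in isolation. The helpers search_allowlist() and separators_for() below are hypothetical names that mirror the corresponding branches of the definition; the sample separator lists are assumptions, not the values shipped in $config_separators.

```php
<?php
// Hypothetical helper mirroring the $for_search branch: which characters survive?
function search_allowlist(string $string): array
{
    $allow = ['*']; // the wildcard is always kept for searches

    // Keep "-" only when it marks a NOT keyword ("-cat" or "dog -cat"),
    // not when the string contains a free-standing dash ("dog - cat").
    $has_not_marker = substr($string, 0, 1) === '-' || strpos($string, ' -') !== false;
    if ($has_not_marker && strpos($string, ' - ') === false) {
        $allow[] = '-';
    }

    // Keep "!" only when it is the first character and appears nowhere else
    if (substr($string, 0, 1) === '!' && strpos(substr($string, 1), '!') === false) {
        $allow[] = '!';
    }

    return $allow;
}

// Hypothetical helper mirroring how the separator list is assembled.
// array_values() is added here only to give predictable keys for the example.
function separators_for(array $config_separators, bool $preserve_separators, array $allowlist): array
{
    $separators = array_values(array_diff($config_separators, $allowlist));

    if (!$preserve_separators) {
        // Also strip the field/keyword pair separators
        $separators[] = ',';
        $separators[] = ':';
    }

    return $separators;
}

print_r(search_allowlist('dog -cat'));                                 // '*' and '-'
print_r(separators_for(['.', '-', '*'], false, ['*', '-']));           // '.', ',' and ':'
```

Allowlisted characters are taken out of the separator list, so they are not turned into spaces before the final clean-up.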

Location

include/search_functions.php lines 2213 to 2261

Definition

 
function cleanse_string($string, $preserve_separators, $for_search = false): string
{
    global $config_separators;

    // Replace some HTML entities with empty space
    // Most of them should already be in $config_separators
    // but others, like &shy; don't have an actual character that we can copy and paste
    // to $config_separators
    $string = htmlentities($string, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    $string = str_replace('&nbsp;', ' ', $string);
    $string = str_replace('&shy;', ' ', $string);
    $string = str_replace('&lsquo;', ' ', $string);
    $string = str_replace('&rsquo;', ' ', $string);
    $string = str_replace('&ldquo;', ' ', $string);
    $string = str_replace('&rdquo;', ' ', $string);
    $string = str_replace('&ndash;', ' ', $string);

    // Revert the htmlentities as otherwise we lose ability to identify certain text e.g. diacritics
    $string = html_entity_decode($string, ENT_QUOTES, 'UTF-8');

    $unicode_allowlist = [];

    if ($for_search) {
        $unicode_allowlist[] = '*'; # wildcard

        // Preserve hyphen so we know which keywords to omit from the search (for a NOT search)
        if ((substr($string, 0, 1) == "-" || strpos($string, " -") !== false) && strpos($string, " - ") === false) {
            $unicode_allowlist[] = '-';
        }

        // Preserve the exclamation mark when doing a special search
        if (substr($string, 0, 1) == "!" && strpos(substr($string, 1), "!") === false) {
            $unicode_allowlist[] = '!';
        }
    }

    $separators = array_diff($config_separators, $unicode_allowlist);

    if (!$preserve_separators) {
        # Also strip out the separators used when specifying multiple field/keyword pairs (comma and colon)
        $separators[] = ",";
        $separators[] = ":";
    }

    return mb_strtolower(
        allow_unicode_characters(str_replace($separators, ' ', $string), $unicode_allowlist),
        'UTF-8'
    );
}
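The entity round trip at the top of the definition can be tried on its own. The sketch below uses a hypothetical helper name, blank_typographic_entities(), and collapses the individual str_replace() calls into one call with an array, which is assumed to be equivalent for this fixed list of entities.

```php
<?php
// Hypothetical helper isolating the encode / replace / decode step.
function blank_typographic_entities(string $string): string
{
    // Encode first so that hard-to-type characters (soft hyphen, non-breaking
    // space, curly quotes, en dash) get a named entity we can match on
    $string = htmlentities($string, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    $entities = ['&nbsp;', '&shy;', '&lsquo;', '&rsquo;', '&ldquo;', '&rdquo;', '&ndash;'];
    $string = str_replace($entities, ' ', $string);

    // Decode again so everything else (e.g. diacritics) comes back unchanged
    return html_entity_decode($string, ENT_QUOTES, 'UTF-8');
}

echo blank_typographic_entities("1990\u{2013}1995"), PHP_EOL; // 1990 1995
```

Matching on entity names avoids pasting invisible characters such as the soft hyphen into source code or configuration.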

This article was last updated 18th January 2026 19:05 Europe/London time based on the source file dated 30th December 2025 09:15 Europe/London time.