cleanse_string()

Parameters

Column                  Type    Default    Description
$string
$preserve_separators
$preserve_hyphen                false
$is_html                        false

Location

include/search_functions.php lines 2370 to 2419

Definition

 
function cleanse_string($string,$preserve_separators,$preserve_hyphen=false,$is_html=false)
    {
    # Removes characters from a string prior to keyword splitting, for example full stops
    # Also makes the string lower case ready for indexing.
    global $config_separators;
    $separators=$config_separators;

    // Replace some HTML entities with empty space
    // Most of them should already be in $config_separators
    // but others, like &shy; don't have an actual character that we can copy and paste
    // to $config_separators
    $string = htmlentities($string, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    $string = str_replace('&nbsp;', ' ', $string);
    $string = str_replace('&shy;', ' ', $string);
    $string = str_replace('&lsquo;', ' ', $string);
    $string = str_replace('&rsquo;', ' ', $string);
    $string = str_replace('&ldquo;', ' ', $string);
    $string = str_replace('&rdquo;', ' ', $string);
    $string = str_replace('&ndash;', ' ', $string);

    // Revert the htmlentities as otherwise we lose ability to identify certain text e.g. diacritics
    $string = html_entity_decode($string,ENT_QUOTES,'UTF-8');

    if (
        $preserve_hyphen
        && (substr($string,0,1) == "-" || strpos($string," -") !== false) /*support minus as first character for simple NOT searches */
        && strpos($string," - ") == false
        ) {
            # Preserve hyphen - used when NOT indexing so we know which keywords to omit from the search.
            $separators=array_diff($separators,array("-")); # Remove hyphen from separator array.
        }
    if (substr($string,0,1)=="!" && strpos(substr($string,1),"!")===false)
            {
            // If we have the exclamation mark configured as a config separator but we are doing a special search we don't want to remove it
            $separators=array_diff($separators,array("!"));
            }

    if ($preserve_separators)
            {
            return mb_strtolower(trim_spaces(str_replace($separators," ",$string)),'UTF-8');
            }
    else
            {
            # Also strip out the separators used when specifying multiple field/keyword pairs (comma and colon)
            $s=$separators;
            $s[]=",";
            $s[]=":";
            return mb_strtolower(trim_spaces(str_replace($s," ",$string)),'UTF-8');
            }
    }
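
Example

The behaviour described in the definition can be tried outside ResourceSpace with a small stand-alone sketch. This is not the shipped function: cleanse_string_sketch() and trim_spaces_sketch() are hypothetical names, the separator list is an invented sample passed as an argument rather than the real $config_separators global, and trim_spaces_sketch() only assumes that trim_spaces() collapses repeated whitespace and trims the ends.

```php
<?php
// Assumed stand-in for ResourceSpace's trim_spaces(); the real implementation is not shown on this page.
function trim_spaces_sketch(string $text): string
{
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Condensed re-statement of the logic above. The separator list is an argument
// here (an assumption for testability) instead of the global $config_separators.
function cleanse_string_sketch(string $string, array $separators, bool $preserve_separators, bool $preserve_hyphen = false): string
{
    // Encode, blank out entities that have no easily typed character, then decode again.
    $string = htmlentities($string, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    $entities = array('&nbsp;', '&shy;', '&lsquo;', '&rsquo;', '&ldquo;', '&rdquo;', '&ndash;');
    $string = str_replace($entities, ' ', $string);
    $string = html_entity_decode($string, ENT_QUOTES, 'UTF-8');

    // Keep "-" when it marks a NOT term ("-dog" or " -dog") and is not free-standing (" - ").
    if ($preserve_hyphen
        && (substr($string, 0, 1) == "-" || strpos($string, " -") !== false)
        && strpos($string, " - ") == false
    ) {
        $separators = array_diff($separators, array("-"));
    }

    // Keep a single leading "!" so special searches are not broken.
    if (substr($string, 0, 1) == "!" && strpos(substr($string, 1), "!") === false) {
        $separators = array_diff($separators, array("!"));
    }

    // Comma and colon are only stripped when separators are not being preserved.
    if (!$preserve_separators) {
        $separators[] = ",";
        $separators[] = ":";
    }

    return mb_strtolower(trim_spaces_sketch(str_replace($separators, " ", $string)), 'UTF-8');
}

// Hypothetical sample list; the real values come from the ResourceSpace configuration.
$separators = array(".", "/", "_", ";", "-", "!", "(", ")");

echo cleanse_string_sketch("Hello.World: Red-Blue", $separators, false), "\n";      // hello world red blue
echo cleanse_string_sketch("Hello.World: Red-Blue", $separators, true), "\n";       // hello world: red blue
echo cleanse_string_sketch("cat -dog", $separators, false, true), "\n";             // cat -dog
echo cleanse_string_sketch("cat - dog", $separators, false, true), "\n";            // cat dog
echo cleanse_string_sketch("\u{2018}Quoted\u{2019} text", $separators, false), "\n"; // quoted text
```

The third and fourth calls show the effect of $preserve_hyphen: a hyphen attached to a keyword survives so the term can be treated as a NOT search, while a free-standing " - " is still treated as a separator.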

This article was last updated 8th December 2024 20:35 Europe/London time based on the source file dated 27th November 2024 09:40 Europe/London time.