Unicode string functions for PHP

Posted: June 24th, 2011 | Filed under: PHP

A friend recently asked on Twitter about the lack of Unicode support in PHP's string functions. He was specifically after ucfirst, lcfirst and ucwords.

After doing some research, I decided to write a set of custom string functions with Unicode support, built on PHP's mbstring functions.

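// Unicode-aware ucfirst: uppercase the first character of a multibyte string.
if (!function_exists('mb_ucfirst')){
	function mb_ucfirst($str, $e='utf-8') {
		// Split by character offsets (not byte offsets): first character, then the remainder.
		return mb_strtoupper(mb_substr($str, 0, 1, $e), $e) . mb_substr($str, 1, mb_strlen($str, $e), $e);
	}
}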

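// Unicode-aware lcfirst: lowercase the first character of a multibyte string.
if (!function_exists('mb_lcfirst')){
	function mb_lcfirst($str, $e='utf-8') {
		// Same split as mb_ucfirst, but the first character is lowercased.
		return mb_strtolower(mb_substr($str, 0, 1, $e), $e) . mb_substr($str, 1, mb_strlen($str, $e), $e);
	}
}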
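As a quick usage sketch, the snippet below repeats the two definitions so it runs on its own and then calls them on a few accented strings. The sample strings are made up for illustration. Note that on PHP versions that already ship mb_ucfirst and mb_lcfirst (they are built in as of PHP 8.4), the function_exists guards skip these definitions and the native functions are used instead.

```php
<?php
// Definitions repeated from the post so this snippet is self-contained.
if (!function_exists('mb_ucfirst')) {
    function mb_ucfirst($str, $e = 'utf-8') {
        return mb_strtoupper(mb_substr($str, 0, 1, $e), $e) . mb_substr($str, 1, mb_strlen($str, $e), $e);
    }
}

if (!function_exists('mb_lcfirst')) {
    function mb_lcfirst($str, $e = 'utf-8') {
        return mb_strtolower(mb_substr($str, 0, 1, $e), $e) . mb_substr($str, 1, mb_strlen($str, $e), $e);
    }
}

// Illustrative inputs; this file must be saved as UTF-8.
echo mb_ucfirst('élan vital'), "\n"; // Élan vital
echo mb_lcfirst('Örebro'), "\n";     // örebro
echo mb_ucfirst('hello'), "\n";      // Hello (plain ASCII still works)
```

Only the first character changes; the rest of the string is passed through untouched.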

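// Sentence case: lowercase everything, then capitalize the first letter of each sentence.
if (!function_exists('mb_sentence')){
	function mb_sentence($str, $e='utf-8') {
		$string = '';
		// Split on runs of . ? ! and keep the delimiters, so text and punctuation alternate.
		$sentences = preg_split('/([.?!]+)/', $str, -1, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
		foreach ($sentences as $key => $sentence) {
			// Even keys hold sentence text, odd keys the punctuation (assuming the input does not start with punctuation).
			$string .= ($key & 1) == 0 ? mb_ucfirst(mb_strtolower(trim($sentence), $e), $e) : $sentence .' ';
		}
		return trim($string);
	}
}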
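To show what mb_sentence does with mixed-case input, here is a small self-contained sketch. It repeats mb_ucfirst and mb_sentence (passing the encoding through to mb_ucfirst) so it can run alone; the input string is invented for illustration.

```php
<?php
// Definitions repeated so this snippet is self-contained.
if (!function_exists('mb_ucfirst')) {
    function mb_ucfirst($str, $e = 'utf-8') {
        return mb_strtoupper(mb_substr($str, 0, 1, $e), $e) . mb_substr($str, 1, mb_strlen($str, $e), $e);
    }
}

if (!function_exists('mb_sentence')) {
    function mb_sentence($str, $e = 'utf-8') {
        $string = '';
        // Text and punctuation pieces alternate in the resulting array.
        $sentences = preg_split('/([.?!]+)/', $str, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
        foreach ($sentences as $key => $sentence) {
            $string .= ($key & 1) == 0
                ? mb_ucfirst(mb_strtolower(trim($sentence), $e), $e)
                : $sentence . ' ';
        }
        return trim($string);
    }
}

// Illustrative input; this file must be saved as UTF-8.
echo mb_sentence('hELLO wORLD. éTÉ est là! ça va?'), "\n";
// Hello world. Été est là! Ça va?
```

Two limitations follow directly from the code: everything except the first letter of each sentence is lowercased, so proper nouns lose their capitals, and an input that begins with punctuation shifts the even/odd alternation, because PREG_SPLIT_NO_EMPTY drops the empty leading piece.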

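// Unicode-aware ucwords: title-case every word. Unlike ucwords, this also lowercases the rest of each word.
if (!function_exists('mb_ucwords')){
	function mb_ucwords($str, $e='utf-8'){
		return mb_convert_case($str, MB_CASE_TITLE, $e);
	}
}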
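A short usage sketch for mb_ucwords, again with the definition repeated so it runs on its own and with made-up sample strings. It also shows one difference from the built-in ucwords: MB_CASE_TITLE lowercases the remaining letters of each word.

```php
<?php
// Definition repeated so this snippet is self-contained.
if (!function_exists('mb_ucwords')) {
    function mb_ucwords($str, $e = 'utf-8') {
        return mb_convert_case($str, MB_CASE_TITLE, $e);
    }
}

// Illustrative inputs; this file must be saved as UTF-8.
echo mb_ucwords('émile zola'), "\n";    // Émile Zola
echo mb_ucwords('ÉCOLE normale'), "\n"; // École Normale (rest of each word is lowercased)
```

How words are delimited around apostrophes and similar characters can differ between PHP versions, so it is worth checking strings like that against the PHP version you deploy on.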

Please note that some of the code was taken from comments and suggestions on php.net.
All I have done is modify, rewrite and rename the functions to suit my needs.

Please note that the mb_* string functions use more CPU than the standard byte-based string functions.
I hope this is helpful to others struggling with Unicode strings in PHP.
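If you want to gauge the CPU cost mentioned above on your own setup, a rough timing sketch like the following can help. The sample text and iteration count are arbitrary assumptions, and the numbers will vary with the PHP version and the machine, so treat the output as an indication only.

```php
<?php
// Rough timing sketch: byte-based ucfirst() versus the mb_* equivalent.
$text = str_repeat('hello world ', 100); // arbitrary sample text
$iterations = 10000;                     // arbitrary iteration count

// Time the built-in, byte-based function.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $result = ucfirst($text);
}
$byteTime = microtime(true) - $start;

// Time the same operation built from mbstring calls.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $result = mb_strtoupper(mb_substr($text, 0, 1, 'utf-8'), 'utf-8')
        . mb_substr($text, 1, mb_strlen($text, 'utf-8'), 'utf-8');
}
$mbTime = microtime(true) - $start;

printf("ucfirst:    %.4f s\n", $byteTime);
printf("mb version: %.4f s\n", $mbTime);
```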