下面来总结一些常用的汉字转换成Unicode编码PHP程序实现代码,我们只要了解到Unicode编码与gbk编码之间的内置转换原理即可了.
汉字转换成unicode方法,代码如下:
- <?php
-
- function htou($c){
- $n = (ord($c[0]) & 0x1f) << 12;
- $n = (ord($c[1]) & 0x3f) << 6;
- $n = ord($c[2]) & 0x3f;
- return $n;
- }
-
-
- function my_utf8_unicode($str) {
- $encode='';
- for($i=0;$i<strlen($str);$i ){
- if(ord(substr($str,$i,1))> 0xa0){
- $encode.='&#'.htou(substr($str,$i,3)).';';
- $i =2;
- }else{
- $encode.='&#'.ord($str[$i]).';';
- }
- }
- return $encode;
- }
-
- echo my_utf8_unicode("哈哈ABC");
- ?>
汉字转换成unicode方法二,代码如下:
- function getUnicode($word)
- {
-
- $word0 = iconv('gbk', 'utf-8', $word);
- $word1 = iconv('utf-8', 'gbk', $word0);
- $word = ($word1 == $word) ? $word0 : $word;
-
- preg_match_all('#(?:[x00-x7F]|[xC0-xFF][x80-xBF]+)#s', $word, $array, PREG_PATTERN_ORDER);
- $return = array();
-
- foreach ($array[0] as $cc)
- {
- $arr = str_split($cc);
- $bin_str = '';
- foreach ($arr as $value)
- {
- $bin_str .= decbin(ord($value));
- }
- $bin_str = preg_replace('/^.{4}(.{4}).{2}(.{6}).{2}(.{6})$/','$1$2$3', $bin_str);
- $return[] = '&#' . bindec($bin_str) . ';';
- }
-
- return implode('', $return);
- }
函数用法,代码如下:
- $word = '一个汉字转换成Unicode四字节编码的PHP函数。';
- echo getUnicode($word);
-
-
-
-
-
-
这一组函数可以将汉字转成unicode编码,也可以将unicode解码成汉字.
将汉字转成Unicode的函数,代码如下:
- function uni_encode ($word)
- {
- $word0 = iconv('gbk', 'utf-8', $word);
- $word1 = iconv('utf-8', 'gbk', $word0);
- $word = ($word1 == $word) ? $word0 : $word;
- $word = json_encode($word);
- $word = preg_replace_callback('/\\u(w{4})/', create_function('$hex', 'return '&#'.hexdec($hex[1]).';';'), substr($word, 1, strlen($word)-2));
- return $word;
- }
对Unicode编码进行解码的函数,代码如下:
- function uni_decode ($uncode)
- {
- $word = json_decode(preg_replace_callback('/&#(d{5});/', create_function('$dec', 'return '\u'.dechex($dec[1]);'), '"'.$uncode.'"'));
- return $word;
- }
|