Exercise (interview question): cutting a string by byte count

Source: Internet. Editor: 程序博客网. Time: 2024/06/05 01:02

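In Java, the string "abcd" and the string "ab你好" have the same length: four characters each.

Their byte counts differ, however: in GBK, a Chinese character occupies two bytes.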
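The difference between character count and byte count can be checked directly. The following is a minimal sketch; the class name `LengthVsBytes` and the helper `byteLength` are made up for illustration, and it assumes the JVM ships the GBK charset (most full JDK installs do). The Chinese characters are written as Unicode escapes so the result does not depend on the source file's encoding.

```java
import java.nio.charset.Charset;

final class LengthVsBytes {
    // Hypothetical helper: bytes needed to encode s in the named charset.
    static int byteLength(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String ascii = "abcd";
        String mixed = "ab\u4f60\u597d"; // "ab" plus the two Chinese characters U+4F60 and U+597D

        System.out.println(ascii.length());            // 4 characters
        System.out.println(mixed.length());            // 4 characters
        System.out.println(byteLength(ascii, "GBK"));  // 4 bytes
        System.out.println(byteLength(mixed, "GBK"));  // 6 bytes: 1 + 1 + 2 + 2
    }
}
```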

Define a method that extracts a substring of a given number of bytes.

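For example, with "ab你好" (counted in GBK), taking three bytes would give "ab" plus half of the character "你", so that half must be discarded and the result is "ab". Taking four bytes gives "ab你", and taking five bytes still gives "ab你".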
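One way to pin down this expected behaviour is a charset-independent reference version that walks the string one code point at a time and stops before the byte budget is exceeded. This is only a sketch for checking results, not the implementation the exercise asks for; `CutByCodePoint` and `cut` are hypothetical names, and GBK is assumed to be available in the JVM.

```java
import java.nio.charset.Charset;

final class CutByCodePoint {
    // Longest prefix of str whose encoding in cs needs at most maxBytes bytes.
    static String cut(String str, int maxBytes, Charset cs) {
        int used = 0; // bytes consumed by the prefix so far
        int i = 0;    // index (in chars) of the next code point
        while (i < str.length()) {
            int cp = str.codePointAt(i);
            // Width of this single code point in the target charset.
            int width = new String(Character.toChars(cp)).getBytes(cs).length;
            if (used + width > maxBytes) {
                break; // this character would be split, so drop it entirely
            }
            used += width;
            i += Character.charCount(cp);
        }
        return str.substring(0, i);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String s = "ab\u4f60\u597d"; // "ab" plus U+4F60 and U+597D
        for (int n = 3; n <= 5; n++) {
            System.out.println(n + ": " + cut(s, n, gbk));
        }
    }
}
```

Encoding every code point separately is slower than inspecting the byte array, but it makes no assumption about how the charset lays out its bytes, which makes it handy as a cross-check.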

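A reminder: GBK and UTF-8 encode Chinese characters differently. In GBK a Chinese character takes two bytes; in UTF-8 most Chinese characters fall in the three-byte range, so the majority take three bytes.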
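The "most, not all" qualifier matters, because UTF-8 is a variable-width encoding with one to four bytes per character. A small sketch that prints the width of a few sample code points (the class name `Utf8Widths` and the helper `utf8Length` are illustrative only):

```java
import java.nio.charset.StandardCharsets;

final class Utf8Widths {
    // Number of bytes UTF-8 uses for a single code point.
    static int utf8Length(int codePoint) {
        return new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length(0x0061));  // 1: ASCII letter 'a'
        System.out.println(utf8Length(0x00E9));  // 2: Latin small e with acute
        System.out.println(utf8Length(0x4F60));  // 3: a common Chinese character
        System.out.println(utf8Length(0x1F600)); // 4: an emoji outside the BMP
    }
}
```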

Implementation:

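package cn.hncu.io.ex;

import java.io.UnsupportedEncodingException;

public class StringCutUtil {

    private StringCutUtil() {
    }

    // Cuts str to at most len bytes, dispatching on the JVM default encoding.
    public static String cutStringByByte(String str, int len) {
        String encoding = System.getProperty("file.encoding");
        if ("gbk".equalsIgnoreCase(encoding)) {
            return cutStringByByteGBK(str, len);
        }
        if ("utf-8".equalsIgnoreCase(encoding)) {
            return cutStringByByteUTF8(str, len);
        }
        return "The current system does not support Chinese!";
    }

    private static String cutStringByByteGBK(String str, int len) {
        try {
            byte buf[] = str.getBytes("gbk");
            if (len <= 0) {
                return "";
            }
            len = Math.min(len, buf.length); // never read past the end of the array
            // Count the run of negative bytes that ends at the cut position.
            int count = 0;
            for (int i = len - 1; i >= 0; i--) {
                if (buf[i] < 0) {
                    count++;
                } else {
                    break;
                }
            }
            // An even run ends on a character boundary; an odd run ends on a
            // dangling first byte, which is dropped.
            if (count % 2 == 0) {
                return new String(buf, 0, len, "gbk");
            } else {
                return new String(buf, 0, len - 1, "gbk");
            }
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException("Encoding error: this JVM does not support GBK", e);
        }
    }

    private static String cutStringByByteUTF8(String str, int len) {
        try {
            byte buf[] = str.getBytes("utf-8");
            if (len <= 0) {
                return "";
            }
            len = Math.min(len, buf.length); // never read past the end of the array
            // Count the run of negative bytes that ends at the cut position.
            int count = 0;
            for (int i = len - 1; i >= 0; i--) {
                if (buf[i] < 0) {
                    count++;
                } else {
                    break;
                }
            }
            // Note: the modulo-3 rule assumes every non-ASCII character takes
            // three bytes in UTF-8. That holds for most Chinese characters,
            // but not for 2-byte or 4-byte characters.
            if (count % 3 == 0) {
                return new String(buf, 0, len, "utf-8");
            } else {
                return new String(buf, 0, len - count % 3, "utf-8");
            }
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException("Encoding error: this JVM does not support UTF-8", e);
        }
    }

    public static void main(String[] args) {
        // Inspect the raw bytes.
        // In GBK, "琲" is encoded as one negative byte followed by one positive
        // byte, unlike the usual pair of negative bytes for a Chinese character.
        String str = "ab你好h3h城市d琲琲";
        byte bs[] = null;
        // bs = str.getBytes();
        try {
            bs = str.getBytes("gbk");
            // bs = str.getBytes("utf-8");
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        for (byte b : bs) {
            System.out.print(b + " ");
        }

        // Unit test: cut at every byte length and print the result.
        System.out.println();
        for (int i = 1; i <= bs.length; i++) {
            // String ss = cutStringByByteGBK(str, i);
            // String ss = cutStringByByteUTF8(str, i);
            String ss = cutStringByByte(str, i);
            System.out.println(i + ": " + ss);
        }
    }
}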

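To make this more practical, so that callers do not have to think about encoding when cutting a string, I wrapped both versions behind a single method that cuts according to the system default encoding.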
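One caveat on the UTF-8 branch: the `count % 3` rule relies on every non-ASCII character being three bytes long, so a string that mixes in 2-byte or 4-byte characters can be cut in the wrong place. A possible alternative, sketched below and not part of the original code, uses the fact that UTF-8 continuation bytes always have the bit pattern `10xxxxxx`: if the first excluded byte is a continuation byte, the cut lies inside a character and is moved back to that character's first byte. The names `Utf8Cut` and `cutUtf8` are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Cut {
    // Cuts str to at most len UTF-8 bytes without splitting a character.
    static String cutUtf8(String str, int len) {
        byte[] buf = str.getBytes(StandardCharsets.UTF_8);
        if (len <= 0) {
            return "";
        }
        if (len >= buf.length) {
            return str; // the whole string fits
        }
        int end = len;
        // buf[end] is the first byte left out. While it is a continuation
        // byte (10xxxxxx), the cut is inside a character: step back.
        while (end > 0 && (buf[end] & 0xC0) == 0x80) {
            end--;
        }
        return new String(buf, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 'a' (1 byte), U+00E9 (2 bytes), U+4F60 (3 bytes): 6 bytes in total.
        String s = "a\u00e9\u4f60";
        for (int n = 1; n <= 6; n++) {
            System.out.println(n + ": " + cutUtf8(s, n));
        }
    }
}
```

With this sample string, a cut at three bytes keeps both the ASCII letter and the 2-byte character, a case where counting negative bytes modulo three would drop the 2-byte character.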

0 0