Culture-sensitive string comparison with StringComparison

Source: Internet. Editor: 程序博客网. Time: 2024/04/30 09:54

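The world has many countries, and those countries have different languages. Several languages share the Latin alphabet with English, yet they order some of the same characters differently. So when we compare strings, we run into culture-specific issues.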

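To check whether two strings are equal, or which of them sorts first, we usually call String.Compare(str1, str2). A return value of 0 means the two strings are equal. Most of the time this works without problems, because the program is only used by people in one region rather than in every corner of the world.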

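In some special cases, however, this way of comparing causes problems. Take the following example. First, set CurrentCulture to Danish (Denmark) and compare the strings "Apple" and "Æble". Danish treats the character Æ as an individual letter and sorts it after Z in the alphabet. Therefore, under the Danish culture, "Æble" is greater than "Apple". Next, set CurrentCulture to English (United States) and compare "Apple" and "Æble" again. This time "Æble" is determined to be less than "Apple". English treats the character Æ as a special symbol and sorts it before the letter A in the alphabet.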

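String.Compare(str1, str2) compares according to the current culture. By default this is the culture configured in the operating system (the Control Panel on Windows). If the code sets a culture explicitly, the most recently set value of Thread.CurrentThread.CurrentCulture is used.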

using System;
using System.Globalization;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        string str1 = "Apple";
        string str2 = "Æble";

        // Sets the CurrentCulture to Danish in Denmark.
        Thread.CurrentThread.CurrentCulture = new CultureInfo("da-DK");
        // Culture-sensitive comparison: uses the current culture's sort rules.
        int result1 = String.Compare(str1, str2);
        Console.WriteLine("\nWhen the CurrentCulture is \"da-DK\",\nthe " +
                          "result of comparing {0} with {1} is: {2}", str1, str2,
                          result1);

        // Sets the CurrentCulture to English in the U.S.
        Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
        // Same call, now evaluated under the en-US sort rules.
        int result2 = String.Compare(str1, str2);
        Console.WriteLine("\nWhen the CurrentCulture is \"en-US\",\nthe " +
                          "result of comparing {0} with {1} is: {2}", str1, str2,
                          result2);
    }
}

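Here the results differ because the culture differs. The default two-argument overload, String.Compare(str1, str2), compares each character according to the sort (collation) rules of the current culture. The character itself is identical in both runs; what changes is the position each language assigns to it in its alphabet. Danish treats Æ as a separate letter that comes after Z, and the en-US rules do not, so the same pair of strings compares differently.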
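The culture can also be passed to the comparison directly instead of changing the thread's CurrentCulture. The following is a minimal sketch of that idea; the helper name CompareIn is hypothetical, and the printed values depend on the globalization data available on the machine, so no expected output is shown.

```csharp
using System;
using System.Globalization;

static class CultureCompareSketch
{
    // Hypothetical helper: compares a and b under the named culture's
    // sort rules and reduces the result to -1, 0 or 1.
    public static int CompareIn(string cultureName, string a, string b)
    {
        CultureInfo culture = CultureInfo.GetCultureInfo(cultureName);
        return Math.Sign(String.Compare(a, b, culture, CompareOptions.None));
    }

    public static void Main()
    {
        string str1 = "Apple";
        string str2 = "\u00C6ble"; // "Æble", written with an escape to avoid file-encoding issues

        // Same strings, two cultures, no change to Thread.CurrentThread.CurrentCulture.
        Console.WriteLine("da-DK: " + CompareIn("da-DK", str1, str2));
        Console.WriteLine("en-US: " + CompareIn("en-US", str1, str2));
    }
}
```

Passing the culture explicitly keeps the comparison independent of whatever culture the surrounding code happens to have set.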

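There is another way to compare: by the raw numeric value of each character's encoding. .NET stores strings as UTF-16, and an ordinal comparison compares the UTF-16 code units one by one. Although Æ sorts differently in Danish and in English, the computer stores it as the same code in both cases, U+00C6. An ordinal comparison does not care how a particular language treats the character: a character that looks like this is always the same character with the same value. So if we change the code above as follows, the result is the same under both cultures. In this run it is -133, which is the difference between the first pair of characters that differ ('A' is 65 and 'Æ' is 198), not a sum over the whole string. Only the sign of the result is documented, so code should not depend on the exact number.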

using System;
using System.Globalization;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        string str1 = "Apple";
        string str2 = "Æble";

        // Sets the CurrentCulture to Danish in Denmark.
        Thread.CurrentThread.CurrentCulture = new CultureInfo("da-DK");
        // Ordinal comparison: ignores the current culture.
        int result1 = String.Compare(str1, str2, StringComparison.Ordinal);
        Console.WriteLine("\nWhen the CurrentCulture is \"da-DK\",\nthe " +
                          "result of comparing {0} with {1} is: {2}", str1, str2,
                          result1);

        // Sets the CurrentCulture to English in the U.S.
        Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
        // Ordinal comparison again: the culture change has no effect.
        int result2 = String.Compare(str1, str2, StringComparison.Ordinal);
        Console.WriteLine("\nWhen the CurrentCulture is \"en-US\",\nthe " +
                          "result of comparing {0} with {1} is: {2}", str1, str2,
                          result2);
    }
}
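The value -133 can be checked by hand from the code units of the first characters. The sketch below does that arithmetic; the class and method names are made up for illustration, and the last line tests only the sign of String.CompareOrdinal because the sign is the only part of the result that is documented.

```csharp
using System;

static class OrdinalArithmeticSketch
{
    // Difference between the UTF-16 code units of the first characters.
    public static int FirstCharDifference(string a, string b)
    {
        return a[0] - b[0];
    }

    public static void Main()
    {
        string str1 = "Apple";
        string str2 = "\u00C6ble"; // "Æble"

        Console.WriteLine((int)str1[0]);                          // 65  ('A', U+0041)
        Console.WriteLine((int)str2[0]);                          // 198 ('Æ', U+00C6)
        Console.WriteLine(FirstCharDifference(str1, str2));       // -133
        Console.WriteLine(String.CompareOrdinal(str1, str2) < 0); // True
    }
}
```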

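If someone asks which comparison to use, I think ordinal comparison is the more common choice. Its result is unambiguous: it is the same under every culture setting, which makes it easier to control when writing code. Culture-sensitive comparison is still the better fit when sorting text that is shown to users in their own language.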
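As a rough illustration of the ordinal approach in everyday code, the sketch below sorts a small array and does a case-insensitive equality check with the ordinal comparers. The class and method names are hypothetical; StringComparer.Ordinal and StringComparison.OrdinalIgnoreCase are the standard .NET members.

```csharp
using System;

static class OrdinalUsageSketch
{
    // Returns a copy of words sorted by UTF-16 code unit values.
    public static string[] SortOrdinal(string[] words)
    {
        string[] copy = (string[])words.Clone();
        Array.Sort(copy, StringComparer.Ordinal);
        return copy;
    }

    public static void Main()
    {
        string[] sorted = SortOrdinal(new[] { "Zebra", "\u00C6ble", "Apple" });
        // Order is Apple, Zebra, Æble because 'A' (65) < 'Z' (90) < 'Æ' (198).
        Console.WriteLine(string.Join(", ", sorted));

        // Equality check that does not depend on the current culture.
        Console.WriteLine(string.Equals("apple", "APPLE", StringComparison.OrdinalIgnoreCase)); // True
    }
}
```

The order is the same on every machine and under every culture setting, which is the property the paragraph above relies on.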

0 0