Why does Char.IsDigit return true for chars which can't be parsed to int?
Problem description
I often use Char.IsDigit to check whether a char is a digit, which is especially handy in LINQ queries as a pre-check for int.Parse, as here: "123".All(Char.IsDigit).
But there are chars which are digits yet can't be parsed to int, such as ۵.
// true
bool isDigit = Char.IsDigit('۵');
var cultures = CultureInfo.GetCultures(CultureTypes.SpecificCultures);
int num;
// false
bool isIntForAnyCulture = cultures
    .Any(c => int.TryParse('۵'.ToString(), NumberStyles.Any, c, out num));
Why is that? Is my int.Parse pre-check via Char.IsDigit thus incorrect?
There are 310 chars which are digits:
List<char> digitList = Enumerable.Range(0, UInt16.MaxValue)
    .Select(i => Convert.ToChar(i))
    .Where(c => Char.IsDigit(c))
    .ToList();
Here's the implementation of Char.IsDigit in .NET 4 (via ILSpy):
public static bool IsDigit(char c)
{
    if (char.IsLatin1(c))
    {
        return c >= '0' && c <= '9';
    }
    return CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.DecimalDigitNumber;
}
So why are there chars that belong to the DecimalDigitNumber category ("Decimal digit character, that is, a character in the range 0 through 9...") which can't be parsed to an int in any culture?
It's because Char.IsDigit checks for all digits in the Unicode "Number, Decimal Digit" (Nd) category, as listed here:
http://www.fileformat.info/info/unicode/category/Nd/list.htm
That doesn't mean the character is a valid numeric character in the current locale. In fact, int.Parse() can ONLY parse the ASCII digits 0 through 9, regardless of the locale setting.
For example, this doesn't work:
int test = int.Parse("٣", CultureInfo.GetCultureInfo("ar"));
This fails even though ٣ is a valid Arabic digit character and "ar" is the Arabic locale identifier.
The Microsoft article "How to: Parse Unicode Digits" states:
The only Unicode digits that the .NET Framework parses as decimals are the ASCII digits 0 through 9, specified by the code values U+0030 through U+0039. The .NET Framework parses all other Unicode digits as characters.
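Given that behavior, a stricter pre-check than Char.IsDigit is to test for the ASCII range directly, or to drop the pre-check and let int.TryParse decide. The following is only a minimal sketch: the class and method names (DigitChecks, IsAsciiDigits, TryParseDigits) are made up for illustration, and it assumes the input is a non-negative integer with no sign, whitespace, or separators.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper class, not part of the original question or answer.
public static class DigitChecks
{
    // True only for a non-empty string made up entirely of
    // ASCII digits U+0030 through U+0039.
    public static bool IsAsciiDigits(string s)
    {
        return !string.IsNullOrEmpty(s) && s.All(c => c >= '0' && c <= '9');
    }

    // Lets int.TryParse decide. NumberStyles.None allows digits only
    // (no sign, whitespace, or thousands separators).
    public static bool TryParseDigits(string s, out int value)
    {
        return int.TryParse(s, NumberStyles.None, CultureInfo.InvariantCulture, out value);
    }
}
```

Note that IsAsciiDigits("99999999999") is true although the value does not fit in an int, so a character-level pre-check alone does not guarantee that int.Parse succeeds; TryParseDigits also covers the overflow case.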
However, note that you can use char.GetNumericValue() to convert a Unicode numeric character to its numeric equivalent as a double.
The return value is a double rather than an int because of cases like this:
Console.WriteLine(char.GetNumericValue('¼')); // Prints 0.25
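To make the double return value easier to work with, one option is a small wrapper that accepts only whole values from 0 to 9. This is a sketch under assumptions: NumericValueDemo and ToSingleDigit are hypothetical names, and the wrapper relies on char.GetNumericValue returning -1 for characters that have no numeric value.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class NumericValueDemo
{
    // Returns the value as an int when the char's numeric value is a
    // whole number from 0 to 9; returns null for fractions such as '¼'
    // and for chars with no numeric value (GetNumericValue gives -1).
    public static int? ToSingleDigit(char ch)
    {
        double value = char.GetNumericValue(ch);
        if (value >= 0 && value <= 9 && value == Math.Floor(value))
        {
            return (int)value;
        }
        return null;
    }
}
```

With this sketch, ToSingleDigit('۵') yields 5, while ToSingleDigit('¼') and ToSingleDigit('x') both yield null.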
You could use something like this to convert all digit characters in a string into their ASCII equivalents:
// Requires: using System.Text;
public string ConvertNumericChars(string input)
{
    StringBuilder output = new StringBuilder();
    foreach (char ch in input)
    {
        if (char.IsDigit(ch))
        {
            double value = char.GetNumericValue(ch);
            // Only map whole values 0..9 onto the ASCII digits '0'..'9'.
            if ((value >= 0) && (value <= 9) && (value == (int)value))
            {
                output.Append((char)('0' + (int)value));
                continue;
            }
        }
        // Anything else is copied through unchanged.
        output.Append(ch);
    }
    return output.ToString();
}
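The same idea can be combined with the parse step. The sketch below is an alternative, not the answer's method: it uses CharUnicodeInfo.GetDecimalDigitValue (which returns -1 for non-digits) in place of char.GetNumericValue, and the names UnicodeDigitParser and TryParseUnicodeInt are hypothetical. It assumes the input consists of decimal digit characters only, so signs, whitespace, and separators are rejected.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper, for illustration only.
public static class UnicodeDigitParser
{
    // Maps every Unicode decimal digit to its ASCII equivalent and then
    // parses the result. Returns false if any char is not a decimal
    // digit, if the input is empty, or if the value overflows int.
    public static bool TryParseUnicodeInt(string input, out int result)
    {
        result = 0;
        if (string.IsNullOrEmpty(input))
        {
            return false;
        }

        var ascii = new StringBuilder(input.Length);
        foreach (char ch in input)
        {
            int digit = CharUnicodeInfo.GetDecimalDigitValue(ch); // -1 if not a decimal digit
            if (digit < 0)
            {
                return false;
            }
            ascii.Append((char)('0' + digit));
        }

        return int.TryParse(ascii.ToString(), NumberStyles.None,
                            CultureInfo.InvariantCulture, out result);
    }
}
```

For example, TryParseUnicodeInt("۵۵", out n) succeeds with n equal to 55, whereas TryParseUnicodeInt("1a", out n) returns false.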