Why does wide file-stream in C++ narrow written data by default?


Question

Honestly, I just don't get the following design decision in the C++ Standard Library. When writing wide characters to a file, wofstream converts wchar_t into char characters:

#include <fstream>
#include <string>

int main()
{
    using namespace std;

    wstring someString = L"Hello StackOverflow!";
    wofstream file("Test.txt"); // narrow file name: a wchar_t file name is not portable

    file << someString; // the output file will consist of ASCII characters!
}

I am aware that this has to do with the standard codecvt. There is a codecvt for UTF-8 in Boost, and there is a codecvt for UTF-16 by Martin York here on SO. The question is: why does the standard codecvt convert wide characters? Why not write the characters as they are?

Also, are we going to get real Unicode streams with C++0x, or am I missing something here?

Recommended answer

The model used by C++ for charsets is inherited from C, and so dates back to at least 1989.

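The main points:

IO is done in terms of char.
It is the job of the locale to determine how wide chars are serialized.
The default locale (named "C") is very minimal (I don't remember the constraints from the standard; here it is able to handle only 7-bit ASCII as narrow and wide character set).
There is an environment-determined locale named "".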

So to get anything, you have to set the locale.

If I use the simple program

#include <locale>
#include <fstream>
#include <ostream>
#include <iostream>

int main()
{
    wchar_t c = 0x00FF;
    std::locale::global(std::locale(""));
    std::wofstream os("test.dat");
    os << c << std::endl;
    if (!os) {
        std::cout << "Output failed\n";
    }
}

which uses the environment locale and outputs the wide character with code 0x00FF to a file. If I ask to use the "C" locale, I get

$ env LC_ALL=C ./a.out
Output failed

The locale has been unable to handle the wide character, and we get notified of the problem as the IO failed. If I run it asking for a UTF-8 locale, I get

$ env LC_ALL=en_US.utf8 ./a.out
$ od -t x1 test.dat
0000000 c3 bf 0a
0000003

(od -t x1 just dumps the file in hex), exactly what I expect for a UTF-8 encoded file.
