What is the idiomatic C++17 standard approach to reading binary files?

Question

Normally I would just use C-style file I/O, but I'm trying a modern C++ approach, including the C++17-specific features std::byte and std::filesystem.

Reading an entire file into memory, traditional method:

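#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>   /* struct stat, stat() */

char *readFileData(const char *path)
{
    FILE *f;
    struct stat fs;
    char *buf;

    /* Query the file size; give up if the file cannot be examined. */
    if (stat(path, &fs) != 0)
        return NULL;

    buf = (char *)malloc(fs.st_size);
    if (buf == NULL)
        return NULL;

    f = fopen(path, "rb");
    if (f == NULL) {
        free(buf);
        return NULL;
    }
    fread(buf, fs.st_size, 1, f);
    fclose(f);

    return buf;
}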

Reading an entire file into memory, modern approach:

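#include <cstddef>      // std::byte
#include <filesystem>
#include <fstream>
#include <memory>       // std::make_unique
#include <string>
using namespace std;
using namespace std::filesystem;

auto readFileData(string path)
{
    auto fileSize = file_size(path);
    auto buf = make_unique<byte[]>(fileSize);
    // Note: the standard only provides char_traits for the character types,
    // so basic_ifstream<byte> is not portable.
    basic_ifstream<byte> ifs(path, ios::binary);
    ifs.read(buf.get(), fileSize);
    return buf;
}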

Does this look about right? Can it be improved?

Recommended answer

Personally, I prefer std::vector<std::byte> to std::string unless you are reading an actual text document. The problem with make_unique<byte[]>(fileSize) is that you immediately lose the size of the data and have to carry it around in a separate variable. Nor is it likely to be faster: make_unique<byte[]>(fileSize) value-initializes (zeroes) its elements just as std::vector<std::byte> does, and any difference would be overshadowed by the time taken reading from disk anyway.
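To make the size-tracking point concrete, here is a minimal sketch. The names RawBuffer, make_raw_buffer and make_vector_buffer are hypothetical and exist only for illustration; they are not part of the original post.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical wrapper: a unique_ptr<std::byte[]> does not know its own
// length, so the size has to travel alongside it by hand.
struct RawBuffer {
    std::unique_ptr<std::byte[]> data;
    std::size_t size;
};

// Allocates n zeroed bytes (make_unique value-initializes array elements).
RawBuffer make_raw_buffer(std::size_t n)
{
    return {std::make_unique<std::byte[]>(n), n};
}

// Allocates n zeroed bytes; the vector carries its size with it.
std::vector<std::byte> make_vector_buffer(std::size_t n)
{
    return std::vector<std::byte>(n);
}
```

With the vector, buffer.size() is always available; with the raw array, every function that receives the pointer also needs the separately stored size.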

So for a binary file I use something like this:

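#include <cerrno>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

std::vector<std::byte> load_file(std::string const& filepath)
{
    // Open in binary mode, positioned at the end so tellg() reports the end offset.
    std::ifstream ifs(filepath, std::ios::binary|std::ios::ate);

    if(!ifs)
        throw std::runtime_error(filepath + ": " + std::strerror(errno));

    auto end = ifs.tellg();
    ifs.seekg(0, std::ios::beg);

    // The size is the distance from the real start of the stream to its end.
    auto size = std::size_t(end - ifs.tellg());

    if(size == 0) // avoid undefined behavior
        return {};

    std::vector<std::byte> buffer(size);

    if(!ifs.read(reinterpret_cast<char*>(buffer.data()), static_cast<std::streamsize>(buffer.size())))
        throw std::runtime_error(filepath + ": " + std::strerror(errno));

    return buffer;
}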

This is the fastest method I know of. It also avoids a common mistake in determining the size of the data in the file: the value of ifs.tellg() after opening the file at the end is not necessarily the same as the file size, and ifs.seekg(0) is not, in theory, the correct way to locate the start of the file (even though it works in practice in most places). Measuring the distance between the two stream positions sidesteps both issues.
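The size calculation can be isolated into a small helper. This is only a sketch under the assumption that the stream was opened in binary mode; stream_size is a hypothetical name, not something from the answer or the standard library.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>

// Hypothetical helper: number of bytes between the start and the end of an
// open binary stream. Returns 0 if either position cannot be determined.
// On success the stream is left positioned at the start.
std::size_t stream_size(std::ifstream& ifs)
{
    ifs.seekg(0, std::ios::end);   // move to the end of the stream
    const auto end = ifs.tellg();
    ifs.seekg(0, std::ios::beg);   // move to the start, relative to beg
    const auto begin = ifs.tellg();

    const std::ifstream::pos_type invalid(-1);
    if (!ifs || end == invalid || begin == invalid)
        return 0;

    // Subtract the two positions rather than trusting either one alone.
    return static_cast<std::size_t>(end - begin);
}
```

Because the helper seeks back with std::ios::beg and subtracts the resulting position, it does not rely on the start of the stream being offset 0.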

The error message from std::strerror(errno) is reliable on POSIX systems, where the underlying open and read calls set errno. It should also work on Microsoft platforms, but I'm not sure.
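If you would rather keep the error code itself and not just the message text, one option is to wrap errno in a std::system_error. This is a sketch of an alternative, not what the answer's code does; make_io_error is a hypothetical helper name, and whether the stream implementation sets errno at all remains platform-dependent.

```cpp
#include <cerrno>
#include <string>
#include <system_error>

// Hypothetical helper: capture the current errno in a std::system_error.
// Call it immediately after the failing operation, before any other call
// has a chance to overwrite errno.
std::system_error make_io_error(std::string const& filepath)
{
    const int saved = errno; // copy first; later calls may change errno
    return std::system_error(saved, std::generic_category(), filepath);
}
```

A caller can then throw the returned exception and, when catching it, compare code() against std::errc values such as std::errc::no_such_file_or_directory.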

Obviously you can use std::filesystem::path const& filepath in place of std::string if you want.

Also, especially before C++17, you can use std::vector<unsigned char> or std::vector<char> if you don't have, or don't want to use, std::byte.
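Both variations can be sketched together as a template. The name load_file_as and the fixed error strings are assumptions made for this example. It takes a std::filesystem::path, so it needs C++17; before C++17 you would keep the std::string parameter.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical generalisation of load_file: Byte may be std::byte,
// unsigned char or char, and the path is a std::filesystem::path.
template <typename Byte = std::byte>
std::vector<Byte> load_file_as(std::filesystem::path const& filepath)
{
    static_assert(sizeof(Byte) == 1, "Byte must be a one-byte type");

    std::ifstream ifs(filepath, std::ios::binary | std::ios::ate);
    if (!ifs)
        throw std::runtime_error(filepath.string() + ": cannot open file");

    // Same size calculation as before: distance from start to end.
    const auto end = ifs.tellg();
    ifs.seekg(0, std::ios::beg);
    const auto size = static_cast<std::size_t>(end - ifs.tellg());

    if (size == 0)
        return {};

    std::vector<Byte> buffer(size);
    if (!ifs.read(reinterpret_cast<char*>(buffer.data()),
                  static_cast<std::streamsize>(buffer.size())))
        throw std::runtime_error(filepath.string() + ": read failed");

    return buffer;
}
```

Calling load_file_as<>(p) yields a std::vector<std::byte>, while load_file_as<unsigned char>(p) or load_file_as<char>(p) yields the other element types. Note that a path cannot be concatenated with a string literal directly, hence the filepath.string() calls in the error messages.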
