Get a file SHA256 hash code and checksum


Problem description

Previously I asked a question about combining SHA1+MD5, but after that I understood that calculating SHA1 and then MD5 of a large file is not much faster than SHA256. In my case a 4.6 GB file takes about 10 minutes with the default SHA256 implementation (C# on Mono) on a Linux system.

using System;
using System.IO;
using System.Security.Cryptography;

public static string GetChecksum(string file)
{
    using (FileStream stream = File.OpenRead(file))
    {
        // Hash the whole file in one call and return the digest as a hex string.
        var sha = new SHA256Managed();
        byte[] checksum = sha.ComputeHash(stream);
        return BitConverter.ToString(checksum).Replace("-", String.Empty);
    }
}

Then I read this topic and changed my code somewhat, according to what they said, to:

public static string GetChecksumBuffered(Stream stream)
{
    // Wrap the input in a 32 KB BufferedStream before handing it to the hash.
    using (var bufferedStream = new BufferedStream(stream, 1024 * 32))
    {
        var sha = new SHA256Managed();
        byte[] checksum = sha.ComputeHash(bufferedStream);
        return BitConverter.ToString(checksum).Replace("-", String.Empty);
    }
}

But it doesn't have much of an effect and still takes about 9 minutes.

Then I tested the same file with the sha256sum command in Linux, and it takes about 28 seconds; both the code above and the Linux command give the same result!

Someone advised me to read about the differences between a hash code and a checksum, and I reached this topic that explains the differences.

My questions are:

  1. What causes such a difference in time between the code above and the Linux sha256sum?

  2. What does the above code do? (I mean, is it the hash code calculation or the checksum calculation? Because if you search for how to get the hash code of a file and the checksum of a file in C#, both searches lead to the code above.)

  3. Is there any motivated attack against sha256sum, even though SHA256 is collision resistant?

  4. How can I make my implementation as fast as sha256sum in C#?

Answer

  1. My best guess is that there's some additional buffering in the Mono implementation of the File.Read operation. Having recently looked into checksums on a large file, on a decent-spec Windows machine you should expect roughly 6 seconds per GB if all is running smoothly.
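As a quick experiment (not part of the original answer), you could try opening the file with an explicit, larger buffer and a sequential-access hint to sidestep some of the default buffering; the 1 MB buffer size here is an assumption to tune, not a measured recommendation:

using System;
using System.IO;
using System.Security.Cryptography;

public static string GetChecksumLargeBuffer(string file)
{
    // Open the file read-only with a 1 MB buffer and hint that we will read it sequentially.
    using (var stream = new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.Read,
                                       1024 * 1024, FileOptions.SequentialScan))
    using (var sha = SHA256.Create())
    {
        byte[] checksum = sha.ComputeHash(stream);
        return BitConverter.ToString(checksum).Replace("-", String.Empty);
    }
}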

Oddly, it has been reported in more than one benchmark that SHA-512 is noticeably quicker than SHA-256 (see 3 below). One other possibility is that the problem is not in allocating the data, but in disposing of the bytes once read. You may be able to use TransformBlock (and TransformFinalBlock) on a single array rather than reading the stream in one big gulp; I have no idea if this will work, but it bears investigating.
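A minimal sketch of that idea, reading the file in chunks into one reusable buffer and feeding each chunk to the hash with TransformBlock; the 1 MB chunk size is an arbitrary assumption:

using System;
using System.IO;
using System.Security.Cryptography;

public static string GetChecksumBlocks(string file)
{
    using (var stream = File.OpenRead(file))
    using (var sha = SHA256.Create())
    {
        byte[] buffer = new byte[1024 * 1024];   // one reusable 1 MB read buffer
        int bytesRead;
        while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Feed each chunk into the hash; we don't need the output copy, so pass null.
            sha.TransformBlock(buffer, 0, bytesRead, null, 0);
        }
        // Finish the hash with an empty final block, then read the digest.
        sha.TransformFinalBlock(buffer, 0, 0);
        return BitConverter.ToString(sha.Hash).Replace("-", String.Empty);
    }
}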

  2. The difference between a hash code and a checksum is (nearly) semantics. They both calculate a shorter 'magic' number that is fairly unique to the data in the input, though if you have 4.6 GB of input and 64 B of output, 'fairly' is somewhat limited.

  • A checksum is not secure; with some work you can figure out the input from enough outputs, work backwards from output to input, and do all sorts of insecure things.
  • A cryptographic hash takes longer to calculate, but changing even a single bit of the input radically changes the output, and for a good hash (e.g. SHA-512) there is no known way of getting from output back to input (a tiny demonstration of this follows after this list).
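To illustrate the "one changed bit changes everything" point, here is a small sketch (not from the original answer) that hashes a buffer, flips a single bit, and hashes it again; the sample text is arbitrary:

using System;
using System.Security.Cryptography;
using System.Text;

public static void AvalancheDemo()
{
    byte[] data = Encoding.UTF8.GetBytes("The quick brown fox jumps over the lazy dog");
    using (var sha = SHA256.Create())
    {
        string before = BitConverter.ToString(sha.ComputeHash(data)).Replace("-", "");
        data[0] ^= 0x01;                       // flip one bit of the input
        string after = BitConverter.ToString(sha.ComputeHash(data)).Replace("-", "");
        Console.WriteLine(before);             // the two digests share no obvious structure
        Console.WriteLine(after);
    }
}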

  3. MD5 is broken: if you need to, you can fabricate colliding inputs on an ordinary PC. SHA-256 is (probably) still secure, but won't be in a few years' time; if your project has a lifespan measured in decades, assume you'll need to change it. SHA-512 has no known attacks and probably won't for quite a while, and since it can be quicker than SHA-256 I'd recommend it anyway. Benchmarks show it takes about 3 times longer to calculate SHA-512 than MD5, so if your speed issue can be dealt with, it's the way to go.
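If you do want to try SHA-512, only the algorithm choice changes; a minimal sketch, keeping the rest of the original method as-is (note the digest grows to 64 bytes):

using System;
using System.IO;
using System.Security.Cryptography;

public static string GetChecksumSha512(string file)
{
    using (var stream = File.OpenRead(file))
    using (var sha = SHA512.Create())   // swap SHA256 for SHA512
    {
        byte[] checksum = sha.ComputeHash(stream);
        return BitConverter.ToString(checksum).Replace("-", String.Empty);
    }
}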

  4. No idea, beyond those mentioned above. You're doing it right.

For a bit of light reading, see Crypto.SE: SHA512 is faster than SHA256?

Edit in response to the question in the comments

The purpose of a checksum is to allow you to check whether a file has changed between the time you originally wrote it and the time you come to use it. It does this by producing a small value (512 bits in the case of SHA-512) where every bit of the original file contributes at least something to the output value. The purpose of a hash code is the same, with the addition that it is really, really difficult for anyone else to get the same output value by making carefully managed changes to the file.

The premise is that if the checksums are the same at the start and when you check it, then the files are the same, and if they're different the file has certainly changed. What you are doing above is feeding the file, in its entirety, through an algorithm that rolls, folds and spindles the bits it reads to produce the small value.

As an example: in the application I'm currently writing, I need to know whether parts of a file of any size have changed. I split the file into 16 KB blocks, take the SHA-512 hash of each block, and store it in a separate database on another drive. When I come to see whether the file has changed, I reproduce the hash for each block and compare it to the original. Since I'm using SHA-512, the chances of a changed block having the same hash are unimaginably small, so I can be confident of detecting changes in hundreds of GB of data whilst only storing a few MB of hashes in my database. I'm copying the file at the same time as taking the hash, and the process is entirely disk-bound; it takes about 5 minutes to transfer a file to a USB drive, of which 10 seconds is probably related to hashing.
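A minimal sketch of that per-block scheme, assuming a 16 KB block size and returning the block hashes in memory rather than writing them to a database (the storage side is not shown here):

using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public static List<string> GetBlockHashes(string file, int blockSize = 16 * 1024)
{
    var hashes = new List<string>();
    byte[] buffer = new byte[blockSize];
    using (var stream = File.OpenRead(file))
    using (var sha = SHA512.Create())
    {
        int bytesRead;
        while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Hash each block independently; the final block may be shorter than blockSize.
            byte[] hash = sha.ComputeHash(buffer, 0, bytesRead);
            hashes.Add(BitConverter.ToString(hash).Replace("-", ""));
        }
    }
    return hashes;
}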

Lack of disk space to store hashes is a problem I can't solve in a post; buy a USB stick?
