How does hashing have an O(1) search time?
Problem description
When we use a HashTable for storing data, it is said that searching takes O(1) time. I am confused, can anybody explain?
Well, it's a little bit of a lie -- it can take longer than that, but it usually doesn't.
Basically, a hash table is an array containing all of the keys to search on. The position of each key in the array is determined by the hash function, which can be any function that always maps the same input to the same output. We shall assume that the hash function is O(1).
So when we insert something into the hash table, we use the hash function (let's call it h) to find the location to put it, and put it there. Now we insert another thing, hashing and storing. And another. Each time we insert data, it takes O(1) time to insert it (since the hash function is O(1)).
Looking up data is the same. If we want to find a value, x, we only have to compute h(x), which tells us where x is located in the hash table. So we can look up any hash value in O(1) as well.
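To make the ideal case concrete, here is a minimal sketch (my own illustration, not code from the question; NaiveHashTable and its method names are made up) of a table where h(x) sends every key straight to its slot, so both insertion and lookup are a single hash computation plus a single array access. Collisions are deliberately ignored for now.

```java
// Sketch of the "ideal" case only: assumes h never maps two keys to the same slot.
class NaiveHashTable {
    private final String[] slots = new String[16];   // small fixed capacity, just for illustration

    // h(x): any function that always maps the same input to the same index.
    private int h(String key) {
        return (key.hashCode() & 0x7fffffff) % slots.length;   // O(1)
    }

    void put(String key) {
        slots[h(key)] = key;                 // O(1): compute the index, store the key
    }

    boolean contains(String key) {
        return key.equals(slots[h(key)]);    // O(1): compute the index, compare
    }
}
```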
Now to the lie: the above is a very nice theory with one problem: what if we insert data and there is already something in that position of the array? There is nothing which guarantees that the hash function won't produce the same output for two different inputs (unless you have a perfect hash function, but those are tricky to produce). Therefore, when we insert we need to adopt one of two strategies:
- Store multiple values at each spot in the array (say, each array slot has a linked list). Now when you do a lookup, it is still O(1) to arrive at the correct place in the array, but potentially a linear search down a (hopefully short) linked list. This is called "separate chaining" (see the chaining sketch after this list).
- If you find something is already there, hash again and find another location. Repeat until you find an empty spot, and put it there. The lookup procedure can follow the same rules to find the data. Now it's still O(1) to get to the first location, but there is potentially a (hopefully short) linear search that bounces around the table until you find the data you are after. This is called "open addressing" (the probing sketch below follows this idea).
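Here is a minimal sketch of the first strategy, separate chaining, under the same illustrative assumptions as above (String keys, made-up class name): reaching the right bucket is O(1), and the scan down the chain is the hopefully-short linear part.

```java
import java.util.LinkedList;

// Sketch of separate chaining: each array slot holds a (hopefully short) list of keys.
class ChainedHashTable {
    private final LinkedList<String>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int capacity) {
        buckets = (LinkedList<String>[]) new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    private int h(String key) {
        return (key.hashCode() & 0x7fffffff) % buckets.length;   // O(1)
    }

    void put(String key) {
        LinkedList<String> bucket = buckets[h(key)];   // O(1) to reach the bucket
        if (!bucket.contains(key)) {                   // linear scan of the chain
            bucket.add(key);
        }
    }

    boolean contains(String key) {
        return buckets[h(key)].contains(key);          // O(1) to the bucket + short linear scan
    }
}
```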
Basically, both approaches are still mostly O(1), but with a hopefully short linear search. We can assume for most purposes that it is O(1). If the hash table is getting too full, those linear searches can become longer and longer, and then it is time to "re-hash", which means to create a new hash table of a much bigger size and insert all the data back into it.
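And here is a sketch of the second strategy together with the re-hash step. As the "find another location" rule I use simple linear probing (step to the next slot) rather than hashing again, and the 0.75 load-factor threshold is just an illustrative choice (it happens to be java.util.HashMap's default); both are assumptions of this sketch, not something the answer prescribes.

```java
// Sketch of open addressing (linear probing) plus re-hashing when the table gets too full.
class ProbingHashTable {
    private String[] slots = new String[8];
    private int size = 0;

    private int h(String key) {
        return (key.hashCode() & 0x7fffffff) % slots.length;
    }

    void put(String key) {
        if (size + 1 > slots.length * 0.75) {   // table getting too full: time to re-hash
            rehash();
        }
        int i = h(key);
        // Probe forward until we find the key or an empty slot.
        while (slots[i] != null && !slots[i].equals(key)) {
            i = (i + 1) % slots.length;
        }
        if (slots[i] == null) {
            slots[i] = key;
            size++;
        }
    }

    boolean contains(String key) {
        int i = h(key);
        // Follow the same probe sequence as put; an empty slot means the key is absent.
        while (slots[i] != null) {
            if (slots[i].equals(key)) {
                return true;
            }
            i = (i + 1) % slots.length;
        }
        return false;
    }

    private void rehash() {
        String[] old = slots;
        slots = new String[old.length * 2];   // a much bigger table
        size = 0;
        for (String key : old) {
            if (key != null) {
                put(key);                      // positions change with the new capacity
            }
        }
    }
}
```

If many keys collide, the probe sequences grow, which is exactly why the load factor is kept low and the table is re-hashed before those linear searches stop being "hopefully short".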