Building a nested dictionary in Python while reading a file line by line
Problem description
This is how I normally build a nested dictionary:
dicty = dict()
tmp = dict()
tmp["a"] = 1
tmp["b"] = 2
dicty["A"] = tmp
dicty == {"A": {"a": 1, "b": 2}}
The problem starts when I try to do this on a big file, reading it line by line. Printing the contents of each line as a list gives:
['proA', 'macbook', '0.666667']
['proA', 'smart', '0.666667']
['proA', 'ssd', '0.666667']
['FrontPage', 'frontpage', '0.710145']
['FrontPage', 'troubleshooting', '0.971014']
I would like to end up with a nested dictionary (ignore decimals):
{'FrontPage': {'frontpage': '0.710145', 'troubleshooting': '0.971014'},
'proA': {'macbook': '0.666667', 'smart': '0.666667', 'ssd': '0.666667'}}
Because I am reading line by line, I have to check whether the first word is still the same as on the previous line (the lines are all grouped by first word) before I add the completed dict to the outer dict.
This is my implementation:
def doubleDict(filename):
    dicty = dict()
    with open(filename, "r") as f:
        row = 0
        tmp = dict()
        oldword = ""
        for line in f:
            values = line.rstrip().split(" ")
            print(values)
            if oldword == values[0]:
                tmp[values[1]] = values[2]
            else:
                if oldword is not "":
                    dicty[oldword] = tmp
                tmp.clear()
                oldword = values[0]
                tmp[values[1]] = values[2]
            row += 1
            if row % 25 == 0:
                print(dicty)
                break #print(row)
    return(dicty)
I would actually like to have this in pandas, but for now I would be happy if it worked as a dict. For some reason, after reading in just the first 5 lines, I end up with:
{'proA': {'frontpage': '0.710145', 'troubleshooting': '0.971014'}},
which is clearly incorrect. What is wrong?
Solution
Use a collections.defaultdict() object to auto-instantiate nested dictionaries:
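from collections import defaultdict

def doubleDict(filename):
    # Missing outer keys get a fresh empty dict automatically.
    dicty = defaultdict(dict)
    with open(filename, "r") as f:
        # Count lines from 1 so the debug check below does not fire on the first line.
        for i, line in enumerate(f, 1):
            outer, inner, value = line.split()
            dicty[outer][inner] = value
            if i % 25 == 0:
                # Debugging aid carried over from the question: stop after 25 lines.
                print(dicty)
                break
    return dicty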
I used enumerate() to generate the line count here; that is much simpler than keeping a separate counter going.
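To try the defaultdict approach without a file on disk, here is a minimal sketch that runs it over the five sample lines from the question. The function name build_nested, the choice to accept any iterable of lines, and the skipping of malformed lines are assumptions made for illustration; they are not part of the original answer.

```python
from collections import defaultdict


def build_nested(lines):
    """Group 'outer inner value' lines into {outer: {inner: value}}."""
    nested = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            # Assumption: blank or malformed lines are skipped silently.
            continue
        outer, inner, value = parts
        nested[outer][inner] = value
    # Return a plain dict so later lookups of missing keys raise KeyError.
    return dict(nested)


# The sample rows shown in the question, as whitespace-separated text.
sample = [
    "proA macbook 0.666667",
    "proA smart 0.666667",
    "proA ssd 0.666667",
    "FrontPage frontpage 0.710145",
    "FrontPage troubleshooting 0.971014",
]

result = build_nested(sample)
print(result)
```

Because an open file object is itself an iterable of lines, the same function can be called as build_nested(f) inside a with open(...) block.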
Even without a defaultdict, you can let the outer dictionary keep the reference to the nested dictionary and retrieve it again by using values[0]; there is no need to keep the tmp reference around:
>>> dicty = {}
>>> dicty['A'] = {}
>>> dicty['A']['a'] = 1
>>> dicty['A']['b'] = 2
>>> dicty
{'A': {'a': 1, 'b': 2}}
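The wrong output in the question comes from the same reference behaviour: assigning tmp into the outer dict stores a reference, not a copy, so a later tmp.clear() empties the dict that is already stored. The sketch below reproduces this on a tiny scale; the names same_object and fixed are illustrative only.

```python
# Storing tmp in the outer dict does not copy it.
dicty = {}
tmp = {"macbook": "0.666667"}
dicty["proA"] = tmp
same_object = dicty["proA"] is tmp   # both names refer to one dict

tmp.clear()                          # also empties dicty["proA"]
tmp["frontpage"] = "0.710145"        # and this lands under 'proA' too
print(dicty)                         # {'proA': {'frontpage': '0.710145'}}

# Rebinding tmp to a brand-new dict leaves the stored one untouched.
fixed = {}
tmp = {"macbook": "0.666667"}
fixed["proA"] = tmp
tmp = {}                             # new object, fixed["proA"] is unaffected
tmp["frontpage"] = "0.710145"
fixed["FrontPage"] = tmp
print(fixed)
```

Replacing tmp.clear() with tmp = {} would avoid the aliasing, although indexing through the outer dictionary directly removes the need for tmp altogether.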
All the defaultdict then does is keep us from having to test whether we have already created that nested dictionary. Instead of:
if outer not in dicty:
    dicty[outer] = {}
dicty[outer][inner] = value
we simply omit the if test, as defaultdict will create a new dictionary for us if the key was not yet present.
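As a small illustration of that behaviour, the sketch below compares a defaultdict with a regular dict that uses dict.setdefault() to get the same effect. The variable names auto and plain are made up for this example, and setdefault() is offered as an alternative that the original answer does not mention.

```python
from collections import defaultdict

# defaultdict calls dict() for any missing key, so no membership test is needed.
auto = defaultdict(dict)
auto["proA"]["ssd"] = "0.666667"

# A regular dict can do the same in one step with setdefault().
plain = {}
plain.setdefault("proA", {})["ssd"] = "0.666667"

print(auto == plain)            # True: equality compares contents only

# Caveat: merely reading a missing key on a defaultdict also creates it.
before = "FrontPage" in auto    # False
_ = auto["FrontPage"]           # inserts an empty dict under 'FrontPage'
after = "FrontPage" in auto     # True
print(before, after)
```

The caveat at the end is one reason to convert the result to a plain dict once the file has been read, if stray empty entries would be a problem.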