Getting duplicate keys in YAML using Python
Question
We need to parse YAML files that contain duplicate keys, and every one of the duplicates needs to be parsed; it is not enough to skip them. I know this is against the YAML spec and I would rather not have to do it, but a third-party tool we use allows this usage and we need to deal with it.
Example file:
build:
  step: 'step1'
build:
  step: 'step2'
After parsing, we should end up with a data structure similar to this:
yaml.load('file.yml')
# [('build', [('step', 'step1')]), ('build', [('step', 'step2')])]
A dict can no longer be used to represent the parsed contents.
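To illustrate why, here is a small stdlib-only sketch (not tied to any YAML library) that feeds the expected pairs from above into dict(): the later build entry silently replaces the earlier one, while the list of pairs keeps both.

```python
# Both 'build' entries from the example file, as (key, value) pairs.
pairs = [('build', [('step', 'step1')]), ('build', [('step', 'step2')])]

# dict() keeps only the last value for a repeated key.
as_dict = dict(pairs)
print(as_dict)                   # {'build': [('step', 'step2')]}
print(len(pairs), len(as_dict))  # 2 1
```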
I am looking for a solution in Python, but I didn't find a library that supports this. Have I missed anything?
Alternatively, I am happy to write my own thing, but I would like to keep it as simple as possible. ruamel.yaml looks like the most advanced YAML parser in Python, and it looks moderately extensible. Can it be extended to support duplicate keys?
Answer
PyYAML will just silently overwrite the first entry with the second; ruamel.yaml¹ will give a DuplicateKeyFutureWarning if used with the legacy API and raise a DuplicateKeyError with the new API.
If you don't want to create a full Constructor for all types, overriding the mapping constructor in SafeConstructor should do the job:
import sys
from ruamel.yaml import YAML
from ruamel.yaml.constructor import SafeConstructor

yaml_str = """
build:
  step: 'step1'
build:
  step: 'step2'
"""


def construct_yaml_map(self, node):
    # build a list of (key, value) pairs instead of a dict,
    # so that every duplicate key is preserved
    data = []
    # yield the (still empty) container first so that aliases to it can be resolved
    yield data
    for key_node, value_node in node.value:
        key = self.construct_object(key_node, deep=True)
        val = self.construct_object(value_node, deep=True)
        data.append((key, val))


SafeConstructor.add_constructor(u'tag:yaml.org,2002:map', construct_yaml_map)

yaml = YAML(typ='safe')
data = yaml.load(yaml_str)
print(data)
This gives:
[('build', [('step', 'step1')]), ('build', [('step', 'step2')])]
However, it doesn't seem necessary to turn step: 'step1' into a list. The following only creates a list if there are duplicate keys (this could be optimised, if necessary, by caching the results of self.construct_object(key_node, deep=True)):
def construct_yaml_map(self, node):
    # test if there are duplicate node keys
    keys = set()
    for key_node, value_node in node.value:
        key = self.construct_object(key_node, deep=True)
        if key in keys:
            break
        keys.add(key)
    else:
        # no duplicates found: build a normal dict
        data = {}  # type: Dict[Any, Any]
        yield data
        value = self.construct_mapping(node)
        data.update(value)
        return
    # duplicates found: build a list of (key, value) pairs instead
    data = []
    yield data
    for key_node, value_node in node.value:
        key = self.construct_object(key_node, deep=True)
        val = self.construct_object(value_node, deep=True)
        data.append((key, val))
This gives:
[('build', {'step': 'step1'}), ('build', {'step': 'step2'})]
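The decision rule used by this second construct_yaml_map can be sketched without ruamel.yaml, operating on (key, value) pairs that have already been constructed rather than on YAML nodes. pairs_to_mapping is a hypothetical helper for illustration, not part of ruamel.yaml. Like the constructor, it uses a set to detect duplicates, so keys must be hashable.

```python
from typing import Any, Dict, List, Tuple, Union

Pairs = List[Tuple[Any, Any]]


def pairs_to_mapping(pairs: Pairs) -> Union[Dict[Any, Any], Pairs]:
    """Return a dict when all keys are unique, otherwise the list of pairs."""
    seen = set()
    for key, _ in pairs:
        if key in seen:
            # A duplicate key: a dict would lose data, so keep every pair.
            return list(pairs)
        seen.add(key)
    # No duplicates: a plain dict is lossless and easier to work with.
    return dict(pairs)


print(pairs_to_mapping([('step', 'step1')]))            # {'step': 'step1'}
print(pairs_to_mapping([('build', 1), ('build', 2)]))  # [('build', 1), ('build', 2)]
```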
A few points:
- Probably needless to say, this will not work with YAML merge keys (<<: *xyz).
- If you need ruamel.yaml's round-trip capabilities (yaml = YAML()), that will require a more complex construct_yaml_map.
- If you want to dump the output, you should instantiate a new YAML() instance for that, instead of re-using the "patched" one used for loading (it might work; this is just to be sure):
yaml_out = YAML(typ='safe')
yaml_out.dump(data, sys.stdout)
which gives (with the first construct_yaml_map):
- - build
  - - [step, step1]
- - build
  - - [step, step2]
What doesn't work in either PyYAML or ruamel.yaml is yaml.load('file.yml'): a plain string is parsed as YAML content, so that call just returns the string 'file.yml'. If you don't want to open() the file yourself, you can do:
from pathlib import Path  # or: from ruamel.std.pathlib import Path
from ruamel.yaml import YAML

yaml = YAML(typ='safe')
data = yaml.load(Path('file.yml'))
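With the second construct_yaml_map, the loaded data is a dict for mappings without duplicate keys but a list of (key, value) tuples for mappings with duplicates, so code that consumes it has to accept both shapes. A minimal stdlib sketch, assuming exactly that output format; iter_pairs and get_all are hypothetical helper names, not ruamel.yaml APIs:

```python
from typing import Any, Iterator, List, Tuple


def iter_pairs(mapping: Any) -> Iterator[Tuple[Any, Any]]:
    """Yield (key, value) pairs from a dict or from a list of 2-tuples."""
    if isinstance(mapping, dict):
        yield from mapping.items()
    else:
        # A list of (key, value) tuples, as built for mappings with duplicate keys.
        for key, value in mapping:
            yield key, value


def get_all(mapping: Any, key: Any) -> List[Any]:
    """Return every value stored under key, in document order."""
    return [v for k, v in iter_pairs(mapping) if k == key]


data = [('build', {'step': 'step1'}), ('build', {'step': 'step2'})]
print(get_all(data, 'build'))              # [{'step': 'step1'}, {'step': 'step2'}]
print(get_all({'step': 'step1'}, 'step'))  # ['step1']
```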
¹ Disclaimer: I am the author of that package.