Remove Duplicates from Text File(从文本文件中删除重复项)
问题描述
我想从文本文件中删除重复的单词.
I want to remove duplicate word from a text file.
我有一些文本文件,其中包含如下内容:
i have some text file which contain such like following:
None_None
ConfigHandler_56663624
ConfigHandler_56663624
ConfigHandler_56663624
ConfigHandler_56663624
None_None
ColumnConverter_56963312
ColumnConverter_56963312
PredicatesFactory_56963424
PredicatesFactory_56963424
PredicateConverter_56963648
PredicateConverter_56963648
ConfigHandler_80134888
ConfigHandler_80134888
ConfigHandler_80134888
ConfigHandler_80134888
结果输出需要是:
None_None
ConfigHandler_56663624
ColumnConverter_56963312
PredicatesFactory_56963424
PredicateConverter_56963648
ConfigHandler_80134888
我只使用了这个命令:en=set(open('file.txt')但它不起作用.
I have used just this command: en=set(open('file.txt') but it does not work.
谁能帮我从文件中提取唯一的集合
Could anyone help me with how to extract only the unique set from the file
谢谢
推荐答案
这里是关于保留顺序的选项(与集合不同),但仍然具有相同的行为(请注意,EOL 字符被故意剥离并忽略空行)...
Here's about option that preserves order (unlike a set), but still has the same behaviour (note that the EOL character is deliberately stripped and blank lines are ignored)...
from collections import OrderedDict
with open('/home/jon/testdata.txt') as fin:
lines = (line.rstrip() for line in fin)
unique_lines = OrderedDict.fromkeys( (line for line in lines if line) )
print unique_lines.keys()
# ['None_None', 'ConfigHandler_56663624', 'ColumnConverter_56963312',PredicatesFactory_56963424', 'PredicateConverter_56963648', 'ConfigHandler_80134888']
那么你只需要将上面的内容写入你的输出文件.
Then you just need to write the above to your output file.
这篇关于从文本文件中删除重复项的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!
本文标题为:从文本文件中删除重复项


- padding='same' 转换为 PyTorch padding=# 2022-01-01
- pytorch 中的自适应池是如何工作的? 2022-07-12
- python check_output 失败,退出状态为 1,但 Popen 适用于相同的命令 2022-01-01
- 使用Heroku上托管的Selenium登录Instagram时,找不到元素';用户名'; 2022-01-01
- 分析异常:路径不存在:dbfs:/databricks/python/lib/python3.7/site-packages/sampleFolder/data; 2022-01-01
- 沿轴计算直方图 2022-01-01
- 如何将一个类的函数分成多个文件? 2022-01-01
- python-m http.server 443--使用SSL? 2022-01-01
- 如何在 Python 的元组列表中对每个元组中的第一个值求和? 2022-01-01
- 如何在 python3 中将 OrderedDict 转换为常规字典 2022-01-01