Saving utf-8 texts with json.dumps as UTF8, not as u escape sequence(将带有 json.dumps 的 utf-8 文本保存为 UTF8,而不是 u 转义序列)
问题描述
Sample code:
>>> import json
>>> json_string = json.dumps("ברי צקלה")
>>> print(json_string)
"u05d1u05e8u05d9 u05e6u05e7u05dcu05d4"
The problem: it's not human readable. My (smart) users want to verify or even edit text files with JSON dumps (and I’d rather not use XML).
Is there a way to serialize objects into UTF-8 JSON strings (instead of uXXXX
)?
Use the ensure_ascii=False
switch to json.dumps()
, then encode the value to UTF-8 manually:
>>> json_string = json.dumps("ברי צקלה", ensure_ascii=False).encode('utf8')
>>> json_string
b'"xd7x91xd7xa8xd7x99 xd7xa6xd7xa7xd7x9cxd7x94"'
>>> print(json_string.decode())
"ברי צקלה"
If you are writing to a file, just use json.dump()
and leave it to the file object to encode:
with open('filename', 'w', encoding='utf8') as json_file:
json.dump("ברי צקלה", json_file, ensure_ascii=False)
Caveats for Python 2
For Python 2, there are some more caveats to take into account. If you are writing this to a file, you can use io.open()
instead of open()
to produce a file object that encodes Unicode values for you as you write, then use json.dump()
instead to write to that file:
with io.open('filename', 'w', encoding='utf8') as json_file:
json.dump(u"ברי צקלה", json_file, ensure_ascii=False)
Do note that there is a bug in the json
module where the ensure_ascii=False
flag can produce a mix of unicode
and str
objects. The workaround for Python 2 then is:
with io.open('filename', 'w', encoding='utf8') as json_file:
data = json.dumps(u"ברי צקלה", ensure_ascii=False)
# unicode(data) auto-decodes data to unicode if str
json_file.write(unicode(data))
In Python 2, when using byte strings (type str
), encoded to UTF-8, make sure to also set the encoding
keyword:
>>> d={ 1: "ברי צקלה", 2: u"ברי צקלה" }
>>> d
{1: 'xd7x91xd7xa8xd7x99 xd7xa6xd7xa7xd7x9cxd7x94', 2: u'u05d1u05e8u05d9 u05e6u05e7u05dcu05d4'}
>>> s=json.dumps(d, ensure_ascii=False, encoding='utf8')
>>> s
u'{"1": "u05d1u05e8u05d9 u05e6u05e7u05dcu05d4", "2": "u05d1u05e8u05d9 u05e6u05e7u05dcu05d4"}'
>>> json.loads(s)['1']
u'u05d1u05e8u05d9 u05e6u05e7u05dcu05d4'
>>> json.loads(s)['2']
u'u05d1u05e8u05d9 u05e6u05e7u05dcu05d4'
>>> print json.loads(s)['1']
ברי צקלה
>>> print json.loads(s)['2']
ברי צקלה
这篇关于将带有 json.dumps 的 utf-8 文本保存为 UTF8,而不是 u 转义序列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!
本文标题为:将带有 json.dumps 的 utf-8 文本保存为 UTF8,而不是 u 转义序列
- 如何在 python3 中将 OrderedDict 转换为常规字典 2022-01-01
- 沿轴计算直方图 2022-01-01
- 如何将一个类的函数分成多个文件? 2022-01-01
- 如何在 Python 的元组列表中对每个元组中的第一个值求和? 2022-01-01
- 分析异常:路径不存在:dbfs:/databricks/python/lib/python3.7/site-packages/sampleFolder/data; 2022-01-01
- padding='same' 转换为 PyTorch padding=# 2022-01-01
- pytorch 中的自适应池是如何工作的? 2022-07-12
- 使用Heroku上托管的Selenium登录Instagram时,找不到元素';用户名'; 2022-01-01
- python check_output 失败,退出状态为 1,但 Popen 适用于相同的命令 2022-01-01
- python-m http.server 443--使用SSL? 2022-01-01