Another lesson learned. I had a bunch of JSON files that define some language strings. Some of these files use UTF-8, others plain 7-bit ASCII. All I wanted was to load a file, sort all the keys, pretty-format everything, and save the result to another file. Pretty easy, right?
In theory, the task could be accomplished with:
cat file.json | python -mjson.tool > formatted_file.json.
That works fine for 7-bit ASCII, but not well enough for UTF-8 encoded files: non-ASCII characters came out escaped, so instead of a single character I got something like
\u016. I did not know how to set ensure_ascii to
False in that command invocation, so I wrote a simple script.
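To make the escaping issue concrete, here is a quick sketch of what ensure_ascii does (the sample dictionary and the character "š" are just illustrations):

```python
import json

# By default, json.dumps escapes every non-ASCII character
print(json.dumps({"name": "š"}))                      # {"name": "\u0161"}

# With ensure_ascii=False, the character is written literally
print(json.dumps({"name": "š"}, ensure_ascii=False))  # {"name": "š"}
```

For the record, recent Python 3 releases let you do this straight from the command line: json.tool grew a --sort-keys flag in 3.5 and a --no-ensure-ascii flag in 3.9.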
After some trial and error I ended up with a script whose critical part is below, comments included:
import json
import codecs

# just open the file...
input_file = file("input_file.json", "r")

# need to use codecs for output to avoid error in json.dump
output_file = codecs.open("output_file.json", "w", encoding="utf-8")

# read the file and decode possible UTF-8 signature at the beginning
# which can be the case in some files.
j = json.loads(input_file.read().decode("utf-8-sig"))

# then output it, indenting, sorting keys and ensuring
# representation as it was originally
json.dump(j, output_file, indent=4, sort_keys=True, ensure_ascii=False)
It worked very well: I could feed it all the JSON files I had and process them in one go. I hope this piece of code is useful to anyone hitting the same problem.