JSON sucks

JSON sucks.

I actually use JSON all the time. It’s great. It’s trivial to use in every language I use. Python? Included. PHP? Included. Perl? Included. JS? Duh. But it’s honestly quite a terrible format.

It sits at this weird intersection between a binary format for machines to deal with, and a text format for humans to deal with. In the end, it doesn’t do either well.

The big problem

The big, stinking, honking elephant in the room is (drumroll please)… the commas!

If I had a dollar for every time I tried to change a service config that uses JSON, only to have it refuse to start back up because I accidentally left a comma on the end of a list… well, I wouldn’t be rich, but I’d have a full tank of gas.
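
To make that concrete, here's a minimal sketch using Python's standard json module. The config contents and the load_config helper are made up for illustration; the only point is that one stray comma is enough for a strict parser to reject the whole file.

```python
import json

# A hypothetical service config with one stray comma after the last list item.
config_text = """{
    "listen": ["127.0.0.1", "::1",],
    "port": 8080
}"""

def load_config(text):
    """Return the parsed config, or None if the parser rejects the text."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

broken = load_config(config_text)                              # rejected
fixed = load_config(config_text.replace('"::1",]', '"::1"]'))  # parses fine
```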

JSON is not a good configuration language.

Commas are the biggest problem, but not the only one. JSON does not include comments. An ideal configuration language also includes some functions (for example to allow you to import environment variables, if nothing else).

JSON’s design

JSON is very specifically an interchange language, meant for computers to write, and for humans to only read.
Sadly, it’s not even very good at being readable, since most tools that output JSON don’t pretty-print it by default. This is particularly sad since the overhead of line breaks and indentation is insignificant compared to the overhead of JSON’s overly verbose strings and objects.

Despite being used as an interchange language, JSON was designed only for compatibility with JavaScript.
Sadly, that means its syntax was never designed with interchange in mind; it is precisely the syntax of a programming language instead.

And, just to make it particularly stinky, the worst sin of all:
Despite being designed for compatibility with JavaScript, JSON is not actually (safely) compatible with JavaScript.
You must generate the JSON yourself for it to be safe to include on a webpage: simply parsing it to make sure it’s valid JSON is not enough, because "</script><script>alert('pwnd');</script>" is valid JSON but, dropped inside a <script> element, it closes that element early and the browser runs whatever follows. So if someone else is providing you JSON, say over some API, you have to decode and then re-encode it just to make sure it’s safe. Hooray. (To make matters worse, many JSON generators also aren’t safe to use in this situation. For example, PHP will write "\/" safely, but jq will write "/".)
Why does this problem exist? Well, because the people who made JSON didn’t realize it was a problem at the time, I imagine. And so security holes open up forever because of a bad choice made two decades ago.
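
Here's a rough sketch of the decode-and-re-encode dance in Python. The json_for_script_tag helper is my own name, not a library function, and escaping "<", ">" and "&" as Unicode escapes is just one common approach; Python's json.dumps does not do this for you.

```python
import json

def json_for_script_tag(value):
    """Serialize value so the result can sit inside an HTML <script> element.

    json.dumps leaves "<", ">" and "&" alone, so a string containing
    "</script>" would close the element early. Swapping those characters
    for Unicode escapes keeps the output valid JSON with the same meaning.
    """
    text = json.dumps(value)
    return (text.replace("<", "\\u003c")
                .replace(">", "\\u003e")
                .replace("&", "\\u0026"))

# JSON received from somewhere else: valid, but not safe to paste into a page.
untrusted = "\"</script><script>alert('pwnd');</script>\""

# Decode it, then re-encode it with the escaping applied.
safe = json_for_script_tag(json.loads(untrusted))
page = "<script>var data = " + safe + ";</script>"
```

The escaped text still decodes to the original string; only the representation on the page changes.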

The CBOR problem

(aka the Msgpack problem)

JSON is verbose. It wastes so much space with quotes, commas and colons that people resort to things like CBOR or Msgpack, which are not human-readable at all.

CBOR is pretty good, and I’m not familiar enough with Msgpack to pass judgement. But they’re completely unreadable by humans and require conversion. Debugging a malformed CBOR record would just be painful, while in JSON it’s pretty easy.

A real interchange language

A better middle ground, one that’s still readable (if not easily human-editable) and requires less space, could be achieved with some small changes to JSON:

  • Get rid of commas, replace them with newlines (now you can have pretty-printed lists at no extra space cost!)
  • Get rid of quotes around strings. Add \: to escape a colon in object keys, and remove restrictions on control codes embedded in strings. (The only characters that would need to be escaped are \n and \:)
  • Treat strings as arbitrary binary data (rather than trying to treat an 8-bit string as Unicode). If the data is text then you should of course store it as UTF-8.

So your new InterChange Object Notation might look like:

{
    foo:bar
    this has a \: in it:so does \: this
    hello:world\nwith a newline
    cows:[
        go
        moo
    ]
}
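
To show the rules are simple enough to implement, here's a minimal encoder sketch in Python. It's only an illustration under a few assumptions of mine: it handles just dicts, lists and text strings (not raw bytes), it also escapes the backslash itself (the rules above don't mention that), and the indentation is purely cosmetic. It doesn't try to deal with strings that happen to look like brackets.

```python
def escape(text):
    """Escape newline and colon, as the proposal describes.

    Escaping the backslash as well is an assumption of this sketch; without
    it, a literal backslash followed by "n" could not be told apart from an
    escaped newline.
    """
    return (text.replace("\\", "\\\\")
                .replace("\n", "\\n")
                .replace(":", "\\:"))

def encode(value, indent=0):
    """Encode nested dicts, lists and strings, one item per line."""
    pad = "    " * indent
    if isinstance(value, dict):
        lines = ["{"]
        for key, item in value.items():
            lines.append(f"{pad}    {escape(key)}:{encode(item, indent + 1)}")
        lines.append(pad + "}")
        return "\n".join(lines)
    if isinstance(value, list):
        lines = ["["]
        for item in value:
            lines.append(f"{pad}    {encode(item, indent + 1)}")
        lines.append(pad + "]")
        return "\n".join(lines)
    return escape(value)

document = {
    "foo": "bar",
    "this has a : in it": "so does : this",
    "hello": "world\nwith a newline",
    "cows": ["go", "moo"],
}
encoded = encode(document)  # reproduces the example above
```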

Some other problems

Some parsers let you end a list with a comma anyway. That’s all well and good until you switch to a different parser and nothing works anymore because the opinionated authors of that parser decided to disallow it.

Many parsers don’t give any information as to where an error occurred in the JSON object, so you may be left to hunt through thousands of lines for an extra or missing comma.
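
For contrast, Python's standard json module is one parser that does report a position: its JSONDecodeError carries lineno and colno attributes. A small sketch, where the broken document and the locate_error helper are made up for illustration:

```python
import json

# The comma after the "tags" list is missing.
broken = '{\n  "name": "example",\n  "tags": ["a", "b"]\n  "port": 8080\n}'

def locate_error(text):
    """Return (line, column) of the first syntax error, or None if text parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err.lineno, err.colno
    return None

location = locate_error(broken)
```

With a parser that gives you nothing like this, you're back to hunting for the comma by eye.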

JSON is specifically designed to be quite inflexible, to simplify parsers. But writing a parser only has to be done once, in any given language; editing JSON files has to be done much more often. A configuration language should be as flexible as possible: keys should not need to be quoted (since the application author picks them, they can pick simple strings that don’t conflict syntactically), lists should be able to be ended with commas, comments should exist, you should be able to reference a value that’s set elsewhere in the config or include external files, and more.

Don’t even get me started on YAML, aka “JSON with whitespace made significant”. It inherits all the same problems and then adds many more of its own.

So what?

What should you use instead?

Configuration

Well, for configuration files, ideally one would use a purpose-built configuration language. Better yet, just use the same programming language as you’re already using (if you’re targeting developers and power-users in a dynamic language, this is the way).

For example, in Python a good option could be configparser or an imported module, while in PHP a good option could be parse_ini_file or an included file.

By the way, it’s not a coincidence that both of those options are compatible with INI files. INI is a pretty good configuration format, and some of the extensions out there like configparser make it a great format. Plus, it’s over 30 years old! Everyone knows it and it’s probably easy to use in even more languages than JSON. (I guess that must be why it’s not cool.)
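
Here's a small sketch of what configparser gives you out of the box: comments, values that reference other values, and typed accessors. The section and key names are invented for the example.

```python
import configparser

ini_text = """
# Comments are allowed, unlike in JSON.
[paths]
root = /srv/app
logs = %(root)s/logs

[server]
port = 8080
debug = no
"""

parser = configparser.ConfigParser()
parser.read_string(ini_text)

logs_dir = parser["paths"]["logs"]            # interpolated from "root"
port = parser["server"].getint("port")        # parsed as an integer
debug = parser["server"].getboolean("debug")  # "no" is read as False
```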

All 4 options are significantly more flexible than JSON… even though 2 of them are programming languages!

Interchange

For interchange, there are many other options out there, like CBOR or protobuf, and one of them might be better for your use case.

If you’re doing anything with any kind of binary data, you definitely don’t want to use JSON. Even if your data is text but you can’t guarantee it’s valid Unicode, you probably don’t want to use JSON.
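
A quick sketch of why, using Python's json module: bytes can't be serialized directly, and the usual workaround is to wrap them in base64, which costs extra space and an extra decoding step on the other end. The dumps_or_error helper and the "blob" key are just names I picked.

```python
import base64
import json

payload = bytes([0x00, 0xFF, 0x10, 0x80])  # not valid UTF-8

def dumps_or_error(value):
    """Return the JSON text, or a description of why serializing failed."""
    try:
        return json.dumps(value)
    except TypeError as err:
        return f"TypeError: {err}"

# Serializing the bytes directly fails.
direct = dumps_or_error({"blob": payload})

# The common workaround: base64-encode the bytes into an ASCII string first.
wrapped = json.dumps({"blob": base64.b64encode(payload).decode("ascii")})
restored = base64.b64decode(json.loads(wrapped)["blob"])
```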

For internal uses (say, storing an object to a database), you should just use your language’s native serialization: pickle for Python, serialize or var_export for PHP, Data::Dumper or Storable for Perl, etc.
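
As a rough illustration in Python, assuming the data only ever comes from your own application, pickle round-trips types that JSON has no way to express (sets, tuples, bytes). The record below is made up.

```python
import pickle

record = {"id": 7, "tags": {"a", "b"}, "raw": b"\x00\xff", "point": (1, 2)}

stored = pickle.dumps(record)   # bytes, ready to store in a database column
loaded = pickle.loads(stored)   # same types come back out
```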


Other than that, for interchange, JSON does a pretty good job. The design smells, but it works well enough, when it’s truly being used machine-to-machine.

Just please, please, stop using it for configuration.