An implementation of a JSON parser in Python. This parser performs both lexical and syntactic analysis of JSON input and reports detailed errors, with line and column numbers, when parsing fails.
I've always wanted to understand how parsing works!
- Full JSON syntax support
- Detailed error reporting with line and column numbers
- Handles nested structures and all JSON data types
- Extensive test suite
python3 py/json-parser FILE
Add any additional tests you may want to the test/
directory, then run:
python3 -m unittest py/test_parser.py
This parser can handle complex JSON structures, including nested objects and arrays, Unicode characters, and escape sequences. Here's an example of a complex JSON it can parse:
[
"JSON Test Pattern pass1",
{
"object with 1 member": [
"array with 1 element"
]
},
{},
[],
-42,
true,
false,
null,
{
"integer": 1234567890,
"real": -9876.54321,
"e": 1.23456789e-13,
"E": 1.23456789e+34,
"": 2.3456789012e+76,
"zero": 0,
"one": 1,
"space": " ",
"quote": "\"",
"backslash": "\\",
"controls": "\b\f\n\r\t",
"slash": "/ & \/",
"alpha": "abcdefghijklmnopqrstuvwyz",
"ALPHA": "ABCDEFGHIJKLMNOPQRSTUVWYZ",
"digit": "0123456789",
"0123456789": "digit",
"special": "`1~!@#$%^&*()_+-={':[,]}|;.</>?",
"hex": "\u0123\u4567\u89AB\uCDEF\uabcd\uef4A",
"true": true,
"false": false,
"null": null,
"array": [],
"object": {},
"address": "50 St. James Street",
"url": "http://www.JSON.org/",
"comment": "// /* <!-- --",
"# -- --> */": " ",
" s p a c e d ": [
1,
2,
3,
4,
5,
6,
7
],
"compact": [
1,
2,
3,
4,
5,
6,
7
],
"jsontext": "{\"object with 1 member\":[\"array with 1 element\"]}",
"quotes": "&#34; \u0022 %22 0x22 034 &#34;",
"\/\\\"\uCAFE\uBABE\uAB98\uFCDE\ubcda\uef4A\b\f\n\r\t`1~!@#$%^&*()_+-=[]{}|;:',./<>?": "A key can be any string"
},
0.5,
98.6,
99.44,
1066,
10,
1,
0.1,
1,
2,
2,
"rosebud"
]
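As a quick sanity check, a slice of the sample above round-trips through Python's standard-library `json` module (the assumption here is that this parser accepts the same inputs as `json.loads`; the snippet below only demonstrates that the sample itself is valid JSON):

```python
import json

# A small slice of the pass1-style sample above; the full file works the same way.
sample = r'''
[
    "JSON Test Pattern pass1",
    {"object with 1 member": ["array with 1 element"]},
    {}, [], -42, true, false, null,
    {"e": 1.23456789e-13, "hex": "\u0123\u4567", "controls": "\b\f\n\r\t"}
]
'''

data = json.loads(sample)  # lexes and parses in one step
assert data[0] == "JSON Test Pattern pass1"
assert data[4] == -42
assert data[7] is None                   # JSON null -> Python None
assert data[8]["hex"] == "\u0123\u4567"  # \uXXXX escapes are decoded
```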
- Writing a Simple JSON Parser - Helpful for understanding initial concepts.
- JSON Grammar - Comprehensive JSON grammar reference.
- JSONLint - Useful for visualizing and validating JSON.
There are two parts to the parsing process:
- Lexical analysis -> split the input into tokens
- Syntactic analysis -> consume the tokens and validate them against the grammar
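The two phases can be sketched for a toy subset of JSON (arrays, numbers, strings without escapes, and the three keywords). The names `tokenize` and `parse_value` are illustrative, not this repo's actual API:

```python
import re

# Phase 1: lexical analysis -- split the raw text into tokens.
TOKEN_RE = re.compile(r'''
    (?P<punct>[{}\[\],:])            |  # structural characters
    (?P<number>-?\d+(?:\.\d+)?)      |  # integer/decimal subset of JSON numbers
    (?P<string>"[^"]*")              |  # strings (no escape handling in this toy lexer)
    (?P<keyword>true|false|null)     |
    (?P<ws>\s+)
''', re.VERBOSE)

def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at index {pos}: {text[pos]!r}")
        if m.lastgroup != "ws":          # whitespace is dropped, not emitted
            tokens.append(m.group())
        pos = m.end()
    return tokens

# Phase 2: syntactic analysis -- consume tokens and enforce the grammar.
def parse(tokens):
    value, rest = parse_value(tokens)
    if rest:
        raise SyntaxError(f"trailing tokens: {rest}")
    return value

def parse_value(tokens):
    tok, rest = tokens[0], tokens[1:]
    if tok == "[":                       # array: values separated by commas
        items = []
        while rest[0] != "]":
            item, rest = parse_value(rest)
            items.append(item)
            if rest[0] == ",":
                rest = rest[1:]
        return items, rest[1:]           # drop the closing "]"
    if tok in ("true", "false", "null"):
        return {"true": True, "false": False, "null": None}[tok], rest
    if tok.startswith('"'):
        return tok[1:-1], rest           # strip the surrounding quotes
    return (float(tok) if "." in tok else int(tok)), rest
```

The real parser additionally handles objects, string escapes, exponents, and tracks line/column positions for error reporting, but the shape is the same: a lexer that produces a flat token stream, and a recursive parser that consumes it.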
There's nothing magic about this process, which, I guess, makes it kind of magic.