##What these scripts do

The scripts in this repository explore an input JSON file and give insight into its content: its structure, its fields and their different values.
The main script is jsonexplore.py. It explores the given JSON file and builds a tree of Obj objects from it using the ObjBuilder. It then renders the tree of Obj using a printer. Three basic printers are available at this time:
- The ObjTextPrinter will simply output a string describing the tree, printed on console
- The ObjJsonPrinter will output a JSON dictionary, written in the output obj.json file
- The ObjHtmlPrinter takes that output JSON file as its input and displays it as a D3.js tree.
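
The build-then-render flow described above can be sketched roughly as follows. This is a minimal Python 3 illustration, not the code of jsonexplore.py: the names `explore` and `render`, the path format and the sample data are all assumptions, and the sketch only tracks types and value counts per path instead of building real Obj instances.

```python
import json
from collections import Counter, defaultdict


def explore(value, path=".", stats=None):
    """Walk a decoded JSON value; record types and value counts per path (hypothetical helper)."""
    if stats is None:
        stats = defaultdict(lambda: {"types": Counter(), "values": Counter()})
    stats[path]["types"][type(value).__name__] += 1
    if isinstance(value, dict):
        for key, child in value.items():
            # ".statuses" for children of the root, ".statuses[x].mid" deeper down
            explore(child, f"{path.rstrip('.')}.{key}", stats)
    elif isinstance(value, list):
        # All items of a list are merged under a single "[x]" pseudo-field
        for item in value:
            explore(item, f"{path}[x]", stats)
    else:
        stats[path]["values"][str(value)] += 1
    return stats


def render(stats):
    """Render one line per path: a very reduced stand-in for a printer."""
    return "\n".join(
        f"{path}: {'/'.join(sorted(stats[path]['types']))}" for path in sorted(stats)
    )


sample = json.loads(
    '{"statuses": [{"mid": "1", "favorited": false}, {"mid": "2"}], "total_number": 2}'
)
stats = explore(sample)
print(render(stats))
```

Merging every list item under one `[x]` entry is what makes it possible to compare items with each other, for example to notice that `favorited` is present in only one of the two statuses.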
##The Obj class

An Obj is created for each key of the given JSON dictionary.
Each Obj has the following attributes:
- name: the name of the field, or "." for the root
- path: the path of the field from the root
- level: the distance from the root
- type: the type of its value(s) (e.g. dict, list, unicode, int…)
- values: a dictionary mapping each existing value to the number of times this value is found (see also the methods)
- nb_times_it_exists: if this field is part of an item in a list, how many items in the list have this field
- nb_times_it_is_expected: if this field is part of an item in a list, how many items are in the list, that is to say how many times this field should appear
- nb_items: for a list, how many items it contains (its length). If this list is itself an item of a parent list, it is the maximum length of the list
- nb_items_min: for a list that is itself an item of a parent list, the minimum length of the list
- children: for a dictionary, the list of its keys, each one represented by an Obj instance
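
To make these attributes more concrete, here is a minimal Python 3 sketch. It is a hypothetical stand-in, not the repository's Obj class or ObjBuilder: the dataclass only mirrors the attribute names listed above, and `describe_list_items`, its `parent_path` and `level` parameters, and the sample data are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Obj:
    """Hypothetical stand-in mirroring the attributes described above."""
    name: str
    path: str
    level: int
    type: str
    values: Counter = field(default_factory=Counter)
    nb_times_it_exists: int = 0
    nb_times_it_is_expected: int = 0
    nb_items: Optional[int] = None
    nb_items_min: Optional[int] = None
    children: list = field(default_factory=list)


def describe_list_items(items, parent_path=".items", level=3):
    """Build one Obj per key seen across a list of flat dictionaries.

    The level value is passed in as-is; how the real code counts levels
    is not specified here.
    """
    objs = {}
    for item in items:
        for key, value in item.items():
            obj = objs.get(key)
            if obj is None:
                obj = Obj(
                    name=key,
                    path=f"{parent_path}[x].{key}",
                    level=level,
                    type=type(value).__name__,
                    nb_times_it_is_expected=len(items),
                )
                objs[key] = obj
            obj.nb_times_it_exists += 1
            obj.values[str(value)] += 1
    return objs


items = [{"id": 1, "pic": "a.jpg"}, {"id": 2}, {"id": 3, "pic": "b.jpg"}]
objs = describe_list_items(items)
print(objs["pic"].nb_times_it_exists, "of", objs["pic"].nb_times_it_is_expected)  # prints: 2 of 3
```

Here `pic` appears in 2 of the 3 items, so its nb_times_it_exists (2) differs from its nb_times_it_is_expected (3), while `id` has the same count for both.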
It also provides the following methods:
- is_optional(): if the field is part of an item in a list, it tells whether this field is sometimes missing from the other items of the list, in other words whether nb_times_it_exists is different from nb_times_it_is_expected. It returns True if the field is optional, and False otherwise.
- get_values_summary(): depending on the values attribute, it returns a string or a list offering a more readable version of the values dictionary. This method is used by the ObjJsonPrinter, and consequently by the ObjHtmlPrinter, to display the values. The possible results are:
  - "'value'" if there is only one value, used only once
  - "Always 'value'" if there is only one value, used more than once
  - "Always empty" if the value is always an empty string
  - "All different values" if each item has a different value
  - "Almost all different values (each value appear from x to y times)" if every value is used fewer than 5 times
  - otherwise, a list where each value is represented by a dictionary with 2 keys: `{ 'value': the_value, 'count': the_number_of_times_this_value_is_used }`
- get_sample_value(): it returns the first value that is not an empty string, or None if no such value is found.
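
The rules above can be sketched as plain Python 3 functions working on a values dictionary. This is an illustration under assumptions, not the actual methods of Obj: they are written as free functions, the order in which the cases are checked is a guess, and the values dictionary is assumed to be non-empty.

```python
def is_optional(nb_times_it_exists, nb_times_it_is_expected):
    """True when the field is missing from at least one item of the list."""
    return nb_times_it_exists != nb_times_it_is_expected


def get_values_summary(values):
    """Summarize a {value: count} dictionary (assumed non-empty).

    The order of the checks is an assumption, not taken from the real code.
    """
    counts = list(values.values())
    if len(values) == 1:
        value, count = next(iter(values.items()))
        if value == "":
            return "Always empty"
        return f"'{value}'" if count == 1 else f"Always '{value}'"
    if all(count == 1 for count in counts):
        return "All different values"
    if max(counts) < 5:
        return (
            "Almost all different values "
            f"(each value appear from {min(counts)} to {max(counts)} times)"
        )
    return [{"value": value, "count": count} for value, count in values.items()]


def get_sample_value(values):
    """First value that is not an empty string, or None if there is none."""
    return next((value for value in values if value != ""), None)


print(get_values_summary({"0": 10}))          # Always '0'
print(get_values_summary({"0": 7, "1": 3}))   # list of {'value', 'count'} dictionaries
print(get_sample_value({"": 4, "abc": 1}))    # abc
```

Returning either a string or a list keeps the common cases short for display, at the cost of callers having to check the type of the result.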
##The ObjTextPrinter

Example of output:
```
|- .: a Dict composed of:
    |- statuses: a List composed of 10 items like:
        |- [x]: a Dict composed of:
            |- attitudes_count: int - values:
                '0' [7]
                '1' [3]
            |- bmiddle_pic (Optional: only 6 value(s) over the 10 items): unicode - values:
                'http://ww3.sinaimg.cn/bmiddle/4bed07b4jw1ef4pyzb9kzj20uh15ok33.jpg' [1]
                'http://ww4.sinaimg.cn/bmiddle/664b3fe9jw1ef4px8ca0uj20bv0ft40f.jpg' [1]
                'http://ww4.sinaimg.cn/bmiddle/9f767fc7jw1ef4prguk4tj20hs0npdjc.jpg' [1]
                'http://ww3.sinaimg.cn/bmiddle/4d3ffe9ejw1ef4q0uogdtj20xc18gguu.jpg' [1]
                'http://ww1.sinaimg.cn/bmiddle/df92068ejw1ef4pqnefs2j20xc18gjyx.jpg' [1]
                'http://ww4.sinaimg.cn/bmiddle/8bf1e5b4jw1ef4pxgzavsj20f00qo756.jpg' [1]
            |- comments_count: int - values:
                '0' [9]
                '1' [1]
            |- distance: int - values:
                '400' [1]
                '1700' [1]
                '1800' [3]
                '1900' [1]
                '2000' [2]
                '1400' [1]
                '1500' [1]
            |- favorited: bool (always: 'False')
            |- in_reply_to_user_id: unicode (always: '')
            |- mid: unicode - values:
                '3696007882016559' [1]
                '3696009920372953' [1]
                '3696010247357290' [1]
                '3696007861039896' [1]
                '3696010012795278' [1]
                '3696009530197458' [1]
                '3696008955203962' [1]
                '3696009429891580' [1]
                '3696008082926243' [1]
                '3696010490692288' [1]
            |- reposts_count: int (always: '0')
            |- truncated: bool (always: 'False')
            |- user: a Dict composed of:
                |- allow_all_comment: bool - values:
                    'False' [3]
                    'True' [7]
                |- location: unicode - values:
                    '上海 普陀区' [5]
                    '湖北 襄阳' [1]
                    '其他' [1]
                    '上海 黄浦区' [1]
                    '上海 闸北区' [1]
    |- total_number: int (value: '129751')
```
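
A printer producing lines of this shape could look roughly like the sketch below. It is a Python 3 illustration only, not the ObjTextPrinter itself: nodes are plain dictionaries instead of Obj instances, the keys `exists`, `expected` and `nb_items`, the function name `render` and the 4-space indentation per level are all assumptions.

```python
def render(node, depth=0, indent="    "):
    """Return the text lines for a node and its children (hypothetical helper)."""
    pad = indent * depth
    label = node["name"]
    exists, expected = node.get("exists"), node.get("expected")
    if exists is not None and exists != expected:
        label += f" (Optional: only {exists} value(s) over the {expected} items)"
    kind = node["type"]
    lines = []
    if kind == "dict":
        lines.append(f"{pad}|- {label}: a Dict composed of:")
    elif kind == "list":
        lines.append(f"{pad}|- {label}: a List composed of {node['nb_items']} items like:")
    else:
        values = node["values"]
        if len(values) == 1:
            # A single distinct value is shown inline
            (value, count), = values.items()
            word = "value" if count == 1 else "always"
            lines.append(f"{pad}|- {label}: {kind} ({word}: '{value}')")
        else:
            # Several distinct values: one line per value with its count
            lines.append(f"{pad}|- {label}: {kind} - values:")
            lines.extend(f"{pad}{indent}'{v}' [{c}]" for v, c in values.items())
    for child in node.get("children", []):
        lines.extend(render(child, depth + 1, indent))
    return lines


tree = {"name": ".", "type": "dict", "children": [
    {"name": "statuses", "type": "list", "nb_items": 10, "children": [
        {"name": "[x]", "type": "dict", "children": [
            {"name": "comments_count", "type": "int", "values": {"0": 9, "1": 1}},
            {"name": "favorited", "type": "bool", "values": {"False": 10}},
        ]},
    ]},
    {"name": "total_number", "type": "int", "values": {"129751": 1}},
]}
print("\n".join(render(tree)))
```
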
##The ObjHtmlPrinter

You can see a working example here.
This visualisation is based on D3.js.