##What these scripts do
The scripts in this repository explore an input JSON file to give insight into its content: its structure, its fields and their different values.
The main script is jsonexplore.py. It explores the given JSON file and builds a tree of Obj objects from it using the ObjBuilder, then renders that tree with a printer. Three basic printers are available at this time:
- The ObjTextPrinter outputs a string describing the tree, printed on the console.
- The ObjJsonPrinter outputs a JSON dictionary, written to the output file obj.json.
- The ObjHtmlPrinter takes that output JSON file as its input to display a D3.js tree.
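
The build-then-render flow described above can be sketched as follows. This is only an illustration under stated assumptions, not the repository's code: `Node`, `build_tree` and `render_text` are hypothetical stand-ins for Obj, the ObjBuilder and the ObjTextPrinter, and the sketch targets Python 3, so strings are reported as `str` rather than `unicode`.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for the repository's Obj class."""
    name: str
    type: str
    children: list = field(default_factory=list)


def build_tree(name, value):
    """Recursively turn a parsed JSON value into a tree of Node objects."""
    node = Node(name=name, type=type(value).__name__)
    if isinstance(value, dict):
        for key in sorted(value):
            node.children.append(build_tree(key, value[key]))
    elif isinstance(value, list) and value:
        # Simplification: describe list items through the first item only.
        node.children.append(build_tree("[x]", value[0]))
    return node


def render_text(node, level=0):
    """Return one line per node, indented by its distance from the root."""
    lines = ["  " * level + "|- " + node.name + ": " + node.type]
    for child in node.children:
        lines.extend(render_text(child, level + 1))
    return lines


data = json.loads('{"statuses": [{"mid": "1", "favorited": false}], "total_number": 2}')
tree = build_tree(".", data)
print("\n".join(render_text(tree)))
```

Unlike this sketch, the real builder also aggregates values and counts across all items of a list, as the Obj properties describe.
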
##The Obj class
An Obj is created for each key of the given JSON dictionary.
###Properties
- name: the name of the field, or "." for the root
- path: the path of the field from the root
- level: the distance from the root
- type: the type of its value(s) (e.g. dict, list, unicode, int…)
- values: a dictionary mapping each existing value to the number of times this value is found (see also the methods)
- nb_times_it_exists: if this field is part of an item in a list, how many items in the list have this field
- nb_times_it_is_expected: if this field is part of an item in a list, how many items are in the list, that is to say how many times this field should appear
- nb_items: for a list, how many items it contains (basically the length of the list). If this list exists as an item of a parent list, it is the maximum length of the list.
- nb_items_min: for a list existing as an item of a parent list, the minimum length of the list
- children: for a dictionary, the list of its keys, each one represented by an Obj instance
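
As a rough illustration of how the counting properties relate to each other, the sketch below computes them for a small list of items. The helper names `field_stats` and `list_length_range` are hypothetical and are not part of the repository; only the property names come from the list above.

```python
from collections import Counter


def field_stats(items):
    """For each key, count how many dict items of a list contain it."""
    expected = len(items)  # every item is expected to carry every field
    exists = Counter()
    for item in items:
        exists.update(item.keys())
    return {
        key: {"nb_times_it_exists": count, "nb_times_it_is_expected": expected}
        for key, count in exists.items()
    }


def list_length_range(lists):
    """For lists found as items of a parent list, report max and min length."""
    lengths = [len(one_list) for one_list in lists]
    return {"nb_items": max(lengths), "nb_items_min": min(lengths)}


statuses = [
    {"mid": "a", "bmiddle_pic": "http://example.com/1.jpg"},
    {"mid": "b"},
    {"mid": "c", "bmiddle_pic": "http://example.com/2.jpg"},
]
stats = field_stats(statuses)
print(stats["bmiddle_pic"])  # {'nb_times_it_exists': 2, 'nb_times_it_is_expected': 3}
print(list_length_range([[1, 2, 3], [1], [1, 2]]))  # {'nb_items': 3, 'nb_items_min': 1}
```
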
###Methods
- is_optional(): if the field is part of an item in a list, it says whether this field is sometimes missing in the other items of the list, in other words whether nb_times_it_exists is different from nb_times_it_is_expected. It returns True if the field is optional, False otherwise.
- get_values_summary(): depending on the values property, it returns a string or a list offering a more readable version of the values dictionary:
    - "'value'" if there is only one value, used only once
    - "Always 'value'" if there is only one value, used more than once
    - "Always empty" if the value is always an empty string
    - "All different values" if each item has a different value
    - "Almost all different values (each value appear from x to y times)" if every value is used fewer than 5 times
    - a list where each value is represented by a dictionary with 2 keys: { 'value': the_value, 'count': the_number_of_times_this_value_is_used }

    This method is used by the ObjJsonPrinter, and consequently by the ObjHtmlPrinter, to display the values.
- get_sample_value(): it returns the first value that is not an empty string, or None if there is no such value.
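
The rules above can be sketched as plain functions. This is a minimal sketch and not the repository's implementation: the function names and signatures are hypothetical, the order in which the rules are tested is an assumption (the list above does not state it), and the `values` dictionary is assumed to be non-empty.

```python
def is_optional(nb_times_it_exists, nb_times_it_is_expected):
    """A field is optional when some items of the list do not have it."""
    return nb_times_it_exists != nb_times_it_is_expected


def values_summary(values):
    """Summarize a non-empty dict mapping each value to its occurrence count."""
    counts = list(values.values())
    if len(values) == 1:
        value, count = next(iter(values.items()))
        if value == "":
            return "Always empty"
        if count == 1:
            return "'%s'" % value
        return "Always '%s'" % value
    if all(count == 1 for count in counts):
        return "All different values"
    if all(count < 5 for count in counts):
        return ("Almost all different values "
                "(each value appear from %d to %d times)" % (min(counts), max(counts)))
    # Fallback: one dictionary per value, with its count.
    return [{"value": value, "count": count} for value, count in values.items()]


def sample_value(values):
    """Return the first value that is not an empty string, or None."""
    for value in values:
        if value != "":
            return value
    return None


print(is_optional(6, 10))                   # True
print(values_summary({"False": 10}))        # Always 'False'
print(values_summary({"0": 7, "1": 3}))
print(sample_value({"": 4, "上海 普陀区": 5}))  # 上海 普陀区
```
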
##The ObjTextPrinter
Example of output:

    |- .: a Dict composed of:
        |- statuses: a List composed of 10 items like:
            |- [x]: a Dict composed of:
                |- attitudes_count: int - values:
                    '0' [7]
                    '1' [3]
                |- bmiddle_pic (Optional: only 6 value(s) over the 10 items): unicode - values:
                    'http://ww3.sinaimg.cn/bmiddle/4bed07b4jw1ef4pyzb9kzj20uh15ok33.jpg' [1]
                    'http://ww4.sinaimg.cn/bmiddle/664b3fe9jw1ef4px8ca0uj20bv0ft40f.jpg' [1]
                    'http://ww4.sinaimg.cn/bmiddle/9f767fc7jw1ef4prguk4tj20hs0npdjc.jpg' [1]
                    'http://ww3.sinaimg.cn/bmiddle/4d3ffe9ejw1ef4q0uogdtj20xc18gguu.jpg' [1]
                    'http://ww1.sinaimg.cn/bmiddle/df92068ejw1ef4pqnefs2j20xc18gjyx.jpg' [1]
                    'http://ww4.sinaimg.cn/bmiddle/8bf1e5b4jw1ef4pxgzavsj20f00qo756.jpg' [1]
                |- comments_count: int - values:
                    '0' [9]
                    '1' [1]
                |- distance: int - values:
                    '400' [1]
                    '1700' [1]
                    '1800' [3]
                    '1900' [1]
                    '2000' [2]
                    '1400' [1]
                    '1500' [1]
                |- favorited: bool (always: 'False')
                |- in_reply_to_user_id: unicode (always: '')
                |- mid: unicode - values:
                    '3696007882016559' [1]
                    '3696009920372953' [1]
                    '3696010247357290' [1]
                    '3696007861039896' [1]
                    '3696010012795278' [1]
                    '3696009530197458' [1]
                    '3696008955203962' [1]
                    '3696009429891580' [1]
                    '3696008082926243' [1]
                    '3696010490692288' [1]
                |- reposts_count: int (always: '0')
                |- truncated: bool (always: 'False')
                |- user: a Dict composed of:
                    |- allow_all_comment: bool - values:
                        'False' [3]
                        'True' [7]
                    |- location: unicode - values:
                        '上海 普陀区' [5]
                        '湖北 襄阳' [1]
                        '其他' [1]
                        '上海 黄浦区' [1]
                        '上海 闸北区' [1]
        |- total_number: int (value: '129751')
##The ObjHtmlPrinter
You can see a working example here.
This visualisation is based on D3.js.