end to end table understanding using python API with demo on WD tooling #245
frreiss commented on 2022-02-01T00:39:46Z
The web link to Cloud Pak for Data is not rendering properly on ReviewNB. Is there a typo in the Markdown?

Monireh2 commented on 2022-02-02T02:32:45Z
The link worked when I clicked it, both on my local machine and here in ReviewNB. I think it was failing because of the newline at the start of the URL. I just fixed it. Thanks for pointing that out.
frreiss commented on 2022-02-01T00:39:47Z
"We start" ==> "Allison starts"

Monireh2 commented on 2022-02-02T02:33:40Z
Fixed, thanks!
frreiss commented on 2022-02-01T00:39:48Z
There's no need to embed Python code to display the video. You can embed the video file directly into the Markdown in the previous cell. Syntax: <video controls src="./images/Table_Understanding.mp4">Creating a collection in IBM Watson Discovery</video>
Documentation here: https://jupyter-notebook.readthedocs.io/en/latest/examples/Notebook/Working%20With%20Markdown%20Cells.html#Local-files
Overall the video looks good, but I do have some suggestions:
* You need to blur/black out the PII: user names, people's names, account names. Instructions here: https://www.youtube.com/watch?v=54KYsEVJlWQ.
* If you have time to re-record the clip, I think it would work better if you shrunk the browser window to a smaller size and recorded just the window (press command-shift-5 to select a portion of the screen to record).
* I recommend you edit out or speed up the parts where you're waiting for Discovery to perform an action.

Monireh2 commented on 2022-02-02T02:42:51Z
Thanks for the pointer, Fred (@frreiss). I actually tried that, but it gives me a black screen with an inactive play button. The only way I could resolve the issue was to use the Python snippet above. I will fix the other items you mentioned.
frreiss commented on 2022-02-01T00:39:48Z
I think it would be better to move this cell and the ones that follow (up to the heading, "Query the project") to a separate notebook file to avoid breaking up the flow. You can put a hyperlink to the other notebook file directly into your Markdown, i.e. For more information, refer to [this additional notebook](./other_notebook.ipynb)
frreiss commented on 2022-02-01T00:39:49Z
Can you truncate this output a bit? Maybe print out the first 20 lines, followed by something like

Monireh2 commented on 2022-02-02T16:43:53Z
Done!
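The truncation requested above could be done with a small helper like the one below. This is only a sketch: the name `truncate_lines` and the `...` marker are assumptions for illustration, since the marker text suggested in the comment did not survive, and the notebook's actual code is not shown in this thread.

```python
def truncate_lines(text: str, max_lines: int = 20, marker: str = "...") -> str:
    """Return at most `max_lines` lines of `text`, plus a marker if lines were cut."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        # Short output is returned unchanged.
        return text
    # Keep the first `max_lines` lines and signal that the rest was dropped.
    return "\n".join(lines[:max_lines] + [marker])


# Hypothetical long cell output: 100 numbered lines.
long_output = "\n".join(f"line {i}" for i in range(100))
print(truncate_lines(long_output))
```

In a notebook cell, the same helper can wrap any string before printing, for example the pretty-printed JSON of a query response.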
frreiss commented on 2022-02-01T00:39:50Z
This table is rendering as empty (no body cells) in ReviewNB.

Monireh2 commented on 2022-02-03T00:27:23Z
That is weird. It renders for me on my local machine.
frreiss commented on 2022-02-01T00:39:51Z
The data shown doesn't match the screenshot. The screenshot shows 2013-2014 data; the data here is for 2014-2014.

Monireh2 commented on 2022-02-03T00:28:36Z
Resolved. Had changed in final run!
frreiss commented on 2022-02-01T00:39:51Z
Those error messages (

Monireh2 commented on 2022-02-03T00:43:15Z
@frreiss: The error does make sense to me. Whenever the code gets an empty value, it substitutes pd.NA for the value and prints the above error. I can open an issue on that if you think the code should be changed. Please see lines 229-231 here: https://github.com/CODAIT/text-extensions-for-pandas/blob/master/text_extensions_for_pandas/io/watson/tables.py
    except ValueError:
        ans = pd.NA
        print(f"ERROR READING VALUE:\"{val}\"\t Filling with <NA>")
Here the value for "Major markets", "Growth Markets" and "BRIC countries" is empty.
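If the code were changed as offered above, one option would be to treat empty cells as expected and report only values that are genuinely unparseable. The sketch below is not the library's implementation: `parse_number` and its parameters are hypothetical, and `missing` stands in for the `pd.NA` used in `tables.py` so that the example runs without pandas.

```python
def parse_number(val: str, missing=None, verbose: bool = True):
    """Convert a cell string to float; return `missing` if that is not possible."""
    if val.strip() == "":
        # Empty cells are common in these tables, so fill them silently.
        return missing
    try:
        # Drop thousands separators before converting, e.g. "5,247" -> 5247.0.
        return float(val.replace(",", ""))
    except ValueError:
        if verbose:
            # Same message format as the snippet quoted in the comment.
            print(f'ERROR READING VALUE:"{val}"\t Filling with <NA>')
        return missing


print(parse_number(""))       # empty cell: no error message
print(parse_number("5,247"))  # numeric cell
print(parse_number("NM"))     # non-numeric cell: message is printed
```

With this split, rows such as "Major markets" that have no value would no longer produce an error line, while unexpected strings would still be reported.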
frreiss commented on 2022-02-01T00:39:52Z
Several incorrect values are present in this table: "of intellectual property", "Licensing/royalty-based fees", "Custom development income", "2009. The increase in total expense and other", "Examples of the company's investments include:",
These incorrect values most likely come from incorrect JSON input from Watson Discovery. Can you please trace these incorrect values back to the corresponding portions of the Watson Discovery output please? If there is a bug in Discovery, we should submit a bug report. If there's a bug in our Text Extensions for Pandas code it needs to be fixed. Monireh2 commented on 2022-02-03T01:36:03Z {'section_title': {'location': {'end': 627943, 'begin': 627925}, 'text': 'Geographic Revenue'}, 'row_headers': [{'column_index_begin': 0, 'row_index_begin': 0, 'location': {'end': 703825, 'begin': 703796}, 'text': 'Total consolidated research,', 'row_index_end': 0, 'cell_id': 'rowHeader-703796-703825', 'column_index_end': 0, 'text_normalized': 'Total consolidated research,'}, {'column_index_begin': 0, 'row_index_begin': 1, 'location': {'end': 704313, 'begin': 704285}, 'text': 'development and engineering', 'row_index_end': 1, 'cell_id': 'rowHeader-704285-704313', 'column_index_end': 0, 'text_normalized': 'development and engineering'}, {'column_index_begin': 0, 'row_index_begin': 2, 'location': {'end': 705414, 'begin': 705389}, 'text': 'Non-operating adjustment', 'row_index_end': 2, 'cell_id': 'rowHeader-705389-705414', 'column_index_end': 0, 'text_normalized': 'Non-operating adjustment'}, {'column_index_begin': 0, 'row_index_begin': 3, 'location': {'end': 705914, 'begin': 705881}, 'text': 'Non-operating retirement-related', 'row_index_end': 3, 'cell_id': 'rowHeader-705881-705914', 'column_index_end': 0, 'text_normalized': 'Non-operating retirement-related'}, {'column_index_begin': 0, 'row_index_begin': 4, 'location': {'end': 706394, 'begin': 706379}, 'text': '(costs)/income', 'row_index_end': 4, 'cell_id': 'rowHeader-706379-706394', 'column_index_end': 0, 'text_normalized': '(costs)/income'}, {'column_index_begin': 0, 'row_index_begin': 5, 'location': {'end': 707502, 'begin': 707471}, 'text': 'Operating (non-GAAP) research,', 'row_index_end': 5, 'cell_id': 
'rowHeader-707471-707502', 'column_index_end': 0, 'text_normalized': 'Operating (non-GAAP) research,'}, {'column_index_begin': 0, 'row_index_begin': 6, 'location': {'end': 707990, 'begin': 707962}, 'text': 'development and engineering', 'row_index_end': 6, 'cell_id': 'rowHeader-707962-707990', 'column_index_end': 0, 'text_normalized': 'development and engineering'}], 'table_headers': [], 'location': {'end': 708798, 'begin': 703796}, 'text': 'Total consolidated research, development and engineering $5,247 $5,437 (3.5)%\nNon-operating adjustment\n Non-operating retirement-related (costs)/income (48) 77 NM\nOperating (non-GAAP) research,\n development and engineering $5,200 $5,514 (5.7)%\n', 'body_cells': [{'row_header_ids': ['rowHeader-703796-703825'], 'column_index_begin': 1, 'row_index_begin': 0, 'row_header_texts': ['Total consolidated research,'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 703903, 'begin': 703902}, 'attributes': [], 'text': '', 'row_index_end': 0, 'row_header_texts_normalized': ['Total consolidated research,'], 'cell_id': 'bodyCell-703902-703903'}, {'row_header_ids': ['rowHeader-703796-703825'], 'column_index_begin': 2, 'row_index_begin': 0, 'row_header_texts': ['Total consolidated research,'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 703968, 'begin': 703967}, 'attributes': [], 'text': '', 'row_index_end': 0, 'row_header_texts_normalized': ['Total consolidated research,'], 'cell_id': 'bodyCell-703967-703968'}, {'row_header_ids': ['rowHeader-703796-703825'], 'column_index_begin': 3, 'row_index_begin': 0, 'row_header_texts': ['Total consolidated research,'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 704033, 'begin': 704032}, 'attributes': [], 'text': '', 'row_index_end': 0, 
'row_header_texts_normalized': ['Total consolidated research,'], 'cell_id': 'bodyCell-704032-704033'}, {'row_header_ids': ['rowHeader-704285-704313', 'rowHeader-703796-703825'], 'column_index_begin': 1, 'row_index_begin': 1, 'row_header_texts': ['development and engineering', 'Total consolidated research,'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 704581, 'begin': 704574}, 'attributes': [{'location': {'end': 704580, 'begin': 704574}, 'text': '$5,247', 'type': 'Currency'}], 'text': '$5,247', 'row_index_end': 1, 'row_header_texts_normalized': ['development and engineering', 'Total consolidated research,'], 'cell_id': 'bodyCell-704574-704581'}, {'row_header_ids': ['rowHeader-704285-704313', 'rowHeader-703796-703825'], 'column_index_begin': 2, 'row_index_begin': 1, 'row_header_texts': ['development and engineering', 'Total consolidated research,'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 704848, 'begin': 704841}, 'attributes': [{'location': {'end': 704847, 'begin': 704841}, 'text': '$5,437', 'type': 'Currency'}], 'text': '$5,437', 'row_index_end': 1, 'row_header_texts_normalized': ['development and engineering', 'Total consolidated research,'], 'cell_id': 'bodyCell-704841-704848'}, {'row_header_ids': ['rowHeader-704285-704313', 'rowHeader-703796-703825'], 'column_index_begin': 3, 'row_index_begin': 1, 'row_header_texts': ['development and engineering', 'Total consolidated research,'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 705118, 'begin': 705111}, 'attributes': [{'location': {'end': 705115, 'begin': 705112}, 'text': '3.5', 'type': 'Number'}], 'text': '(3.5)%', 'row_index_end': 1, 'row_header_texts_normalized': ['development and engineering', 'Total consolidated research,'], 'cell_id': 
'bodyCell-705111-705118'}, {'row_header_ids': ['rowHeader-705389-705414'], 'column_index_begin': 1, 'row_index_begin': 2, 'row_header_texts': ['Non-operating adjustment'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 705492, 'begin': 705491}, 'attributes': [], 'text': '', 'row_index_end': 2, 'row_header_texts_normalized': ['Non-operating adjustment'], 'cell_id': 'bodyCell-705491-705492'}, {'row_header_ids': ['rowHeader-705389-705414'], 'column_index_begin': 2, 'row_index_begin': 2, 'row_header_texts': ['Non-operating adjustment'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 705557, 'begin': 705556}, 'attributes': [], 'text': '', 'row_index_end': 2, 'row_header_texts_normalized': ['Non-operating adjustment'], 'cell_id': 'bodyCell-705556-705557'}, {'row_header_ids': ['rowHeader-705389-705414'], 'column_index_begin': 3, 'row_index_begin': 2, 'row_header_texts': ['Non-operating adjustment'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 705622, 'begin': 705621}, 'attributes': [], 'text': '', 'row_index_end': 2, 'row_header_texts_normalized': ['Non-operating adjustment'], 'cell_id': 'bodyCell-705621-705622'}, {'row_header_ids': ['rowHeader-705881-705914', 'rowHeader-705389-705414'], 'column_index_begin': 1, 'row_index_begin': 3, 'row_header_texts': ['Non-operating retirement-related', 'Non-operating adjustment'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 705992, 'begin': 705991}, 'attributes': [], 'text': '', 'row_index_end': 3, 'row_header_texts_normalized': ['Non-operating retirement-related', 'Non-operating adjustment'], 'cell_id': 'bodyCell-705991-705992'}, {'row_header_ids': ['rowHeader-705881-705914', 
'rowHeader-705389-705414'], 'column_index_begin': 2, 'row_index_begin': 3, 'row_header_texts': ['Non-operating retirement-related', 'Non-operating adjustment'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 706057, 'begin': 706056}, 'attributes': [], 'text': '', 'row_index_end': 3, 'row_header_texts_normalized': ['Non-operating retirement-related', 'Non-operating adjustment'], 'cell_id': 'bodyCell-706056-706057'}, {'row_header_ids': ['rowHeader-705881-705914', 'rowHeader-705389-705414'], 'column_index_begin': 3, 'row_index_begin': 3, 'row_header_texts': ['Non-operating retirement-related', 'Non-operating adjustment'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 706122, 'begin': 706121}, 'attributes': [], 'text': '', 'row_index_end': 3, 'row_header_texts_normalized': ['Non-operating retirement-related', 'Non-operating adjustment'], 'cell_id': 'bodyCell-706121-706122'}, {'row_header_ids': ['rowHeader-706379-706394', 'rowHeader-705389-705414', 'rowHeader-705881-705914'], 'column_index_begin': 1, 'row_index_begin': 4, 'row_header_texts': ['(costs)/income', 'Non-operating adjustment', 'Non-operating retirement-related'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 706662, 'begin': 706657}, 'attributes': [{'location': {'end': 706660, 'begin': 706658}, 'text': '48', 'type': 'Number'}], 'text': '(48)', 'row_index_end': 4, 'row_header_texts_normalized': ['(costs)/income', 'Non-operating adjustment', 'Non-operating retirement-related'], 'cell_id': 'bodyCell-706657-706662'}, {'row_header_ids': ['rowHeader-706379-706394', 'rowHeader-705389-705414', 'rowHeader-705881-705914'], 'column_index_begin': 2, 'row_index_begin': 4, 'row_header_texts': ['(costs)/income', 'Non-operating adjustment', 'Non-operating 
retirement-related'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 706929, 'begin': 706926}, 'attributes': [{'location': {'end': 706928, 'begin': 706926}, 'text': '77', 'type': 'Number'}], 'text': '77', 'row_index_end': 4, 'row_header_texts_normalized': ['(costs)/income', 'Non-operating adjustment', 'Non-operating retirement-related'], 'cell_id': 'bodyCell-706926-706929'}, {'row_header_ids': ['rowHeader-706379-706394', 'rowHeader-705389-705414', 'rowHeader-705881-705914'], 'column_index_begin': 3, 'row_index_begin': 4, 'row_header_texts': ['(costs)/income', 'Non-operating adjustment', 'Non-operating retirement-related'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 707196, 'begin': 707193}, 'attributes': [], 'text': 'NM', 'row_index_end': 4, 'row_header_texts_normalized': ['(costs)/income', 'Non-operating adjustment', 'Non-operating retirement-related'], 'cell_id': 'bodyCell-707193-707196'}, {'row_header_ids': ['rowHeader-707471-707502'], 'column_index_begin': 1, 'row_index_begin': 5, 'row_header_texts': ['Operating (non-GAAP) research,'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 707580, 'begin': 707579}, 'attributes': [], 'text': '', 'row_index_end': 5, 'row_header_texts_normalized': ['Operating (non-GAAP) research,'], 'cell_id': 'bodyCell-707579-707580'}, {'row_header_ids': ['rowHeader-707471-707502'], 'column_index_begin': 2, 'row_index_begin': 5, 'row_header_texts': ['Operating (non-GAAP) research,'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 707645, 'begin': 707644}, 'attributes': [], 'text': '', 'row_index_end': 5, 'row_header_texts_normalized': ['Operating (non-GAAP) research,'], 'cell_id': 
'bodyCell-707644-707645'}, {'row_header_ids': ['rowHeader-707471-707502'], 'column_index_begin': 3, 'row_index_begin': 5, 'row_header_texts': ['Operating (non-GAAP) research,'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 707710, 'begin': 707709}, 'attributes': [], 'text': '', 'row_index_end': 5, 'row_header_texts_normalized': ['Operating (non-GAAP) research,'], 'cell_id': 'bodyCell-707709-707710'}, {'row_header_ids': ['rowHeader-707962-707990', 'rowHeader-707471-707502'], 'column_index_begin': 1, 'row_index_begin': 6, 'row_header_texts': ['development and engineering', 'Operating (non-GAAP) research,'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 708259, 'begin': 708252}, 'attributes': [{'location': {'end': 708258, 'begin': 708252}, 'text': '$5,200', 'type': 'Currency'}], 'text': '$5,200', 'row_index_end': 6, 'row_header_texts_normalized': ['development and engineering', 'Operating (non-GAAP) research,'], 'cell_id': 'bodyCell-708252-708259'}, {'row_header_ids': ['rowHeader-707962-707990', 'rowHeader-707471-707502'], 'column_index_begin': 2, 'row_index_begin': 6, 'row_header_texts': ['development and engineering', 'Operating (non-GAAP) research,'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 708527, 'begin': 708520}, 'attributes': [{'location': {'end': 708526, 'begin': 708520}, 'text': '$5,514', 'type': 'Currency'}], 'text': '$5,514', 'row_index_end': 6, 'row_header_texts_normalized': ['development and engineering', 'Operating (non-GAAP) research,'], 'cell_id': 'bodyCell-708520-708527'}, {'row_header_ids': ['rowHeader-707962-707990', 'rowHeader-707471-707502'], 'column_index_begin': 3, 'row_index_begin': 6, 'row_header_texts': ['development and engineering', 'Operating (non-GAAP) research,'], 
'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 708798, 'begin': 708791}, 'attributes': [{'location': {'end': 708795, 'begin': 708792}, 'text': '5.7', 'type': 'Number'}], 'text': '(5.7)%', 'row_index_end': 6, 'row_header_texts_normalized': ['development and engineering', 'Operating (non-GAAP) research,'], 'cell_id': 'bodyCell-708791-708798'}], 'contexts': [{'location': {'end': 702649, 'begin': 702242}, 'text': 'For the year ended December 31: 2015 2014'}, {'location': {'end': 702865, 'begin': 702855}, 'text': 'Yr.-to-Yr.'}, {'location': {'end': 703240, 'begin': 703046}, 'text': 'Percent\nChange'}, {'location': {'end': 709029, 'begin': 709012}, 'text': 'NM-Not meaningful'}, {'location': {'end': 709287, 'begin': 709227}, 'text': 'Research, development and engineering (RD&E) expense was'}, {'location': {'end': 709775, 'begin': 709521}, 'text': '6.4 percent of revenue in 2015 and 5.9 percent of revenue in 2014.'}, {'location': {'end': 710189, 'begin': 709945}, 'text': 'RD&E expense decreased 3.5 percent in 2015 versus 2014 primarily driven by:'}], 'key_value_pairs': [{'value': [{'location': {'end': 704580, 'begin': 704574}, 'text': '$5,247', 'cell_id': 'bodyCell-704574-704581'}], 'key': {'location': {'end': 704312, 'begin': 704285}, 'text': 'development and engineering', 'cell_id': 'rowHeader-704285-704313'}}, {'value': [{'location': {'end': 708258, 'begin': 708252}, 'text': '$5,200', 'cell_id': 'bodyCell-708252-708259'}], 'key': {'location': {'end': 707989, 'begin': 707962}, 'text': 'development and engineering', 'cell_id': 'rowHeader-707962-707990'}}], 'title': {}, 'column_headers': []}, {'section_title': {'location': {'end': 627943, 'begin': 627925}, 'text': 'Geographic Revenue'}, 'row_headers': [{'column_index_begin': 0, 'row_index_begin': 0, 'location': {'end': 714975, 'begin': 714949}, 'text': 'Sales and other transfers', 'row_index_end': 0, 'cell_id': 'rowHeader-714949-714975', 
'column_index_end': 0, 'text_normalized': 'Sales and other transfers'}, {'column_index_begin': 0, 'row_index_begin': 1, 'location': {'end': 715466, 'begin': 715441}, 'text': 'of intellectual property', 'row_index_end': 1, 'cell_id': 'rowHeader-715441-715466', 'column_index_end': 0, 'text_normalized': 'of intellectual property'}, {'column_index_begin': 0, 'row_index_begin': 2, 'location': {'end': 716567, 'begin': 716538}, 'text': 'Licensing/royalty-based fees', 'row_index_end': 2, 'cell_id': 'rowHeader-716538-716567', 'column_index_end': 0, 'text_normalized': 'Licensing/royalty-based fees'}, {'column_index_begin': 0, 'row_index_begin': 3, 'location': {'end': 717670, 'begin': 717644}, 'text': 'Custom development income', 'row_index_end': 3, 'cell_id': 'rowHeader-717644-717670', 'column_index_end': 0, 'text_normalized': 'Custom development income'}, {'column_index_begin': 0, 'row_index_begin': 4, 'location': {'end': 718753, 'begin': 718747}, 'text': 'Total', 'row_index_end': 4, 'cell_id': 'rowHeader-718747-718753', 'column_index_end': 0, 'text_normalized': 'Total'}], 'table_headers': [], 'location': {'end': 719555, 'begin': 714949}, 'text': 'Sales and other transfers of intellectual property $303 $283 7.1%\nLicensing/royalty-based fees 117 129 (9.8)\nCustom development income 262 330 (20.5)\nTotal $682 $742 (8.1)%\n', 'body_cells': [{'row_header_ids': ['rowHeader-714949-714975'], 'column_index_begin': 1, 'row_index_begin': 0, 'row_header_texts': ['Sales and other transfers'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 715053, 'begin': 715052}, 'attributes': [], 'text': '', 'row_index_end': 0, 'row_header_texts_normalized': ['Sales and other transfers'], 'cell_id': 'bodyCell-715052-715053'}, {'row_header_ids': ['rowHeader-714949-714975'], 'column_index_begin': 2, 'row_index_begin': 0, 'row_header_texts': ['Sales and other transfers'], 'column_header_texts': [], 'column_index_end': 
2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 715118, 'begin': 715117}, 'attributes': [], 'text': '', 'row_index_end': 0, 'row_header_texts_normalized': ['Sales and other transfers'], 'cell_id': 'bodyCell-715117-715118'}, {'row_header_ids': ['rowHeader-714949-714975'], 'column_index_begin': 3, 'row_index_begin': 0, 'row_header_texts': ['Sales and other transfers'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 715183, 'begin': 715182}, 'attributes': [], 'text': '', 'row_index_end': 0, 'row_header_texts_normalized': ['Sales and other transfers'], 'cell_id': 'bodyCell-715182-715183'}, {'row_header_ids': ['rowHeader-715441-715466', 'rowHeader-714949-714975'], 'column_index_begin': 1, 'row_index_begin': 1, 'row_header_texts': ['of intellectual property', 'Sales and other transfers'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 715734, 'begin': 715729}, 'attributes': [{'location': {'end': 715733, 'begin': 715729}, 'text': '$303', 'type': 'Currency'}], 'text': '$303', 'row_index_end': 1, 'row_header_texts_normalized': ['of intellectual property', 'Sales and other transfers'], 'cell_id': 'bodyCell-715729-715734'}, {'row_header_ids': ['rowHeader-715441-715466', 'rowHeader-714949-714975'], 'column_index_begin': 2, 'row_index_begin': 1, 'row_header_texts': ['of intellectual property', 'Sales and other transfers'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 716001, 'begin': 715996}, 'attributes': [{'location': {'end': 716000, 'begin': 715996}, 'text': '$283', 'type': 'Currency'}], 'text': '$283', 'row_index_end': 1, 'row_header_texts_normalized': ['of intellectual property', 'Sales and other transfers'], 'cell_id': 'bodyCell-715996-716001'}, {'row_header_ids': 
['rowHeader-715441-715466', 'rowHeader-714949-714975'], 'column_index_begin': 3, 'row_index_begin': 1, 'row_header_texts': ['of intellectual property', 'Sales and other transfers'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 716269, 'begin': 716264}, 'attributes': [{'location': {'end': 716268, 'begin': 716264}, 'text': '7.1%', 'type': 'Percentage'}], 'text': '7.1%', 'row_index_end': 1, 'row_header_texts_normalized': ['of intellectual property', 'Sales and other transfers'], 'cell_id': 'bodyCell-716264-716269'}, {'row_header_ids': ['rowHeader-716538-716567'], 'column_index_begin': 1, 'row_index_begin': 2, 'row_header_texts': ['Licensing/royalty-based fees'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 716835, 'begin': 716831}, 'attributes': [{'location': {'end': 716834, 'begin': 716831}, 'text': '117', 'type': 'Number'}], 'text': '117', 'row_index_end': 2, 'row_header_texts_normalized': ['Licensing/royalty-based fees'], 'cell_id': 'bodyCell-716831-716835'}, {'row_header_ids': ['rowHeader-716538-716567'], 'column_index_begin': 2, 'row_index_begin': 2, 'row_header_texts': ['Licensing/royalty-based fees'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 717104, 'begin': 717100}, 'attributes': [{'location': {'end': 717103, 'begin': 717100}, 'text': '129', 'type': 'Number'}], 'text': '129', 'row_index_end': 2, 'row_header_texts_normalized': ['Licensing/royalty-based fees'], 'cell_id': 'bodyCell-717100-717104'}, {'row_header_ids': ['rowHeader-716538-716567'], 'column_index_begin': 3, 'row_index_begin': 2, 'row_header_texts': ['Licensing/royalty-based fees'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 717374, 'begin': 
717368}, 'attributes': [{'location': {'end': 717372, 'begin': 717369}, 'text': '9.8', 'type': 'Number'}], 'text': '(9.8)', 'row_index_end': 2, 'row_header_texts_normalized': ['Licensing/royalty-based fees'], 'cell_id': 'bodyCell-717368-717374'}, {'row_header_ids': ['rowHeader-717644-717670'], 'column_index_begin': 1, 'row_index_begin': 3, 'row_header_texts': ['Custom development income'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 717939, 'begin': 717935}, 'attributes': [{'location': {'end': 717938, 'begin': 717935}, 'text': '262', 'type': 'Number'}], 'text': '262', 'row_index_end': 3, 'row_header_texts_normalized': ['Custom development income'], 'cell_id': 'bodyCell-717935-717939'}, {'row_header_ids': ['rowHeader-717644-717670'], 'column_index_begin': 2, 'row_index_begin': 3, 'row_header_texts': ['Custom development income'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 718208, 'begin': 718204}, 'attributes': [{'location': {'end': 718207, 'begin': 718204}, 'text': '330', 'type': 'Number'}], 'text': '330', 'row_index_end': 3, 'row_header_texts_normalized': ['Custom development income'], 'cell_id': 'bodyCell-718204-718208'}, {'row_header_ids': ['rowHeader-717644-717670'], 'column_index_begin': 3, 'row_index_begin': 3, 'row_header_texts': ['Custom development income'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 718473, 'begin': 718466}, 'attributes': [{'location': {'end': 718471, 'begin': 718467}, 'text': '20.5', 'type': 'Number'}], 'text': '(20.5)', 'row_index_end': 3, 'row_header_texts_normalized': ['Custom development income'], 'cell_id': 'bodyCell-718466-718473'}, {'row_header_ids': ['rowHeader-718747-718753'], 'column_index_begin': 1, 'row_index_begin': 4, 'row_header_texts': ['Total'], 
'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 719022, 'begin': 719017}, 'attributes': [{'location': {'end': 719021, 'begin': 719017}, 'text': '$682', 'type': 'Currency'}], 'text': '$682', 'row_index_end': 4, 'row_header_texts_normalized': ['Total'], 'cell_id': 'bodyCell-719017-719022'}, {'row_header_ids': ['rowHeader-718747-718753'], 'column_index_begin': 2, 'row_index_begin': 4, 'row_header_texts': ['Total'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 719287, 'begin': 719282}, 'attributes': [{'location': {'end': 719286, 'begin': 719282}, 'text': '$742', 'type': 'Currency'}], 'text': '$742', 'row_index_end': 4, 'row_header_texts_normalized': ['Total'], 'cell_id': 'bodyCell-719282-719287'}, {'row_header_ids': ['rowHeader-718747-718753'], 'column_index_begin': 3, 'row_index_begin': 4, 'row_header_texts': ['Total'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 719555, 'begin': 719548}, 'attributes': [{'location': {'end': 719552, 'begin': 719549}, 'text': '8.1', 'type': 'Number'}], 'text': '(8.1)%', 'row_index_end': 4, 'row_header_texts_normalized': ['Total'], 'cell_id': 'bodyCell-719548-719555'}], 'contexts': [{'location': {'end': 713812, 'begin': 713406}, 'text': 'For the year ended December 31: 2015 2014'}, {'location': {'end': 714027, 'begin': 714017}, 'text': 'Yr.-to-Yr.'}, {'location': {'end': 714400, 'begin': 714207}, 'text': 'Percent Change'}, {'location': {'end': 720499, 'begin': 719763}, 'text': 'The timing and amount of Sales and other transfers of IP may vary significantly from period to period depending upon the timing of divestitures, economic conditions, industry consolidation and the timing of new patents and know-how development.'}, {'location': {'end': 720730, 'begin': 720500}, 'text': 
'There were no material individual IP transactions in 2015 or 2014.'}, {'location': {'end': 720953, 'begin': 720927}, 'text': 'Other (Income) and Expense'}], 'key_value_pairs': [{'value': [{'location': {'end': 719021, 'begin': 719017}, 'text': '$682', 'cell_id': 'bodyCell-719017-719022'}], 'key': {'location': {'end': 718752, 'begin': 718747}, 'text': 'Total', 'cell_id': 'rowHeader-718747-718753'}}], 'title': {}, 'column_headers': []},
If you look at the JSON, you can see that it also covers other tables under the Geographic Revenue section title; please check pages 56-57 of IBM_Annual_Report_2015. I would say this is an error in neither Text Extensions for Pandas nor WD.
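To see which tables share a section title in output like the JSON above, the tables can be grouped by `section_title.text`. This is a minimal sketch: the helper name `tables_by_section` is hypothetical, and the sample input keeps only the `section_title` and `row_headers` keys from the two tables quoted in the comment.

```python
from collections import defaultdict


def tables_by_section(tables):
    """Map each section title to the first row header of every table under it."""
    groups = defaultdict(list)
    for table in tables:
        title = table.get("section_title", {}).get("text", "")
        headers = table.get("row_headers") or []
        # Use the first row header as a short label for the table.
        label = headers[0]["text"] if headers else ""
        groups[title].append(label)
    return dict(groups)


# Trimmed-down versions of the two tables shown in the comment.
tables = [
    {"section_title": {"text": "Geographic Revenue"},
     "row_headers": [{"text": "Total consolidated research,"}]},
    {"section_title": {"text": "Geographic Revenue"},
     "row_headers": [{"text": "Sales and other transfers"}]},
]
print(tables_by_section(tables))
```

Both sample tables land under "Geographic Revenue", which matches the observation that filtering on the section title alone also picks up unrelated tables.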
frreiss commented on 2022-02-01T00:39:53Z
This table contains more duplicates than it did before. Why is that happening? Is the latest version of Watson Discovery returning multiple copies of the same table?

Monireh2 commented on 2022-02-03T21:25:50Z
I checked this carefully. For 2012-2011, for example, we have two geographic revenue tables, and the same is true for 2011-2010, so we end up with 4 values for America for 2011. You can search for 44,944 to validate this. In IBM_Annual_Report_2012.pdf I can see two Geographic Revenue tables, one for 2012-2011 and another for 2011-2010, which explains why we have 4 values for each region for each year. Watson Discovery has correctly listed both tables for each document.
View / edit / reply to this conversation on ReviewNB frreiss commented on 2022-02-01T00:39:54Z Data from 2018 and 2019 is no longer here. What happened to it? Monireh2 commented on 2022-02-04T00:36:56Z Checking why the data from 2019.pdf has not been processed; I can see that some results have been returned by WD for 2019.pdf! Monireh2 commented on 2022-02-10T23:12:12Z The column_header_texts for the 2018-2019 table is empty in Watson Discovery's JSON output, which is why we were not retaining the rows for the 2019-2018 table: "column_header_texts": [ "", "", "", "", "" ],
   text  row_header_texts_0               column_header_texts  attributes.type  value
0  2019  For the year ended December 31:                       [DateTime]       2019
1  2018  For the year ended December 31:                       [Number]         2018
Monireh2 commented on 2022-02-10T23:14:39Z I am changing the retention condition, or copying the value from the text column into column_header_texts when the text matches the \d{4} regex pattern (a four-digit year), so that the 2018-2019 info is included as well. Monireh2 commented on 2022-02-11T18:02:11Z Created an issue with the Discovery team: https://github.ibm.com/Watson-Discovery/disco-issue-tracker/issues/10974 |
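The workaround described above could look roughly like the following. This is only a sketch under stated assumptions, not the code in the notebook: `fill_year_headers` is a hypothetical name, `column_header_texts` is assumed to hold plain strings (the real column may hold lists), and the sample rows are reduced to the columns shown in the table above.

```python
import pandas as pd

# A cell whose whole text is a four-digit number, e.g. "2019".
YEAR_PATTERN = r"\d{4}"


def fill_year_headers(cells: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: copy a year from `text` into an empty `column_header_texts`."""
    out = cells.copy()
    is_year = out["text"].str.fullmatch(YEAR_PATTERN, na=False)
    header_empty = out["column_header_texts"].fillna("").str.strip() == ""
    mask = is_year & header_empty
    # Only rows that look like a year AND have no header are touched.
    out.loc[mask, "column_header_texts"] = out.loc[mask, "text"]
    return out


# Illustrative rows; the third one is not a year and must stay unchanged.
cells = pd.DataFrame({
    "text": ["2019", "2018", "$682"],
    "row_header_texts_0": ["For the year ended December 31:"] * 2 + ["Total"],
    "column_header_texts": ["", "", ""],
})

fixed = fill_year_headers(cells)
print(fixed[["text", "column_header_texts"]])
```

Matching the whole cell text rather than searching inside it keeps values such as `$5,247` from being mistaken for a year.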
done!
That is weird. It is rendering for me on my local machine.
Resolved. Had changed in final run!
@frreiss: The error does make sense to me. Whenever you get an empty value, you substitute pd.NA for it and print the above error. I can open an issue on that if you think the code should be changed. Please see lines 229-231 here: https://github.com/CODAIT/text-extensions-for-pandas/blob/master/text_extensions_for_pandas/io/watson/tables.py
    except ValueError:
        ans = pd.NA
        print(f"ERROR READING VALUE:\"{val}\"\t Filling with <NA>")
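If the code were changed, one option would be to treat empty cells as expected and only report values that are non-empty but unparseable. The sketch below illustrates that idea only; `parse_cell_value` is a hypothetical helper and the cleaning rules (dropping `$` and `,`) are assumptions, so it should not be read as the actual logic in tables.py.

```python
import pandas as pd


def parse_cell_value(val: str):
    """Hypothetical helper: convert a cell string to float, or pd.NA if that is not possible."""
    cleaned = val.strip().replace("$", "").replace(",", "")
    if cleaned == "":
        # Empty cells are common in these tables, so fill silently.
        return pd.NA
    try:
        return float(cleaned)
    except ValueError:
        # Same message as the snippet quoted above, now only for non-empty values.
        print(f'ERROR READING VALUE:"{val}"\t Filling with <NA>')
        return pd.NA


# Cell texts taken from the JSON output in this thread: a currency value,
# an empty cell, and the "NM" (not meaningful) marker.
values = [parse_cell_value(v) for v in ["$5,247", "", "NM"]]
print(values)
```

With this split, the empty cells in the rows above would no longer produce an error line, while a marker such as `NM` would still be reported.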
{'section_title': {'location': {'end': 627943, 'begin': 627925}, 'text': 'Geographic Revenue'}, 'row_headers': [{'column_index_begin': 0, 'row_index_begin': 0, 'location': {'end': 703825, 'begin': 703796}, 'text': 'Total consolidated research,', 'row_index_end': 0, 'cell_id': 'rowHeader-703796-703825', 'column_index_end': 0, 'text_normalized': 'Total consolidated research,'}, {'column_index_begin': 0, 'row_index_begin': 1, 'location': {'end': 704313, 'begin': 704285}, 'text': 'development and engineering', 'row_index_end': 1, 'cell_id': 'rowHeader-704285-704313', 'column_index_end': 0, 'text_normalized': 'development and engineering'}, {'column_index_begin': 0, 'row_index_begin': 2, 'location': {'end': 705414, 'begin': 705389}, 'text': 'Non-operating adjustment', 'row_index_end': 2, 'cell_id': 'rowHeader-705389-705414', 'column_index_end': 0, 'text_normalized': 'Non-operating adjustment'}, {'column_index_begin': 0, 'row_index_begin': 3, 'location': {'end': 705914, 'begin': 705881}, 'text': 'Non-operating retirement-related', 'row_index_end': 3, 'cell_id': 'rowHeader-705881-705914', 'column_index_end': 0, 'text_normalized': 'Non-operating retirement-related'}, {'column_index_begin': 0, 'row_index_begin': 4, 'location': {'end': 706394, 'begin': 706379}, 'text': '(costs)/income', 'row_index_end': 4, 'cell_id': 'rowHeader-706379-706394', 'column_index_end': 0, 'text_normalized': '(costs)/income'}, {'column_index_begin': 0, 'row_index_begin': 5, 'location': {'end': 707502, 'begin': 707471}, 'text': 'Operating (non-GAAP) research,', 'row_index_end': 5, 'cell_id': 'rowHeader-707471-707502', 'column_index_end': 0, 'text_normalized': 'Operating (non-GAAP) research,'}, {'column_index_begin': 0, 'row_index_begin': 6, 'location': {'end': 707990, 'begin': 707962}, 'text': 'development and engineering', 'row_index_end': 6, 'cell_id': 'rowHeader-707962-707990', 'column_index_end': 0, 'text_normalized': 'development and engineering'}], 'table_headers': [], 'location': {'end': 
708798, 'begin': 703796}, 'text': 'Total consolidated research, development and engineering $5,247 $5,437 (3.5)%\nNon-operating adjustment\n Non-operating retirement-related (costs)/income (48) 77 NM\nOperating (non-GAAP) research,\n development and engineering $5,200 $5,514 (5.7)%\n', 'body_cells': [{'row_header_ids': ['rowHeader-703796-703825'], 'column_index_begin': 1, 'row_index_begin': 0, 'row_header_texts': ['Total consolidated research,'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 703903, 'begin': 703902}, 'attributes': [], 'text': '', 'row_index_end': 0, 'row_header_texts_normalized': ['Total consolidated research,'], 'cell_id': 'bodyCell-703902-703903'}, {'row_header_ids': ['rowHeader-703796-703825'], 'column_index_begin': 2, 'row_index_begin': 0, 'row_header_texts': ['Total consolidated research,'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 703968, 'begin': 703967}, 'attributes': [], 'text': '', 'row_index_end': 0, 'row_header_texts_normalized': ['Total consolidated research,'], 'cell_id': 'bodyCell-703967-703968'}, {'row_header_ids': ['rowHeader-703796-703825'], 'column_index_begin': 3, 'row_index_begin': 0, 'row_header_texts': ['Total consolidated research,'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 704033, 'begin': 704032}, 'attributes': [], 'text': '', 'row_index_end': 0, 'row_header_texts_normalized': ['Total consolidated research,'], 'cell_id': 'bodyCell-704032-704033'}, {'row_header_ids': ['rowHeader-704285-704313', 'rowHeader-703796-703825'], 'column_index_begin': 1, 'row_index_begin': 1, 'row_header_texts': ['development and engineering', 'Total consolidated research,'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 
'column_header_texts_normalized': [], 'location': {'end': 704581, 'begin': 704574}, 'attributes': [{'location': {'end': 704580, 'begin': 704574}, 'text': '$5,247', 'type': 'Currency'}], 'text': '$5,247', 'row_index_end': 1, 'row_header_texts_normalized': ['development and engineering', 'Total consolidated research,'], 'cell_id': 'bodyCell-704574-704581'}, {'row_header_ids': ['rowHeader-704285-704313', 'rowHeader-703796-703825'], 'column_index_begin': 2, 'row_index_begin': 1, 'row_header_texts': ['development and engineering', 'Total consolidated research,'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 704848, 'begin': 704841}, 'attributes': [{'location': {'end': 704847, 'begin': 704841}, 'text': '$5,437', 'type': 'Currency'}], 'text': '$5,437', 'row_index_end': 1, 'row_header_texts_normalized': ['development and engineering', 'Total consolidated research,'], 'cell_id': 'bodyCell-704841-704848'}, {'row_header_ids': ['rowHeader-704285-704313', 'rowHeader-703796-703825'], 'column_index_begin': 3, 'row_index_begin': 1, 'row_header_texts': ['development and engineering', 'Total consolidated research,'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 705118, 'begin': 705111}, 'attributes': [{'location': {'end': 705115, 'begin': 705112}, 'text': '3.5', 'type': 'Number'}], 'text': '(3.5)%', 'row_index_end': 1, 'row_header_texts_normalized': ['development and engineering', 'Total consolidated research,'], 'cell_id': 'bodyCell-705111-705118'}, {'row_header_ids': ['rowHeader-705389-705414'], 'column_index_begin': 1, 'row_index_begin': 2, 'row_header_texts': ['Non-operating adjustment'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 705492, 'begin': 705491}, 'attributes': [], 'text': '', 'row_index_end': 2, 
'row_header_texts_normalized': ['Non-operating adjustment'], 'cell_id': 'bodyCell-705491-705492'}, {'row_header_ids': ['rowHeader-705389-705414'], 'column_index_begin': 2, 'row_index_begin': 2, 'row_header_texts': ['Non-operating adjustment'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 705557, 'begin': 705556}, 'attributes': [], 'text': '', 'row_index_end': 2, 'row_header_texts_normalized': ['Non-operating adjustment'], 'cell_id': 'bodyCell-705556-705557'}, {'row_header_ids': ['rowHeader-705389-705414'], 'column_index_begin': 3, 'row_index_begin': 2, 'row_header_texts': ['Non-operating adjustment'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 705622, 'begin': 705621}, 'attributes': [], 'text': '', 'row_index_end': 2, 'row_header_texts_normalized': ['Non-operating adjustment'], 'cell_id': 'bodyCell-705621-705622'}, {'row_header_ids': ['rowHeader-705881-705914', 'rowHeader-705389-705414'], 'column_index_begin': 1, 'row_index_begin': 3, 'row_header_texts': ['Non-operating retirement-related', 'Non-operating adjustment'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 705992, 'begin': 705991}, 'attributes': [], 'text': '', 'row_index_end': 3, 'row_header_texts_normalized': ['Non-operating retirement-related', 'Non-operating adjustment'], 'cell_id': 'bodyCell-705991-705992'}, {'row_header_ids': ['rowHeader-705881-705914', 'rowHeader-705389-705414'], 'column_index_begin': 2, 'row_index_begin': 3, 'row_header_texts': ['Non-operating retirement-related', 'Non-operating adjustment'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 706057, 'begin': 706056}, 'attributes': [], 'text': '', 'row_index_end': 3, 
'row_header_texts_normalized': ['Non-operating retirement-related', 'Non-operating adjustment'], 'cell_id': 'bodyCell-706056-706057'}, {'row_header_ids': ['rowHeader-705881-705914', 'rowHeader-705389-705414'], 'column_index_begin': 3, 'row_index_begin': 3, 'row_header_texts': ['Non-operating retirement-related', 'Non-operating adjustment'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 706122, 'begin': 706121}, 'attributes': [], 'text': '', 'row_index_end': 3, 'row_header_texts_normalized': ['Non-operating retirement-related', 'Non-operating adjustment'], 'cell_id': 'bodyCell-706121-706122'}, {'row_header_ids': ['rowHeader-706379-706394', 'rowHeader-705389-705414', 'rowHeader-705881-705914'], 'column_index_begin': 1, 'row_index_begin': 4, 'row_header_texts': ['(costs)/income', 'Non-operating adjustment', 'Non-operating retirement-related'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 706662, 'begin': 706657}, 'attributes': [{'location': {'end': 706660, 'begin': 706658}, 'text': '48', 'type': 'Number'}], 'text': '(48)', 'row_index_end': 4, 'row_header_texts_normalized': ['(costs)/income', 'Non-operating adjustment', 'Non-operating retirement-related'], 'cell_id': 'bodyCell-706657-706662'}, {'row_header_ids': ['rowHeader-706379-706394', 'rowHeader-705389-705414', 'rowHeader-705881-705914'], 'column_index_begin': 2, 'row_index_begin': 4, 'row_header_texts': ['(costs)/income', 'Non-operating adjustment', 'Non-operating retirement-related'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 706929, 'begin': 706926}, 'attributes': [{'location': {'end': 706928, 'begin': 706926}, 'text': '77', 'type': 'Number'}], 'text': '77', 'row_index_end': 4, 'row_header_texts_normalized': ['(costs)/income', 'Non-operating 
adjustment', 'Non-operating retirement-related'], 'cell_id': 'bodyCell-706926-706929'}, {'row_header_ids': ['rowHeader-706379-706394', 'rowHeader-705389-705414', 'rowHeader-705881-705914'], 'column_index_begin': 3, 'row_index_begin': 4, 'row_header_texts': ['(costs)/income', 'Non-operating adjustment', 'Non-operating retirement-related'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 707196, 'begin': 707193}, 'attributes': [], 'text': 'NM', 'row_index_end': 4, 'row_header_texts_normalized': ['(costs)/income', 'Non-operating adjustment', 'Non-operating retirement-related'], 'cell_id': 'bodyCell-707193-707196'}, {'row_header_ids': ['rowHeader-707471-707502'], 'column_index_begin': 1, 'row_index_begin': 5, 'row_header_texts': ['Operating (non-GAAP) research,'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 707580, 'begin': 707579}, 'attributes': [], 'text': '', 'row_index_end': 5, 'row_header_texts_normalized': ['Operating (non-GAAP) research,'], 'cell_id': 'bodyCell-707579-707580'}, {'row_header_ids': ['rowHeader-707471-707502'], 'column_index_begin': 2, 'row_index_begin': 5, 'row_header_texts': ['Operating (non-GAAP) research,'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 707645, 'begin': 707644}, 'attributes': [], 'text': '', 'row_index_end': 5, 'row_header_texts_normalized': ['Operating (non-GAAP) research,'], 'cell_id': 'bodyCell-707644-707645'}, {'row_header_ids': ['rowHeader-707471-707502'], 'column_index_begin': 3, 'row_index_begin': 5, 'row_header_texts': ['Operating (non-GAAP) research,'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 707710, 'begin': 707709}, 'attributes': [], 'text': '', 'row_index_end': 5, 
'row_header_texts_normalized': ['Operating (non-GAAP) research,'], 'cell_id': 'bodyCell-707709-707710'}, {'row_header_ids': ['rowHeader-707962-707990', 'rowHeader-707471-707502'], 'column_index_begin': 1, 'row_index_begin': 6, 'row_header_texts': ['development and engineering', 'Operating (non-GAAP) research,'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 708259, 'begin': 708252}, 'attributes': [{'location': {'end': 708258, 'begin': 708252}, 'text': '$5,200', 'type': 'Currency'}], 'text': '$5,200', 'row_index_end': 6, 'row_header_texts_normalized': ['development and engineering', 'Operating (non-GAAP) research,'], 'cell_id': 'bodyCell-708252-708259'}, {'row_header_ids': ['rowHeader-707962-707990', 'rowHeader-707471-707502'], 'column_index_begin': 2, 'row_index_begin': 6, 'row_header_texts': ['development and engineering', 'Operating (non-GAAP) research,'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 708527, 'begin': 708520}, 'attributes': [{'location': {'end': 708526, 'begin': 708520}, 'text': '$5,514', 'type': 'Currency'}], 'text': '$5,514', 'row_index_end': 6, 'row_header_texts_normalized': ['development and engineering', 'Operating (non-GAAP) research,'], 'cell_id': 'bodyCell-708520-708527'}, {'row_header_ids': ['rowHeader-707962-707990', 'rowHeader-707471-707502'], 'column_index_begin': 3, 'row_index_begin': 6, 'row_header_texts': ['development and engineering', 'Operating (non-GAAP) research,'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 708798, 'begin': 708791}, 'attributes': [{'location': {'end': 708795, 'begin': 708792}, 'text': '5.7', 'type': 'Number'}], 'text': '(5.7)%', 'row_index_end': 6, 'row_header_texts_normalized': ['development and engineering', 'Operating (non-GAAP) research,'], 
'cell_id': 'bodyCell-708791-708798'}], 'contexts': [{'location': {'end': 702649, 'begin': 702242}, 'text': 'For the year ended December 31: 2015 2014'}, {'location': {'end': 702865, 'begin': 702855}, 'text': 'Yr.-to-Yr.'}, {'location': {'end': 703240, 'begin': 703046}, 'text': 'Percent\nChange'}, {'location': {'end': 709029, 'begin': 709012}, 'text': 'NM-Not meaningful'}, {'location': {'end': 709287, 'begin': 709227}, 'text': 'Research, development and engineering (RD&E) expense was'}, {'location': {'end': 709775, 'begin': 709521}, 'text': '6.4 percent of revenue in 2015 and 5.9 percent of revenue in 2014.'}, {'location': {'end': 710189, 'begin': 709945}, 'text': 'RD&E expense decreased 3.5 percent in 2015 versus 2014 primarily driven by:'}], 'key_value_pairs': [{'value': [{'location': {'end': 704580, 'begin': 704574}, 'text': '$5,247', 'cell_id': 'bodyCell-704574-704581'}], 'key': {'location': {'end': 704312, 'begin': 704285}, 'text': 'development and engineering', 'cell_id': 'rowHeader-704285-704313'}}, {'value': [{'location': {'end': 708258, 'begin': 708252}, 'text': '$5,200', 'cell_id': 'bodyCell-708252-708259'}], 'key': {'location': {'end': 707989, 'begin': 707962}, 'text': 'development and engineering', 'cell_id': 'rowHeader-707962-707990'}}], 'title': {}, 'column_headers': []}, {'section_title': {'location': {'end': 627943, 'begin': 627925}, 'text': 'Geographic Revenue'}, 'row_headers': [{'column_index_begin': 0, 'row_index_begin': 0, 'location': {'end': 714975, 'begin': 714949}, 'text': 'Sales and other transfers', 'row_index_end': 0, 'cell_id': 'rowHeader-714949-714975', 'column_index_end': 0, 'text_normalized': 'Sales and other transfers'}, {'column_index_begin': 0, 'row_index_begin': 1, 'location': {'end': 715466, 'begin': 715441}, 'text': 'of intellectual property', 'row_index_end': 1, 'cell_id': 'rowHeader-715441-715466', 'column_index_end': 0, 'text_normalized': 'of intellectual property'}, {'column_index_begin': 0, 'row_index_begin': 2, 'location': 
{'end': 716567, 'begin': 716538}, 'text': 'Licensing/royalty-based fees', 'row_index_end': 2, 'cell_id': 'rowHeader-716538-716567', 'column_index_end': 0, 'text_normalized': 'Licensing/royalty-based fees'}, {'column_index_begin': 0, 'row_index_begin': 3, 'location': {'end': 717670, 'begin': 717644}, 'text': 'Custom development income', 'row_index_end': 3, 'cell_id': 'rowHeader-717644-717670', 'column_index_end': 0, 'text_normalized': 'Custom development income'}, {'column_index_begin': 0, 'row_index_begin': 4, 'location': {'end': 718753, 'begin': 718747}, 'text': 'Total', 'row_index_end': 4, 'cell_id': 'rowHeader-718747-718753', 'column_index_end': 0, 'text_normalized': 'Total'}], 'table_headers': [], 'location': {'end': 719555, 'begin': 714949}, 'text': 'Sales and other transfers of intellectual property $303 $283 7.1%\nLicensing/royalty-based fees 117 129 (9.8)\nCustom development income 262 330 (20.5)\nTotal $682 $742 (8.1)%\n', 'body_cells': [{'row_header_ids': ['rowHeader-714949-714975'], 'column_index_begin': 1, 'row_index_begin': 0, 'row_header_texts': ['Sales and other transfers'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 715053, 'begin': 715052}, 'attributes': [], 'text': '', 'row_index_end': 0, 'row_header_texts_normalized': ['Sales and other transfers'], 'cell_id': 'bodyCell-715052-715053'}, {'row_header_ids': ['rowHeader-714949-714975'], 'column_index_begin': 2, 'row_index_begin': 0, 'row_header_texts': ['Sales and other transfers'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 715118, 'begin': 715117}, 'attributes': [], 'text': '', 'row_index_end': 0, 'row_header_texts_normalized': ['Sales and other transfers'], 'cell_id': 'bodyCell-715117-715118'}, {'row_header_ids': ['rowHeader-714949-714975'], 'column_index_begin': 3, 'row_index_begin': 0, 'row_header_texts': ['Sales and 
other transfers'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 715183, 'begin': 715182}, 'attributes': [], 'text': '', 'row_index_end': 0, 'row_header_texts_normalized': ['Sales and other transfers'], 'cell_id': 'bodyCell-715182-715183'}, {'row_header_ids': ['rowHeader-715441-715466', 'rowHeader-714949-714975'], 'column_index_begin': 1, 'row_index_begin': 1, 'row_header_texts': ['of intellectual property', 'Sales and other transfers'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 715734, 'begin': 715729}, 'attributes': [{'location': {'end': 715733, 'begin': 715729}, 'text': '$303', 'type': 'Currency'}], 'text': '$303', 'row_index_end': 1, 'row_header_texts_normalized': ['of intellectual property', 'Sales and other transfers'], 'cell_id': 'bodyCell-715729-715734'}, {'row_header_ids': ['rowHeader-715441-715466', 'rowHeader-714949-714975'], 'column_index_begin': 2, 'row_index_begin': 1, 'row_header_texts': ['of intellectual property', 'Sales and other transfers'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 716001, 'begin': 715996}, 'attributes': [{'location': {'end': 716000, 'begin': 715996}, 'text': '$283', 'type': 'Currency'}], 'text': '$283', 'row_index_end': 1, 'row_header_texts_normalized': ['of intellectual property', 'Sales and other transfers'], 'cell_id': 'bodyCell-715996-716001'}, {'row_header_ids': ['rowHeader-715441-715466', 'rowHeader-714949-714975'], 'column_index_begin': 3, 'row_index_begin': 1, 'row_header_texts': ['of intellectual property', 'Sales and other transfers'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 716269, 'begin': 716264}, 'attributes': [{'location': {'end': 716268, 'begin': 716264}, 
'text': '7.1%', 'type': 'Percentage'}], 'text': '7.1%', 'row_index_end': 1, 'row_header_texts_normalized': ['of intellectual property', 'Sales and other transfers'], 'cell_id': 'bodyCell-716264-716269'}, {'row_header_ids': ['rowHeader-716538-716567'], 'column_index_begin': 1, 'row_index_begin': 2, 'row_header_texts': ['Licensing/royalty-based fees'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 716835, 'begin': 716831}, 'attributes': [{'location': {'end': 716834, 'begin': 716831}, 'text': '117', 'type': 'Number'}], 'text': '117', 'row_index_end': 2, 'row_header_texts_normalized': ['Licensing/royalty-based fees'], 'cell_id': 'bodyCell-716831-716835'}, {'row_header_ids': ['rowHeader-716538-716567'], 'column_index_begin': 2, 'row_index_begin': 2, 'row_header_texts': ['Licensing/royalty-based fees'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 717104, 'begin': 717100}, 'attributes': [{'location': {'end': 717103, 'begin': 717100}, 'text': '129', 'type': 'Number'}], 'text': '129', 'row_index_end': 2, 'row_header_texts_normalized': ['Licensing/royalty-based fees'], 'cell_id': 'bodyCell-717100-717104'}, {'row_header_ids': ['rowHeader-716538-716567'], 'column_index_begin': 3, 'row_index_begin': 2, 'row_header_texts': ['Licensing/royalty-based fees'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 717374, 'begin': 717368}, 'attributes': [{'location': {'end': 717372, 'begin': 717369}, 'text': '9.8', 'type': 'Number'}], 'text': '(9.8)', 'row_index_end': 2, 'row_header_texts_normalized': ['Licensing/royalty-based fees'], 'cell_id': 'bodyCell-717368-717374'}, {'row_header_ids': ['rowHeader-717644-717670'], 'column_index_begin': 1, 'row_index_begin': 3, 'row_header_texts': ['Custom development income'], 
'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 717939, 'begin': 717935}, 'attributes': [{'location': {'end': 717938, 'begin': 717935}, 'text': '262', 'type': 'Number'}], 'text': '262', 'row_index_end': 3, 'row_header_texts_normalized': ['Custom development income'], 'cell_id': 'bodyCell-717935-717939'}, {'row_header_ids': ['rowHeader-717644-717670'], 'column_index_begin': 2, 'row_index_begin': 3, 'row_header_texts': ['Custom development income'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 718208, 'begin': 718204}, 'attributes': [{'location': {'end': 718207, 'begin': 718204}, 'text': '330', 'type': 'Number'}], 'text': '330', 'row_index_end': 3, 'row_header_texts_normalized': ['Custom development income'], 'cell_id': 'bodyCell-718204-718208'}, {'row_header_ids': ['rowHeader-717644-717670'], 'column_index_begin': 3, 'row_index_begin': 3, 'row_header_texts': ['Custom development income'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 718473, 'begin': 718466}, 'attributes': [{'location': {'end': 718471, 'begin': 718467}, 'text': '20.5', 'type': 'Number'}], 'text': '(20.5)', 'row_index_end': 3, 'row_header_texts_normalized': ['Custom development income'], 'cell_id': 'bodyCell-718466-718473'}, {'row_header_ids': ['rowHeader-718747-718753'], 'column_index_begin': 1, 'row_index_begin': 4, 'row_header_texts': ['Total'], 'column_header_texts': [], 'column_index_end': 1, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 719022, 'begin': 719017}, 'attributes': [{'location': {'end': 719021, 'begin': 719017}, 'text': '$682', 'type': 'Currency'}], 'text': '$682', 'row_index_end': 4, 'row_header_texts_normalized': ['Total'], 'cell_id': 'bodyCell-719017-719022'}, {'row_header_ids': 
['rowHeader-718747-718753'], 'column_index_begin': 2, 'row_index_begin': 4, 'row_header_texts': ['Total'], 'column_header_texts': [], 'column_index_end': 2, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 719287, 'begin': 719282}, 'attributes': [{'location': {'end': 719286, 'begin': 719282}, 'text': '$742', 'type': 'Currency'}], 'text': '$742', 'row_index_end': 4, 'row_header_texts_normalized': ['Total'], 'cell_id': 'bodyCell-719282-719287'}, {'row_header_ids': ['rowHeader-718747-718753'], 'column_index_begin': 3, 'row_index_begin': 4, 'row_header_texts': ['Total'], 'column_header_texts': [], 'column_index_end': 3, 'column_header_ids': [], 'column_header_texts_normalized': [], 'location': {'end': 719555, 'begin': 719548}, 'attributes': [{'location': {'end': 719552, 'begin': 719549}, 'text': '8.1', 'type': 'Number'}], 'text': '(8.1)%', 'row_index_end': 4, 'row_header_texts_normalized': ['Total'], 'cell_id': 'bodyCell-719548-719555'}], 'contexts': [{'location': {'end': 713812, 'begin': 713406}, 'text': 'For the year ended December 31: 2015 2014'}, {'location': {'end': 714027, 'begin': 714017}, 'text': 'Yr.-to-Yr.'}, {'location': {'end': 714400, 'begin': 714207}, 'text': 'Percent Change'}, {'location': {'end': 720499, 'begin': 719763}, 'text': 'The timing and amount of Sales and other transfers of IP may vary significantly from period to period depending upon the timing of divestitures, economic conditions, industry consolidation and the timing of new patents and know-how development.'}, {'location': {'end': 720730, 'begin': 720500}, 'text': 'There were no material individual IP transactions in 2015 or 2014.'}, {'location': {'end': 720953, 'begin': 720927}, 'text': 'Other (Income) and Expense'}], 'key_value_pairs': [{'value': [{'location': {'end': 719021, 'begin': 719017}, 'text': '$682', 'cell_id': 'bodyCell-719017-719022'}], 'key': {'location': {'end': 718752, 'begin': 718747}, 'text': 'Total', 'cell_id': 
'rowHeader-718747-718753'}}], 'title': {}, 'column_headers': []},
If you look at the JSON, you can see that it covers multiple tables under the Geographic Revenue section; please check pages 41-43 in IBM_Annual_Report_2016.
b04c32a to ac3c739
Made the table understanding workflow end to end using the WD Python SDK, and included a video tutorial showing how to use the WD tooling up to the point of querying the project.