Move hardware information into directory structure and add scripting to autopopulate #271

Darin755 · 2024-08-12T15:16:12Z

Checklist

I have read the OpenWISP Contributing Guidelines.
I have manually tested the changes proposed in this pull request.
I have written new test cases for new code and/or updated existing tests for changes to existing code.
I have updated the documentation.

Reference to Existing Issue

Closes #272

Related to #264

Description of Changes

The original ordered dictionary was moved into a file called populatehardware.py. populatehardware.py simply creates the proper directories under targets from the first part of the file name (platform-subplatform) and then it writes all devices matching the first part of the name to devices.json.

After that was completed hardware.py was updated to iterate though the directory structure to pull all the device.json files. It is important to note that hardware.py now imports listdir from os and json. If it fails to load a json for whatever reason it prints a error and moves on.

Darin755 · 2024-08-12T16:52:48Z

I am going to hold off on tests and docs until this is reviewed

nemesifier

@Darin755 thanks for working on this.

I thought we discussed that something useful would be to build a way to parse the OpenWrt source code (a specific branch), extract definitions and merge those definition which we already stored here.

But I don't see here, I see the code has been moved around, but I don't understand the point of this work.

Darin755 · 2024-08-17T02:38:28Z

Thanks for your response. The point is to make it easier to build automated generation scripts in the future. I can write a script to get add devices with a particular chipset instead of trying to handle every device at once.

I evaluated my options for data storage and I found that json is the best option. I looked into doing raw python but that is hard to generate. I also have concerns with mixing code and data as there is a lot to go wrong. With JSON there is never a risk of bad code as it is pure data. if a JSON is corrupt it is skipped and with the separate files the impact of a bad file is minimized.

nemesifier

Thanks for your response. The point is to make it easier to build automated generation scripts in the future. I can write a script to get add devices with a particular chipset instead of trying to handle every device at once.

I evaluated my options for data storage and I found that json is the best option. I looked into doing raw python but that is hard to generate. I also have concerns with mixing code and data as there is a lot to go wrong. With JSON there is never a risk of bad code as it is pure data. if a JSON is corrupt it is skipped and with the separate files the impact of a bad file is minimized.

Thanks for the explanation. I understand this could be a preparation for what comes next.

I need to insist against storing JSON in this repository.

We have automated QA check tools like flake8, isort and black in place to check syntax and formatting of python files, read more about this in Python code conventions.

Moreover, it would be inconsistent with other Django modules we maintain, and consistency in maintaining a project like OpenWISP is important.

Converting JSON to a python data structure is pretty easy, just print the representation of the data structure to a file and let the black formatting tool reformat it.
Syntax or code errors are a non issue here, we have a lot of automated checks that spot those and fail before any of those are pushed to the main branches.

Storing the data in Python files also avoids the unnecessary computation to parse the JSON files which would happen every time the application starts, either with the application HTTP server, the websocket server or when a celery autoscaling worker is spawned, which in real world applications happens often due to the dynamic autoscaling settings, which increase the amount of workers when the application is under pressure.

Please check the build failures, solving QA check failures is explained in the section I linked above, please read it carefully. In short, running openwisp-qa-format should fix most issues automatically.

Darin755 · 2024-08-17T23:58:29Z

Thanks for your response. The point is to make it easier to build automated generation scripts in the future. I can write a script to get add devices with a particular chipset instead of trying to handle every device at once.
I evaluated my options for data storage and I found that json is the best option. I looked into doing raw python but that is hard to generate. I also have concerns with mixing code and data as there is a lot to go wrong. With JSON there is never a risk of bad code as it is pure data. if a JSON is corrupt it is skipped and with the separate files the impact of a bad file is minimized.

Thanks for the explanation. I understand this could be a preparation for what comes next.

I need to insist against storing JSON in this repository.

We have automated QA check tools like flake8, isort and black in place to check syntax and formatting of python files, read more about this in Python code conventions.

Moreover, it would be inconsistent with other Django modules we maintain, and consistency in maintaining a project like OpenWISP is important.

Converting JSON to a python data structure is pretty easy, just print the representation of the data structure to a file and let the black formatting tool reformat it. Syntax or code errors are a non issue here, we have a lot of automated checks that spot those and fail before any of those are pushed to the main branches.

Storing the data in Python files also avoids the unnecessary computation to parse the JSON files which would happen every time the application starts, either with the application HTTP server, the websocket server or when a celery autoscaling worker is spawned, which in real world applications happens often due to the dynamic autoscaling settings, which increase the amount of workers when the application is under pressure.

Please check the build failures, solving QA check failures is explained in the section I linked above, please read it carefully. In short, running openwisp-qa-format should fix most issues automatically.

I understand. However, trying to store data in python is simply not a great option. This is due to both to the reliability and security issues. My concern is that python scripts have the ability to run arbitrary code. This means someone could put a backdoor that is very hard to detect. I am sure you are verifying merges but from my perspective it is risky. The hardware.py before was mostly fine as it was a relatively small file written by humans. The supported devices were just a fancy constant in the program. However, I am looking to automate device addition. When dealing with code it is better to hand review what it is doing to make sure no threats are introduced. However, it is hard to hand review potentially hundreds (if not thousands) of lines of just data. I think there would be a significant attack surface with many points to add a backdoor. If it is a matter of updating the testing that is fine but I think it is time to look into a better way to store constant data. In all reality this change didn't take me very long to do as I automated a lot of it so I am fine accepting that it will not be accepted.

With that being said if there is a secure way to import data from a python file that I missed let me know. Everything I know about secure code tells me that I need to separate data and code. I am sorry for being a pain but I would rather build secure systems if I can. It is entirely possible that I am wrong and if so feel free to correct me.

nemesifier

Thanks for your response. The point is to make it easier to build automated generation scripts in the future. I can write a script to get add devices with a particular chipset instead of trying to handle every device at once.
I evaluated my options for data storage and I found that json is the best option. I looked into doing raw python but that is hard to generate. I also have concerns with mixing code and data as there is a lot to go wrong. With JSON there is never a risk of bad code as it is pure data. if a JSON is corrupt it is skipped and with the separate files the impact of a bad file is minimized.

Thanks for the explanation. I understand this could be a preparation for what comes next.
I need to insist against storing JSON in this repository.
We have automated QA check tools like flake8, isort and black in place to check syntax and formatting of python files, read more about this in Python code conventions.
Moreover, it would be inconsistent with other Django modules we maintain, and consistency in maintaining a project like OpenWISP is important.
Converting JSON to a python data structure is pretty easy, just print the representation of the data structure to a file and let the black formatting tool reformat it. Syntax or code errors are a non issue here, we have a lot of automated checks that spot those and fail before any of those are pushed to the main branches.
Storing the data in Python files also avoids the unnecessary computation to parse the JSON files which would happen every time the application starts, either with the application HTTP server, the websocket server or when a celery autoscaling worker is spawned, which in real world applications happens often due to the dynamic autoscaling settings, which increase the amount of workers when the application is under pressure.
Please check the build failures, solving QA check failures is explained in the section I linked above, please read it carefully. In short, running openwisp-qa-format should fix most issues automatically.

I understand. However, trying to store data in python is simply not a great option. This is due to both to the reliability and security issues. My concern is that python scripts have the ability to run arbitrary code. This means someone could put a backdoor that is very hard to detect. I am sure you are verifying merges but from my perspective it is risky. The hardware.py before was mostly fine as it was a relatively small file written by humans. The supported devices were just a fancy constant in the program. However, I am looking to automate device addition. When dealing with code it is better to hand review what it is doing to make sure no threats are introduced. However, it is hard to hand review potentially hundreds (if not thousands) of lines of just data. I think there would be a significant attack surface with many points to add a backdoor. If it is a matter of updating the testing that is fine but I think it is time to look into a better way to store constant data. In all reality this change didn't take me very long to do as I automated a lot of it so I am fine accepting that it will not be accepted.

With that being said if there is a secure way to import data from a python file that I missed let me know. Everything I know about secure code tells me that I need to separate data and code. I am sorry for being a pain but I would rather build secure systems if I can. It is entirely possible that I am wrong and if so feel free to correct me.

I appreciate your concerns. We have security scans in place and we never merge code without checking and testing it.
Moreover, if we are worried that could happen, it would be pretty easy to write a script that checks that these files contain only data and nothing else, we can than add this check to the CI build.

Darin755 · 2024-08-20T15:37:36Z

I'll look into it and get back to you

Darin755 · 2024-08-20T16:05:53Z

So I can not find a way to do this without using exec(). I can generate python code with file.write("data = "+json.dumps(devobj, indent=4)) but to import it you need to actually execute it which is trickier.

Would it be possible to treat JSON files as python files for formatting and testing? JSON should be able to pass the formatting check with the proper indent set

nemesifier

@Darin755 try this in the ipython shell:

data = {}  # data here
with open('test_data.py', 'w') as file_:
    file_.write(f'data = {str(data)}')

from test_data import data
isinstance(data, dict)
print(data)

Darin755 · 2024-08-27T16:22:51Z

I got it working with python files. It should be ready for review

Darin755 · 2024-10-09T18:33:42Z

Could I get a maintainer to take a look at this?

nemesifier

Thanks for your patience @Darin755, we're currently busy preparing the release so our response may be slow.

I recommend simplifying the directory structure like:

hardware/openwrt/<target>.py

The hardware dir would replace hardware.py. The subdir in hardware indicates the system, now we work mostly with openwrt but it's possible to have other OSes and at some point we'll be headed there.

We can then have just 1 file for target, to avoid having to constantly change this because OpenWrt does change and rename targets often, but we need to have something as stable as possible, we can't be renaming files and moving them around too much.

There's been additions of a few more models on master, so please double check that.

nemesifier · 2024-10-11T16:27:40Z

openwisp_firmware_upgrader/targets/ar71xx/generic/devices.py

@@ -0,0 +1,2 @@
+def returnData():


The camelCase style is not a thing in python projects, not important though here, because I think we should get rid of the function and just leave something like data = {....}.

The data structure should be indented (should be doable using openwisp-qa-format, make sure to use the latest version available in the openwisp-utils repo).

Thanks, I will look into it

added initial script and directories

cc31056

Darin755 changed the title ~~[Draft] added initial script and directories~~ [Draft] Move hardware information into directory structure and add scripting to autopopulate Aug 12, 2024

Darin755 added 2 commits August 12, 2024 09:57

added json generation and json

04f4e47

modified hardware.py to import devices from tree

9bef67f

Darin755 changed the title ~~[Draft] Move hardware information into directory structure and add scripting to autopopulate~~ Move hardware information into directory structure and add scripting to autopopulate Aug 12, 2024

Darin755 marked this pull request as ready for review August 12, 2024 16:53

nemesifier reviewed Aug 16, 2024

View reviewed changes

nemesifier reviewed Aug 17, 2024

View reviewed changes

nemesifier reviewed Aug 19, 2024

View reviewed changes

nemesifier reviewed Aug 20, 2024

View reviewed changes

switched to py

57482f8

nemesifier and others added 2 commits August 28, 2024 12:03

Merge branch 'master' into master

488b060

fixed targets dir (hopefully)

3455ba6

nemesifier requested changes Oct 11, 2024

View reviewed changes

Darin755 added 2 commits October 21, 2024 13:41

added new hardware

b72b5e4

moved devices to new format

94da12f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Move hardware information into directory structure and add scripting to autopopulate #271

Move hardware information into directory structure and add scripting to autopopulate #271

Darin755 commented Aug 12, 2024 •

edited

Loading

Darin755 commented Aug 12, 2024

nemesifier left a comment

Darin755 commented Aug 17, 2024

nemesifier left a comment

Darin755 commented Aug 17, 2024

nemesifier left a comment

Darin755 commented Aug 20, 2024

Darin755 commented Aug 20, 2024 •

edited

Loading

nemesifier left a comment

Darin755 commented Aug 27, 2024 •

edited

Loading

Darin755 commented Oct 9, 2024

nemesifier left a comment

nemesifier Oct 11, 2024

Darin755 Oct 14, 2024

Move hardware information into directory structure and add scripting to autopopulate #271

Are you sure you want to change the base?

Move hardware information into directory structure and add scripting to autopopulate #271

Conversation

Darin755 commented Aug 12, 2024 • edited Loading

Checklist

Reference to Existing Issue

Description of Changes

Darin755 commented Aug 12, 2024

nemesifier left a comment

Choose a reason for hiding this comment

Darin755 commented Aug 17, 2024

nemesifier left a comment

Choose a reason for hiding this comment

Darin755 commented Aug 17, 2024

nemesifier left a comment

Choose a reason for hiding this comment

Darin755 commented Aug 20, 2024

Darin755 commented Aug 20, 2024 • edited Loading

nemesifier left a comment

Choose a reason for hiding this comment

Darin755 commented Aug 27, 2024 • edited Loading

Darin755 commented Oct 9, 2024

nemesifier left a comment

Choose a reason for hiding this comment

nemesifier Oct 11, 2024

Choose a reason for hiding this comment

Darin755 Oct 14, 2024

Choose a reason for hiding this comment

Darin755 commented Aug 12, 2024 •

edited

Loading

Darin755 commented Aug 20, 2024 •

edited

Loading

Darin755 commented Aug 27, 2024 •

edited

Loading