Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move hardware information into directory structure and add scripting to autopopulate #271

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

Darin755
Copy link

@Darin755 Darin755 commented Aug 12, 2024

Checklist

  • I have read the OpenWISP Contributing Guidelines.
  • I have manually tested the changes proposed in this pull request.
  • I have written new test cases for new code and/or updated existing tests for changes to existing code.
  • I have updated the documentation.

Reference to Existing Issue

Closes #272

Related to #264

Description of Changes

The original ordered dictionary was moved into a file called populatehardware.py. populatehardware.py simply creates the proper directories under targets from the first part of the file name (platform-subplatform) and then it writes all devices matching the first part of the name to devices.json.

After that was completed hardware.py was updated to iterate though the directory structure to pull all the device.json files. It is important to note that hardware.py now imports listdir from os and json. If it fails to load a json for whatever reason it prints a error and moves on.

@Darin755 Darin755 changed the title [Draft] added initial script and directories [Draft] Move hardware information into directory structure and add scripting to autopopulate Aug 12, 2024
@Darin755 Darin755 changed the title [Draft] Move hardware information into directory structure and add scripting to autopopulate Move hardware information into directory structure and add scripting to autopopulate Aug 12, 2024
@Darin755
Copy link
Author

I am going to hold off on tests and docs until this is reviewed

@Darin755 Darin755 marked this pull request as ready for review August 12, 2024 16:53
Copy link
Member

@nemesifier nemesifier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Darin755 thanks for working on this.

I thought we discussed that something useful would be to build a way to parse the OpenWrt source code (a specific branch), extract definitions and merge those definition which we already stored here.

But I don't see here, I see the code has been moved around, but I don't understand the point of this work.

@Darin755
Copy link
Author

Thanks for your response. The point is to make it easier to build automated generation scripts in the future. I can write a script to get add devices with a particular chipset instead of trying to handle every device at once.

I evaluated my options for data storage and I found that json is the best option. I looked into doing raw python but that is hard to generate. I also have concerns with mixing code and data as there is a lot to go wrong. With JSON there is never a risk of bad code as it is pure data. if a JSON is corrupt it is skipped and with the separate files the impact of a bad file is minimized.

Copy link
Member

@nemesifier nemesifier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your response. The point is to make it easier to build automated generation scripts in the future. I can write a script to get add devices with a particular chipset instead of trying to handle every device at once.

I evaluated my options for data storage and I found that json is the best option. I looked into doing raw python but that is hard to generate. I also have concerns with mixing code and data as there is a lot to go wrong. With JSON there is never a risk of bad code as it is pure data. if a JSON is corrupt it is skipped and with the separate files the impact of a bad file is minimized.

Thanks for the explanation. I understand this could be a preparation for what comes next.

I need to insist against storing JSON in this repository.

We have automated QA check tools like flake8, isort and black in place to check syntax and formatting of python files, read more about this in Python code conventions.

Moreover, it would be inconsistent with other Django modules we maintain, and consistency in maintaining a project like OpenWISP is important.

Converting JSON to a python data structure is pretty easy, just print the representation of the data structure to a file and let the black formatting tool reformat it.
Syntax or code errors are a non issue here, we have a lot of automated checks that spot those and fail before any of those are pushed to the main branches.

Storing the data in Python files also avoids the unnecessary computation to parse the JSON files which would happen every time the application starts, either with the application HTTP server, the websocket server or when a celery autoscaling worker is spawned, which in real world applications happens often due to the dynamic autoscaling settings, which increase the amount of workers when the application is under pressure.

Please check the build failures, solving QA check failures is explained in the section I linked above, please read it carefully. In short, running openwisp-qa-format should fix most issues automatically.

@Darin755
Copy link
Author

Thanks for your response. The point is to make it easier to build automated generation scripts in the future. I can write a script to get add devices with a particular chipset instead of trying to handle every device at once.
I evaluated my options for data storage and I found that json is the best option. I looked into doing raw python but that is hard to generate. I also have concerns with mixing code and data as there is a lot to go wrong. With JSON there is never a risk of bad code as it is pure data. if a JSON is corrupt it is skipped and with the separate files the impact of a bad file is minimized.

Thanks for the explanation. I understand this could be a preparation for what comes next.

I need to insist against storing JSON in this repository.

We have automated QA check tools like flake8, isort and black in place to check syntax and formatting of python files, read more about this in Python code conventions.

Moreover, it would be inconsistent with other Django modules we maintain, and consistency in maintaining a project like OpenWISP is important.

Converting JSON to a python data structure is pretty easy, just print the representation of the data structure to a file and let the black formatting tool reformat it. Syntax or code errors are a non issue here, we have a lot of automated checks that spot those and fail before any of those are pushed to the main branches.

Storing the data in Python files also avoids the unnecessary computation to parse the JSON files which would happen every time the application starts, either with the application HTTP server, the websocket server or when a celery autoscaling worker is spawned, which in real world applications happens often due to the dynamic autoscaling settings, which increase the amount of workers when the application is under pressure.

Please check the build failures, solving QA check failures is explained in the section I linked above, please read it carefully. In short, running openwisp-qa-format should fix most issues automatically.

I understand. However, trying to store data in python is simply not a great option. This is due to both to the reliability and security issues. My concern is that python scripts have the ability to run arbitrary code. This means someone could put a backdoor that is very hard to detect. I am sure you are verifying merges but from my perspective it is risky. The hardware.py before was mostly fine as it was a relatively small file written by humans. The supported devices were just a fancy constant in the program. However, I am looking to automate device addition. When dealing with code it is better to hand review what it is doing to make sure no threats are introduced. However, it is hard to hand review potentially hundreds (if not thousands) of lines of just data. I think there would be a significant attack surface with many points to add a backdoor. If it is a matter of updating the testing that is fine but I think it is time to look into a better way to store constant data. In all reality this change didn't take me very long to do as I automated a lot of it so I am fine accepting that it will not be accepted.

With that being said if there is a secure way to import data from a python file that I missed let me know. Everything I know about secure code tells me that I need to separate data and code. I am sorry for being a pain but I would rather build secure systems if I can. It is entirely possible that I am wrong and if so feel free to correct me.

Copy link
Member

@nemesifier nemesifier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your response. The point is to make it easier to build automated generation scripts in the future. I can write a script to get add devices with a particular chipset instead of trying to handle every device at once.
I evaluated my options for data storage and I found that json is the best option. I looked into doing raw python but that is hard to generate. I also have concerns with mixing code and data as there is a lot to go wrong. With JSON there is never a risk of bad code as it is pure data. if a JSON is corrupt it is skipped and with the separate files the impact of a bad file is minimized.

Thanks for the explanation. I understand this could be a preparation for what comes next.
I need to insist against storing JSON in this repository.
We have automated QA check tools like flake8, isort and black in place to check syntax and formatting of python files, read more about this in Python code conventions.
Moreover, it would be inconsistent with other Django modules we maintain, and consistency in maintaining a project like OpenWISP is important.
Converting JSON to a python data structure is pretty easy, just print the representation of the data structure to a file and let the black formatting tool reformat it. Syntax or code errors are a non issue here, we have a lot of automated checks that spot those and fail before any of those are pushed to the main branches.
Storing the data in Python files also avoids the unnecessary computation to parse the JSON files which would happen every time the application starts, either with the application HTTP server, the websocket server or when a celery autoscaling worker is spawned, which in real world applications happens often due to the dynamic autoscaling settings, which increase the amount of workers when the application is under pressure.
Please check the build failures, solving QA check failures is explained in the section I linked above, please read it carefully. In short, running openwisp-qa-format should fix most issues automatically.

I understand. However, trying to store data in python is simply not a great option. This is due to both to the reliability and security issues. My concern is that python scripts have the ability to run arbitrary code. This means someone could put a backdoor that is very hard to detect. I am sure you are verifying merges but from my perspective it is risky. The hardware.py before was mostly fine as it was a relatively small file written by humans. The supported devices were just a fancy constant in the program. However, I am looking to automate device addition. When dealing with code it is better to hand review what it is doing to make sure no threats are introduced. However, it is hard to hand review potentially hundreds (if not thousands) of lines of just data. I think there would be a significant attack surface with many points to add a backdoor. If it is a matter of updating the testing that is fine but I think it is time to look into a better way to store constant data. In all reality this change didn't take me very long to do as I automated a lot of it so I am fine accepting that it will not be accepted.

With that being said if there is a secure way to import data from a python file that I missed let me know. Everything I know about secure code tells me that I need to separate data and code. I am sorry for being a pain but I would rather build secure systems if I can. It is entirely possible that I am wrong and if so feel free to correct me.

I appreciate your concerns. We have security scans in place and we never merge code without checking and testing it.
Moreover, if we are worried that could happen, it would be pretty easy to write a script that checks that these files contain only data and nothing else, we can than add this check to the CI build.

@Darin755
Copy link
Author

I'll look into it and get back to you

@Darin755
Copy link
Author

Darin755 commented Aug 20, 2024

So I can not find a way to do this without using exec(). I can generate python code with file.write("data = "+json.dumps(devobj, indent=4)) but to import it you need to actually execute it which is trickier.

Would it be possible to treat JSON files as python files for formatting and testing? JSON should be able to pass the formatting check with the proper indent set

Copy link
Member

@nemesifier nemesifier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Darin755 try this in the ipython shell:

data = {}  # data here
with open('test_data.py', 'w') as file_:
    file_.write(f'data = {str(data)}')

from test_data import data
isinstance(data, dict)
print(data)

@Darin755
Copy link
Author

Darin755 commented Aug 27, 2024

I got it working with python files. It should be ready for review

@Darin755
Copy link
Author

Darin755 commented Oct 9, 2024

Could I get a maintainer to take a look at this?

Copy link
Member

@nemesifier nemesifier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience @Darin755, we're currently busy preparing the release so our response may be slow.

I recommend simplifying the directory structure like:

hardware/openwrt/<target>.py

The hardware dir would replace hardware.py. The subdir in hardware indicates the system, now we work mostly with openwrt but it's possible to have other OSes and at some point we'll be headed there.

We can then have just 1 file for target, to avoid having to constantly change this because OpenWrt does change and rename targets often, but we need to have something as stable as possible, we can't be renaming files and moving them around too much.

There's been additions of a few more models on master, so please double check that.

@@ -0,0 +1,2 @@
def returnData():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The camelCase style is not a thing in python projects, not important though here, because I think we should get rid of the function and just leave something like data = {....}.

The data structure should be indented (should be doable using openwisp-qa-format, make sure to use the latest version available in the openwisp-utils repo).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I will look into it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feature] Store hardware table in directory structure with JSON
2 participants