Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove duplicate python module name #31

Open
ryanmrichard opened this issue Aug 18, 2022 · 22 comments
Open

Remove duplicate python module name #31

ryanmrichard opened this issue Aug 18, 2022 · 22 comments

Comments

@ryanmrichard
Copy link
Member

Right now our Python API looks like:

import nwchemex

mm = nwchemex.chemist.pluginplay.ModuleManager()
nwchemex.nwchemex.load_modules(mm)

Namely each top-level Python module keeps its functionality in a submodule of the same name.

I think it makes more sense for the API to be:

import nwchemex

mm = nwchemex.chemist.pluginplay.ModuleManager()
nwchemex.load_modules(mm)

I assume this is an either an issue with the generated __init__.py file or the CMake module for creating the Python bindings.

@wadejong
Copy link

This must be different from from nwchemex import * . Here, I can simply do mm = pluginplay.ModuleManager()

@wadejong
Copy link

See [https://www.codingem.com/python-difference-between-import-and-from-import/] for the difference between them.

@ryanmrichard
Copy link
Member Author

The issue is referring to the nwchemex.nwchemex.load_modules(...) vs. nwchemex.load_modules(...) syntax. We shouldn't have to do from nwchemex import nwchemx (or from nwchem import *) to get the nwchemex.load_modules(...) syntax.

@wadejong
Copy link

Agreed. Let me investigate.

@wadejong
Copy link

Same solution as #34 .

@wadejong
Copy link

import nwchemex imports the package including the nwchemex class and

dir(nwchemex) shows what is loaded
'dir(nwchemex.nwchemex)` shows what functions are available

from nwchemex import nwchemex imports the classes in nwchemex class

from nwchemex import chemist imports the classes in chemist

There is not much we can do about this, this is how the Python structure works.

@ryanmrichard
Copy link
Member Author

ryanmrichard commented Sep 14, 2022

from nwchemex import nwchemex imports the classes in nwchemex class

FTR, there is no nwchemex class.

To reiterate, the problem here is our package setup not how Python works.

I expect:

import nwchemex
nwchemex.load_modules()

to work. I should not have to do:

from nwchemex import nwchemex
nwchemex.load_moddules()

The former code snippet imports the nwchemex package, the latter imports the nwchemex.nwchemex package (which is a subpackage of the nwchemex package). If it helps to think of it in C++ terms, the structure we are mirroring looks like nwchemex::load_modules(); not nwchemex::nwchemex::load_modules().

@wadejong
Copy link

wadejong commented Sep 14, 2022

You are correct, I misspoke.

import nwchemex imports the package namespace that includes the nwchemex namespace

from nwchemex import nwchemex imports the classes in nwchemex namespace

@ryanmrichard
Copy link
Member Author

You are correct, I misspoke.

To be clear are you saying that you agree that it shouldn't be from nwchemex import nwchemex?

@wadejong
Copy link

It's undesirable, but this is a Python feature. To do what you want we need to tell cppyy to ignore the nwchemex namespace (for example).

@ryanmrichard
Copy link
Member Author

End of the day, this syntax is not normal. Users do:

import numpy 
import math
my_array = numpy.ndarray()
two = math.sqrt(4)

not:

from numpy import numpy
from math import math
my_array = numpy.ndarray()
two = math.sqrt(4)

As for resolving this issue, if we have to provide a wrapper to get the syntax to be from nwchemex import load_modules that's fine with me, but the user should not have to do from nwchemex import nwchemex it's non-intuitive, it doesn't mirror the C++, and it looks weird.

@wadejong
Copy link

wadejong commented Sep 14, 2022

Will see if there is a way around this. For reference: https://realpython.com/python-modules-packages/

@wadejong
Copy link

What is your issue with from nwchemex import * ???

from nwchemex import *

mm = pluginplay.ModuleManager()
nwchemex.load_modules(mm)

mol = chemist.Molecule()

With this all of it works, and you can directly access all the modules. There is no way to effectively remove the extra namespace. And, adding a wrapper to initialize only adds another level of namespace.

@ryanmrichard
Copy link
Member Author

First from x import * is in general frowned upon; for various reasons (notably performance) modern Python pushes IWYU (include what you use). Second, in our present case I think that's literally the same as from nwchemex import nwchemex which we agreed is non ideal.

FWIW, I'm pretty sure this entire issue can be resolved with a one line modification to the __init__.py file.

@wadejong
Copy link

Actually, I see where @ryanmrichard was going with the init.py discussion. I think I overlooked what I think he is referring to.

Bert

@ryanmrichard
Copy link
Member Author

I was wrong. I don't think this can be fixed in one line. There's more going on here than I appreciated. I thought we could do something like:

from cppy.gbl.parallelzone import *

or (since cppy.gbl isn't a package)

from cppy.gbl import parallelzone
from parallelzone import *

to strip off the extra parallelzone.

However, while testing my solution I now see that when you do from parallelzone import parallelzone you're actually importing a class called parallelzone_meta that lives in the parallelzone package. The members of that class are the contents of the parallelzone package. This is very unexpected behavior to me.

@ryanmrichard
Copy link
Member Author

I'm incredibly confused by what's going on because I don't understand how import parallelzone imports a class called parallelzone_meta I would have expected the line to be:

from parallelzone import parallelzone_meta

@wadejong
Copy link

This seems to work:

import cppyy.gbl.parallelzone as parallelzone

@ryanmrichard
Copy link
Member Author

Working off of NWChemEx/ParallelZone#75, your solution doesn't work for me:

[ctest] ImportError: Failed to import test module: test_parallelzone
[ctest] Traceback (most recent call last):
[ctest]   File "/usr/lib/python3.10/unittest/loader.py", line 470, in _find_test_path
[ctest]     package = self._get_module_from_name(name)
[ctest]   File "/usr/lib/python3.10/unittest/loader.py", line 377, in _get_module_from_name
[ctest]     __import__(name)
[ctest]   File "/home/ryan/workspaces/NWX/ParallelZone/tests/py_unit_tests/test_parallelzone/__init__.py", line 19, in <module>
[ctest]     from parallelzone import parallelzone
[ctest]   File "/home/ryan/workspaces/NWX/ParallelZone/build/Python/parallelzone/__init__.py", line 28, in <module>
[ctest]     import cppyy.gbl.parallelzone as parallelzone
[ctest] ModuleNotFoundError: No module named 'cppyy.gbl.parallelzone'; 'cppyy.gbl' is not a package
[ctest] 

@wadejong
Copy link

I did not check the init. I simply was doing:

`[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

import parallelzone
import cppyy.gbl.parallelzone as parallelzone
dir(parallelzone)
['add', 'bool', 'class', 'delattr', 'destruct', 'dict', 'dir', 'dispatch', 'doc', 'eq', 'format', 'ge', 'getattribute', 'getitem', 'gt', 'hash', 'init', 'init_subclass', 'invert', 'le', 'lt', 'module', 'mul', 'ne', 'neg', 'new', 'pos', 'python_owns', 'radd', 'reduce', 'reduce_ex', 'repr', 'rmul', 'rsub', 'rtruediv', 'setattr', 'sizeof', 'smartptr', 'str', 'sub', 'subclasshook', 'truediv', 'weakref', 'hash_objects', 'hash_to_string', 'make_file_logger', 'make_hash', 'make_hasher', 'make_null_logger', 'make_stderr_logger', 'make_stdout_logger', 'make_stream_logger']

`

Not sure why this does not work on the init side.

@ryanmrichard
Copy link
Member Author

I think I'm starting to understand what's going on. It appears that Cppyy does not make a parallelzone module or package at all. All the Cppyy symbols live in a global object gbl which is of type _meta and is part of the cppyy module. When you do import cppyy.gbl.parallelzone as parallelzone you're actually bringing a global object parallelzone (which is of type parallelzone_meta and is a member of cppyy.gbl) into scope. When we do that in parallelzone/__init__.py we create a package parallelzone which has a global object parallelzone which lives in it.

So @wadejong I think that you're right, there's nothing we can do at the moment to change this sytax. The current syntax is a result of Python's import semantics and the decisions Cppyy makes. However, I would argue the Cppyy decision to map a C++ namespace to a Python class doesn't make sense. Conceptually it's more of a Python module or a Python package. I don't know, but maybe it's not possible for Cppyy to automate such a mapping?

@wadejong
Copy link

I posed the question to Wim. Let's see what his thoughts are.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants