
Cannot use KubernetesPodOperator as superclass when defining new components #3109

gaelxcowi opened this issue Feb 13, 2023 · 2 comments
Labels: component:pipeline-editor · component:pipeline-runtime · platform: pipeline-Airflow · status:Needs Triage

gaelxcowi commented Feb 13, 2023

Describe the issue
In this scenario I want to make a new Airflow component that is based on (inherits from) the KubernetesPodOperator.

After defining the .py file and registering the component, the new component is not displayed among the available components in the Visual Editor. As far as I can understand, the issue lies in a regex pattern that fails to add KubernetesPodOperator to the operator_bases list, which in turn keeps the class out of the operator_classes list.

Inside elyra/pipeline/airflow/component_parser_airflow.py, the _filter_operator_classes function uses the following regex patterns to decide which operator bases to add:

        # Get class names for package imports that match one of the following patterns,
        # indicating that this class does match a known Operator class as defined in
        # a provider package or core Airflow package
        regex_patterns = [
            re.compile(r"airflow\.providers\.[a-zA-Z0-9_]+\.operators"),  # airflow.providers.*.operators (provider)
            re.compile(r"airflow\.operators\."),  # airflow.operators.* (core Airflow package)
        ]

A quick test shows that it does not correctly identify the kubernetes pod import:

[screenshot: interactive regex test where the provider pattern fails to match the KubernetesPodOperator import path]
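The quick test can be reproduced without the screenshot. The sketch below (assuming the cncf.kubernetes provider's import path for KubernetesPodOperator under Airflow 2.x) shows that the current pattern fails to match:

```python
import re

# Current Elyra pattern: the character class cannot match dots,
# so nested provider packages like cncf.kubernetes are rejected
provider_pattern = re.compile(r"airflow\.providers\.[a-zA-Z0-9_]+\.operators")

# Import path of KubernetesPodOperator in the cncf.kubernetes provider
kpo_module = "airflow.providers.cncf.kubernetes.operators.kubernetes_pod"

print(provider_pattern.search(kpo_module))  # None - no match
```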

A quick fix would be to add a dot (.) to the character class, though I might be missing some perspective:

[screenshot: the same test with the amended pattern, which matches the import path]

Changing the character class from [a-zA-Z0-9_] to [a-zA-Z0-9_.] should then give the desired result.
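A minimal sketch of the proposed change, showing the amended character class matching the same import path:

```python
import re

# Proposed pattern: allowing dots in the character class lets the
# regex traverse nested provider packages such as cncf.kubernetes
amended_pattern = re.compile(r"airflow\.providers\.[a-zA-Z0-9_.]+\.operators")

kpo_module = "airflow.providers.cncf.kubernetes.operators.kubernetes_pod"

match = amended_pattern.search(kpo_module)
print(match.group(0))  # airflow.providers.cncf.kubernetes.operators
```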

Also, on a bit of a side quest: it would be really nice if it were possible to extend that list of patterns, or to somehow register one's own operator sources, provided that they ultimately inherit from BaseOperator.

Deployment information
Describe what you've deployed and how:

  • Elyra version: 3.14.2
  • Operating system: linux
  • Installation source: PyPI
  • Deployment type: Kubernetes

Pipeline runtime environment

  • Apache Airflow (2.4.3)
ptitzler (Member) commented

Unfortunately Elyra currently does not support Airflow 2.x as a runtime. The approach we've taken to extract operator information from [Airflow] provider packages is no longer compatible with the entrypoint-based approach that newer Airflow versions use.

gaelxcowi (Author) commented

@ptitzler I understand that, though I do think this could be a good occasion to refactor so that valid operator bases become configurable - that would allow defining operator bases in a separate library and only using them as parents in the component definition.

Going one level further, one could also define the entire component in a package, so that the component definition would consist only of importing said class.

Allowing "local" operator definitions is still definitely preferable, but as a quick example, something like this:

    # Requires: import ast, inspect; from typing import Dict
    def _parse_all_classes(self, file_contents: str) -> Dict[str, Dict]:
        """
        Parses the contents of the file to retrieve operators from the
        import statement.
        """
        operator_imports = ast.parse(file_contents)

        # Execute the imports in an isolated namespace so the imported
        # classes can be resolved by name below
        namespace: Dict = {}
        exec(file_contents, namespace)

        operator_classes = []
        for alias in operator_imports.body[0].names:
            # Retrieve and re-parse the source of each imported class
            class_string = inspect.getsource(namespace[alias.name])
            class_module = ast.parse(class_string)
            operator_classes.extend(class_module.body)

        return {
            operator.name: {
                "init_function": self._get_class_init_function_def(operator),
                "docstring": self._get_class_docstring(operator) or "",
            }
            for operator in operator_classes
        }

would allow the operator .py file to consist of just the import statement.
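As a rough illustration of that last point (the file content below is a hypothetical import-only component file), the standard ast module alone can already recover the imported class names that such a parser would then resolve and introspect:

```python
import ast

# Hypothetical component file consisting of nothing but an import
file_contents = (
    "from airflow.providers.cncf.kubernetes.operators.kubernetes_pod "
    "import KubernetesPodOperator"
)

tree = ast.parse(file_contents)
import_node = tree.body[0]  # the single ImportFrom statement

# Class names the parser would resolve against the executed namespace
class_names = [alias.name for alias in import_node.names]
print(class_names)  # ['KubernetesPodOperator']
```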
