No errors on Python 3.12 when none of url, urls, or urls_file is given #59

Open · Labels: bug
BurnzZ opened this issue Aug 22, 2024 · 5 comments


BurnzZ commented Aug 22, 2024

Overview

From the following PRs:

We respectively introduced urls_file and urls as new parameters for specifying the input URLs of a crawl, alongside the existing url parameter.

Should none of these 3 parameters be supplied to a crawl, the expected behavior is to raise the following error:

  File "/some_dir/zyte-spider-templates-project/venv/lib/python/site-packages/pydantic/main.py", line 193, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for EcommerceSpiderParams
  Value error, No input parameter defined. Please, define one of: url, urls, urls_file. [type=value_error, input_value={}, input_type=dict]

However, it would seem that when using Python 3.12, the error is not reported at all.
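
For reference, that message comes from a pydantic model validator. A minimal sketch of the kind of check that could produce it (hypothetical; not the actual zyte-spider-templates code):

from typing import List, Optional

from pydantic import BaseModel, model_validator


class SpiderParams(BaseModel):
    # Each input is optional on its own, but at least one must be given.
    url: Optional[str] = None
    urls: Optional[List[str]] = None
    urls_file: Optional[str] = None

    @model_validator(mode="after")
    def check_input(self):
        if not (self.url or self.urls or self.urls_file):
            raise ValueError(
                "No input parameter defined. Please, define one of: "
                "url, urls, urls_file."
            )
        return self


SpiderParams()  # raises the ValidationError shown above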

Code to Reproduce

from scrapy_spider_metadata import Args
from zyte_spider_templates.spiders.ecommerce import EcommerceSpiderParams
from zyte_spider_templates.spiders.base import BaseSpider


class Spider(Args[EcommerceSpiderParams], BaseSpider):
    name = "spider"


if __name__ == "__main__":
    # None of url, urls, or urls_file is supplied, so this should raise
    # a pydantic ValidationError.
    Spider()

Python 3.11

  • python file.py (error raised as expected)
  • scrapy crawl spider (error raised as expected)

Python 3.12

  • python file.py
  • scrapy crawl spider (no error at all)
BurnzZ added the bug label on Aug 22, 2024

wRAR commented Aug 23, 2024

It's basically scrapy/scrapy#6047

The exception bubbles up to the deferred created with self.crawler_process.crawl() in the crawl or runspider command, but that deferred has no errback attached.

(No idea why this situation is handled differently on 3.11 and 3.12 🤷)
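
For what it's worth, Twisted only emits "Unhandled error in Deferred" when a failed Deferred with no errback is destroyed, so whether and when the error shows up depends on the garbage collector. A minimal sketch of that mechanism (plain Twisted, no Scrapy):

import gc
import sys

from twisted.internet import defer
from twisted.python import log

log.startLogging(sys.stdout)

d = defer.Deferred()
d.errback(ValueError("boom"))  # failure with no errback to consume it

# Nothing has been reported yet. The "Unhandled error in Deferred"
# message is only logged once the Deferred is destroyed; gc.collect()
# also covers the case where it is kept alive by a reference cycle.
del d
gc.collect()

If reference-cycle or GC behavior differs between 3.11 and 3.12, that could be where the difference creeps in, though that is speculation.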


wRAR commented Aug 23, 2024

So ideally we just shouldn't rely on unhandled exceptions, unless we fix Scrapy.
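
One possible direction for such a fix, sketched with plain Twisted (not actual Scrapy code): attach an errback to the deferred returned by self.crawler_process.crawl() so the failure is reported deterministically instead of depending on garbage collection.

import sys

from twisted.internet import defer


def report(failure):
    # Surface the failure immediately rather than relying on Twisted's
    # unhandled-Deferred garbage-collection hook.
    failure.printTraceback(sys.stderr)


# Stand-in for the deferred returned by crawler_process.crawl():
d = defer.Deferred()
d.addErrback(report)
d.errback(ValueError("spider failed to initialize"))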


Gallaecio commented Aug 23, 2024

So in Python 3.12+ Twisted no longer reports (exceptions in) unhandled deferreds?


wRAR commented Aug 23, 2024

Not sure what could have changed.

from scrapy import Spider


class MySpider(Spider):
    name = "spider"

    def __init__(self, *args, **kwargs):
        1 / 0  # fail unconditionally during spider initialization

This shows an unhandled exception on both Python versions.


wRAR commented Aug 26, 2024

So far I have been able to minimize it to this:

import scrapy
from pydantic import BaseModel, model_validator


class Model(BaseModel):
    @model_validator(mode="after")
    def foo(self):
        # Fail unconditionally from a model validator; this is the error
        # that gets silently swallowed on Python 3.12.
        raise ValueError()


class Spider(scrapy.Spider):
    name = "spider"

    def __init__(self, *args, **kwargs) -> None:
        Model()
        super().__init__(*args, **kwargs)

Just having e.g. a required field is not enough to trigger this.
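
For contrast, the required-field variant mentioned above, which still raises a ValidationError on Model() but does not reproduce the swallowed error, would look like:

import scrapy
from pydantic import BaseModel


class Model(BaseModel):
    foo: int  # required field, no custom validator


class Spider(scrapy.Spider):
    name = "spider"

    def __init__(self, *args, **kwargs) -> None:
        Model()  # still raises ValidationError, but it is not swallowed
        super().__init__(*args, **kwargs)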
