No errors on Python 3.12 when none of url, urls, or urls_file is given #59

Open · Labels: bug
BurnzZ opened this issue Aug 22, 2024 · 5 comments


BurnzZ commented Aug 22, 2024

Overview

From the following PRs:

We respectively introduced urls_file and urls as new parameters for specifying the input URLs of a crawl, alongside the existing url parameter.

Should none of these 3 parameters be supplied to a crawl, the expected behavior is to raise the following error:

  File "/some_dir/zyte-spider-templates-project/venv/lib/python/site-packages/pydantic/main.py", line 193, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for EcommerceSpiderParams
  Value error, No input parameter defined. Please, define one of: url, urls, urls_file. [type=value_error, input_value={}, input_type=dict]

However, it would seem that when using Python 3.12, the error is not reported at all.
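
For reference, that message comes from a pydantic model validator. A minimal sketch of the kind of check that could produce it (hypothetical; not the actual zyte-spider-templates code):

from typing import List, Optional

from pydantic import BaseModel, model_validator


class SpiderParams(BaseModel):
    # Each input is optional on its own, but at least one must be given.
    url: Optional[str] = None
    urls: Optional[List[str]] = None
    urls_file: Optional[str] = None

    @model_validator(mode="after")
    def check_input(self):
        if not (self.url or self.urls or self.urls_file):
            raise ValueError(
                "No input parameter defined. Please, define one of: "
                "url, urls, urls_file."
            )
        return self


SpiderParams()  # raises the ValidationError shown above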

Code to Reproduce

from scrapy_spider_metadata import Args
from zyte_spider_templates.spiders.ecommerce import EcommerceSpiderParams
from zyte_spider_templates.spiders.base import BaseSpider


class Spider(Args[EcommerceSpiderParams], BaseSpider):
    name = "spider"


if __name__ == "__main__":
    # None of url, urls, or urls_file is supplied, so this should raise
    # a pydantic ValidationError.
    Spider()

Python 3.11

  • python file.py (error raised as expected)
  • scrapy crawl spider (error raised as expected)

Python 3.12

  • python file.py
  • scrapy crawl spider (no error at all)
BurnzZ added the bug label on Aug 22, 2024

wRAR commented Aug 23, 2024

It's basically scrapy/scrapy#6047

The exception bubbles up to the deferred created with self.crawler_process.crawl() in the crawl or runspider command, but that deferred has no errback attached.

(No idea why this situation is handled differently on 3.11 and 3.12 🤷)
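
For what it's worth, Twisted only emits "Unhandled error in Deferred" when a failed Deferred with no errback is destroyed, so whether and when the error shows up depends on the garbage collector. A minimal sketch of that mechanism (plain Twisted, no Scrapy):

import gc
import sys

from twisted.internet import defer
from twisted.python import log

log.startLogging(sys.stdout)

d = defer.Deferred()
d.errback(ValueError("boom"))  # failure with no errback to consume it

# Nothing has been reported yet. The "Unhandled error in Deferred"
# message is only logged once the Deferred is destroyed; gc.collect()
# also covers the case where it is kept alive by a reference cycle.
del d
gc.collect()

If reference-cycle or GC behavior differs between 3.11 and 3.12, that could be where the difference creeps in, though that is speculation.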


wRAR commented Aug 23, 2024

So ideally we just shouldn't rely on unhandled exceptions, unless we fix Scrapy.
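
One possible direction for such a fix, sketched with plain Twisted (not actual Scrapy code): attach an errback to the deferred returned by self.crawler_process.crawl() so the failure is reported deterministically instead of depending on garbage collection.

import sys

from twisted.internet import defer


def report(failure):
    # Surface the failure immediately rather than relying on Twisted's
    # unhandled-Deferred garbage-collection hook.
    failure.printTraceback(sys.stderr)


# Stand-in for the deferred returned by crawler_process.crawl():
d = defer.Deferred()
d.addErrback(report)
d.errback(ValueError("spider failed to initialize"))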


Gallaecio commented Aug 23, 2024

So in Python 3.12+ Twisted no longer reports (exceptions in) unhandled deferreds?


wRAR commented Aug 23, 2024

Not sure what could have changed.

from scrapy import Spider


class MySpider(Spider):
    name = "spider"

    def __init__(self, *args, **kwargs):
        1 / 0  # fail unconditionally during spider initialization

This shows an unhandled exception on both Python versions.


wRAR commented Aug 26, 2024

So far I have been able to minimize it to this:

import scrapy
from pydantic import BaseModel, model_validator


class Model(BaseModel):
    @model_validator(mode="after")
    def foo(self):
        # Fail unconditionally from a model validator; this is the error
        # that gets silently swallowed on Python 3.12.
        raise ValueError()


class Spider(scrapy.Spider):
    name = "spider"

    def __init__(self, *args, **kwargs) -> None:
        Model()
        super().__init__(*args, **kwargs)

Just having e.g. a required field is not enough to trigger this.
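
For contrast, the required-field variant mentioned above, which still raises a ValidationError on Model() but does not reproduce the swallowed error, would look like:

import scrapy
from pydantic import BaseModel


class Model(BaseModel):
    foo: int  # required field, no custom validator


class Spider(scrapy.Spider):
    name = "spider"

    def __init__(self, *args, **kwargs) -> None:
        Model()  # still raises ValidationError, but it is not swallowed
        super().__init__(*args, **kwargs)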
