Feat: related article validation #698

Open
wants to merge 37 commits into
base: master
Changes from 12 commits
71628cf
Adapt the validation class to the new model
Rossi-Luciano Sep 12, 2024
8cd2d4e
Adapt the tests
Rossi-Luciano Sep 12, 2024
4f19b32
Add validation for the 'related-article' attributes
Rossi-Luciano Sep 13, 2024
e1c0ad1
Add validation for the event in 'history'
Rossi-Luciano Sep 13, 2024
641094a
Add tests
Rossi-Luciano Sep 13, 2024
7beefdc
Remove 'related_articles_matches_history_date_validation'
Rossi-Luciano Sep 14, 2024
8e9232f
Adapt the tests
Rossi-Luciano Sep 14, 2024
fdc7fb0
Refactor and extend the 'errata' validation
Rossi-Luciano Sep 22, 2024
e35d721
Adapt and add tests
Rossi-Luciano Sep 22, 2024
5cb9511
Remove 'validation' from 'title'
Rossi-Luciano Sep 22, 2024
96a3999
Adapt the tests
Rossi-Luciano Sep 22, 2024
d3facd9
Refactor the 'errata' validation
Rossi-Luciano Sep 24, 2024
530ad59
Adapt the tests
Rossi-Luciano Sep 24, 2024
80664ef
Add the "related_article_type" argument
Rossi-Luciano Sep 30, 2024
078c2b8
Add test
Rossi-Luciano Sep 30, 2024
3c9f8ef
Remove the 'preprint' validation module
Rossi-Luciano Oct 9, 2024
a1fc865
Remove the 'errata' validation module
Rossi-Luciano Oct 9, 2024
35158fc
Fix the file extension of the controlled-list files
Rossi-Luciano Oct 9, 2024
2af892d
Add 'related_article.json'
Rossi-Luciano Oct 9, 2024
14f341f
Add 'related_article_type_date_type.json'
Rossi-Luciano Oct 9, 2024
619c161
Rename class
Rossi-Luciano Oct 9, 2024
acfb059
Add 'remove_namespaces()'
Rossi-Luciano Oct 9, 2024
6f6b968
Apply formatting
Rossi-Luciano Oct 9, 2024
cfac0fa
Add 'full_tag'
Rossi-Luciano Oct 9, 2024
626fdd3
Apply formatting
Rossi-Luciano Oct 9, 2024
df03ab3
Fix imports
Rossi-Luciano Oct 9, 2024
f919184
Fix and add 'docstring'
Rossi-Luciano Oct 9, 2024
c1a45b2
Remove attribute validation (replaced by order validation)
Rossi-Luciano Oct 9, 2024
2d946ec
Add 'validate_history_date'
Rossi-Luciano Oct 9, 2024
fdc7e89
Add attribute order validation
Rossi-Luciano Oct 9, 2024
3d347e6
Add DOI validation
Rossi-Luciano Oct 9, 2024
4d373d3
Add 'validate'
Rossi-Luciano Oct 9, 2024
47c4598
Add 'validate_related_article_matches_article_type'
Rossi-Luciano Oct 9, 2024
cbb03fe
Adapt the related-article tests
Rossi-Luciano Oct 9, 2024
e324186
Adapt the 'errata' tests
Rossi-Luciano Oct 9, 2024
5946418
Adapt the 'preprint' tests
Rossi-Luciano Oct 9, 2024
aeea305
Merge branch 'master' into feat_realated_article_validation
Rossi-Luciano Oct 9, 2024
159 changes: 41 additions & 118 deletions packtools/sps/validation/errata.py
@@ -1,151 +1,74 @@
 from packtools.sps.validation.utils import format_response
-from packtools.sps.models.related_articles import RelatedItems
-from packtools.sps.models.article_dates import HistoryDates
+from packtools.sps.models.v2.related_articles import RelatedArticles
+from packtools.sps.models.article_dates import ArticleDates


-class ValidationBase:
-    def __init__(self, xml_tree, expected_article_type, expected_related_article_type):
+class RelatedArticlesValidation:
+    def __init__(self, xml_tree, correspondence_list):
         self.xml_tree = xml_tree
-        self.article_lang = xml_tree.get("{http://www.w3.org/XML/1998/namespace}lang")
+        self.correspondence_list = correspondence_list
         self.article_type = xml_tree.find(".").get("article-type")
-        self.expected_article_type = expected_article_type
-        self.expected_related_article_type = expected_related_article_type
-        self.related_articles = self._get_related_articles()
+        self.related_articles = list(RelatedArticles(xml_tree).related_articles())
Member

@Rossi-Luciano all related_articles are being retrieved here. In RelatedArticles, create a method that filters only the related_articles you want to work with, instead of creating the get_related_article_types_by_article_type method in RelatedArticlesValidation.
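
The suggestion in this comment could look roughly like the sketch below. It is only an illustration under stated assumptions: `RelatedArticlesSketch` and `filter_by_types` are hypothetical names, the items are plain dicts rather than parsed XML nodes, and nothing here is the actual packtools model API.

```python
from typing import Dict, Iterable, Iterator, Optional

Item = Dict[str, Optional[str]]


class RelatedArticlesSketch:
    """Hypothetical stand-in for the model class; not the packtools API."""

    def __init__(self, items: Iterable[Item]):
        # Assumption: items are plain dicts instead of parsed XML nodes.
        self._items = list(items)

    def related_articles(self) -> Iterator[Item]:
        # Yield every related article, unfiltered.
        yield from self._items

    def filter_by_types(self, wanted_types: Iterable[str]) -> Iterator[Item]:
        # Hypothetical method: yield only the items whose
        # related-article-type is one of wanted_types.
        wanted = set(wanted_types)
        for item in self.related_articles():
            if item.get("related-article-type") in wanted:
                yield item


items = [
    {"id": "ra1", "related-article-type": "corrected-article"},
    {"id": "ra2", "related-article-type": "preprint"},
]
model = RelatedArticlesSketch(items)
selected = list(model.filter_by_types({"corrected-article"}))
```

With the filter living in the model, the validation class would only ask for the types it cares about and would not need to post-process the full list.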

+        self.history_dates = ArticleDates(xml_tree).history_dates_dict

-    def validate_related_article(self, title, error_level="ERROR"):
-        """
-        Validates the related articles against the expected type and other criteria.
+    def get_related_article_types_by_article_type(self, obtained_article_type):
+        return {item['related-article-type'] for item in self.correspondence_list
+                if item['article-type'] == obtained_article_type}

-        Args:
-            error_level (str, optional): The error level for the validation response. Defaults to "ERROR".
+    def get_related_article_types(self):
+        return {item['related-article-type'] for item in self.related_articles}

-        Yields:
-            dict: A formatted response indicating whether the validation passed or failed.
-        """
-        if self.article_type != self.expected_article_type:
-            return
+    def get_history_events_by_related_article_type(self):
+        obtained_related_article_types = self.get_related_article_types()
+        return {item['date-type'] for item in self.correspondence_list
+                if item['related-article-type'] in obtained_related_article_types and item['date-type']}

-        expected_response = f'at least one <related-article related-article-type="{self.expected_related_article_type}">'
+    def get_history_events(self):
+        return set(self.history_dates.keys())

-        if self.related_articles:
-            yield from (
-                format_response(
-                    title=title,
-                    parent=related_article.get("parent"),
-                    parent_id=related_article.get("parent_id"),
-                    parent_article_type=related_article.get("parent_article_type"),
-                    parent_lang=related_article.get("parent_lang"),
-                    item="related-article",
-                    sub_item="@related-article-type",
-                    validation_type="match",
-                    is_valid=True,
-                    expected=expected_response,
-                    obtained=self._format_obtained(related_article),
-                    advice=None,
-                    data=related_article,
-                    error_level=error_level
-                )
-                for related_article in self.related_articles
-            )
-        else:
+    def validate_related_articles(self, error_level="ERROR"):
+        expected_related_article_types = self.get_related_article_types_by_article_type(self.article_type)
+        obtained_related_article_types = self.get_related_article_types()
+
+        missing_types = expected_related_article_types - obtained_related_article_types
+        if missing_types:
+            related_article_type = next(iter(missing_types))
             yield format_response(
-                title=title,
+                title=f"matching '{self.article_type}' and '{related_article_type}'",
                 parent="article",
                 parent_id=None,
                 parent_article_type=self.article_type,
-                parent_lang=self.article_lang,
+                parent_lang=self.xml_tree.find(".").get("{http://www.w3.org/XML/1998/namespace}lang"),
                 item="related-article",
                 sub_item="@related-article-type",
-                validation_type="exist",
+                validation_type="match",
                 is_valid=False,
-                expected=expected_response,
+                expected=f'at least one <related-article related-article-type="{related_article_type}">',
                 obtained=None,
-                advice=f'provide <related-article related-article-type="{self.expected_related_article_type}">',
-                data=None,
+                advice=f'provide <related-article related-article-type="{related_article_type}">',
+                data=self.related_articles,
                 error_level=error_level
             )

-    def _get_related_articles(self,):
-        return [
-            article for article in RelatedItems(self.xml_tree).related_articles
-            if article.get("related-article-type") == self.expected_related_article_type
-        ]
-
-    def _format_obtained(self, related_article):
-        return (
-            f'<related-article ext-link-type="{related_article.get("ext-link-type")}" '
-            f'id="{related_article.get("id")}" related-article-type="{related_article.get("related-article-type")}" '
-            f'xlink:href="{related_article.get("href")}"/>'
-        )
-
-
-class ErrataValidation(ValidationBase):
-    def __init__(self, xml_tree, expected_article_type, expected_related_article_type):
-        super().__init__(xml_tree, expected_article_type, expected_related_article_type)
-
-    def validate_related_article(self, error_level="ERROR", title="validation matching 'correction' and 'corrected-article'"):
-        """
-        Validates related articles specifically for corrected articles.
-
-        Args:
-            error_level (str, optional): The error level for the validation response. Defaults to "ERROR".
-
-        Yields:
-            dict: A formatted response indicating whether the validation passed or failed.
-        """
-        yield from super().validate_related_article(error_level=error_level, title=title)
-
+    def validate_history_events(self, error_level="ERROR"):
+        expected_history_events = self.get_history_events_by_related_article_type()
+        obtained_history_events = self.get_history_events()

-class CorrectedArticleValidation(ValidationBase):
-    def __init__(self, xml_tree, expected_article_type, expected_related_article_type):
-        super().__init__(xml_tree, expected_article_type, expected_related_article_type)
-        self.history_dates = self._get_history_dates()
-
-    def validate_related_article(self, error_level="ERROR", title="validation matching 'correction' and 'correction-forward'"):
-        """
-        Validates related articles specifically for corrected articles.
-
-        Args:
-            error_level (str, optional): The error level for the validation response. Defaults to "ERROR".
-
-        Yields:
-            dict: A formatted response indicating whether the validation passed or failed.
-        """
-        yield from super().validate_related_article(error_level=error_level, title=title)
-
-    def validate_history_dates(self, error_level="ERROR"):
-        """
-        Validates that the number of related articles matches the number of corresponding corrected dates.
-
-        Args:
-            error_level (str, optional): The error level for the validation response. Defaults to "ERROR".
-
-        Yields:
-            dict: A formatted response indicating whether the validation passed or failed.
-        """
-        history_date_count = len(self.history_dates)
-        related_article_count = len(self.related_articles)
-
-        if history_date_count < related_article_count:
+        missing_events = expected_history_events - obtained_history_events
+        if missing_events:
             yield format_response(
-                title="validation related and corrected dates count",
+                title="exist historical date event for the related-article",
                 parent="article",
                 parent_id=None,
                 parent_article_type=self.article_type,
-                parent_lang=self.article_lang,
+                parent_lang=self.xml_tree.find(".").get("{http://www.w3.org/XML/1998/namespace}lang"),
                 item="related-article",
                 sub_item="@related-article-type",
                 validation_type="exist",
                 is_valid=False,
-                expected='equal numbers of <related-article type="correction-forward"> and <date type="corrected">',
-                obtained=f'{related_article_count} <related-article type="correction-forward"> and {history_date_count} <date type="corrected">',
-                advice='for each <related-article type="correction-forward">, there must be a corresponding <date type="corrected"> in <history>',
+                expected=' '.join([f'<date date-type="{event}">' for event in missing_events]),
+                obtained=None,
+                advice='provide ' + ' '.join([f'<date date-type="{event}">' for event in missing_events]),
                 data=self.history_dates,
                 error_level=error_level,
             )
-
-    def _get_history_dates(self):
-        return [
-            date for date in HistoryDates(self.xml_tree).history_dates()
-            if "corrected" in date.get("history")
-        ]
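
The new errata.py checks both boil down to a set difference between what a correspondence list expects and what the XML provides. A minimal, self-contained sketch of that idea follows. The correspondence rows, the helper names (`missing_values`, `expected_related_types`, `expected_date_types`) and the sample inputs are all hypothetical; the real list is loaded from JSON files added in this PR, and their contents are not shown here.

```python
def missing_values(expected, obtained):
    """Return the expected values that were not obtained, sorted for stable output."""
    return sorted(set(expected) - set(obtained))


# Hypothetical correspondence rows, made up for this illustration.
correspondence_list = [
    {"article-type": "correction",
     "related-article-type": "corrected-article",
     "date-type": None},
    {"article-type": "research-article",
     "related-article-type": "correction-forward",
     "date-type": "corrected"},
]


def expected_related_types(article_type, correspondence):
    # Related-article types required for a given article type.
    return {row["related-article-type"] for row in correspondence
            if row["article-type"] == article_type}


def expected_date_types(obtained_related_types, correspondence):
    # History date types required by the related-article types that are present.
    return {row["date-type"] for row in correspondence
            if row["related-article-type"] in obtained_related_types
            and row["date-type"]}


# Hypothetical values that would normally be extracted from the XML.
related_types_in_xml = {"correction-forward"}
history_events_in_xml = {"received", "accepted"}

missing_related = missing_values(
    expected_related_types("correction", correspondence_list),
    related_types_in_xml,
)
missing_events = missing_values(
    expected_date_types(related_types_in_xml, correspondence_list),
    history_events_in_xml,
)
```

The diff reports a single missing type through `next(iter(missing_types))`, and iteration order over a set of strings is not guaranteed to be stable between runs. Sorting the difference, as in this sketch, is one way to get deterministic output when several types are missing.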
34 changes: 27 additions & 7 deletions packtools/sps/validation/related_articles.py
@@ -1,16 +1,15 @@
-from packtools.sps.models import (
-    related_articles,
-    article_and_subarticles
-)
+from packtools.sps.models.v2.related_articles import RelatedArticles
+from packtools.sps.models import article_and_subarticles, article_dates
Member
robertatakenaka Sep 13, 2024

@Rossi-Luciano if I'm not mistaken, a v2 version of this model was created. If so, which one is more appropriate to use? Please add a comment justifying the choice of one or the other. In theory, if it was necessary to create v2, v2 is the one that should be used.

Collaborator Author

@robertatakenaka in this case the module was renamed from dates to article_dates. It is not in v2, but it is the new module.


 from packtools.sps.validation.exceptions import ValidationRelatedArticleException
 from packtools.sps.validation.utils import format_response


 class RelatedArticlesValidation:
-    def __init__(self, xmltree):
-        self.related_articles = [related for related in related_articles.RelatedItems(xmltree).related_articles]
-        self.article_type = article_and_subarticles.ArticleAndSubArticles(xmltree).main_article_type
+    def __init__(self, xml_tree):
+        self.related_articles = list(RelatedArticles(xml_tree).related_articles())
+        self.article_type = article_and_subarticles.ArticleAndSubArticles(xml_tree).main_article_type
+        self.history_events = list(article_dates.ArticleDates(xml_tree).history_dates_dict)

     def related_articles_matches_article_type_validation(self, correspondence_list=None, error_level="ERROR"):
         """
@@ -136,3 +135,24 @@ def related_articles_doi(self, error_level="ERROR"):
data=related_article,
error_level=error_level
)

+    def related_article_attributes_validation(self, error_level="ERROR"):
+        for related_article in self.related_articles:
+            for attrib in ("related-article-type", "id", "href", "ext-link-type"):
+                if not related_article[attrib]:
+                    yield format_response(
+                        title='Related article attributes validation',
Member

@Rossi-Luciano I would like the title not to contain "validation", to keep it shorter, since everything is a validation. Make the title more specific about which aspect is being validated. In this case, use f'Related article {attrib}'.

+                        parent=related_article.get("parent"),
+                        parent_id=related_article.get("parent_id"),
+                        parent_article_type=related_article.get("parent_article_type"),
+                        parent_lang=related_article.get("parent_lang"),
+                        item='related-article',
+                        sub_item=f'@{attrib}',
+                        validation_type='exist',
+                        is_valid=False,
+                        expected=f"a value for @{attrib}",
+                        obtained=None,
+                        advice=f"Provide a value for @{attrib}",
+                        data=related_article,
+                        error_level=error_level
+                    )
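
The per-attribute title requested in the review comment on this method could be applied as in the sketch below. It is a simplified illustration, not the PR's code: `check_required_attributes` is a hypothetical helper, the result is a plain dict rather than the output of `format_response`, and the related article is assumed to be a plain dict.

```python
REQUIRED_ATTRIBUTES = ("related-article-type", "id", "href", "ext-link-type")


def check_required_attributes(related_article, required=REQUIRED_ATTRIBUTES):
    """Yield one result dict per missing or empty attribute."""
    for attrib in required:
        # .get() treats an absent key the same way as an empty value.
        if not related_article.get(attrib):
            yield {
                # Title names the attribute being checked, as the reviewer asked.
                "title": f"Related article {attrib}",
                "sub_item": f"@{attrib}",
                "is_valid": False,
                "expected": f"a value for @{attrib}",
                "advice": f"Provide a value for @{attrib}",
            }


# Hypothetical related article with an empty href.
sample = {
    "related-article-type": "corrected-article",
    "id": "ra1",
    "href": "",
    "ext-link-type": "doi",
}
results = list(check_required_attributes(sample))
```

The diff indexes `related_article[attrib]` directly. If the model returns a plain dict and omits a key for an absent attribute, that lookup would raise `KeyError`; using `.get()` here is an assumption about a more tolerant variant, not something the PR does.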