CP Auto tagger #2503

Closed
wants to merge 77 commits into from
Changes from 65 commits
77 commits
ca483fe
Update imatrics.py
tcp-bhargav Aug 11, 2023
8f942a8
Update imatrics.py
tcp-bhargav Aug 11, 2023
a6d9fd9
V2: Update imatrics.py
tcp-bhargav Aug 15, 2023
3de0597
Update imatrics.py
tcp-bhargav Aug 16, 2023
6a8ee9c
Update imatrics.py
tcp-bhargav Aug 16, 2023
13af372
Update imatrics.py
cpemichalhorak Aug 22, 2023
2396cbe
Update imatrics.py
tcp-bhargav Sep 10, 2023
fd6618f
Update imatrics.py
tcp-bhargav Sep 11, 2023
f73fcc3
Create semaphore.py
tcp-bhargav Sep 16, 2023
c88c82d
Update semaphore.py
tcp-bhargav Sep 19, 2023
7755b40
Update semaphore.py
tcp-bhargav Sep 20, 2023
d388769
Create semaphore.py
tcp-bhargav Sep 20, 2023
ad38e5c
Create semaphore.py
tcp-bhargav Sep 20, 2023
8a3caca
Update semaphore.py
tcp-bhargav Sep 20, 2023
e94651d
Update semaphore.py
tcp-bhargav Sep 25, 2023
032d41d
Update semaphore.py
tcp-bhargav Sep 25, 2023
1dff98c
Update semaphore.py
tcp-bhargav Sep 25, 2023
eaadbbb
Update semaphore.py
tcp-bhargav Sep 25, 2023
171dc12
Update semaphore.py
tcp-bhargav Sep 25, 2023
9f0843e
Update semaphore.py
tcp-bhargav Sep 25, 2023
6dbe94e
Update semaphore.py
tcp-bhargav Sep 25, 2023
0e30fca
Update semaphore.py
tcp-bhargav Sep 25, 2023
19aaacf
Update __init__.py
tcp-bhargav Sep 25, 2023
c2f1cad
Update semaphore.py
tcp-bhargav Sep 25, 2023
ecd98be
Update semaphore.py
tcp-bhargav Sep 25, 2023
ca28e4e
Update semaphore.py
tcp-bhargav Sep 25, 2023
5709de1
Update semaphore.py
tcp-bhargav Sep 25, 2023
34a8797
Update semaphore.py
tcp-bhargav Sep 26, 2023
8008907
Update semaphore.py
tcp-bhargav Sep 26, 2023
b98efbc
Update semaphore.py
tcp-bhargav Sep 26, 2023
e9eb9ac
Update semaphore.py
tcp-bhargav Sep 26, 2023
40f8965
Update semaphore.py
tcp-bhargav Sep 26, 2023
7f6bfff
Update semaphore.py
tcp-bhargav Sep 26, 2023
938a5be
Update semaphore.py
tcp-bhargav Sep 26, 2023
26facd8
Update semaphore.py
tcp-bhargav Sep 26, 2023
8befa27
Update semaphore.py
tcp-bhargav Sep 26, 2023
a0b521e
Update semaphore.py
tcp-bhargav Sep 26, 2023
4c37c6c
Update semaphore.py
tcp-bhargav Sep 26, 2023
f6cdb34
Update semaphore.py
tcp-bhargav Sep 26, 2023
594914a
Update semaphore.py
tcp-bhargav Sep 26, 2023
4e1866c
Update semaphore.py
tcp-bhargav Sep 27, 2023
5a5b162
Update semaphore.py
tcp-bhargav Sep 27, 2023
3cdb227
Update semaphore.py
tcp-bhargav Sep 27, 2023
4fe741e
Update semaphore.py
tcp-bhargav Sep 27, 2023
2620807
Update semaphore.py
tcp-bhargav Sep 27, 2023
07be371
Update imatrics.py with Michal's Code
tcp-bhargav Oct 5, 2023
0d26218
Update imatrics.py
cpemichalhorak Oct 23, 2023
8ff2226
Update imatrics.py
cpemichalhorak Oct 23, 2023
c76f816
Update imatrics.py
cpemichalhorak Oct 26, 2023
1c9036e
Update imatrics.py
cpemichalhorak Oct 26, 2023
4552674
Update semaphore.py
tcp-bhargav Nov 8, 2023
1dfb71c
Update imatrics.py
tcp-bhargav Nov 15, 2023
b8c73ba
Update semaphore.py
tcp-bhargav Nov 15, 2023
7affaa6
Update semaphore.py
tcp-bhargav Nov 16, 2023
cf2b806
Update __init__.py
tcp-bhargav Nov 29, 2023
1115db9
Update semaphore.py
tcp-bhargav Nov 29, 2023
989d35a
Update semaphore.py
tcp-bhargav Dec 5, 2023
5d45786
Update __init__.py
tcp-bhargav Dec 5, 2023
a01fbfd
Update ninjs_formatter.py
tcp-bhargav Dec 13, 2023
a581b73
Update semaphore.py
tcp-bhargav Dec 13, 2023
503787c
Added Comments for better Reference..
tcp-bhargav Dec 15, 2023
2e358de
Added Comments For Better Reference
tcp-bhargav Dec 15, 2023
a9c0144
Removed a couple print Statements.
tcp-bhargav Dec 15, 2023
6b4457d
imatrics changes reverted.
tcp-bhargav Dec 23, 2023
2a954db
Updated with modifications asked by Petr.
tcp-bhargav Dec 23, 2023
ce532de
Create cp_ninjs_formatter
tcp-bhargav Jan 15, 2024
90277f4
Rename cp_ninjs_formatter to cp_ninjs_formatter.py
tcp-bhargav Jan 15, 2024
e23ada9
Update ninjs_formatter.py Reverted back to the Original.
tcp-bhargav Jan 15, 2024
814ec1f
Update __init__.py -- Reverted to Original Code
tcp-bhargav Jan 15, 2024
c481e60
Update __init__.py with Create Tag in KMM Feature
tcp-bhargav Jan 23, 2024
7aa421a
Update semaphore.py in Formatters to work with ninjs_formatter_2
tcp-bhargav Jan 23, 2024
083e518
Update and rename cp_ninjs_formatter.py to ninjs_formatter_2.py
tcp-bhargav Jan 23, 2024
fbe407a
Update __init__.py. Changed ninjs_formatter import to ninjs_formatter_2
tcp-bhargav Jan 23, 2024
1b635c2
Update ninjs_ftp_formatter.py to work with our ninjs_formatter_2
tcp-bhargav Jan 23, 2024
c6e8ff7
Update vocabularies.json
tcp-bhargav Jan 24, 2024
86f8f82
Update semaphore.py
tcp-bhargav Jan 24, 2024
3711753
Update ninjs_formatter_2.py
tcp-bhargav Jan 25, 2024
7 changes: 7 additions & 0 deletions superdesk/publish/formatters/imatrics.py
@@ -1,6 +1,9 @@
from superdesk.text_utils import get_text
from .ninjs_formatter import NINJSFormatter
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

def format_datetime(date):
    return date.isoformat()
@@ -11,6 +14,10 @@ def can_format(self, format_type, article):
        return format_type.lower() == "imatrics" and article.get("type") == "text"

    def _transform_to_ninjs(self, article, subscriber, recursive=True):

        logger.warning('In Formatter IMatrics. lets log.')
        logging.debug("Transforming article to NINJS: %s", article)

        return {
            "uuid": article["guid"],
            "createdTimestamp": format_datetime(article["firstcreated"]),
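Note on the hunk above: it calls `logging.basicConfig(level=logging.DEBUG)` at import time and logs through both the module `logger` and the root-level `logging.debug`. The Python logging HOWTO recommends that library modules only create a named logger and leave handler and level configuration to the application. A minimal standard-library sketch of that pattern, where `transform_to_ninjs` and its output fields are hypothetical stand-ins for the formatter method:

```python
import logging

# Module-level named logger; no basicConfig() here, so importing this
# module never changes the application's global logging setup.
logger = logging.getLogger(__name__)


def transform_to_ninjs(article: dict) -> dict:
    """Hypothetical stand-in for the formatter's _transform_to_ninjs."""
    # Lazy %-style arguments: the message is only rendered if DEBUG is enabled
    # for this logger, and it goes through the module logger, not the root one.
    logger.debug("Transforming article to NINJS: %s", article.get("guid"))
    return {"uuid": article["guid"], "type": article.get("type", "text")}
```

The application (or a test) decides whether these DEBUG records are shown, for example by calling `logger.setLevel(logging.DEBUG)` and attaching a handler.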
47 changes: 43 additions & 4 deletions superdesk/publish/formatters/ninjs_formatter.py
Member

would be good to do those changes in a custom formatter, either that Semaphore one or some CP specific NINJS.
I think those changes would break some existing integrations like with newshub

@@ -75,9 +75,16 @@ def get_locale_name(item, language):

def format_cv_item(item, language):
    """Format item from controlled vocabulary for output."""
    return filter_empty_vals(
        {"code": item.get("qcode"), "name": get_locale_name(item, language), "scheme": item.get("scheme")}
    if item.get("scheme") == "subject":

        return filter_empty_vals(
            {"code": item.get("qcode"), "name": get_locale_name(item, language), "scheme": "http://cv.iptc.org/newscodes/mediatopic/"}
        )
    else:

        return filter_empty_vals(
            {"code": item.get("qcode"), "name": get_locale_name(item, language), "scheme": item.get("scheme")}
        )


class NINJSFormatter(Formatter):
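The reviewer's concern is that rewriting `scheme` inside the shared `format_cv_item` changes the output for every NINJS consumer. A minimal sketch of keeping that mapping local to a CP-specific helper: a lookup table from Superdesk scheme IDs to external URIs, consulted only where that helper is called. `SCHEME_URIS`, `format_cv_item_cp` and this simplified `filter_empty_vals` are hypothetical; only the IPTC media-topic URI comes from the diff.

```python
# Hypothetical mapping of Superdesk CV scheme IDs to external scheme URIs.
# Only the "subject" entry comes from the diff.
SCHEME_URIS = {
    "subject": "http://cv.iptc.org/newscodes/mediatopic/",
}


def filter_empty_vals(data: dict) -> dict:
    """Simplified stand-in: drop keys whose value is None or an empty string."""
    return {key: value for key, value in data.items() if value is not None and value != ""}


def format_cv_item_cp(item: dict) -> dict:
    """Format a CV item, mapping known schemes to their external URI."""
    scheme = item.get("scheme")
    return filter_empty_vals({
        "code": item.get("qcode"),
        # The diff uses get_locale_name(item, language); plain "name" keeps this self-contained.
        "name": item.get("name"),
        # Fall back to the original scheme ID when no mapping is defined.
        "scheme": SCHEME_URIS.get(scheme, scheme),
    })
```

One dictionary lookup replaces the `if`/`else` with two duplicated `return` statements, and adding another scheme becomes a one-line table change.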
@@ -213,8 +220,18 @@ def _transform_to_ninjs(self, article, subscriber, recursive=True):
        else:
            ninjs["priority"] = 5

        if article.get("subject"):
            ninjs["subject"] = self._get_subject(article)
        # Merging Various Entities into Subjects for ninjs Response
        # ---------------------------------------------------------
        # This section of the code is responsible for aggregating different entity types
        # like 'organisation', 'place', 'event', and 'person' along with 'subject' into
        # a single list.


        if article.get("subject") or article.get("organisation") or article.get("place") or article.get("event") or article.get("person"):
            combined_subjects = (self._get_subject(article) + self._get_organisation(article) +
                                 self._get_place(article) + self._get_event(article) +
                                 self._get_person(article))
            ninjs["subject"] = combined_subjects

        if article.get("anpa_category"):
            ninjs["service"] = self._get_service(article)
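The merged `subject` list above can also be built with one loop over the entity field names instead of a long `or` chain plus five getter calls. A minimal sketch, assuming each field holds a list of CV-item dicts as in the diff; `ENTITY_FIELDS` and `combine_subjects` are hypothetical names, and the per-item formatting is reduced to `qcode`/`name`/`scheme` for brevity.

```python
from typing import Dict, List, Optional

# Hypothetical: the article fields folded into NINJS "subject", in output order.
ENTITY_FIELDS = ("subject", "organisation", "place", "event", "person")


def combine_subjects(article: Dict[str, Optional[List[dict]]]) -> List[dict]:
    """Concatenate formatted CV items from every entity field that is present."""
    combined = []
    for field in ENTITY_FIELDS:
        # "or []" treats a missing key and an explicit None the same way.
        for item in article.get(field) or []:
            combined.append({
                "code": item.get("qcode"),
                "name": item.get("name"),
                "scheme": item.get("scheme"),
            })
    return combined
```

The result is non-empty exactly when at least one of the fields has items, so a caller can set `ninjs["subject"]` only when `combine_subjects(article)` returns something, which matches the condition in the diff.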
@@ -414,10 +431,32 @@ def _get_genre(self, article):
        lang = article.get("language", "")
        return [format_cv_item(item, lang) for item in article["genre"]]




    def _get_subject(self, article):
        """Get subject list for article."""
        return [format_cv_item(item, article.get("language", "")) for item in article.get("subject", [])]

    # Updated Code here to fetch Organisations from Article
    def _get_organisation(self, article):
        return [format_cv_item(item, article.get("language", "")) for item in article.get("organisation", [])]

    # Updated Code here to fetch Places from Article
    def _get_place(self, article):
        """Get place list for article."""
        return [format_cv_item(item, article.get("language", "")) for item in article.get("place", [])]

    # Updated Code here to fetch Events from Article
    def _get_event(self, article):
        """Get event list for article."""
        return [format_cv_item(item, article.get("language", "")) for item in article.get("event", [])]

    # Updated Code here to fetch Person from Article
    def _get_person(self, article):
        """Get person list for article."""
        return [format_cv_item(item, article.get("language", "")) for item in article.get("person", [])]

    def _get_service(self, article):
        """Get service list for article.

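The four new getters differ only in the article field they read. A minimal sketch of backing them with a single helper via `functools.partialmethod`; the `EntityGetters` class and `_get_cv_list` helper are hypothetical, and the dict built per item stands in for `format_cv_item(item, language)` from the diff.

```python
from functools import partialmethod


class EntityGetters:
    """Hypothetical mixin: one helper backs every per-field getter."""

    def _get_cv_list(self, article: dict, field: str) -> list:
        language = article.get("language", "")
        return [
            # Stand-in for format_cv_item(item, language); "lang" only shows
            # that the article language is threaded through to each item.
            {"code": item.get("qcode"), "scheme": item.get("scheme"), "lang": language}
            for item in article.get(field) or []
        ]

    # Each getter is _get_cv_list with the field name pre-bound.
    _get_organisation = partialmethod(_get_cv_list, field="organisation")
    _get_place = partialmethod(_get_cv_list, field="place")
    _get_event = partialmethod(_get_cv_list, field="event")
    _get_person = partialmethod(_get_cv_list, field="person")
```

Call sites such as `self._get_place(article)` keep working unchanged, while any fix to the item formatting is made in one place.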
26 changes: 26 additions & 0 deletions superdesk/publish/formatters/semaphore.py
@@ -0,0 +1,26 @@
import logging
from superdesk.text_utils import get_text
from .ninjs_formatter import NINJSFormatter
from superdesk.text_checkers.ai.semaphore import Semaphore # Import the Semaphore integration class

logger = logging.getLogger(__name__)

class SemaphoreFormatter(NINJSFormatter):
    def can_format(self, format_type, article):
        return format_type.lower() == "semaphore" and article.get("type") == "text"

    def _transform_to_ninjs(self, article, subscriber, recursive=True):
        semaphore = Semaphore() # Initialize the Semaphore integration
        formatted_data = {} # Define how you want to format the data for Semaphore

        try:
            # Example: format the data
            formatted_data["uuid"] = article["guid"]
            formatted_data["headline"] = get_text(article["headline"])
            # Add more formatting logic here

        except Exception as e:
            logger.error(f"Error formatting data for Semaphore: {str(e)}")
            formatted_data = {} # Return an empty dictionary in case of an error

        return formatted_data
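As committed, `SemaphoreFormatter._transform_to_ninjs` creates a `Semaphore()` instance it never uses and turns any error into an empty dict, so a formatting failure silently becomes an empty payload. A minimal sketch of a stricter variant, assuming the formatter only needs `uuid` and `headline`: a missing field is logged with its traceback and re-raised so the caller sees the failure. `strip_html` is a hypothetical standard-library stand-in for `superdesk.text_utils.get_text`, and `format_for_semaphore` is illustrative only.

```python
import logging
from html.parser import HTMLParser

logger = logging.getLogger(__name__)


class _TextExtractor(HTMLParser):
    """Collect text nodes; a stand-in for superdesk.text_utils.get_text."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


def format_for_semaphore(article: dict) -> dict:
    """Build the payload; raise instead of returning {} on bad input."""
    try:
        return {
            "uuid": article["guid"],
            "headline": strip_html(article.get("headline") or ""),
        }
    except KeyError as err:
        # Log with traceback, then re-raise so the caller sees the failure.
        logger.exception("Article is missing a field required by Semaphore: %s", err)
        raise
```

Catching only `KeyError` keeps unexpected bugs visible instead of folding them into the same empty result.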
7 changes: 7 additions & 0 deletions superdesk/publish/transmitters/imatrics.py
@@ -3,12 +3,19 @@
from superdesk.publish import register_transmitter
from superdesk.publish.publish_service import PublishService
from superdesk.text_checkers.ai.imatrics import IMatrics
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

class IMatricsTransmitter(PublishService):
    def _transmit(self, queue_item, subscriber):
        imatrics = IMatrics(current_app)
        item = json.loads(queue_item["formatted_item"])

        logger.warning("Logging in Transmitter for IMAtrics")
        logging.debug("Transmitting item: %s", item)

        imatrics.publish(item)


15 changes: 15 additions & 0 deletions superdesk/publish/transmitters/semaphore.py
@@ -0,0 +1,15 @@
from flask import current_app, json

from superdesk.publish import register_transmitter
from superdesk.publish.publish_service import PublishService
from superdesk.text_checkers.ai.semaphore import Semaphore # Import the Semaphore integration class

class SemaphoreTransmitter(PublishService):
    def _transmit(self, queue_item, subscriber):
        semaphore = Semaphore(current_app) # Initialize the Semaphore integration
        item = json.loads(queue_item["formatted_item"])
        # Modify this part to transmit the item using the Semaphore integration
        semaphore.transmit(item)

# Register the Semaphore transmitter
register_transmitter("semaphore", SemaphoreTransmitter(), [])
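The transmitter decodes `formatted_item` from JSON and hands it to the Semaphore client, calling `transmit`, whereas the IMatrics transmitter calls `publish`; the diff does not show whether `Semaphore` exposes a `transmit` method. A minimal sketch that makes the expected client method explicit with a `typing.Protocol` and refuses to send an empty payload. `SemaphoreClient`, `transmit_item` and `FakeClient` are hypothetical names, not part of Superdesk.

```python
import json
from typing import Any, Dict, List, Protocol


class SemaphoreClient(Protocol):
    """Hypothetical interface the transmitter relies on."""

    def transmit(self, item: Dict[str, Any]) -> None: ...


def transmit_item(queue_item: Dict[str, Any], client: SemaphoreClient) -> Dict[str, Any]:
    """Decode the formatter's JSON output and pass it to the client."""
    item = json.loads(queue_item["formatted_item"])
    if not item:
        # An empty payload means formatting produced nothing; do not send it.
        raise ValueError("formatted_item is empty; nothing to transmit")
    client.transmit(item)
    return item


class FakeClient:
    """Test double that records what it was asked to send."""

    def __init__(self) -> None:
        self.sent: List[Dict[str, Any]] = []

    def transmit(self, item: Dict[str, Any]) -> None:
        self.sent.append(item)
```

Passing the client in as a parameter also lets the transmit path be exercised without a running Flask app.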
13 changes: 8 additions & 5 deletions superdesk/text_checkers/ai/__init__.py
@@ -33,11 +33,13 @@ class AIResource(Resource):
            "type": "dict",
            "required": True,
            "schema": {
                "guid": {"type": "string", "required": True},
                "guid": {"type": "string", "required": False},
                "abstract": {"type": "string", "required": False},
                "language": {"type": "string", "required": True},
                "headline": {"type": "string", "nullable": True},
                "body_html": {"type": "string", "required": True},
                "language": {"type": "string", "required": False},
                "headline": {"type": "string", "nullable": False},
                "slugline": {"type": "string", "required": False},
                "searchString": {"type": "string", "required": False},
                "body_html": {"type": "string", "required": False},
            },
        },
        "tags": {
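In Cerberus-style schemas (as used by Eve resources), `required` controls whether a key must be present, and `nullable` controls whether a present key may hold `None`. The change above makes every `item` field optional and `headline` non-nullable. A simplified sketch of just those two rules plus the string type check, not Cerberus itself; `ITEM_SCHEMA` copies the new field rules from the diff, while `check_item` and its messages are hypothetical.

```python
from typing import Any, Dict, List

# Field rules from the updated "item" schema in the diff.
ITEM_SCHEMA: Dict[str, Dict[str, Any]] = {
    "guid": {"type": "string", "required": False},
    "abstract": {"type": "string", "required": False},
    "language": {"type": "string", "required": False},
    "headline": {"type": "string", "nullable": False},
    "slugline": {"type": "string", "required": False},
    "searchString": {"type": "string", "required": False},
    "body_html": {"type": "string", "required": False},
}


def check_item(item: Dict[str, Any], schema: Dict[str, Dict[str, Any]] = ITEM_SCHEMA) -> List[str]:
    """Return error messages using simplified required/nullable/type rules."""
    errors = []
    for field, rules in schema.items():
        if field not in item:
            # Absent keys only matter when the field is required.
            if rules.get("required", False):
                errors.append(f"{field}: required field")
            continue
        value = item[field]
        if value is None:
            # Cerberus treats nullable as False unless it is set.
            if not rules.get("nullable", False):
                errors.append(f"{field}: null value not allowed")
            continue
        if rules.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{field}: must be of string type")
    return errors
```

Under these rules an empty `item` also validates, since no field is required any more, while an explicit `"headline": None` is rejected.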
@@ -88,7 +90,8 @@ def create(self, docs, **kwargs):
        except KeyError:
            raise SuperdeskApiError.notFoundError("{service} service can't be found".format(service=service))

        analyzed_data = service.analyze(item, doc.get("tags"))
        # analyzed_data = service.analyze(item, doc.get("tags"))
        analyzed_data = service.analyze(item)
Member

would need the previous version here

        docs[0].update({"analysis": analyzed_data})
        return [0]

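The reviewer asks to keep `service.analyze(item, doc.get("tags"))`. One way a service could accept both call shapes is an optional `tags` parameter defaulting to `None`. A minimal sketch; `ExampleAIService` and its return value are hypothetical and do not reflect the actual IMatrics or Semaphore service APIs.

```python
from typing import Any, Dict, List, Optional


class ExampleAIService:
    """Hypothetical AI service whose analyze() accepts optional tags."""

    def analyze(
        self,
        item: Dict[str, Any],
        tags: Optional[Dict[str, List[dict]]] = None,
    ) -> Dict[str, Any]:
        # A missing or None "tags" value means "no existing tags".
        existing = tags or {}
        return {
            "headline": item.get("headline"),
            # Sorted tag-group names, just to show the tags were received.
            "tag_groups": sorted(existing),
        }


service = ExampleAIService()
doc = {"item": {"headline": "Budget vote"}, "tags": {"subject": [{"qcode": "x"}]}}
```

With this signature, `service.analyze(item, doc.get("tags"))` works whether or not the request includes `tags`.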
5 changes: 3 additions & 2 deletions superdesk/text_checkers/ai/imatrics.py
@@ -226,9 +226,10 @@ def search_images(self, items: list) -> list:
        data = items
        try:
            r_data = self._search_images(data)
        except Exception:
        except Exception as err:
            logger.exception(err)
            return []
        return [image for image in r_data if type(image["imageUrl"]) == str and image["imageUrl"] != ""]
        return [image for image in r_data if isinstance(image["imageUrl"], str) and image["imageUrl"] != ""]

    def _search_images(self, data, **params):
        return self._request_images(
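The `search_images` change swaps `type(...) == str` for `isinstance`, which also accepts `str` subclasses. The filter still indexes `image["imageUrl"]` directly, so a result without that key raises `KeyError`. A minimal sketch of a more defensive filter, assuming results are dicts shaped as in the diff; `filter_images` is a hypothetical name, and treating whitespace-only URLs as empty is an added assumption.

```python
from typing import Any, Dict, List


def filter_images(results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep results whose imageUrl is a non-blank string."""
    return [
        image
        for image in results
        # .get() tolerates results with no "imageUrl" key instead of raising;
        # .strip() additionally drops whitespace-only URLs.
        if isinstance(image.get("imageUrl"), str) and image["imageUrl"].strip()
    ]
```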