[cwapi] adding named entities and noun chunks counts django stuff #26

jeiranj · 2017-11-03T21:50:37Z

@rmangi @will-horning @rappoport @heavi5ide @alberttoledo

rappoport · 2017-11-03T22:25:04Z

capitolweb/cwapi/models.py

+class SpeakerWordCounts(models.Model):
+    def __str__(self):
+        return ",".join([self.crec_id, self.bioguide_id])
+    bioguide_id = models.CharField(max_length=7, primary_key=True)


Do I understand correctly that these are named entities and noun chunks within a given document (crec_id) attributed to a particular speaker (bioguide_id)? In that case, should the primary key be a compound key of ('bioguide_id', 'crec_id')? Also, does the data currently come from the segments?

Yes, a single row in this table (or a single instance of this class) contains the noun chunk and named entity counts attributed to a given speaker within a single snippet, or a single document if it is a single-speaker document.

The primary key question is a little tricky. It would need to be a compound of bioguide_id, crec_id, and a sequence number for the attributed segments (a person may speak in several separate segments within a single document, so crec_id + bioguide_id alone may not be unique). We'll also want to make bioguide_id a foreign key (see the legislators models for an example of how to do that in Django's ORM) so we can easily retrieve all the segments/documents for a legislator object via the ORM.
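To make the uniqueness point concrete, here is a rough plain-Python sketch (not the PR's code). The function name `assign_segment_keys`, the sequence-number scheme, and the sample IDs are all assumptions made up for illustration. In a Django model, a key like this is commonly enforced with a `unique_together` entry in `Meta` rather than as a literal primary key.

```python
from collections import defaultdict

def assign_segment_keys(segments):
    """Build (crec_id, bioguide_id, seq) keys for attributed segments.

    `segments` is an iterable of (crec_id, bioguide_id) pairs in document
    order. The per-pair counter is a hypothetical sequence-number scheme,
    not something taken from the PR.
    """
    next_seq = defaultdict(int)  # next sequence number for each (crec, bioguide) pair
    keys = []
    for crec_id, bioguide_id in segments:
        pair = (crec_id, bioguide_id)
        keys.append((crec_id, bioguide_id, next_seq[pair]))
        next_seq[pair] += 1
    return keys

# Invented sample data: speaker A000001 speaks twice in the same document.
segments = [
    ("crec-doc-1", "A000001"),
    ("crec-doc-1", "B000002"),
    ("crec-doc-1", "A000001"),
]

keys = assign_segment_keys(segments)
pair_is_unique = len(set(segments)) == len(segments)  # crec + bioguide only
triple_is_unique = len(set(keys)) == len(keys)        # with sequence number
print(pair_is_unique, triple_is_unique)
```

With this sample, the two-field key collides for the repeated speaker, while the three-field key stays distinct.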

will-horning · 2017-11-05T18:26:14Z

capitolweb/workers/crec_parser.py

@@ -253,4 +254,19 @@ def parse_mods_file(self, mods_file):
            noun_chunks = text_utils.named_entity_dedupe(noun_chunks, named_entity_freqs.keys())
            record['noun_chunks'] = str(Counter(noun_chunks).most_common())

+            if bool(record['speaker_ids']):


Why the bool cast?
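For context on the question: Python already applies truth-value testing to an `if` condition, so `if bool(x):` and `if x:` take the same branch. A small sketch, assuming nothing about what type `record['speaker_ids']` actually holds (the helper names and sample values below are made up):

```python
def branch_with_cast(value):
    # Mirrors the style in the diff: explicit bool() around the condition.
    return "taken" if bool(value) else "skipped"

def branch_plain(value):
    # Same check, relying on Python's implicit truth-value testing.
    return "taken" if value else "skipped"

# Invented sample values covering empty and non-empty cases.
samples = [None, "", [], {}, 0, "A000001", ["A000001"], {"id": 1}]
results = [(branch_with_cast(v), branch_plain(v)) for v in samples]
same_everywhere = all(cast == plain for cast, plain in results)
print(same_everywhere)
```

If the intent is only to skip records with no speakers, the plain form reads more idiomatically.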

[cwapi] adding named entities and noun chunks counts django stuff

e68dfbc
