RO Feedback #626

matyaskopp · 2023-03-24T09:29:00Z

meeting element

extend meeting elements (#parla.term, #parla.sitting)

I haven't found any information about terms or sitting in the meeting elements. This is how other corpora implement it:

ParlaMint/Data/ParlaMint-UA/ParlaMint-UA_2014-12-02-m0.xml

Lines 11 to 13 in 197e5ec

    
    <meeting ana="#parla.term #parla.uni" n="8" corresp="#ВРУ">8</meeting>
    <meeting ana="#parla.session #parla.uni" n="1" corresp="#ВРУ">1</meeting>
    <meeting ana="#parla.sitting #parla.uni" n="2014-12-02" corresp="#ВРУ">2014-12-02</meeting>

I was not able to find term information on the Romanian parliament websites, but I believe it is available there.
And if a single file contains one sitting, then add sitting identification.
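
A minimal sketch of how the three meeting elements could be generated, assuming Python's standard `xml.etree.ElementTree`. The helper name `make_meeting_elements` and its parameters are hypothetical, and the values in the usage line are copied from the UA example above, not from RO data; the chamber category and the organisation reference for RO would have to be filled in by the corpus authors.

```python
import xml.etree.ElementTree as ET


def make_meeting_elements(term, session, sitting, chamber_ana, org_ref):
    """Build <meeting> elements for term, session and sitting (hypothetical helper)."""
    elements = []
    for category, value in (("term", term), ("session", session), ("sitting", sitting)):
        meeting = ET.Element("meeting", {
            # e.g. "#parla.term #parla.uni": meeting category plus chamber category
            "ana": f"#parla.{category} {chamber_ana}",
            "n": str(value),
            "corresp": org_ref,
        })
        meeting.text = str(value)
        elements.append(meeting)
    return elements


# Values taken from the ParlaMint-UA example; RO values are not known here.
for element in make_meeting_elements(8, 1, "2014-12-02", "#parla.uni", "#ВРУ"):
    print(ET.tostring(element, encoding="unicode"))
```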

Missing speech content

speech content

In some files there is no speech content:
https://github.com/romanian-parlamint/ParlaMint/blob/8439dd75ca3c31b89f06bac23eff736a72a6ed6a/Data/ParlaMint-RO/ParlaMint-RO_2000-09-04-id4959.xml

        <note type="time">Şedinţa a început la ora 15,55.</note>
        <note type="chairman">Lucrările au fost conduse de domnul Ion Diaconescu, preşedintele Camerei Deputaţilor, asistat de domnii Andrei Ioan Chiliman şi Acsinte Gaspar, secretari.</note>
        <note type="speaker">Domnul Ion Diaconescu:</note>
        <u ana="#chair" who="#Ion-Diaconescu" xml:id="ParlaMint-RO_2000-09-04-id4959.u1"/>
        <note type="speaker">Domnul Iuliu Ioan Furo:</note>

but the source contains speech contents:
https://www.cdep.ro/pls/steno/steno2015.stenograma?ids=4959&idl=1#S0
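
To see how many files are affected, utterances without any segment can be listed automatically. This is only a sketch using Python's standard library; the function name `empty_utterance_ids` is made up for illustration, and it assumes the component file is passed in as a string.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def empty_utterance_ids(xml_text):
    """Return the xml:id of every <u> element that has no <seg> child."""
    root = ET.fromstring(xml_text)
    empty = []
    for utterance in root.iter(f"{{{TEI_NS}}}u"):
        if utterance.find(f"{{{TEI_NS}}}seg") is None:
            empty.append(utterance.get(XML_ID))
    return empty
```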

Chairman note type

use narrative or president
According to the documentation, narrative or president fits better in this case:
https://github.com/romanian-parlamint/ParlaMint/blob/8439dd75ca3c31b89f06bac23eff736a72a6ed6a/Data/ParlaMint-RO/ParlaMint-RO_2000-09-04-id4959.xml#L125

        <note type="chairman">Lucrările au fost conduse de domnul Ion Diaconescu, preşedintele Camerei Deputaţilor, asistat de domnii Andrei Ioan Chiliman şi Acsinte Gaspar, secretari.</note>

not recognized notes

notes in text

Notes are set in italics in the source, so they are easy to recognize.

https://github.com/romanian-parlamint/ParlaMint/blob/8439dd75ca3c31b89f06bac23eff736a72a6ed6a/Data/ParlaMint-RO/ParlaMint-RO_2000-04-14-id4927.xml#L474

<seg xml:id="ParlaMint-RO_2000-04-14-id4927.u39.seg6">Cine este pentru?(Vociferără în partea dreaptă a sălii).Vă rog să număraţi... Vă rog să ridicaţi mâna, cei care sunteţi pentru acest amendament, să repetăm numărătoarea. Este o confuzie.</seg>

should be: (https://clarin-eric.github.io/ParlaMint/#TEI.vocal)

<seg xml:id="ParlaMint-RO_2000-04-14-id4927.u39.seg6">Cine este pentru? <vocal type="shouting">
    <desc>(Vociferără în partea dreaptă a sălii)</desc>
  </vocal> Vă rog să număraţi... Vă rog să ridicaţi mâna, cei care sunteţi pentru acest amendament, să repetăm numărătoarea. Este o confuzie.</seg>
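
A rough sketch of how such inline notes could be separated from the spoken text before they are wrapped in `<vocal>`, `<kinesic>` or `<note>`. It is a heuristic that looks only at parentheses (the italics in the source would be a more reliable signal), it treats a full stop directly after the closing parenthesis as part of the note (an assumption), and the names `NOTE_RE` and `split_inline_notes` are invented for this example.

```python
import re

# A parenthesised remark, optionally followed by a full stop, with the
# surrounding whitespace consumed so the parts can be re-joined cleanly.
NOTE_RE = re.compile(r"\s*(\([^()]*\)\.?)\s*")


def split_inline_notes(text):
    """Split segment text into ("text", ...) and ("note", ...) parts."""
    parts = []
    # re.split with one capturing group alternates: text, note, text, ...
    for index, piece in enumerate(NOTE_RE.split(text)):
        if not piece:
            continue
        kind = "note" if index % 2 == 1 else "text"
        parts.append((kind, piece))
    return parts
```

Serialising the parts with a single space between them keeps a note from ending up glued to a neighbouring token.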

presence list

presence list is missing status

https://github.com/romanian-parlamint/ParlaMint/blob/8439dd75ca3c31b89f06bac23eff736a72a6ed6a/Data/ParlaMint-RO/ParlaMint-RO_2000-04-14-id4927.xml#L510-L513

        <u ana="#regular" who="#Andrei-Ioan-Chiliman" xml:id="ParlaMint-RO_2000-04-14-id4927.u46">
          <seg xml:id="ParlaMint-RO_2000-04-14-id4927.u46.seg1">Achimescu Victor Ştefan</seg>
          <seg xml:id="ParlaMint-RO_2000-04-14-id4927.u46.seg2">Aferăriţei Constantin</seg>
          <seg xml:id="ParlaMint-RO_2000-04-14-id4927.u46.seg3">Afrăsinei Viorica</seg>

corpus timespan

corpus timespan bibl
corpus timespan setting
corpus timespan it would be nice to have it in text content of corpus title too

https://github.com/romanian-parlamint/ParlaMint/blob/8439dd75ca3c31b89f06bac23eff736a72a6ed6a/Data/ParlaMint-RO/ParlaMint-RO.xml#L72

        <bibl>
          <title type="main" xml:lang="en">Meeting minutes of the Romanian Parliament</title>
          <title type="main" xml:lang="ro">Stenograme ale şedinţelor din Parlamentul României</title>
          <idno type="URI">http://www.parlament.ro/</idno>
          <date from="2000-02-01" to="2020-11-24">2000-02-01 - 2020-11-24</date>
        </bibl>

https://github.com/romanian-parlamint/ParlaMint/blob/8439dd75ca3c31b89f06bac23eff736a72a6ed6a/Data/ParlaMint-RO/ParlaMint-RO.xml#L252

        <setting>
          <name type="city">Bucharest</name>
          <name type="place">Palace of the Parliament</name>
          <date from="2000-02-01" to="2020-11-24"/>
        </setting>

setting element

setting element in root file

root file setting element should correspond to component ones (missing country)

https://github.com/romanian-parlamint/ParlaMint/blob/8439dd75ca3c31b89f06bac23eff736a72a6ed6a/Data/ParlaMint-RO/ParlaMint-RO.xml#L249-L253

        <setting>
          <name type="city">Bucharest</name>
          <name type="place">Palace of the Parliament</name>
          <date from="2000-02-01" to="2020-11-24"/>
        </setting>

vs:
https://github.com/romanian-parlamint/ParlaMint/blob/8439dd75ca3c31b89f06bac23eff736a72a6ed6a/Data/ParlaMint-RO/ParlaMint-RO_2000-04-14-id4927.xml#L97-L101

        <setting>
          <name type="city">Bucharest</name>
          <name type="country" key="RO">Romania</name>
          <date when="2000-04-14" ana="#parla.sitting">14.04.2000</date>
        </setting>

capitalize surname

don't capitalize surname

https://github.com/romanian-parlamint/ParlaMint/blob/8439dd75ca3c31b89f06bac23eff736a72a6ed6a/Data/ParlaMint-RO/ParlaMint-RO.xml#L384

              <surname>GORGHIU</surname>

should be

              <surname>Gorghiu</surname>
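
One possible way to normalise the surnames, sketched with Python's built-in string methods. The function name is hypothetical, and `str.title()` is a blunt tool: it also capitalises name particles and the letter after an apostrophe, so the result would need a manual check.

```python
def normalize_surname(surname):
    """Convert an all-uppercase surname to title case; leave other names untouched."""
    if surname.isupper():
        return surname.title()
    return surname


print(normalize_surname("GORGHIU"))
```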

sort component files

sort component files

The component files should be ordered according to the contents' date.
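
Because the date is part of every component file name, the ordering can be derived from the names alone. The sketch below assumes the naming pattern seen in this thread (`ParlaMint-RO_YYYY-MM-DD-idNNNN.xml`); `DATE_RE`, `sort_key` and `sort_components` are illustrative names, and using the numeric id as a tie-breaker for files from the same day is an assumption.

```python
import re

DATE_RE = re.compile(r"_(\d{4}-\d{2}-\d{2})-id(\d+)")


def sort_key(filename):
    """Return (ISO date, numeric id) extracted from a component file name."""
    match = DATE_RE.search(filename)
    if match is None:
        raise ValueError(f"unexpected component file name: {filename}")
    # ISO dates sort correctly as plain strings.
    return (match.group(1), int(match.group(2)))


def sort_components(filenames):
    """Order component files by the date of their contents."""
    return sorted(filenames, key=sort_key)
```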

taxonomies

translations
wrong language context - English content in xml:lang="ro"
missing descriptions


RePierre · 2023-03-26T10:05:28Z

Changed the capitalization of surnames with commit 51787f7.

RePierre · 2023-03-26T11:43:14Z

Sorted component files in commit be08d9a.

RePierre · 2023-03-27T09:36:46Z

Changed note type to narrative with commit 9fe5f43.

RePierre · 2023-03-28T11:12:28Z

Converted notes into more specific elements within segments with commit cc386af.

matyaskopp · 2023-03-28T13:20:30Z

Spaces around notes

spaces around notes inside text

Converted notes into more specific elements within segments with commit cc386af.

You have removed the spaces around notes, which can cause trouble in tokenization: a note can end up inside a token, which my annotation script does not expect.
https://github.com/romanian-parlamint/ParlaMint/blob/cc386afc90e1298cb4f4d79f44d5558949e4eeae/Data/ParlaMint-RO/ParlaMint-RO_2000-04-14-id4927.xml#L472

<seg xml:id="ParlaMint-RO_2000-04-14-id4927.u39.seg6">Cine este pentru?<vocal type="shouting"><desc>(Vociferără în partea dreaptă a sălii).</desc></vocal>Vă <!-- ... --> confuzie.</seg>

Should be:

<seg xml:id="ParlaMint-RO_2000-04-14-id4927.u39.seg6">Cine este pentru? <vocal type="shouting">
  <desc>(Vociferără în partea dreaptă a sălii).</desc>
</vocal> Vă <!-- ... --> confuzie.</seg>

RePierre · 2023-03-28T18:07:02Z

Added spaces around notes with commit 79b08b1.

RePierre · 2023-03-28T18:22:21Z

wrong language context - English content in xml:lang="ro"

Can you please provide an example?

I ran find -type f -name *.xml -exec grep --color=auto -i -nH --null -e lang\=\"ro\" \{\} +, went over all results, and wasn't able to find English content. Maybe I'm missing something?

matyaskopp · 2023-03-28T18:51:14Z

Can you please provide an example?

I ran find -type f -name *.xml -exec grep --color=auto -i -nH --null -e lang\=\"ro\" \{\} +, went over all results, and wasn't able to find English content. Maybe I'm missing something?

Oh, sorry - your <teiCorpus> is in English context:

<teiCorpus xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en" xml:id="ParlaMint-RO">

This is the only corpus that has it. I implicitly expected it to have xml:lang="ro".

To check the language context of the <term> elements, I have now used:

java -cp /usr/share/java/saxon.jar net.sf.saxon.Query -xi:off \!method=adaptive -qs:'//*[name()="term" and ./ancestor::*[@xml:lang][1]/@xml:lang="ro"]' -s:ParlaMint-RO/ParlaMint-RO.xml
<term xmlns="http://www.tei-c.org/ns/1.0">Legislatură</term>
<term xmlns="http://www.tei-c.org/ns/1.0">Unități geo-politice sau administrative</term>
<term xmlns="http://www.tei-c.org/ns/1.0">Legislatură națională</term>
<term xmlns="http://www.tei-c.org/ns/1.0">Organizație politică</term>
<term xmlns="http://www.tei-c.org/ns/1.0">Camere</term>
<term xmlns="http://www.tei-c.org/ns/1.0">Parlament bicameral</term>
<term xmlns="http://www.tei-c.org/ns/1.0">Senat</term>
<term xmlns="http://www.tei-c.org/ns/1.0">Camera deputaților</term>
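
The same kind of check can be sketched in Python without Saxon. The idea is that an element's language is the nearest `xml:lang` found on the element itself or on its ancestors; including the element itself is an assumption that differs slightly from the query above, which looks at ancestors only. `effective_languages` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(root):
    """Map every element to the nearest xml:lang on itself or its ancestors."""
    languages = {}

    def walk(element, inherited):
        # An explicit xml:lang overrides whatever was inherited from above.
        lang = element.get(XML_LANG, inherited)
        languages[element] = lang
        for child in element:
            walk(child, lang)

    walk(root, None)
    return languages
```

Filtering the result for `term` elements whose language is `ro` gives a list comparable to the one above.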

The majority language in teiCorpus is usually English, so you have it correct according to the documentation:

@xml:lang is also a global attribute and gives the language code of the text content of the element; for the corpus root this does not (just) mean the content of its TEI header, but primarily the textual content of its XIncluded components. The convention is that language of the text content of an element is determined by the value of the first @xml:lang attribute on its ancestor axis. In cases where the content is multilingual, the language code should be of the majority language. When the proportion of the languages is about equal, then the mul code for multiple languages can also be used.

but it is common to have the corpus language...

@TomazErjavec Can English be preserved in teiCorpus here?

RePierre · 2023-03-28T19:16:48Z

Normalized setting element in corpus root file and component files and set corpus span with commit d343920.

Should resolve:

setting element in root file
corpus timespan setting

TomazErjavec · 2023-03-28T20:21:16Z

@TomazErjavec Can English be preserved in teiCorpus here?

In practice I'd much rather not have an exception. So, teiCorpus and TEI should have @xml:lang="ro".
But maybe teiHeader with @xml:lang="en" is legit?

RePierre · 2023-03-29T07:56:04Z

Changed language of the teiCorpus element in commit 548e357.

matyaskopp · 2023-03-29T08:19:30Z

Duplicate person

duplicate person

Every person should have one record in listPerson:
https://github.com/romanian-parlamint/ParlaMint/blob/548e3576054c9067aee43fb2275b879cac9ba806/Data/ParlaMint-RO/ParlaMint-RO.xml#L1306-L1324

          <person xml:id="Augustin-Lucian-Bolcas">
            <persName>
              <forename>Lucian</forename>
              <forename>Augustin</forename>
              <surname>Bolcaș</surname>
            </persName>
            <sex value="M"/>
            <affiliation ana="#RoParl.51" ref="#RoParl" role="member" from="2000-12-15" to="2004-11-30"/>
          </person>
          <person xml:id="Lucian-Augustin-Bolcas">
            <persName>
              <forename>Lucian</forename>
              <forename>Augustin</forename>
              <surname>Bolcaș</surname>
            </persName>
            <sex value="M"/>
            <affiliation ana="#RoParl.51" ref="#RoParl" role="member" from="2000-12-15" to="2004-11-30"/>
            <affiliation ana="#RoParl.52" ref="#RoParl" role="member" from="2004-12-19" to="2008-12-13"/>
          </person>
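
Candidates for duplicate records can be found by grouping persons on their name parts. The sketch below is only a heuristic: two different people may share a name, and placeholder names would also be grouped, so every hit needs a manual check. The function name and the choice to ignore forename order are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def find_duplicate_persons(list_person):
    """Return lists of person ids that share the same forenames and surnames."""
    groups = defaultdict(list)
    for person in list_person.iter(f"{TEI}person"):
        # Sort forenames so that "Lucian Augustin" and "Augustin Lucian" match.
        forenames = tuple(sorted(f.text or "" for f in person.iter(f"{TEI}forename")))
        surnames = tuple(s.text or "" for s in person.iter(f"{TEI}surname"))
        groups[(forenames, surnames)].append(person.get(XML_ID))
    return [ids for ids in groups.values() if len(ids) > 1]
```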

`Necunoscut Necunoscut` person's name

Necunoscut Necunoscut

First occurrence:
https://github.com/romanian-parlamint/ParlaMint/blob/548e3576054c9067aee43fb2275b879cac9ba806/Data/ParlaMint-RO/ParlaMint-RO.xml#L6030

          <person xml:id="Dan-Dumitrescu">
            <persName>
              <forename>Necunoscut</forename>
              <surname>Necunoscut</surname>
            </persName>
            <sex value="U"/>
            <affiliation ana="#RoParl.55" ref="#RoParl" role="member" from="2016-12-21" to="2020-12-20"/>
          </person>

RePierre · 2023-03-29T18:15:59Z

Missing speech content

As suggested by @TomazErjavec, added <gap> elements to the utterances without segments in commit 0082dd3.

RePierre · 2023-04-10T18:20:37Z

Duplicate person

Fixed duplicate person with commit ac9a2bc.

RePierre · 2023-04-11T12:08:59Z

corpus timespan bibl

Included corpus timespan in <bibl> element with commit 70b7fc2.

RePierre · 2023-04-13T18:07:39Z

corpus timespan it would be nice to have it in text content of corpus title too

Included corpus span in corpus subtitle with commit df3879b.

RePierre · 2023-04-13T18:11:56Z

presence list is missing status

As discussed in the meeting on April 12, we cannot provide the presence list in time for this version because this requires changes in the crawlers of the session transcripts. I will try to include this data into a future version of the corpus.

RePierre · 2023-04-27T18:19:43Z

extend meeting elements (#parla.term, #parla.sitting)

Extended meeting elements with term and sitting information with commit 75affa9.

matyaskopp · 2023-05-24T08:09:52Z

include annotated component files

Error: /home/runner/work/ParlaMint/ParlaMint/ParlaMint/Data/ParlaMint-RO/ParlaMint-RO_2015-09-29-id7560.xml:132:189: error: text not allowed here; expected element "gap", "incident", "kinesic", "note", "pb", "s" or "vocal"

@RePierre, the annotated (TEI.ana) root file includes unannotated (TEI) component files:
https://github.com/romanian-parlamint/ParlaMint/blob/459b829a1e053df1e22502222324d246be1c9a47/Data/ParlaMint-RO/ParlaMint-RO.ana.xml#L3018-L3027
e.g.

<xsi:include xmlns:xsi="http://www.w3.org/2001/XInclude" href="ParlaMint-RO_2015-09-29-id7560.xml"/>

should be

<xsi:include xmlns:xsi="http://www.w3.org/2001/XInclude" href="ParlaMint-RO_2015-09-29-id7560.ana.xml"/>
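
A quick check for this could look like the sketch below. It parses the root file without resolving the XIncludes and reports component references that do not end in `.ana.xml`. Treating every `href` that contains a date as a component file is an assumption (so that taxonomy and listPerson includes are skipped), and the names used are illustrative.

```python
import re
import xml.etree.ElementTree as ET

XINCLUDE = "{http://www.w3.org/2001/XInclude}include"
# Component files carry a date in their name, e.g. ..._2015-09-29-id7560...
COMPONENT_RE = re.compile(r"_\d{4}-\d{2}-\d{2}-")


def unannotated_components(root_xml):
    """List XIncluded component files that are not .ana.xml files."""
    root = ET.fromstring(root_xml)
    hrefs = [include.get("href", "") for include in root.iter(XINCLUDE)]
    return [h for h in hrefs if COMPONENT_RE.search(h) and not h.endswith(".ana.xml")]
```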

RePierre · 2023-05-24T08:45:47Z

include annotated component files

Included proper component files in commit 90da93b.

matyaskopp · 2023-05-26T08:31:42Z

@RePierre, thanks for the progress.

I have spotted an issue in the TEI.ana version of the files:

wrongly placed notes in the TEI.ana version

notes are placed at the beginning of seg
unannotated text after the first note

Data/ParlaMint-RO/ParlaMint-RO_2015-09-29-id7560.ana.xml:6433:284: error: text not allowed here; expected the element end-tag or element "gap", "incident", "kinesic", "note", "pb", "s" or "vocal"

TEI: (https://github.com/romanian-parlamint/ParlaMint/blob/5f986e2cc79e3f28347c6a655416c7f4f4d57a1c/Data/ParlaMint-RO/ParlaMint-RO_2015-09-29-id7560.xml#L284)

<seg xml:id="ParlaMint-RO_2015-09-29-id7560.u11.seg8">Cred <!--
... 
--> salariile. <vocal type="noise"><desc>(Aplauze.)</desc></vocal> Însă<!--
...
--> toţi.</seg>

TEI.ana:

<seg xml:id="ParlaMint-RO_2015-09-29-id7560.u11.seg8"><vocal type="noise"><desc>(Aplauze.)</desc></vocal> Însă<!--
...
-->toţi.<s xml:id="ParlaMint-RO_2015-09-29-id7560.u11.seg8.1">
  <w xml:id="ParlaMint-RO_2015-09-29-id7560.u11.seg8.1.1" lemma="Cred" pos="Vmip1s" msd="UPosTag=AUX|Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin">Cred</w>
<!--... -->
</s>
<!--... -->
</seg>

matyaskopp · 2023-05-26T12:25:41Z

Unrecognized full-paragraph note

"full-paragraph" notes

https://github.com/romanian-parlamint/ParlaMint/blob/a510c149ba04407fe6df77414b3a2aaec6f47022/Data/ParlaMint-RO/ParlaMint-RO_2006-09-18-id6154.xml#L422-L424

  <seg xml:id="ParlaMint-RO_2006-09-18-id6154.u33.seg8">Mulţumesc.</seg>
  <seg xml:id="ParlaMint-RO_2006-09-18-id6154.u33.seg9">(Domnul Valeriu Ştefan Zgonea se îndreaptă spre prezidiu.)</seg>
</u>

should be:

  <seg xml:id="ParlaMint-RO_2006-09-18-id6154.u33.seg8">Mulţumesc.</seg>
</u>
<note type="narrative">(Domnul Valeriu Ştefan Zgonea se îndreaptă spre prezidiu.)</note>

Other occurrences in sample data:

DataForks/ParlaMint-RO/ParlaMint-RO_2006-09-18-id6154.xml:411:          <seg xml:id="ParlaMint-RO_2006-09-18-id6154.u32.seg4">(Domnul Valeriu Ştefan Zgonea părăseşte prezidiul şi se îndreaptă spre tribună.)</seg>
DataForks/ParlaMint-RO/ParlaMint-RO_2006-09-18-id6154.xml:423:          <seg xml:id="ParlaMint-RO_2006-09-18-id6154.u33.seg9">(Domnul Valeriu Ştefan Zgonea se îndreaptă spre prezidiu.)</seg>
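
Segments of this kind can be spotted with a simple pattern: the whole segment text is a single parenthesised remark. This is a sketch with a made-up function name; it does not decide which note type to use, it only flags candidates for moving out of the utterance.

```python
import re

# The entire segment is one "(...)" remark, with no nested parentheses.
FULL_NOTE_RE = re.compile(r"^\s*\([^()]*\)\s*$")


def is_full_paragraph_note(seg_text):
    """Return True when the segment consists of nothing but a parenthesised note."""
    return bool(FULL_NOTE_RE.match(seg_text))
```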

matyaskopp · 2023-05-26T12:41:49Z

U+0096 (SPA) Unicode Character

remove <0x0096> character

This character is allowed in ParlaMint, but it causes problems in linguistic annotation, so I suggest removing it from the text: https://github.com/romanian-parlamint/ParlaMint/blob/a510c149ba04407fe6df77414b3a2aaec6f47022/Data/ParlaMint-RO/ParlaMint-RO_2000-10-24-id4980.xml#L148

<seg xml:id="ParlaMint-RO_2000-10-24-id4980.u2.seg5">După <!--
...
--> urgie � 1940. Dar n-a fost să fie aşa.</seg>

<w xml:id="ParlaMint-RO_2000-10-24-id4980.u2.seg5.1.29" lemma="�" pos="Ncm--n" msd="UPosTag=NOUN|Definite=Ind|Gender=Masc">�</w>
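
Removing the character before annotation can be sketched as follows. The example strips the whole C1 control range (U+0080 to U+009F), which is broader than the single character reported here, and collapses the double space left behind; both choices are assumptions, as is the function name.

```python
import re

# C1 control characters, which include U+0096.
C1_CONTROLS = re.compile("[\u0080-\u009f]")


def strip_c1_controls(text):
    """Remove C1 control characters and collapse runs of spaces left behind."""
    cleaned = C1_CONTROLS.sub("", text)
    return re.sub(r" {2,}", " ", cleaned)
```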

matyaskopp · 2023-05-26T13:12:27Z

Named entities

named entities contain non-proper names

I guess you are using a model that labels not only named entities from the PER/LOC/ORG/MISC set but also DATE and probably other labels, something like this: https://huggingface.co/dumitrescustefan/bert-base-romanian-ner
You then map all non-proper names to the MISC category, e.g.

<name type="MISC">
  <w xml:id="ParlaMint-RO_2000-10-24-id4980.u2.seg5.1.23" lemma="acel" pos="Dd3msr---e" msd="UPosTag=DET|Case=Acc,Nom|Gender=Masc|Number=Sing|Person=3|Position=Prenom|PronType=Dem">acel</w>
  <w xml:id="ParlaMint-RO_2000-10-24-id4980.u2.seg5.1.24" lemma="an" pos="Ncms-n" msd="UPosTag=NOUN|Definite=Ind|Gender=Masc|Number=Sing">an</w>
</name>

or

<name type="MISC">
  <w xml:id="ParlaMint-RO_2000-10-24-id4980.u2.seg5.1.30" lemma="1940" pos="Mc-s-d" msd="UPosTag=">1940</w>
</name>

The year 1940 is not a proper name, so it should not be wrapped in <name>; <date> is the better element.
There are two options to solve this:

1. remove named entities that are not proper names (DATETIME, PERIOD, MONEY, QUANTITY, ...)
2. find inspiration in the CZ corpus and use the proper tags. See mapping: update named-entity elements ufal/ParCzech#95 (comment)

We are under time pressure, so I suggest using option (1) for ParlaMint3.0; you can then improve it in ParlaMint3.1 (create a RO-specific taxonomy, use the proper elements, and add the ana attribute).
@TomazErjavec ??
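
Dropping the non-proper-name entities before they are mapped to MISC could be as small as the sketch below. The label set contains only the four labels named in this comment and is certainly incomplete, and the entity structure (a dict with `label`, `start` and `end`) is an assumption about the tagger output, not something taken from the RO pipeline.

```python
# Labels named in this thread as non-proper names; the real model has more.
NON_PROPER_LABELS = {"DATETIME", "PERIOD", "MONEY", "QUANTITY"}


def keep_proper_entities(entities):
    """Drop entity spans whose model label is not a proper-name category."""
    return [entity for entity in entities if entity["label"] not in NON_PROPER_LABELS]
```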

matyaskopp · 2023-05-26T13:34:28Z

shifted NEs ?

shifted NEs

In this sentence (ParlaMint-RO_2000-10-24-id4980.u2.seg8.2), the NEs seem to be shifted.
https://raw.githubusercontent.com/clarin-eric/ParlaMint/3f2d0a820d31aa7e55b72156089a3450b303e3bc/Data/ParlaMint-RO/ParlaMint-RO_2000-10-24-id4980.ana.xml
Reformatted, with the token elements (w and pc) removed:

<s xml:id="ParlaMint-RO_2000-10-24-id4980.u2.seg8.2">
atitudinea autorităţilor ucrainene faţă de delegaţiile judeţului Suceava şi
<name type="MISC">Botoşani</name>
, la festivitatea dezvelirii
<name type="LOC">statuii</name>
lui
<name type="LOC">Eminescu</name>
, la Cernăuţi, în ziua de 15 iunie
<name type="LOC">2000</name>
; constrângerile
<name type="MISC">aduse în şcolile româneşti;</name>
coborârea unicului steag românesc de
<name type="MISC">pe</name>
clădirea sediului
<name type="LOC">redacţiei ziarului"</name>
Lumea"
<name type="MISC">;</name>
prezenţa la
<name type="MISC">manifestările româneşti a unor</name>
reprezentanţi gălăgioşi ai organizaţiilor
<name type="MISC">extremiste</name>
ucrainene; oprirea tinerilor etnici români,
<name type="MISC">în</name>
număr de
<name type="PER">200, de</name>
a veni la studii
<name type="MISC">în</name>
România, cu burse din partea statului
<name type="LOC">român</name>
şi altele.
</s>

matyaskopp · 2023-05-26T17:10:58Z

Voci din sală: in utterance

voice from the hall

https://github.com/romanian-parlamint/ParlaMint/blob/a510c149ba04407fe6df77414b3a2aaec6f47022/Data/ParlaMint-RO/ParlaMint-RO_2000-10-24-id4980.xml#L408-L414

<note type="speaker">Domnul Vasile Lupu:</note>
<u ana="#chair" who="#Vasile-Lupu" xml:id="ParlaMint-RO_2000-10-24-id4980.u37">
  <seg xml:id="ParlaMint-RO_2000-10-24-id4980.u37.seg1">Să vedem cine îl face. <vocal type="murmuring"><desc>(Rumoare în partea stângă a sălii)</desc></vocal> </seg>
  <seg xml:id="ParlaMint-RO_2000-10-24-id4980.u37.seg2">Dar, iată, se pare că nu s-a terminat şedinţa Biroului permanent.</seg>
  <seg xml:id="ParlaMint-RO_2000-10-24-id4980.u37.seg3">Voci din sală:</seg>
  <seg xml:id="ParlaMint-RO_2000-10-24-id4980.u37.seg4">S-a terminat de mult!</seg>
</u>

should be:

<note type="speaker">Domnul Vasile Lupu:</note>
<u ana="#chair" who="#Vasile-Lupu" xml:id="ParlaMint-RO_2000-10-24-id4980.u37">
  <seg xml:id="ParlaMint-RO_2000-10-24-id4980.u37.seg1">Să vedem cine îl face. <vocal type="murmuring"><desc>(Rumoare în partea stângă a sălii)</desc></vocal> </seg>
  <seg xml:id="ParlaMint-RO_2000-10-24-id4980.u37.seg2">Dar, iată, se pare că nu s-a terminat şedinţa Biroului permanent.</seg>
</u>
<note type="speaker">Voci din sală:</note>
<!-- no who attribute, ana is regular - expecting MP interrupting -->
<u ana="#regular" xml:id="ParlaMint-RO_2000-10-24-id4980.u38">
  <seg xml:id="ParlaMint-RO_2000-10-24-id4980.u38.seg1">S-a terminat de mult!</seg>
</u>
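
The split itself can be sketched as a small function over the segment texts of one utterance. It assumes that everything after the "Voci din sală:" marker belongs to the interruption, which will not hold if the speaker resumes within the same utterance; the function name is hypothetical.

```python
def split_on_hall_voices(segments, marker="Voci din sală:"):
    """Split segment texts into (speaker_segments, hall_segments) at the marker."""
    if marker not in segments:
        return segments, []
    index = segments.index(marker)
    # The marker itself becomes the speaker note, so it is dropped from both parts.
    return segments[:index], segments[index + 1:]
```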

matyaskopp · 2023-05-26T17:51:56Z

person - affiliation - organization

parliamentary groups
only one virtual parliamentary group <orgName xml:lang="en" full="yes">Placeholder parliamentary group</orgName>
government

I guess you are aware of this; I just wanted it to be recorded.

  INFO[10]  Total number of affiliations with RoParl: 256
  INFO[10]  Total number of affiliations with RoGov: 0
  Error: ERROR[10]  government-role organisation without affiliation: #RoGov
  INFO[10]  Total number of affiliations with RoParl.All: 0
  WARN[10]  parliamentaryGroup-role organisation without affiliation: #RoParl.All
  INFO[12]  Total number of organizations with parliament role: 1
  INFO[12]  Total number of organizations with government role: 1
  INFO[12]  Total number of organizations with parliamentaryGroup role: 1
  INFO[??]  Total number of affiliations 256
  INFO[??]  Total number of NO-role affiliations 0
  INFO[??]  Total number of 'member' role affiliations 256

RePierre · 2023-05-27T07:36:03Z

wrongly placed notes in the TEI.ana version

Fixed with commit 6662ec4.

RePierre · 2023-05-27T14:06:59Z

remove <0x0096> character

Removed in commit 69a116e.

matyaskopp · 2023-05-29T07:05:25Z

strange UPosTag `_` when `Mc-s-d`

UPosTag of digit tokens Mc-s-d

Every token with pos="Mc-s-d" has the wrong value msd="UPosTag=_".
Sample:

<w xml:id="ParlaMint-RO_2000-10-24-id4980.u2.seg5.1.2" 
   lemma="1990"
   pos="Mc-s-d"
   msd="UPosTag=_">1990</w>

You can fix this with either msd="UPosTag=NUM" or msd="UPosTag=NUM|NumForm=Digit", for example:

<w xml:id="ParlaMint-RO_2000-10-24-id4980.u2.seg5.1.2" 
   lemma="1990"
   pos="Mc-s-d"
   msd="UPosTag=NUM|NumForm=Digit">1990</w>

strange UPosTag `_` when `Mc-s-b`

UPosTag of digit tokens Mc-s-b

Here I suggest replacing _ with X. The affected tokens can be listed with:

cat DataForks/ParlaMint-RO/ParlaMint-RO_*.ana.xml| grep 'UPosTag=_"' | grep -v 'pos="Mc.s.d"'

<w xml:id="ParlaMint-RO_2006-09-18-id6154.u31.seg3.1.73" lemma="29,4" pos="Mc-s-b" msd="UPosTag=_">29,4</w>
<w xml:id="ParlaMint-RO_2006-09-18-id6154.u31.seg7.1.14" lemma="29,4" pos="Mc-s-b" msd="UPosTag=_">29,4</w>
<w xml:id="ParlaMint-RO_2006-09-18-id6154.u76.seg2.1.1" lemma="Mie" pos="Mc-s-b" msd="UPosTag=_">Mie</w>
<w xml:id="ParlaMint-RO_2006-09-18-id6154.u136.seg18.1.2" lemma="31.III.2006" pos="Mc-s-b" msd="UPosTag=_">31.III.2006</w>
<w xml:id="ParlaMint-RO_2006-09-18-id6154.u153.seg5.1.52" lemma="Secuiesc" pos="Mc-s-b" msd="UPosTag=_">Secuiesc</w>
<w xml:id="ParlaMint-RO_2015-09-29-id7560.u60.seg7.1.18" lemma="207;voturi" pos="Mc-s-b" msd="UPosTag=_">207;voturi</w>
<w xml:id="ParlaMint-RO_2015-10-12-id7569.u48.seg9.1.12" lemma="2003/88" pos="Mc-s-b" msd="UPosTag=_">2003/88</w>
<w xml:id="ParlaMint-RO_2015-10-12-id7569.u96.seg2.2.15" lemma="2002/772" pos="Mc-s-b" msd="UPosTag=_">2002/772</w>
<w xml:id="ParlaMint-RO_2015-10-12-id7569.u156.seg16.1.25" lemma="2007-2013" pos="Mc-s-b" msd="UPosTag=_">2007-2013</w>
<w xml:id="ParlaMint-RO_2018-03-05-id7900.u7.seg11.1.1" lemma="Mie" pos="Mc-s-b" msd="UPosTag=_">Mie</w>
<w xml:id="ParlaMint-RO_2018-03-05-id7900.u45.seg8.1.1" lemma="Mie" pos="Mc-s-b" msd="UPosTag=_">Mie</w>
<w xml:id="ParlaMint-RO_2021-10-25-id8335.u70.seg2.1.34" lemma="30.06.2021" pos="Mc-s-b" msd="UPosTag=_">30.06.2021</w>
<w xml:id="ParlaMint-RO_2021-10-25-id8335.u91.seg2.1.36" lemma="29A" pos="Mc-s-b" msd="UPosTag=_">29A</w>
<w xml:id="ParlaMint-RO_2021-10-25-id8335.u92.seg2.1.40" lemma="29A" pos="Mc-s-b" msd="UPosTag=_">29A</w>
<w xml:id="ParlaMint-RO_2021-10-25-id8335.u92.seg3.1.7" lemma="29A" pos="Mc-s-b" msd="UPosTag=_">29A</w>
<w xml:id="ParlaMint-RO_2021-10-25-id8335.u92.seg6.1.6" lemma="29A" pos="Mc-s-b" msd="UPosTag=_">29A</w>
<w xml:id="ParlaMint-RO_2021-10-25-id8335.u92.seg6.1.47" lemma="29A" pos="Mc-s-b" msd="UPosTag=_">29A</w>
<w xml:id="ParlaMint-RO_2021-10-25-id8335.u92.seg12.1.7" lemma="29A" pos="Mc-s-b" msd="UPosTag=_">29A</w>
<w xml:id="ParlaMint-RO_2021-10-25-id8335.u118.seg6.1.41" lemma="27.548" pos="Mc-s-b" msd="UPosTag=_">27.548</w>
<w xml:id="ParlaMint-RO_2021-10-25-id8335.u126.seg4.1.30" lemma="1.579/2006" pos="Mc-s-b" msd="UPosTag=_">1.579/2006</w>
<w xml:id="ParlaMint-RO_2021-11-09-id8341.u96.seg3.2.49" lemma="1,5°C" pos="Mc-s-b" msd="UPosTag=_">1,5°C</w>
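
Both suggestions can be combined into one small repair step, sketched here as a pure function over the `pos` and `msd` attribute values. The function name is invented, and the `NumForm=Digit` variant is chosen for `Mc-s-d`, which is one of the two values proposed above.

```python
def fix_msd(pos, msd):
    """Replace the placeholder UPosTag=_ according to the suggestions in this thread."""
    if msd != "UPosTag=_":
        return msd  # nothing to repair
    if pos == "Mc-s-d":
        return "UPosTag=NUM|NumForm=Digit"
    if pos == "Mc-s-b":
        return "UPosTag=X"
    return msd  # unknown case: leave for manual inspection
```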

matyaskopp · 2023-05-29T08:26:56Z

No `join` attribute

join="right" is missing in TEI.ana

see documentation: https://clarin-eric.github.io/ParlaMint/#sec-ana-words
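
As I read the linked section, join="right" marks a token that is directly followed by the next token without whitespace. Under that assumption, the flags can be computed from the original sentence text and its tokens, as in this sketch for a single sentence (the function name is hypothetical, and tokens are assumed to appear in the text in order).

```python
def join_flags(text, tokens):
    """Return True for each token that is glued to the token that follows it."""
    flags = []
    position = 0
    for index, token in enumerate(tokens):
        start = text.index(token, position)  # locate the token in the raw text
        position = start + len(token)
        is_last = index == len(tokens) - 1
        followed_by_space = position >= len(text) or text[position].isspace()
        flags.append(not is_last and not followed_by_space)
    return flags


print(join_flags("Cine este pentru?", ["Cine", "este", "pentru", "?"]))
```

Tokens flagged `True` would receive join="right" when the `<w>` and `<pc>` elements are written.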

TomazErjavec · 2023-09-21T18:08:50Z

As RO won't be a part of 3.1, moving this to "future" milestone.

matyaskopp assigned RePierre Mar 24, 2023

matyaskopp linked a pull request Mar 24, 2023 that will close this issue

Sample for ParlaMint-RO #625

Open

TomazErjavec added this to the ParlaMint 3.1 release milestone Jun 1, 2023

TomazErjavec modified the milestones: ParlaMint 3.1 release, Future Sep 21, 2023
