This repository has been archived by the owner on May 8, 2024. It is now read-only.

fix: (sample) fix split introductions #430

Merged: BobBorges merged 13 commits into query-metadata on Dec 8, 2023


BobBorges
Collaborator

Fix incorrectly split introductions. Sample will follow.

Closes #429
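The fix itself is not shown in this thread, but every sampled diff below follows one pattern: two consecutive speaker notes become the first note with the second note's text appended, and the second note is removed. A minimal sketch of that transformation with Python's standard library, assuming the pairs of `xml:id`s to join have already been decided (for example by a classification step); the function name `merge_split_intros` and the pair format are hypothetical, not the PR's actual code:

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def merge_split_intros(root, pairs):
    """Join each (first_id, second_id) pair of split speaker notes.

    The second note's text is appended to the first note, which keeps its
    own xml:id and attributes; the second note is then removed.
    """
    # ElementTree has no getparent(), so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}

    for first_id, second_id in pairs:
        first, second = by_id.get(first_id), by_id.get(second_id)
        if first is None or second is None or second not in parent_of:
            continue  # missing note: skip rather than guess
        head = (first.text or "").strip()
        tail = (second.text or "").strip()
        # A trailing hyphen is kept and joined without a space
        # ("HJELM-" + "WALLÉN"). Deciding when a line-break hyphen should be
        # dropped ("RING-" + "HOLM" -> "RINGHOLM") is not modelled here.
        sep = "" if head.endswith("-") else " "
        first.text = head + sep + tail
        parent_of[second].remove(second)
    return root
```

The merged text is stripped of surrounding whitespace, so a formatting pass (such as pyparlaclarin's `format_texts`, mentioned later in the thread) would still be needed to put it back on its own line.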

@BobBorges

This comment was marked as resolved.

@BobBorges
Collaborator Author

The mp unit test fails because this branch has new metadata, but the redetect edits are only in 50 files. This failure is expected and is not a problem for merging into the query-metadata branch.

@BobBorges BobBorges requested a review from ninpnin December 8, 2023 08:16
@BobBorges
Collaborator Author

The classify_join_intros.py script generates some files. Some of them were already tracked, and some are newly added because of my edits to the script. Do we want to track such files? I wouldn't, since they are generated by the script. If we don't, I think we should remove the ones that are tracked and ignore all of them.

@ninpnin
Copy link
Collaborator

ninpnin commented Dec 8, 2023

@BobBorges I agree; I don't think we want to track such files.

Also, the merged intros look fine, but their text should be formatted on a separate line, like so:

```xml
<note xml:id="i-TxxAduEjQXG2cCCpZPDJb4" type="speaker">
    Anf. 82 StatsrådetMORGAN JOHANSSON (s):
</note>
```

There should be a script for doing this.

@ninpnin
Copy link
Collaborator

ninpnin commented Dec 8, 2023

This Python snippet should format a protocol so that the text is on a separate line:

```python
from pyparlaclarin.refine import format_texts
# [...]
# Reformat the protocol tree so that element text sits on its own line
root = format_texts(root)
```
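We haven't checked how `format_texts` is implemented. To make the target layout concrete, here is a standard-library sketch that puts the text of childless `note` and `seg` elements on its own indented line; the function name `format_leaf_texts`, the tag set, and the four-space indent are assumptions, and this is not pyparlaclarin's code:

```python
import xml.etree.ElementTree as ET


def format_leaf_texts(root, tags=("note", "seg"), indent="    "):
    """Place the text of childless elements in `tags` on its own line.

    Internal whitespace is collapsed to single spaces, the text is indented
    one level deeper than the element, and the closing tag goes on a new
    line at the element's own depth (depth is counted from `root`).
    Tails, i.e. the whitespace before each element, are left untouched.
    """
    def walk(elem, depth):
        local_name = elem.tag.rsplit("}", 1)[-1]  # drop any namespace
        text = " ".join((elem.text or "").split())
        if local_name in tags and len(elem) == 0 and text:
            elem.text = ("\n" + indent * (depth + 1) + text
                         + "\n" + indent * depth)
        for child in elem:
            walk(child, depth + 1)

    walk(root, 0)
    return root
```

Note that ElementTree rewrites namespace prefixes (e.g. `ns0:`) when serializing unless they are registered, so a library that preserves them, such as lxml, may be a better fit for writing protocols back to disk.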

@BobBorges
Copy link
Collaborator Author

BobBorges commented Dec 8, 2023

Sampled changes

corpus/protocols/199091/prot-199091--005.xml

Diff starting from line 3182

@@ -3251,10 +3182,7 @@
             </seg>
           </u>
           <note xml:id="i-MCos8uDQAyo7qjV5gACLbx" type="speaker">
-            Anf. 69 Civilminister BENGT K Å
-          </note>
-          <note xml:id="i-6jTckTotpCcTPS79gGGipz" type="speaker">
-            JOHANSSON:
+            Anf. 69 Civilminister BENGT K Å JOHANSSON:
           </note>
           <u xml:id="i-dd79e44da648f34f-369" who="Q5885510" next="i-dd79e44da648f34f-370">
             <seg xml:id="i-XioitYEThqFciCJ7iX1zHo">
  • Correct
  • Incorrect

corpus/protocols/199091/prot-199091--038.xml

Diff starting from line 967

@@ -982,10 +967,7 @@
             </seg>
           </u>
           <note xml:id="i-3ThXKDSop3Psjem5414tXE" type="speaker">
-            Anf. 25 Statsrådet LENA HJELM-WALLÉN
-          </note>
-          <note xml:id="i-RXwaK17W3WUQ2MQS1qGQXn" type="speaker">
-            (s):
+            Anf. 25 Statsrådet LENA HJELM-WALLÉN (s):
           </note>
           <u xml:id="i-bdb40db96dd6947e-95" who="Q460919" next="i-bdb40db96dd6947e-96">
             <seg xml:id="i-17tzrGamDxMgAPXrRdcVKT">
  • Correct
  • Incorrect

corpus/protocols/199091/prot-199091--046.xml

Diff starting from line 8773

@@ -8785,10 +8773,7 @@
             </seg>
           </u>
           <note type="speaker" xml:id="i-LcyZE6KxtjBGpGu2VZBMX5">
-            Anf. 54 MARGÓ INGVARDSSON (v)
-          </note>
-          <note xml:id="i-JtkJeRmFkLdVJTZSSMEpFS">
-            replik:
+            Anf. 54 MARGÓ INGVARDSSON (v) replik:
           </note>
           <u xml:id="i-bfa1343abd46653c-998" who="Q4955783" next="i-bfa1343abd46653c-999">
             <seg xml:id="i-E8WDaMP4Th7qNJJuWf6JYV">
  • Correct
  • Incorrect

corpus/protocols/199091/prot-199091--049.xml

Diff starting from line 3030

@@ -3102,10 +3030,7 @@
             </seg>
           </u>
           <note xml:id="i-3Kvobihh1Mzh1YgojLGmjp" type="speaker">
-            Anf. 68 Socialminister INGELA THALÉN
-          </note>
-          <note xml:id="i-H99iic5n2pKW97VbdgyxXE" type="speaker">
-            (s):
+            Anf. 68 Socialminister INGELA THALÉN (s):
           </note>
           <u xml:id="i-8de83981ccaa5eee-347" who="unknown">
             <seg xml:id="i-D1cwDAxBeZsv1P9o5fAP6K">
  • Correct
  • Incorrect

corpus/protocols/199091/prot-199091--061.xml

Diff starting from line 4700

@@ -4838,10 +4700,7 @@
             </seg>
           </u>
           <note xml:id="i-TkkdAEQKwEcGkDC6zBoUGB" type="speaker">
-            Anf. 123 Kommunikationsminister GEORG
-          </note>
-          <note type="speaker" xml:id="i-4qqRFAb1R2AKikDzEoS5Tg">
-            ANDERSSON (s):
+            Anf. 123 Kommunikationsminister GEORG ANDERSSON (s):
           </note>
           <u xml:id="i-bc46c100e6649aa9-524" who="Q5554537" next="i-H2UVm2yCsbFCTWJM9v8WCm">
             <seg xml:id="i-StJBeuiPjY5rcs3MwXDurz">
  • Correct
  • Incorrect

corpus/protocols/199091/prot-199091--082.xml

Diff starting from line 5243

@@ -5252,10 +5243,7 @@
             </seg>
           </u>
           <note type="speaker" xml:id="i-UDLSSsWcdbKtqtnK8ssUgs">
-            Anf. 38 INGRID HEMMINGSSON (m)
-          </note>
-          <note xml:id="i-7YBVSRL59ymVYkGFKCvoNq">
-            replik:
+            Anf. 38 INGRID HEMMINGSSON (m) replik:
           </note>
           <u xml:id="i-616ea1ec335247e8-583" who="Q4953552" next="i-616ea1ec335247e8-584">
             <seg xml:id="i-2GxjSyXUqJufCcpcbu2bQ2">
  • Correct
  • Incorrect

corpus/protocols/199192/prot-199192--025.xml

Diff starting from line 5489

@@ -5528,10 +5489,7 @@
             </seg>
           </u>
           <note xml:id="i-MzyJtqxxUyk9bBP53TjH68" type="speaker">
-            Anf. 74 Statsrådet REIDUNN LAURÉN
-          </note>
-          <note xml:id="i-8bPuJCyYf2VwABPT2NBMWG">
-            (--):
+            Anf. 74 Statsrådet REIDUNN LAURÉN (--):
           </note>
           <u xml:id="i-ce61a3fdfd51a9f4-652" who="Q4961261" next="i-ce61a3fdfd51a9f4-653">
             <seg xml:id="i-KXxUMShspaMGrbaFEfpH6m">
  • Correct
  • Incorrect

corpus/protocols/199192/prot-199192--027.xml

Diff starting from line 2776

@@ -2809,10 +2776,7 @@
             16 § Svar på fråga 1991/92:100 om pensionsåldern för yrkesofficerare
           </note>
           <note xml:id="i-Q88dauuuMz1Kc8TVthnVbz" type="speaker">
-            Anf. 72 Försvarsminister ANDERS BJÖRCK
-          </note>
-          <note xml:id="i-Mdf1Gt4uixZn4TXXUVztFX" type="speaker">
-            (m):
+            Anf. 72 Försvarsminister ANDERS BJÖRCK (m):
           </note>
           <u xml:id="i-642e2532fc5a2f10-298" who="Q490785" next="i-642e2532fc5a2f10-299">
             <seg xml:id="i-8TbW9itq7UTYzC57VF37zf">
  • Correct
  • Incorrect

corpus/protocols/199192/prot-199192--037.xml

Diff starting from line 4246

@@ -4408,10 +4246,7 @@
             </seg>
           </u>
           <note xml:id="i-83wjur3nauMuEEMNDpZjeT" type="speaker">
-            Anf. 112 Arbetsmarknadsminister BÖRJE
-          </note>
-          <note type="speaker" xml:id="i-L4XQ9U7BJNNJ6qXtMYua7A">
-            HÖRNLUND (c):
+            Anf. 112 Arbetsmarknadsminister BÖRJE HÖRNLUND (c):
           </note>
           <u xml:id="i-02adb08626132d2e-492" who="Q5005171" next="i-02adb08626132d2e-493">
             <seg xml:id="i-NsxhzkcPKdNtGC56d1Wjup">
  • Correct
  • Incorrect

corpus/protocols/199192/prot-199192--044.xml

Diff starting from line 11196

@@ -11313,10 +11196,7 @@
             </seg>
           </u>
           <note xml:id="i-Ajvq3dpb2XesEfH3GcxYEj" type="speaker">
-            Anf. 178 HOLGER GUSTAFSSON (kds)
-          </note>
-          <note xml:id="i-3zUHK7Vqwxuhb7yXrZJ7gC">
-            replik:
+            Anf. 178 HOLGER GUSTAFSSON (kds) replik:
           </note>
           <u xml:id="i-c526af2de47342e8-1265" who="Q5777750" next="i-c526af2de47342e8-1266">
             <seg xml:id="i-SitaSipX5aaC5wExR4Q3iR">
  • Correct
  • Incorrect

corpus/protocols/199192/prot-199192--049.xml

Diff starting from line 7118

@@ -7142,10 +7118,7 @@
             </seg>
           </u>
           <note xml:id="i-KCK4r7RLkzjUjtZwCZbCWS" type="speaker">
-            Anf. 68 Kulturminister BIRGIT FRIGGEBO
-          </note>
-          <note xml:id="i-3igg25GLary2XwBGMvrnHe" type="speaker">
-            (fp):
+            Anf. 68 Kulturminister BIRGIT FRIGGEBO (fp):
           </note>
           <u xml:id="i-cdb56f17e800ec07-822" who="Q4916267" next="i-cdb56f17e800ec07-823">
             <seg xml:id="i-EFzDnp4YtK648VEMzUK8A1">
  • Correct
  • Incorrect

corpus/protocols/199192/prot-199192--054.xml

Diff starting from line 2312

@@ -2333,10 +2312,7 @@
             </seg>
           </u>
           <note xml:id="i-Ag5J8svN5Mq2MvxWLcCD3U" type="speaker">
-            Anf. 42 Arbetsmarknadsminister BÖRJE
-          </note>
-          <note type="speaker" xml:id="i-VGVQR2634A7tDLePotJkYc">
-            HÖRNLUND (c):
+            Anf. 42 Arbetsmarknadsminister BÖRJE HÖRNLUND (c):
           </note>
           <u xml:id="i-6850a760cde7e2fe-247" who="Q5005171">
             <seg xml:id="i-Q3sAVbrvyfh8uc9QDwL9v8">
  • Correct
  • Incorrect

corpus/protocols/199293/prot-199293--025.xml

Diff starting from line 11932

@@ -12286,10 +11932,7 @@
             </seg>
           </u>
           <note xml:id="i-Xxej9XxrnvvANvt68xWvs4" type="speaker">
-            Anf. 325 Försvarsminister ANDERS BJÖRCK
-          </note>
-          <note xml:id="i-VgNQB9ANbP8s4z2SRjoqXm" type="speaker">
-            (m):
+            Anf. 325 Försvarsminister ANDERS BJÖRCK (m):
           </note>
           <u xml:id="i-491450da720c9f9c-1343" who="Q490785" next="i-491450da720c9f9c-1344">
             <seg xml:id="i-M5PjFKJ2wo7Bnj3X44S5g5">
  • Correct
  • Incorrect

corpus/protocols/199293/prot-199293--035.xml

Diff starting from line 5416

@@ -5497,10 +5416,7 @@
             </seg>
           </u>
           <note xml:id="i-Vkkq3Y94RCtNiP9CGYxyVB" type="speaker">
-            Anf. 124 Statsrådet REIDUNN LAURÉN (-
-          </note>
-          <note xml:id="i-DgWRUYysnMXQoMZKgVkrY4">
-            ):
+            Anf. 124 Statsrådet REIDUNN LAURÉN (-):
           </note>
           <u xml:id="i-7ebf2647590d4903-614" who="Q4961261" next="i-7ebf2647590d4903-615">
             <seg xml:id="i-KVhjZKx2YVmbK77zBTuXfN">
  • Correct
  • Incorrect

corpus/protocols/199293/prot-199293--096.xml

Diff starting from line 11824

@@ -11872,10 +11824,7 @@
             </seg>
           </u>
           <note type="speaker" xml:id="i-Cf6rbW3JRNY9XgSWyKxZ6j">
-            Anf. 126 CHRISTEL ANDERBERG (m)
-          </note>
-          <note xml:id="i-QxnoN5pVUoZBdW9vUJNnQc">
-            replik:
+            Anf. 126 CHRISTEL ANDERBERG (m) replik:
           </note>
           <u xml:id="i-7803e5b350c80f73-1385" who="Q4935587" next="i-7803e5b350c80f73-1386">
             <seg xml:id="i-D4nrHu9UPv7c411EoKzaZq">
  • Correct
  • Incorrect

corpus/protocols/199293/prot-199293--115.xml

Diff starting from line 2459

@@ -2501,10 +2459,7 @@
             </seg>
           </u>
           <note xml:id="i-XRsQPriSu5b44sJ65Curxe" type="speaker">
-            Anf. 57 Statsrådet REIDUNN LAURÉN
-          </note>
-          <note xml:id="i-QSkDJMx2Rc2dNbHHdau2BX">
-            (-):
+            Anf. 57 Statsrådet REIDUNN LAURÉN (-):
           </note>
           <u xml:id="i-cd9d93bbf176642f-264" who="Q4961261" next="i-cd9d93bbf176642f-265">
             <seg xml:id="i-GbbquBRKhDiSwwc4cVVr6Q">
  • Correct
  • Incorrect

corpus/protocols/199394/prot-199394--024.xml

Diff starting from line 2907

@@ -2991,10 +2907,7 @@
             14 § Svar på fråga 1993/94:131 om de funktionshindrade och lönebidragen
           </note>
           <note xml:id="i-7wWFM7qbXVqh8VmWpsP14B" type="speaker">
-            Anf. 60 Arbetsmarknadsminister BÖRJE
-          </note>
-          <note type="speaker" xml:id="i-Kzckj3kktb7227LwdJSyxc">
-            HÖRNLUND (c):
+            Anf. 60 Arbetsmarknadsminister BÖRJE HÖRNLUND (c):
           </note>
           <u xml:id="i-464604d7a9edea4d-333" who="Q5005171" next="i-464604d7a9edea4d-334">
             <seg xml:id="i-PoZP9RAWBo6L3rqJKfzTqR">
  • Correct
  • Incorrect

Diff starting from line 444

@@ -453,10 +444,7 @@
             Turkiet m.m.
           </note>
           <note xml:id="i-D8sShZrzYB8bcDb3JSxEqD" type="speaker">
-            Anf. 7 Utrikesminister MARGARETHA AF
-          </note>
-          <note type="speaker" xml:id="i-Q7ZgoeqtN7zTbaaPFQncGA">
-            UGGLAS (m):
+            Anf. 7 Utrikesminister MARGARETHA AF UGGLAS (m):
           </note>
           <u xml:id="i-464604d7a9edea4d-41" who="Q455820" next="i-464604d7a9edea4d-42">
             <seg xml:id="i-Dbx9fKvZcdNEMnnWt4Z3cM">
  • Correct
  • Incorrect

corpus/protocols/199394/prot-199394--046.xml

Diff starting from line 1859

@@ -1874,10 +1859,7 @@
             (Applåder)
           </note>
           <note xml:id="i-L7N8xSPXUY2KqM9ksuN4KH" type="speaker">
-            Anf. 16 IAN WACHTMEISTER (nyd)
-          </note>
-          <note xml:id="i-HwsnrNhPKdF1yjdUe4K2J3">
-            replik:
+            Anf. 16 IAN WACHTMEISTER (nyd) replik:
           </note>
           <u xml:id="i-1b935445cfa07fb5-237" who="Q5983177" next="i-1b935445cfa07fb5-238">
             <seg xml:id="i-HQnvULjcfnhNpDZdfsep8C">
  • Correct
  • Incorrect

corpus/protocols/199394/prot-199394--059.xml

Diff starting from line 3085

@@ -3115,10 +3085,7 @@
             </seg>
           </u>
           <note xml:id="i-4xaL9GVHNk1HgoMfqLHReC" type="speaker">
-            ANf. 35 MARGARETA WINBERG (s)
-          </note>
-          <note xml:id="i-LSm5J1wJqgNRf6453YzcWd">
-            replik:
+            ANf. 35 MARGARETA WINBERG (s) replik:
           </note>
           <u xml:id="i-1f62ca39fc15af58-399" who="Q3430022" next="i-1f62ca39fc15af58-400">
             <seg xml:id="i-PvSzYZF4cHAgaRC1dfw5RT">
  • Correct
  • Incorrect

corpus/protocols/199394/prot-199394--073.xml

Diff starting from line 9690

@@ -9717,10 +9690,7 @@
             </seg>
           </u>
           <note type="speaker" xml:id="i-P1rZMFrrcZ5UzZrBMyPnrV">
-            Anf. 100 IAN WACHTMEISTER (nyd)
-          </note>
-          <note xml:id="i-Dy4GjNMJnZq3zMAKLYoZT2">
-            replik:
+            Anf. 100 IAN WACHTMEISTER (nyd) replik:
           </note>
           <u xml:id="i-854eb82c676d5b32-1026" who="Q5983177" next="i-854eb82c676d5b32-1027">
             <seg xml:id="i-PpeL1oSzQBhWqWzFiLjnFX">
  • Correct
  • Incorrect

corpus/protocols/199394/prot-199394--086.xml

Diff starting from line 5355

@@ -5490,10 +5355,7 @@
             </seg>
           </u>
           <note xml:id="i-X9LrvQHh5cQP249aQ2sZDC" type="speaker">
-            Anf. 119 Utbildningsminister PER UNCKEL
-          </note>
-          <note xml:id="i-3erHixQe3SkyUSUj2g7bkJ" type="speaker">
-            (m):
+            Anf. 119 Utbildningsminister PER UNCKEL (m):
           </note>
           <u xml:id="i-1a1ccafdd155ad66-610" who="Q1830351" next="i-1a1ccafdd155ad66-611">
             <seg xml:id="i-VsiL1SspYjx4rKNAmaNHRW">
  • Correct
  • Incorrect

corpus/protocols/199394/prot-199394--114.xml

Diff starting from line 2832

@@ -2913,10 +2832,7 @@
             10 § Svar på frågorna 1993/94:530 och 562 om förvaring av kärnbränsleavfall
           </note>
           <note xml:id="i-W6q3NR9rF9vhAGMRRWgp7E" type="speaker">
-            Anf. 61 Miljöminister OLOF JOHANSSON
-          </note>
-          <note xml:id="i-FQvq62qJ8TsAEcU2NXauQh" type="speaker">
-            (c):
+            Anf. 61 Miljöminister OLOF JOHANSSON (c):
           </note>
           <u xml:id="i-a54247adccbbcccc-326" who="Q2021126" next="i-a54247adccbbcccc-327">
             <seg xml:id="i-5HGeHZ9PZ8sigyGJKQ9Rmg">
  • Correct
  • Incorrect

corpus/protocols/199394/prot-199394--120.xml

Diff starting from line 10849

@@ -10900,10 +10849,7 @@
             </seg>
           </u>
           <note type="speaker" xml:id="i-R8YM3t6AL7yNXv21qx2yQR">
-            Anf. 91 LENNART HEDQUIST (m)
-          </note>
-          <note xml:id="i-7PLhpjDbgxhgWeBqEsE36J">
-            replik:
+            Anf. 91 LENNART HEDQUIST (m) replik:
           </note>
           <u xml:id="i-436916f7ef21a445-1189" who="Q5796375" next="i-436916f7ef21a445-1190">
             <seg xml:id="i-8L1HRgZxVwL94RUUsFtfKU">
  • Correct
  • Incorrect

corpus/protocols/199495/prot-199495--024.xml

Diff starting from line 4514

@@ -4517,10 +4514,7 @@
             </seg>
           </u>
           <note xml:id="i-VJXKgiyeBKXVQZvQCYdijV" type="speaker">
-            Anf. 64 Näringsminister STEN HECKSCHER
-          </note>
-          <note xml:id="i-6ZDfREyPYFhsCM4CFTobef" type="speaker">
-            (s):
+            Anf. 64 Näringsminister STEN HECKSCHER (s):
           </note>
           <pb facs="http://data.riksdagen.se/fil/8782DFC3-2BA2-4F70-AB58-DDCBBE5C9398#page=44"/>
           <u xml:id="i-8fed722b0750662f-513" who="Q4126210" next="i-8fed722b0750662f-514">
  • Correct
  • Incorrect

corpus/protocols/199495/prot-199495--046.xml

Diff starting from line 18898

@@ -18952,10 +18898,7 @@
             </seg>
           </u>
           <note type="speaker" xml:id="i-3B6YiWcHpXksLtxunu5v96">
-            Anf. 166 MICHAEL STJERNSTRÖM (kds)
-          </note>
-          <note xml:id="i-P6Ys8gLfGY3rzy3PmBQ2m4">
-            replik:
+            Anf. 166 MICHAEL STJERNSTRÖM (kds) replik:
           </note>
           <u xml:id="i-c6ffa3b287edec46-1913" who="Q6190751" next="i-c6ffa3b287edec46-1914">
             <seg xml:id="i-V5vSUZoDPGuBHM2coAD174">
  • Correct
  • Incorrect

corpus/protocols/199495/prot-199495--070.xml

Diff starting from line 2732

@@ -2750,10 +2732,7 @@
             </seg>
           </u>
           <note xml:id="i-7YGzfLA8hPaFrWSZHksH43" type="speaker">
-            Anf. 23 Utrikesminister LENA HJELM-
-          </note>
-          <note xml:id="i-HisNUioMEBoTDNLgiVtj4E" type="speaker">
-            WALLÉN (s)
+            Anf. 23 Utrikesminister LENA HJELM-WALLÉN (s)
           </note>
           <u xml:id="i-22d503e0a0859781-296" who="unknown" next="i-22d503e0a0859781-297">
             <seg xml:id="i-vM9PiyVeS6AGwFcXJ7NzX">
  • Correct
  • Incorrect

corpus/protocols/199495/prot-199495--082.xml

Diff starting from line 4284

@@ -4287,10 +4284,7 @@
             </seg>
           </u>
           <note xml:id="i-QND7WBNVmczbAM4NdhfX1v" type="speaker">
-            Anf. 46 Statsminister INGVAR
-          </note>
-          <note xml:id="i-ubEEXfbbf1j8E9G6d8w92" type="speaker">
-            CARLSSON (s)
+            Anf. 46 Statsminister INGVAR CARLSSON (s)
           </note>
           <u xml:id="i-519b2b123837f0d8-393" who="Q53740" next="i-519b2b123837f0d8-394">
             <seg xml:id="i-Hk58ELB4GtwmvEkDt4NuBF">
  • Correct
  • Incorrect

corpus/protocols/199495/prot-199495--111.xml

Diff starting from line 718

@@ -748,10 +718,7 @@
             4 § Svar på fråga 1994/95:515 om farliga vägstolpar
           </note>
           <note xml:id="i-HhhGrGg2o3y3SVKm9A4sRq" type="speaker">
-            Anf. 19 Kommunikationsminister INES
-          </note>
-          <note xml:id="i-W8yCwgSKHB7NsooavsmufM" type="speaker">
-            UUSMANN (s)
+            Anf. 19 Kommunikationsminister INES UUSMANN (s)
           </note>
           <u xml:id="i-a93c4ea845db2ec3-62" who="Q4984124" next="i-a93c4ea845db2ec3-63">
             <seg xml:id="i-C7LeRxHiGQH16zn7AFJwsU">
  • Correct
  • Incorrect

corpus/protocols/199495/prot-199495--113.xml

Diff starting from line 7337

@@ -7442,10 +7337,7 @@
             </seg>
           </u>
           <note xml:id="i-7UZ95RTYPgYnuixCQjc8Eq" type="speaker">
-            Anf. 97 Utrikesminister LENA HJELM-
-          </note>
-          <note xml:id="i-DCW1KugutH2EZE1ULBWwH7" type="speaker">
-            WALLÉN (s)
+            Anf. 97 Utrikesminister LENA HJELM-WALLÉN (s)
           </note>
           <pb facs="http://data.riksdagen.se/fil/E69C8960-86E0-4028-A871-BFE394D6D225#page=71"/>
           <u xml:id="i-099555037568b315-740" who="unknown" next="i-099555037568b315-741">
  • Correct
  • Incorrect

corpus/protocols/199495/prot-199495--115.xml

Diff starting from line 17722

@@ -17833,10 +17722,7 @@
             </seg>
           </u>
           <note xml:id="i-KSonqF2xizGVkXcJfvLKEd" type="speaker">
-            Anf. 186 BRITT-MARIE DANESTIG-
-          </note>
-          <note xml:id="i-SiStejWTQZbb7u1sgVRgmc" type="speaker">
-            OLOFSSON (v) replik
+            Anf. 186 BRITT-MARIE DANESTIG-OLOFSSON (v) replik
           </note>
           <u xml:id="i-6ae134e993875446-1764" who="unknown" next="i-6ae134e993875446-1765">
             <seg xml:id="i-Y4GAUaGiiDeiQppyrKARmk">
  • Correct
  • Incorrect

corpus/protocols/199596/prot-199596--027.xml

Diff starting from line 12413

@@ -12428,10 +12413,7 @@
             </seg>
           </u>
           <note xml:id="i-FwZb43G696mjNAUHxh1C8d" type="speaker">
-            Anf. 169 BRITT-MARIE DANESTIG-
-          </note>
-          <note xml:id="i-2Ei1pAuFkL8VsA1mR7sxRB" type="speaker">
-            OLOFSSON (v) replik
+            Anf. 169 BRITT-MARIE DANESTIG-OLOFSSON (v) replik
           </note>
           <u xml:id="i-6ae12adf3bf71d7c-1245" who="unknown" next="i-6ae12adf3bf71d7c-1246">
             <seg xml:id="i-2NtNtscShx9hC9X5eFLEBJ">
  • Correct
  • Incorrect

corpus/protocols/199596/prot-199596--115.xml

Diff starting from line 3890

@@ -3896,10 +3890,7 @@
             </seg>
           </u>
           <note xml:id="i-LCDPJMQ5njuy8bNmzSEFDt" type="speaker">
-            Anf. 53 Finansminister ERIK ÅSBRINK (s)
-          </note>
-          <note xml:id="i-UZ4r6h4rJKv58VVo9sr4pS">
-            replik
+            Anf. 53 Finansminister ERIK ÅSBRINK (s) replik
           </note>
           <u xml:id="i-f9ccb2b908c984cb-445" who="Q5388933" next="i-f9ccb2b908c984cb-446">
             <seg xml:id="i-4VPid76rKTUEMSnJ1h7che">
  • Correct
  • Incorrect

corpus/protocols/199798/prot-199798--029.xml

Diff starting from line 3175

@@ -3187,10 +3175,7 @@
             31 om studiemedel för studier utomlands
           </note>
           <note xml:id="i-CNFSXK1bNWLhcbzkoyBP1v" type="speaker">
-            Anf. 39 Utbildningsminister CARL
-          </note>
-          <note xml:id="i-MYBFbYP7aZGUnnXBU7za8r" type="speaker">
-            THAM (s):
+            Anf. 39 Utbildningsminister CARL THAM (s):
           </note>
           <u xml:id="i-5ca5387a9130b39d-319" who="Q6206776" next="i-5ca5387a9130b39d-320">
             <seg xml:id="i-NdyxrGc1Pd8PGFVCi1MLMM">
  • Correct
  • Incorrect

corpus/protocols/199798/prot-199798--046.xml

Diff starting from line 12176

@@ -12233,10 +12176,7 @@
             </seg>
           </u>
           <note xml:id="i-E19ry8Cs2eH2999Z3vXv7v" type="speaker">
-            Anf. 135 BRITT-MARIE DANESTIG (v)
-          </note>
-          <note xml:id="i-43CYuZMNU5rmLfWg1Bzpi3">
-            replik:
+            Anf. 135 BRITT-MARIE DANESTIG (v) replik:
           </note>
           <u xml:id="i-855ce8045999c393-1308" who="Q4944157" next="i-855ce8045999c393-1309">
             <seg xml:id="i-AHCNkefF5GygqrkwvxSyMU">
  • Correct
  • Incorrect

corpus/protocols/199798/prot-199798--107.xml

Diff starting from line 15615

@@ -15642,10 +15615,7 @@
             </seg>
           </u>
           <note type="speaker" xml:id="i-CRYB9Rj16mU7utacC3Mcup">
-            Anf. 196 MARIANNE ANDERSSON (c)
-          </note>
-          <note xml:id="i-3Zus61w4svaVDNTW4ag1Wa">
-            replik:
+            Anf. 196 MARIANNE ANDERSSON (c) replik:
           </note>
           <pb facs="http://data.riksdagen.se/fil/5B39D5AF-5369-476E-82EC-90A94641BF34#page=147"/>
           <u xml:id="i-a314fe48c3b4efa7-1560" who="Q4935892">
  • Correct
  • Incorrect

corpus/protocols/199899/prot-199899--101.xml

Diff starting from line 2962

@@ -2965,10 +2962,7 @@
             </seg>
           </u>
           <note xml:id="i-69RghLHY26jCXwhVxegDm4" type="speaker">
-            Anf. 28 Finansminister BOSSE RING-
-          </note>
-          <note xml:id="i-PaferMJif9P9DpxvfNKefd" type="speaker">
-            HOLM (s):
+            Anf. 28 Finansminister BOSSE RINGHOLM (s):
           </note>
           <u xml:id="i-fd1c663eb7b07f04-293" who="Q321595" next="i-fd1c663eb7b07f04-294">
             <seg xml:id="i-THNsXzdQWGGTjbaJ74NgmJ">
  • Correct
  • Incorrect

corpus/protocols/19992000/prot-19992000--025.xml

Diff starting from line 4336

@@ -4366,10 +4336,7 @@
             </seg>
           </u>
           <note xml:id="i-Pe4hLBz76ifSKYTLKST5f1" type="speaker">
-            Anf. 69 Statsminister GÖRAN PERS-
-          </note>
-          <note xml:id="i-9MRoRtcp8vnJaWn7iuoqHH" type="speaker">
-            SON (s):
+            Anf. 69 Statsminister GÖRAN PERSSON (s):
           </note>
           <u xml:id="i-c0739e3cd802cabb-452" who="Q53747" next="i-c0739e3cd802cabb-453">
             <seg xml:id="i-TeQUaBF6zL8b7mrqMLkK9u">
  • Correct
  • Incorrect

corpus/protocols/19992000/prot-19992000--044.xml

Diff starting from line 9854

@@ -9854,10 +9854,7 @@
             </seg>
           </u>
           <note xml:id="i-DakAjDB5rRxXUqw1RUJyYc" type="speaker">
-            Anf. 123 ESTER LINDSTEDT-STAAF (kd)
-          </note>
-          <note xml:id="i-9CrWMMj98v6T8jaYoh5NXR">
-            replik:
+            Anf. 123 ESTER LINDSTEDT-STAAF (kd) replik:
           </note>
           <u xml:id="i-42e01a4d28b9d50d-1008" who="Q4962937" next="i-42e01a4d28b9d50d-1009">
             <seg xml:id="i-E1B2cdSPzoAUr4W27H1YkD">
  • Correct
  • Incorrect

corpus/protocols/200001/prot-200001--035.xml

Diff starting from line 6335

@@ -6350,10 +6335,7 @@
             och jämställdhet
           </note>
           <note xml:id="i-Hbbwsh2m281UDz9wWToysS" type="speaker">
-            Anf. 81 Näringsminister BJÖRN ROSEN-
-          </note>
-          <note xml:id="i-FjBoSb6qHAzbmr7ZTw28xG" type="speaker">
-            GREN (s):
+            Anf. 81 Näringsminister BJÖRN ROSENGREN (s):
           </note>
           <u xml:id="i-1fc242e94e785fc3-683" who="Q3374466" next="i-1fc242e94e785fc3-685">
             <seg xml:id="i-BcEDsC5K9ZGKPhmbuhnaXJ">
  • Correct
  • Incorrect

corpus/protocols/200001/prot-200001--042.xml

Diff starting from line 16149

@@ -16320,10 +16149,7 @@
             </seg>
           </u>
           <note xml:id="i-71eQh71kMRQcHwFKjkx6r5" type="speaker">
-            Anf. 237 ULLA-BRITT HAGSTRÖM (kd)
-          </note>
-          <note xml:id="i-7XikfvrUwddcETTcsydkre">
-            replik:
+            Anf. 237 ULLA-BRITT HAGSTRÖM (kd) replik:
           </note>
           <u xml:id="i-03786a9fea02e888-1751" who="Q4952016" next="i-03786a9fea02e888-1752">
             <seg xml:id="i-SmpPrZ27U1kXw5vBwNB3pD">
  • Correct
  • Incorrect

corpus/protocols/200001/prot-200001--074.xml

Diff starting from line 9980

@@ -10073,10 +9980,7 @@
             organisationer
           </note>
           <note xml:id="i-7oB9L7ea5H6YQqxdQuBZ9Z" type="speaker">
-            Anf. 125 Finansminister BOSSE RING-
-          </note>
-          <note xml:id="i-Mt4hZb1LpxyQCWpaeTQCBA" type="speaker">
-            HOLM (s):
+            Anf. 125 Finansminister BOSSE RINGHOLM (s):
           </note>
           <pb facs="http://data.riksdagen.se/fil/1E52728F-24A1-4D91-A3F0-826D56D73AB3#page=97"/>
           <u xml:id="i-3d21e6756d249933-1039" who="Q321595" next="i-3d21e6756d249933-1040">
  • Correct
  • Incorrect

corpus/protocols/200001/prot-200001--110.xml

Diff starting from line 3358

@@ -3406,10 +3358,7 @@
             </seg>
           </u>
           <note xml:id="i-oecEapAnwBEZxC2gAnxdx" type="speaker">
-            Anf. 39 Näringsminister BJÖRN ROSEN-
-          </note>
-          <note xml:id="i-4FZw1QgqZfTNjRqjimFzBr" type="speaker">
-            GREN (s):
+            Anf. 39 Näringsminister BJÖRN ROSENGREN (s):
           </note>
           <pb facs="http://data.riksdagen.se/fil/F258E0BF-B765-4461-ACD3-85F0AD65D04E#page=30"/>
           <u xml:id="i-10df0e1581037889-367" who="Q3374466" next="i-10df0e1581037889-368">
  • Correct
  • Incorrect

corpus/protocols/200001/prot-200001--116.xml

Diff starting from line 10288

@@ -10381,10 +10288,7 @@
             </seg>
           </u>
           <note xml:id="i-L6P1msgfhh1cRjhB9rGc5g" type="speaker">
-            Anf. 116 Finansminister BOSSE RING-
-          </note>
-          <note xml:id="i-9jMSbrEnkxErzKjxspu3XQ" type="speaker">
-            HOLM (s):
+            Anf. 116 Finansminister BOSSE RINGHOLM (s):
           </note>
           <u xml:id="i-3a62f0c3eb7f6e30-1115" who="Q321595" next="i-3a62f0c3eb7f6e30-1116">
             <seg xml:id="i-VGbhJjGNNsj5w4e6pVPK8r">
  • Correct
  • Incorrect

corpus/protocols/200102/prot-200102--081.xml

Diff starting from line 16755

@@ -16857,10 +16755,7 @@
             </seg>
           </u>
           <note xml:id="i-HcPE4u34TF8vW3UxSW7q8G" type="speaker">
-            Anf. 205 Jordbruksminister MARGARETA
-          </note>
-          <note type="speaker" xml:id="i-9AHjWkzVpPdxRCuaN4FT3m">
-            WINBERG (s) replik:
+            Anf. 205 Jordbruksminister MARGARETA WINBERG (s) replik:
           </note>
           <u xml:id="i-59bf090eedaa9225-1702" who="unknown" next="i-59bf090eedaa9225-1703">
             <seg xml:id="i-A5dsXcsbcG4DxiqwZMXVv5">
  • Correct
  • Incorrect

corpus/protocols/200102/prot-200102--103.xml

Diff starting from line 2220

@@ -2244,10 +2220,7 @@
             </seg>
           </u>
           <note xml:id="i-3XR7dtrt2pZjsKqjhP4Kkd" type="speaker">
-            Anf. 25 RUNAR PATRIKSSON (fp)
-          </note>
-          <note xml:id="i-WhM1tBkJKUqJakf45JgMDr">
-            replik:
+            Anf. 25 RUNAR PATRIKSSON (fp) replik:
           </note>
           <u xml:id="i-dfa73534e1f7095a-239" who="Q6037205" next="i-dfa73534e1f7095a-240">
             <seg xml:id="i-9ceTRwZ5ys3gGH38bejdec">
  • Correct
  • Incorrect

corpus/protocols/200102/prot-200102--120.xml

Diff starting from line 16363

@@ -16402,10 +16363,7 @@
             </seg>
           </u>
           <note xml:id="i-2xZfQfZqRM8EfjabwW1xdt" type="speaker">
-            Anf. 165 Statsrådet INGELA THALÉN (s)
-          </note>
-          <note xml:id="i-DDw3cBFn7zoj2ZXMauvJcn">
-            replik:
+            Anf. 165 Statsrådet INGELA THALÉN (s) replik:
           </note>
           <u xml:id="i-92fb7d85905d7a7a-1731" who="Q4982419" next="i-92fb7d85905d7a7a-1732">
             <seg xml:id="i-2brHxVCPB3rNHKqoBcqHv2">
  • Correct
  • Incorrect

corpus/protocols/200203/prot-200203--109.xml

Diff starting from line 4390

@@ -4411,10 +4390,7 @@
             </seg>
           </u>
           <note xml:id="i-McPu1hSjWJ4WFnQfzUoLVg" type="speaker">
-            Anf. 63 Jordbruksminister ANN-CHRISTIN
-          </note>
-          <note type="speaker" xml:id="i-GAX6bciNbbBnyux7yakhDc">
-            NYKVIST (s):
+            Anf. 63 Jordbruksminister ANN-CHRISTIN NYKVIST (s):
           </note>
           <u xml:id="i-cca020918ee0d32e-454" who="Q547580" next="i-cca020918ee0d32e-455">
             <seg xml:id="i-CEa8gHpKzNmj5jwrVVB2uM">
  • Correct
  • Incorrect

corpus/protocols/200203/prot-200203--111.xml

Diff starting from line 515

@@ -527,10 +515,7 @@
             5 § Svar på interpellation 2002/03:391 om hantverkare inom kulturmiljöområdet
           </note>
           <note xml:id="i-Fa8ADW5JS3WBim2vWkFCL8" type="speaker">
-            Anf. 8 Kulturminister MARITA ULVS-
-          </note>
-          <note xml:id="i-9SCgXCXPyVhe6FPiQfva49" type="speaker">
-            KOG (s):
+            Anf. 8 Kulturminister MARITA ULVSKOG (s):
           </note>
           <u xml:id="i-366f00f18dffb575-46" who="Q3115681" next="i-366f00f18dffb575-47">
             <seg xml:id="i-LZAkjkcxgJUomJQWACvhyN">
  • Correct
  • Incorrect

corpus/protocols/200203/prot-200203--116.xml

Diff starting from line 12728

@@ -12812,10 +12728,7 @@
             </seg>
           </u>
           <note xml:id="i-NRvqWxmpvfeKfaUY3nMqBk" type="speaker">
-            Anf. 176 CATHARINA BRÅKENHI-
-          </note>
-          <note xml:id="i-Hj98bTjQTUmk7VyDuYCCgq" type="speaker">
-            ELM (s):
+            Anf. 176 CATHARINA BRÅKENHIELM (s):
           </note>
           <u xml:id="i-1a9ea5d8a15f1190-1400" who="Q3363818" next="i-1a9ea5d8a15f1190-1401">
             <seg xml:id="i-LYqfCEYmX28bz81R63tCFa">
  • Correct
  • Incorrect

Collaborator

@ninpnin ninpnin left a comment


LGTM

@ninpnin
Collaborator

ninpnin commented Dec 8, 2023

load_metadata should be refactored not to use input/segmentation/join_intros.csv though.

@BobBorges
Collaborator Author

> load_metadata should be refactored not to use input/segmentation/join_intros.csv though.

@ninpnin Then we also have to remove some things from detect_mps() in pyriksdagen.refine. It seems that redetect was computing this intro merging on the fly in that function. After we changed the file names, it could no longer do so, and that is the reason for our drop in quality. @MansMeg

@BobBorges
Collaborator Author

The sample is OK. Let's merge this into the query-metadata branch, and I will fix these issues with load_metadata and detect_mps() in that branch, since that is where I'll actually be running things. We know the unit test will fail here, and that is expected.

@MansMeg
Collaborator

MansMeg commented Dec 8, 2023

All correct? I think we should merge this into dev right away instead, and make that the general principle. Otherwise it will become a mess, and we risk having to check these edits again when the query branch opens a PR to dev.

If you merge this into dev, you can then merge dev into the query branch. That gives the same result but keeps the process simpler.

That is, only do sample QC for merges into dev. What do you think?

@BobBorges
Collaborator Author

Don't merge to dev; everything will fail.

@BobBorges
Collaborator Author

This branch has new metadata, but the redetect hasn't been applied correctly.

@BobBorges
Collaborator Author

Merge to the query-metadata branch, redetect until we're satisfied, and then merge that branch into dev.

@BobBorges
Collaborator Author

The sample for the query branch PR will only show edits to the who attribute.

@MansMeg
Collaborator

MansMeg commented Dec 8, 2023

OK! Then we need to handle the edits we are checking now when merging to dev.

@BobBorges
Collaborator Author

These edits are in a commit. When we merge to the query branch and then redetect, the sample for merging that branch into dev will only contain edits to the who attributes.

@BobBorges BobBorges merged commit 181fc17 into query-metadata Dec 8, 2023
2 of 3 checks passed