deprecates scripts/git-add_QC-sample.sh
organized diff-sampling stuff into subdir
The unit tests are failing? |
It's the schema test. Some of the 202122 protocols are empty. I found it just before I went home, so I'm not really sure what the cause is yet. |
Seems like it captures page divs: |
Also, commentSection does not really make sense semantically. I would go with debateSection and otherSection for now. Edit: I saw this is the standard in ParlaMint. But it hurts my eyes, so I would create our own sections here anyway, simply because I think we will want a more elaborate sectioning further down the line. |
Also, ParlaMint states that the first note after should be a header, so maybe add that as well? |
ParlaMint is the more restrictive version of the two, a strict subset of ParlaClarin. I think we should use it as a suggestion. In practice: sometimes the header is not available in our data, so I think we shouldn't put too much effort into following that rule. |
I think we should decide on a preliminary idea of how to adjust the divs now and I can implement it before we commit changes to the whole unicameral period. My thoughts:
|
I just talked it over with @ninpnin -- we'll leave the commentSection/debateSection for now. It's easy enough to change later. ParlaClarin specifies a subtype attribute, so that solves my main issue about classifying types of debates. I see one check mark on an incorrect <div> -- who should check the rest so we can get on with this? |
Fair enough. Long term we probably want this information in tables anyway. Hence we should add IDs to the div tags, just as we have for the notes and utterances. I suggest we just use uuid there as well. |
That's reasonable -- do you want to check the divs are correct enough first? I think it's a short script to add an id to the div tags -- we have a uuid generator function in the pyriksdagen module. |
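For reference, a minimal sketch of what such a script could look like. It assumes the protocols are TEI files in the standard TEI namespace. `new_id` is a hypothetical stand-in for the uuid generator in pyriksdagen, whose actual name and signature are not shown in this thread; the base58 encoding is only a guess based on how the existing ids look.

```python
import uuid
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def new_id():
    """Hypothetical stand-in for the pyriksdagen uuid generator."""
    n = uuid.uuid4().int
    chars = []
    while n:
        n, rem = divmod(n, 58)
        chars.append(BASE58[rem])
    return "i-" + "".join(reversed(chars))


def add_div_ids(root):
    """Give every body-level <div> an xml:id unless it already has one."""
    added = 0
    for body in root.iter(f"{{{TEI_NS}}}body"):
        for div in body.findall(f"{{{TEI_NS}}}div"):
            if XML_ID not in div.attrib:
                div.set(XML_ID, new_id())
                added += 1
    return added


# Tiny made-up protocol: one div without an id, one that already has one.
sample = (
    f'<TEI xmlns="{TEI_NS}"><text><body>'
    '<div type="commentSection"><note>1 § Justering av protokoll</note></div>'
    '<div type="debateSection" xml:id="i-existing"><note>2 § Debatt</note></div>'
    "</body></text></TEI>"
)
root = ET.fromstring(sample)
n_added = add_div_ids(root)
ids = [d.get(XML_ID) for d in root.iter(f"{{{TEI_NS}}}div")]
```

Existing ids are left untouched, so the script can be rerun on already curated files without changing them.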
The unit test fails because of a couple of protocols in 2021/22 with no body. They're on the Riksdag open data; I will fix this in a separate PR. |
Having thought about this a little longer: if we removed type from the tags later, we would actually be changing the API. So we should try to avoid that and fix this right away. I also think MetaSolutions was quite clear that the data should just include IDs to simplify linking and adding metadata. Hence, we should do this right away. I don't think it's much work. This would mean:
Does this make sense? |
I think this is a fundamentally different approach than what we have done so far. So far, we have had a lot of annotations in the XML files. That's what ParlaClarin is for. Otherwise we would use tabular data, e.g. CSVs, for text too. My current gut feeling is that our current approach works better with git. Either way, I don't think we should add a new CSV now. Either we continue with our current approach, or change to a tabular structure later after more planning. |
That is true. I think we get some conflicting best practices here: ParlaClarin as a format, and MetaSolutions' recommendations regarding IDs and linked data. I agree with MetaSolutions long term, but you are right. Let's keep this as small as possible. Although we need to add an id to all elements anyway, since we are going to need to take samples of sections. |
I'm hesitant to merge a PR that doesn't pass the tests, so we should try to fix that ASAP. |
(re)curate empty protocols in 202122 year
add pb elems to previously empty protocols
Here comes a new sample with id attribs in the div and 'empty' protocols in the 202122 year curated. Let's hope the unit tests pass :D |
Sampled changes
corpus/protocols/1972/prot-1972--24.xml
Diff starting from line 3172
@@ -3150,6 +3172,8 @@
<note xml:id="i-HHDhpAANZJrmYUDqzPhmCK">
Denna anhållan bordlades.
</note>
+ </div>
+ <div type="commentSection" xml:id="i-X33R3qea3RbqeNGwrLd1mh">
<note xml:id="i-KzgEJ9DWhqzRBZbWpYnuwj">
§ 12 Anmäldes och bordlades Kungl. Maj:ts propositioner:
</note>
corpus/protocols/1973/prot-1973--120.xml
Diff starting from line 65
@@ -65,7 +65,7 @@
</div>
</front>
<body>
- <div>
+ <div type="commentSection" xml:id="i-7Hg1De5Po567941hoEN5Eb">
<pb facs="https://betalab.kb.se/prot-1973--120/prot_1973__120-000.jp2/_view"/>
<note xml:id="i-6MnUiXYLfspTq7tqvktbyu">
Riksdagens protokoll
corpus/protocols/197879/prot-197879--79.xml
Diff starting from line 62
@@ -62,7 +62,7 @@
</div>
</front>
<body>
- <div>
+ <div type="commentSection" xml:id="i-WhnS8hWbaziWUxiyVsjRu">
<pb facs="https://betalab.kb.se/prot-197879--79/prot_197879__79-000.jp2/_view"/>
<note xml:id="i-PGiGmFUjFqxeozFogDjSPY">
Riksdagens protokoll
corpus/protocols/197879/prot-197879--90.xml
Diff starting from line 3430
@@ -3400,18 +3430,26 @@
betänkande 1978/79:14 Jordbruksutskottets betänkande 1978/79:17
Näringsutskottets betänkanden 1978/79:19-21
</note>
+ </div>
+ <div type="commentSection" xml:id="i-GDV2NxHAuiYua1uC4LgXNa">
<note xml:id="i-BUGefzQkzxYYkaXkrsZx3X">
§ 19 Föredrogs och bifölls Interpellationsframställning 1978/79:149
</note>
+ </div>
+ <div type="commentSection" xml:id="i-QYro5141WoAx8z7uYXVSpa">
<note xml:id="i-WQtoWEzf5SCNFmFSB7tkmg">
§ 20 Talmannen meddelade att på föredragningslistan för morgondagens
sammanträde skulle finansutskottets betänkande nr 20 och skatteutskottets
betänkande nr 29 uppföras främst bland två gånger bordlagda ärenden.
</note>
+ </div>
+ <div type="commentSection" xml:id="i-NbesPokTdQ24SVVR7fSbwU">
<note xml:id="i-UoJDppyqfyMhSS23nZhNgg">
§ 21 Anmäldes och bordlades Proposition 1978/79:89 om lokalhyra
</note>
<pb facs="https://betalab.kb.se/prot-197879--90/prot_197879__90-041.jp2/_view"/>
+ </div>
+ <div type="debateSection" xml:id="i-G7rCj1kjqCHQDawYn99ejp">
<note xml:id="i-N97S5BHKP6kAXoaYbEo9ea">
§ 22 Anmälan av interpellation
</note>
corpus/protocols/197980/prot-197980--41.xml
Diff starting from line 5419
@@ -5389,6 +5419,8 @@
av Lysekilsbanan kan genomföras utan dröjsmål?
</seg>
</u>
+ </div>
+ <div type="commentSection" xml:id="i-V3hXj8qo6S2Pzzt3L2yZpA">
<note xml:id="i-9YeCE5sNnJRu34tVFknkgb">
§ 17 Kammaren åtskildes kl. 15.01. In fidem
</note>
corpus/protocols/197980/prot-197980--56.xml
Diff starting from line 8043
@@ -8023,6 +8043,8 @@
<note xml:id="i-K4E5su54KdtTu5pacFps41">
Mom. 2-7 Kammaren biföll vad utskottet i dessa moment hemställt.
</note>
+ </div>
+ <div type="debateSection" xml:id="i-UNjYVpvV3BHw9yDy8yT9Dg">
<note xml:id="i-8EHo2t5WvyzppWZyb8gyLb">
§ 12 Invandrarundervisning m. m.
</note>
corpus/protocols/198182/prot-198182--31.xml
Diff starting from line 64
@@ -64,7 +64,7 @@
</div>
</front>
<body>
- <div>
+ <div type="commentSection" xml:id="i-YKJQ9t6vq2g91Se14ztFjn">
<pb facs="https://betalab.kb.se/prot-198182--31/prot_198182__31-000.jp2/_view"/>
<note xml:id="i-5fKzSMmN1WerpNSyAhwcXV">
Riksdagens protokoll
corpus/protocols/198283/prot-198283--111.xml
Diff starting from line 353
@@ -353,6 +353,8 @@
<note xml:id="i-PnUNJn84bxmRD9K6GUAbZf">
suppleant i utbildningsutskottet Sonia Thomasson (vpk)
</note>
+ </div>
+ <div type="commentSection" xml:id="i-S1pessFPXLM6rWYzEH1QjS">
<note xml:id="i-PBCqXtv8qLCcE4P18gCuVF">
3§ Talmannen meddelade att Ingemar Konradsson (s) denna dag återtagit
sin plats i riksdagen, varigenom Ulla-Britt Carlssons uppdrag
corpus/protocols/198384/prot-198384--100.xml
Diff starting from line 3183
@@ -3175,15 +3183,21 @@
<note xml:id="i-9YZ5JzboDwZ4z63771xjK3">
Överläggningen var härmed avslutad.
</note>
+ </div>
+ <div type="commentSection" xml:id="i-BsCQM62oikNZ3ioKPCXuVk">
<note xml:id="i-DbjQrUu8GsGVNnAbuZjLbi">
11 § På förslag av talmannen beslöt kammaren kl. 11.10 att ajournera
sina förhandlingar till kl. 14.00, då de till dagens bordläggning
anmälda utskottsbetänkandena väntades föreligga.
</note>
+ </div>
+ <div type="commentSection" xml:id="i-BdgMumfDWQE9NA242JctvF">
<note xml:id="i-BGwYLbyW36NMRKdpzngTAC">
12 § Förhandlingarna återupptogs kl. 14.00 under ledning av förste
vice talmannen.
</note>
+ </div>
+ <div type="commentSection" xml:id="i-NDrQvVL2ZQjE8AcX4cttKZ">
<note xml:id="i-7upkPfaSsBkcRFxuFV6S8a">
13 § Anmäldes och bordlades Proposition 1983/84:128 Förslag till
lag om företagshypotek m. m.
corpus/protocols/198384/prot-198384--155.xml
Diff starting from line 3523
@@ -3519,6 +3523,8 @@
<note xml:id="i-FSkMitL3nfGSkXpQC11GRs">
Övriga moment Utskottets hemställan bifölls.
</note>
+ </div>
+ <div type="debateSection" xml:id="i-CFNt3WaMCeUbAjPuAUL5o2">
<note xml:id="i-A4tx6KGBJRvjLq9Hk5NkQ9">
5 §& Arbetsmiljöfrågor, m. m.
</note>
corpus/protocols/198586/prot-198586--110.xml
Diff starting from line 819
@@ -817,6 +819,8 @@
<note xml:id="i-HVqcDkZHt748vre4KMfoYj">
Överläggningen var härmed avslutad.
</note>
+ </div>
+ <div type="debateSection" xml:id="i-Vgx1LYaDsrZqDRSozjEjtA">
<note xml:id="i-Vsy74coahMz5bjoq4eZm48">
3 § Svar på interpellation 1985/86:146 om åtgärder mot radioaktiva
utsläpp från engelsk upparbetningsanläggning
corpus/protocols/198687/prot-198687--73.xml
Diff starting from line 366
@@ -366,6 +366,8 @@
<note xml:id="i-7TqiRkjb2yAuoPYePo7jTv">
18 Justerades protokollet för den 9 innevarande månad.
</note>
+ </div>
+ <div type="debateSection" xml:id="i-Cc9TJgYWfzZsaGybgG1n5W">
<note xml:id="i-AmpTwAcF3WPs17Dhk2iYVZ">
2 § Svar på interpellation 1986/87:96 om åtgärder för att förenkla
och effektivisera socialförsäkringen
corpus/protocols/199091/prot-199091--78.xml
Diff starting from line 59
@@ -59,13 +59,15 @@
</div>
</front>
<body>
- <div>
+ <div type="commentSection" xml:id="i-Fkv3bwf9PRu2aakA4TSb2n">
<note xml:id="i-H8Se86iLdeznpxoeEpkSnk">
1 § Justering av protokoll
</note>
<note xml:id="i-Ab5VjJsL9Tzm9HS2Cmrk8y">
Justerades protokollet för den 8 mars.
</note>
+ </div>
+ <div type="commentSection" xml:id="i-R3U6NZvuQwJefao1SVMMji">
<note xml:id="i-rTHHpwPubrfwA13NDtzyZ">
2 § Bordläggning
</note>
corpus/protocols/199192/prot-199192--121.xml
Diff starting from line 10376
@@ -10328,6 +10376,8 @@
Kammaren beslöt att ärendebehandlingen skulle fortsättas vid
arbetsplenum måndagen den 1 juni.
</note>
+ </div>
+ <div type="commentSection" xml:id="i-WhuAoTJaCdoaqvPzkqS64Z">
<note xml:id="i-QhYGThrMHD1PiKTY1Rfbi7">
26 § Bordläggning
</note>
corpus/protocols/199293/prot-199293--71.xml
Diff starting from line 6983
@@ -6939,6 +6983,8 @@
<note xml:id="i-23akbUZ546t1WCuf495VAR">
1992/93:AU7, AU9 och AU15
</note>
+ </div>
+ <div type="commentSection" xml:id="i-4QcG6NWqhcZT5x5JbpkprZ">
<note xml:id="i-AFZCMQVmoGJBNbdzg9Exyu">
24 § Bordläggning
</note>
corpus/protocols/199394/prot-199394--124.xml
Diff starting from line 10470
@@ -10456,6 +10470,8 @@
<note xml:id="i-6PvWsiN7QtC2TzTu8BVf8V">
Förhandlingarna återupptogs kl. 15.00.
</note>
+ </div>
+ <div type="commentSection" xml:id="i-34i1ky7Vu8UVS6qhcYpwRQ">
<note xml:id="i-Xf9w3uZ9NXgfNKVLbXEFHp">
9 § Avsägelse
</note>
corpus/protocols/199495/prot-199495--40.xml
Diff starting from line 518
@@ -504,6 +518,8 @@
<note xml:id="i-Y7rzZ3AgF24sFWMunRbJqS">
(Beslut skulle fattas den 14 december.)
</note>
+ </div>
+ <div type="commentSection" xml:id="i-LMooFNh7qWfKgy2ZXrbKj3">
<note xml:id="i-EW8VcGYMQSdCvE1iB7TWzZ">
8 § Oskäliga avtalsvillkor m.m.
</note>
corpus/protocols/199495/prot-199495--76.xml
Diff starting from line 69
@@ -69,6 +69,8 @@
________________________________________________________________________
</seg>
</u>
+ </div>
+ <div type="commentSection" xml:id="i-3guoHd4Xv1BTuGxzwzcJK8">
<note xml:id="i-DtVyPH8Joqh7Nki1jYXfPp">
1 § Avsägelse
</note>
corpus/protocols/199899/prot-199899--17.xml
Diff starting from line 4845
@@ -4825,6 +4845,8 @@
Interpellationerna redovisas i bilaga som fogas till riksdagens
snabbprotokoll tisdagen den 24 november.
</note>
+ </div>
+ <div type="commentSection" xml:id="i-DhjpCNZzuFzXNSkuENgUJ3">
<note xml:id="i-TJo1UeQVntvrUtBpFVTXJZ">
11 § Anmälan om fråga för skriftligt svar
</note>
corpus/protocols/199899/prot-199899--38.xml
Diff starting from line 336
@@ -320,6 +336,8 @@
AU1 samt näringsutskottets betänkanden NU1, NU2 och NU3 skulle
avgöras i ett sammanhang efter avslutad debatt.
</note>
+ </div>
+ <div type="debateSection" xml:id="i-3F8a7BPWWUveovCAFDHV9G">
<note xml:id="i-VURidbF1UszbSTjSQCsGf6">
9 § Ekonomisk trygghet vid arbetslöshet samt arbetsmarknad och
arbetsliv
corpus/protocols/19992000/prot-19992000--112.xml
Diff starting from line 3395
@@ -3379,6 +3395,8 @@
<note xml:id="i-9u95apYeYffGQ4b6dy4Tx2">
(Beslut fattades under 11 §.)
</note>
+ </div>
+ <div type="debateSection" xml:id="i-ETQD59HnRUTg3rFxjeuGda">
<note xml:id="i-PWc4tbZvVhN9ySwwtvfgtV">
9 § Tillträde till internationella instrument mot penningförfalskning
</note>
corpus/protocols/200001/prot-200001--35.xml
Diff starting from line 188
@@ -180,6 +188,8 @@
<note xml:id="i-NXmeydYpGAxgtrgtxNxWGo">
Ingegerd Wärnersson
</note>
+ </div>
+ <div type="debateSection" xml:id="i-kajQ6vJPDaaWnAuVDJ2mF">
<note xml:id="i-QWTK52XQPGju38N82HDtuS">
5 § Svar på interpellation 2000/01:97 om verksamheten vid Lunds
universitets historiska museum
corpus/protocols/200001/prot-200001--56.xml
Diff starting from line 15540
@@ -15490,6 +15540,8 @@
<note xml:id="i-R8atM2aknihnLFwqQbxcyn">
Överläggningen var härmed avslutad.
</note>
+ </div>
+ <div type="debateSection" xml:id="i-KKR4LSNNrvaH2uD441gxzR">
<note xml:id="i-QeJXhHMkjXxLrtGf9BSpCd">
26 § Svar på interpellation 2000/01:188 om tomträtter
</note>
corpus/protocols/200001/prot-200001--64.xml
Diff starting from line 59
@@ -59,7 +59,7 @@
</div>
</front>
<body>
- <div>
+ <div type="commentSection" xml:id="i-Up1rjAzqTpCVu3qiGxjv5w">
<pb facs="http://data.riksdagen.se/fil/EAEC16F1-80A8-4F8B-AAC0-1C8AE4993D01#page=1"/>
<note xml:id="i-T2gmCNM3HbFu4pDN2mUgFP">
Det justerade protokollet beräknas utkomma om 3 veckor
corpus/protocols/200102/prot-200102--65.xml
Diff starting from line 70
@@ -70,12 +70,16 @@
<note xml:id="i-K8LC1ZsLfRNnC4XHiXkvKP">
-------------------------------------------------------------------
</note>
+ </div>
+ <div type="commentSection" xml:id="i-WWSVeFS8UUeZxUawjjzX4z">
<note xml:id="i-PHZWq5QuLYvuQUWAARAvJ5">
1 § Justering av protokoll
</note>
<note xml:id="i-YLarRBtfkqisttE3Xm3QLh">
Justerades protokollet för den 1 februari.
</note>
+ </div>
+ <div type="commentSection" xml:id="i-XggoQqtkUGi3Mb2DdxJX5T">
<note xml:id="i-TtPsCZTzEyDsSRKGgV28nc">
2 § Meddelande om utrikespolitisk debatt
</note>
corpus/protocols/200102/prot-200102--79.xml
Diff starting from line 4816
@@ -4798,6 +4816,8 @@
<note xml:id="i-D2WnFNisRbsJ1fv78yRRpb">
Överläggningen var härmed avslutad.
</note>
+ </div>
+ <div type="debateSection" xml:id="i-LVVXvDFV784HLHitman5w1">
<note xml:id="i-721gyy9MBKdF8MueZJeuyS">
10 § Svar på interpellation 2001/02:243 om
</note>
corpus/protocols/200304/prot-200304--25.xml
Diff starting from line 9060
@@ -9018,6 +9060,8 @@
tisdagen den 18 november.
</seg>
</u>
+ </div>
+ <div type="commentSection" xml:id="i-LDfA3KtpJomEeCCu9ZhQ6u">
<note xml:id="i-4amEiRs2H6QD57J1pBz3tV">
22 § Kammaren åtskildes kl. 21.51.
</note>
corpus/protocols/200405/prot-200405--101.xml
Diff starting from line 1840
@@ -1828,6 +1840,8 @@
<note xml:id="i-WfUyr6bDix8pQydRBvYZnc">
Överläggningen var härmed avslutad.
</note>
+ </div>
+ <div type="debateSection" xml:id="i-AfTQ1twiztLp9zz5FWX2xC">
<note xml:id="i-3rc7DmGdCMBNkKsQzR6Moo">
7 § Kommunal demokrati och kompetens
</note>
corpus/protocols/200405/prot-200405--49.xml
Diff starting from line 12097
@@ -12081,6 +12097,8 @@
<note xml:id="i-WPQNZ4ZexSQjgLEEmFjrKz">
Överläggningen var härmed avslutad.
</note>
+ </div>
+ <div type="debateSection" xml:id="i-2wFWqZkSPkaRicGQUS4QkA">
<note xml:id="i-7Exh5dsrheozxMigKJXvWH">
9 § Jord- och skogsbruk, fiske med anslutande näringar
</note>
corpus/protocols/200607/prot-200607--105.xml
Diff starting from line 6734
@@ -6704,6 +6734,8 @@
tisdagen den 15 maj.
</seg>
</u>
+ </div>
+ <div type="commentSection" xml:id="i-7XoJcB6wGnkbkZnvzWbZKS">
<note xml:id="i-E7Md5uEf3X81TWwqgJcmDs">
16 § Kammaren åtskildes kl. 13.37.
</note>
corpus/protocols/200607/prot-200607--111.xml
Diff starting from line 8544
@@ -8514,6 +8544,8 @@
<note xml:id="i-DJApmBGuyE7ZCZvgJpKzXq">
Förste vice talmannen konstaterade att ingen talare var anmäld.
</note>
+ </div>
+ <div type="debateSection" xml:id="i-AcokcFEjeByJPJ5v8MqBM1">
<note xml:id="i-WWx9WS9xnkDb7MA6Usg6Jq">
16 § Avskaffande av åldersgräns
</note>
corpus/protocols/200708/prot-200708--112.xml
Diff starting from line 11085
@@ -11061,6 +11085,8 @@
<note xml:id="i-5MtwaUpEnbdqsHgpMgfiWA">
Överläggningen var härmed avslutad.
</note>
+ </div>
+ <div type="debateSection" xml:id="i-5yLTXSMvfTtW3QKsgJ2xE9">
<note xml:id="i-Pc8RzFZwcmC7gf6GuvGYTy">
13 § Ny instansordning för arbetsmiljöärenden
</note>
corpus/protocols/200708/prot-200708--138.xml
Diff starting from line 538
@@ -530,6 +538,8 @@
<note xml:id="i-EBL7QFRPax7LF4C5dLpdD2">
Överläggningen var härmed avslutad.
</note>
+ </div>
+ <div type="debateSection" xml:id="i-27dDYET3WdPSjHCEj5wq8W">
<note xml:id="i-Vny6bvLYnMyxjjBur38FNC">
5 § Svar på interpellation 2007/08:837 om kommunernas ekonomi
</note>
corpus/protocols/200809/prot-200809--46.xml
Diff starting from line 561
@@ -545,6 +561,8 @@
<note xml:id="i-PizfC89y4Rb3dBCzerUGoH">
Punkterna 37
</note>
+ </div>
+ <div type="commentSection" xml:id="i-3cJLnXtTKXxgthXAvtP9et">
<note xml:id="i-BG5doLJXVjx69Mq5s9PmTf">
9 § Beslut om ärenden som slutdebatterats den 8 december
</note>
corpus/protocols/200910/prot-200910--11.xml
Diff starting from line 239
@@ -225,6 +239,8 @@
<note xml:id="i-EYKHweT77ozmbLM9zB1ZRr">
Anmäldes och bordlades
</note>
+ </div>
+ <div type="commentSection" xml:id="i-Qe4pDnCh4V6Et9tfDMbjHt">
<note xml:id="i-HrBXgaTLy5sdWanYYzqpmJ">
8 § Anmälan om interpellationer
</note>
corpus/protocols/200910/prot-200910--145.xml
Diff starting from line 14647
@@ -14601,6 +14647,8 @@
<note xml:id="i-Ef7v3QnAjmwK3YPfyo8VjZ">
Överläggningen var härmed avslutad.
</note>
+ </div>
+ <div type="debateSection" xml:id="i-8oP5aKFZKeWhhwcqaG6ZsT">
<note xml:id="i-Brxy4sUCXTSQckPAnmZZSK">
24 § Svar på interpellation 2009/10:451 om en allmän och solidarisk
a-kassa
corpus/protocols/201213/prot-201213--110.xml
Diff starting from line 2565
@@ -2547,6 +2565,8 @@
<note xml:id="i-2dFGfFENRyQ6MWqrsy7X91">
Förhandlingarna återupptogs kl. 14.00.
</note>
+ </div>
+ <div type="debateSection" xml:id="i-YSpr6xrU1VxqY1v3b4WudA">
<note xml:id="i-WypqQ9kBen4z8vFrur8ntj">
10 § Statsministerns frågestund
</note>
corpus/protocols/201314/prot-201314--106.xml
Diff starting from line 9972
@@ -9936,6 +9972,8 @@
<note xml:id="i-3ZF5gNg6DqSkKLoAZJxXWq">
Hans Hoff
</note>
+ </div>
+ <div type="commentSection" xml:id="i-NtkHTW2aNp1shs4gWNa1Xe">
<note xml:id="i-RZmLCJhytDFJw8X2PxTMN">
19 § Anmälan om skriftliga svar på frågor
</note>
corpus/protocols/201314/prot-201314--92.xml
Diff starting from line 1773
@@ -1759,6 +1773,8 @@
<note xml:id="i-3hzU8zPGcz1a6wyV3CXm4U">
Överläggningen var härmed avslutad.
</note>
+ </div>
+ <div type="debateSection" xml:id="i-9xbf4UDfdTmJGeWN1QcHwN">
<note xml:id="i-A8vb8aPdq5xaRWjGShVR8f">
8 § Svar på interpellation 2013/14:265 om nedsättningen av arbetsgivaravgiften
för unga
corpus/protocols/201415/prot-201415--121.xml
Diff starting from line 6277
@@ -6221,51 +6277,83 @@
<note xml:id="i-HYL11c4ALwi7G75cDWejwC">
Innehållsförteckning
</note>
+ </div>
+ <div type="commentSection" xml:id="i-LF6ocLKPvDi2aGX9xim9m1">
<note xml:id="i-BENKRLk9gwSs8AfMNGX1FZ">
§ 1 Justering av protokoll
</note>
+ </div>
+ <div type="commentSection" xml:id="i-C3AeoTJtidce8KFFhihRE6">
<note xml:id="i-CyvwfEQE78ZviPMQBDnUJy">
§ 2 Anmälan om interpellationer
</note>
+ </div>
+ <div type="commentSection" xml:id="i-H7WqmxQ1Rx4Rue67FjRGFB">
<note xml:id="i-qwqsUrvwJ43QJtE3tGHZ7">
§ 3 Anmälan om skriftliga frågor och svar
</note>
+ </div>
+ <div type="commentSection" xml:id="i-17NvSH25pNM9mtfwF6wPyE">
<note xml:id="i-5L8gCSf1Xvu1Vb1oByo2Jz">
§ 4 Anmälan om ny riksdagsledamot
</note>
+ </div>
+ <div type="commentSection" xml:id="i-Cuhqodpmrr5niztLQwofHm">
<note xml:id="i-KHjec4Capk2BqQs3yxgvfi">
§ 5 Anmälan om återtagande av plats i riksdagen
</note>
+ </div>
+ <div type="commentSection" xml:id="i-Ft99LcFwpf2HuKzgG5fFDu">
<note xml:id="i-22p2sBTuZ4Hd9TLRZktw1t">
§ 6 Avsägelser
</note>
+ </div>
+ <div type="commentSection" xml:id="i-TKV2MKo6JrMzT3tR3JP96o">
<note xml:id="i-K4NXZNhjqedhLT2mgspPYb">
§ 7 Anmälan om ersättare
</note>
+ </div>
+ <div type="commentSection" xml:id="i-37TeapwriAtaYsUehZSTaF">
<note xml:id="i-pjaWvS52ctLq5NctfmAyj">
§ 8 Anmälan om ersättare för statsråd
</note>
+ </div>
+ <div type="commentSection" xml:id="i-SXudBx9J8gZeiRQxAwAboC">
<note xml:id="i-TCZELaX43MvEdJr9MeM7ZQ">
§ 9 Anmälan om ersättare för talman
</note>
+ </div>
+ <div type="commentSection" xml:id="i-AGTuEeZLnF9p6AKLFNYmx7">
<note xml:id="i-CGRLxVnpW4fHNcQRKciDct">
§ 10 Anmälan om kompletteringsval
</note>
+ </div>
+ <div type="commentSection" xml:id="i-XbHjeFUgBGmAeSP9R78yqU">
<note xml:id="i-9RbJy9TKZEqKnirLvvSQat">
§ 11 Anmälan om ny ledamot i Europaparlamentet
</note>
+ </div>
+ <div type="commentSection" xml:id="i-H5CkXN1RYHysvhbC6XCt4s">
<note xml:id="i-16z4Yxx2H5ufGG7x8pxVkH">
§ 12 Anmälan om fördröjda svar på interpellationer
</note>
+ </div>
+ <div type="commentSection" xml:id="i-SiPoBr8ZPyL4sSnNG4Yybz">
<note xml:id="i-MqQLhzqUdZQShQcqj4L4U1">
§ 13 Anmälan om faktapromemorior
</note>
+ </div>
+ <div type="commentSection" xml:id="i-8tFw1bSgPtUBXxKRj6kfzL">
<note xml:id="i-HqAj21UMDMwhq3nyv7wfBU">
§ 14 Anmälan om granskningsrapporter
</note>
+ </div>
+ <div type="commentSection" xml:id="i-NCDvCP3GQ5xEbC7goubgA9">
<note xml:id="i-L9H5AtAJ1pNtv6HE8Xp4wV">
§ 15 Anmälan och omedelbar hänvisning av ärenden till utskott
</note>
+ </div>
+ <div type="debateSection" xml:id="i-XRMzZeUZqBvbSsCGZn8dbc">
<note xml:id="i-Sjmy9DsGJ9KmfAZivikbrh">
§ 16 Svar på interpellation 2014/15:629 om Öresundssamarbete
</note>
corpus/protocols/201516/prot-201516--102.xml
Diff starting from line 8679
@@ -8595,6 +8679,8 @@
<note xml:id="i-Lg4UUSYZEaM7zEpP2DMg34" type="speaker">
Anf. 80 Utbildningsminister GUSTAV FRIDOLIN (MP)
</note>
+ </div>
+ <div type="debateSection" xml:id="i-5ZQs9Uyf91eac6x2jvDbrP">
<note xml:id="i-FtDtadi4A1CgxwPy42nFz4">
§ 17 Svar på interpellation 2015/16:597 om digitala verktyg till
nyanlända elever
corpus/protocols/201516/prot-201516--118.xml
Diff starting from line 7422
@@ -7400,6 +7422,8 @@
<note xml:id="i-8ZWcNpAkEtWdvnMdExgNR2">
(Beslut skulle fattas den 15 juni.)
</note>
+ </div>
+ <div type="debateSection" xml:id="i-LuTkQsGxeFj6DaaX6y3SEB">
<note xml:id="i-TRLzdicc649cdM9zHBbU6U">
§ 4 Övergångsstyre och utjämning vid ändrad kommun- och landstingsindelning
</note>
corpus/protocols/201617/prot-201617--132.xml
Diff starting from line 5507
@@ -5413,6 +5507,8 @@
<note xml:id="i-DsfKwnirY3ayry8TLiQZLk" type="speaker">
Anf. 70 Statsrådet ANNA EKSTRÖM (S)
</note>
+ </div>
+ <div type="debateSection" xml:id="i-WHuUJvLk4MBvaTWbKZDEXz">
<note xml:id="i-NuKiNa3oRbRx1UwtC5JFyq">
§ 22 Svar på interpellation 2016/17:567 om psykisk ohälsa i gymnasiet
</note>
corpus/protocols/201617/prot-201617--26.xml
Diff starting from line 2246
@@ -2230,6 +2246,8 @@
<note xml:id="i-KN2TqXyaGbYBWfTMo8dmtb">
Överläggningen var härmed avslutad.
</note>
+ </div>
+ <div type="debateSection" xml:id="i-UTP5wAXm5cv23Ug2tC7mQW">
<note xml:id="i-4WbDPFnVBDUWCixCXgUJ93">
§ 9 Svar på interpellation 2016/17:68 om svenska kommuners skatteintäkter
</note>
corpus/protocols/201617/prot-201617--29.xml
Diff starting from line 6744
@@ -6706,6 +6744,8 @@
investeringsprodukter för icke-professionella investerare (Priip-produkter)
vad gäller förordningens tillämpningsdag
</note>
+ </div>
+ <div type="commentSection" xml:id="i-VzJ1qH8cMTdndKv4wkVYBC">
<note xml:id="i-SWLCKs9UiqCqTLYQTNZxLC">
§ 20 Anmälan om interpellationer
</note>
corpus/protocols/201617/prot-201617--71.xml
Diff starting from line 59
@@ -59,14 +59,18 @@
</div>
</front>
<body>
- <div>
+ <div type="commentSection" xml:id="i-YFtCuWMiDDWu6Uop9FZMJt">
<pb facs="http://data.riksdagen.se/fil/FEB488CD-C695-4AC6-BF50-03E8DD992394#page=1"/>
+ </div>
+ <div type="commentSection" xml:id="i-BMV2VyzC6umMTWsPP1uyaw">
<note xml:id="i-RHzFCz3FUyw7P9THXX4PPd">
§ 1 Justering av protokoll
</note>
<note xml:id="i-HkQCKHagHXAHV4tTTmBQLY">
Protokollet för den 31 januari justerades.
</note>
+ </div>
+ <div type="commentSection" xml:id="i-MnPga3aEKRTDK14nvituAw">
<note xml:id="i-XeK4DbvNUKiAio9JknQBBb">
§ 2 Anmälan om ny riksdagsledamot
</note>
corpus/protocols/201718/prot-201718--16.xml
Diff starting from line 2097
@@ -2087,6 +2097,8 @@
<note xml:id="i-VprfTiJcobS5BY1aFLWkUu">
till statsrådet Tomas Eneroth (S)
</note>
+ </div>
+ <div type="commentSection" xml:id="i-9AJtcwMQdjThYeF4js2jhp">
<note xml:id="i-42sh1hHRcpx3F8E3zerAEG">
§ 6 Anmälan om frågor för skriftliga svar
</note>
corpus/protocols/201819/prot-201819--29.xml
Diff starting from line 896
@@ -882,6 +896,8 @@
RiR 2018:32 Förvaltningen av premiepensionssystemet – kostnadseffektivitet
för spararnas bästa?
</note>
+ </div>
+ <div type="commentSection" xml:id="i-URPXaDxaHtMWPbrpGJo7Nn">
<note xml:id="i-AamRW2k5tVJHzPJHHx3khK">
§ 8 Ärende för hänvisning till utskott
</note>
corpus/protocols/201819/prot-201819--81.xml
Diff starting from line 12005
@@ -12005,7 +12005,7 @@
Anf. 74 Statsminister STEFAN LÖFVEN (S)
</note>
</div>
- <div type="debateSection">
+ <div type="debateSection" xml:id="i-9SAZJyKhY4oW94uYaAuTm6">
<note xml:id="i-7PskJ8BQ6tG5KGB9E3NL4M">
§ 8 (forts. från § 6) Kriminalvårdsfrågor (forts. JuU13)
</note>
corpus/protocols/202021/prot-202021--12.xml
Diff starting from line 930
@@ -908,21 +930,33 @@
<note xml:id="i-4zgVw8CEi2V7WRGwEPDeUT">
Innehållsförteckning
</note>
+ </div>
+ <div type="commentSection" xml:id="i-2yzxffrPR29NXGwzQyf65k">
<note xml:id="i-TFmWCfnWAdd55cQ87ESsaG">
§ 1 Avsägelser
</note>
+ </div>
+ <div type="commentSection" xml:id="i-3X51zpqMQnPRasbrdG4qEZ">
<note xml:id="i-V2SEZX2HhFhFB6rg2iR2Jq">
§ 2 Anmälan om kompletteringsval
</note>
+ </div>
+ <div type="commentSection" xml:id="i-KzbhxJEVuoCbBNVf2jZWU9">
<note xml:id="i-DUfcrF46i39FucKjtYMUNQ">
§ 3 Anmälan om subsidiaritetsprövning
</note>
+ </div>
+ <div type="commentSection" xml:id="i-Xvxh559FpJiAFQb7HF7ygT">
<note xml:id="i-Q2an59QFjwVGMYVhqT2Qm9">
§ 4 Anmälan om fördröjt svar på interpellation
</note>
+ </div>
+ <div type="commentSection" xml:id="i-7BDA4TFYeNEFgy8PCrZJDQ">
<note xml:id="i-LZDTcAcrZpcLuqjtAxcXyf">
§ 5 Ärenden för hänvisning till utskott
</note>
+ </div>
+ <div type="debateSection" xml:id="i-SVdbsBgArW7NeiGihZY8HX">
<note xml:id="i-NRBVBusRwHf7TJjewfHLwx">
§ 6 Svar på interpellation 2019/20:451 om bistånd till stater
som inte respekterar mänskliga rättigheter
|
Any ideas how we formally know if it is correct or not? |
There is still the problem that tags become a section. This should be easy to fix? Also, an innehållsförteckning (table of contents) seems to incorrectly end up in a large number of sections. Is this easy to fix? |
I guess if the div is not empty, doesn't contain multiple sections, and has the type+id attribs. |
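Those three criteria could be checked mechanically along these lines. This is only a sketch under stated assumptions: a section head is taken to be a note starting with "§ 12" or "12 §", and a div that holds nothing but <pb> elements is counted as empty. Both are my own readings, and `div_problems` is a hypothetical helper, not something in pyriksdagen.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# Assumption: a section head is a note starting with "§ 12" or "12 §".
SECTION_HEAD = re.compile(r"^\s*(?:§\s*\d+|\d+\s*§)")


def div_problems(div):
    """Return the list of criteria that a <div> fails (empty list = looks ok)."""
    problems = []
    # Assumption: page breaks alone do not count as content.
    content = [el for el in div if el.tag != f"{{{TEI_NS}}}pb"]
    if not content:
        problems.append("empty")
    heads = [
        el for el in div.iter(f"{{{TEI_NS}}}note")
        if SECTION_HEAD.match(el.text or "")
    ]
    if len(heads) > 1:
        problems.append("multiple sections")
    if "type" not in div.attrib or XML_ID not in div.attrib:
        problems.append("missing type or xml:id")
    return problems


# Made-up body with one div per failure mode, plus one that passes.
sample = (
    f'<body xmlns="{TEI_NS}">'
    '<div type="commentSection" xml:id="i-a"><pb facs="x"/></div>'
    '<div type="commentSection" xml:id="i-b">'
    "<note>§ 1 Justering av protokoll</note>"
    "<note>§ 2 Bordläggning</note></div>"
    "<div><note>§ 3 Avsägelse</note></div>"
    '<div type="debateSection" xml:id="i-d"><note>§ 4 Debatt</note></div>'
    "</body>"
)
body = ET.fromstring(sample)
report = [div_problems(div) for div in body]
```

A check like this only covers the formal side. Whether a section is really a debate or a comment section still needs a human look at the sample.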
I don't follow.
After merging this it's what I wanted to do first after taking a first crack at identifying the interpellation debates. I don't think it would be too difficult, but you never know until you actually start doing it. |
I fully agree.
|
No, it's not that much work to fix stray <pb> elems in a section, but...
2.1. We wanted to do this quality control before committing edits to the whole set of protocols for reasons of economy. So either we approve what's here and I can commit it, then fix the pb thing with another commit (before merging the PR), or I can fix it now in the already modified files, but then we conflate 'types' of edits in one piece of the revision history.
2.2. Debate sections have intros, comment sections don't -- it seems like a reasonable criterion for evaluation. Should I check that in the sample?
I'd like to be able to take this a step or two forward today. |
2.a. I'm not sure I followed. I just checked for obvious errors and found those. If we fix those, we can get a new sample to assess. That should not conflate anything or be problematic?
2.b. Great, I just wanted to know. Then it seems good to check the debates based on this definition, and to check that the commentSections are not incorrect and that no incorrect divs are introduced. But this raises an issue: we need to start defining divs in a better way, because this is somewhere in between an analytic decision and a data-authentic one, and we want to be as close to the latter as possible. |
I've gone through them now: mostly they're ok. Marked correct if:
It looks like 6 are incorrect by those criteria and the incorrect ones are due to lone <pb> elems in a div or the content of the table of contents section getting tagged as section head and intros. I'll commit the rest of the protocols, then let's merge and I'll open issues for these two problems. |
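For the lone <pb> problem, one possible fix is sketched below. It assumes the right behaviour is to move the page break into the div that follows it and drop the now-empty div. That is my assumption, not something decided in this thread, and `merge_lone_pb_divs` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
PB = f"{{{TEI_NS}}}pb"


def merge_lone_pb_divs(body):
    """Move <pb> elements out of pb-only divs into the following div."""
    divs = list(body)
    merged = 0
    for div, following in zip(divs, divs[1:]):
        if len(div) > 0 and all(el.tag == PB for el in div):
            # Insert in reverse so the page breaks keep their original order.
            for pb in reversed(list(div)):
                following.insert(0, pb)
            body.remove(div)
            merged += 1
    return merged


# Shape taken from the prot-201617--71 hunk in the sample: a div holding
# only a <pb>, followed by the first real section.
sample = (
    f'<body xmlns="{TEI_NS}">'
    '<div type="commentSection"><pb facs="page-1"/></div>'
    '<div type="commentSection"><note>§ 1 Justering av protokoll</note></div>'
    "</body>"
)
body = ET.fromstring(sample)
n_merged = merge_lone_pb_divs(body)
```

A pb-only div at the very end of the body has no following div and is left alone here.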
Great! Do you open an issue? |
@MansMeg will you merge when the tests pass? |
Here I use existing code (scripts/split_into_sections.py) to divide up the unicameral period protocols into sections (based on the § character), delimited by <div> elements.
I also sneak in a script (scripts/git-add_diff-sample.py) which should work in tandem with @ninpnin's sample-git-diffs, in order to quickly git add the files that were sampled from the diff.
Sample for quality assessment to follow.
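To illustrate the idea of splitting on the § character, here is a minimal sketch. It is not the actual scripts/split_into_sections.py. It assumes that every child of <body> is a <div>, and that a section head is a note starting with "§ 12" or "12 §". Assigning the commentSection/debateSection type is left out.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Assumption: a section head is a note starting with "§ 12" or "12 §".
SECTION_HEAD = re.compile(r"^\s*(?:§\s*\d+|\d+\s*§)")


def split_into_sections(body):
    """Regroup the content of the body-level divs into one div per § head."""
    elements = [el for div in list(body) for el in list(div)]
    for div in list(body):
        body.remove(div)
    current = ET.SubElement(body, f"{{{TEI_NS}}}div")
    for el in elements:
        is_head = (
            el.tag == f"{{{TEI_NS}}}note"
            and SECTION_HEAD.match(el.text or "")
        )
        # Start a new div at each head, unless the current div is still empty.
        if is_head and len(current) > 0:
            current = ET.SubElement(body, f"{{{TEI_NS}}}div")
        current.append(el)
    return body


# Made-up body with everything in a single div, as before the split.
sample = (
    f'<body xmlns="{TEI_NS}"><div>'
    "<note>Riksdagens protokoll</note>"
    "<note>1 § Justering av protokoll</note>"
    "<note>Justerades protokollet för den 8 mars.</note>"
    "<note>2 § Bordläggning</note>"
    "</div></body>"
)
body = split_into_sections(ET.fromstring(sample))
sections = [[note.text for note in div] for div in body]
```

Anything before the first § head ends up in its own leading div, which matches the front matter seen at the start of the sampled protocols.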