Dingemanse M, Vreede B, Viviani E, Rodríguez-Sánchez P (2024).
talkr: Plotting Conversation Data.
-R package version 0.1.1.
+R package version 0.1.2.
@Manual{,
title = {talkr: Plotting Conversation Data},
author = {Mark Dingemanse and Barbara Vreede and Eva Viviani and Pablo Rodríguez-Sánchez},
year = {2024},
- note = {R package version 0.1.1},
+ note = {R package version 0.1.2},
}
diff --git a/pkgdown.yml b/pkgdown.yml
index acc01a7..25e9d39 100644
--- a/pkgdown.yml
+++ b/pkgdown.yml
@@ -3,7 +3,7 @@ pkgdown: 2.1.1
 pkgdown_sha: ~
 articles:
   workflows: workflows.html
-last_built: 2024-09-21T17:17Z
+last_built: 2024-11-05T09:35Z
 urls:
   reference: https://github.io/elpaco-escience/talkr/reference
   article: https://github.io/elpaco-escience/talkr/articles
diff --git a/reference/add_lines.html b/reference/add_lines.html
index 4039b53..84ac56c 100644
--- a/reference/add_lines.html
+++ b/reference/add_lines.html
@@ -15,7 +15,7 @@
talkr
- 0.1.1
+ 0.1.2
diff --git a/reference/calculate_timing.html b/reference/calculate_timing.html
index 939bf67..39aad2c 100644
--- a/reference/calculate_timing.html
+++ b/reference/calculate_timing.html
@@ -13,7 +13,7 @@
talkr
- 0.1.1
+ 0.1.2
diff --git a/reference/check_columns.html b/reference/check_columns.html
index ae78575..789007e 100644
--- a/reference/check_columns.html
+++ b/reference/check_columns.html
@@ -7,7 +7,7 @@
talkr
- 0.1.1
+ 0.1.2
diff --git a/reference/check_talkr.html b/reference/check_talkr.html
index ba838d7..705551e 100644
--- a/reference/check_talkr.html
+++ b/reference/check_talkr.html
@@ -19,7 +19,7 @@
talkr
- 0.1.1
+ 0.1.2
diff --git a/reference/check_time.html b/reference/check_time.html
index b2baeff..aaa31c8 100644
--- a/reference/check_time.html
+++ b/reference/check_time.html
@@ -7,7 +7,7 @@
talkr
- 0.1.1
+ 0.1.2
diff --git a/reference/geom_token.html b/reference/geom_token.html
index bdf6ce1..815c153 100644
--- a/reference/geom_token.html
+++ b/reference/geom_token.html
@@ -11,7 +11,7 @@
talkr
- 0.1.1
+ 0.1.2
diff --git a/reference/geom_turn.html b/reference/geom_turn.html
index 95dd61e..83f63c3 100644
--- a/reference/geom_turn.html
+++ b/reference/geom_turn.html
@@ -7,7 +7,7 @@
talkr
- 0.1.1
+ 0.1.2
diff --git a/reference/get_ifadv.html b/reference/get_ifadv.html
index fde9335..9718af6 100644
--- a/reference/get_ifadv.html
+++ b/reference/get_ifadv.html
@@ -11,7 +11,7 @@
talkr
- 0.1.1
+ 0.1.2
diff --git a/reference/ggplot2-ggproto.html b/reference/ggplot2-ggproto.html
index e639ba5..7756d9a 100644
--- a/reference/ggplot2-ggproto.html
+++ b/reference/ggplot2-ggproto.html
@@ -9,7 +9,7 @@
talkr
- 0.1.1
+ 0.1.2
diff --git a/reference/index.html b/reference/index.html
index 7b9fafe..549dda7 100644
--- a/reference/index.html
+++ b/reference/index.html
@@ -7,7 +7,7 @@
talkr
- 0.1.1
+ 0.1.2
diff --git a/reference/init.html b/reference/init.html
index 8bfc562..2e174c5 100644
--- a/reference/init.html
+++ b/reference/init.html
@@ -13,7 +13,7 @@
talkr
- 0.1.1
+ 0.1.2
diff --git a/reference/plot_density.html b/reference/plot_density.html
index 30c5c20..6a11c29 100644
--- a/reference/plot_density.html
+++ b/reference/plot_density.html
@@ -7,7 +7,7 @@
talkr
- 0.1.1
+ 0.1.2
diff --git a/reference/plot_quality.html b/reference/plot_quality.html
index 4c439f3..28a3cec 100644
--- a/reference/plot_quality.html
+++ b/reference/plot_quality.html
@@ -7,7 +7,7 @@
talkr
- 0.1.1
+ 0.1.2
diff --git a/reference/plot_scatter.html b/reference/plot_scatter.html
index 1e2dd18..c46b7e2 100644
--- a/reference/plot_scatter.html
+++ b/reference/plot_scatter.html
@@ -7,7 +7,7 @@
talkr
- 0.1.1
+ 0.1.2
diff --git a/reference/report_stats.html b/reference/report_stats.html
index 17f76fa..f0b33ff 100644
--- a/reference/report_stats.html
+++ b/reference/report_stats.html
@@ -11,7 +11,7 @@
talkr
- 0.1.1
+ 0.1.2
diff --git a/reference/theme_turnPlot.html b/reference/theme_turnPlot.html
index 22c2dda..19eefda 100644
--- a/reference/theme_turnPlot.html
+++ b/reference/theme_turnPlot.html
@@ -7,7 +7,7 @@
talkr
- 0.1.1
+ 0.1.2
diff --git a/reference/tokenize.html b/reference/tokenize.html
index f15f92e..2ad3f3d 100644
--- a/reference/tokenize.html
+++ b/reference/tokenize.html
@@ -15,7 +15,7 @@
talkr
- 0.1.1
+ 0.1.2
diff --git a/search.json b/search.json
index 9e1056f..db67a6a 100644
--- a/search.json
+++ b/search.json
@@ -1 +1 @@
-[{"path":"https://github.io/elpaco-escience/talkr/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"Apache License","title":"Apache License","text":"Version 2.0, January 2004 ","code":""},{"path":[]},{"path":"https://github.io/elpaco-escience/talkr/LICENSE.html","id":"id_1-definitions","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"1. Definitions","title":"Apache License","text":"“License” shall mean terms conditions use, reproduction, distribution defined Sections 1 9 document. “Licensor” shall mean copyright owner entity authorized copyright owner granting License. “Legal Entity” shall mean union acting entity entities control, controlled , common control entity. purposes definition, “control” means () power, direct indirect, cause direction management entity, whether contract otherwise, (ii) ownership fifty percent (50%) outstanding shares, (iii) beneficial ownership entity. “” (“”) shall mean individual Legal Entity exercising permissions granted License. “Source” form shall mean preferred form making modifications, including limited software source code, documentation source, configuration files. “Object” form shall mean form resulting mechanical transformation translation Source form, including limited compiled object code, generated documentation, conversions media types. “Work” shall mean work authorship, whether Source Object form, made available License, indicated copyright notice included attached work (example provided Appendix ). “Derivative Works” shall mean work, whether Source Object form, based (derived ) Work editorial revisions, annotations, elaborations, modifications represent, whole, original work authorship. purposes License, Derivative Works shall include works remain separable , merely link (bind name) interfaces , Work Derivative Works thereof. 
“Contribution” shall mean work authorship, including original version Work modifications additions Work Derivative Works thereof, intentionally submitted Licensor inclusion Work copyright owner individual Legal Entity authorized submit behalf copyright owner. purposes definition, “submitted” means form electronic, verbal, written communication sent Licensor representatives, including limited communication electronic mailing lists, source code control systems, issue tracking systems managed , behalf , Licensor purpose discussing improving Work, excluding communication conspicuously marked otherwise designated writing copyright owner “Contribution.” “Contributor” shall mean Licensor individual Legal Entity behalf Contribution received Licensor subsequently incorporated within Work.","code":""},{"path":"https://github.io/elpaco-escience/talkr/LICENSE.html","id":"id_2-grant-of-copyright-license","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"2. Grant of Copyright License","title":"Apache License","text":"Subject terms conditions License, Contributor hereby grants perpetual, worldwide, non-exclusive, -charge, royalty-free, irrevocable copyright license reproduce, prepare Derivative Works , publicly display, publicly perform, sublicense, distribute Work Derivative Works Source Object form.","code":""},{"path":"https://github.io/elpaco-escience/talkr/LICENSE.html","id":"id_3-grant-of-patent-license","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"3. 
Grant of Patent License","title":"Apache License","text":"Subject terms conditions License, Contributor hereby grants perpetual, worldwide, non-exclusive, -charge, royalty-free, irrevocable (except stated section) patent license make, made, use, offer sell, sell, import, otherwise transfer Work, license applies patent claims licensable Contributor necessarily infringed Contribution(s) alone combination Contribution(s) Work Contribution(s) submitted. institute patent litigation entity (including cross-claim counterclaim lawsuit) alleging Work Contribution incorporated within Work constitutes direct contributory patent infringement, patent licenses granted License Work shall terminate date litigation filed.","code":""},{"path":"https://github.io/elpaco-escience/talkr/LICENSE.html","id":"id_4-redistribution","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"4. Redistribution","title":"Apache License","text":"may reproduce distribute copies Work Derivative Works thereof medium, without modifications, Source Object form, provided meet following conditions: () must give recipients Work Derivative Works copy License; (b) must cause modified files carry prominent notices stating changed files; (c) must retain, Source form Derivative Works distribute, copyright, patent, trademark, attribution notices Source form Work, excluding notices pertain part Derivative Works; (d) Work includes “NOTICE” text file part distribution, Derivative Works distribute must include readable copy attribution notices contained within NOTICE file, excluding notices pertain part Derivative Works, least one following places: within NOTICE text file distributed part Derivative Works; within Source form documentation, provided along Derivative Works; , within display generated Derivative Works, wherever third-party notices normally appear. contents NOTICE file informational purposes modify License. 
may add attribution notices within Derivative Works distribute, alongside addendum NOTICE text Work, provided additional attribution notices construed modifying License. may add copyright statement modifications may provide additional different license terms conditions use, reproduction, distribution modifications, Derivative Works whole, provided use, reproduction, distribution Work otherwise complies conditions stated License.","code":""},{"path":"https://github.io/elpaco-escience/talkr/LICENSE.html","id":"id_5-submission-of-contributions","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"5. Submission of Contributions","title":"Apache License","text":"Unless explicitly state otherwise, Contribution intentionally submitted inclusion Work Licensor shall terms conditions License, without additional terms conditions. Notwithstanding , nothing herein shall supersede modify terms separate license agreement may executed Licensor regarding Contributions.","code":""},{"path":"https://github.io/elpaco-escience/talkr/LICENSE.html","id":"id_6-trademarks","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"6. Trademarks","title":"Apache License","text":"License grant permission use trade names, trademarks, service marks, product names Licensor, except required reasonable customary use describing origin Work reproducing content NOTICE file.","code":""},{"path":"https://github.io/elpaco-escience/talkr/LICENSE.html","id":"id_7-disclaimer-of-warranty","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"7. 
Disclaimer of Warranty","title":"Apache License","text":"Unless required applicable law agreed writing, Licensor provides Work (Contributor provides Contributions) “” BASIS, WITHOUT WARRANTIES CONDITIONS KIND, either express implied, including, without limitation, warranties conditions TITLE, NON-INFRINGEMENT, MERCHANTABILITY, FITNESS PARTICULAR PURPOSE. solely responsible determining appropriateness using redistributing Work assume risks associated exercise permissions License.","code":""},{"path":"https://github.io/elpaco-escience/talkr/LICENSE.html","id":"id_8-limitation-of-liability","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"8. Limitation of Liability","title":"Apache License","text":"event legal theory, whether tort (including negligence), contract, otherwise, unless required applicable law (deliberate grossly negligent acts) agreed writing, shall Contributor liable damages, including direct, indirect, special, incidental, consequential damages character arising result License use inability use Work (including limited damages loss goodwill, work stoppage, computer failure malfunction, commercial damages losses), even Contributor advised possibility damages.","code":""},{"path":"https://github.io/elpaco-escience/talkr/LICENSE.html","id":"id_9-accepting-warranty-or-additional-liability","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"9. Accepting Warranty or Additional Liability","title":"Apache License","text":"redistributing Work Derivative Works thereof, may choose offer, charge fee , acceptance support, warranty, indemnity, liability obligations /rights consistent License. However, accepting obligations, may act behalf sole responsibility, behalf Contributor, agree indemnify, defend, hold Contributor harmless liability incurred , claims asserted , Contributor reason accepting warranty additional liability. 
END TERMS CONDITIONS","code":""},{"path":"https://github.io/elpaco-escience/talkr/LICENSE.html","id":"appendix-how-to-apply-the-apache-license-to-your-work","dir":"","previous_headings":"","what":"APPENDIX: How to apply the Apache License to your work","title":"Apache License","text":"apply Apache License work, attach following boilerplate notice, fields enclosed brackets [] replaced identifying information. (Don’t include brackets!) text enclosed appropriate comment syntax file format. also recommend file class name description purpose included “printed page” copyright notice easier identification within third-party archives.","code":"Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."},{"path":"https://github.io/elpaco-escience/talkr/articles/workflows.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Basic workflows for talkr","text":"talkr package designed working conversational data R.","code":""},{"path":"https://github.io/elpaco-escience/talkr/articles/workflows.html","id":"loading-some-data","dir":"Articles","previous_headings":"","what":"Loading some data","title":"Basic workflows for talkr","text":"using IFADV corpus example data workflow talkr. corpus consisting 20 dyadic conversations Dutch, published Nederlandse Taalunie 2007 (source) snippet initializes talkr dataset using ifadv data. information IFADV dataset, see repository link. Essential talkr workflow minimal set data fields. 
following: source: source conversation (corpus can consist multiple sources) begin: begin time (ms) utterance end: end time (ms) utterance utterance: content utterance participant: person produced utterance init() function takes minimal fields generates uid: unique identifier utterance-level can used reference select filter specific utterances. init() function can used rename columns necessary. example, column participant named speaker, can rename follows: dataset can contain additional fields. instance, IFADV sample dataset also contain language (Dutch) utterance_raw (fuller, less processed version utterance content). also contains measures related turn-taking timing, including FTO (floor transfer offset, offset current turn prior participant, milliseconds) freq rank, frequency measures utterance content.","code":"data <- get_ifadv() data <- init(data) #> Warning in init(data): Column 'uid' already exists in the dataset. This column #> will be renamed to `original_uid`. talkr_data <- init(data, participant = \"speaker\")"},{"path":[]},{"path":"https://github.io/elpaco-escience/talkr/articles/workflows.html","id":"summary-statistics","dir":"Articles","previous_headings":"Workflow 1: Quality control","what":"Summary statistics","title":"Basic workflows for talkr","text":"report_stats function provides simple summary dataset, including total number utterances, total duration conversation, number participants, number sources.","code":"report_stats(data) #> ### Corpus-level statistics #> #> nr of sources: 20 #> nr of participants: 40 #> nr of turns: 14022 #> mean turn duration (ms): 1257 #> turns per hour: 2806 #> total recording (min): 299.9 #> total recording (hours): 5 #> total speaking time (min): 293.8 #> total speaking time (hours): 4.9 #> #> ### Conversation-level statistics (per source) #> #> |source | nr of participants| total recording (ms)| total recording (min)| total speaking time (ms)| total speaking time (min)| #> 
|:---------------|------------------:|--------------------:|---------------------:|------------------------:|-------------------------:| #> |/dutch2/DVA10O | 2| 899341| 15| 829140| 13.8| #> |/dutch2/DVA11Q | 2| 899061| 15| 884734| 14.7| #> |/dutch2/DVA12S | 2| 900000| 15| 785323| 13.1| #> |/dutch2/DVA13U | 2| 900000| 15| 877328| 14.6| #> |/dutch2/DVA14W | 2| 900000| 15| 796924| 13.3| #> |/dutch2/DVA15Y | 2| 899379| 15| 1032776| 17.2| #> |/dutch2/DVA16AA | 2| 900000| 15| 974312| 16.2| #> |/dutch2/DVA17AC | 2| 899961| 15| 885257| 14.8| #> |/dutch2/DVA19AG | 2| 899236| 15| 788268| 13.1| #> |/dutch2/DVA1A | 2| 899538| 15| 879384| 14.7| #> |/dutch2/DVA20AI | 2| 899145| 15| 918466| 15.3| #> |/dutch2/DVA22AL | 2| 899838| 15| 879580| 14.7| #> |/dutch2/DVA24AK | 2| 900000| 15| 887106| 14.8| #> |/dutch2/DVA2C | 2| 899635| 15| 1097376| 18.3| #> |/dutch2/DVA3E | 2| 899091| 15| 851393| 14.2| #> |/dutch2/DVA4C | 2| 899108| 15| 850314| 14.2| #> |/dutch2/DVA6H | 2| 899981| 15| 889722| 14.8| #> |/dutch2/DVA7B | 2| 899664| 15| 913682| 15.2| #> |/dutch2/DVA8K | 2| 900000| 15| 807418| 13.5| #> |/dutch2/DVA9M | 2| 899773| 15| 802054| 13.4| #> #> ### Turn statistics (per source) #> #> |source | nr of turns| mean turn duration (ms)| shortest turn (ms)| longest turn (ms)| turns per hour| #> |:---------------|-----------:|-----------------------:|------------------:|-----------------:|--------------:| #> |/dutch2/DVA10O | 501| 1655| 233| 3001| 2005| #> |/dutch2/DVA11Q | 792| 1117| 124| 4188| 3171| #> |/dutch2/DVA12S | 640| 1227| 173| 3819| 2560| #> |/dutch2/DVA13U | 717| 1224| 196| 3173| 2868| #> |/dutch2/DVA14W | 721| 1105| 97| 3115| 2884| #> |/dutch2/DVA15Y | 770| 1341| 154| 4973| 3082| #> |/dutch2/DVA16AA | 604| 1613| 292| 3447| 2416| #> |/dutch2/DVA17AC | 782| 1132| 0| 3453| 3128| #> |/dutch2/DVA19AG | 648| 1216| 136| 2988| 2594| #> |/dutch2/DVA1A | 681| 1291| 143| 3417| 2725| #> |/dutch2/DVA20AI | 758| 1212| 167| 5472| 3035| #> |/dutch2/DVA22AL | 745| 1181| 143| 3448| 2981| #> 
|/dutch2/DVA24AK | 685| 1295| 121| 4421| 2740| #> |/dutch2/DVA2C | 765| 1434| 217| 3415| 3061| #> |/dutch2/DVA3E | 753| 1131| 175| 3199| 3015| #> |/dutch2/DVA4C | 719| 1183| 145| 3697| 2879| #> |/dutch2/DVA6H | 683| 1303| 122| 4786| 2732| #> |/dutch2/DVA7B | 691| 1322| 197| 3395| 2765| #> |/dutch2/DVA8K | 612| 1319| 200| 8712| 2448| #> |/dutch2/DVA9M | 755| 1062| 163| 2909| 3021|"},{"path":"https://github.io/elpaco-escience/talkr/articles/workflows.html","id":"visual-quality-checks","dir":"Articles","previous_headings":"Workflow 1: Quality control","what":"Visual quality checks","title":"Basic workflows for talkr","text":"plot_quality function provides visual check nature data, visualizing distribution turn durations, transition timing. Transition timing similar FTO, calculated without additional quality checks: transitions identified participant changes one turn next. transition time calculated difference beginning turn new participant, end turn previous one. default, plot_quality() plot entire dataset: Quality plots can also run specific source: quality plot consists three separate visualizations, designed allow rapid visual inspection spotting oddities: density plot turn durations. normally expected look like distribution peak around 2000ms (2 seconds) maximum lengths far exceed 10000ms (10 seconds) (Liesenfeld & Dingemanse 2022). goal plot allow eyeballing oddities like turns extreme durations sets turns exact duration (unlikely carefully segmented conversational data). density plot turn transition times. plot like expected look like normal distribution centered around 0-200ms (Stivers et al. 2009). Deviations may signal problems dataset, instance due imprecise automated annotation methods. scatterplot turn transition (x) turn duration (y). combines distributions expected look like cloud datapoints thickest middle region. standout patterns (instance, turns whose duration equal transition time) indicative problems segmentation timing data. 
three plots can also generated separately:","code":"plot_quality(data) plot_quality(data, source = \"/dutch2/DVA8K\") plot_density(data, colname=\"duration\", title=\"Turn durations\",xlab=\"duration (ms)\") plot_density(data, colname=\"FTO\", title=\"Turn transitions (FTO)\",xlab=\"FTO (ms)\") plot_scatter(data, colname_x=\"FTO\",colname_y=\"duration\",title=\"Turn transitions and durations\",xlab=\"transition (ms)\", ylab=\"duration (ms)\")"},{"path":"https://github.io/elpaco-escience/talkr/articles/workflows.html","id":"workflow-2-plot-conversations","dir":"Articles","previous_headings":"","what":"Workflow 2: Plot conversations","title":"Basic workflows for talkr","text":"Another key use talkr visualize conversational patterns. first way geom_turn(), ggplot2-compatible geom visualizes timing duration turns conversation. can start simply visualizing conversations dataset. sample first four plot first minute . display together using facet_wrap() source. often, want plot single conversation explore detail. Let’s zoom one first four. plot without tweaking, helpful: conversation 15 minutes long hard appreciate structure put single line. similar conversational transcripts: present conversation left--right, top--bottom grid. , first need divide long conversation number shorter lines. using add_lines(). default, divide conversation lines 60000ms (1 minute), creating many lines needed. now, let’s focus first 4 minutes, can filtering line_id < 5 ’ve added lines. can style plot like using available variables. instance, can highlight turns produced overlap: far just visualized temporal structure. 
conversational turns typically consist words elements.","code":"library(ggplot2) library(dplyr) #> #> Attaching package: 'dplyr' #> The following objects are masked from 'package:stats': #> #> filter, lag #> The following objects are masked from 'package:base': #> #> intersect, setdiff, setequal, union # we simplify participant names conv <- data |> group_by(source) |> mutate(participant = as.character(factor(participant, labels=c(\"A\",\"B\"),ordered=T))) # select first four conversations these_sources <- unique(data$source)[1:4] conv |> filter(end < 60000, # select first 60 seconds source %in% these_sources) |> # filter to keep only these conversations ggplot(aes(x = end, y = participant)) + geom_turn(aes( begin = begin, end = end)) + xlab(\"Time (ms)\") + ylab(\"\") + theme_turnPlot() + facet_wrap(~source) # let's facet to show the conversations side by side conv |> filter(source == \"/dutch2/DVA12S\") |> ggplot(aes(x = end, y = participant)) + geom_turn(aes( begin = begin, end = end)) + xlab(\"Time (ms)\") + ylab(\"\") + theme_turnPlot() conv <- conv |> add_lines(line_duration = 60000) conv |> filter(source == \"/dutch2/DVA12S\", line_id < 5) |> # limit to the first five lines ggplot(aes(x = line_end, y = line_participant)) + ggtitle(\"The first four minutes from DVA12S\") + geom_turn(aes( begin = line_begin, # the begin and end aesthetics are now line-relative end = line_end)) + scale_y_reverse(breaks = seq(1, max(conv$line_id))) + xlab(\"Time (ms)\") + ylab(\"\") + theme_turnPlot() p <- last_plot() p + ggtitle(\"Turns produced in overlap\") + geom_turn(aes( begin = line_begin, end = line_end, fill=overlap, colour=overlap)) + scale_fill_discrete(na.translate=F) + # stop NA value from showing up in legend scale_colour_discrete(na.translate=F) # stop NA value from showing up in legend p + ggtitle(\"Turns produced in overlap\") + geom_turn(aes( begin = line_begin, end = line_end, fill=overlap, colour=overlap)) + scale_fill_discrete(na.translate=F) + # stop NA 
value from showing up in legend scale_colour_discrete(na.translate=F) # stop NA value from showing up in legend"},{"path":"https://github.io/elpaco-escience/talkr/articles/workflows.html","id":"looking-into-tokens","dir":"Articles","previous_headings":"Workflow 2: Plot conversations","what":"Looking into tokens","title":"Basic workflows for talkr","text":"can start looking internal structure turns plotting occurrence tokens. , first need generate token-specific dataframe tokenize(). calculate token frequencies tokens selected dataset (data default). also calculates relative positions time individual tokens turn. Finally, provides simple positional classification (token appears ), first (token turn-initial), last (token utterance-final), middle (token first last). information tokens hand, can start asking questions. instance, relative frequency words relate position turn? explore question, let’s look shorter excerpt: 1 minute total, divided 4 lines. , create dataframe this_conv, dividing 4 lines 15 seconds . also create dataframe these_tokens tokenized turn elements conversation, divided way. Finally, can also print content elements. 
, pick frequent turn-initial elements plotting, highlight another layer geom_token() plot text using geom_label_repel():","code":"conv_tokens <- conv |> tokenize() this_conv <- conv |> add_lines(line_duration=15000) |> filter(source == \"/dutch2/DVA12S\", line_id < 5) # let's look at the first three lines these_tokens <- conv_tokens |> add_lines(line_duration=15000, time_columns = \"relative_time\") |> filter(source == \"/dutch2/DVA12S\", line_id < 5) this_conv |> ggplot(aes(x = line_end, y = line_participant)) + ggtitle(\"Relative frequency of elements within turns\") + scale_y_reverse() + # we reverse the axis because lines run top to bottom geom_turn(aes( begin = line_begin, end = line_end)) + geom_token(data=these_tokens, aes(x=line_relative_time, size=frequency)) + xlab(\"Time (ms)\") + ylab(\"\") + theme_turnPlot() p <- last_plot() these_tokens_first <- these_tokens |> filter(order==\"first\", rank < 10) p + ggtitle(\"Some frequent turn-initial elements\") + geom_token(data=these_tokens_first, aes(x=line_relative_time), color=\"red\") + ggrepel::geom_label_repel(data=these_tokens_first, aes(x=line_relative_time, label=token), direction=\"y\")"},{"path":"https://github.io/elpaco-escience/talkr/articles/workflows.html","id":"notes","dir":"Articles","previous_headings":"Workflow 2: Plot conversations","what":"Notes","title":"Basic workflows for talkr","text":"init function can also used reformat timestamps. Default “ms”, expects milliseconds. ‘%H:%M:%OS’ format eg. 00:00:00.010 milliseconds (10). See ‘?strptime’ format examples. Token frequencies calculated entire dataset. 
want source-specific data, can filter source prior tokenization:","code":"init(format_timestamps=\"ms\") tokens_DVA9M <- data |> filter(source == \"/dutch2/DVA9M\") |> tokenize() tokens_DVA9M"},{"path":"https://github.io/elpaco-escience/talkr/articles/workflows.html","id":"references","dir":"Articles","previous_headings":"","what":"References","title":"Basic workflows for talkr","text":"Liesenfeld, Andreas, Mark Dingemanse. 2022. ‘Building Curating Conversational Corpora Diversity-Aware Language Science Technology’. Proceedings 13th Conference Language Resources Evaluation (LREC 2022), 1178–92. Marseille. doi:10.48550/arXiv.2203.03399. Stivers, Tanya, N. J. Enfield, Penelope Brown, C. Englert, Makoto Hayashi, Trine Heinemann, Gertie Hoymann, Federico Rossano, J. P. de Ruiter, Kyung-Eun Yoon, Stephen C. Levinson. 2009. ‘Universals Cultural Variation Turn-Taking Conversation’. Proceedings National Academy Sciences 106 (26): 10587–92. doi:10.1073/pnas.0903616106.","code":""},{"path":"https://github.io/elpaco-escience/talkr/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Mark Dingemanse. Author, maintainer. Barbara Vreede. Author. Eva Viviani. Author. Pablo Rodríguez-Sánchez. Author. Andreas Liesenfeld. Contributor. Netherlands eScience Center. Copyright holder, funder.","code":""},{"path":"https://github.io/elpaco-escience/talkr/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Dingemanse M, Vreede B, Viviani E, Rodríguez-Sánchez P (2024). talkr: Plotting Conversation Data. 
R package version 0.1.1.","code":"@Manual{, title = {talkr: Plotting Conversation Data}, author = {Mark Dingemanse and Barbara Vreede and Eva Viviani and Pablo Rodríguez-Sánchez}, year = {2024}, note = {R package version 0.1.1}, }"},{"path":"https://github.io/elpaco-escience/talkr/index.html","id":"talkr-","dir":"","previous_headings":"","what":"Plotting Conversation Data","title":"Plotting Conversation Data","text":"talkr offers set convenience functions quality control, visualisation analysis conversational data. provides support least two common needs people working conversational corpora: (1) quality control offering rapid insights nature, timing, quality time-aligned annotations conversational corpus; (2) compelling visualisations offering range plotting functions play well ggplot tidyverse. Note: talkr plays well companion python package scikit-talk, focused processing conversational data readying downstream analysis.","code":""},{"path":"https://github.io/elpaco-escience/talkr/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Plotting Conversation Data","text":"can install development version talkr GitHub : talkr submitted CRAN, soon installable :","code":"# install.packages(\"devtools\") devtools::install_github(\"elpaco-escience/talkr\", build_vignettes = TRUE) install.packages(\"talkr\")"},{"path":"https://github.io/elpaco-escience/talkr/index.html","id":"workflows","dir":"","previous_headings":"","what":"Workflows","title":"Plotting Conversation Data","text":"Two workflows talkr described workflows vignette. can also access vignette RStudio, running:","code":"vignette(\"workflows\", package = \"talkr\")"},{"path":"https://github.io/elpaco-escience/talkr/index.html","id":"contact","dir":"","previous_headings":"","what":"Contact","title":"Plotting Conversation Data","text":"Questions? Comments? welcome! package active development still rough edges. 
Interact us issues questions suggestions.","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/add_lines.html","id":null,"dir":"Reference","previous_headings":"","what":"Add information for line-by-line visualization — add_lines","title":"Add information for line-by-line visualization — add_lines","text":"function adds columns dataset adds line ID, changes columns timestamps relative beginning line, data can visualized line--line. participant column also adjusted create Y-coordinate speaker. line duration set 60 seconds default.","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/add_lines.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Add information for line-by-line visualization — add_lines","text":"","code":"add_lines(data, time_columns = c(\"begin\", \"end\"), line_duration = 60000)"},{"path":"https://github.io/elpaco-escience/talkr/reference/add_lines.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Add information for line-by-line visualization — add_lines","text":"data dataset divide lines time_columns columns timestamps need adjusted line-relative time line_duration length line (ms)","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/add_lines.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Add information for line-by-line visualization — add_lines","text":"data set added columns: `line_id`, `line_participant`, `line_column` every column `time_columns`","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/add_lines.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Add information for line-by-line visualization — add_lines","text":"transformation can done multiple columns time-stamped data. 
Use `time_columns` argument supply names one columns transformed.","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/calculate_timing.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate conversation properties — calculate_timing","title":"Calculate conversation properties — calculate_timing","text":"dataframe generated conversation properties related timing. data made quality control purposes , contain sophisticated transition calculation methods. , refer python package `scikit-talk`.","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/calculate_timing.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate conversation properties — calculate_timing","text":"","code":"calculate_timing(data)"},{"path":"https://github.io/elpaco-escience/talkr/reference/calculate_timing.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate conversation properties — calculate_timing","text":"data talkr data frame","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/calculate_timing.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate conversation properties — calculate_timing","text":"data frame containing UIDs calculated columns turn_duration, transition_time","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/check_columns.html","id":null,"dir":"Reference","previous_headings":"","what":"Check the presence of necessary columns in a dataset — check_columns","title":"Check the presence of necessary columns in a dataset — check_columns","text":"Check presence necessary columns dataset","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/check_columns.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check the presence of necessary columns in a dataset — check_columns","text":"","code":"check_columns(data, 
columns)"},{"path":"https://github.io/elpaco-escience/talkr/reference/check_columns.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check the presence of necessary columns in a dataset — check_columns","text":"data dataset check columns vector column names must present","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/check_columns.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check the presence of necessary columns in a dataset — check_columns","text":"nothing, throws error column missing","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/check_talkr.html","id":null,"dir":"Reference","previous_headings":"","what":"Check the presence of talkr-workflow columns in the dataset. — check_talkr","title":"Check the presence of talkr-workflow columns in the dataset. — check_talkr","text":"Uses check_columns() check : - begin - end - participant - utterance - source - uid","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/check_talkr.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check the presence of talkr-workflow columns in the dataset. — check_talkr","text":"","code":"check_talkr(data)"},{"path":"https://github.io/elpaco-escience/talkr/reference/check_talkr.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check the presence of talkr-workflow columns in the dataset. — check_talkr","text":"data dataset check","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/check_talkr.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Check the presence of talkr-workflow columns in the dataset. 
— check_talkr","text":"Verifies begin end columns numeric, likely indicate milliseconds.","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/check_time.html","id":null,"dir":"Reference","previous_headings":"","what":"Verify that timing columns are numeric and likely indicate milliseconds. — check_time","title":"Verify that timing columns are numeric and likely indicate milliseconds. — check_time","text":"Verify timing columns numeric likely indicate milliseconds.","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/check_time.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Verify that timing columns are numeric and likely indicate milliseconds. — check_time","text":"","code":"check_time(column, name)"},{"path":"https://github.io/elpaco-escience/talkr/reference/check_time.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Verify that timing columns are numeric and likely indicate milliseconds. — check_time","text":"column vector timing information name name column","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/check_time.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Verify that timing columns are numeric and likely indicate milliseconds. — check_time","text":"nothing, throws error column numeric warns column may indicate milliseconds","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/geom_token.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot individual tokens — geom_token","title":"Plot individual tokens — geom_token","text":"separate data frame containing tokenized data, plot individual tokens estimated time. 
Data must provided separately, contain column participant (y) column time (x).","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/geom_token.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot individual tokens — geom_token","text":"","code":"geom_token( data, mapping = NULL, stat = \"identity\", position = \"identity\", ..., na.rm = FALSE, show.legend = NA, inherit.aes = TRUE )"},{"path":"https://github.io/elpaco-escience/talkr/reference/geom_token.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot individual tokens — geom_token","text":"data tokenized data frame (see `tokenize()`). mapping Set aesthetic mappings created aes(). specified inherit.aes = TRUE (default), combined default mapping top level plot. must supply mapping plot mapping. stat statistical transformation use data layer. using geom_*() function construct layer, stat argument can used override default coupling geoms stats. stat argument accepts following: Stat ggproto subclass, example StatCount. string naming stat. give stat string, strip function name stat_ prefix. example, use stat_count(), give stat \"count\". information ways specify stat, see layer stat documentation. position position adjustment use data layer. can used various ways, including prevent overplotting improving display. position argument accepts following: result calling position function, position_jitter(). method allows passing extra arguments position. string naming position adjustment. give position string, strip function name position_ prefix. example, use position_jitter(), give position \"jitter\". information ways specify position, see layer position documentation. ... arguments passed layer()'s params argument. arguments broadly fall one 4 categories . Notably, arguments position argument, aesthetics required can passed .... Unknown arguments part 4 categories ignored. 
Static aesthetics mapped scale, fixed value apply layer whole. example, colour = \"red\" linewidth = 3. geom's documentation Aesthetics section lists available options. 'required' aesthetics passed params. Please note passing unmapped aesthetics vectors technically possible, order required length guaranteed parallel input data. constructing layer using stat_*() function, ... argument can used pass parameters geom part layer. example stat_density(geom = \"area\", outline.type = \"\"). geom's documentation lists parameters can accept. Inversely, constructing layer using geom_*() function, ... argument can used pass parameters stat part layer. example geom_area(stat = \"density\", adjust = 0.5). stat's documentation lists parameters can accept. key_glyph argument layer() may also passed .... can one functions described key glyphs, change display layer legend. na.rm FALSE, default, missing values removed warning. TRUE, missing values silently removed. show.legend logical. layer included legends? NA, default, includes aesthetics mapped. FALSE never includes, TRUE always includes. can also named logical vector finely select aesthetics display. inherit.aes FALSE, overrides default aesthetics, rather combining . useful helper functions define data aesthetics inherit behaviour default plot specification, e.g. 
borders().","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/geom_turn.html","id":null,"dir":"Reference","previous_headings":"","what":"Show turn-taking in visualized conversations — geom_turn","title":"Show turn-taking in visualized conversations — geom_turn","text":"Show turn-taking visualized conversations","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/geom_turn.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Show turn-taking in visualized conversations — geom_turn","text":"","code":"geom_turn( mapping = NULL, data = NULL, stat = \"identity\", position = \"identity\", ..., na.rm = FALSE, height = 0.5, show.legend = NA, inherit.aes = TRUE )"},{"path":"https://github.io/elpaco-escience/talkr/reference/geom_turn.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Show turn-taking in visualized conversations — geom_turn","text":"mapping Set aesthetic mappings created `ggplot2::aes()`. Requires specification `begin` `end` turns. Inherits default mapping top level plot, `inherit.aes` set `TRUE` (default). data data displayed layer. three options: NULL, default, data inherited plot data specified call ggplot(). data.frame, object, override plot data. objects fortified produce data frame. See fortify() variables created. function called single argument, plot data. return value must data.frame, used layer data. function can created formula (e.g. ~ head(.x, 10)). stat statistical transformation use data layer. using geom_*() function construct layer, stat argument can used override default coupling geoms stats. stat argument accepts following: Stat ggproto subclass, example StatCount. string naming stat. give stat string, strip function name stat_ prefix. example, use stat_count(), give stat \"count\". information ways specify stat, see layer stat documentation. position position adjustment use data layer. 
can used various ways, including prevent overplotting improving display. position argument accepts following: result calling position function, position_jitter(). method allows passing extra arguments position. string naming position adjustment. give position string, strip function name position_ prefix. example, use position_jitter(), give position \"jitter\". information ways specify position, see layer position documentation. ... arguments passed layer()'s params argument. arguments broadly fall one 4 categories . Notably, arguments position argument, aesthetics required can passed .... Unknown arguments part 4 categories ignored. Static aesthetics mapped scale, fixed value apply layer whole. example, colour = \"red\" linewidth = 3. geom's documentation Aesthetics section lists available options. 'required' aesthetics passed params. Please note passing unmapped aesthetics vectors technically possible, order required length guaranteed parallel input data. constructing layer using stat_*() function, ... argument can used pass parameters geom part layer. example stat_density(geom = \"area\", outline.type = \"\"). geom's documentation lists parameters can accept. Inversely, constructing layer using geom_*() function, ... argument can used pass parameters stat part layer. example geom_area(stat = \"density\", adjust = 0.5). stat's documentation lists parameters can accept. key_glyph argument layer() may also passed .... can one functions described key glyphs, change display layer legend. na.rm FALSE, default, missing values removed warning. TRUE, missing values silently removed. height height turn-taking rectangles show.legend logical. layer included legends? NA, default, includes aesthetics mapped. FALSE never includes, TRUE always includes. can also named logical vector finely select aesthetics display. inherit.aes FALSE, overrides default aesthetics, rather combining . 
useful helper functions define data aesthetics inherit behaviour default plot specification, e.g. borders().","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/get_ifadv.html","id":null,"dir":"Reference","previous_headings":"","what":"Get IFADV data — get_ifadv","title":"Get IFADV data — get_ifadv","text":"IFA Dialog Video corpus data Available public repository: https://github.com/elpaco-escience/ifadv","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/get_ifadv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get IFADV data — get_ifadv","text":"","code":"get_ifadv( source = \"https://raw.githubusercontent.com/elpaco-escience/ifadv/csv/data/ifadv.csv\" )"},{"path":"https://github.io/elpaco-escience/talkr/reference/get_ifadv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get IFADV data — get_ifadv","text":"source (default = \"https://raw.githubusercontent.com/elpaco-escience/ifadv/csv/data/ifadv.csv\")","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/get_ifadv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get IFADV data — get_ifadv","text":"data frame containing IFADV dataset","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/get_ifadv.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Get IFADV data — get_ifadv","text":"function requires internet connection.","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/ggplot2-ggproto.html","id":null,"dir":"Reference","previous_headings":"","what":"GeomToken — GeomToken","title":"GeomToken — GeomToken","text":"GeomToken GeomTurn","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/init.html","id":null,"dir":"Reference","previous_headings":"","what":"Initialize a `talkr` dataset — init","title":"Initialize a `talkr` dataset — init","text":"dataframe 
object, generate talkr dataset. dataset contains columns used throughout talkr infrastructure visualize conversations language corpora. Initializing talkr dataset first step talkr workflow.","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/init.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Initialize a `talkr` dataset — init","text":"","code":"init( data, source = \"source\", begin = \"begin\", end = \"end\", participant = \"participant\", utterance = \"utterance\", format_timestamps = \"ms\" )"},{"path":"https://github.io/elpaco-escience/talkr/reference/init.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Initialize a `talkr` dataset — init","text":"data dataframe object source column name identifying conversation source (e.g. filename; used unique conversation ID). different sources data, set parameter `NULL`. begin column name begin time utterance (milliseconds) end column name end time utterance (milliseconds) participant column name participant produced utterance utterance column name utterance format_timestamps format timestamps begin end columns. Default \"ms\", expects milliseconds. `%H:%M:%OS` format eg. 00:00:00.010 milliseconds (10). 
See `?strptime` format examples.","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/init.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Initialize a `talkr` dataset — init","text":"dataframe object columns needed talkr workflow","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/plot_density.html","id":null,"dir":"Reference","previous_headings":"","what":"Make a density plot of a specific column — plot_density","title":"Make a density plot of a specific column — plot_density","text":"Make density plot specific column","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/plot_density.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Make a density plot of a specific column — plot_density","text":"","code":"plot_density( data, colname, title = \"Density\", xlab = \"value\", ylab = \"density\" )"},{"path":"https://github.io/elpaco-escience/talkr/reference/plot_density.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Make a density plot of a specific column — plot_density","text":"data data frame containing column colname column name density plotted title plot title xlab x-axis label ylab y-axis label","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/plot_density.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Make a density plot of a specific column — plot_density","text":"recorded plot","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/plot_quality.html","id":null,"dir":"Reference","previous_headings":"","what":"Check source quality by plotting timing data — plot_quality","title":"Check source quality by plotting timing data — plot_quality","text":"Check source quality plotting timing 
data","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/plot_quality.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check source quality by plotting timing data — plot_quality","text":"","code":"plot_quality(data, source = \"all\", saveplot = FALSE)"},{"path":"https://github.io/elpaco-escience/talkr/reference/plot_quality.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check source quality by plotting timing data — plot_quality","text":"data talkr data frame source source checked (default \"\", source selected) saveplot save plot file (default FALSE)","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/plot_quality.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check source quality by plotting timing data — plot_quality","text":"list recorded plots","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/plot_scatter.html","id":null,"dir":"Reference","previous_headings":"","what":"Make a scatter plot of two columns — plot_scatter","title":"Make a scatter plot of two columns — plot_scatter","text":"Make scatter plot two columns","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/plot_scatter.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Make a scatter plot of two columns — plot_scatter","text":"","code":"plot_scatter( data, colname_x, colname_y, title = \"Scatter\", xlab = \"x\", ylab = \"y\" )"},{"path":"https://github.io/elpaco-escience/talkr/reference/plot_scatter.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Make a scatter plot of two columns — plot_scatter","text":"data data frame containing columns colname_x name column plotted x-axis colname_y name column plotted y-axis title plot title xlab x-axis label ylab y-axis 
label","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/plot_scatter.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Make a scatter plot of two columns — plot_scatter","text":"recorded plot","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/report_stats.html","id":null,"dir":"Reference","previous_headings":"","what":"Report corpus-level and conversation-level statistics — report_stats","title":"Report corpus-level and conversation-level statistics — report_stats","text":"Basic conversation statistics reported console: - Corpus-level statistics, reporting dataset whole; - Conversation-level statistics, reporting per source.","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/report_stats.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Report corpus-level and conversation-level statistics — report_stats","text":"","code":"report_stats(data)"},{"path":"https://github.io/elpaco-escience/talkr/reference/report_stats.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Report corpus-level and conversation-level statistics — report_stats","text":"data talkr dataset","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/report_stats.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Report corpus-level and conversation-level statistics — report_stats","text":"input function must `talkr` dataset, containing columns `source`, `participant`, `begin`, `end`. Time stamps columns `begin` `end` must milliseconds. 
easily transform dataset `talkr` dataset, consult `talkr::init()`.","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/theme_turnPlot.html","id":null,"dir":"Reference","previous_headings":"","what":"theme for the turn plot — theme_turnPlot","title":"theme for the turn plot — theme_turnPlot","text":"theme turn plot","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/theme_turnPlot.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"theme for the turn plot — theme_turnPlot","text":"","code":"theme_turnPlot(base_size = 11, base_family = \"serif\", ticks = TRUE)"},{"path":"https://github.io/elpaco-escience/talkr/reference/theme_turnPlot.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"theme for the turn plot — theme_turnPlot","text":"base_size int base_family chr ticks bool","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/tokenize.html","id":null,"dir":"Reference","previous_headings":"","what":"Generate a token-specific dataframe — tokenize","title":"Generate a token-specific dataframe — tokenize","text":"dataframe utterances, generate dataframe separates tokens utterances, assesses relative timing. 
returned data contains information original utterance (`uid`), well number tokens utterance (`nwords`), relative time token utterance (`relative_time`).","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/tokenize.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Generate a token-specific dataframe — tokenize","text":"","code":"tokenize(data, utterancecol = \"utterance\")"},{"path":"https://github.io/elpaco-escience/talkr/reference/tokenize.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Generate a token-specific dataframe — tokenize","text":"data talkr dataset utterancecol name column containing clean utterance (defaults \"utterance\")","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/tokenize.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Generate a token-specific dataframe — tokenize","text":"dataframe details token utterance","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/tokenize.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Generate a token-specific dataframe — tokenize","text":"relative time calculated token utterance equal duration (duration utterance divided number words), first token utterance beginning beginning utterance. input column provided argument `utterancecol` used generate tokens. advised provide version utterance cleaned stripped special characters. Cleaning performed function. Spaces used separate tokens.","code":""}]
+[{"path":"https://github.io/elpaco-escience/talkr/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"Apache License","title":"Apache License","text":"Version 2.0, January 2004 ","code":""},{"path":[]},{"path":"https://github.io/elpaco-escience/talkr/LICENSE.html","id":"id_1-definitions","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"1. Definitions","title":"Apache License","text":"“License” shall mean terms conditions use, reproduction, distribution defined Sections 1 9 document. “Licensor” shall mean copyright owner entity authorized copyright owner granting License. “Legal Entity” shall mean union acting entity entities control, controlled , common control entity. purposes definition, “control” means () power, direct indirect, cause direction management entity, whether contract otherwise, (ii) ownership fifty percent (50%) outstanding shares, (iii) beneficial ownership entity. “” (“”) shall mean individual Legal Entity exercising permissions granted License. “Source” form shall mean preferred form making modifications, including limited software source code, documentation source, configuration files. “Object” form shall mean form resulting mechanical transformation translation Source form, including limited compiled object code, generated documentation, conversions media types. “Work” shall mean work authorship, whether Source Object form, made available License, indicated copyright notice included attached work (example provided Appendix ). “Derivative Works” shall mean work, whether Source Object form, based (derived ) Work editorial revisions, annotations, elaborations, modifications represent, whole, original work authorship. purposes License, Derivative Works shall include works remain separable , merely link (bind name) interfaces , Work Derivative Works thereof. 
“Contribution” shall mean work authorship, including original version Work modifications additions Work Derivative Works thereof, intentionally submitted Licensor inclusion Work copyright owner individual Legal Entity authorized submit behalf copyright owner. purposes definition, “submitted” means form electronic, verbal, written communication sent Licensor representatives, including limited communication electronic mailing lists, source code control systems, issue tracking systems managed , behalf , Licensor purpose discussing improving Work, excluding communication conspicuously marked otherwise designated writing copyright owner “Contribution.” “Contributor” shall mean Licensor individual Legal Entity behalf Contribution received Licensor subsequently incorporated within Work.","code":""},{"path":"https://github.io/elpaco-escience/talkr/LICENSE.html","id":"id_2-grant-of-copyright-license","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"2. Grant of Copyright License","title":"Apache License","text":"Subject terms conditions License, Contributor hereby grants perpetual, worldwide, non-exclusive, -charge, royalty-free, irrevocable copyright license reproduce, prepare Derivative Works , publicly display, publicly perform, sublicense, distribute Work Derivative Works Source Object form.","code":""},{"path":"https://github.io/elpaco-escience/talkr/LICENSE.html","id":"id_3-grant-of-patent-license","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"3. 
Grant of Patent License","title":"Apache License","text":"Subject terms conditions License, Contributor hereby grants perpetual, worldwide, non-exclusive, -charge, royalty-free, irrevocable (except stated section) patent license make, made, use, offer sell, sell, import, otherwise transfer Work, license applies patent claims licensable Contributor necessarily infringed Contribution(s) alone combination Contribution(s) Work Contribution(s) submitted. institute patent litigation entity (including cross-claim counterclaim lawsuit) alleging Work Contribution incorporated within Work constitutes direct contributory patent infringement, patent licenses granted License Work shall terminate date litigation filed.","code":""},{"path":"https://github.io/elpaco-escience/talkr/LICENSE.html","id":"id_4-redistribution","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"4. Redistribution","title":"Apache License","text":"may reproduce distribute copies Work Derivative Works thereof medium, without modifications, Source Object form, provided meet following conditions: () must give recipients Work Derivative Works copy License; (b) must cause modified files carry prominent notices stating changed files; (c) must retain, Source form Derivative Works distribute, copyright, patent, trademark, attribution notices Source form Work, excluding notices pertain part Derivative Works; (d) Work includes “NOTICE” text file part distribution, Derivative Works distribute must include readable copy attribution notices contained within NOTICE file, excluding notices pertain part Derivative Works, least one following places: within NOTICE text file distributed part Derivative Works; within Source form documentation, provided along Derivative Works; , within display generated Derivative Works, wherever third-party notices normally appear. contents NOTICE file informational purposes modify License. 
may add attribution notices within Derivative Works distribute, alongside addendum NOTICE text Work, provided additional attribution notices construed modifying License. may add copyright statement modifications may provide additional different license terms conditions use, reproduction, distribution modifications, Derivative Works whole, provided use, reproduction, distribution Work otherwise complies conditions stated License.","code":""},{"path":"https://github.io/elpaco-escience/talkr/LICENSE.html","id":"id_5-submission-of-contributions","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"5. Submission of Contributions","title":"Apache License","text":"Unless explicitly state otherwise, Contribution intentionally submitted inclusion Work Licensor shall terms conditions License, without additional terms conditions. Notwithstanding , nothing herein shall supersede modify terms separate license agreement may executed Licensor regarding Contributions.","code":""},{"path":"https://github.io/elpaco-escience/talkr/LICENSE.html","id":"id_6-trademarks","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"6. Trademarks","title":"Apache License","text":"License grant permission use trade names, trademarks, service marks, product names Licensor, except required reasonable customary use describing origin Work reproducing content NOTICE file.","code":""},{"path":"https://github.io/elpaco-escience/talkr/LICENSE.html","id":"id_7-disclaimer-of-warranty","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"7. 
Disclaimer of Warranty","title":"Apache License","text":"Unless required applicable law agreed writing, Licensor provides Work (Contributor provides Contributions) “” BASIS, WITHOUT WARRANTIES CONDITIONS KIND, either express implied, including, without limitation, warranties conditions TITLE, NON-INFRINGEMENT, MERCHANTABILITY, FITNESS PARTICULAR PURPOSE. solely responsible determining appropriateness using redistributing Work assume risks associated exercise permissions License.","code":""},{"path":"https://github.io/elpaco-escience/talkr/LICENSE.html","id":"id_8-limitation-of-liability","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"8. Limitation of Liability","title":"Apache License","text":"event legal theory, whether tort (including negligence), contract, otherwise, unless required applicable law (deliberate grossly negligent acts) agreed writing, shall Contributor liable damages, including direct, indirect, special, incidental, consequential damages character arising result License use inability use Work (including limited damages loss goodwill, work stoppage, computer failure malfunction, commercial damages losses), even Contributor advised possibility damages.","code":""},{"path":"https://github.io/elpaco-escience/talkr/LICENSE.html","id":"id_9-accepting-warranty-or-additional-liability","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"9. Accepting Warranty or Additional Liability","title":"Apache License","text":"redistributing Work Derivative Works thereof, may choose offer, charge fee , acceptance support, warranty, indemnity, liability obligations /rights consistent License. However, accepting obligations, may act behalf sole responsibility, behalf Contributor, agree indemnify, defend, hold Contributor harmless liability incurred , claims asserted , Contributor reason accepting warranty additional liability. 
END TERMS CONDITIONS","code":""},{"path":"https://github.io/elpaco-escience/talkr/LICENSE.html","id":"appendix-how-to-apply-the-apache-license-to-your-work","dir":"","previous_headings":"","what":"APPENDIX: How to apply the Apache License to your work","title":"Apache License","text":"apply Apache License work, attach following boilerplate notice, fields enclosed brackets [] replaced identifying information. (Don’t include brackets!) text enclosed appropriate comment syntax file format. also recommend file class name description purpose included “printed page” copyright notice easier identification within third-party archives.","code":"Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."},{"path":"https://github.io/elpaco-escience/talkr/articles/workflows.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Basic workflows for talkr","text":"talkr package designed working conversational data R.","code":""},{"path":"https://github.io/elpaco-escience/talkr/articles/workflows.html","id":"loading-some-data","dir":"Articles","previous_headings":"","what":"Loading some data","title":"Basic workflows for talkr","text":"using IFADV corpus example data workflow talkr. corpus consisting 20 dyadic conversations Dutch, published Nederlandse Taalunie 2007 (source) snippet initializes talkr dataset using ifadv data. information IFADV dataset, see repository link. Essential talkr workflow minimal set data fields. 
following: source: source conversation (corpus can consist multiple sources) begin: begin time (ms) utterance end: end time (ms) utterance utterance: content utterance participant: person produced utterance init() function takes minimal fields generates uid: unique identifier utterance-level can used reference select filter specific utterances. init() function can used rename columns necessary. example, column participant named speaker, can rename follows: dataset can contain additional fields. instance, IFADV sample dataset also contain language (Dutch) utterance_raw (fuller, less processed version utterance content). also contains measures related turn-taking timing, including FTO (floor transfer offset, offset current turn prior participant, milliseconds) freq rank, frequency measures utterance content.","code":"data <- get_ifadv() data <- init(data) #> Warning in init(data): Column 'uid' already exists in the dataset. This column #> will be renamed to `original_uid`. talkr_data <- init(data, participant = \"speaker\")"},{"path":[]},{"path":"https://github.io/elpaco-escience/talkr/articles/workflows.html","id":"summary-statistics","dir":"Articles","previous_headings":"Workflow 1: Quality control","what":"Summary statistics","title":"Basic workflows for talkr","text":"report_stats function provides simple summary dataset, including total number utterances, total duration conversation, number participants, number sources.","code":"report_stats(data) #> ### Corpus-level statistics #> #> nr of sources: 20 #> nr of participants: 40 #> nr of turns: 14022 #> mean turn duration (ms): 1257 #> turns per hour: 2806 #> total recording (min): 299.9 #> total recording (hours): 5 #> total speaking time (min): 293.8 #> total speaking time (hours): 4.9 #> #> ### Conversation-level statistics (per source) #> #> |source | nr of participants| total recording (ms)| total recording (min)| total speaking time (ms)| total speaking time (min)| #> 
|:---------------|------------------:|--------------------:|---------------------:|------------------------:|-------------------------:| #> |/dutch2/DVA10O | 2| 899341| 15| 829140| 13.8| #> |/dutch2/DVA11Q | 2| 899061| 15| 884734| 14.7| #> |/dutch2/DVA12S | 2| 900000| 15| 785323| 13.1| #> |/dutch2/DVA13U | 2| 900000| 15| 877328| 14.6| #> |/dutch2/DVA14W | 2| 900000| 15| 796924| 13.3| #> |/dutch2/DVA15Y | 2| 899379| 15| 1032776| 17.2| #> |/dutch2/DVA16AA | 2| 900000| 15| 974312| 16.2| #> |/dutch2/DVA17AC | 2| 899961| 15| 885257| 14.8| #> |/dutch2/DVA19AG | 2| 899236| 15| 788268| 13.1| #> |/dutch2/DVA1A | 2| 899538| 15| 879384| 14.7| #> |/dutch2/DVA20AI | 2| 899145| 15| 918466| 15.3| #> |/dutch2/DVA22AL | 2| 899838| 15| 879580| 14.7| #> |/dutch2/DVA24AK | 2| 900000| 15| 887106| 14.8| #> |/dutch2/DVA2C | 2| 899635| 15| 1097376| 18.3| #> |/dutch2/DVA3E | 2| 899091| 15| 851393| 14.2| #> |/dutch2/DVA4C | 2| 899108| 15| 850314| 14.2| #> |/dutch2/DVA6H | 2| 899981| 15| 889722| 14.8| #> |/dutch2/DVA7B | 2| 899664| 15| 913682| 15.2| #> |/dutch2/DVA8K | 2| 900000| 15| 807418| 13.5| #> |/dutch2/DVA9M | 2| 899773| 15| 802054| 13.4| #> #> ### Turn statistics (per source) #> #> |source | nr of turns| mean turn duration (ms)| shortest turn (ms)| longest turn (ms)| turns per hour| #> |:---------------|-----------:|-----------------------:|------------------:|-----------------:|--------------:| #> |/dutch2/DVA10O | 501| 1655| 233| 3001| 2005| #> |/dutch2/DVA11Q | 792| 1117| 124| 4188| 3171| #> |/dutch2/DVA12S | 640| 1227| 173| 3819| 2560| #> |/dutch2/DVA13U | 717| 1224| 196| 3173| 2868| #> |/dutch2/DVA14W | 721| 1105| 97| 3115| 2884| #> |/dutch2/DVA15Y | 770| 1341| 154| 4973| 3082| #> |/dutch2/DVA16AA | 604| 1613| 292| 3447| 2416| #> |/dutch2/DVA17AC | 782| 1132| 0| 3453| 3128| #> |/dutch2/DVA19AG | 648| 1216| 136| 2988| 2594| #> |/dutch2/DVA1A | 681| 1291| 143| 3417| 2725| #> |/dutch2/DVA20AI | 758| 1212| 167| 5472| 3035| #> |/dutch2/DVA22AL | 745| 1181| 143| 3448| 2981| #> 
|/dutch2/DVA24AK | 685| 1295| 121| 4421| 2740| #> |/dutch2/DVA2C | 765| 1434| 217| 3415| 3061| #> |/dutch2/DVA3E | 753| 1131| 175| 3199| 3015| #> |/dutch2/DVA4C | 719| 1183| 145| 3697| 2879| #> |/dutch2/DVA6H | 683| 1303| 122| 4786| 2732| #> |/dutch2/DVA7B | 691| 1322| 197| 3395| 2765| #> |/dutch2/DVA8K | 612| 1319| 200| 8712| 2448| #> |/dutch2/DVA9M | 755| 1062| 163| 2909| 3021|"},{"path":"https://github.io/elpaco-escience/talkr/articles/workflows.html","id":"visual-quality-checks","dir":"Articles","previous_headings":"Workflow 1: Quality control","what":"Visual quality checks","title":"Basic workflows for talkr","text":"plot_quality function provides visual check nature data, visualizing distribution turn durations, transition timing. Transition timing similar FTO, calculated without additional quality checks: transitions identified participant changes one turn next. transition time calculated difference beginning turn new participant, end turn previous one. default, plot_quality() plot entire dataset: Quality plots can also run specific source: quality plot consists three separate visualizations, designed allow rapid visual inspection spotting oddities: density plot turn durations. normally expected look like distribution peak around 2000ms (2 seconds) maximum lengths far exceed 10000ms (10 seconds) (Liesenfeld & Dingemanse 2022). goal plot allow eyeballing oddities like turns extreme durations sets turns exact duration (unlikely carefully segmented conversational data). density plot turn transition times. plot like expected look like normal distribution centered around 0-200ms (Stivers et al. 2009). Deviations may signal problems dataset, instance due imprecise automated annotation methods. scatterplot turn transition (x) turn duration (y). combines distributions expected look like cloud datapoints thickest middle region. standout patterns (instance, turns whose duration equal transition time) indicative problems segmentation timing data. 
three plots can also generated separately:","code":"plot_quality(data) plot_quality(data, source = \"/dutch2/DVA8K\") plot_density(data, colname=\"duration\", title=\"Turn durations\",xlab=\"duration (ms)\") plot_density(data, colname=\"FTO\", title=\"Turn transitions (FTO)\",xlab=\"FTO (ms)\") plot_scatter(data, colname_x=\"FTO\",colname_y=\"duration\",title=\"Turn transitions and durations\",xlab=\"transition (ms)\", ylab=\"duration (ms)\")"},{"path":"https://github.io/elpaco-escience/talkr/articles/workflows.html","id":"workflow-2-plot-conversations","dir":"Articles","previous_headings":"","what":"Workflow 2: Plot conversations","title":"Basic workflows for talkr","text":"Another key use talkr visualize conversational patterns. first way geom_turn(), ggplot2-compatible geom visualizes timing duration turns conversation. can start simply visualizing conversations dataset. sample first four plot first minute . display together using facet_wrap() source. often, want plot single conversation explore detail. Let’s zoom one first four. plot without tweaking, helpful: conversation 15 minutes long hard appreciate structure put single line. similar conversational transcripts: present conversation left--right, top--bottom grid. , first need divide long conversation number shorter lines. using add_lines(). default, divide conversation lines 60000ms (1 minute), creating many lines needed. now, let’s focus first 4 minutes, can filtering line_id < 5 ’ve added lines. can style plot like using available variables. instance, can highlight turns produced overlap: far just visualized temporal structure. 
conversational turns typically consist words elements.","code":"library(ggplot2) library(dplyr) #> #> Attaching package: 'dplyr' #> The following objects are masked from 'package:stats': #> #> filter, lag #> The following objects are masked from 'package:base': #> #> intersect, setdiff, setequal, union # we simplify participant names conv <- data |> group_by(source) |> mutate(participant = as.character(factor(participant, labels=c(\"A\",\"B\"),ordered=T))) # select first four conversations these_sources <- unique(data$source)[1:4] conv |> filter(end < 60000, # select first 60 seconds source %in% these_sources) |> # filter to keep only these conversations ggplot(aes(x = end, y = participant)) + geom_turn(aes( begin = begin, end = end)) + xlab(\"Time (ms)\") + ylab(\"\") + theme_turnPlot() + facet_wrap(~source) # let's facet to show the conversations side by side conv |> filter(source == \"/dutch2/DVA12S\") |> ggplot(aes(x = end, y = participant)) + geom_turn(aes( begin = begin, end = end)) + xlab(\"Time (ms)\") + ylab(\"\") + theme_turnPlot() conv <- conv |> add_lines(line_duration = 60000) conv |> filter(source == \"/dutch2/DVA12S\", line_id < 5) |> # limit to the first four lines ggplot(aes(x = line_end, y = line_participant)) + ggtitle(\"The first four minutes from DVA12S\") + geom_turn(aes( begin = line_begin, # the begin and end aesthetics are now line-relative end = line_end)) + scale_y_reverse(breaks = seq(1, max(conv$line_id))) + xlab(\"Time (ms)\") + ylab(\"\") + theme_turnPlot() p <- last_plot() p + ggtitle(\"Turns produced in overlap\") + geom_turn(aes( begin = line_begin, end = line_end, fill=overlap, colour=overlap)) + scale_fill_discrete(na.translate=F) + # stop NA value from showing up in legend scale_colour_discrete(na.translate=F) # stop NA value from showing up in legend
"},{"path":"https://github.io/elpaco-escience/talkr/articles/workflows.html","id":"looking-into-tokens","dir":"Articles","previous_headings":"Workflow 2: Plot conversations","what":"Looking into tokens","title":"Basic workflows for talkr","text":"can start looking internal structure turns plotting occurrence tokens. , first need generate token-specific dataframe tokenize(). calculate token frequencies tokens selected dataset (data default). also calculates relative positions time individual tokens turn. Finally, provides simple positional classification (token appears ), first (token turn-initial), last (token utterance-final), middle (token first last). information tokens hand, can start asking questions. instance, relative frequency words relate position turn? explore question, let’s look shorter excerpt: 1 minute total, divided 4 lines. , create dataframe this_conv, dividing 4 lines 15 seconds . also create dataframe these_tokens tokenized turn elements conversation, divided way. Finally, can also print content elements. 
, pick frequent turn-initial elements plotting, highlight another layer geom_token() plot text using geom_label_repel():","code":"conv_tokens <- conv |> tokenize() this_conv <- conv |> add_lines(line_duration=15000) |> filter(source == \"/dutch2/DVA12S\", line_id < 5) # let's look at the first four lines these_tokens <- conv_tokens |> add_lines(line_duration=15000, time_columns = \"relative_time\") |> filter(source == \"/dutch2/DVA12S\", line_id < 5) this_conv |> ggplot(aes(x = line_end, y = line_participant)) + ggtitle(\"Relative frequency of elements within turns\") + scale_y_reverse() + # we reverse the axis because lines run top to bottom geom_turn(aes( begin = line_begin, end = line_end)) + geom_token(data=these_tokens, aes(x=line_relative_time, size=frequency)) + xlab(\"Time (ms)\") + ylab(\"\") + theme_turnPlot() p <- last_plot() these_tokens_first <- these_tokens |> filter(order==\"first\", rank < 10) p + ggtitle(\"Some frequent turn-initial elements\") + geom_token(data=these_tokens_first, aes(x=line_relative_time), color=\"red\") + ggrepel::geom_label_repel(data=these_tokens_first, aes(x=line_relative_time, label=token), direction=\"y\")"},{"path":"https://github.io/elpaco-escience/talkr/articles/workflows.html","id":"notes","dir":"Articles","previous_headings":"Workflow 2: Plot conversations","what":"Notes","title":"Basic workflows for talkr","text":"init function can also used reformat timestamps. Default “ms”, expects milliseconds. ‘%H:%M:%OS’ format eg. 00:00:00.010 milliseconds (10). See ‘?strptime’ format examples. Token frequencies calculated entire dataset. 
want source-specific data, can filter source prior tokenization:","code":"init(data, format_timestamps=\"ms\") tokens_DVA9M <- data |> filter(source == \"/dutch2/DVA9M\") |> tokenize() tokens_DVA9M"},{"path":"https://github.io/elpaco-escience/talkr/articles/workflows.html","id":"references","dir":"Articles","previous_headings":"","what":"References","title":"Basic workflows for talkr","text":"Liesenfeld, Andreas, Mark Dingemanse. 2022. ‘Building Curating Conversational Corpora Diversity-Aware Language Science Technology’. Proceedings 13th Conference Language Resources Evaluation (LREC 2022), 1178–92. Marseille. doi:10.48550/arXiv.2203.03399. Stivers, Tanya, N. J. Enfield, Penelope Brown, C. Englert, Makoto Hayashi, Trine Heinemann, Gertie Hoymann, Federico Rossano, J. P. de Ruiter, Kyung-Eun Yoon, Stephen C. Levinson. 2009. ‘Universals Cultural Variation Turn-Taking Conversation’. Proceedings National Academy Sciences 106 (26): 10587–92. doi:10.1073/pnas.0903616106.","code":""},{"path":"https://github.io/elpaco-escience/talkr/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Mark Dingemanse. Author, maintainer. Barbara Vreede. Author. Eva Viviani. Author. Pablo Rodríguez-Sánchez. Author. Andreas Liesenfeld. Contributor. Netherlands eScience Center. Copyright holder, funder.","code":""},{"path":"https://github.io/elpaco-escience/talkr/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Dingemanse M, Vreede B, Viviani E, Rodríguez-Sánchez P (2024). talkr: Plotting Conversation Data. 
R package version 0.1.2.","code":"@Manual{, title = {talkr: Plotting Conversation Data}, author = {Mark Dingemanse and Barbara Vreede and Eva Viviani and Pablo Rodríguez-Sánchez}, year = {2024}, note = {R package version 0.1.2}, }"},{"path":"https://github.io/elpaco-escience/talkr/index.html","id":"talkr-","dir":"","previous_headings":"","what":"Plotting Conversation Data","title":"Plotting Conversation Data","text":"talkr offers set convenience functions quality control, visualisation analysis conversational data. provides support least two common needs people working conversational corpora: (1) quality control offering rapid insights nature, timing, quality time-aligned annotations conversational corpus; (2) compelling visualisations offering range plotting functions play well ggplot tidyverse. Note: talkr plays well companion python package scikit-talk, focused processing conversational data readying downstream analysis.","code":""},{"path":"https://github.io/elpaco-escience/talkr/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Plotting Conversation Data","text":"can install development version talkr GitHub : talkr submitted CRAN, soon installable :","code":"# install.packages(\"devtools\") devtools::install_github(\"elpaco-escience/talkr\", build_vignettes = TRUE) install.packages(\"talkr\")"},{"path":"https://github.io/elpaco-escience/talkr/index.html","id":"workflows","dir":"","previous_headings":"","what":"Workflows","title":"Plotting Conversation Data","text":"Two workflows talkr described workflows vignette. can also access vignette RStudio, running:","code":"vignette(\"workflows\", package = \"talkr\")"},{"path":"https://github.io/elpaco-escience/talkr/index.html","id":"contact","dir":"","previous_headings":"","what":"Contact","title":"Plotting Conversation Data","text":"Questions? Comments? welcome! package active development still rough edges. 
Interact us issues questions suggestions.","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/add_lines.html","id":null,"dir":"Reference","previous_headings":"","what":"Add information for line-by-line visualization — add_lines","title":"Add information for line-by-line visualization — add_lines","text":"function adds columns dataset adds line ID, changes columns timestamps relative beginning line, data can visualized line--line. participant column also adjusted create Y-coordinate speaker. line duration set 60 seconds default.","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/add_lines.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Add information for line-by-line visualization — add_lines","text":"","code":"add_lines(data, time_columns = c(\"begin\", \"end\"), line_duration = 60000)"},{"path":"https://github.io/elpaco-escience/talkr/reference/add_lines.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Add information for line-by-line visualization — add_lines","text":"data dataset divide lines time_columns columns timestamps need adjusted line-relative time line_duration length line (ms)","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/add_lines.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Add information for line-by-line visualization — add_lines","text":"data set added columns: `line_id`, `line_participant`, `line_column` every column `time_columns`","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/add_lines.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Add information for line-by-line visualization — add_lines","text":"transformation can done multiple columns time-stamped data. 
Use `time_columns` argument supply names one columns transformed.","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/calculate_timing.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate conversation properties — calculate_timing","title":"Calculate conversation properties — calculate_timing","text":"dataframe generated conversation properties related timing. data made quality control purposes , contain sophisticated transition calculation methods. , refer python package `scikit-talk`.","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/calculate_timing.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate conversation properties — calculate_timing","text":"","code":"calculate_timing(data)"},{"path":"https://github.io/elpaco-escience/talkr/reference/calculate_timing.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate conversation properties — calculate_timing","text":"data talkr data frame","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/calculate_timing.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate conversation properties — calculate_timing","text":"data frame containing UIDs calculated columns turn_duration, transition_time","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/check_columns.html","id":null,"dir":"Reference","previous_headings":"","what":"Check the presence of necessary columns in a dataset — check_columns","title":"Check the presence of necessary columns in a dataset — check_columns","text":"Check presence necessary columns dataset","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/check_columns.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check the presence of necessary columns in a dataset — check_columns","text":"","code":"check_columns(data, 
columns)"},{"path":"https://github.io/elpaco-escience/talkr/reference/check_columns.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check the presence of necessary columns in a dataset — check_columns","text":"data dataset check columns vector column names must present","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/check_columns.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check the presence of necessary columns in a dataset — check_columns","text":"nothing, throws error column missing","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/check_talkr.html","id":null,"dir":"Reference","previous_headings":"","what":"Check the presence of talkr-workflow columns in the dataset. — check_talkr","title":"Check the presence of talkr-workflow columns in the dataset. — check_talkr","text":"Uses check_columns() check : - begin - end - participant - utterance - source - uid","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/check_talkr.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check the presence of talkr-workflow columns in the dataset. — check_talkr","text":"","code":"check_talkr(data)"},{"path":"https://github.io/elpaco-escience/talkr/reference/check_talkr.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check the presence of talkr-workflow columns in the dataset. — check_talkr","text":"data dataset check","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/check_talkr.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Check the presence of talkr-workflow columns in the dataset. 
— check_talkr","text":"Verifies begin end columns numeric, likely indicate milliseconds.","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/check_time.html","id":null,"dir":"Reference","previous_headings":"","what":"Verify that timing columns are numeric and likely indicate milliseconds. — check_time","title":"Verify that timing columns are numeric and likely indicate milliseconds. — check_time","text":"Verify timing columns numeric likely indicate milliseconds.","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/check_time.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Verify that timing columns are numeric and likely indicate milliseconds. — check_time","text":"","code":"check_time(column, name)"},{"path":"https://github.io/elpaco-escience/talkr/reference/check_time.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Verify that timing columns are numeric and likely indicate milliseconds. — check_time","text":"column vector timing information name name column","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/check_time.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Verify that timing columns are numeric and likely indicate milliseconds. — check_time","text":"nothing, throws error column numeric warns column may indicate milliseconds","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/geom_token.html","id":null,"dir":"Reference","previous_headings":"","what":"Plot individual tokens — geom_token","title":"Plot individual tokens — geom_token","text":"separate data frame containing tokenized data, plot individual tokens estimated time. 
Data must provided separately, contain column participant (y) column time (x).","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/geom_token.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Plot individual tokens — geom_token","text":"","code":"geom_token( data, mapping = NULL, stat = \"identity\", position = \"identity\", ..., na.rm = FALSE, show.legend = NA, inherit.aes = TRUE )"},{"path":"https://github.io/elpaco-escience/talkr/reference/geom_token.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Plot individual tokens — geom_token","text":"data tokenized data frame (see `tokenize()`). mapping Set aesthetic mappings created aes(). specified inherit.aes = TRUE (default), combined default mapping top level plot. must supply mapping plot mapping. stat statistical transformation use data layer. using geom_*() function construct layer, stat argument can used override default coupling geoms stats. stat argument accepts following: Stat ggproto subclass, example StatCount. string naming stat. give stat string, strip function name stat_ prefix. example, use stat_count(), give stat \"count\". information ways specify stat, see layer stat documentation. position position adjustment use data layer. can used various ways, including prevent overplotting improving display. position argument accepts following: result calling position function, position_jitter(). method allows passing extra arguments position. string naming position adjustment. give position string, strip function name position_ prefix. example, use position_jitter(), give position \"jitter\". information ways specify position, see layer position documentation. ... arguments passed layer()'s params argument. arguments broadly fall one 4 categories . Notably, arguments position argument, aesthetics required can passed .... Unknown arguments part 4 categories ignored. 
Static aesthetics mapped scale, fixed value apply layer whole. example, colour = \"red\" linewidth = 3. geom's documentation Aesthetics section lists available options. 'required' aesthetics passed params. Please note passing unmapped aesthetics vectors technically possible, order required length guaranteed parallel input data. constructing layer using stat_*() function, ... argument can used pass parameters geom part layer. example stat_density(geom = \"area\", outline.type = \"\"). geom's documentation lists parameters can accept. Inversely, constructing layer using geom_*() function, ... argument can used pass parameters stat part layer. example geom_area(stat = \"density\", adjust = 0.5). stat's documentation lists parameters can accept. key_glyph argument layer() may also passed .... can one functions described key glyphs, change display layer legend. na.rm FALSE, default, missing values removed warning. TRUE, missing values silently removed. show.legend logical. layer included legends? NA, default, includes aesthetics mapped. FALSE never includes, TRUE always includes. can also named logical vector finely select aesthetics display. inherit.aes FALSE, overrides default aesthetics, rather combining . useful helper functions define data aesthetics inherit behaviour default plot specification, e.g. 
borders().","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/geom_turn.html","id":null,"dir":"Reference","previous_headings":"","what":"Show turn-taking in visualized conversations — geom_turn","title":"Show turn-taking in visualized conversations — geom_turn","text":"Show turn-taking visualized conversations","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/geom_turn.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Show turn-taking in visualized conversations — geom_turn","text":"","code":"geom_turn( mapping = NULL, data = NULL, stat = \"identity\", position = \"identity\", ..., na.rm = FALSE, height = 0.5, show.legend = NA, inherit.aes = TRUE )"},{"path":"https://github.io/elpaco-escience/talkr/reference/geom_turn.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Show turn-taking in visualized conversations — geom_turn","text":"mapping Set aesthetic mappings created `ggplot2::aes()`. Requires specification `begin` `end` turns. Inherits default mapping top level plot, `inherit.aes` set `TRUE` (default). data data displayed layer. three options: NULL, default, data inherited plot data specified call ggplot(). data.frame, object, override plot data. objects fortified produce data frame. See fortify() variables created. function called single argument, plot data. return value must data.frame, used layer data. function can created formula (e.g. ~ head(.x, 10)). stat statistical transformation use data layer. using geom_*() function construct layer, stat argument can used override default coupling geoms stats. stat argument accepts following: Stat ggproto subclass, example StatCount. string naming stat. give stat string, strip function name stat_ prefix. example, use stat_count(), give stat \"count\". information ways specify stat, see layer stat documentation. position position adjustment use data layer. 
can used various ways, including prevent overplotting improving display. position argument accepts following: result calling position function, position_jitter(). method allows passing extra arguments position. string naming position adjustment. give position string, strip function name position_ prefix. example, use position_jitter(), give position \"jitter\". information ways specify position, see layer position documentation. ... arguments passed layer()'s params argument. arguments broadly fall one 4 categories . Notably, arguments position argument, aesthetics required can passed .... Unknown arguments part 4 categories ignored. Static aesthetics mapped scale, fixed value apply layer whole. example, colour = \"red\" linewidth = 3. geom's documentation Aesthetics section lists available options. 'required' aesthetics passed params. Please note passing unmapped aesthetics vectors technically possible, order required length guaranteed parallel input data. constructing layer using stat_*() function, ... argument can used pass parameters geom part layer. example stat_density(geom = \"area\", outline.type = \"\"). geom's documentation lists parameters can accept. Inversely, constructing layer using geom_*() function, ... argument can used pass parameters stat part layer. example geom_area(stat = \"density\", adjust = 0.5). stat's documentation lists parameters can accept. key_glyph argument layer() may also passed .... can one functions described key glyphs, change display layer legend. na.rm FALSE, default, missing values removed warning. TRUE, missing values silently removed. height height turn-taking rectangles show.legend logical. layer included legends? NA, default, includes aesthetics mapped. FALSE never includes, TRUE always includes. can also named logical vector finely select aesthetics display. inherit.aes FALSE, overrides default aesthetics, rather combining . 
useful helper functions define data aesthetics inherit behaviour default plot specification, e.g. borders().","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/get_ifadv.html","id":null,"dir":"Reference","previous_headings":"","what":"Get IFADV data — get_ifadv","title":"Get IFADV data — get_ifadv","text":"IFA Dialog Video corpus data Available public repository: https://github.com/elpaco-escience/ifadv","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/get_ifadv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get IFADV data — get_ifadv","text":"","code":"get_ifadv( source = \"https://raw.githubusercontent.com/elpaco-escience/ifadv/csv/data/ifadv.csv\" )"},{"path":"https://github.io/elpaco-escience/talkr/reference/get_ifadv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get IFADV data — get_ifadv","text":"source (default = \"https://raw.githubusercontent.com/elpaco-escience/ifadv/csv/data/ifadv.csv\")","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/get_ifadv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get IFADV data — get_ifadv","text":"data frame containing IFADV dataset","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/get_ifadv.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Get IFADV data — get_ifadv","text":"function requires internet connection.","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/ggplot2-ggproto.html","id":null,"dir":"Reference","previous_headings":"","what":"GeomToken — GeomToken","title":"GeomToken — GeomToken","text":"GeomToken GeomTurn","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/init.html","id":null,"dir":"Reference","previous_headings":"","what":"Initialize a `talkr` dataset — init","title":"Initialize a `talkr` dataset — init","text":"dataframe 
object, generate talkr dataset. dataset contains columns used throughout talkr infrastructure visualize conversations language corpora. Initializing talkr dataset first step talkr workflow.","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/init.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Initialize a `talkr` dataset — init","text":"","code":"init( data, source = \"source\", begin = \"begin\", end = \"end\", participant = \"participant\", utterance = \"utterance\", format_timestamps = \"ms\" )"},{"path":"https://github.io/elpaco-escience/talkr/reference/init.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Initialize a `talkr` dataset — init","text":"data dataframe object source column name identifying conversation source (e.g. filename; used unique conversation ID). different sources data, set parameter `NULL`. begin column name begin time utterance (milliseconds) end column name end time utterance (milliseconds) participant column name participant produced utterance utterance column name utterance format_timestamps format timestamps begin end columns. Default \"ms\", expects milliseconds. `%H:%M:%OS` format eg. 00:00:00.010 milliseconds (10). 
See `?strptime` format examples.","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/init.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Initialize a `talkr` dataset — init","text":"dataframe object columns needed talkr workflow","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/plot_density.html","id":null,"dir":"Reference","previous_headings":"","what":"Make a density plot of a specific column — plot_density","title":"Make a density plot of a specific column — plot_density","text":"Make density plot specific column","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/plot_density.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Make a density plot of a specific column — plot_density","text":"","code":"plot_density( data, colname, title = \"Density\", xlab = \"value\", ylab = \"density\" )"},{"path":"https://github.io/elpaco-escience/talkr/reference/plot_density.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Make a density plot of a specific column — plot_density","text":"data data frame containing column colname column name density plotted title plot title xlab x-axis label ylab y-axis label","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/plot_density.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Make a density plot of a specific column — plot_density","text":"recorded plot","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/plot_quality.html","id":null,"dir":"Reference","previous_headings":"","what":"Check source quality by plotting timing data — plot_quality","title":"Check source quality by plotting timing data — plot_quality","text":"Check source quality plotting timing 
data","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/plot_quality.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check source quality by plotting timing data — plot_quality","text":"","code":"plot_quality(data, source = \"all\", saveplot = FALSE)"},{"path":"https://github.io/elpaco-escience/talkr/reference/plot_quality.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check source quality by plotting timing data — plot_quality","text":"data talkr data frame source source checked (default \"all\", source selected) saveplot save plot file (default FALSE)","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/plot_quality.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check source quality by plotting timing data — plot_quality","text":"list recorded plots","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/plot_scatter.html","id":null,"dir":"Reference","previous_headings":"","what":"Make a scatter plot of two columns — plot_scatter","title":"Make a scatter plot of two columns — plot_scatter","text":"Make scatter plot two columns","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/plot_scatter.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Make a scatter plot of two columns — plot_scatter","text":"","code":"plot_scatter( data, colname_x, colname_y, title = \"Scatter\", xlab = \"x\", ylab = \"y\" )"},{"path":"https://github.io/elpaco-escience/talkr/reference/plot_scatter.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Make a scatter plot of two columns — plot_scatter","text":"data data frame containing columns colname_x name column plotted x-axis colname_y name column plotted y-axis title plot title xlab x-axis label ylab y-axis 
label","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/plot_scatter.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Make a scatter plot of two columns — plot_scatter","text":"recorded plot","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/report_stats.html","id":null,"dir":"Reference","previous_headings":"","what":"Report corpus-level and conversation-level statistics — report_stats","title":"Report corpus-level and conversation-level statistics — report_stats","text":"Basic conversation statistics reported console: - Corpus-level statistics, reporting dataset whole; - Conversation-level statistics, reporting per source.","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/report_stats.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Report corpus-level and conversation-level statistics — report_stats","text":"","code":"report_stats(data)"},{"path":"https://github.io/elpaco-escience/talkr/reference/report_stats.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Report corpus-level and conversation-level statistics — report_stats","text":"data talkr dataset","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/report_stats.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Report corpus-level and conversation-level statistics — report_stats","text":"input function must `talkr` dataset, containing columns `source`, `participant`, `begin`, `end`. Time stamps columns `begin` `end` must milliseconds. 
easily transform dataset `talkr` dataset, consult `talkr::init()`.","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/theme_turnPlot.html","id":null,"dir":"Reference","previous_headings":"","what":"theme for the turn plot — theme_turnPlot","title":"theme for the turn plot — theme_turnPlot","text":"theme turn plot","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/theme_turnPlot.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"theme for the turn plot — theme_turnPlot","text":"","code":"theme_turnPlot(base_size = 11, base_family = \"serif\", ticks = TRUE)"},{"path":"https://github.io/elpaco-escience/talkr/reference/theme_turnPlot.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"theme for the turn plot — theme_turnPlot","text":"base_size int base_family chr ticks bool","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/tokenize.html","id":null,"dir":"Reference","previous_headings":"","what":"Generate a token-specific dataframe — tokenize","title":"Generate a token-specific dataframe — tokenize","text":"dataframe utterances, generate dataframe separates tokens utterances, assesses relative timing. 
returned data contains information original utterance (`uid`), well number tokens utterance (`nwords`), relative time token utterance (`relative_time`).","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/tokenize.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Generate a token-specific dataframe — tokenize","text":"","code":"tokenize(data, utterancecol = \"utterance\")"},{"path":"https://github.io/elpaco-escience/talkr/reference/tokenize.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Generate a token-specific dataframe — tokenize","text":"data talkr dataset utterancecol name column containing clean utterance (defaults \"utterance\")","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/tokenize.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Generate a token-specific dataframe — tokenize","text":"dataframe details token utterance","code":""},{"path":"https://github.io/elpaco-escience/talkr/reference/tokenize.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Generate a token-specific dataframe — tokenize","text":"relative time calculated token utterance equal duration (duration utterance divided number words), first token utterance beginning beginning utterance. input column provided argument `utterancecol` used generate tokens. advised provide version utterance cleaned stripped special characters. Cleaning performed function. Spaces used separate tokens.","code":""}]