Showing 60 changed files with 1,427 additions and 39 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
[celex]: https://catalog.ldc.upenn.edu/LDC96L14 | ||
[elan]: https://archive.mpi.nl/tla/elan | ||
[htk]: https://htk.eng.cam.ac.uk/ | ||
[labb-cat]: https://nzilbb.github.io/labbcat-doc/ | ||
[labbcat-R]: https://nzilbb.github.io/labbcat-R/ | ||
[labbcat-py]: https://nzilbb.github.io/labbcat-py/ | ||
[sign up]: https://docs.google.com/forms/d/e/1FAIpQLSdFclWfbWZ-aM-h3Givrr4mH9T4MjyWaeQ-TpTMriC5mOcoqw/viewform | ||
[password reset]: https://docs.google.com/forms/d/e/1FAIpQLSdW9U912VhiZN2sjFk6jQFulhY82YNdqkQQRKVJT2LvAFvqnw/viewform?usp=sf_link |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,5 @@ | ||
<footer class="site-footer"> | ||
<p> | ||
<a href="https://apls.pitt.edu/labbcat" target="_blank">Sign in to APLS</a> | ||
| | ||
<a href="https://docs.google.com/forms/d/e/1FAIpQLSdFclWfbWZ-aM-h3Givrr4mH9T4MjyWaeQ-TpTMriC5mOcoqw/viewform?usp=sf_link" target="_blank">Sign up</a> | ||
</p> | ||
</footer> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
# `_keyterms/` | ||
|
||
This directory holds: | ||
|
||
- A YAML file (`keyterms.yml`) with a glossary of key terms | ||
- A template YAML file (`keyterm-template.yml`) that is used for new key terms | ||
- `update-keyterm-list.R`, an R script that trawls `doc/` pages for key terms, populates `keyterms.yml` with new terms, and updates the `incontext` lists of back-links in `keyterms.yml` | ||
- `session-info.txt`, output of `sessionInfo()` within `update-keyterm-list.R` | ||
|
||
`keyterms.yml`, in turn, will get used to populate the glossary page (`doc/glossary`). | ||
|
||
For the meaning of YAML attributes, see `keyterm-template.yml`. | ||
|
||
I might want to add a "category" attribute for sorting/separating key terms into glossary sections; | ||
currently I'm thinking "LaBB-CAT" (e.g., _transcript_) vs. "Linguistics" (e.g., _sociolinguistic interview_) vs. "Data science" (e.g., _unique identifier_). | ||
I'll hold off for now, though, since that'd tempt me to create definitions for lots of terms that are low-priority. |
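A minimal sketch of how new terms could be merged into the glossary, mirroring the `setdiff()` step in `update-keyterm-list.R`. The glossary, template, and found terms below are hypothetical stand-ins, not the contents of the real YAML files, and the sketch uses base R only.

```r
# Hypothetical stand-ins for keyterms.yml and keyterm-template.yml (assumed contents)
glossary <- list(
  Transcript = list(short_definition = "An existing entry", incontext = character(0))
)
template <- list(
  short_definition = "",
  definition = "",
  incontext = character(0),
  related = character(0)
)

# Hypothetical terms found on doc/ pages, already lowercased
found_terms <- c("transcript", "layer", "annotation")

# Compare case-insensitively so "Transcript" is not added a second time
new_terms <- setdiff(found_terms, tolower(names(glossary)))

# One copy of the template per new term, in alphabetical order
new_gloss <- setNames(rep(list(template), length(new_terms)), sort(new_terms))

# Existing entries stay first; new entries are appended
glossary <- c(glossary, new_gloss)
print(names(glossary))
```

Existing entries keep their original capitalization because only the comparison is lowercased, not the stored names.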
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
transcript: | ||
  short_definition: A single NP (which may benefit from further context that it doesn't get here). | ||
  # category (MAYBE): Section of the glossary where the word will be found. Either "LaBB-CAT" (e.g., _transcript_), "Linguistics" (e.g., _sociolinguistic interview_), or "Data science" (e.g., _unique identifier_) | ||
  definition: | | ||
    Using multiple paragraphs if necessary. | ||
    Can also include Markdown styling (incl. callouts) | ||
  incontext: | ||
    - links to **auto-generated** anchors | ||
    - on pages | ||
    - where the term appears | ||
    - (once per page) | ||
  related: | ||
    - similar concepts | ||
    - and/or | ||
    - terms that could be easily confused (e.g., _transcript_ vs. _transcription_) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
Transcript: | ||
  short_definition: A collection of time-aligned annotations across several layers corresponding to a single sound file, plus metadata about the sound file. | ||
  definition: | | ||
    In APLS, each transcript corresponds to part of a sociolinguistic interview with a single interviewee. | ||
    Interviews are split into transcripts according to the original recording files. | ||
    Transcripts are named with the interviewee's speaker code, the interview section, an optional numeric suffix if that interview section took up more than one recording file, and `.eaf` (the [Elan][] transcription file format). | ||
    Transcripts can be viewed on [transcript pages](doc/view-transcript). | ||
  incontext: | ||
    - links to **auto-generated** anchors | ||
    - on pages | ||
    - where the term appears | ||
    - (once per page) | ||
  related: | ||
    - In APLS, each transcript has one [main participant](#main-participant) | ||
    - "[Transcript attributes](#transcript-attributes): Metadata about the sound file" | ||
    - Not to be confused with [transcriptions](#transcription), data files external to APLS | ||
    - "[Layer](#layer)" | ||
    - "[Annotation](#annotation)" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
library(tidyr) | ||
library(purrr) | ||
library(stringr) | ||
library(dplyr) | ||
library(yaml) | ||
|
||
##Parameters: Glossary & template files, whether or not to write updated glossary | ||
file_glossary <- "keyterms.yml" | ||
file_template <- "keyterm-template.yml" | ||
write_glossary <- TRUE | ||
|
||
##Get terms from pages | ||
##Get doc/ pages that have 1+ .keyterm | ||
pages <- system("grep -rl 'class=\"keyterm' ../doc", intern = TRUE) | ||
##Construct dataframe of terms and the pages where they appear | ||
keyterms <- tibble(page = pages, | ||
pagefile = map(page, readLines), | ||
pagelink = map_chr(pagefile, | ||
~ .x |> | ||
str_subset("^permalink: ") |> | ||
str_remove("^permalink: ")), | ||
pagetitle = map_chr(pagefile, | ||
~ .x |> | ||
str_subset("^title: ") |> | ||
str_remove("^title: ")), | ||
term = map(pagefile, | ||
##Get keyterms as a character vector | ||
~ .x |> | ||
str_extract_all("(?<=<span class=\"keyterm\">).+?(?=</span>)") |> | ||
unlist() |> | ||
##Normalize for case and pluralization | ||
str_to_lower() |> | ||
str_remove("s$"))) |> | ||
##One row per term | ||
unnest(term) |> | ||
##Only unique combinations of page/term | ||
distinct() |> | ||
##Add link | ||
mutate(link = str_glue("[{pagetitle}]({pagelink}#keyterm-{str_replace_all(term, ' ', '-')})") |> | ||
as.character()) | ||
|
||
##Read template & empty out incontext | ||
template <- read_yaml(file_template)[[1]] | ||
template$incontext <- character(0L) | ||
|
||
##Add to glossary | ||
##Read current glossary | ||
glossary <- read_yaml(file_glossary) | ||
##Get terms that need to be added | ||
curr_terms <- names(glossary) | ||
new_terms <- setdiff(keyterms$term, str_to_lower(curr_terms)) | ||
##Repeat template with new_terms | ||
new_gloss <- new_terms |> | ||
sort() |> | ||
set_names() |> | ||
map(~ template) | ||
##Add new_terms | ||
glossary <- c(glossary, new_gloss) | ||
|
||
##Update terms' incontext entries | ||
##Get list of backlinks for each term | ||
backlinks <- | ||
keyterms |> | ||
select(term, link) |> | ||
chop(link) |> | ||
pull(link, term) | ||
##In glossary order | ||
backlinks <- backlinks[str_to_lower(names(glossary))] | ||
##Update incontext | ||
glossary <- glossary |> | ||
map2(backlinks, ~ assign_in(.x, "incontext", .y)) | ||
|
||
##Optionally write glossary | ||
if (write_glossary) { | ||
write_yaml(glossary, file_glossary) | ||
} |
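The term-normalization and link-building steps in the script above can be tried in isolation. The sketch below is a minimal example under stated assumptions: the page line, title, and permalink are hypothetical and do not come from the real `doc/` pages. It reuses the same `stringr` calls as the script.

```r
library(stringr)

# Hypothetical page content and front-matter values (assumptions for illustration)
page_line <- 'Each <span class="keyterm">Transcript</span> has several <span class="keyterm">layers</span>.'
pagetitle <- "Example page"
pagelink <- "/doc/example-page"

# Extract keyterms, then lowercase them and strip one trailing "s", as the script does
terms <- page_line |>
  str_extract_all("(?<=<span class=\"keyterm\">).+?(?=</span>)") |>
  unlist() |>
  str_to_lower() |>
  str_remove("s$")

# Build one Markdown back-link per term; spaces in the anchor become hyphens
links <- str_glue("[{pagetitle}]({pagelink}#keyterm-{str_replace_all(terms, ' ', '-')})") |>
  as.character()

print(terms)
print(links)
```

Note that the trailing-"s" rule is purely orthographic: it also trims singular terms that happen to end in "s" (for example, "status" becomes "statu"), so such terms may need special handling.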
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
# `_layers` | ||
|
||
This directory holds: | ||
|
||
- Markdown files with layer definitions (one file per layer), with | ||
- Long prose description in the body of the file | ||
- [Attributes in a YAML header](#yaml-attributes), including | ||
- Attributes from layer definitions saved to APLS (`synced`) | ||
- Manually-input attributes | ||
- `sync-layers.R`, an R script that creates files and populates/updates files' YAML headers based on layer definitions saved to APLS | ||
- `session-info.txt`, output of `sessionInfo()` within `sync-layers.R` | ||
|
||
These files, in turn, will get used to populate the layer reference pages in `doc/`. | ||
(Not _all_ the YAML fields will necessarily go into those pages.) | ||
|
||
|
||
## YAML attributes | ||
|
||
- `name`: Layer name | ||
- `synced`: Attributes from layer definitions saved to APLS, automatically synced by `sync-layers.R` | ||
- `parallel`: Whether there are parallel tags per annotation (e.g., multiple possible phonemic representations) | ||
- `notation`: Notation system used (links to `doc/notation-systems`) | ||
- `primary`: Main category of notation system (e.g., English, downcased English, Penn Treebank tags, DISC) | ||
- `additional`: Symbols that augment the primary notation system (e.g., transcription prosody symbols, morpheme marker, DISC syllabification/stress, foll_segment pause symbol) | ||
- `inputs`: Layers and/or other inputs (e.g., APLS custom dictionary) that go into the layer. In a bulleted list where each entry has: | ||
- `number`: Index for referring to the input in the body of the Markdown file (also sequential input) | ||
- `input`: Name of input | ||
- `type`: `layer` or `other` | ||
- `layer_manager`: If applicable | ||
- `versions`: APLS versions (once versioning begins in earnest), where layer... | ||
- `first_appeared` | ||
- `last_updated` | ||
- `last_modified_sync_date`: When the _layer config_ was last modified | ||
- `last_modified_date`: When the _Markdown_ file was last modified (may be after `last_modified_sync_date`). This works the same as `last_modified_date` in the `doc/` Markdown files. | ||
|
||
|
||
## Rules for use | ||
|
||
- **Don't create new Markdown files** for new layers. Instead: | ||
1. Create the new layer straightaway in APLS. This should include: | ||
- Any auxiliaries, if applicable | ||
- A short description suitable for: | ||
- Tooltip in APLS | ||
- The "quick reference card" table at `doc/quick-reference-card` | ||
1. Run `sync-layers.R` to create a Markdown file for the new layer and populate its YAML header | ||
- If you want to test out a layer config **without the layer showing up in `doc/`, add it to the `testing` project** (you're probably doing that anyway!). While all layers in APLS get a Markdown file, those with `project: testing` get ignored | ||
1. Fill the following YAML fields manually: `inputs`, `downstream layers`, `notation` (with children `primary`, `additional`) | ||
- `additional` | ||
1. Fill the body of the Markdown file with a long description | ||
- If you **change _anything_ about a layer config in APLS**: | ||
1. Re-run `sync-layers.R` to update that layer's YAML header | ||
1. It may be necessary to update `last_modified_sync_date` and/or `versions: last_updated` manually, in case it's a change that `sync-layers.R` can't detect | ||
- If you **delete a layer in APLS**, it won't be deleted here...yet | ||
- I like the idea of having `sync-layers.R` shunt deleted files to a `deleted/` subfolder, or adding a `deleted: yes` flag that tells `doc/` to ignore that Markdown file. But that's not a priority right now | ||
|
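The rule that layers with `project: testing` get ignored could be checked along these lines. This is only a sketch: the file contents are hypothetical, and placing `project` under `synced` is an assumption, since the README does not say where that field lives. It relies on `yaml.load()` from the `yaml` package.

```r
library(yaml)

# Hypothetical layer file; the position of `project` under `synced` is assumed
layer_file <- c(
  "---",
  "name: example_layer",
  "synced:",
  "  project: testing",
  "versions:",
  "  first_appeared: 0.1.0",
  "---",
  "",
  "Long prose description."
)

# The YAML header sits between the first two '---' delimiter lines
delims <- which(layer_file == "---")
header_lines <- layer_file[(delims[1] + 1):(delims[2] - 1)]
header <- yaml.load(paste(header_lines, collapse = "\n"))

# Layers in the testing project would be skipped when building doc/ pages
skip_layer <- identical(header$synced$project, "testing")
print(skip_layer)
```

Using `identical()` keeps the check safe when a file has no `project` field at all, since a missing field yields `NULL` rather than an error.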
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
--- | ||
name: dictionary_phonemes | ||
synced: | ||
<!-- Usual LaBB-CAT layer data, auto-populated by sync-layers.R --> | ||
parallel: yes | ||
notation: | ||
primary: disc | ||
inputs: | ||
- input: | ||
number: | ||
type: | ||
layer_manager: | ||
versions: | ||
first_appeared: 0.1.0 | ||
last_updated: 0.1.0 | ||
last_modified_sync_date: | ||
last_modified_date: 2024-10-16T11:47:55-04:00 | ||
--- | ||
|