added outline for website
KennethEnevoldsen committed Oct 23, 2023
1 parent 7604a23 commit 49f767c
Showing 11 changed files with 92 additions and 4 deletions.
Binary file added docs/_static/icon.png
3 changes: 2 additions & 1 deletion docs/datasheets/danews.md
@@ -232,7 +232,8 @@ writing style which is unlikely to reflect the Danish language as a whole.

**Will the dataset be distributed to third parties outside of the entity (e.g., company, institution, organization) on behalf of which the dataset was created?**

Data will only be available at the entity during the project. After the project the data will be archived for a period of five years to comply with the university [policy] for research integrity. After the five years, the data will be registered at the national archives as required by [executive order 514](https://www.retsinformation.dk/eli/lta/2020/514) for potential long-term deposit.
Data will only be available at the entity during the project. If you wish to access the dataset, you will have to come to an agreement with the individual
Danish newspapers, potentially through Infomedia.

### Citation

2 changes: 1 addition & 1 deletion docs/datasheets/daradio.md
@@ -114,7 +114,7 @@ This dataset is static and does not evolve over time with the language, thus wil

**Will the dataset be distributed to third parties outside of the entity (e.g., company, institution, organization) on behalf of which the dataset was created?**

Data will only be available at the entity during the project. After the project the data will be archived for a period of five years to comply with the university [policy] for research integrity. After the five years, the data will be registered at the national archives as required by [executive order 514](https://www.retsinformation.dk/eli/lta/2020/514) for potential long-term deposit.
Data will only be available at the entity during the project. An equivalent or updated dataset can be requested at the Royal Danish Library.


### Citation
@@ -192,7 +192,7 @@ dataset going forward.

**Will the dataset be distributed to third parties outside of the entity (e.g., company, institution, organization) on behalf of which the dataset was created?**

Data will only be available at the entity during the project. After the project the data will be archived for a period of five years to comply with the university [policy] for research integrity. After the five years, the data will be registered at the national archives as required by [executive order 514](https://www.retsinformation.dk/eli/lta/2020/514) for potential long-term deposit.
Data will only be available at the entity during the project. An equivalent or updated dataset can be requested at the Royal Danish Library.

### Citation
If you wish to cite this work please see our GitHub page for an up to date citation: https://github.com/centre-for-humanities-computing/danish-foundation-models
4 changes: 4 additions & 0 deletions docs/dcc.md
@@ -0,0 +1,4 @@
# DCC <sub>v1</sub>

The DCC is a composite corpus consisting of the following subcorpora.

2 changes: 2 additions & 0 deletions docs/index.md
@@ -0,0 +1,2 @@

This website is under construction 🛠️
6 changes: 6 additions & 0 deletions docs/models_speech.md
@@ -0,0 +1,6 @@
This section contains references to models trained on speech.

| Model | Model type |
| ----------------------------------------------------------------------------------- | ---------------------------- |
| [xls-r-300m-danish-nst-cv9](https://huggingface.co/chcaa/xls-r-300m-danish-nst-cv9) | Automatic speech recognition |
| [chcaa/xls-r-300m-nst-cv9-da](https://huggingface.co/chcaa/xls-r-300m-nst-cv9-da) | Automatic speech recognition |
9 changes: 9 additions & 0 deletions docs/models_text.md
@@ -0,0 +1,9 @@
This section contains references to models trained on text.


| Model | Model type | Size (parameters) |
| --------------------------------------------------------------------------- | ---------- | ----------------- |
| [dfm-encoder-large-v1](https://huggingface.co/chcaa/dfm-encoder-large-v1) | Encoder | large (355M) |
| [dfm-encoder-medium-v1](https://huggingface.co/chcaa/dfm-encoder-medium-v1) | Encoder | medium (110M) |
| [dfm-encoder-small-v1](https://huggingface.co/chcaa/dfm-encoder-small-v1) | Encoder | small (22M) |

5 changes: 4 additions & 1 deletion makefile
@@ -18,4 +18,7 @@ validate: ## Run all checks

pr: ## Run relevant tests before PR
make validate
gh pr create -w
gh pr create -w

docs:
mkdocs serve
58 changes: 58 additions & 0 deletions mkdocs.yml
@@ -0,0 +1,58 @@
site_name: Danish Foundation Models
docs_dir: "docs/"
repo_url: https://github.com/centre-for-humanities-computing/danish-foundation-models
watch: [docs/]
theme:
  name: material
  favicon: _static/icon.png
  logo: _static/icon.png
  features:
    - navigation.tracking
    - navigation.tabs
    - navigation.sections
    - toc.integrate
    - navigation.top
    - search.suggest
    - search.highlight
    - content.tabs.link
    - content.tooltips
    - navigation.footer
    - navigation.indexes
    - toc.follow
    - pymdownx.caret
    - pymdownx.tilde
  palette:
    primary: white
    accent: light blue
    # automatic dark mode is

markdown_extensions:
  - pymdownx.superfences
  - pymdownx.tabbed:
      alternate_style: true
  - toc:
      permalink: true

copyright: Copyright &copy; 2023 Danish Foundation Models Project

nav:
  - About: index.md
  - Models:
    - Text: models_text.md
    - Speech: models_speech.md
  - Datasets:
    - DCC: dcc.md
    - Datasheets:
      - DaNews: datasheets/danews.md
      - HopeTwitter: datasheets/hopetwitter.md
      - NAT: datasheets/netarkivet_text.md
      - DaRadio: datasheets/daradio.md

plugins:
  - mkdocs-jupyter
  - search

extra:
  social:
    - icon: fontawesome/brands/github
      link: https://github.com/centre-for-humanities-computing/danish-foundation-models
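
The `nav` section of `mkdocs.yml` maps page titles to Markdown files under `docs/`. As a minimal sketch (not part of this commit), the Python helper below flattens a nav structure like this one and reports referenced files that are missing from a docs directory. The nav is written out as a Python literal rather than parsed from `mkdocs.yml`, the nesting of `Datasheets` under `Datasets` is an assumption, and `iter_nav_files` and `missing_pages` are hypothetical names introduced for illustration.

```python
from pathlib import Path
from typing import Iterator, List, Union

NavItem = Union[str, dict, list]

# Assumption: this literal mirrors the nav entries of mkdocs.yml; the exact
# nesting is a guess, but the flattening below does not depend on it.
NAV: list = [
    {"About": "index.md"},
    {"Models": [{"Text": "models_text.md"}, {"Speech": "models_speech.md"}]},
    {"Datasets": [
        {"DCC": "dcc.md"},
        {"Datasheets": [
            {"DaNews": "datasheets/danews.md"},
            {"HopeTwitter": "datasheets/hopetwitter.md"},
            {"NAT": "datasheets/netarkivet_text.md"},
            {"DaRadio": "datasheets/daradio.md"},
        ]},
    ]},
]


def iter_nav_files(item: NavItem) -> Iterator[str]:
    """Yield every file path referenced in a (possibly nested) nav item."""
    if isinstance(item, str):
        yield item
    elif isinstance(item, dict):
        for value in item.values():
            yield from iter_nav_files(value)
    else:  # a list of nav items
        for child in item:
            yield from iter_nav_files(child)


def missing_pages(nav: NavItem, docs_dir: Path) -> List[str]:
    """Return nav paths that do not exist as files under docs_dir."""
    return [p for p in iter_nav_files(nav) if not (docs_dir / p).is_file()]


if __name__ == "__main__":
    # Hypothetical usage: run from the repository root.
    for path in missing_pages(NAV, Path("docs")):
        print(f"missing: {path}")
```

Run from the repository root, this prints one line per nav entry whose file is absent from `docs/`, which can catch a mistyped path before `mkdocs serve` reports it.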
5 changes: 5 additions & 0 deletions pyproject.toml
@@ -20,6 +20,11 @@ dependencies = ["pydantic==1.8.2"]

[project.optional-dependencies]
dev = ["black==23.9.1", "ruff==0.1.0", "pyright==1.1.331", "pre-commit==3.5.0"]
docs = [
"mkdocs-jupyter==0.24.2",
"mkdocs-material==9.1.21",
"mkdocstrings[python]==0.22.0",
]
test = ["pytest==6.2.5", "pytest-lazy-fixture==0.6.3", "pytest-cov==2.8.1"]

[project.license]