Commit

Merge pull request #23 from impresso/feature/sc_improve_language_identification

Feature/sc improve language identification
simon-clematide authored Dec 28, 2020
2 parents 46233b3 + f19b037 commit 7570caf
Showing 160 changed files with 1,092 additions and 2,009 deletions.
6 changes: 3 additions & 3 deletions Makefile
@@ -23,9 +23,9 @@ clean-documentation:
documentation:
# make sure to install the correct jsonschema2md tool:
# npm install -g @adobe/jsonschema2md
-jsonschema2md -d json/newspaper/ -n -v 06 -o docs
-jsonschema2md -d json/topic_model/ -n -v 06 -o docs
-jsonschema2md -d json/language_identification/ -n -v 06 -o docs
+jsonschema2md -d json/newspaper/ --header false -n -v 06 -o docs -x - -s propTable
+jsonschema2md -d json/topic_model/ --header false -n -v 06 -o docs -x - -s propTable
+jsonschema2md -d json/language_identification/ --header false -n -v 06 -o docs -x - -s propTable
#jsonschema2md -d json/linguistic_annotation/ -n -v 06 -o docs


2 changes: 1 addition & 1 deletion README.md
@@ -43,7 +43,7 @@ The 'impresso - Media Monitoring of the Past' project is funded by the Swiss Nat

## License

-Copyright (C) 2020 The *impresso* team. Contributors to this program include: [Simon Clematide](https://github.com/simon-clematide), [Maud Ehrmann](https://github.com/e-maud) and [Matteo Romanello](http://github.com/mromanello/) ).
+Copyright (C) 2020 The *impresso* team. Contributors to this program include: [Simon Clematide](https://github.com/simon-clematide), [Maud Ehrmann](https://github.com/e-maud) and [Matteo Romanello](http://github.com/mromanello/).

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the [GNU Affero General Public License](https://github.com/impresso/impresso-schemas/blob/master/LICENSE) for more details.
13 changes: 0 additions & 13 deletions docs/contentitem-properties-cc.md
@@ -1,16 +1,3 @@
# Untitled boolean in Content Item Schema

```txt
https://impresso.github.io/impresso-schemas/json/newspaper/contentitem.schema.json#/properties/cc
```

True if image box coordinates are known to be correct, False otherwise


| Abstract | Extensible | Status | Identifiable | Custom Properties | Additional Properties | Access Restrictions | Defined In |
| :------------------ | ---------- | -------------- | ----------------------- | :---------------- | --------------------- | ------------------- | ---------------------------------------------------------------------------------- |
| Can be instantiated | No | Unknown status | Unknown identifiability | Forbidden | Allowed | none | [contentitem.schema.json\*](../out/contentitem.schema.json "open original schema") |

## cc Type

`boolean`
13 changes: 0 additions & 13 deletions docs/contentitem-properties-d.md
@@ -1,16 +1,3 @@
# Untitled string in Content Item Schema

```txt
https://impresso.github.io/impresso-schemas/json/newspaper/contentitem.schema.json#/properties/d
```

issue date (yyyy-mm-dd)


| Abstract | Extensible | Status | Identifiable | Custom Properties | Additional Properties | Access Restrictions | Defined In |
| :------------------ | ---------- | -------------- | ----------------------- | :---------------- | --------------------- | ------------------- | ---------------------------------------------------------------------------------- |
| Can be instantiated | No | Unknown status | Unknown identifiability | Forbidden | Allowed | none | [contentitem.schema.json\*](../out/contentitem.schema.json "open original schema") |

## d Type

`string`
13 changes: 0 additions & 13 deletions docs/contentitem-properties-ft.md
@@ -1,16 +1,3 @@
# Untitled string in Content Item Schema

```txt
https://impresso.github.io/impresso-schemas/json/newspaper/contentitem.schema.json#/properties/ft
```

the rebuilt fulltext


| Abstract | Extensible | Status | Identifiable | Custom Properties | Additional Properties | Access Restrictions | Defined In |
| :------------------ | ---------- | -------------- | ----------------------- | :---------------- | --------------------- | ------------------- | ---------------------------------------------------------------------------------- |
| Can be instantiated | No | Unknown status | Unknown identifiability | Forbidden | Allowed | none | [contentitem.schema.json\*](../out/contentitem.schema.json "open original schema") |

## ft Type

`string`
13 changes: 0 additions & 13 deletions docs/contentitem-properties-id.md
@@ -1,16 +1,3 @@
# Untitled string in Content Item Schema

```txt
https://impresso.github.io/impresso-schemas/json/newspaper/contentitem.schema.json#/properties/id
```

The unique identifier for a content item (CI)


| Abstract | Extensible | Status | Identifiable | Custom Properties | Additional Properties | Access Restrictions | Defined In |
| :------------------ | ---------- | -------------- | ----------------------- | :---------------- | --------------------- | ------------------- | ---------------------------------------------------------------------------------- |
| Can be instantiated | No | Unknown status | Unknown identifiability | Forbidden | Allowed | none | [contentitem.schema.json\*](../out/contentitem.schema.json "open original schema") |

## id Type

`string`
13 changes: 0 additions & 13 deletions docs/contentitem-properties-lb-items.md
@@ -1,16 +1,3 @@
# Untitled number in Content Item Schema

```txt
https://impresso.github.io/impresso-schemas/json/newspaper/contentitem.schema.json#/properties/lb/items
```




| Abstract | Extensible | Status | Identifiable | Custom Properties | Additional Properties | Access Restrictions | Defined In |
| :------------------ | ---------- | -------------- | ----------------------- | :---------------- | --------------------- | ------------------- | ---------------------------------------------------------------------------------- |
| Can be instantiated | No | Unknown status | Unknown identifiability | Forbidden | Allowed | none | [contentitem.schema.json\*](../out/contentitem.schema.json "open original schema") |

## items Type

`number`
13 changes: 0 additions & 13 deletions docs/contentitem-properties-lb.md
@@ -1,16 +1,3 @@
# Untitled array in Content Item Schema

```txt
https://impresso.github.io/impresso-schemas/json/newspaper/contentitem.schema.json#/properties/lb
```

text offsets of physical line breaks (relative to 'ft' field)


| Abstract | Extensible | Status | Identifiable | Custom Properties | Additional Properties | Access Restrictions | Defined In |
| :------------------ | ---------- | -------------- | ----------------------- | :---------------- | --------------------- | ------------------- | ---------------------------------------------------------------------------------- |
| Can be instantiated | No | Unknown status | Unknown identifiability | Forbidden | Allowed | none | [contentitem.schema.json\*](../out/contentitem.schema.json "open original schema") |

## lb Type

`number[]`
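
The `lb` description above says its values are text offsets of physical line breaks relative to the `ft` field. A minimal sketch of how a consumer might use them follows, assuming (this is not confirmed by the schema text) that each offset is a zero-based character index into `ft` at which a physical line ends. The helper name `split_at_offsets` and the sample record are hypothetical.

```python
from typing import List


def split_at_offsets(ft: str, offsets: List[int]) -> List[str]:
    """Cut `ft` into segments at each offset.

    Assumption: offsets are zero-based character indices into `ft`.
    """
    segments = []
    start = 0
    for end in sorted(offsets):
        segments.append(ft[start:end])
        start = end
    # Keep any text that follows the last recorded break.
    if start < len(ft):
        segments.append(ft[start:])
    return segments


# Hypothetical content item carrying only the fields used here.
item = {"ft": "First line second line third", "lb": [11, 23]}
lines = split_at_offsets(item["ft"], item["lb"])
```

The same helper would apply to `pb` (paragraph breaks) if those offsets follow the same convention.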
13 changes: 0 additions & 13 deletions docs/contentitem-properties-lg.md
@@ -1,16 +1,3 @@
# Untitled string in Content Item Schema

```txt
https://impresso.github.io/impresso-schemas/json/newspaper/contentitem.schema.json#/properties/lg
```

two letter language code


| Abstract | Extensible | Status | Identifiable | Custom Properties | Additional Properties | Access Restrictions | Defined In |
| :------------------ | ---------- | -------------- | ----------------------- | :---------------- | --------------------- | ------------------- | ---------------------------------------------------------------------------------- |
| Can be instantiated | No | Unknown status | Unknown identifiability | Forbidden | Allowed | none | [contentitem.schema.json\*](../out/contentitem.schema.json "open original schema") |

## lg Type

`string`
13 changes: 0 additions & 13 deletions docs/contentitem-properties-olr.md
@@ -1,16 +1,3 @@
# Untitled boolean in Content Item Schema

```txt
https://impresso.github.io/impresso-schemas/json/newspaper/contentitem.schema.json#/properties/olr
```

True if optical layout recognition was applied to the issue this content item originates from.


| Abstract | Extensible | Status | Identifiable | Custom Properties | Additional Properties | Access Restrictions | Defined In |
| :------------------ | ---------- | -------------- | ----------------------- | :---------------- | --------------------- | ------------------- | ---------------------------------------------------------------------------------- |
| Can be instantiated | No | Unknown status | Unknown identifiability | Forbidden | Allowed | none | [contentitem.schema.json\*](../out/contentitem.schema.json "open original schema") |

## olr Type

`boolean`
13 changes: 0 additions & 13 deletions docs/contentitem-properties-pb-items.md
@@ -1,16 +1,3 @@
# Untitled number in Content Item Schema

```txt
https://impresso.github.io/impresso-schemas/json/newspaper/contentitem.schema.json#/properties/pb/items
```




| Abstract | Extensible | Status | Identifiable | Custom Properties | Additional Properties | Access Restrictions | Defined In |
| :------------------ | ---------- | -------------- | ----------------------- | :---------------- | --------------------- | ------------------- | ---------------------------------------------------------------------------------- |
| Can be instantiated | No | Unknown status | Unknown identifiability | Forbidden | Allowed | none | [contentitem.schema.json\*](../out/contentitem.schema.json "open original schema") |

## items Type

`number`
13 changes: 0 additions & 13 deletions docs/contentitem-properties-pb.md
@@ -1,16 +1,3 @@
# Untitled array in Content Item Schema

```txt
https://impresso.github.io/impresso-schemas/json/newspaper/contentitem.schema.json#/properties/pb
```

text offsets of physical paragraph breaks (relative to 'ft' field)


| Abstract | Extensible | Status | Identifiable | Custom Properties | Additional Properties | Access Restrictions | Defined In |
| :------------------ | ---------- | -------------- | ----------------------- | :---------------- | --------------------- | ------------------- | ---------------------------------------------------------------------------------- |
| Can be instantiated | No | Unknown status | Unknown identifiability | Forbidden | Allowed | none | [contentitem.schema.json\*](../out/contentitem.schema.json "open original schema") |

## pb Type

`number[]`
13 changes: 0 additions & 13 deletions docs/contentitem-properties-pp-items.md
@@ -1,16 +1,3 @@
# Untitled number in Content Item Schema

```txt
https://impresso.github.io/impresso-schemas/json/newspaper/contentitem.schema.json#/properties/pp/items
```




| Abstract | Extensible | Status | Identifiable | Custom Properties | Additional Properties | Access Restrictions | Defined In |
| :------------------ | ---------- | -------------- | ----------------------- | :---------------- | --------------------- | ------------------- | ---------------------------------------------------------------------------------- |
| Can be instantiated | No | Unknown status | Unknown identifiability | Forbidden | Allowed | none | [contentitem.schema.json\*](../out/contentitem.schema.json "open original schema") |

## items Type

`number`
13 changes: 0 additions & 13 deletions docs/contentitem-properties-pp.md
@@ -1,16 +1,3 @@
# Untitled array in Content Item Schema

```txt
https://impresso.github.io/impresso-schemas/json/newspaper/contentitem.schema.json#/properties/pp
```

array of page numbers over which the CI spans; it's the physical page number issue-based, as we get it from the OCR.


| Abstract | Extensible | Status | Identifiable | Custom Properties | Additional Properties | Access Restrictions | Defined In |
| :------------------ | ---------- | -------------- | ----------------------- | :---------------- | --------------------- | ------------------- | ---------------------------------------------------------------------------------- |
| Can be instantiated | No | Unknown status | Unknown identifiability | Forbidden | Allowed | none | [contentitem.schema.json\*](../out/contentitem.schema.json "open original schema") |

## pp Type

`number[]`
13 changes: 0 additions & 13 deletions docs/contentitem-properties-ppreb-items-properties-id.md
@@ -1,16 +1,3 @@
# Untitled string in Content Item Schema

```txt
https://impresso.github.io/impresso-schemas/json/newspaper/contentitem.schema.json#/properties/ppreb/items/properties/id
```

canonical ID


| Abstract | Extensible | Status | Identifiable | Custom Properties | Additional Properties | Access Restrictions | Defined In |
| :------------------ | ---------- | -------------- | ----------------------- | :---------------- | --------------------- | ------------------- | ---------------------------------------------------------------------------------- |
| Can be instantiated | No | Unknown status | Unknown identifiability | Forbidden | Allowed | none | [contentitem.schema.json\*](../out/contentitem.schema.json "open original schema") |

## id Type

`string`
13 changes: 0 additions & 13 deletions docs/contentitem-properties-ppreb-items-properties-n.md
@@ -1,16 +1,3 @@
# Untitled number in Content Item Schema

```txt
https://impresso.github.io/impresso-schemas/json/newspaper/contentitem.schema.json#/properties/ppreb/items/properties/n
```

page number


| Abstract | Extensible | Status | Identifiable | Custom Properties | Additional Properties | Access Restrictions | Defined In |
| :------------------ | ---------- | -------------- | ----------------------- | :---------------- | --------------------- | ------------------- | ---------------------------------------------------------------------------------- |
| Can be instantiated | No | Unknown status | Unknown identifiability | Forbidden | Allowed | none | [contentitem.schema.json\*](../out/contentitem.schema.json "open original schema") |

## n Type

`number`
@@ -1,16 +1,3 @@
# Untitled array in Content Item Schema

```txt
https://impresso.github.io/impresso-schemas/json/newspaper/contentitem.schema.json#/properties/ppreb/items/properties/t/items/properties/c
```

page coordinates of token


| Abstract | Extensible | Status | Identifiable | Custom Properties | Additional Properties | Access Restrictions | Defined In |
| :------------------ | ---------- | -------------- | ----------------------- | :---------------- | --------------------- | ------------------- | ---------------------------------------------------------------------------------- |
| Can be instantiated | No | Unknown status | Unknown identifiability | Forbidden | Allowed | none | [contentitem.schema.json\*](../out/contentitem.schema.json "open original schema") |

## c Type

`array`
@@ -1,16 +1,3 @@
# Untitled number in Content Item Schema

```txt
https://impresso.github.io/impresso-schemas/json/newspaper/contentitem.schema.json#/properties/ppreb/items/properties/t/items/properties/l
```

token length


| Abstract | Extensible | Status | Identifiable | Custom Properties | Additional Properties | Access Restrictions | Defined In |
| :------------------ | ---------- | -------------- | ----------------------- | :---------------- | --------------------- | ------------------- | ---------------------------------------------------------------------------------- |
| Can be instantiated | No | Unknown status | Unknown identifiability | Forbidden | Allowed | none | [contentitem.schema.json\*](../out/contentitem.schema.json "open original schema") |

## l Type

`number`
@@ -1,16 +1,3 @@
# Untitled number in Content Item Schema

```txt
https://impresso.github.io/impresso-schemas/json/newspaper/contentitem.schema.json#/properties/ppreb/items/properties/t/items/properties/s
```

offset start (relative to ft field)


| Abstract | Extensible | Status | Identifiable | Custom Properties | Additional Properties | Access Restrictions | Defined In |
| :------------------ | ---------- | -------------- | ----------------------- | :---------------- | --------------------- | ------------------- | ---------------------------------------------------------------------------------- |
| Can be instantiated | No | Unknown status | Unknown identifiability | Forbidden | Allowed | none | [contentitem.schema.json\*](../out/contentitem.schema.json "open original schema") |

## s Type

`number`
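
The token properties in these hunks describe `s` as the start offset relative to `ft` and `l` as the token length. As a rough illustration, assuming both are counted in characters and `s` is zero-based (neither is stated in the schema descriptions), a token's surface text could be recovered by slicing `ft`. The function name `token_text` and the sample data are hypothetical.

```python
from typing import Dict, List


def token_text(ft: str, token: Dict[str, int]) -> str:
    """Return the substring of `ft` covered by a token.

    Assumption: `s` is a zero-based character offset and `l` a character count.
    """
    start = token["s"]
    return ft[start:start + token["l"]]


# Hypothetical fulltext and token list; the `c` coordinates are omitted.
ft = "Le Temps, 1920"
tokens: List[Dict[str, int]] = [
    {"s": 0, "l": 2},
    {"s": 3, "l": 5},
    {"s": 10, "l": 4},
]
words = [token_text(ft, t) for t in tokens]
```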
@@ -1,16 +1,3 @@
# Untitled undefined type in Content Item Schema

```txt
https://impresso.github.io/impresso-schemas/json/newspaper/contentitem.schema.json#/properties/ppreb/items/properties/t/items/properties
```




| Abstract | Extensible | Status | Identifiable | Custom Properties | Additional Properties | Access Restrictions | Defined In |
| :------------------ | ---------- | -------------- | ----------------------- | :---------------- | --------------------- | ------------------- | ---------------------------------------------------------------------------------- |
| Can be instantiated | No | Unknown status | Unknown identifiability | Forbidden | Allowed | none | [contentitem.schema.json\*](../out/contentitem.schema.json "open original schema") |

## properties Type

unknown
13 changes: 0 additions & 13 deletions docs/contentitem-properties-ppreb-items-properties-t-items.md
@@ -1,16 +1,3 @@
# Untitled object in Content Item Schema

```txt
https://impresso.github.io/impresso-schemas/json/newspaper/contentitem.schema.json#/properties/ppreb/items/properties/t/items
```




| Abstract | Extensible | Status | Identifiable | Custom Properties | Additional Properties | Access Restrictions | Defined In |
| :------------------ | ---------- | -------------- | ------------ | :---------------- | --------------------- | ------------------- | ---------------------------------------------------------------------------------- |
| Can be instantiated | No | Unknown status | No | Forbidden | Allowed | none | [contentitem.schema.json\*](../out/contentitem.schema.json "open original schema") |

## items Type

`object` ([Details](contentitem-properties-ppreb-items-properties-t-items.md))