Commit

Merge pull request #32 from impresso/ocrqa
Schema for OCR-QA
simon-clematide authored Jun 22, 2024
2 parents 6cfa1d5 + ca61961 commit 8c8722e
Showing 6 changed files with 79 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -22,6 +22,8 @@ We define schemas for:
- [Language Identification](docs/language_identification.md) (draft 06)
- Entities
- [Entities](docs/entities.md) (2020-12)
- [OCR Quality Assessment](docs/ocr_qa.md) (OCR-QA)


#### Processes
- Data processing manifests (todo)
3 changes: 3 additions & 0 deletions docs/ocr_qa-properties-ci_ref.md
@@ -0,0 +1,3 @@
## ci\_ref Type

`string`
3 changes: 3 additions & 0 deletions docs/ocr_qa-properties-ocrqa.md
@@ -0,0 +1,3 @@
## ocrqa Type

`number`
46 changes: 46 additions & 0 deletions docs/ocr_qa.md
@@ -0,0 +1,46 @@
## OCR-QA JSON Schema Type

`object` ([OCR-QA JSON Schema](ocr_qa.md))

# OCR-QA JSON Schema Properties

| Property | Type | Required | Nullable | Defined by |
| :----------------- | :------- | :------- | :------------- | :------------------------------------------------------------------------------------------------------------------------------------------------ |
| [ci\_ref](#ci_ref) | `string` | Required | cannot be null | [OCR-QA JSON Schema](ocr_qa-properties-ci_ref.md "https://impresso.github.io/impresso-schemas/json/ocr_qa/ocr_qa.schema.json#/properties/ci_ref") |
| [ocrqa](#ocrqa) | `number` | Required | cannot be null | [OCR-QA JSON Schema](ocr_qa-properties-ocrqa.md "https://impresso.github.io/impresso-schemas/json/ocr_qa/ocr_qa.schema.json#/properties/ocrqa") |

## ci\_ref

Reference to canonical content item id, typically an article

`ci_ref`

* is required

* Type: `string`

* cannot be null

* defined in: [OCR-QA JSON Schema](ocr_qa-properties-ci_ref.md "https://impresso.github.io/impresso-schemas/json/ocr_qa/ocr_qa.schema.json#/properties/ci_ref")

### ci\_ref Type

`string`

## ocrqa

The estimated OCR quality, between 0 and 1

`ocrqa`

* is required

* Type: `number`

* cannot be null

* defined in: [OCR-QA JSON Schema](ocr_qa-properties-ocrqa.md "https://impresso.github.io/impresso-schemas/json/ocr_qa/ocr_qa.schema.json#/properties/ocrqa")

### ocrqa Type

`number`
4 changes: 4 additions & 0 deletions examples/ocr_qa/ocr_qa_example.json
@@ -0,0 +1,4 @@
{
"ci_ref": "actionfem-1939-05-15-a-i0022",
"ocrqa": 0.86
}
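
A record of this shape could be consumed as in the minimal sketch below, which lists the content items whose estimated OCR quality falls under a cutoff. Several things here are assumptions and not part of this commit: the one-record-per-line input layout, the function name `low_quality_refs`, the default threshold of 0.5, and the second sample record, whose identifier is made up.

```python
import json


def low_quality_refs(lines, threshold=0.5):
    """Collect ci_ref values whose ocrqa score is below the threshold.

    Assumes one OCR-QA JSON record per line (hypothetical layout).
    """
    refs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record["ocrqa"] < threshold:
            refs.append(record["ci_ref"])
    return refs


sample = [
    # The example record added in this commit.
    '{"ci_ref": "actionfem-1939-05-15-a-i0022", "ocrqa": 0.86}',
    # Made-up record for illustration only.
    '{"ci_ref": "hypothetical-item-id", "ocrqa": 0.31}',
]
print(low_quality_refs(sample))  # ['hypothetical-item-id']
```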
21 changes: 21 additions & 0 deletions json/ocr_qa/ocr_qa.schema.json
@@ -0,0 +1,21 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://impresso.github.io/impresso-schemas/json/ocr_qa/ocr_qa.schema.json",
"title": "OCR-QA JSON Schema",
"description": "A representation for the assessment of OCR quality of content items.",
"type": "object",
"properties": {
"ci_ref": {
"type": "string",
"description": "Reference to canonical content item id, typically an article"
},
"ocrqa": {
"type": "number",
"description": "The estimated OCR quality, between 0 and 1"
}
},
"required": [
"ocrqa",
"ci_ref"
]
}
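
The constraints the schema expresses (an object, two required properties, a string and a number) can be checked by hand, as in the sketch below, which uses only the Python standard library. Note that the schema states the 0 to 1 range only in the `ocrqa` description and carries no `minimum` or `maximum` keyword, so a standard validator would accept out-of-range scores. The `enforce_range` flag is an assumption added here to show a stricter check; the function name `validate_ocr_qa` is likewise hypothetical.

```python
import json


def validate_ocr_qa(record, enforce_range=False):
    """Return a list of problems; an empty list means the record passed."""
    if not isinstance(record, dict):
        return ["record must be a JSON object"]
    problems = []
    for key in ("ci_ref", "ocrqa"):
        if key not in record:
            problems.append(f"missing required property: {key}")
    if "ci_ref" in record and not isinstance(record["ci_ref"], str):
        problems.append("ci_ref must be a string")
    if "ocrqa" in record:
        value = record["ocrqa"]
        # bool is a subclass of int in Python, but JSON Schema's
        # "number" type does not accept booleans.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append("ocrqa must be a number")
        elif enforce_range and not 0 <= value <= 1:
            # Stricter than the schema: the range appears only in the description.
            problems.append("ocrqa must be between 0 and 1")
    return problems


example = json.loads('{"ci_ref": "actionfem-1939-05-15-a-i0022", "ocrqa": 0.86}')
print(validate_ocr_qa(example))  # []
```

With `enforce_range=False` the function mirrors what the schema as written accepts, so a record with `"ocrqa": 1.5` passes; setting `enforce_range=True` reports it.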
