Commit

Merge pull request #2 from dsih-artpark/reorg
feat: reorganised and cleaned up, added terminologies and attributes
snehasaisneha authored Apr 30, 2024
2 parents 18edf7c + 3ddb553 commit ab14dac
Showing 12 changed files with 143 additions and 23 deletions.
7 changes: 7 additions & 0 deletions docs/_static/custom.css
@@ -0,0 +1,7 @@
.wy-nav-content {
    max-width: 60% !important;
}

.wy-table-responsive table td, .wy-table-responsive table th {
    white-space: inherit;
}
9 changes: 9 additions & 0 deletions docs/conf.py
@@ -35,3 +35,12 @@
# a list of builtin themes.
#
html_theme = "sphinx_rtd_theme"

# These folders are copied to the documentation's HTML output
html_static_path = ['_static']

# These paths are either relative to html_static_path
# or fully qualified paths (eg. https://...)
html_css_files = [
    'custom.css',
]
7 changes: 7 additions & 0 deletions docs/csvs/flag.csv
@@ -0,0 +1,7 @@
flag,expansion,description
-i,immutable,used when a given attribute is immutable in a given database/resource
-m,mandatory,this attribute must be filled for each entry and any entry that does not have this attribute must be rejected by any application's database
-v,validate,"used when an attribute is auto-collected or computed from other collected information but is also collected manually (usually optionally) for validation on the back end. For example, the name of the subdistrict or village may be accepted during sample collection even though it is normally imputed from lat-long, from the user's sample collection site, or from the assigned area."
-p,private,"used for any sensitive information, also referred to as Personally Identifiable Information. This information is not available to users accessing the data through RESTful APIs, or any other alternative means of access. (Retention Policy: TODO)"
-a,automatic,data is autofilled
-ao,automatic but can be overridden,for autofilled information that can be overridden by the user
48 changes: 48 additions & 0 deletions docs/csvs/sample.csv
@@ -0,0 +1,48 @@
attribute category,attribute,flags,"description, example, and binding",type
meta,uuid,"-i, -m",UUID based on UUID4 from RFC4122. Example: urn:uuid:f71dab9c-d12a-400d-b49a-5f4972bb4c23,PrimitiveType.uri.uuid
,sampleCollectorUUID,"-i, -m",User/meta.uuid,
,barcode,"-i, -m, -a","Number scanned from the barcode, assigned to the database entry automatically",
,accessionIDs,,"Accession to various other standard databases. Stored in JSON, XML, or YAML format.

[
{ ""system"": ""https://gisaid.org/"",
""code"": ""MyGISAID_AccessionID"",
""display"": ""GISAID Accession ID""
},
{ ""system"": ""https://www.ncbi.nlm.nih.gov/genbank/"",
""code"": ""MyGenBank_AccessionID"",
""display"": ""GenBank Accession""
}
]

extensible to support any database ID.",
,diseaseOfInterest,-m,"Code based on existing standards, e.g. ICD-11 and SNOMED CT. As with accession IDs, multiple codes are supported.

{
""system"": ""https://icd.who.int/browse/2024-01/mms/en#/http://id.who.int/icd/entity/1959883044"",
""value"": ""1F05.3"",
""display"": ""Foot and Mouth Disease""
}",
,CollectionDate,-a,Date of Sample Collection. Autofilled when barcode is scanned.,
location,country,"-m, -v","Two-letter code based on ISO 3166-1 alpha-2 (https://www.iso.org/iso-3166-country-codes.html).

{
""system"": ""https://www.iso.org/standard/72482.html"",
""code"": ""IN"",
""display"": ""INDIA""
}",
,geoLatLong,-m,"(longitude, latitude). It's recommended to retain 6 decimal places, but at least two are required.",
,geoadmin,-v,"Store the highest-resolution ID associated with the sample, along with the hierarchy system used to collect it (e.g. wards vs. villages for urban vs. rural areas).

{
""system"": ""https://lgd.gov.in"",
""hierarchy"": ""ulb"",
""code"": ""ward_276600-12"",
""display"": ""DODDA BOMMASANDRA"",
""parents"": [""zone_276600-10"", ""ulb_277600"",""state_29""]
}",
,pinCode,-v,e.g. 560012,
collectionInfo,siteName,,string,
,siteType,,,
,sampleType,,"Terminology Binding (Milk/Soil/Feed/Water Runoff/Air/Slurry)",
,storage,-ao,Terminology Binding (Room Temperature vs Cold Chain),
8 changes: 8 additions & 0 deletions docs/csvs/user.csv
@@ -0,0 +1,8 @@
attribute category,attribute,flags,"description, example, and binding",type
meta,uuid,"-i, -m",UUID based on UUID4 from RFC4122. Example: urn:uuid:f71dab9c-d12a-400d-b49a-5f4972bb4c23,PrimitiveType.uri.uuid
,id,,Alphanumerical ID assigned after database reconciliation.,
info,name,,,
,email,,,
,phoneNumber,,,
,organisation,,,
,pinCode,,,
1 change: 0 additions & 1 deletion docs/data_standards/getting_started.md

This file was deleted.

7 changes: 4 additions & 3 deletions docs/data_standards/index.rst
@@ -9,6 +9,7 @@ Welcome to the Data Standards section.
:maxdepth: 2
:glob:

getting_started.md
roles/index.md
users/index.md
terminologies.rst
users.rst
samples.rst

11 changes: 0 additions & 11 deletions docs/data_standards/roles/index.md

This file was deleted.

8 changes: 8 additions & 0 deletions docs/data_standards/samples.rst
@@ -0,0 +1,8 @@
Resource: Sample
=================


.. csv-table:: Sample Attributes
   :file: ../csvs/sample.csv
   :widths: 20,20,10,40,10
   :header-rows: 1
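
The ``geoLatLong`` rule in the Sample attributes table (six decimal places recommended, at least two required) could be checked along the following lines. This is only a sketch: the function names, the status strings, the use of string inputs to preserve trailing zeros, and the coordinate range check are all assumptions made for illustration, not part of the standard.

```python
from decimal import Decimal

MIN_PLACES = 2          # "at least two are required"
RECOMMENDED_PLACES = 6  # "recommended to retain 6 decimal places"


def decimal_places(text: str) -> int:
    """Count digits after the decimal point in a plain numeric string."""
    exponent = Decimal(text).as_tuple().exponent
    return max(0, -exponent)


def check_geo_lat_long(longitude: str, latitude: str) -> str:
    """Hypothetical check for a (longitude, latitude) pair given as strings."""
    lon, lat = Decimal(longitude), Decimal(latitude)
    # Range check is an assumption; the table only specifies precision.
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        return "rejected"
    places = min(decimal_places(longitude), decimal_places(latitude))
    if places < MIN_PLACES:
        return "rejected"
    if places < RECOMMENDED_PLACES:
        return "accepted-low-precision"
    return "ok"
```

Coordinates are taken as strings here because a float would silently drop trailing zeros, which makes the retained precision impossible to count.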
42 changes: 42 additions & 0 deletions docs/data_standards/terminologies.rst
@@ -0,0 +1,42 @@
Terminologies, Concepts and Conventions
========================================

.. warning::
   This section is under active development and is subject to change.


.. note::
   These data standards, and the terminologies therein, are based on and heavily inspired by `HL7 FHIR <https://www.hl7.org/fhir/license.html#2.1.23>`_, which is made available under the `Creative Commons CC0 license <https://creativecommons.org/publicdomain/zero/1.0/>`_.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in `RFC 2119 <https://datatracker.ietf.org/doc/html/rfc2119>`_.


#. Data Types:

   #. In SQL-esque terminology, a dataset is made up of multiple tables. Each table has fields, or attributes, and these fields often have a specific type. These are called data types.
   #. **Primitive Data Types**: These include boolean, string, float, int, and so on. Refer to the `HL7 FHIR primitive types <https://hl7.org/fhir/datatypes.html#primitive>`_ for further details.
   #. **General Purpose Data Types**: These are slightly more complex data types, such as date and time as per `ISO 8601 <https://www.iso.org/iso-8601-date-and-time-format.html>`_, a range of floating-point values, or codeable concepts, as defined below.

#. Terminology Bindings:

   * A Terminology Binding is used when a specific field is bound to a set of terminologies. This is also useful when there is no existing standard library to rely on, or when the standard library is too limited for our purposes.
   * Gender is an example of a concept that is bound to terminologies. Following `HL7 FHIR's Administrative Gender <https://hl7.org/fhir/valueset-administrative-gender.html>`_, our standards use a similar set of options: ``female | male | other | unknown``.
   * Terminology Bindings will also come into play when we define sample collection sites for environmental surveillance. For example, there may be a smaller Terminology Binding set for FMD, where bird sanctuary sites might not be relevant.
   * A crucial example of an internally defined terminology binding is user roles. Environmental surveillance has a detailed supply chain, all the way from sample collection to delivery, processing, sequencing, and bioinformatics analysis. The different players here will have different roles and different levels of access to the various resources.

   .. note::
      The terminology binding for the list of roles is currently under development.

#. Codeable Concepts:

   * Codeable Concepts are used when existing public repositories or standards are used to create terminology binding sets.
   * The Indian Union Government's `LGD <https://lgd.gov.in>`_, which provides standard geospatial area codes at various resolutions, is a good example of a codeable concept.
   * `WHO's ICD 11 Classification of Diseases <https://www.who.int/standards/classifications/classification-of-diseases>`_ is also a good example.

#. Resources

   * A resource is an entity that has a known value and contains a set of structured data complying with the definition of that resource within this documentation.
   * An example of a resource is a `User <users.html>`_. Every user within an application developed using these standards would be a database entry (a row) that complies with the standards mentioned here.
   * Resources can be looked at as tables: each table has a data dictionary and a schema, along with rule sets and terminology bindings to ensure interoperability and data integrity.
   * Resources can link to each other, thereby reducing data duplication. For example, each entry in the table for the `Samples <samples.html>`_ resource will have a 'Sample Collector', which need not mention all the details of the sample collector but simply their ID within the Users table.
   * All resources defined within ``dses`` have an entry for UUID (``User.uuid``, ``Sample.uuid``, and so on). All these UUIDs comply with the UUID4 specification in `RFC 4122 <https://www.ietf.org/rfc/rfc4122.txt>`_. All applications built on this spec SHALL use the UUID alone as the unique reference, regardless of any other unique alphanumeric IDs available within the databases.

#. Flags and Attributes

   * TODO: (Camel Case. Conventions.) Categories and subcategories are for readability (for now).
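
To make the distinction between terminology bindings and codeable concepts concrete, here is a minimal Python sketch. The ``CodeableConcept`` class, the ``GENDER_BINDING`` constant, and the ``is_bound`` helper are hypothetical names introduced for illustration; the field names ``system``, ``code``, and ``display`` and the example values are taken from the attribute tables in this documentation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CodeableConcept:
    """A code drawn from an external, public coding system (hypothetical class)."""
    system: str   # URL identifying the coding system
    code: str     # the code within that system
    display: str  # human-readable label


# An internally defined terminology binding: the field accepts only these values.
GENDER_BINDING = frozenset({"female", "male", "other", "unknown"})


def is_bound(value: str, binding: frozenset[str]) -> bool:
    """Return True if the value belongs to the terminology binding set."""
    return value in binding


# Example codeable concept, using the country example from the Sample table.
country = CodeableConcept(
    system="https://www.iso.org/standard/72482.html",
    code="IN",
    display="INDIA",
)
```

A binding is a closed set owned by these standards, whereas a codeable concept carries its ``system`` with it so that codes from several public standards can coexist in the same field.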


.. csv-table:: Flags
   :file: ../csvs/flag.csv
   :widths: 20,30,50
   :header-rows: 1
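
One possible way for an application to act on these flags is sketched below. It parses a ``flags`` cell such as ``-i, -m`` and reports mandatory attributes missing from an entry. The names ``parse_flags`` and ``missing_mandatory``, and the dictionary-based schema, are assumptions made for this example; only the flag codes and their expansions come from the table above.

```python
from __future__ import annotations

# Flag codes and expansions as listed in the Flags table.
FLAGS = {
    "-i": "immutable",
    "-m": "mandatory",
    "-v": "validate",
    "-p": "private",
    "-a": "automatic",
    "-ao": "automatic but can be overridden",
}


def parse_flags(cell: str) -> set[str]:
    """Split a comma-separated flags cell and reject unknown flags."""
    flags = {part.strip() for part in cell.split(",") if part.strip()}
    unknown = flags - set(FLAGS)
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    return flags


def missing_mandatory(entry: dict, schema: dict[str, str]) -> list[str]:
    """List attributes flagged -m that are absent or empty in the entry."""
    return [
        attribute
        for attribute, cell in schema.items()
        if "-m" in parse_flags(cell) and entry.get(attribute) in (None, "")
    ]


# Hypothetical schema: attribute name -> flags cell, as in sample.csv.
schema = {"uuid": "-i, -m", "barcode": "-i, -m, -a", "pinCode": "-v", "siteName": ""}
entry = {"uuid": "urn:uuid:f71dab9c-d12a-400d-b49a-5f4972bb4c23", "pinCode": "560012"}
```

Under this sketch, an application would reject ``entry`` because ``missing_mandatory(entry, schema)`` is non-empty.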
10 changes: 10 additions & 0 deletions docs/data_standards/users.rst
@@ -0,0 +1,10 @@
Resource: User
==============


The User resource represents individuals involved in the environmental surveillance process. It captures each user and the associated metadata that must be collected. Users are distinct from accounts, which belong to the separate authentication layer used to access applications.

.. csv-table:: User Attributes
   :file: ../csvs/user.csv
   :widths: 20,20,10,40,10
   :header-rows: 1
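
The ``uuid`` attribute above is a UUID4 written in URN form. A minimal sketch of generating and checking such a value with Python's standard ``uuid`` module follows. The helper names are hypothetical, and requiring the ``urn:uuid:`` prefix is an assumption based on the example in the table.

```python
import uuid


def new_resource_uuid() -> str:
    """Generate a random (version 4) UUID in URN form, e.g. 'urn:uuid:...'."""
    return uuid.uuid4().urn


def is_uuid4_urn(value: str) -> bool:
    """Check that a string is a version 4 UUID written as a urn:uuid: URN."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # Assumption: only the URN form is accepted, as in the table's example.
    return parsed.version == 4 and parsed.urn == value.lower()
```

For instance, ``is_uuid4_urn("urn:uuid:f71dab9c-d12a-400d-b49a-5f4972bb4c23")`` accepts the example value from the table, while a bare hexadecimal string without the URN prefix is rejected under this sketch.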
8 changes: 0 additions & 8 deletions docs/data_standards/users/index.md

This file was deleted.
