
III. PUBLISHABLE STATE DATA

Executive Order No. 95 provides a specific definition of “Publishable State Data” to guide covered State agencies. Publishing data on OPEN-NY involves a collaborative multi-step agency process (see Figure 2 below). In identifying Publishable State Data, agencies should include analyses from their executive and program staff, data coordinators, FOIL officers, data stewards, public information officers, security and privacy officers, and legal counsel.

High-Level Guidance Summary

Covered State entities (and entities not covered by Executive Order 95) vary widely in terms of size, personnel, functions, responsibilities, mission, and data collected and maintained. As such, the identification and prioritization processes may vary across agencies and entities. These guidelines serve to provide assistance across a broad spectrum of agencies, with the stipulation that agencies look to their governing laws, rules, regulations, and policies in identifying and publishing “publishable state data.”

###A. Dataset Identification

In creating a data catalogue, agencies should identify those datasets that are high value and that meet the definition of "Publishable State Data" in Executive Order 95.

The questions in Figure 3 below are not exhaustive and may not apply to all agencies, but they provide a framework for identifying potential data for publication on Open.ny.gov. For each question, agencies should assess whether the data falls within the definition of “Publishable State Data.”

Illustrative Identification Questions

  1. General questions:
  • What data does the agency collect? What “high value” data are currently publicly available?
  • What data are reported to the federal government or frequently requested by other government entities (federal, state, local)?
  • What data does the agency policy and planning unit use for trending and statistical analysis?
  • What data, including historical data, does the agency maintain?
  • What underlying data populates aggregate information in published reports?
  • What data is the subject of frequent or recent FOIL requests?
  • What data have not been previously published but meet the definition of “high value,” that is, publishable state data that can be used to increase the covered State entity’s accountability and responsiveness, improve public knowledge of the entity and its operations, further the mission of the entity, create economic opportunity, or respond to a need or demand identified after public consultation?
  2. Do the datasets represent discrete, usable information?

In identifying datasets, government entities may be concerned that users of OPEN-NY will not understand their raw data or that the data, if distilled to its rawest form, might lose utility. For example, state and local rules might differ, such that publishing the two as separate raw datasets may be less valuable than combining the raw data into a single dataset.

There are no hard and fast rules about what level of detail is sufficiently granular to add value to a government dataset. Whenever possible, government entities should resist the temptation to limit datasets to only those the agency believes might be understood or useful. Entities should be wary of underestimating the users of OPEN-NY. OPEN-NY users may come from a variety of fields and specialties, including academic and other government users who can envision a use for the raw data not anticipated by the originating entity. A better practice is for the agency to ensure its metadata describing the dataset is complete, including comprehensive overview documents describing the data, data collection, data fields, and presentation of research questions to maximize the utility and usefulness of the data.
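As a minimal sketch of what checking metadata completeness might look like, the snippet below flags missing descriptive fields before a dataset is submitted. The field names (`title`, `description`, `collection_method`, `data_dictionary`, `contact`), the sample record, and the `missing_metadata` helper are illustrative assumptions, not fields prescribed by Executive Order 95 or by OPEN-NY.

```python
from __future__ import annotations

# Hypothetical required metadata fields; an agency would substitute its own list.
REQUIRED_FIELDS = ("title", "description", "collection_method", "data_dictionary", "contact")


def missing_metadata(metadata: dict) -> list[str]:
    """Return the required fields that are absent or empty in a metadata record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = metadata.get(field)
        # Treat None and empty strings or containers as missing.
        if value is None or (hasattr(value, "__len__") and len(value) == 0):
            missing.append(field)
        # Treat whitespace-only strings as missing too.
        elif isinstance(value, str) and not value.strip():
            missing.append(field)
    return missing


# Made-up record for illustration only.
example = {
    "title": "Food Establishment Inspections",
    "description": "Inspection results by establishment.",
    "collection_method": "",
    "data_dictionary": {"score": "Inspection score assigned by the inspector"},
}
print(missing_metadata(example))  # ['collection_method', 'contact']
```

A check like this only confirms that fields are present; whether the descriptions are actually comprehensive still requires review by the data steward.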

###B. Prioritization

Executive Order 95 states: “Prioritization of publication of data based on the extent to which the data can be used to increase the covered State entity’s accountability and responsiveness, improve public knowledge of the entity and its operations, further the mission of the entity, create economic opportunity, or respond to a need or demand identified after public consultation...”

Executive Order 95 further states: “Data shall not be Publishable State Data if making such data available on the Open Data Website would…impose an undue financial, operational or administrative burden on the covered State entity or State.”

Prioritizing and creating a schedule for initial and ongoing publication: When creating a schedule for publication of a particular dataset, agencies need to make a judgment based upon a number of different factors.

Prioritizing initial and ongoing publication will entail balancing high value with data quality, data availability, and data readiness in setting forth a schedule for publication. Each covered State entity shall create schedules and prioritize data publication in a timely manner and in accordance with guidelines set forth herein. Agencies should assess and plan, accounting for the time needed to identify data, review and approve data, and prepare data for publication.

Figure 4 – Prioritization

Below are suggested questions, the answers to which can assist agencies in prioritizing publication of high value “publishable state data” consistent with Executive Order 95:

  1. Does the data highlight agency performance?
  2. Has the data ever been published or made publicly available in a machine-readable format so that it can be processed, analyzed, or re-used? Is the data “high value?”
  3. Might publication of the data benefit the public by setting higher standards? The agency might be in the forefront of standards for government performance, where exposing the data might cause other agencies to raise their performance.
  4. Does availability of the data align with new State and/or Agency initiatives? Ordering the publication of any relevant datasets accordingly might be of great value.
  5. Does availability of the data align with federal initiatives or exposures of federal data? There may be higher value in the agency's data if synergies can be created.
  6. Does the data support decision making at the state, local, internal agency or other external agency's level, or contain information that informs public policy?
  7. Is the data timely? What is the dataset refresh and maintenance cycle?
  8. Does availability of the data align with legal requirements for data publication? For example, there might be statutorily-required reporting which can be satisfied by publishing datasets, without necessarily needing an extensive narrative report. If the data is collected and compiled by the agency to fulfill statutory reporting requirements, then the agency's governing laws have already determined that the data is of high value for that agency.
  9. Would availability of the data improve agency-to-agency communication? Certain government functions may involve multiple agencies requiring access to similar data.
  10. Could availability of the data create specific economic opportunity? In many cases, this will be unknown to the agency in advance. Some of the greatest successes of the open data movement have involved government data being commercially appropriated in useful ways, such as weather data. To the extent the agency can anticipate significant commercial use of the data, the agency may wish to rank publication of such data more highly as it creates its schedule.
  11. Could the data be useful for the creation of novel and useful third-party applications, mobile applications, and services?
  12. Does the data further the core mission of the agency or multiple government entities?
  13. Does the data support the agency’s strategic direction?
  14. Does the data enable accountability and efficiency?
  15. Does the data have depth and breadth of years of coverage? Release of data with high information content and quality can improve accountability and responsiveness and/or improve public knowledge of the agency and its operations.
  16. Does the data have accompanying metadata and a data dictionary? Metadata and any accompanying overview documents should be comprehensive so as to provide a full understanding of the data and data elements to an end-user. This ensures version control, availability of contact information, and descriptive information sufficient for end-users to be able to use and interpret the data. In addition, where applicable, agencies should append disclaimers to highlight limitations of the data and/or prevent use of the data in misleading ways.
  17. Is the data accurate/complete? The dataset must be sufficiently final or complete, such that it is currently publishable. If there is a trigger allowing the agency to publish the data at some time in the future, then scheduling publication of the data should be set accordingly.
  18. Is the dataset in a format that is machine-readable or can be easily transformed? The data should be organized or formatted in a manner which is machine-readable and that can be re-used, and capable of being digitally transmitted or processed. It should be in tabular or geo-spatial form. Agencies should consider the level of effort required to transform the data to a machine-readable format and maintain it in such a format.
  19. Is the data frequently requested? As demand is known and quantifiable, this should raise the value of this data for publication. If the dataset is the type that is requested through FOIL on a recurring basis, then the agency may reduce duplication and obtain efficiencies by posting data on OPEN-NY.
  20. Is the data needed by the public after-hours? As demand may be known and quantifiable, such datasets should be ranked, where applicable, as higher value.
  21. Does the data have a direct impact on the public? The data is likely of higher value if it is already apparent there is a deep impact and interest by the public (e.g., hospital infection rates, food establishment inspection results, etc.).
  22. Is the data in strong demand from constituencies? The data might be of higher value to specific, narrow interest groups which may be the agency's core constituency for those issues.
  23. Is the data of timely interest?
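
One way an agency might turn the answers to the questions above into a publication order is a simple weighted checklist. The sketch below is only an illustration: the criteria names, the weights, the sample datasets, and the `priority_score` and `rank_datasets` helpers are assumptions made for this example, and the guidelines do not prescribe any scoring formula.

```python
# Illustrative weights only; the guidelines leave the balancing to each agency.
WEIGHTS = {
    "frequently_requested": 3,  # e.g., recurring FOIL requests
    "direct_public_impact": 3,
    "machine_readable": 2,
    "has_metadata": 2,
    "accurate_complete": 2,
}


def priority_score(answers: dict) -> int:
    """Sum the weights of every criterion answered 'yes' (True)."""
    return sum(weight for name, weight in WEIGHTS.items() if answers.get(name, False))


def rank_datasets(candidates: dict) -> list:
    """Return dataset names ordered from highest to lowest score (ties keep input order)."""
    return sorted(candidates, key=lambda name: priority_score(candidates[name]), reverse=True)


# Made-up candidates for illustration only.
candidates = {
    "internal_report_tables": {"has_metadata": True},
    "inspection_results": {
        "frequently_requested": True,
        "direct_public_impact": True,
        "machine_readable": True,
    },
}
print(rank_datasets(candidates))  # ['inspection_results', 'internal_report_tables']
```

A score of this kind is a starting point for discussion, not a substitute for the judgment the Executive Order asks agencies to exercise, including the check for undue financial, operational, or administrative burden.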

###C. Disclosure Guidance

The following guidelines regarding disclosure provide sample questions for consideration, as agencies begin to identify and review datasets.

  1. Do the datasets raise any security or privacy concerns?
  • Will public posting of the data violate any laws, rules or regulations?
  • Will public posting of the data pose any information security risks, either alone or if the data are combined with other publicly available data?
  • Will public posting of the data violate individual privacy, or contain individually identifying information that could be used in harmful ways?

Practical effects:

Even if there are no legal impediments to publishing the data, might publication result in potentially harmful effects? Example: Would posting arrest patterns inadvertently reveal where police are concentrating efforts?

  1. Disclosure thresholds: Various statutes and regulations, such as HIPAA and its privacy regulations, have very exacting requirements for determining whether data have been sufficiently de-identified so as not to compromise individual privacy. For example, the presence of medical conditions per geographic location might constitute high-value, useful, and sought-after data; however, exposing it might identify individuals and their medical conditions.

Even in the absence of specific legal prohibitions, government entities should watch for outlier publication conditions. For example, reporting a single arrestee who is a minor of a certain age in a certain county, without providing any other information, might nonetheless serve to identify that particular individual.

For particular datasets that pose such issues, agencies may consider providing aggregated data based upon their laws, rules, regulations, and policies. Alternatively, agencies may set disclosure thresholds for the dataset (many agencies already adhere to such standards). For example, if a cell in a particular dataset field goes below a certain number of individuals, the value in that particular cell should be hidden. Government entities will need to balance their desire to publish accurate, complete, and valuable tabulations against the need to guard against unwarranted invasions of personal privacy in specific situations.
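
The disclosure-threshold idea can be sketched in a few lines. The example below is a minimal illustration, assuming an aggregated table represented as a list of dictionaries; the threshold of 5, the `*` marker, the sample rows, and the `suppress_small_cells` helper are all assumptions for this example, since each agency sets its own threshold under its governing laws and policies.

```python
def suppress_small_cells(table, threshold, marker="*"):
    """Replace integer counts below `threshold` with `marker`; other cells pass through."""
    suppressed = []
    for row in table:
        new_row = {}
        for key, value in row.items():
            # Only integer counts are candidates for suppression; labels are kept as-is.
            is_count = isinstance(value, int) and not isinstance(value, bool)
            new_row[key] = marker if is_count and value < threshold else value
        suppressed.append(new_row)
    return suppressed


# Made-up aggregated counts for illustration only.
counts = [
    {"county": "A", "age_group": "under 18", "count": 1},
    {"county": "A", "age_group": "18 and over", "count": 42},
]
print(suppress_small_cells(counts, threshold=5))
# [{'county': 'A', 'age_group': 'under 18', 'count': '*'},
#  {'county': 'A', 'age_group': '18 and over', 'count': 42}]
```

This sketch hides only the small cells themselves. If row or column totals are also published, a hidden value can sometimes be recovered by subtraction, so agencies applying thresholds in practice may need additional safeguards.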

  1. How does FOIL apply to each dataset?
  • Does the data type fit within any one of the Freedom of Information Law's (FOIL) narrow exceptions that would allow it to be withheld? Should it be withheld?

Under the NYS Public Officers Law, Article 6 (the NYS Freedom of Information Law, or "FOIL"), the presumption is that government records shall be open to the public, unless excludable under a narrow set of specific exemptions including such concerns as invasion of personal privacy, impairment of contractual or collective bargaining negotiations, exposure of protected trade secrets, interference with law enforcement or judicial proceedings, endangering life or safety, and others. Government entities should confer with their FOIL officers regarding publication of data on OPEN-NY, and exclude any datasets which, because their publication would cause the harms described in FOIL, would not constitute "Publishable State Data."

  1. Does the agency have sufficient property rights to publish the data?
  • Does the agency possess all rights to publishing the data, or publishing it in a particular form? For example, was the data collected or compiled by a third party under a contractual limitation on its publication?
  • Is the public posting of the data in compliance with any intellectual property rights held by third parties to any of the data? Has your agency secured appropriate permissions, and/or provided required disclaimers, registration markers, etc.?

Government entities should exclude those datasets as warranted either in part or in full from their catalogues of data.

###D. Narrative Data

Closed and proprietary file formats (e.g., PDF, PPT, DOC, DOCX, etc.) are not appropriate formats for publishing on the open data platform. Standardized open file formats that can be re-used permit access by the widest range of users working with the widest range of application systems. Datasets must be released in open file formats which are machine-readable and can be re-used (see Section V/Subheading C).
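
As a rough sketch, a pre-submission check could flag files whose extensions match the closed formats named above. The `is_flagged_format` helper and the sample file names are hypothetical, and the extension set covers only the examples listed in this section, not every format an agency might need to exclude.

```python
from pathlib import PurePath

# Extensions taken from the examples of closed formats named above; not an exhaustive list.
CLOSED_EXTENSIONS = {".pdf", ".ppt", ".doc", ".docx"}


def is_flagged_format(filename: str) -> bool:
    """Return True if the file extension is one of the closed formats listed above."""
    return PurePath(filename).suffix.lower() in CLOSED_EXTENSIONS


print(is_flagged_format("annual_report.PDF"))  # True
print(is_flagged_format("inspections.csv"))    # False
```

An extension check is only a first filter; a file that passes it still needs to be machine-readable and in tabular or geo-spatial form to be suitable for the platform.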

OPEN-NY serves as a platform to present machine-readable data, so that end-users may process, access, discover, extract and combine data elements to discover new insights, observations, and utility regarding the data. Still, it may sometimes be useful for limited narrative data to accompany data sets to:

  • help the end-user better understand what the government entity's intentions were in collecting and publishing the data;
  • avoid duplication of effort; or
  • pose research questions and queries and expose ways that an end-user might add new interpretive value to the data.

Narrative documentation associated with datasets on OPEN-NY should be kept to the minimum necessary for an end-user of the platform to gain an understanding of the agency's interpretation of the data. Limit documentation to 1-2 pages for most datasets. If an agency develops extensive narrative reports about the data, then those reports should be published on the agency's website with a link provided in the agency's metadata associated with that particular dataset. It is important to keep this link current.