layout | title | nav_order | published | parent |
---|---|---|---|---|
default | Glossary | 4 | true | How to use Data Commons |
This page contains a selection of key terms important to understanding the structure of data within Data Commons.
{: .no_toc}
- TOC
{:toc}
Cohort
{: #cohort}
A group of entities sharing some characteristic. In a Data Commons context, referred to interchangeably as `Cohort` and `CohortSet`. Examples include the CDC's list of the United States' 500 largest cities.
The type `Cohort` is a legacy type not used by the Sheets method `DCCOHORTMEMBERS()`.
Date
{: #date}
The date of measurement. Specified in ISO 8601 format. Examples include `2011` (the year 2011), `2019-06` (the month of June in the year 2019), and `2019-06-05T17:21:00-06:00` (5:21 PM on June 5, 2019, at UTC-06:00, which corresponds to CST).
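The three granularities above can be told apart with Python's standard library. The following is a minimal sketch, not part of any Data Commons client; the helper name `parse_date` and the granularity labels are hypothetical.

```python
from datetime import datetime

# Reduced-precision formats tried in order, each paired with a label.
_FORMATS = [("%Y", "year"), ("%Y-%m", "month"), ("%Y-%m-%d", "day")]


def parse_date(value: str):
    """Return (datetime, granularity) for an ISO 8601 date string."""
    for fmt, granularity in _FORMATS:
        try:
            return datetime.strptime(value, fmt), granularity
        except ValueError:
            continue  # Not this granularity; try the next format.
    # Fall back to a full timestamp, including any UTC offset.
    return datetime.fromisoformat(value), "datetime"


for example in ("2011", "2019-06", "2019-06-05T17:21:00-06:00"):
    parsed, granularity = parse_date(example)
    print(example, "->", granularity, parsed.isoformat())
```

Year-only and year-month values are parsed with `strptime` because `datetime.fromisoformat` does not accept reduced-precision dates.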
DCID
{: #dcid}
Every entity in the Data Commons graph has a unique identifier, called a "DCID" (short for "Data Commons Identifier"). For example, the DCID of California is `geoId/06` and the DCID of India is `country/IND`. DCIDs are not restricted to entities; every node in the graph has a DCID. Statistical variables also have DCIDs; for example, the DCID for the Gini Index of Economic Activity is `GiniIndex_EconomicActivity`.
To find a DCID for an entity or variable, see the Key concepts page.
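As a small illustration of how DCIDs act as keys into the graph, the sketch below maps the names mentioned above to their DCIDs and builds a link to each node's page. It assumes that the Knowledge Graph browser serves node pages at `https://datacommons.org/browser/<DCID>`; the helper `browser_url` is hypothetical.

```python
from urllib.parse import quote

# DCIDs quoted from the glossary entry above.
DCIDS = {
    "California": "geoId/06",
    "India": "country/IND",
    "Gini Index of Economic Activity": "GiniIndex_EconomicActivity",
}

# Assumption: node pages live under this base URL.
BROWSER_BASE = "https://datacommons.org/browser/"


def browser_url(dcid: str) -> str:
    """Build a browser link for a DCID, keeping any "/" in the DCID unescaped."""
    return BROWSER_BASE + quote(dcid, safe="/")


for name, dcid in DCIDS.items():
    print(f"{name}: {browser_url(dcid)}")
```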
Entity
{: #entity}
A thing or concept represented by a node in the Data Commons knowledge graph. Entities can represent a wide range of concepts, including cities, countries, elections, schools, plants, or even the Earth itself.
Facet
{: #facet}
Metadata on properties of the data and its provenance. For example, multiple sources might provide data on the same variable, but use different measurement methods, cover different time spans, or use different underlying predictive models. Data Commons uses "facet" to refer to a data source and its associated metadata.
Measurement Denominator
{: #measurement-denominator}
The denominator of a fractional measurement.
Measurement Method
{: #measurement-method}
The technique used for measuring a variable. Describes how a measurement is made, whether by count, by estimate, or by some other approach. May name the group making the measurement to indicate that a specific organizational method of measurement is used. Examples include the American Community Survey and `WorldHealthOrganizationEstimates`. Multiple measurement methods may be specified for any given node.
Observation (Statistical Variable Observation)
{: #observation}
A measurement of a variable for a particular place and time. For example, the `StatVarObservation` of the `StatisticalVariable` `Median_Income_Person` for Brookmont, Maryland, in the year 2018 is $126,199. A complete list of properties of statistical variable observations can be found in the Knowledge Graph.
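To make the "variable, place, time, value" shape concrete, the example above can be written as a small record. This is only an illustrative sketch: the class and field names are hypothetical and do not mirror the actual property names in the Knowledge Graph, and the place is given by name because its DCID is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    variable: str  # DCID of the statistical variable being measured
    place: str     # place name; a real record would reference the place's DCID
    date: str      # ISO 8601 date of the measurement
    value: float   # the measured value


# The example from the definition above.
obs = Observation(
    variable="Median_Income_Person",
    place="Brookmont, Maryland",
    date="2018",
    value=126199,
)
print(f"{obs.variable} for {obs.place} in {obs.date}: ${obs.value:,}")
```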
Observation Period
{: #observation-period}
The time period over which an observation is made. Specified in the ISO 8601 format for durations; for example, `P1Y` denotes a period of one year.
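Python's standard library has no parser for ISO 8601 durations, so the sketch below uses a regular expression to split a period string into its components. It is a simplified illustration that handles only integer fields; the helper name `parse_period` is hypothetical.

```python
import re

# Simplified ISO 8601 duration pattern: PnYnMnWnDTnHnMnS with integer fields.
_DURATION = re.compile(
    r"^P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?"
    r"(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_period(period: str) -> dict:
    """Return the components present in a duration string, e.g. {"years": 1}."""
    match = _DURATION.match(period)
    fields = {}
    if match:
        fields = {k: int(v) for k, v in match.groupdict().items() if v is not None}
    # Reject strings with no components or a dangling "T" separator.
    if not fields or period.endswith("T"):
        raise ValueError(f"not a supported ISO 8601 duration: {period!r}")
    return fields


print(parse_period("P1Y"))      # one year
print(parse_period("P3M"))      # three months
print(parse_period("PT1H30M"))  # one hour and thirty minutes
```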
Place
{: #place}
Entities that describe specific geographic locations. Use the search box in Place Explorer to search for places in the graph, or view the Knowledge Graph entry for Place for a full view of the node. To learn more about place types, see the place types page.
Preferred facet
{: #preferred-facet}
When a variable has values from multiple facets, one facet is designated the preferred facet. The preferred facet is selected by an internal ranking system which prioritizes the completeness and quality of the data. Unless otherwise specified, endpoints will default to returning values from preferred facets.
Property
{: #property}
Attributes of the entities in the Data Commons knowledge graph. Instead of statistical values, properties describe unchanging characteristics of entities, like scientific name.
Scaling Factor
{: #scaling-factor}
A property of variables that measure proportions. Used in conjunction with the `measurementDenominator` property, it indicates the multiplication factor applied to the proportion's denominator (with the measurement value as the final result of the multiplication) when the numerator and denominator are not equal.
As an example, in 1999, approximately 36% of Canadians were Internet users. Here the measured value of `Count_Person_IsInternetUser_PerCapita` is 36, and the scaling factor or denominator for this per capita measurement is 100. Without the scaling factor, we would interpret the value to be 36/1, or 3600%.
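The arithmetic in this example can be checked directly. The sketch below divides a stored value by its scaling factor to recover the underlying proportion; the function name `to_proportion` is hypothetical and chosen only for illustration.

```python
def to_proportion(value: float, scaling_factor: float = 1) -> float:
    """Convert a stored measurement into a proportion using its scaling factor."""
    return value / scaling_factor


# Stored value 36 with a scaling factor of 100: 36/100.
with_factor = to_proportion(36, 100)
# The same value read without a scaling factor: 36/1.
without_factor = to_proportion(36)

print(f"{with_factor:.0%}")     # 36%
print(f"{without_factor:.0%}")  # 3600%
```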
Statistical Variable
{: #variable}
Any type of metric, statistic, or measure that can be measured for a specific entity (most typically a place, but possibly any other entity in the graph, such as a school or power plant) at a specific time. Examples include median income of persons older than 16, number of female high school graduates aged 18 to 24, unemployment rate, or percentage of persons with diabetes. A complete list of variables can be found in the Knowledge Graph.
Statistical Variable Group
{: #variable-group}
Represents a grouping of variables that are conceptually related. For example, the variable group Person With Gender = Female consists of variables such as Female Median Age and Female Median Income. A variable group can also have child variable groups, which describe a subset of the parent variable group. For example, the variable group Person With Age, Gender = Female is a child of Person With Gender = Female; it contains variables that have age constraints.
Triple
{: #triple}
A three-part grouping describing node and edge objects in the Data Commons graph.
Given tabular data such as the following:
country_id | country_name | continent_id |
---|---|---|
USA | United States of America | northamerica |
IND | India | asia |
You can represent this data as a graph via subject-predicate-object "triples" that describe the node and edge relationships.
USA -- typeOf ------------> Country
USA -- name --------------> United States of America
USA -- containedInPlace --> northamerica
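The mapping from table rows to triples can be sketched in a few lines of Python. This is an illustration of the idea only, not how Data Commons ingests data; the helper `row_to_triples` is hypothetical, and the predicate names are the ones shown above.

```python
# The tabular data from the example above, one dict per row.
rows = [
    {"country_id": "USA", "country_name": "United States of America", "continent_id": "northamerica"},
    {"country_id": "IND", "country_name": "India", "continent_id": "asia"},
]


def row_to_triples(row: dict) -> list:
    """Turn one table row into (subject, predicate, object) triples."""
    subject = row["country_id"]
    return [
        (subject, "typeOf", "Country"),
        (subject, "name", row["country_name"]),
        (subject, "containedInPlace", row["continent_id"]),
    ]


triples = [triple for row in rows for triple in row_to_triples(row)]
for subject, predicate, obj in triples:
    print(f"{subject} -- {predicate} --> {obj}")
```

Each row yields three triples that share the same subject, so the two-row table produces six triples in total.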
Unit
{: #unit}
The unit of measurement. Examples include kilowatt hours, inches, and Indian Rupees. A complete list of properties can be found in the Knowledge Graph.