Semantics for Interoperability of Agricultural Data

Agricultural data may cover various topics (yield, environment, climate, socio-economic factors, ...) and come in various formats (text or numbers). It is often geo-located, and often fed into models (for nowcasting and forecasting agricultural outcomes).

Interoperability, the ability to reuse data produced by others in your own information system, or vice versa, largely depends on how well and how explicitly the “meaning” of the data is described: this is semantic interoperability. (It also depends on the ability to understand the formats in which the data is encoded, which we call “syntactic interoperability”.)
 
With textual documents, it is common practice to index or classify them so that they can be retrieved independently of the language or specific terminology they use; multilingual controlled vocabularies, or thesauri, are made for that purpose. Still, different thesauri lead to different classifications, and silos result.
 
The problem is even more prominent with “hard” data (e.g., spatially and temporally referenced data, trial data, etc.). Consider the following examples: 
  1. Data described with local codes - their meaning is easily lost when passing on the data
  2. Unclear identity of the object being observed. What is “corn”? And what species do we mean when talking about “cereals”?
  3. Implicit definitions of the mode of observation and measurement. E.g., the height of corn implies that one knows what “corn” is, at what point of its development the “height” is measured (how many weeks after seeding?), what is taken as the point of reference in the measurement (first leaves or the tip of the plant?), and obviously the unit of measurement (cm or inches?)
  4. Implicit temporal and spatial contexts and resolutions. 
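
The implicit choices listed above can be made explicit in the data record itself. The following is a minimal sketch in Python, not a reference model: the class name, the field names, the use of "Zea mays" for “corn”, and all sample values are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Conversion factors to centimetres for the two units mentioned above.
CM_PER_UNIT = {"cm": 1.0, "in": 2.54}


@dataclass(frozen=True)
class HeightObservation:
    """A plant-height measurement with its context spelled out (hypothetical model)."""
    taxon: str                # scientific name instead of an ambiguous "corn"
    value: float
    unit: str                 # "cm" or "in"
    reference_point: str      # e.g. "tip of the plant" or "first leaves"
    weeks_after_seeding: int  # point of development at which height was measured
    location: tuple           # (latitude, longitude) in decimal degrees
    observed_on: str          # ISO 8601 date

    def height_cm(self) -> float:
        """Return the height in centimetres, whatever unit was recorded."""
        return self.value * CM_PER_UNIT[self.unit]


def comparable(a: HeightObservation, b: HeightObservation) -> bool:
    """Heights are comparable only if taxon, reference point and growth stage match."""
    return (a.taxon, a.reference_point, a.weeks_after_seeding) == (
        b.taxon, b.reference_point, b.weeks_after_seeding)


# Invented sample records from two imaginary datasets.
obs_a = HeightObservation("Zea mays", 120.0, "cm", "tip of the plant", 8,
                          (45.0, 9.0), "2017-06-01")
obs_b = HeightObservation("Zea mays", 50.0, "in", "tip of the plant", 8,
                          (41.9, -93.6), "2017-06-03")

if comparable(obs_a, obs_b):
    print(obs_a.height_cm(), obs_b.height_cm())
```

Because the unit, reference point and growth stage travel with each value, a consumer of the data can decide whether two measurements may be compared without having to ask the producer.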

Actors and initiatives in semantics & agricultural data

Agrisemantics working group @RDA. A working group (Jan 2017-Jun 2018) within the Research Data Alliance. The ultimate output of the group will be a set of recommendations for components supporting semantics. Intermediate outputs: a landscape of the use of semantics for agricultural data (May 2017), and a collection of use cases illustrating the problems that semantics is meant to address (September 2017).
 
GACS Working Group (2014-2016) - a self-funded project with FAO, CABI, NAL. The three organizations worked to identify a set of concepts common to their three thesauri (“concept schemes”, in SKOS parlance). The output was a concept scheme in BETA VERSION - see below.  
 
GACS Beta (May 2016) - the output of the GACS working group as of May 2016. The output was released as a BETA VERSION and is not currently maintained. GACS Beta is a SKOS concept scheme (i.e., a thesaurus) of ~15,000 concepts that merges the information contained in the three source thesauri. Read details of the methodology & the BETA produced
 
GACS working group (Start: April 2017) - an expanded group of ag- & data-related institutions is now forming a new working group under the umbrella of GODAN. The goal of the group is to enable semantic interoperability of agricultural data, building on the experience of the previous edition of the GACS working group. The group is currently defining its plan of activities for the next few years (in practice: a vision, a technical roadmap and a business model).
 

Some basic notions

Agriculture is not a single domain, but rather an interdisciplinary field. For example, plant breeding, agricultural practices, fisheries and forestry, rural sociology, agricultural economics, soil, water and climate sciences, and food production are all part of agriculture, to mention only a few.
 
Agricultural data. By this phrase we mean the data and metadata produced in any domain related to agriculture. Agricultural data may be structured, semi-structured or unstructured. Statistics, georeferenced data, textual documents and observations are all examples of agricultural data, as long as they contain information relevant to agriculture.
 
Concept Scheme. In general terms, a set of concepts and their organization. In technical terms, a class defined within the W3C standard SKOS, which was introduced to provide a standard way to express thesauri in RDF. In SKOS, skos:ConceptScheme is an instance of owl:Class.
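
To make the notion concrete, here is a small sketch in Python of a concept scheme expressed as RDF-style triples, using only the standard library. The SKOS and RDF term names are those of the W3C vocabularies; the http://example.org/ namespace, the two concepts and their labels are hypothetical and are not taken from GACS or any other real thesaurus.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/scheme/"  # hypothetical namespace, for illustration only

scheme = EX + "crops"
cereals = EX + "cereals"
maize = EX + "maize"

# Each triple is (subject, predicate, object).
# Literal objects are written as (text, language tag) pairs.
triples = [
    (scheme, RDF_TYPE, SKOS + "ConceptScheme"),
    (cereals, RDF_TYPE, SKOS + "Concept"),
    (cereals, SKOS + "inScheme", scheme),
    (cereals, SKOS + "prefLabel", ("cereals", "en")),
    (cereals, SKOS + "prefLabel", ("céréales", "fr")),
    (maize, RDF_TYPE, SKOS + "Concept"),
    (maize, SKOS + "inScheme", scheme),
    (maize, SKOS + "broader", cereals),  # maize is narrower than cereals
    (maize, SKOS + "prefLabel", ("maize", "en")),
    (maize, SKOS + "prefLabel", ("maïs", "fr")),
]


def pref_label(concept, lang):
    """Return the preferred label of a concept in the given language, or None."""
    for s, p, o in triples:
        if s == concept and p == SKOS + "prefLabel" and o[1] == lang:
            return o[0]
    return None


def concepts_in(scheme_uri):
    """List the concepts that declare membership in the given scheme."""
    return [s for s, p, o in triples if p == SKOS + "inScheme" and o == scheme_uri]


print(pref_label(maize, "fr"))
print(concepts_in(scheme))
```

The identifier of a concept stays the same whatever language its label is in, which is what allows documents indexed in different languages to be retrieved together. In practice an RDF library would normally be used instead of plain tuples.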
 
Interoperability. The property of a system that is able to process information originally generated by and for third-party applications.
 
Semantic interoperability refers to the ability to understand the meaning of the data to be reused. In particular, it assumes that one is able to automatically detect (a) whether similar items from distinct datasets or information systems actually refer to the same objects of the world, and if not, (b) how they differ, and (c) how they relate to each other.
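
Checks (a) to (c) can be sketched as follows, assuming each dataset publishes a mapping from its local codes to shared concept identifiers. All codes, identifiers and the single level of broader/narrower links below are hypothetical and chosen only to illustrate the idea.

```python
BASE = "http://example.org/concept/"  # hypothetical shared namespace

# Hypothetical broader links between shared concepts: narrower -> broader.
BROADER = {
    BASE + "maize": BASE + "cereals",
    BASE + "wheat": BASE + "cereals",
}

# Hypothetical mappings from each dataset's local codes to shared concepts.
MAPPING_A = {"CRN": BASE + "maize", "CER": BASE + "cereals"}
MAPPING_B = {"M01": BASE + "maize", "W01": BASE + "wheat"}


def relate(code_a, code_b):
    """Describe how an item of dataset A relates to an item of dataset B."""
    a = MAPPING_A.get(code_a)
    b = MAPPING_B.get(code_b)
    if a is None or b is None:
        return "unknown"    # a local code with no shared meaning attached
    if a == b:
        return "same"       # (a) both codes refer to the same concept
    if BROADER.get(b) == a:
        return "broader"    # (c) A's concept is broader than B's
    if BROADER.get(a) == b:
        return "narrower"   # (c) A's concept is narrower than B's
    return "different"      # (b) distinct concepts with no direct link


print(relate("CRN", "M01"))
print(relate("CER", "M01"))
print(relate("CRN", "W01"))
```

Without the two mapping tables, every comparison would return "unknown", which is the situation of data described with local codes only.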
 
Semantic structures / resources - a collective way to refer to any resource used to define the “meaning” of a piece of data, independently of its structural complexity (e.g., controlled vocabularies, taxonomies, hierarchies, ontologies), publication format (e.g., machine readable or not, open or not), multilingual content (or not), purpose, coverage (domain specific or general purpose), or accessibility (global URI vs local codes only).