Abstract

Many of the available biomedical ontologies are rich in human-understandable labels but less rich in axiomatic descriptions, which limits their effectiveness for supporting advanced data analysis processes. In previous work, we presented a method for the analysis of lexical regularities in biomedical ontology labels. We showed that biomedical ontologies can present a high degree of regularity in their labels; this regularity can be exploited for automatic axiomatic enrichment, which is the process of using the textual content of the ontology to create new relationships between classes. The Gene Ontology (GO) is an ontology that is widely used for the functional annotation of genes and proteins, and its class labels are highly regular. Recently, the GO Consortium enriched the ontology by using the so-called cross-product extensions (CPE). Cross-products are generated by establishing axioms that relate a given GO class to classes from GO or from other biomedical ontologies. In this paper, we study how our method for lexical analysis can identify and reconstruct the cross-products defined by the Gene Ontology Consortium. Our results show that our method is able to partially detect the GO cross-product extensions, with an average recall of 61% and, in some scenarios, over 70%.

What can you find on this web page?

On this web page you can find the files with information about the lexical analysis and the calculation of the CPE. We used this information to create the tables and figures in the paper.

Source Ontology Files

  • Ontology files: the OWL ontology files used in the experiment. These files were originally downloaded in OBO format; we removed the obsolete classes and transformed them to OWL format. In total, there are 5 ontology files, each playing the role of Source Ontology or Enriching Ontology.
    • Source Ontologies:
      • Gene Ontology (go.owl) and its three sub-ontologies:
    • Enriching Ontologies:
      • Chemical Entities of Biological Interest (chebi.owl): a freely available dictionary of molecular entities focused on "small" chemical compounds.
      • Cell Ontology (cl.owl): a candidate OBO Foundry ontology for the representation of cell types.
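
The removal of obsolete classes mentioned above can be illustrated with a minimal sketch. It is not the procedure we actually ran, only one plausible way to filter an OBO file before conversion. It assumes that obsolete classes are flagged with the standard `is_obsolete: true` tag; the function name `remove_obsolete_terms` and the sample identifiers (`EX:...`) are hypothetical. The conversion from OBO to OWL is not shown.

```python
# Illustrative sample only: the EX: identifiers and names are made up.
SAMPLE = """\
format-version: 1.2

[Term]
id: EX:0000001
name: example cell differentiation

[Term]
id: EX:0000002
name: obsolete example activity
is_obsolete: true

[Term]
id: EX:0000003
name: example cell migration
"""


def split_stanzas(obo_text):
    """Split OBO text into header lines and a list of stanzas (lists of lines)."""
    header, stanzas, current = [], [], None
    for line in obo_text.splitlines():
        stripped = line.strip()
        # A stanza starts with a bracketed type line such as [Term] or [Typedef].
        if stripped.startswith("[") and stripped.endswith("]"):
            current = [stripped]
            stanzas.append(current)
        elif current is None:
            header.append(line)
        else:
            current.append(line)
    return header, stanzas


def is_obsolete(stanza):
    """True if the stanza carries 'is_obsolete: true' (trailing '!' comments ignored)."""
    return any(line.split("!")[0].strip() == "is_obsolete: true" for line in stanza)


def remove_obsolete_terms(obo_text):
    """Return the OBO text without the stanzas flagged as obsolete."""
    header, stanzas = split_stanzas(obo_text)
    parts = ["\n".join(header).strip()]
    parts += ["\n".join(s).strip() for s in stanzas if not is_obsolete(s)]
    return "\n\n".join(p for p in parts if p) + "\n"


if __name__ == "__main__":
    print(remove_obsolete_terms(SAMPLE))
```

Run on the sample, the sketch keeps the header and the two non-obsolete stanzas and drops `EX:0000002`. A real pipeline would probably rely on an ontology library or converter rather than on line-based filtering.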

The lexical analysis and cross-products: using the CPE metric