Data Integrator
The GOGraphBuilder tool generates a graph (a data structure consisting of nodes and edges) in GraphML format, an intuitive XML-based file format for graph representation, and furnishes it with annotation data. Node and edge information is retrieved either from an original GraphML file containing only the Gene Ontology (GO) term IDs and names, or from an OBO file. Both the GraphML and the OBO file contain terms representing gene product properties under the three Gene Ontologies (also known as domains): cellular component (CC), molecular function (MF), and biological process (BP).
The OBO file, which can be downloaded directly from Gene Ontology, is the text file format used to view and edit gene ontologies. It consists of a header and a series of stanzas. There are three types of stanza, indicated in square brackets on the first line of each stanza: [Term], [Typedef], and [Instance].
Header and stanzas contain fields in the form tag: value. The [Term] stanza contains the GO terms with their descriptions and relations. This is the only stanza type considered by this tool.
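The tag: value layout described above can be illustrated with a minimal sketch that keeps only [Term] stanzas. This is not the tool's own parser: the function name parse_term_stanzas and the embedded sample text are assumptions made for illustration, and escaping rules, trailing modifiers, and other details of the OBO format are ignored.

```python
from collections import defaultdict

# Small illustrative OBO excerpt (sample data, not a complete ontology file).
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

[Typedef]
id: part_of
name: part of
"""


def parse_term_stanzas(text):
    """Return one dict per [Term] stanza, mapping each tag to a list of values."""
    terms = []
    current = None  # stanza being filled, or None outside a [Term] stanza
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A new stanza begins; only [Term] stanzas are collected.
            if line == "[Term]":
                current = defaultdict(list)
                terms.append(current)
            else:
                current = None
            continue
        if current is None:
            continue  # header field or field of a skipped stanza
        tag, _, value = line.partition(":")
        # Drop a trailing "! comment" and surrounding whitespace.
        value = value.split(" ! ")[0].strip()
        current[tag.strip()].append(value)
    return terms


terms = parse_term_stanzas(SAMPLE_OBO)
print(terms[0]["id"], terms[0]["is_a"])
```

Values are stored in lists because a tag such as is_a may occur several times within one stanza.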
The GraphML file, which can also be used as input, is an internally used format obtained by running the --export-graph option of the GOAnnotator - Access Gene Ontology Annotation module. This original file consists only of nodes, which represent the GO terms and carry the attributes t_id and name, and edges, which represent the relations between GO terms and carry the relationship attribute. Five types of relationship relate GO terms.
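To make the node and edge attributes concrete, the following sketch reads a tiny hand-written GraphML document with Python's standard library. The key ids, the node ids n0 and n1, and the child-to-parent edge direction are assumptions for illustration; a file exported by the module may declare its keys differently.

```python
import xml.etree.ElementTree as ET

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hand-written sample; the key ids are assumed to equal the attribute names.
SAMPLE_GRAPHML = """\
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="t_id" for="node" attr.name="t_id" attr.type="string"/>
  <key id="name" for="node" attr.name="name" attr.type="string"/>
  <key id="relationship" for="edge" attr.name="relationship" attr.type="string"/>
  <graph id="G" edgedefault="directed">
    <node id="n0">
      <data key="t_id">GO:0008150</data>
      <data key="name">biological_process</data>
    </node>
    <node id="n1">
      <data key="t_id">GO:0009987</data>
      <data key="name">cellular process</data>
    </node>
    <edge source="n1" target="n0">
      <data key="relationship">is_a</data>
    </edge>
  </graph>
</graphml>
"""


def read_graph(xml_text):
    """Return (nodes, edges): nodes maps node id to its data fields,
    edges is a list of (source, target, relationship) tuples."""
    root = ET.fromstring(xml_text)
    nodes = {}
    for node in root.iterfind(".//g:node", NS):
        nodes[node.get("id")] = {
            d.get("key"): d.text for d in node.iterfind("g:data", NS)
        }
    edges = []
    for edge in root.iterfind(".//g:edge", NS):
        data = {d.get("key"): d.text for d in edge.iterfind("g:data", NS)}
        edges.append((edge.get("source"), edge.get("target"), data.get("relationship")))
    return nodes, edges


nodes, edges = read_graph(SAMPLE_GRAPHML)
print(nodes["n1"]["t_id"], edges)
```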
When the tool runs, it calculates two values for each node from the provided annotation file (the GO annotation mapping available through the GO web site): the count, which is the number of gene products annotated with the term in the database, and the frequency, which additionally includes the term count contributed by more specialized terms (i.e. child terms).
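One possible reading of these two values is sketched below on toy data. The term IDs, gene product IDs, and the function name count_and_frequency are hypothetical. The sketch further assumes that frequency counts distinct gene products annotated with the term itself or with any descendant; the tool may instead sum counts or consider direct children only.

```python
from collections import defaultdict

# Hypothetical toy ontology: each term maps to its parent terms.
parents = {
    "GO:B": ["GO:A"],
    "GO:C": ["GO:A"],
    "GO:D": ["GO:B", "GO:C"],
}

# Hypothetical annotations as (gene product, term) pairs.
annotations = [("gp1", "GO:B"), ("gp2", "GO:D"), ("gp3", "GO:D"), ("gp1", "GO:C")]


def count_and_frequency(parents, annotations):
    """Return (cnt, freq) dicts keyed by term ID."""
    direct = defaultdict(set)  # term -> gene products annotated directly
    for gene_product, term in annotations:
        direct[term].add(gene_product)

    inherited = defaultdict(set)  # term -> gene products of term and descendants
    for term, gene_products in direct.items():
        # Propagate the directly annotated gene products to every ancestor.
        stack, seen = [term], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            inherited[current] |= gene_products
            stack.extend(parents.get(current, []))

    all_terms = set(parents) | {p for ps in parents.values() for p in ps} | set(direct)
    cnt = {t: len(direct.get(t, ())) for t in all_terms}
    freq = {t: len(inherited.get(t, ())) for t in all_terms}
    return cnt, freq


cnt, freq = count_and_frequency(parents, annotations)
for term in sorted(cnt):
    print(term, cnt[term], freq[term])
```

Using sets rather than running totals keeps a gene product from being counted twice when it reaches an ancestor through two different paths, as gp1 does for GO:A here.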
GOGraphBuilder is a command line tool which takes two inputs. The first is the gene ontology (GO) file containing the GO terms with related information, provided by option --ontology-graph-file; its type, specified by option --ontology-graph-type, can be either a GraphML file or a GO OBO file. In the GraphML file the nodes contain the GO IDs stored in the t_id field and the term names stored in the name field. Edges encode direction by referencing node IDs as source and target, and the type of relationship between two nodes is given by the relationship field. If the option --remove-edges is supplied, edges of the specified relationship type are removed.
The option --ontology specifies the GO ontology (CC, MF, or BP). Setting it also implies that data items are extracted from the second input, the GO annotation file, which must be in GAF format and is specified by option --annotation-file. The annotation filter set (--annotation-filter-set) currently offers a single choice, which is to exclude electronically inferred annotations (IEA). More specialized filters can be added only by modifying the source code.
Options applicable to more than a single tool are summarized in common command line options.
The tool returns a GraphML file which consists of nodes that represent the GO terms. Every node contains the following attributes:
t_id: GO term identifier
name: GO term name
cnt: Number of gene products annotated with the term in the database
freq: Number of gene products annotated with the term or any of its children in the database
Every edge represents the relationship between GO terms, as given by the edge attribute relationship.
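As a rough picture of such an output file, the sketch below serializes two nodes and one edge with the attributes listed above. The function name build_graphml, the key declarations, the node ids, and the cnt and freq numbers are all assumptions for illustration; the file actually written by the tool may be structured differently.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

# Assumed key declarations: (attribute name, element it applies to, value type).
KEYS = [
    ("t_id", "node", "string"),
    ("name", "node", "string"),
    ("cnt", "node", "int"),
    ("freq", "node", "int"),
    ("relationship", "edge", "string"),
]


def build_graphml(nodes, edges):
    """Serialize nodes ({node id: attribute dict}) and edges
    ([(source, target, relationship)]) to a GraphML string."""
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    for name, target, value_type in KEYS:
        ET.SubElement(root, "key", {
            "id": name, "for": target, "attr.name": name, "attr.type": value_type,
        })
    graph = ET.SubElement(root, "graph", id="G", edgedefault="directed")
    for node_id, attrs in nodes.items():
        node = ET.SubElement(graph, "node", id=node_id)
        for name in ("t_id", "name", "cnt", "freq"):
            ET.SubElement(node, "data", key=name).text = str(attrs[name])
    for source, target, relationship in edges:
        edge = ET.SubElement(graph, "edge", source=source, target=target)
        ET.SubElement(edge, "data", key="relationship").text = relationship
    return ET.tostring(root, encoding="unicode")


# Hypothetical values; cnt and freq are made-up numbers.
sample_nodes = {
    "n0": {"t_id": "GO:0008150", "name": "biological_process", "cnt": 0, "freq": 5},
    "n1": {"t_id": "GO:0009987", "name": "cellular process", "cnt": 5, "freq": 5},
}
sample_edges = [("n1", "n0", "is_a")]

xml_text = build_graphml(sample_nodes, sample_edges)
print(xml_text)
```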