Data Integrator (Python API)
Base class for implementation specific graph library.
Public Member Functions
def __init__(s)
    Initialize the internal graph implementation.
def GetRoot(s)
    Get the root data node.
def GetAllNodes(s)
    Get all data nodes.
def GetAncestors(s, goID, include=True)
    Get ancestor nodes for a certain node.
def GetAllAncestors(s, include=True)
    Get ancestor nodes for all nodes in the graph.
def GetChildren(s, goID, include=True)
    Get child nodes for a certain node.
def GetDistance(s, srcGOID, dstGOID)
    Get the shortest distance between two nodes.
def IgnoreRelationships(s, is_a=False, regulates=False, part_of=False, positively_regulates=False, negatively_regulates=False)
    Do not take into account certain relationships.
def SetCountsFromAC(s, ac)
    Populate graph with counts from an annotation corpus.
def UpdateNode(s, goID)
    Transfer content of data node to the internal graph.
def UpdateAllNodes(s)
    Transfer content of all data nodes to the internal graph.
def ReadGraphML(s, ffn)
    Read a graphml-formatted gene ontology from a file.
def WriteGraphML(s, ffn)
    Write the graph to a file.
def ReadOBO(s, ffn)
    Read an OBO-formatted gene ontology from file.
def IsPopulated(s)
    Has the graph been populated?
Base class for implementation specific graph library.
Defines an API that hides the implementation details of the underlying graph library from the caller. It handles access to nodes and provides the access methods needed to implement semantic similarity measures without deep knowledge of the underlying graph. Conceptually, a Python data object holds all data relevant for semantic similarity computation and is attached to the internal graph's nodes. This data node represents a graph node outside of the graph. Data nodes can be altered outside this class; the changes will be reflected in the internal graph.
Terminology:
- GO ID. GO identifier as used by the Gene Ontology, e.g. GO:0006865.
- Term name. GO term name, e.g. 'amino acid transport'.
- Root. The root is the least specific annotation. All nodes' paths end at the root because of the directionality of the GO graph.
- Ancestors. Nodes that are less specific and lie on a path from the node to the root.
- Data node. A node attached to the implementation specific graph node that carries all relevant information used for semantic similarity computation.
- Populated. At least one node has a non-zero count. A graph becomes populated when an annotation corpus has been attached to it.
- Node count. Number of proteins that have been associated with a particular node.
- Normalization. Process of computing probabilities and information content. Must be done after attaching an annotation corpus or after loading a populated graph.
- Node frequency. Number of proteins that have been annotated with this GO ID or a more specific one.
- Node probability. Node frequency / root node frequency.
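The relationship between node frequency and node probability can be illustrated with a few lines of Python. This is only a sketch and not the library's code: the frequencies are made up, the helper names are hypothetical, and information content is assumed to be the negative natural logarithm of the node probability, which this page does not specify.

```python
import math

# Hypothetical node frequencies for a toy ontology (not real GO data).
# GO:0008150 plays the role of the root here.
freq = {
    "GO:0008150": 10,  # root: every annotated protein is counted here
    "GO:0006810": 6,
    "GO:0006865": 2,
}
root_id = "GO:0008150"


def node_probability(go_id):
    """Node frequency divided by root node frequency."""
    return freq[go_id] / freq[root_id]


def information_content(go_id):
    """Assumed definition: negative natural log of the node probability."""
    return -math.log(node_probability(go_id))


for go_id in freq:
    print(go_id, node_probability(go_id), round(information_content(go_id), 3))
```

Under this assumed definition the root has probability 1 and information content 0, and more specific terms carry more information.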
def cls.GOGraphBase.CGOGraphBase.__init__(s)
Initialize the internal graph implementation.
The internal graph object is stored in self.G and may be accessed from the outside.
def cls.GOGraphBase.CGOGraphBase.GetAllAncestors(s, include=True)
Get ancestor nodes for all nodes in the graph.
Parameters:
- include (Bool): Whether to include the start data node (True) or not (False).
Returns: A dictionary. Each key is a GO ID, and its value is a list of references to cls.GOGraphNode.CGOGraphNode instances.
Reimplemented in cls.GOGraphNetworkX.CGOGraphNetworkX, and cls.GOGraphIGraph.CGOGraphIGraph.
def cls.GOGraphBase.CGOGraphBase.GetAllNodes(s)
Get all data nodes.
Returns: A list of references to cls.GOGraphNode.CGOGraphNode instances. This way, data can be accessed and modified while remaining attached to the graph.
Reimplemented in cls.GOGraphNetworkX.CGOGraphNetworkX, and cls.GOGraphIGraph.CGOGraphIGraph.
def cls.GOGraphBase.CGOGraphBase.GetAncestors(s, goID, include=True)
Get ancestor nodes for a certain node.
Parameters:
- goID (String): GO ID, e.g. 'GO:0006865'.
- include (Bool): Whether to include the start data node (True) or not (False).
Returns: A list of references to cls.GOGraphNode.CGOGraphNode instances. This list is empty if goID was not found.
Reimplemented in cls.GOGraphNetworkX.CGOGraphNetworkX, and cls.GOGraphIGraph.CGOGraphIGraph.
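The documented behaviour of GetAncestors (walk towards the root, optionally include the start node, return an empty list for an unknown ID) can be sketched on a plain dictionary. This is an illustration only: the toy edges are made up, get_ancestors is a hypothetical helper, and it returns GO IDs rather than the CGOGraphNode references the real method returns.

```python
# Hypothetical toy ontology: each term maps to its direct parents
# (edges point from the more specific term to the less specific one).
parents = {
    "GO:0008150": [],
    "GO:0006810": ["GO:0008150"],
    "GO:0006865": ["GO:0006810"],
}


def get_ancestors(go_id, include=True):
    """Return the IDs found on any path from go_id towards the root."""
    if go_id not in parents:
        return []  # unknown ID: empty list, as documented
    found = []
    seen = set()
    stack = [go_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # a term reachable via several paths is reported once
        seen.add(current)
        if current != go_id or include:
            found.append(current)
        stack.extend(parents[current])
    return found


print(get_ancestors("GO:0006865"))
print(get_ancestors("GO:0006865", include=False))
print(get_ancestors("GO:9999999"))
```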
def cls.GOGraphBase.CGOGraphBase.GetChildren(s, goID, include=True)
Get child nodes for a certain node.
Parameters:
- goID (String): GO ID, e.g. 'GO:0006915'.
- include (Bool): Whether to include the start data node (True) or not (False).
Returns: A list of references to cls.GOGraphNode.CGOGraphNode instances. This list is empty if goID was not found.
Reimplemented in cls.GOGraphNetworkX.CGOGraphNetworkX, and cls.GOGraphIGraph.CGOGraphIGraph.
def cls.GOGraphBase.CGOGraphBase.GetDistance(s, srcGOID, dstGOID)
Get the shortest distance between two nodes.
Note that the graph is directed and that edges run from more specific nodes to less specific nodes; see the example GO IDs below.
Parameters:
- srcGOID (String): Source GO ID, e.g. 'GO:0006860'.
- dstGOID (String): Destination GO ID, e.g. 'GO:0008150'.
Returns: The number of edges on the shortest path from the source to the destination node, or -1 if either of the two GO IDs is not present in the graph.
Reimplemented in cls.GOGraphNetworkX.CGOGraphNetworkX, and cls.GOGraphIGraph.CGOGraphIGraph.
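Because edges are directed from specific to general terms, the distance is only defined in that direction. The breadth-first search below sketches this on a toy graph. The edges and the helper get_distance are hypothetical; the -1 result for an unknown ID follows the description above, while returning None when no directed path exists is an assumption of this sketch, since the page does not document that case.

```python
from collections import deque

# Hypothetical toy ontology: term -> direct parents (specific -> general).
parents = {
    "GO:0008150": [],
    "GO:0006810": ["GO:0008150"],
    "GO:0006865": ["GO:0006810"],
}


def get_distance(src, dst):
    """Number of edges on the shortest directed path from src to dst."""
    if src not in parents or dst not in parents:
        return -1  # documented result when an ID is missing
    queue = deque([(src, 0)])
    seen = {src}
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, dist + 1))
    return None  # assumption: no directed path from src to dst


print(get_distance("GO:0006865", "GO:0008150"))  # specific -> general
print(get_distance("GO:0008150", "GO:0006865"))  # against edge direction
```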
def cls.GOGraphBase.CGOGraphBase.GetRoot(s)
Get the root data node.
The root node is the most general term; following the graph's edges from any node eventually leads to the root.
Returns: The cls.GOGraphNode.CGOGraphNode instance of the root.
Reimplemented in cls.GOGraphNetworkX.CGOGraphNetworkX, and cls.GOGraphIGraph.CGOGraphIGraph.
def cls.GOGraphBase.CGOGraphBase.IgnoreRelationships(s, is_a=False, regulates=False, part_of=False, positively_regulates=False, negatively_regulates=False)
Do not take into account certain relationships.
Removes from the graph the edges marked with the relationships indicated by the parameters of this method.
Attention: When the graph is saved afterwards, these edges will be missing.
Reimplemented in cls.GOGraphNetworkX.CGOGraphNetworkX, and cls.GOGraphIGraph.CGOGraphIGraph.
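The effect of removing edges by relationship type can be sketched on a plain edge list. This is an illustration under assumptions: the IDs and edges are placeholders, ignore_relationships is a hypothetical helper, passing True is assumed to select a relationship for removal, and the sketch returns a filtered copy whereas the documented method removes the edges from the graph itself.

```python
# Hypothetical edges as (child, parent, relationship); the IDs are placeholders.
edges = [
    ("GO:B", "GO:A", "is_a"),
    ("GO:C", "GO:A", "part_of"),
    ("GO:D", "GO:C", "regulates"),
    ("GO:E", "GO:C", "positively_regulates"),
]


def ignore_relationships(edges, is_a=False, regulates=False, part_of=False,
                         positively_regulates=False,
                         negatively_regulates=False):
    """Return the edges whose relationship type was not selected for removal."""
    flags = {
        "is_a": is_a,
        "regulates": regulates,
        "part_of": part_of,
        "positively_regulates": positively_regulates,
        "negatively_regulates": negatively_regulates,
    }
    dropped = {name for name, selected in flags.items() if selected}
    return [edge for edge in edges if edge[2] not in dropped]


kept = ignore_relationships(edges, part_of=True, regulates=True)
print(kept)
```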
def cls.GOGraphBase.CGOGraphBase.IsPopulated(s)
Has the graph been populated?
A graph is considered populated if some of its nodes are associated with counts greater than zero. This also means that the root has a non-zero frequency.
Returns: (Bool) True if the graph has been populated. False if the graph is still empty in terms of counts.
def cls.GOGraphBase.CGOGraphBase.ReadGraphML(s, ffn)
Read a graphml-formatted gene ontology from a file.
Loads a graphml-formatted file into this object. The graph represents the ontology and may have nodes with only id and name vertex attributes, but it can additionally carry cnt and freq attributes. The latter is a shortcut to avoid reading an annotation corpus and populating the graph.
Parameters:
- ffn: Full path file name.
Returns: (Bool) True if the graph was read successfully. False if the file could not be opened.
Reimplemented in cls.GOGraphNetworkX.CGOGraphNetworkX, and cls.GOGraphIGraph.CGOGraphIGraph.
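To make the vertex attributes concrete, the sketch below parses a small GraphML document with Python's standard library and collects id, name, cnt and freq per node. The document content is invented for illustration (including the direct edge to the root and the key identifiers d0 to d3), and the exact layout written by the library's backends may differ.

```python
import xml.etree.ElementTree as ET

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hypothetical GraphML content; only the attribute names follow this page.
GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="node" attr.name="id" attr.type="string"/>
  <key id="d1" for="node" attr.name="name" attr.type="string"/>
  <key id="d2" for="node" attr.name="cnt" attr.type="int"/>
  <key id="d3" for="node" attr.name="freq" attr.type="int"/>
  <graph edgedefault="directed">
    <node id="n0">
      <data key="d0">GO:0008150</data>
      <data key="d1">biological_process</data>
      <data key="d2">0</data>
      <data key="d3">2</data>
    </node>
    <node id="n1">
      <data key="d0">GO:0006865</data>
      <data key="d1">amino acid transport</data>
      <data key="d2">2</data>
      <data key="d3">2</data>
    </node>
    <edge source="n1" target="n0"/>
  </graph>
</graphml>"""

root = ET.fromstring(GRAPHML)
# Map key identifiers (d0, d1, ...) to attribute names (id, name, ...).
key_names = {k.get("id"): k.get("attr.name") for k in root.findall("g:key", NS)}

nodes = []
for node in root.findall("g:graph/g:node", NS):
    attrs = {key_names[d.get("key")]: d.text for d in node.findall("g:data", NS)}
    nodes.append(attrs)

# A graph with cnt attributes counts as populated if any count is non-zero.
is_populated = any(int(n.get("cnt", 0)) > 0 for n in nodes)
print(nodes, is_populated)
```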
def cls.GOGraphBase.CGOGraphBase.ReadOBO(s, ffn)
Read an OBO-formatted gene ontology from file.
Returns: (Bool) True if the graph was read successfully. False if the file could not be opened.
def cls.GOGraphBase.CGOGraphBase.SetCountsFromAC(s, ac)
Populate graph with counts from an annotation corpus.
The annotation corpus provides the number of proteins associated with each GO ID. For each GO ID, the number of associated proteins is added to each of its ancestor nodes. After completion, the graph is normalized by _Normalize and is ready for semantic similarity calculations.
Parameters:
- ac: cls.AnnotationCorpus.CAnnotationCorpus instance.
Returns: (Bool) False if the annotation corpus name does not match the ontology's root name defined by the graph; an error message is written. True if the graph has been populated successfully.
Reimplemented in cls.GOGraphNetworkX.CGOGraphNetworkX, and cls.GOGraphIGraph.CGOGraphIGraph.
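The propagation step described above (add each term's protein count to all of its ancestors) can be sketched as follows. The toy graph, the counts, and the helper names are hypothetical, and the sketch assumes that the term itself is included among its ancestors so that its own count contributes to its frequency. A term reachable over two paths receives the count only once.

```python
# Hypothetical toy ontology with a diamond: c -> a -> root and c -> b -> root.
parents = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a", "b"],
}

# Hypothetical annotation summary: term -> number of directly annotated proteins.
counts = {"c": 3, "a": 1}


def ancestors_inclusive(term):
    """The term itself plus every term on a path from it to the root."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def propagate(counts):
    """Add each term's count to the term and to all of its ancestors."""
    freq = {term: 0 for term in parents}
    for term, n in counts.items():
        for ancestor in ancestors_inclusive(term):
            freq[ancestor] += n
    return freq


freq = propagate(counts)
print(freq)
```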
def cls.GOGraphBase.CGOGraphBase.UpdateAllNodes(s)
Transfer content of all data nodes to the internal graph.
See UpdateNode.
Reimplemented in cls.GOGraphNetworkX.CGOGraphNetworkX, and cls.GOGraphIGraph.CGOGraphIGraph.
def cls.GOGraphBase.CGOGraphBase.UpdateNode(s, goID)
Transfer content of data node to the internal graph.
Data is made persistent by transferring the content of the attached data node to the internal graph data structure. This way, the information is saved to files when graphs are written. Currently, the following information is transferred:
- GO ID as id.
- Term name as name.
- GO ID-associated protein counts as count.
- Frequency of occurrence of this node or any more specialized nodes as freq.
Parameters:
- goID (String): GO ID, e.g. 'GO:0006865'.
Reimplemented in cls.GOGraphIGraph.CGOGraphIGraph.
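The transfer from a data node to the internal graph can be pictured as copying four fields into the vertex attributes. In the sketch below, DataNode is a hypothetical stand-in for CGOGraphNode (whose real attribute names are not given on this page), a plain dictionary stands in for the backend graph, and update_node is an illustrative helper rather than the library method.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    """Hypothetical stand-in for a data node; field names are assumptions."""
    go_id: str
    name: str
    count: int = 0
    freq: int = 0


# Stand-in for the backend graph: GO ID -> vertex attribute dictionary.
internal_graph = {}


def update_node(node):
    """Copy the data node's content into the vertex attributes."""
    internal_graph[node.go_id] = {
        "id": node.go_id,
        "name": node.name,
        "count": node.count,
        "freq": node.freq,
    }


node = DataNode("GO:0006865", "amino acid transport", count=2, freq=2)
update_node(node)
print(internal_graph["GO:0006865"])
```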
def cls.GOGraphBase.CGOGraphBase.WriteGraphML(s, ffn)
Write the graph to a file.
Writes the graph to a file using the internal node and edge attributes. Since data nodes may have changed after reading, UpdateAllNodes is called here as an operation common to all subclasses. Currently, exactly those attributes listed in UpdateNode are written to the graph file.
Parameters:
- ffn: Full path file name.
Returns: (Bool) True if the graph was written successfully. False if the file could not be opened.
Reimplemented in cls.GOGraphNetworkX.CGOGraphNetworkX, and cls.GOGraphIGraph.CGOGraphIGraph.