« Self Organizing Maps « || Inicio || » Algoritmo A* »

Topics Navigator

Última modificación: 7 de Julio de 2015, y ha tenido 62 vistas

Etiquetas utilizadas: || || || ||

Topics Navigator is a prototype model to experiment with some uses of graphs and hypergraphs for storing, recovering and analyzing connected information.

Note: In this document the terms Topic and Node are used with the same meaning, and so with the terms Relation and Edge.

                                                                                                                   Table of Contents
  About the tool...
      Modules and Extensions
      Download
  Graphs and Hypergraphs
  Playing around with the tool
      How it works
      Working with files
      Contextual Menus
      Action on Families
      Searchs
      Layouts
  Modules Operations
      Visual options
      The Schema of Connected Information
        Retrieving Information: Query Traversals
      Exporting
      Networks with Time dependency
      Measures on graphs
  NLG Format File

IMPORTANT NOTE

Please, remember that this tool is only a prototype to test some ideas on semantic networks and hypergraphs. I am not trying to provide a final tool and I am changing it frequently to adequate it to my personal/research needs. Consequently, it is not finished, not full documented (sometimes documentation and program are not in the same version), some features can be out of work if I changed something some other part where, etc.

I'm trying to mantain the source code documented, but you know...


About the tool...

This tool is being made with the wonderful software NetLogo. If you want to know more about this software click on this link. So, strictely speaking, this tool is a NetLogo Model (and needs NetLogo to run) that makes use of the agents filosophy of ABMs to deal with nodes/edges (topics/relations) in a semantic network.

MODULES AND EXTENSIONS

In order to work, this NetLogo Model requires the next extensions:

  • Table: to manage the attributes of the nodes and edges.
  • Goo: to provide a more compact and clean interface.
  • Pathdir: to facilitate the access to local files.
  • Bitmap: only to show the logo in the main screen of the navigator.
  • Network: to manage some functions on Measures module.
  • String: for several string manipulation.

All these extensions can be downloaded from the extensions' NetLogo page. Also, they are distributed on the ZIP file that you can find here together the Topics Navigator source. In that page you can find all the references and information about authors of these extensions.

The code is divided into several source files:

  • Main Code: Abstract layer to manipulate topics and relations.
  • Visual.nls: contains the procedures in charge of visual features (layouts of graphs, for example).
  • Measures.nls: contains some general procedures for measuring graphs.
  • Files.nls: contains the procedures for exporting information and manage NLG format.
  • Query2.nls: contains all the procedures for Schema and Queries operations.
  • Auxiliar.nls: for procedures not contained in the other ones.
  • ContextMenu.nls: contains the procedures for Context Menu features.
  • Creation.nls: procedures to Data manipulation (add/remove nodes/edges).
  • Timeline.nls: procedures to work with time dependent graphs.

The model is an "in development" work... so it is for sure full of errors, and a lot of things could be done better and more robustly. It has been growing from an "ad hoc" primitive tools to manage some connected databases and it lacks of a long term project, that is the reason (and the limitations of the author) there is no perfect matching between all of its pieces.

DOWNLOAD

Sources + Extensions + Testgraph1: 03/Aug/2012


Graphs and Hypergraphs

Informally, a graph is a ser of objects called vertices or nodes connected by linkss called edges or arcs, that allow to represent binary relations between elements of the set. Usually, a graph can be represented by a set of points joined by segments in tha plane.

From a practical point of view, the graphs allow us to study the relationships between units that interact with each other. For example, a network of concepts in a domain can be represented and studied by a graph in which the vertices represent the various concepts that are studied and the edges represent connections (which are considered appropriate in the domain, for example, synonymy, antonymy, comes from, etc.). Virtually any problem can be represented by a graph, and their study goes beyond the various areas of sciences and social sciences.

Depending on how we consider the edge joining the vertices, the graphs can be classified as directed (when the edge starts from one of the vertices and reaches the other, ie the role of both nodes in the relationship is not same) or undirected (where the edge simply connect the vertices together). Typically, the directed edges are represented as an arrow (as seen in the figure below) and the undirected ones as segments.

A path in a graph is a sequence of vertices such that from each of them there is an edge to the next vertex. In the figure below we highlight a path joining the vertices 3 and 4.

Additionally, you can consider weights on the edges that may indicate some kind of feature of it (length, flow of information, connection strength, etc.).. In this case, we say that the graph is weighted.

Sometimes the kind information that relates the nodes not just connect the vertices in a binary way, ie, two by two, but the relationships are between larger sets of vertices. For example, suppose we have a network of tourism in which tourists consider traveling and visiting areas, in this case, relate only that tourist v1 visited the area v2 may be insufficient because it depends on the year in which he did it, in this case we introduce a third type of vertex, temporal, and the relationship is between the three simultaneously: the tourist v1 visited the area v2 in the year v3.

In this case, we have more expressive possibilities, for example: v1 and v4 tourists visited the area v2 together in the year v3. Of course, the repersentation of this kind of relationships is more complicated and we cannot simply represent segments by connecting the dots.

This type of graphs with stronger edges are called hypergraphs, and the edges of this type hyperedges.

Because representations of hyperedges are much more complex (and confusing) from a technical point of view, and as a demonstration of the versatility you have graphs, we will see there is no need to use hyperedges although our relations are not binary, it is possible to "create" intermediate types of vertices, representing those hiperaristas (artificial vertices, not belonging to the real world being modelled) so that if v1, v2 and v3 are vertices that share an hyperedge, we create the vertex h1 and binary edges linking it to those vertices. Thus we don't lose the information we want to store and continue to work with graphs without having to develop theoretical tools or new representation.

In the previous figure we see how an hyperedge linking the three circular nodes is represented by an independent vertex (gray, and highlighted in red) that links together.


Playing around with the tool

The look of the interface (with the subinterfaces hidden) of the Navigator is:

Here we introduce some of the main features it provides:

HOW IT WORKS

This tool can work with hypergraphs that mix edges of several types: undirected, directed and hyperedges in the same hypergraph.

The hyperedges will be represented by auxiliary nodes connecting all the topics in the hyperedge.

Topics and relations are classified in Families or Types, and can hold as many attributes as you want, but some of them are mandatory. See the NLG format file at the end of this document for more information about the structure of the data.

Topics Navigator allows you to browse the graph inuitivelly, providing several layouts to represent your connected data. It will also give you contextual menus to add/modify/erase information from your graph and retrieve information from big graphs by using a kind of query traversals to explore automatically the data. It also provides different analysis tools for your information, as well as exporters for other external tools.

WORKING WITH FILES

The Topic Navigator comes with the usual file options:

  1. New: Makes a global reset and unloads everything from memory.
  2. Add Data...: Loads an existing NLG file on memory without cleaning the current state. In this way you can load several graphs at a time. In this process, new nodes with the same IDs that nodes in current memory will be ignored (but it will add the edges between nodes).
  3. Save...: Saves the current graph (in memory graph, not only the visible part) on a file with NLG format.

Note: Pressing on "Files->" button the files menu will be shown/hidden .

CONTEXTUAL MENUS

The system provides two different contextual menus acting on the nodes:

  1. Active Browser: By pressing on this button you can act on every node with the main browser operations you can do over it:
    Icon Name Action
    Expand show all the nodes connected with it
    Leave hide all nodes except this
    Hide hide this node from the current representation
    Select select this node
    Fix fix this node (layout process will not affect it)
    Label hide/show the label of this node
  1. Active Data: By pressing on this button you can act on every node with the main data operations you can do over it:
    Icon Name Action
    Add Node allow to add a new node to the graph
    Delete Node remove this node from the graph
    Add Link add a new link between this node and an existing node
    Delete Link remove one of the links of this node
    Edit allows to modify the attributes of this node

ACTION ON FAMILIES

Using the next controls we can act globally on sets of nodes:

Together the families (or Nodes Types) defined in the structure of the loaded graph (in its file), there exists the following subset selections:

  • All: The action will be be applied on all the nodes of the graph (visible or not).
  • Visible: The action will be applied only on the visible nodes.
  • Selected: The action will be applied on the selected nodes.
  • Fixed: The action will be applied on the fixed nodes.

And the actions that can be applied on the selected family are:

  • Show: The nodes of the family will be shown.
  • Hide: The nodes of the family will be hidden.
  • Expand: They will be expanded, showing all the nodes connected to them.
  • Leave: The selected family will be the only shown.
  • Fix: They will be fixed.
  • List: They will be listed on the Topic Information box.
  • Select: They will be selected.
  • Deselect: They will be Deselected.
  • Show Label: Their labels will be shown.
  • Hide Labels: Their labels will be hidden.

Pressing on Execute! button, the action will be applied on the nodes of the selected family.

SEARCHS

With the Search Topic button you can look for any node in your network by using any piece of information. It will show you all the nodes that have your input string as part of any of theirs attributes.

Once you have selected the one of your interest, it will be shown on the world.

LAYOUTS

Topics Navigator provides some useful layouts to represent your graphs:

  • Spring: It is the layout that comes with NetLogo in the standard distribution.
  • ARF: Attractive and Repulsion Forces algorithm (http://www.sg.ethz.ch/research/graphlayout).
  • Hyperbolic: (http://graphics.stanford.edu/papers/munzner_thesis/html/node8.html).
  • ... any pair combination of the previous layouts.
  • Circular
  • Radial: using as root the node you choose by clicking on it.
  • Circular Bipartite: When the graph is a result of a simple query (what implies it is a bipartite graph) this layout can be useful to see the result in a grouped way.

Modules Operations

Using these buttons:

you can access to more complex features and controls for analyzing and representing your graph. In the next sections we will give a detailed description on these features.

VISUAL OPTIONS

  • Selected-color: opens a color dialog to choose how the selected nodes will be highlighted.
  • Customize labels: opens a dialog to choose the attribute to be used as label.
  • Zoom: Percentage of zoom on the representation (only on topic sizes).
  • Show-weights?: Decide if the weights of the relations obtained from the queries are shown or not.
  • Hide-Orphans?: If true, when a node has no visible relations, it will be hidden.
  • Show-all-relations?: If true, all the relations between visible nodes will be shown.
  • Show-relation-labels?: If true, the labels (edgetype) of all the relations will be shown.
  • Refresh: This button will update the representation of the graph when their features are changed.
  • Topic-Information?: It will show/hide a floating piece of information when the mouse is over the topic. It needs Active Browser or Active Data to be pressed.
  • Fix?: If true, fixed nodes are not affected by the layout.
  • R-Sort, V-Sort, H-Sort: Selected nodes will be arranged radially, vetically, or horizontally.
  • spring-length, spring-constant, repulsion-constant: parameters to be used by Spring and Hyp (Hyperbolical) layouts.
  • tension, radius: parameters to be used by ARF layout.
  • Gravity: It produces the nodes to be attracted to the origin. In this way, when there are several islands, they will not be going away in the world.

THE SCHEMA OF CONNECTED INFORMATION

In order to retrieve information from the graph using queries it is necessary to build previously the Graph Schema from the loaded data. Since this schema is obtained from the data you must do it after loading the graph and after main changes in your graph.

Obtaining this schema is really simple from the data:

  • One Schema-Node is added for every NodeType of your graph (and every hyperedge) and
  • Two Scheme-Nodes are connected if there exists at least one connection between nodes of these types in your graph.

Press on Build button to generate ths schema, Layout Schema to represent it in a more confortable way, and Show/Hide to show/hide the schema.

Remember that if you change the data graph (node or edges) maybe you nedd to generate the schema again.

RETRIEVING INFORMATION: QUERY TRAVERSALS

Just as in the Relational DataBases (RDB) there is relational language that allows us to have a query system (SQL language) in the Graph DataBases (GDB) is in development a set of languages that allow querying and transform graph. The traversals are becoming the most common and useful operations for querying these kind of databases, becoming so important to the GDB as the SQL for the RDB.

A traversal is formed by 3 main parts:

  1. A path on the schema, that indicates the traversal that has to follow the query to connect the two ends of the path. In Topics Navigator, this path is achieved as the sequence of nodes that comprise it.
  2. A constraint for each node in the path. That is, a condition that must verify the nodes such that, even existing a path as indicated in 1 that passes through them, is considered a valid path. This is what is called local conditions affecting each node of the traversal individually.
  3. A global constraint that can link the different nodes inside a valid path and that is the last filter that must pass to be considered as a valid result for the query.

The only mandatory part of a query is part 1, the other two are optional and provide additional filtering processes. An example that uses the 3 parts is the following query:

Connect Obra with Clas.Calvo if there exists a path verifying: 
(&1) Obra -> 
(&2) Localización ->
(&3) Jornada, verifying nombre = "I", ->
(&4) Localización ->
(&5) Acto ->
(&6) Verbo ->
(&7) Clas.Calvo
Where &2 = &4

Let us see step by step how to generate the previous query:

  1. To start a new compound query (you can simultaneously launch multiple queries of the above type) press the button New Q. Then click on Build P to start creating a single query. With this button pressed will go on clicking on the nodes of the path to be constructed; for each selected node a popup dialog appears where you can enter the node's local constraint, which may use the attributes of the topic. For example, in the above query will leave all blank except for the node of type Jornada, for which we introduce the text of the condition:
  2. Once we have finished generating the path, we must add it to the current compund query by clicking the Add P and then the system will ask for the global constraint. To reference the various nodes of the path the system provides a numeric label for each one, &n in the order they are generated. If, as in the present case, we look for nodes 2 and 4 in the path to be the same, just enter the following text in the dialog:

If you want to see the query in a more natural format generated just click on the button Show Query:

The result is shown below, and it is achieved by launching the query with the Launch! button:

Keep in mind that if you perform another query without clearing previous one the reult is added (and then the weights of the possible new connections if they exist) to the existing one. If you want to run the query on a clean state, you must press the "Clear.

The visualization of the results of the query is performed automatically after the execution of the query. Using the standard Layout button you can control the representation of the resulting graph.

For mathematical definition about traversals, click here:

EXPORTING

Topics Navigator allows (right now) three exporting methods in the queries (to be chosen after pressing Export button):

  1. Query Format: Saves the query (not the result) in a text file to be retrieved later.
  2. Table for Excel CSV (bipartite): Exports a table in CSV format in which nodes of origin / target are distributed by rows / columns storing the corresponding edge weights. Tools like Excel can import it in order to obtain more representations for easy interpretation of the results of the query.
  3. CSV Files for Gephi: Export nodes and edges in two CSV files that are ready to be imported into Gephi for a higher quality representation of the graph and to perform some standard measures of network theories.

NETWORKS WITH TIME DEPENDENCY

In order to work with graphs that change with time, Topics Navigator include a module to manage relations with an special attribute (Timeline(X)) storing time intervals where the relation is "active".

Once you have this attribute in your relations, you can store theses active intervals as a list of pair lists (see the example at the end of this document):

[... [t1 t2] ...], with t1 <= t2

The controls under the module Timeline will allow you to obtain a dinamical representation of your graph in time:

  • Start-Interval - End-Interval: Defines the active interval inside the total time of live of your graph (mnimum and maximun times are extracted automatically from the file)
  • Visualize: Show the active relations (and associated nodes) regarded to the actve interval.
  • Play (>), Play-Back (<): These buttons moves automatically the active interval forward and backward in an endless loop.
  • Speed: Set the speed of reproduction.
  • |<, >|: moves respectivally to the beginning and the end of the time.

MEASURES ON GRAPHS

THe system provides some common measures on graphs. For every visible node/topic:

  • Degree: measures the number of undirected relations connecting the node
  • Indegree: measures the number of directed relations entering the node
  • Outdegree: measures the number of directed relations exiting from the node
  • Clustering Coefficient: A measure of neighborhood indicating the type of environment in which the node situates. A node with low order (degree) may be surrounded by a variety of other nodes, small or large, which has a direct influence on its own centrality and growth potential. A network is assortative or disassortative depending on the similarity of the order (degree) among neighboring nodes, which can be tested by means of Pearson correlation (assortativity coefficient). Neighbor connectivity is the correlation between the order (degree) of nodes and the average order (degree) of their neighbors.
  • Rank: by applying the Page Rank algorithm.
  • Shimbel Index: (or Shimbel distance, nodal accessibility, nodality). A measure of accessibility representing the sum of the length of all shortest paths connecting all other nodes in the graph. The inverse measure is also called closeness centrality or distance centrality.
  • Eccentricity: A measure of farness based on the number of links needed to reach the most distant node in the graph.
  • Closeness: Closeness can be regarded as a measure of how long it will take to spread information from s to all other nodes sequentially.

Also with this module we can give several representations of these measures:

  • Histogram
  • Resizing the nodes according to their values.
  • X Versus Y to show relations between measures (the system ask for the second measure).

NLG Format File

The format for the NLG (NetLogo Graph) files are strictly the following:

% Types of Nodes of the graph. 
<NodesTypes>
Name(string) color shape(string) size ...
<EndNodesTypes>
% Types of Edges: Name color shape thickness Type(0: Undirected, 1: Directed, 2-Hyperedge)
% if Hyperdege: Name color shape size Type
<EdgesTypes>
Name color shape thickness Type ...
<EndEdgesTypes>
% Nodes of the graph. Structure: ID NodeType Attributes...
<Nodes>
"ID(X)" "NodeType(S)" "Att1(S)" "Att2(X)" ...
id1 "Type1" "value1" value2 ...
<EndNodes>
% Edges & HyperEdges of the graph.
<Edges>
"ID-List(X)" "EdgeType(S)" "Attr1(X)" "Att2(S)" ...
[id1 id2 id3] "Type1" value1 "value2" ....
<EndEdges>

Following you have an example:

% Types of Nodes of the graph. Structure: Name color shape size 
<NodesTypes>
"Mammal" brown "circle" 1
"Bird" blue "square" 1
"Reptile" red "triangle" 1
<EndNodesTypes>
% Types of Edges: Name color shape thickness Type(0: Undirected, 1: Directed, 2-Hyperedge)
% if Hyperdege: Name color shape size Type
<EdgesTypes>
"enemy" red "straight" 0 0
"eats" blue "curve1.0" 0.05 1
"escapes" green "curve-3.0" 0.1 1
"group" gray "conector" 0.5 2
<EndEdgesTypes>
% Nodes of the graph. Structure: ID NodeType Attributes...
<Nodes>
"ID(X)" "NodeType(S)" "label(S)"
1 "Mammal" "Dog"
2 "Mammal" "Cat"
3 "Mammal" "Horse"
4 "Bird" "Sparrow"
5 "Reptile" "Crocodile"
6 "Bird" "Eagle"
<EndNodes>
% Edges & HyperEdges of the graph. Structure: [ID-list] EdgeType Attributes...
% Special Attribute "Timeline"
<Edges>
"ID-List(X)" "EdgeType(S)" "Peso(X)" "Timeline(X)"
[1 2 3] "group" 1 0
[4 5] "group" 1 10
[3 5] "group" 2 20
[1 2] "enemy" 1.0 30
[2 4] "eats" 1 50
[3 1] "enemy" 1 60
[5 1] "eats" 1 70
[5 3] "eats" 1 100
[4 5] "escapes" 1 110
[6 4] "eats" 1 190
<EndEdges>

some things to be considered in the file:

  • You can comment lines (the parser will not consider them) by using the % symbol.
  • If you want to explicite the type of data an attribute will storage, you must add in the name:
    • (S): for strings.
    • (X): for evaluable expressions.
  • NodeTypes and EdgeTypes blocks must be before than Nodes and Edges blocks.
  • NodeTypes and EdgeTypes blocks has fixed structure, while Nodes and Edges only have the first two attributes ("ID" and "NodeType" for Nodes, "ID-List" and "EdgeType" for Edges) fixed, after them you can add as many attributes as yo want.

« Self Organizing Maps « || Inicio || » Algoritmo A* »