The Helenos Project
KDD Workbench for the Semantic Web

The XML_OWL Module

This section assumes that the reader is familiar with the XML and OWL modules. The XML_OWL module combines the two, but incorporates more background knowledge into the data-mining process by linking XML and OWL data sources. Instead of going through each workflow step, we focus only on the steps that are relevant to this module.

DataSelection

Create a new project named xml_owl and add the following files from the examples/trains folder to the project:

Workflow

Create a new experiment. The workflow configuration will appear. Choose XML_data as the operation for the DataSet task. This way, we select nodes from the XML input file and add background knowledge from the ontology to them.

Create an instance of the workflow and start it.

DataSelection

We will use the same dataset as we did for the XML module. Your dataset should look like this:

DataLinking

This task is the crucial element of the XML_OWL module: it adds background knowledge from the ontology to the selected XML nodes. To accomplish this, we need to map XML nodes to OWL individuals using the following procedure:

Afterwards, continue the workflow.
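
The idea behind the mapping can be illustrated with a minimal Python sketch. It is not how Helenos implements DataLinking. The XML snippet, the node ids, the individual names (trains_train1, trains_car1) and the helper functions xml_facts, link and to_prolog are all assumptions made for illustration. The sketch rewrites each OWL individual to the XML node it is mapped to, so that facts from both sources share the same constants.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML input: two trains, each with a direction element.
XML_SOURCE = """<trains>
  <train id="t1"><direction>east</direction></train>
  <train id="t2"><direction>west</direction></train>
</trains>"""

# Hypothetical OWL-side facts as (predicate, arguments) pairs.
OWL_FACTS = [
    ("trains_Eastbound", ("trains_train1",)),
    ("trains_next_car", ("trains_train1", "trains_car1")),
]

# Hypothetical mapping: OWL individual -> XML node id.
MAPPING = {"trains_train1": "t1", "trains_train2": "t2"}


def xml_facts(source):
    """Turn each <train> node into has_elem_direction/has_text facts."""
    facts = []
    for train in ET.fromstring(source).iter("train"):
        node = train.get("id")
        elem = f"{node}_direction"
        facts.append(("has_elem_direction", (node, elem)))
        facts.append(("has_text", (elem, train.findtext("direction"))))
    return facts


def link(owl_facts, mapping):
    """Replace mapped OWL individuals by their XML node id; keep the rest."""
    return [
        (pred, tuple(mapping.get(arg, arg) for arg in args))
        for pred, args in owl_facts
    ]


def to_prolog(facts):
    """Render facts as Prolog-style clauses."""
    return [f"{pred}({','.join(args)})." for pred, args in facts]


linked = xml_facts(XML_SOURCE) + link(OWL_FACTS, MAPPING)
for clause in to_prolog(linked):
    print(clause)
```

After linking, the constant t1 appears both in the XML-derived facts and in the ontology-derived facts, which is what allows a single rule to draw on either source.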

DataMining

Take a look at the modes. You will notice that the modes generated for the XML data and those generated for the OWL data now both appear.

Progol uses a technique called "mode-directed inverse entailment". The modes guide the search through the input data, and the input data generated by Helenos take the mapping between XML nodes and OWL individuals into account. Hence, XML and OWL are integrated for the data-mining task.
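
To make the role of the modes more concrete, here is a minimal Python sketch of how body modes restrict which rules can be expressed at all. It does not reproduce Progol's search. Two of the mode declarations are the ones quoted in this tutorial; the other three, and the helper names body_predicate, expressible and without, are assumptions made for illustration. The rule bodies are those of the three rules listed in the Conclusions.

```python
import re

# Two modes quoted in this tutorial plus three hypothetical ones.
MODES = [
    "modeb(*,trains_Eastbound(+object))?",
    "modeb(*,trains_next_car(+object,-object))?",
    "modeb(*,trains_shape(+object,#shape))?",         # hypothetical
    "modeb(*,has_elem_direction(+object,-object))?",  # hypothetical
    "modeb(*,has_text(+object,#text))?",              # hypothetical
]

# Body predicates used by the three rules from the Conclusions.
RULE_BODIES = {
    "rule 1": ["trains_Eastbound"],
    "rule 2": ["trains_next_car", "trains_shape"],
    "rule 3": ["has_elem_direction", "has_text"],
}

MODE_PATTERN = re.compile(r"modeb\(\*,\s*(\w+)\(.*\)\)\?$")


def body_predicate(mode):
    """Extract the predicate name from a modeb declaration."""
    match = MODE_PATTERN.match(mode.strip())
    if match is None:
        raise ValueError(f"not a modeb declaration: {mode!r}")
    return match.group(1)


def expressible(modes, rule_bodies):
    """Names of the rules whose body predicates are all allowed by the modes."""
    allowed = {body_predicate(mode) for mode in modes}
    return [
        name
        for name, body in rule_bodies.items()
        if all(pred in allowed for pred in body)
    ]


def without(modes, predicate):
    """Copy of the mode list with the mode for one predicate removed."""
    return [mode for mode in modes if body_predicate(mode) != predicate]


print(expressible(MODES, RULE_BODIES))
step1 = without(MODES, "trains_Eastbound")
print(expressible(step1, RULE_BODIES))
step2 = without(step1, "trains_next_car")
print(expressible(step2, RULE_BODIES))
```

Removing a mode takes its predicate out of the hypothesis language, so any rule that depends on it can no longer be proposed and the search has to settle on a different rule.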

Run Progol by clicking "Run" and sending the "generalise(node/1)?" command.

Remove the "modeb(*,trains_Eastbound(+object))?" mode to guide the search for a general rule in another direction. Run Progol again.

Removing the "modeb(*,trains_next_car(+object,-object))?" mode yields the following result:

Conclusions

Consider all the results we have obtained:

  1. node(A) :- trains_Eastbound(A).
  2. node(A) :- trains_next_car(A,B), trains_shape(B,trains_shape_2).
  3. node(A) :- has_elem_direction(A,B), has_text(B,east).

Our selection was based on criteria that were only available in the XML input. The first and second results show that background knowledge incorporated from the OWL input, by mapping XML nodes to OWL individuals, can help derive better rules. Without the ontological background, the data-mining task would have yielded only the third rule.
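
As a rough illustration of how the three rules relate to each other, the following Python sketch evaluates them against a small fact base. The facts, the constants (t1, t2, c1, d1, d2) and the helper names are invented for this example and are not the data of the trains example; only the rule bodies are taken from the results above.

```python
# Toy fact base (invented): t1 is an eastbound train, t2 a westbound one.
FACTS = {
    ("trains_Eastbound", ("t1",)),
    ("trains_next_car", ("t1", "c1")),
    ("trains_shape", ("c1", "trains_shape_2")),
    ("has_elem_direction", ("t1", "d1")),
    ("has_text", ("d1", "east")),
    ("has_elem_direction", ("t2", "d2")),
    ("has_text", ("d2", "west")),
}

# Every constant that occurs in some fact; used to bind the variable B.
CONSTANTS = {arg for _, args in FACTS for arg in args}
NODES = ["t1", "t2"]


def holds(pred, *args):
    """True if the ground fact pred(args) is in the fact base."""
    return (pred, args) in FACTS


def rule1(a):
    # node(A) :- trains_Eastbound(A).
    return holds("trains_Eastbound", a)


def rule2(a):
    # node(A) :- trains_next_car(A,B), trains_shape(B,trains_shape_2).
    return any(
        holds("trains_next_car", a, b)
        and holds("trains_shape", b, "trains_shape_2")
        for b in CONSTANTS
    )


def rule3(a):
    # node(A) :- has_elem_direction(A,B), has_text(B,east).
    return any(
        holds("has_elem_direction", a, b) and holds("has_text", b, "east")
        for b in CONSTANTS
    )


covered = {
    rule.__name__: [node for node in NODES if rule(node)]
    for rule in (rule1, rule2, rule3)
}
print(covered)
```

On this toy data all three rules select the same node, but only the third one can be stated using XML-derived facts alone; the first two depend on predicates that come from the ontology.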

Of course, this is only a starting point; the mapping process in particular is tedious. But if you have a domain described by some XML input, an ontology exists for it, and you are able to come up with a proper mapping, Helenos provides a way to integrate the two in support of your data-mining needs.

© 2003-2006 AIFB - OntoWare Team