Mining Implied Semantics of Database

To mine richer semantics from relational database data, a mining method is discussed. Aimed at specific types of semantics, thirteen rules are proposed. By applying these rules, some implied semantics are found naturally. The research shows that these rules are highly operable and efficient.


Introduction
To solve the problem of information islands, semantic technology is used to realize interoperation between information systems [1], which helps the systems reach a consistent interpretation of data. In the process, relational database data are transformed into semantic data, and whether the interoperation succeeds depends largely on the quality of this transformation. Some technologies already perform the transformation automatically on the basis of the database schema. However, most semantics of relational database data are implied in application programs and cannot be captured by the database schema; these are implied semantics. Therefore, mining the implied semantics of relational database data cannot rely on existing automatic transformation technologies.
Traditional methods of building ontologies are scientific rather than engineering methods, and they have poor operability: with these methods, the quality of an ontology depends heavily on the ability and experience of the knowledge engineer. To mine the implied semantics of relational database data, this paper proposes a set of mining rules. The method is highly operable and helps to mine high-quality semantics.

General Methods and Principles of Building Ontology
How should an ontology be built? There is no widely accepted general method, although some methods are often referenced [2]. For example, Stanford University proposed seven steps for building an ontology, and the METHONTOLOGY method, which has three stages, was initially proposed to design a chemical ontology and was later applied to other domains. These are only rough steps or stages, not detailed methods.
Gruber [3] proposed five widely accepted rules for building ontologies:
(1) Clarity. Definitions should be objective and not affected by background context.
(2) Coherence. Conclusions inferred from existing knowledge should not contradict existing knowledge.
(3) Extendibility. Basic concepts should be provided for foreseeable new tasks, so that a new concept can be defined on the basis of existing concepts.
(4) Minimal encoding bias. The encoding should not depend on any single representation method.
(5) Minimal ontological commitment. Axioms should impose the weakest possible constraints, and only the necessary basic vocabulary should be defined.
In fact, these five rules partly conflict with one another, so only some of them can be emphasized when building an ontology.

Ontology Learning
Building an ontology is complex, exhausting, and inefficient work, although some software tools can ease the workload. To address this, algorithms have been designed to generate ontologies automatically or semi-automatically from existing data; these technologies are called ontology learning. According to the degree of structure of the source data, ontology learning methods fall into three categories: learning from unstructured, semi-structured, and structured data.
Because unstructured data have no fixed structure, ontology learning from them is relatively hard. Free text is typical unstructured data and an important data source for ontology learning. Learning from free text involves three tasks: concept learning, relationship recognition, and axiom generation.
Concept learning is mainly based on linguistics and statistics. Relationship recognition can be realized by more methods, such as dictionary-based methods, concept clustering, association rules, model-driven methods, formal concept analysis, Bayesian classification, and decision tree learning.
Axiom generation has been studied by fewer researchers. The Hasti system can extract axioms from free text automatically: it analyzes the structure of sentences and applies predefined patterns to extract the axioms that match them.
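The pattern-matching idea behind this kind of axiom extraction can be illustrated with a toy sketch. It replaces real sentence-structure analysis with two hypothetical regular-expression patterns and is not Hasti's actual pattern set or algorithm; the axioms are written as strings loosely in the style of OWL functional syntax.

```python
import re

# Hypothetical predefined patterns: each maps a sentence shape to an axiom template.
# Illustrative only; these are not the patterns used by Hasti.
PATTERNS = [
    (re.compile(r"^every (\w+) is an? (\w+)$", re.IGNORECASE), "SubClassOf({0} {1})"),
    (re.compile(r"^no (\w+) is an? (\w+)$", re.IGNORECASE), "DisjointClasses({0} {1})"),
]

def extract_axioms(text):
    """Split text into sentences and emit one axiom per sentence that matches a pattern."""
    axioms = []
    for sentence in re.split(r"[.!?]\s*", text):
        sentence = sentence.strip()
        if not sentence:
            continue  # skip the empty piece after the final full stop
        for pattern, template in PATTERNS:
            match = pattern.match(sentence)
            if match:
                # Capitalise concept names, e.g. "teacher" -> "Teacher".
                axioms.append(template.format(*(g.capitalize() for g in match.groups())))
                break  # first matching pattern wins
    return axioms

if __name__ == "__main__":
    text = "Every teacher is a person. No student is a course. Students attend lectures."
    for axiom in extract_axioms(text):
        print(axiom)
```

Sentences that fit no pattern ("Students attend lectures") simply yield nothing, which mirrors the idea that only axioms matching a predefined model are extracted.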
Semi-structured data have a certain structure, but not a strict one; examples are HTML documents, XML documents, and knowledge bases. Typical technologies are as follows.
HTML, XML, and RDF documents are semi-structured data; the implicit structure hidden in these documents can be discovered and transformed automatically.
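One simple way to recover such hidden structure from an XML document is sketched below. It assumes a hypothetical heuristic: an element with child elements is a candidate class, a leaf child is a datatype property of its parent's class, and a non-leaf child gives a candidate relationship between two classes. Attributes and mixed content are ignored.

```python
import xml.etree.ElementTree as ET

def infer_structure(xml_text):
    """Infer candidate classes, datatype properties and relationships from element nesting."""
    root = ET.fromstring(xml_text)
    classes = set()     # tags of elements that have child elements
    properties = set()  # (leaf tag, parent class) pairs
    relations = set()   # (parent class, child class) pairs

    def walk(elem):
        children = list(elem)
        if not children:
            return  # a leaf is recorded by its parent
        classes.add(elem.tag)
        for child in children:
            if len(child) == 0:
                properties.add((child.tag, elem.tag))
            else:
                relations.add((elem.tag, child.tag))
                walk(child)

    walk(root)
    return classes, properties, relations

if __name__ == "__main__":
    doc = "<library><book><title>X</title><author><name>Y</name></author></book></library>"
    print(infer_structure(doc))
```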
Suryanto and his colleagues designed an algorithm to extract ontologies from existing knowledge bases.
Structured data have a fixed structure and can be described with a unified data model. Relational database data are currently the largest body of structured data, so generating ontologies from relational database data plays a major role in this research [4]. There are two main methods: the static method and the dynamic method. In the static method, database schemas are transformed into ontology concepts and properties, and database records are transformed into ontology instances once and for all. In the dynamic method, database schemas are mapped to ontology concepts and properties, but the records are not transformed in advance; instead, for each semantic query, SQL statements are generated and executed, and the matched records are encapsulated as ontology instances.
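The two methods can be contrasted in a small sketch. It assumes a SQLite database (rowid tables), one table per concept, and a hypothetical naming convention: the table name is the class name, column names are property names, and `table/rowid` identifies an instance. Triples are plain Python tuples rather than objects from an RDF library, and the "semantic query" of the dynamic method is reduced to a single property-equals-value filter.

```python
import sqlite3

def _columns(conn, table):
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

def _row_to_triples(table, cols, row):
    # Wrap one record as an ontology instance; empty (NULL) fields produce no triple.
    rowid, values = row[0], row[1:]
    inst = f"{table}/{rowid}"
    triples = [(inst, "rdf:type", table)]
    triples += [(inst, c, v) for c, v in zip(cols, values) if v is not None]
    return triples

def static_transform(conn, table):
    """Static method: schema -> concept and properties, every record -> instance, once."""
    cols = _columns(conn, table)
    triples = [(table, "rdf:type", "owl:Class")]
    triples += [(c, "rdfs:domain", table) for c in cols]
    sql = f"SELECT rowid, {', '.join(cols)} FROM {table} ORDER BY rowid"
    for row in conn.execute(sql):
        triples += _row_to_triples(table, cols, row)
    return triples

def dynamic_query(conn, table, prop, value):
    """Dynamic method: only the schema mapping is kept; each query is translated
    into SQL, executed, and the matching records are wrapped as instances on the fly."""
    cols = _columns(conn, table)
    if prop not in cols:
        raise ValueError(f"{prop!r} is not a property of {table!r}")
    # Identifiers come from the (trusted) schema; only the value is a bound parameter.
    sql = f"SELECT rowid, {', '.join(cols)} FROM {table} WHERE {prop} = ? ORDER BY rowid"
    triples = []
    for row in conn.execute(sql, (value,)):
        triples += _row_to_triples(table, cols, row)
    return triples

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Student (name TEXT NOT NULL, dept TEXT)")
    conn.executemany("INSERT INTO Student VALUES (?, ?)", [("Li", "CS"), ("Wang", None)])
    for t in static_transform(conn, "Student"):
        print(t)
    print(dynamic_query(conn, "Student", "dept", "CS"))
```

The trade-off shows in where the records live: the static method copies every row into the ontology up front, while the dynamic method touches only the rows a query needs and always reflects the current database contents.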

Mining Implied Semantics
If we build an ontology following the methods and principles above, we find that they offer no specific steps, only general guidance, and every designer works according to his or her own understanding. If we use existing automatic learning technologies instead, only the superficial semantics of the data can be found.
To mine more semantics from a given relational database, four kinds of mining rules are put forward.

(1) Concept subdivision
Rule 1: Assume that relation R corresponds to ontology concept C, and field f of R corresponds to property p. If the value of field f might be empty, the following can be concluded:
NewC rdf:type owl:Class
p rdfs:domain NewC

Advanced Engineering Forum Vol. 1
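Rule 1 can be applied mechanically to a schema, as the following minimal sketch shows. It assumes that "might be empty" is read from the schema (a SQLite column that is neither declared NOT NULL nor part of the primary key), that C and p take the names of R and f, and that NewC follows a hypothetical naming convention `CWithF`. The sketch emits only the two triples stated in the rule.

```python
import sqlite3

def rule1_concept_subdivision(conn, table, concept=None):
    """Apply Rule 1 as stated: for each field f of relation R whose value might be
    empty, emit 'NewC rdf:type owl:Class' and 'p rdfs:domain NewC'."""
    concept = concept or table  # assumption: C is named after R unless given
    triples = []
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    for _cid, field, _type, notnull, _default, pk in conn.execute(f"PRAGMA table_info({table})"):
        if notnull or pk:
            continue  # NOT NULL and primary-key fields are treated as never empty
        new_c = f"{concept}With{field[:1].upper()}{field[1:]}"  # hypothetical NewC name
        prop = field  # assumption: property p carries the field's name
        triples.append((new_c, "rdf:type", "owl:Class"))
        triples.append((prop, "rdfs:domain", new_c))
    return triples

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Student (id INTEGER PRIMARY KEY, name TEXT NOT NULL, tutor TEXT)")
    for t in rule1_concept_subdivision(conn, "Student"):
        print(t)
```

Reading nullability from the data instead (for example, `SELECT COUNT(*) FROM R WHERE f IS NULL`) would also catch fields that are empty in practice but not declared nullable; which reading the rule intends is a design choice left open here.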