Presentation
Natural Language Processing and Knowledge Engineering to Extract Models From Text
Publication Date: 9/21/2022
Start Date: 2022-09-21
End Date: 2022-09-22
Event: AI4SE & SE4AI Workshop 2022
Location: Stevens Institute of Technology, Howe Building, Hoboken, NJ
Lead Author:
Dr. Carlo Lipizzi
Knowledge and experience, and the virtuous relationship between them, play an essential role in every form of engineering, and in systems engineering more than most, given its inherent focus on complexity.
A proper representation of knowledge about a given topic is essential to building systems able to "understand" that topic and the system operating in it. Without a model or formal representation of the domain knowledge, potentially useful applications, such as digital twins, would have no practical value.
While the philosophical representation of knowledge has been debated for centuries, AI/ML has been attempting to represent it automatically for only a few decades. The approaches have followed the two main lines of philosophy: the "symbolic" approach, following rationalists like Descartes, and machine learning, following empiricists like Hume.
The symbolic approach, like rationalism, is based on preset, top-down statements about knowledge, using symbolic representations such as taxonomies/ontologies or rules. In systems engineering, this is so far the most common approach: collecting information about the specific system from subject matter experts, representing their insights symbolically, and then testing the resulting model on relevant cases.
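As a minimal sketch of this top-down, symbolic style (the class names and the rule below are hypothetical illustrations, not taken from the presentation), a taxonomy can be encoded as preset is-a statements and queried with a simple rule:

```python
# Toy symbolic knowledge base: a preset is-a taxonomy (hypothetical terms).
TAXONOMY = {
    "sensor": "component",
    "actuator": "component",
    "component": "system_element",
}

def is_a(term: str, category: str) -> bool:
    """Rule-based inference: walk the is-a chain encoded by experts."""
    while term in TAXONOMY:
        term = TAXONOMY[term]
        if term == category:
            return True
    return False

print(is_a("sensor", "system_element"))  # True: inferred from preset statements
```

The knowledge lives entirely in the hand-written `TAXONOMY`; no data is consulted, which is exactly the strength and the limitation discussed next.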
This approach has limits: systems change constantly, and experts may be unavailable or may provide an incomplete or biased representation of the system. In an environment with a growing quantity of factual data about the system, that potential knowledge is essentially wasted.
The machine learning approach, like empiricism, presumes we have data providing complete coverage of the domain knowledge. Once we have the right data, we "understand" the environment through a form of pattern recognition.
We may not have data that fully covers the domain knowledge. Moreover, most of the time we humans understand an environment using knowledge that is not strictly within the domain but comes from related topics or from generic common knowledge.
In most engineering applications, "understanding" a domain means having a formal representation of it. It could be an entity-relationship model, a Systemigram, or a causal chain. These formal representations are along the lines of systems modeling. Once we have such a representation, we can maintain, evolve, or digitally duplicate the system as needed. Basically, the process would be to build a model to represent domain knowledge in general and then apply it to extract the knowledge of a specific target domain.
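As a sketch of what such a formal representation can look like in code (the entities and relations below are hypothetical), a domain can be reduced to subject-relation-object triples, the same structure underlying entity-relationship diagrams and causal chains:

```python
# A domain reduced to subject-relation-object triples (hypothetical example).
triples = [
    ("pump", "feeds", "boiler"),
    ("boiler", "heats", "turbine_loop"),
    ("controller", "regulates", "pump"),
]

def outgoing(entity, triples):
    """All relations leaving an entity: one step along a causal chain."""
    return [(rel, obj) for subj, rel, obj in triples if subj == entity]

print(outgoing("pump", triples))  # [('feeds', 'boiler')]
```

The same triple list can back an entity-relationship view (entities and labeled edges) or be chained to trace cause-and-effect paths through the system.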
When we talk about data related to a given domain, we talk primarily about text, since a good portion of knowledge-rich information comes in this form. Because of the wide diffusion of digital communication, the majority of both communication and documentation is now available in digital format, making the potential body of knowledge significant.
This study is in the domain of Natural Language Processing/Understanding, one of the fastest-growing areas of AI.
Last year we presented research based on a review of what has been done in extracting formal representations from text, introducing a proof of concept for one of the possible approaches.
This year, we present a comparison between the two main approaches, symbolic and ML, and expand the knowledge graph approach introduced last year.
Preliminary results show that the ML approach performs well when data is abundant and the domain is generic, while the symbolic approach appears to perform well with less data on narrower, more specific but static domains.
"Knowledge graph" is becoming an overused and overly generic term that no longer conveys a specific value. We will present a sample.
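As a hedged sketch of the kind of sample meant here (the extraction pattern, relation verbs, and sentences are illustrative placeholders; the actual pipeline is more elaborate), triples can be pulled from raw text and assembled into an adjacency-list graph:

```python
import re
from collections import defaultdict

# Toy extraction: match "X <relation> Y" sentences (illustrative relations only).
PATTERN = re.compile(r"(\w+) (requires|contains|controls) (\w+)")

text = "engine requires fuel. engine contains pistons. computer controls engine."

# Build the knowledge graph as an adjacency list keyed by entity.
graph = defaultdict(list)
for subj, rel, obj in PATTERN.findall(text):
    graph[subj].append((rel, obj))

print(dict(graph))
# {'engine': [('requires', 'fuel'), ('contains', 'pistons')],
#  'computer': [('controls', 'engine')]}
```

A real pipeline would replace the regular expression with proper linguistic analysis (parsing, coreference, entity normalization), but the output shape, entities linked by typed relations, is the same.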