IREENE – ML-based tool for topic modeling and document similarity measurement and visualization
IREENE is a system for analysing the semantic similarity of texts in a repository. Given a set of supplied keywords, it displays the most relevant articles, a metric of the similarity between those papers, and the similarity between all documents in a latent space. IREENE also supports data extraction, recognition of the essential keywords used throughout the texts, and analysis of how frequently they recur.
Advantages of the IREENE system
IREENE adds significant capability to product knowledge management in modern manufacturing enterprises.
Topic modelling and a semantic representation of existing documents as knowledge graphs can reduce the time needed for manual processing of essential documentation by the people involved in product management across several organisational verticals.
It has already achieved great success in offering insights for product management, including but not limited to operations, compliance, R&D, and intellectual property rights.
State of the art
Every manufacturing organisation must deal with a substantial volume of external documents. Intellectual property and fundamental standard compliance must be studied and analysed before development. Every day, patents (including Standard Essential Patents), technological standards, and scientific papers are searched across all sectors.
However, the volume of relevant textual material is huge. Over 3.4 million patent applications were filed globally in 2021, with the number growing by 5-9% each year since 2011. In addition, the average word count of patent applications has increased since the 1990s, surpassing 7,000 in 2007. Reading nonstop, an average reader would need about 200 years to get through 3.4 million patent applications, even without titles, abstracts, or references.
A quick glance at the most prominent standards bodies demonstrates the breadth of accessible sources. There are 22,538 ISO standards, for example, and over 1,300 IEEE standards. With the rising digitisation of the sector and current technical advancement, we anticipate that these numbers will rise. Over 50 million scientific articles had been published by 2010, and the overall quantity of scholarly papers is doubling every nine years.
Patent information is used by a variety of functions in contemporary organisations: strategic management, as a foundation for monitoring the competitive environment, technology assessment, or even R&I portfolio management; design and engineering, for state-of-the-art research; and legal, where functionality, design, and implementation technique are studied in the context of the so-called “Freedom to Operate” analysis to determine whether the development and marketing of a product is permissible.
IREENE (Information Retrieval Engine) addresses this need: it provides methods for processing unstructured text documents in order to create a knowledge graph representing the contents of the available sources.
The Solution: How does it work?
In the case of the digital industry, data-driven engineering and manufacturing refer not only to machine-generated data fed through IIoT but also to the vast accumulation of unstructured data, including textual content written in natural languages. The volume of available data is even bigger as virtual organisations build on the free flow of information and knowledge between direct partners and third parties.
Design, engineering, manufacturing, and other processes of industrial enterprises are deeply embedded in textual data, usually human-generated content such as patent files, scientific publications or industrial standards like IEEE or IEC. In order to embrace both the volume and potential of pertinent but heterogeneous data, it is necessary to make it machine-readable first. This is where IREENE comes in.
Input to IREENE can cover a wide range of sources, such as patents, user requirements sheets, customer feedback, troubleshooting descriptions, failure and fault reports, insights from previous projects, regulatory considerations, engineering standards such as those defined by ISO, IEEE, or IEC, and product-relevant scientific publications.
IREENE processes input files of different formats (e.g. text documents, spreadsheets, presentations) in order to create a knowledge graph representing the contents of the processed sources. The data sets used in the development have been subjected to topic modeling, which is an unsupervised machine-learning technique that detects similarities between documents and clusters the expressions that statistically define the contents of a document most accurately.
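To make the idea of topic modeling more concrete, the following is a minimal sketch of one common technique, Latent Dirichlet Allocation fitted with collapsed Gibbs sampling. The text does not say which algorithm IREENE uses, so the choice of LDA, the function name `lda_gibbs`, the hyperparameters, and the four toy documents are all assumptions made purely for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, n_top=3, seed=0):
    """Toy LDA via collapsed Gibbs sampling (illustrative, not IREENE's method).

    docs is a list of token lists. Returns (theta, top_words), where theta[d]
    is the topic distribution of document d and top_words[k] lists the most
    frequent words currently assigned to topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    doc_topic = [[0] * n_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(n_topics)]   # count of word w in topic k
    topic_total = [0] * n_topics                        # total tokens in topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed per-document topic proportions (each row sums to 1).
    theta = [
        [(count + alpha) / (len(doc) + n_topics * alpha) for count in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    top_words = [
        [w for w, c in topic_word[k].most_common() if c > 0][:n_top]
        for k in range(n_topics)
    ]
    return theta, top_words


# Hypothetical, pre-tokenised toy corpus.
docs = [
    "battery charging coil patent claim".split(),
    "wireless charging coil patent".split(),
    "safety standard compliance audit".split(),
    "compliance audit safety standard report".split(),
]
theta, topics = lda_gibbs(docs)
for k, words in enumerate(topics):
    print("topic", k, words)
for d, row in enumerate(theta):
    print("doc", d, [round(p, 2) for p in row])
```

Each row of `theta` places a document in a low-dimensional topic space, so documents with similar rows can be treated as similar; the top words per topic play the role of the clustered expressions that characterise the content. A production system would more likely rely on an established library and proper text preprocessing than on a hand-written sampler like this one.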
IREENE uses a topic model to enable two functionalities: (a) smart semantic search and (b) a visual knowledge-graph browser. Together, these make it possible to apply the Business Platform for Distributed and Decentralized Data Exchange Ecosystems not only to the traceability use case but also to Electronics and ICT, as an enabler for the digital industry and for optimised supply chain management covering the entire product lifecycle in large ecosystems.
The ambition is to analyse documents and find similarities so that search engines like Google become possible in a B2B environment, thereby enabling Product Life Cycle Management.
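The two functionalities can be illustrated with a small sketch: ranking documents against a keyword query, and linking documents whose similarity exceeds a threshold so that they can be browsed as a graph. The sketch uses TF-IDF vectors with cosine similarity, which is a purely lexical stand-in for the topic model and not IREENE's actual method; the function names, the threshold, and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists):
    """Return sparse TF-IDF vectors (as dicts) and the IDF table."""
    n_docs = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in token_lists
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, vectors, idf, top_k=2):
    """Indices of the top_k documents most similar to the query keywords."""
    counts = Counter(query.lower().split())
    q_vec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scored = sorted(((cosine(q_vec, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [i for score, i in scored[:top_k] if score > 0]


def similarity_graph(vectors, threshold=0.1):
    """Edges (i, j) -> similarity for document pairs at or above the threshold."""
    edges = {}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            score = cosine(vectors[i], vectors[j])
            if score >= threshold:
                edges[(i, j)] = score
    return edges


# Hypothetical sample repository.
documents = [
    "patent claims for a wireless charging coil",
    "wireless charging standard for electric vehicles",
    "customer feedback on battery failure reports",
]
vectors, idf = tfidf_vectors([doc.lower().split() for doc in documents])

results = search("wireless charging", vectors, idf)
graph = similarity_graph(vectors)
print("search results:", [documents[i] for i in results])
print("graph edges:", sorted(graph))
```

In this toy repository the query matches the first two documents, and the only edge in the graph connects those same two documents, since the third shares no terms with them. Replacing the TF-IDF vectors with topic proportions or other semantic embeddings would let the same ranking and graph-building steps capture similarity beyond exact word overlap.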