Idener Multi-Algorithmic Linker

Project Description

The “Imalinker” system, developed by IDENER for the Organisation for Economic Co-operation and Development (OECD), aims to enable a wide range of analyses based on the many data combinations that can be obtained by linking data sources such as patents, trademarks, designs, scientific publications and enterprise data.

The OECD Directorate for Science, Technology and Industry (STI) has for some years been developing a data infrastructure to support methodological development and to enable the analytical work required by a number of Committees. This infrastructure aims to provide further insights into innovation-related investments, outputs and activities; enterprise dynamics; and science and scientific productivity. To this end, STI has been gathering and linking several databases from public as well as private sources, containing information about patents, trademarks and scientific papers. It has also been involved in an OECD-wide effort to integrate and exploit private-source data in the OECD data infrastructure.

Within this framework, “Imalinker” is being used by the OECD-STI as a tool for matching company names and linking patents to scientific knowledge. On the one hand, the names of firms included in the IP documents and the ORBIS© databases are being matched using a series of algorithms contained in the “Imalinker” system. Names are harmonised using country-specific “dictionaries” before running a series of string matching algorithms (mainly token-based and string-metric-based) that compare the harmonised names from the different datasets and provide a matching accuracy score for each pair. On the other hand, the references to “non-patent literature” cited in patent documents are being parsed into distinct fields that are matched to the bibliometric records contained in the SCOPUS database.
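
The name-matching step described above can be illustrated with a minimal Python sketch. It is not the “Imalinker” code: the dictionary contents, the function names (`harmonise`, `token_score`, `string_score`, `match_score`) and the choice of Jaccard similarity and `difflib.SequenceMatcher` as the token-based and string-metric-based measures are all assumptions made purely for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical country-specific "dictionaries": legal-form variants mapped
# to one canonical token. Real dictionaries would be far larger.
LEGAL_FORMS = {
    "US": {"incorporated": "inc", "corporation": "corp", "company": "co"},
    "GB": {"limited": "ltd", "public limited company": "plc"},
}


def harmonise(name, country):
    """Lower-case a name, strip punctuation and normalise legal forms."""
    text = re.sub(r"[^\w\s]", " ", name.lower())
    for variant, canonical in LEGAL_FORMS.get(country, {}).items():
        text = re.sub(rf"\b{re.escape(variant)}\b", canonical, text)
    return " ".join(text.split())


def token_score(a, b):
    """Token-based similarity: Jaccard index over the sets of words."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def string_score(a, b):
    """String-metric similarity: character-level ratio from difflib."""
    return SequenceMatcher(None, a, b).ratio()


def match_score(name_a, name_b, country_a, country_b):
    """Illustrative score for a pair: plain average of the two measures."""
    a = harmonise(name_a, country_a)
    b = harmonise(name_b, country_b)
    return (token_score(a, b) + string_score(a, b)) / 2


if __name__ == "__main__":
    print(match_score("Acme Widgets, Incorporated", "ACME Widgets Inc.", "US", "US"))
```

In this sketch the two spellings of the same firm harmonise to the same string, so both measures agree; in practice the interesting cases are the near-misses, where the two kinds of measure disagree and the score has to be interpreted against a threshold.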

Insights into Imalinker

The current “Imalinker” implementation (v2.0) is based on an optimised combination of different string-metric algorithms and other linking-oriented algorithms. Finding adequate weights and an adequate workflow for these algorithms is the key to the efficiency of “Imalinker” when working with very large databases. As a result, the “Imalinker” engine outperforms even the most widely used off-the-shelf software packages. To achieve these results, “Imalinker” has the following capabilities:

  • An exhaustive refinement of the string-metric algorithm implementations in C#, achieved through a novel combination of diverse implementations.
  • Improved SQL DB communication capabilities suited for handling very large information sets.
  • Complex statistical pattern analysis of algorithm suitability for each particular linking case, in order to perform weight and algorithm-sequence optimization.
  • Automated, distributed, multi-user storage server for process results and statistics-related data, which allows better scalability.
  • Multi-threading and multi-user optimization of the framework, which allows even faster availability of results.
  • Multi-database and complex multi-field comparison system in order to optimize result refining tasks.
  • Easy-to-use, quick-response user applications that speed up the otherwise unavoidable manual operations related to linking and matching tasks.
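
The idea of combining several algorithms through weights and an ordered workflow can be sketched as follows. This is a hedged Python illustration only: the metrics, the weights in `WEIGHTED_METRICS`, the `threshold` value and the function names are hypothetical and are not taken from “Imalinker”, whose actual implementation is in C#.

```python
from difflib import SequenceMatcher


def token_jaccard(a, b):
    """Token-based measure: shared words over all words."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def char_ratio(a, b):
    """Character-level measure from the standard library."""
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical weights; in a real system these would be tuned per linking
# case, for example from statistics on previously validated matches.
WEIGHTED_METRICS = [(token_jaccard, 0.4), (char_ratio, 0.6)]


def combined_score(a, b, metrics=WEIGHTED_METRICS):
    """Weighted average of all metric scores for one candidate pair."""
    total_weight = sum(weight for _, weight in metrics)
    weighted_sum = sum(weight * metric(a, b) for metric, weight in metrics)
    return weighted_sum / total_weight


def link(a, b, threshold=0.85):
    """Two-stage workflow: cheap exact check first, weighted metrics second."""
    if a == b:  # stage 1: identical strings need no further computation
        return 1.0, True
    score = combined_score(a, b)  # stage 2: costlier weighted comparison
    return score, score >= threshold


if __name__ == "__main__":
    print(link("acme widgets inc", "acme widgets incorp"))
    print(link("acme widgets inc", "zenith foods corp"))
```

Ordering the stages from cheapest to most expensive is one common way to keep such a workflow fast on very large tables, since many pairs are settled before the costlier metrics run.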

Project Details

  • Date: 29 September 2013
  • Tags: ICTs, Private, Software Engineering
  • Start date: December 2010
  • End date: April 2013