5 November 2018
New article: SEMANTIC SEARCH BASED ON NATURAL LANGUAGE PROCESSING – A NUMISMATIC EXAMPLE
Iconographic representations on ancient artifacts are described in many existing databases and publications as human-readable text. We applied Natural Language Processing (NLP) approaches to extract the semantics from these textual descriptions and thereby enable semantic searches over them. This allows more sophisticated queries than the keyword searches commonly available today. As we show in our experiments on numismatic datasets, the approach is generic in the sense that once the system has been trained on one dataset, it can be applied to other datasets with similar content without any further manual work. Of course, additional adaptations would further improve the results. Since the approach requires manual work only during the training phase, it can easily be applied to huge datasets without major extra costs. In fact, in our experience bigger datasets yield even better results because more data is available for training. Since our approach is not bound to a particular domain and the numismatic datasets are just an example, it could serve as a blueprint for many other areas. It could also help to build bridges between disciplines, since textual iconographic descriptions are also found for pottery, sculpture and elsewhere.
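To illustrate the difference between a keyword search and a semantic search over such descriptions, here is a minimal Python sketch. It is not the system described in the article: the sample descriptions, the vocabularies `PERSONS` and `OBJECTS`, and the rule-based `extract_triples` function are all hypothetical stand-ins for a trained NLP pipeline, chosen only to show why searching extracted (subject, predicate, object) relations can be more precise than matching words.

```python
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy vocabularies; a trained model would replace these lists.
PERSONS = {"apollo", "artemis", "athena"}
OBJECTS = {"bow", "spear", "shield"}

# Invented sample descriptions, written in the style of coin catalogues.
DESCRIPTIONS = [
    "Apollo standing left, holding bow.",
    "Artemis standing right, holding bow; to left, bust of Apollo.",
    "Athena standing left, holding spear and shield.",
]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_triples(description: str) -> List[Triple]:
    """Toy rule-based extraction of (person, 'holding', object) triples."""
    triples: List[Triple] = []
    # Treat each ';'-separated clause as describing one figure.
    for clause in description.split(";"):
        tokens = tokenize(clause)
        subject = next((t for t in tokens if t in PERSONS), None)
        if subject is None:
            continue
        collecting = False
        for tok in tokens:
            if tok == "holding":
                collecting = True
            elif collecting and tok in OBJECTS:
                triples.append((subject, "holding", tok))
            elif collecting and tok != "and":
                # Any other word ends the list of held objects.
                collecting = False
    return triples


def keyword_search(descriptions: List[str], words: List[str]) -> List[int]:
    """Return indices of descriptions that contain all the given words."""
    return [
        i for i, d in enumerate(descriptions)
        if all(w in tokenize(d) for w in words)
    ]


def semantic_search(descriptions: List[str], query: Triple) -> List[int]:
    """Return indices of descriptions whose extracted triples contain the query."""
    return [
        i for i, d in enumerate(descriptions)
        if query in extract_triples(d)
    ]


keyword_hits = keyword_search(DESCRIPTIONS, ["apollo", "bow"])
semantic_hits = semantic_search(DESCRIPTIONS, ("apollo", "holding", "bow"))
```

In this toy data, the keyword query for "apollo" and "bow" also returns the second description, where Artemis holds the bow and Apollo merely appears as a bust, whereas the relation-based query returns only the description in which Apollo himself holds the bow.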
Author: Ulrike Peter
Further resources: A ZIP file containing a) an SQL dump of the tables representing the lists and hierarchies, b) mapping files for the translation into RDF, and c) an MS Excel file containing the results of our evaluations on OCRE data.