26 March 2018
CNT took part in CAA Tübingen
Karsten Tolle gave a talk on 'Data quality experiences within the project Corpus Nummorum Thracorum'. He reported on the import process and its problems, as well as on our project goals and the current state of their realization. Here is the abstract:

The DFG-funded project Corpus Nummorum Thracorum (CNT) has collected and published ancient coin data for a specific area (Thrace) and timespan. The goal of the project is to bring together all known data for this region and period in order to analyse it and to propose and publish a type system. The CNT database merges coin data from some 120 collections. While importing data, we encountered various cases that are prone to error. Sources may come with their own weaknesses and errors. The implementation is done by IT experts who cannot judge each case, but having domain experts manually check every imported entry would be too expensive. We implemented some tests in order to avoid known problems, but these are far from complete. We mainly rely on visualisations and query interfaces that the domain experts can use to approve new data. Were everyone to model and publish their data according to the norms of Nomisma.org, many problems on our side could be avoided. However, these LOD sources are still under development. They also contain errors and duplicates; some concepts may not yet exist, and others become deprecated. We are about to finish mapping our data to the Nomisma.org ontology and will implement our quality checks on this level. The advantage would be that these checks are independent of our database structure and could therefore be used by others. We will report on our experiences and first attempts to improve our data quality.
Author: Angela Berthold