Maria Cristina Marinescu, coordinator of the Saint George on a Bike project and postdoctoral researcher at Barcelona Supercomputing Center (BSC), and Joaquim Moré López, senior researcher and expert in computational linguistics, also at BSC, will participate in the DCMI panel taking place online on 13 October 2021 at 14:00 CEST.
Automated indexing is only as good as the training set or rules available for the domain. It is important to learn what type of content a pre-trained algorithm has been trained on. Consider what type of content is readily available to train an algorithm: what is popular and what is accessible. Scholarly and historical content is not available in consumable formats in the large volumes that machine learning requires. There are exceptions, such as science and medicine, where large, well-documented collections are available.
This panel will discuss the current state of automated categorization across domains including research data, art history, and scientific publishing. The goal is to provide practical advice on how to take meaningful steps towards building the infrastructure needed for sustainable automated indexing.