AI & the archive

Using media for machine learning, using machine learning for media

The Netherlands Institute for Sound & Vision (NISV) is one of the largest audiovisual archives in Europe, with around one million hours of content and still growing. For parts of these media collections, descriptions are also available, ranging from manually generated topic labels to subtitles for the hearing impaired. Using these labels (and combinations of labels), interesting data sets can be created for machine learning applications. NISV is also experimenting with 'crowdsourcing' approaches to label content, either via a Mechanical Turk approach or by inviting visitors of the Museum to provide ground-truth information in a playful manner.

NISV would be very interested in collaborating with researchers and research groups to investigate whether the archive can be used to train models for applications such as:

  • entity extraction
  • topic segmentation and modelling
  • sentiment analysis
  • sign language recognition
  • child-speech recognition
  • violent scene detection

Together with our partners in the Dutch media industry, we are also developing benchmark evaluations to assess the quality and adequacy of AI tools such as speech recognition and computer vision.

Please contact us if you are interested in working with us in our data LABS!