METU-MMDS | Multimedia Database Research Group

Semantic Content Extraction, Storage and Querying of Visual, Audio, and Text Data in Videos (METU-MMDS)

Dataset

We downloaded news videos (18,000 seconds in total) from the NTV news archives and grouped them into five categories: accident, military, natural disaster, sport, and politics. We also created a concept list that is a subset of the LSCOM concepts (see the following tables). The shot boundaries, key-frames, visual objects/concepts, audio concepts, and subtitle texts were then annotated manually for all of the video clips. In brief, each video is split into shots so that each shot covers roughly a single scene, and several key-frames are extracted from each shot. For each key-frame, the concepts related to the scene are annotated using the concept list. Each audio segment is assigned a suitable audio concept. Finally, the subtitles of each shot are extracted manually and converted to named entities, and some subtitle words are selected as important words. In addition, by integrating the various modules and analyzing these shots automatically, we created a dataset in which the concept scores of each shot are computed automatically for each modality.

Visual Concepts
Basketball_Ball	Football_Ball	Airplane	Bus	Fire	Gun	Motorcycle	Tennin_Net
Basketball_Field	Football_Field	Ambulance	Camera	Fire_Truck	Helicopter	Mountain	Tennis_Ball
Basketball_Hoop	Football_Player	Bicycle	Car	Flag	Ice_pist	Person	Tennis_Court
Basketball_Player	Football_Refree	Bridge	Cloud	Goalpost	Ice_Skater	Person_Front	Tennis_Player
Basketball_Refree	Armed_Person	Building	Desert	Greenery	Missile	Person_Side	Tennis_Racket
Race_Car	Radar	Road	Sky	Smoke	Snow	Tank	Water
Tree
Audio Concepts
Emergency_Alarm	Car_Horn	Gun	Bomb	Automobile	Motorcycle	Helicopter	Wind
Water	Rain	Applause	Crowd	Laughter	Outdoor	Nature	Meeting
Violence
Text Concepts
Brazil	UN	Injury	Voting	Operation	Impossible	Traffic	Basketball
USA	Kaddafi	Accident	Erdogan	Cease fire	Casualty	Disaster	Football
11 September	Bahrain	Car	International	Conflict	Terror	Homeless	Victory
Alcohol	Japan	Contest	Intervene	War	Suicide	Minister	Derbi
United Nations	TSK (Turkish Armed Forces)	Rocket	Agreement	Target	Volcano	Goal	Valencia
Libya	CHP (Party name)	Enemy	Confirmation	Destroy	Flood	Tennis	Arsenal
China	AKP (Party name)	Violence	Flying	Fire	Earthquake	Tournament	Real Madrid
Germany	Besiktas (Football team)	Vehicle	Forbidden	Missile	Person	NBA	Formula 1
Iran	MHP (Party name)	Death	Aid	Bomb	Politic	Smash	Power
Italy	Hidayet (Player name)	Bus	Precaution	Defense	Police	Match	Fly
Russia	BDP (Party name)	Attack	Country	Headquarter	Selection	Star	America
France	Fenerbahce (Football team)	Army	Civilian	Champion	Region	League	Barcelona
England	Galatasaray (Football team)	Final	Parliament
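
The annotation hierarchy described in this section (a video is split into shots, each shot has key-frames annotated with visual concepts, audio segments carry one audio concept each, and subtitles yield named entities and important words) can be pictured as a small data structure. The following Python sketch is only an illustration: the class names, field names, and timing values are our assumptions and do not reflect the actual file format of the dataset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KeyFrame:
    time_sec: float                      # position of the key-frame (assumed field)
    visual_concepts: List[str] = field(default_factory=list)


@dataclass
class AudioSegment:
    start_sec: float
    end_sec: float
    audio_concept: str                   # one audio concept per segment


@dataclass
class Shot:
    start_sec: float
    end_sec: float
    keyframes: List[KeyFrame] = field(default_factory=list)
    audio_segments: List[AudioSegment] = field(default_factory=list)
    named_entities: List[str] = field(default_factory=list)
    important_words: List[str] = field(default_factory=list)

    def visual_concepts(self) -> List[str]:
        """Sorted union of the concepts annotated on this shot's key-frames."""
        return sorted({c for kf in self.keyframes for c in kf.visual_concepts})


@dataclass
class Video:
    category: str                        # accident, military, natural disaster, sport, or politics
    shots: List[Shot] = field(default_factory=list)


# Made-up example values; concept names are taken from the tables above.
shot = Shot(
    start_sec=0.0,
    end_sec=6.5,
    keyframes=[
        KeyFrame(1.0, ["Football_Field", "Football_Player"]),
        KeyFrame(4.0, ["Football_Player", "Goalpost"]),
    ],
    audio_segments=[AudioSegment(0.0, 6.5, "Crowd")],
    named_entities=["Galatasaray"],
    important_words=["Goal"],
)
video = Video(category="sport", shots=[shot])

print(shot.visual_concepts())
# ['Football_Field', 'Football_Player', 'Goalpost']
```

Keeping the concepts on the key-frames and deriving the shot-level list on demand mirrors the annotation order described above, where concepts are assigned per key-frame rather than per shot.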

Project scope

In this project, the semantic content of videos is extracted automatically from their visual, audio, and text data (multi-modal), stored in an appropriate format, and made available through a prototype system that can answer queries efficiently. A new video uploaded to the system is first pre-processed to obtain its visual, audio, and text data. To extract the semantic content, a separate module is developed for each of the three modalities. The information obtained from these three modules is then analyzed and integrated. Afterwards, incomplete data are completed and duplicate data are removed. These steps prepare the data to be stored in the database. Finally, the fusion process is applied to this data.

The fused data obtained from the video are stored in the Intelligent Fuzzy Object-Oriented Database System, which was developed previously by the researchers in a TUBITAK 1001 project. This system consists mainly of a fuzzy knowledge base and a fuzzy object-oriented database. Within this project, large multimedia data are stored in the object-oriented database. New semantic information is then derived by applying domain-specific rules in the knowledge base to the data stored in the database. In addition, an index structure is developed to answer queries on both the semantic content and the low-level features. The proposed system can also process fuzzy and uncertain data.
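
To make the idea of deriving new semantic information from domain-specific rules and then querying fuzzy data more concrete, here is a minimal Python sketch. It is not the project's knowledge base or database: the in-memory dictionary, the functions `apply_rules` and `query`, the rule itself, and the derived concept `Fire_Incident` are all hypothetical, and taking the minimum as the fuzzy AND is our assumption (a common choice, not one stated by the project). The index structure is not modelled.

```python
from typing import Dict, List, Tuple

# shot id -> {concept: membership degree in [0, 1]} (assumed storage layout)
ShotStore = Dict[str, Dict[str, float]]

# Hypothetical rule format: (antecedent concepts, derived concept)
Rule = Tuple[List[str], str]


def apply_rules(store: ShotStore, rules: List[Rule]) -> None:
    """Add derived concepts in place, using min as the fuzzy AND (single pass)."""
    for concepts in store.values():
        for antecedents, derived in rules:
            degree = min(concepts.get(a, 0.0) for a in antecedents)
            if degree > 0.0:
                # Keep the stronger degree if the concept is already present.
                concepts[derived] = max(concepts.get(derived, 0.0), degree)


def query(store: ShotStore, concept: str, threshold: float) -> List[Tuple[str, float]]:
    """Return (shot id, degree) pairs with degree >= threshold, best first."""
    hits = [
        (shot_id, concepts[concept])
        for shot_id, concepts in store.items()
        if concepts.get(concept, 0.0) >= threshold
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up degrees for three shots.
store = {
    "shot_1": {"Fire": 0.9, "Smoke": 0.7, "Emergency_Alarm": 0.8},
    "shot_2": {"Fire": 0.4, "Smoke": 0.2},
    "shot_3": {"Football_Field": 0.95},
}
rules = [(["Fire", "Smoke"], "Fire_Incident")]  # hypothetical domain rule

apply_rules(store, rules)
print(query(store, "Fire_Incident", 0.5))
# [('shot_1', 0.7)]
```

The threshold in `query` is what lets uncertain data take part in retrieval: a shot is returned with its degree instead of being treated as a strict yes or no.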

The main contribution of this project is fusing the different modalities (visual, audio, and text) obtained from a video, thereby creating a more complete semantic data structure that can be stored in a database and queried effectively.
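
As a rough illustration of what fusing the modalities can mean, the sketch below combines per-modality concept scores for one shot with a weighted average. The fusion method actually used in the project is not described on this page, so the weighted average, the function `fuse`, the weights, and the scores are all assumptions made for the example.

```python
from typing import Dict

# modality -> {concept: score in [0, 1]} for a single shot (assumed layout)
ModalityScores = Dict[str, Dict[str, float]]


def fuse(scores: ModalityScores, weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted average per concept over the modalities that report it."""
    concepts = {c for per_modality in scores.values() for c in per_modality}
    fused: Dict[str, float] = {}
    for concept in sorted(concepts):
        total = 0.0
        weight_sum = 0.0
        for modality, per_modality in scores.items():
            if concept in per_modality:
                total += weights[modality] * per_modality[concept]
                weight_sum += weights[modality]
        if weight_sum > 0.0:  # skip concepts seen only by zero-weight modalities
            fused[concept] = total / weight_sum
    return fused


# Made-up scores; "Helicopter" appears in both the visual and audio concept lists.
scores = {
    "visual": {"Helicopter": 0.6, "Sky": 0.9},
    "audio": {"Helicopter": 0.8},
    "text": {},
}
weights = {"visual": 0.5, "audio": 0.3, "text": 0.2}  # hypothetical weights

fused = fuse(scores, weights)
print(fused)  # Helicopter is about 0.675, Sky is about 0.9
```

In this example the audio evidence raises the score for `Helicopter` above its visual score alone, which is the kind of gain a multi-modal description is meant to provide over a single modality.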

In addition, the results of the project are considered to fill a significant gap in the academic literature. During the project, 7 journal papers and 21 conference papers (19 international, 2 national), 28 in total, were published. The project also gave 4 Ph.D. and 6 M.Sc. students, who took on responsibilities during different terms of the project, the opportunity to work on and complete their theses.

This project is supported by TUBITAK under the Scientific and Technological Research Projects Support Program, grant number 109E014.

The above demo video shows how to use METU-MMDS for (i) extracting semantic content from videos, and (ii) querying multimedia data using various types of queries.