Visual Search in Research Data

Technical advances in the acquisition, processing, and storage of data mean that ever larger amounts of primary research data are being collected in domains such as meteorology, Earth observation, and climate simulation. Data sets of high value for current and future research are often already stored and made available by data centers and indexed by scientific libraries. While these trends improve the transparency and availability of scientific research data, access to the data sets is to date based mainly on metadata. The goal of this project is to research content-based visual search and presentation techniques that make large collections of time-oriented research data sets easier to access and explore. Combined with established metadata-based access methods, these techniques are intended to improve the overall accessibility of the indexed data sets. To this end, concepts such as visual cataloguing, query-by-example, and query-by-sketch will be developed for collections of scientific time series. A specific challenge is the appropriate combination of content-based and metadata-based search modalities.

Contact: Tatiana von Landesberger, Maximilian Scherer
Funded by: the Leibniz Gemeinschaft under the SAW program.
Collaboration partners: German National Library of Science and Technology (TIB) and Fraunhofer Institute for Computer Graphics Research.
Data repository: PANGAEA® - Data Publisher for Earth & Environmental Science.
Data type: Time-oriented research data, provided by the Baseline Surface Radiation Network (BSRN).

A visual catalog of a scientific time series dataset. Using a visual cluster algorithm, thousands of daily temperature curves from different stations all over the world can be arranged in one visualization.

Time series sketch editor. The user can select example patterns or draw individual sketches of time series patterns to formulate a content-based query. The application executes the user-defined query and returns a result set of the most similar time series patterns from our index structure.
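To make the query-by-sketch idea concrete, here is a minimal sketch in Python. It is not the project's implementation: the source does not say which descriptor, distance measure, or index structure is used. The example assumes a fixed-length, z-normalized descriptor compared by Euclidean distance, and a plain dictionary stands in for the index. All function names and the sample data are hypothetical.

```python
import math

def resample(values, n):
    """Linearly interpolate a sequence to exactly n points (n >= 2)."""
    if len(values) == 1:
        return [float(values[0])] * n
    out = []
    for i in range(n):
        pos = i * (len(values) - 1) / (n - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out

def z_normalize(values):
    """Shift to zero mean and scale to unit variance (flat input maps to zeros)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]

def descriptor(values, n=16):
    """Assumed descriptor: fixed-length, z-normalized shape of the curve."""
    return z_normalize(resample(values, n))

def query_by_sketch(sketch, index, k=3, n=16):
    """Return the k entries of index whose shape is closest to the sketch."""
    q = descriptor(sketch, n)
    scored = []
    for name, series in index.items():
        scored.append((math.dist(q, descriptor(series, n)), name))
    scored.sort()
    return [(name, d) for d, name in scored[:k]]

# Hypothetical stand-in for the index: a few named daily curves.
index = {
    "day_peak": [0, 1, 3, 6, 8, 6, 3, 1, 0],
    "flat": [4, 4, 4, 4, 4, 4, 4, 4, 4],
    "rising": [0, 1, 2, 3, 4, 5, 6, 7, 8],
    "falling": [8, 7, 6, 5, 4, 3, 2, 1, 0],
}
sketch = [0, 5, 10, 5, 0]  # a rough hill drawn with five points
results = query_by_sketch(sketch, index, k=2)
```

Because both the sketch and the stored curves are resampled and normalized, a coarse five-point drawing can be matched against longer series regardless of their absolute values. A linear scan is used here only for brevity; a real index structure would avoid comparing against every entry.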

Visual catalog with a search result visualization. The colormap marks time series patterns that are similar to the query in blue and dissimilar patterns in red. The color gradient is highly homogeneous because the cluster algorithm used preserves the topological ordering of the time series patterns.
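The coloring of the catalog by similarity can be illustrated with a small Python sketch. It assumes that each cluster cell has a distance to the query and that colors are interpolated linearly in RGB from blue (most similar) to red (least similar). The actual colormap and distance scaling of the application are not described in the source, so the names and values below are hypothetical.

```python
BLUE = (0, 0, 255)
RED = (255, 0, 0)

def lerp_color(c0, c1, t):
    """Linearly interpolate between two RGB colors for t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))

def similarity_colors(distances):
    """Map per-cell distances to RGB: smallest distance is blue, largest is red."""
    lo = min(distances.values())
    hi = max(distances.values())
    span = hi - lo
    colors = {}
    for cell, d in distances.items():
        # If all cells are equally distant, color everything blue.
        t = 0.0 if span == 0 else (d - lo) / span
        colors[cell] = lerp_color(BLUE, RED, t)
    return colors

# Hypothetical distances from a query to the prototypes of a 2x2 cell grid.
cell_distances = {(0, 0): 0.4, (0, 1): 1.2, (1, 0): 2.0, (1, 1): 3.6}
cell_colors = similarity_colors(cell_distances)
```

When neighboring cells hold similar patterns, their distances to the query differ only slightly, so neighboring colors differ only slightly as well, which is what produces a smooth gradient across the catalog.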

Filtering the visual catalog. The visualization shows how often certain keywords from the associated metadata occur. The white-yellow colormap indicates, for each cluster cell, the number of patterns that match the filter query (a density histogram). Yellow denotes cluster cells with high density.
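The density histogram described here amounts to counting, per cluster cell, the patterns whose metadata match the filter. The following Python sketch shows that counting step under simple assumptions: each pattern is assigned to exactly one cell, and its metadata is a set of keywords. The identifiers, keywords, and cell coordinates are hypothetical.

```python
from collections import Counter

def density_histogram(assignments, metadata, keyword):
    """Count, per cluster cell, the patterns whose metadata contains keyword."""
    counts = Counter()
    for pattern_id, cell in assignments.items():
        if keyword in metadata.get(pattern_id, ()):
            counts[cell] += 1
    return counts

# Hypothetical cell assignment of five patterns on a 2x2 grid.
assignments = {
    "p1": (0, 0), "p2": (0, 0), "p3": (0, 1), "p4": (1, 1), "p5": (1, 1),
}
# Hypothetical keyword sets taken from the associated metadata.
metadata = {
    "p1": {"station_a", "summer"},
    "p2": {"station_b", "summer"},
    "p3": {"station_a", "winter"},
    "p4": {"station_b", "winter"},
    "p5": {"station_a", "summer"},
}
hist = density_histogram(assignments, metadata, "summer")
```

Cells without matching patterns have a count of zero and would stay white; dividing each count by the largest count gives a value between 0 and 1 that can drive the white-to-yellow intensity.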


