Multilingual and Thesaurus-Based Search Tools for ILTER Data
Data in ILTER data archives are useful only if researchers can find them. At the international level, the challenge of searching datasets is compounded by the need to deal with multiple languages. A series of workshops in China explored the challenges of creating an information management system for the International LTER. During a 2008 workshop at Lake Taihu, participants recommended that each ILTER region host a Metacat to house network metadata as Ecological Metadata Language documents. Since that time, Metacat-based metadata catalogs have been established in Taiwan, Japan, Spain, Brazil, and Malaysia. A second workshop in Shanghai, China, in 2012 explored options for using a multilingual controlled vocabulary that would allow researchers to discover ILTER data on an international scale. The latter workshop led to the development of prototype search tools that incorporate both translation and search enrichment services. The search enrichment services automatically extend a search to more specific terms (i.e., "narrower terms"). Thus a search on "forest ecosystems" would also return datasets whose metadata include "boreal forests", "clearcuts", "forests", "old-growth forests", or "old growth". A translation layer contributes further search terms ("Bosque", "Foresta", "Forst", "Forêt", "Las", "Metsä", "Skog", "Wald", "森林", and "皆伐"). The prototype tools use EnvThes (an existing multilingual thesaurus that already fully incorporates the U.S. LTER controlled vocabulary) as their thesaurus, but are web-service based, allowing them to be incorporated into a wide array of customized searching applications. Prototype tools can be seen at: http://vocab.lternet.edu/ILTER
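
The search enrichment and translation steps described above can be illustrated with a minimal sketch. It is not the prototype's implementation and does not call the EnvThes web services: the in-memory dictionaries, the hierarchy among the terms, the term-to-translation assignments, and the function names (`narrower_closure`, `expand_query`, `search`) are all assumptions made for illustration.

```python
from collections import deque

# Toy thesaurus. The hierarchy and translation assignments below are
# hypothetical; they are NOT taken from EnvThes or its API.
NARROWER = {
    "forest ecosystems": ["forests", "clearcuts"],
    "forests": ["boreal forests", "old-growth forests"],
    "old-growth forests": ["old growth"],
}
TRANSLATIONS = {
    "forests": ["Bosque", "Forêt", "Wald", "森林"],
    "clearcuts": ["皆伐"],
}


def narrower_closure(term, narrower):
    """Return the term plus all transitively narrower terms (breadth-first)."""
    seen = [term]
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in narrower.get(current, []):
            if child not in seen:  # skip duplicates and guard against cycles
                seen.append(child)
                queue.append(child)
    return seen


def expand_query(term, narrower, translations):
    """Enrich a query with narrower terms, then append their translations."""
    terms = narrower_closure(term, narrower)
    expanded = list(terms)
    for t in terms:
        for translated in translations.get(t, []):
            if translated not in expanded:
                expanded.append(translated)
    return expanded


def search(term, datasets, narrower, translations):
    """Return names of datasets having a keyword that matches any expanded term."""
    wanted = {t.casefold() for t in expand_query(term, narrower, translations)}
    return [
        name
        for name, keywords in datasets.items()
        if any(keyword.casefold() in wanted for keyword in keywords)
    ]


# Hypothetical dataset keyword lists, as might be drawn from metadata records.
DATASETS = {
    "plot-a": ["Wald", "soil"],
    "plot-b": ["grasslands"],
    "plot-c": ["Old Growth"],
}

matches = search("forest ecosystems", DATASETS, NARROWER, TRANSLATIONS)
```

In this sketch a single query on "forest ecosystems" matches both the German-keyworded record and the record tagged only with a narrower term. Keeping the expansion separate from the matching step mirrors the web-service design, since a search application could obtain the expanded term list from a service and apply it to its own catalog.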