The situation is the same for everyone: managers, patent attorneys, product developers, scientists and entire R&D departments are drowning in documents but starving for relevant ones.
Distinguishing relevant from irrelevant documents is time-consuming and can in some cases even be a key success factor. This is especially true for R&D departments, where synonyms, ambiguous nomenclature and a growing number of publications impede the search for relevant documents. As a result, keyword-based search engines often lead to unsatisfying results. Moreover, currently available technologies require users to repeat their searches at regular intervals to include recently published material.
Approach
The objective is to develop Scienstein, a prototype of a document recommender system that focuses on research papers. By combining various existing and new technologies, it supports users such as R&D departments in identifying relevant documents.
In the Scienstein project, existing search and recommendation engine approaches are combined and further developed. Among the approaches researched are advanced citation analysis, collaborative document evaluation and document usage mining. Although some of these approaches have been known for decades, they have not previously been applied in the context of recommender systems. Other approaches, such as the ‘citation proximity index’ and the ‘in-text impact factor’, were developed specifically for this project.
Since the project started one year ago, its theoretical foundation has been elaborated and published. New approaches for automatically classifying documents, such as ‘Citation Proximity Analysis’ and ‘In-Text Impact Factor’, have been developed. Software development of the prototype has reached a stage that already allows for a public beta test in the second quarter of 2009.
Conclusion
The project addresses the fundamental problem of identifying relevant documents. It increases effectiveness and efficiency, especially in R&D, and eases collaboration across organizational borders by promoting the flow of information among participants. This reduces, for instance, the time-to-market for new products and saves resources.




