Article ; Online: AUTOMATIC SUMMARIZATION OF WEB FORUMS AS SOURCES OF PROFESSIONALLY SIGNIFICANT INFORMATION
Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki, Vol 16, Iss 3, Pp 482-
2016 Volume 496
Abstract: Subject of Research.The competitive advantage of a modern specialist is the widest possible coverage of informationsources useful from the point of view of obtaining and acquisition of relevant professionally significant information. Among these sources ... ...
Abstract | Subject of Research.The competitive advantage of a modern specialist is the widest possible coverage of informationsources useful from the point of view of obtaining and acquisition of relevant professionally significant information. Among these sources professional web forums occupy a significant place. The paperconsiders the problem of automaticforum text summarization, i.e. identification ofthose fragments that contain professionally relevant information. Method.The research is based on statistical analysis of texts of forums by means of machine learning. Six web forums were selected for research considering aspects of technologies of various subject domains as their subject-matter. The marking of forums was carried out by an expert way. Using various methods of machine learning the models were designed reflecting functional communication between the estimated characteristics of PSI extraction quality and signs of posts. The cumulative NDCG metrics and its dispersion were used for an assessment of quality of models.Main Results. We have shown that an important role in an assessment of PSI extraction efficiency is played by requestcontext. The contexts of requestshave been selected,characteristic of PSI extraction, reflecting various interpretations of information needs of users, designated by terms relevance and informational content. The scales for their estimates have been designed corresponding to worldwide approaches. We have experimentally confirmed that results of the summarization of forums carried out by experts manually significantly depend on requestcontext. We have shown that in the general assessment of PSI extraction efficiency relevance is rather well described by a linear combination of features, and the informational content assessment already requires their nonlinear combination. At the same time at a relevance assessment the leading role is played by the features connected with keywords, and at an informational content assessment characteristics of the post text in general come to the fore, ... |
---|---|
Keywords | professionally significant information ; web forums summarization ; relevant post ; informative post ; machine learning ; classification models ; regression models ; linear methods ; nonlinear methods ; latent Dirichlet allocation ; text connectivity ; social graph ; Optics. Light ; QC350-467 ; Electronic computers. Computer science ; QA75.5-76.95 |
Subject code | 006 |
Language | English |
Publishing date | 2016-07-01T00:00:00Z |
Publisher | Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University) |
Document type | Article ; Online |
Database | BASE - Bielefeld Academic Search Engine (life sciences selection) |
Full text online
More links
Kategorien
Inter-library loan at ZB MED
Your chosen title can be delivered directly to ZB MED Cologne location if you are registered as a user at ZB MED Cologne.