Getting the Most Out of Social Annotations for Web Page Classification

DocEng. 2009.

User-generated annotations on social bookmarking sites can provide promising metadata for web document management tasks such as web page classification. These annotations include diverse types of information, such as tags and comments. However, each kind of annotation differs in nature and in popularity. In this work, we analyze and evaluate how useful each type of social annotation is for classifying web pages over a taxonomy like that proposed by the Open Directory Project. We compare each type separately against content-based classification, and we also combine the different types of data to improve performance. Our experiments show encouraging results for the use of social annotations in this task, and we find that combining this metadata with web page content further improves the classifier's performance.
