Dataset: Social-ODP-2k9

Social-ODP-2k9 is a dataset created during December 2008 and January 2009 with data retrieved from the social bookmarking sites Delicious and StumbleUpon, the Open Directory Project and the Web. It is available for research purposes.


This dataset consists of 12,616 unique URLs, each with its corresponding social annotations.

The category of each URL, extracted from the Open Directory Project, is also available.

To learn more about the dataset generation process, please read the paper referenced at the end of this page.

Metadata Format

All the metadata for the dataset documents is provided in XML format, following this pattern:

    <hash>MD5 hash for document's URL</hash>
    <url>Document's URL</url>
    <category>ODP Category</category>
    <usercount>Number of users annotating it</usercount>
        <name>Tag name</name>
        <count># of users who annotated the tag</count>
      <review>A review from StumbleUpon</review>
      <note>A note from Delicious</note>
        <tag>Tags assigned by a user</tag>
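
As a rough illustration of how one record in this format might be read, here is a minimal Python sketch using only the standard library. The pattern above does not show the enclosing elements, so the wrapper names in the sample (`document`, `top_tags`, `top_tag`, `reviews`, `notes`, `user_annotations`) are hypothetical, as are the sample values and the helper names `url_md5` and `parse_record`. The parser looks fields up by their leaf tag names only, so it should not depend on the real wrapper names. It is also an assumption that the hash is the MD5 of the UTF-8 encoded URL string without further normalization.

```python
import hashlib
import xml.etree.ElementTree as ET


def url_md5(url):
    """Return the hex MD5 digest of a URL (UTF-8 encoding is an assumption)."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def parse_record(xml_text):
    """Extract the documented fields from one record, whatever wraps them."""
    root = ET.fromstring(xml_text.strip())

    # Any element holding both a <name> and a <count> child is read as a
    # tag entry; this avoids guessing the name of the enclosing element.
    tag_counts = {}
    for element in root.iter():
        name = element.findtext("name")
        count = element.findtext("count")
        if name is not None and count is not None:
            tag_counts[name.strip()] = int(count)

    return {
        "hash": root.findtext(".//hash"),
        "url": root.findtext(".//url"),
        "category": root.findtext(".//category"),
        "usercount": int(root.findtext(".//usercount")),
        "tag_counts": tag_counts,
        "reviews": [e.text for e in root.iter("review")],
        "notes": [e.text for e in root.iter("note")],
        "user_tags": [e.text for e in root.iter("tag")],
    }


# Made-up record: wrapper element names and all values are assumptions,
# not taken from the dataset.
SAMPLE_URL = "http://www.example.org/"
SAMPLE = f"""
<document>
  <hash>{url_md5(SAMPLE_URL)}</hash>
  <url>{SAMPLE_URL}</url>
  <category>Top/Computers/Internet</category>
  <usercount>120</usercount>
  <top_tags>
    <top_tag><name>web</name><count>35</count></top_tag>
    <top_tag><name>reference</name><count>12</count></top_tag>
  </top_tags>
  <reviews><review>A handy starting point.</review></reviews>
  <notes><note>Bookmarked for later.</note></notes>
  <user_annotations><tag>web</tag><tag>tools</tag></user_annotations>
</document>
"""

record = parse_record(SAMPLE)
print(record["url"], record["usercount"], record["tag_counts"])
# Check that the stored hash matches the URL under the stated assumption.
print(record["hash"] == url_md5(record["url"]))
```

If the real files nest the per-tag `name` and `count` inside an element that is itself called `tag`, the `user_tags` list would need a stricter lookup; the sketch would have to be adjusted once the actual wrapper structure is known.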

Legal Information

By downloading and using this dataset you acknowledge that:


Please consider citing the following paper if you use this dataset in your research:

Arkaitz Zubiaga, Raquel Martínez, and Víctor Fresno. Getting the Most Out of Social Annotations for Web Page Classification. Proceedings of DocEng 2009, the 9th ACM Symposium on Document Engineering, pp. 74-83, Munich, Germany. 2009.

  title={Getting the Most Out of Social Annotations for Web Page Classification},
  author={Zubiaga, Arkaitz and Mart{\'\i}nez, Raquel and Fresno, V{\'\i}ctor},
  booktitle={Proceedings of the 9th ACM symposium on Document engineering},