Movie Faceted Search Dataset


This dataset contains faceted metadata describing contemporary American films, along with relevance judgements by actual human users. It can be used for developing personalized faceted search interfaces, among other projects requiring rich, structured metadata. This dataset was used for the experiments presented at WWW '08.

This dataset is a mashup of three separate datasets: the Internet Movie Database (IMDb), the Netflix Prize dataset, and MovieLens.

Due to licensing restrictions, the mashup may not be redistributed as a single piece. Instead, you must download each dataset and agree to its license separately. You must cite this mashup as:
J. Koren, Y. Zhang, and X. Liu. Personalized Faceted Search. In Proceedings of the 17th International Conference on the World Wide Web (WWW '08). Beijing, China.
Additionally, you may need to cite:
Internet Movie Database Inc. Internet Movie Database, 2006.
Netflix. Netflix Prize, 2006.
GroupLens. MovieLens, 2006.
depending on which components you use.


Download each component of the mashup separately, and agree to each dataset's license. These components are available from:

Download the build scripts, and read the HOWTO_CLEAN file for instructions. After following the instructions, you should have several UTF-8 encoded XML files containing the movie metadata. Finally, convert the Netflix Prize and/or MovieLens user ratings for use with these XML files by running the make_ratings scripts.
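
Since the scripts may be temperamental, it can help to sanity-check the generated XML files before converting the ratings. The following is a minimal sketch, not part of the build scripts: the function name check_xml_files is hypothetical, and it assumes only that the output files end in .xml and sit in a single directory. It makes no assumption about the element names or schema the scripts produce; it only confirms that each file decodes as UTF-8, parses as well-formed XML, and reports the root tag and the number of top-level child elements.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def check_xml_files(directory):
    """Parse every *.xml file in `directory` and summarize it.

    Hypothetical helper, not part of the dataset's build scripts.
    Returns a dict mapping file name to either
    ("ok", root_tag, number_of_top_level_children) or ("error", message).
    """
    report = {}
    for path in sorted(Path(directory).glob("*.xml")):
        try:
            raw = path.read_bytes()
            # Decode strictly as UTF-8 first so encoding problems are
            # reported separately from XML syntax problems.
            raw.decode("utf-8")
            root = ET.fromstring(raw)
        except (UnicodeDecodeError, ET.ParseError) as exc:
            report[path.name] = ("error", str(exc))
        else:
            # len(root) counts the direct children of the root element.
            report[path.name] = ("ok", root.tag, len(root))
    return report
```

Calling check_xml_files on the output directory and printing the result gives a quick overview of which files, if any, need to be regenerated.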

WARNING: These scripts may be temperamental, and feedback is greatly appreciated.

If you have any questions, email jøñåthàn © sõê · üçsç · èdù