Collaborative Filtering Resources


Generally, collaborative filtering (CF) is any algorithm that filters information for a user based on a collection of user profiles. Users with similar profiles may share similar interests, so information can be filtered in or out for a given user according to the behavior of similar users.
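The idea above can be illustrated with a minimal user-based sketch. It is only one possible reading of the definition: the choice of cosine similarity, the similarity-weighted average, the function names, and the toy profiles are all assumptions made for illustration and do not come from any of the packages or papers listed on this page.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two sparse profiles (dicts: item -> rating)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def predict(profiles, user, item):
    """Similarity-weighted average of the ratings other users gave to `item`.

    Returns None when no other user has rated the item.
    """
    num = den = 0.0
    for other, ratings in profiles.items():
        if other == user or item not in ratings:
            continue
        sim = cosine_similarity(profiles[user], ratings)
        num += sim * ratings[item]
        den += abs(sim)
    return num / den if den else None


# Hypothetical toy profiles: alice's ratings look like bob's, not carol's.
profiles = {
    "alice": {"A": 5, "B": 3},
    "bob": {"A": 5, "B": 3, "C": 4},
    "carol": {"A": 1, "B": 5, "C": 2},
}
prediction = predict(profiles, "alice", "C")
```

In this toy data the prediction for alice on item C is pulled toward bob's rating rather than carol's, because bob's profile is the more similar one.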

User profiles can be collected either explicitly or implicitly. One can explicitly ask users to rate what they have used or purchased; such a profile is filled in explicitly by the users' ratings. An implicit profile is based on passive observation and contains the users' historical interaction data.

The most common use of CF is to make recommendations. That is why collaborative filtering is so closely associated with recommender systems in the literature, although CF is only one of several methods for building a recommender system.

On this page, I have collected some useful online materials for collaborative filtering research.


Contents

  • Research Software
  • Data Sets
  • CF Bibliography

Research Software

  • CoFE: a Java-based collaborative filtering engine. http://eecs.oregonstate.edu/iis/CoFE/
  • Suggest Top-N recommendation engine: it implements the item-based and user-based collaborative filtering algorithms. Library files only; no source code is included. http://www-users.cs.umn.edu/~karypis/suggest/
  • C/Matlab Toolkit: a Matlab implementation of several collaborative filtering algorithms, including memory-based (user-based) methods and the personality diagnosis method (see Pennock et al., 2000). http://www-2.cs.cmu.edu/~lebanon/IR-lab.htm
  • Matlab code for Canny's factor analysis based collaborative filtering. www.cs.berkeley.edu/~jfc/'mender/.
  • Taste is a collaborative filtering engine for Java. http://taste.sourceforge.net/

Data Sets

Explicit Rating Data Sets:

  • MovieLens Movie Rating Data Set. http://www.grouplens.org/
  • Jester Joke Rating Data Set. http://www.ieor.berkeley.edu/~goldberg/jester-data/
  • Book-Crossing Book Rating Data Set. http://www.informatik.uni-freiburg.de/~cziegler/BX/
  • Parliament Voting. http://ucdata.berkeley.edu:7101/new_web/VoteWorld/voteworld/datasets.html
  • Online Dating Data Set. http://www.ksi.ms.mff.cuni.cz/~petricek/data/ It contains user ratings from an online dating web site, libimseti.cz. Courtesy of Vaclav Petricek.

Implicit Rating Data Sets:

  • Audioscrobbler Music Playlist Data Sets. The Audioscrobbler dataset collects the playlists of users in an online community (http://www.audioscrobbler.com/) through plug-ins for the users' media players, such as Winamp, iTunes, and XMMS. The plug-ins send the title and artist of every song a user plays to the Audioscrobbler server, which updates the user's musical profile with the new songs. In the database, a user's profile is recorded as a set of co-occurrence pairs of the form {userID,itemID}. Each pair means that user {userID} has played song {itemID}. The dataset can be obtained at http://www.audioscrobbler.com/data/
  • AOL Web search query: http://www.gregsadetsky.com/aol-data/
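As a rough illustration of the {userID,itemID} pair format described for the Audioscrobbler data, the sketch below groups such pairs into per-user implicit profiles and counts how often two items appear in the same profile. The function names and sample pairs are hypothetical, and the actual file layout of the dataset is not assumed here.

```python
from collections import defaultdict
from itertools import combinations


def build_profiles(pairs):
    """Group (user_id, item_id) pairs into a set of played items per user."""
    profiles = defaultdict(set)
    for user_id, item_id in pairs:
        profiles[user_id].add(item_id)  # repeated plays collapse into one entry
    return dict(profiles)


def co_occurrence(profiles):
    """Count, for each unordered item pair, how many users played both items."""
    counts = defaultdict(int)
    for items in profiles.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return dict(counts)


# Hypothetical sample pairs; real IDs would come from the dataset itself.
pairs = [
    ("u1", "s1"), ("u1", "s2"),
    ("u2", "s1"), ("u2", "s2"), ("u2", "s3"),
    ("u1", "s1"),  # a repeated play of the same song
]
profiles = build_profiles(pairs)
counts = co_occurrence(profiles)
```

Using sets discards play counts, which keeps the profile purely binary; a dict of counters would be the natural alternative if play frequency should be kept as an implicit preference signal.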

Collaborative Filtering Bibliography

1. Pure Collaborative Filtering
Memory-based
  • Unifying User-based and Item-based Collaborative Filtering Approaches by Similarity Fusion (2006). Appeared in SIGIR 2006. http://ict.ewi.tudelft.nl/pub/jun/sigir06_similarityfuson.pdf
  • Scalable collaborative filtering using cluster-based smoothing (2005). http://doi.acm.org/10.1145/1076034.1076056
  • An automatic weighting scheme for collaborative filtering (2004). http://doi.acm.org/10.1145/1008992.1009051
  • Item-based Collaborative Filtering Recommendation Algorithms (2001). http://www10.org/cdrom/papers/519/
  • Evaluation of Item-Based Top-N Recommendation Algorithms (2001). http://www-users.cs.umn.edu/~karypis/publications/Papers/PDF/itemrs.pdf
  • A regression-based approach for scaling-up personalized recommender systems in e-commerce (2000). http://nas.cl.uh.edu/boetticher/ML_DataMining/vucetic.pdf
  • Collaborative Filtering by Personality Diagnosis: A Hybrid Memory- and Model-Based Approach (1999). http://research.microsoft.com/~horvitz/cfpd.htm
  • An algorithmic framework for performing collaborative filtering (1999).
  • Empirical Analysis of Predictive Algorithms for Collaborative Filtering (1998). http://research.microsoft.com/research/pubs/view.aspx?tr_id=166
  • Grouplens: Applying Collaborative Filtering to Usenet News (1997). http://www.ics.uci.edu/~pratt/courses/papers/p77-konstan.pdf
  • Social Information Filtering: Algorithms for Automating "Word of Mouth" (1995). http://citeseer.ist.psu.edu/195430.html
  • Grouplens: an open architecture for collaborative filtering of netnews (1994). http://doi.acm.org/10.1145/192844.192905
  • Using collaborative filtering to weave an information tapestry (1992). http://citeseer.ist.psu.edu/context/1727112/0
Relevance Models
  • A User-Item Relevance Model for Log-based Collaborative Filtering (2006). http://ict.ewi.tudelft.nl/pub/jun/ecir06.pdf
  • Relevance Feedback Models for Recommendation (2006). http://acl.ldc.upenn.edu/W/W06/W06-1653.pdf
Latent Class Models
  • A study of Mixture Models for Collaborative Filtering (2006). http://www.cs.cmu.edu/~lsi/Paper_JIR_Si.pdf
  • Two-way latent grouping model for user preference prediction (2005). http://eprints.pascal-network.org/archive/00001005/01/uai05.pdf
  • The Multiple Multiplicative Factor Model For Collaborative Filtering (2004). http://www.machinelearning.org/proceedings/icml2004/papers/363.pdf
  • Collaborative filtering: a machine learning perspective (2004). http://citeseer.ist.psu.edu/marlin04collaborative.html
  • Flexible mixture model for collaborative filtering (2003). http://www.hpl.hp.com/conferences/icml2003/papers/183.pdf
  • Latent class models for collaborative filtering (1999). http://portal.acm.org/citation.cfm?id=687583
Matrix Factorization
  • Fast Maximum Margin Matrix Factorization for Collaborative Prediction (2005). http://people.csail.mit.edu/jrennie/papers/icml05-mmmf.pdf
  • Eigentaste: A constant time collaborative filtering algorithm (2001). (Using PCA) http://www.ieor.berkeley.edu/~goldberg/pubs/eigentaste.pdf
  • Application of Dimensionality Reduction in Recommender System -- A Case Study (2000). http://citeseer.ist.psu.edu/sarwar00application.html
  • Collaborative filtering with privacy via factor analysis (2002). (Using factor analysis) http://www.cs.berkeley.edu/~jfc/papers/02/SIGIR02.pdf
  • Learning collaborative information filters (1998). (using SVD) http://www.ics.uci.edu/~pazzani/Publications/MLC98.pdf
Clustering
  • A maximum entropy approach to collaborative filtering in dynamic, sparse, high dimensional domains (2002). http://research.yahoo.com/publication/OR-2003-007.pdf
  • Clustering Methods for Collaborative Filtering (1998). http://citeseer.ist.psu.edu/ungar98clustering.html
  • A Formal Statistical Approach to Collaborative Filtering (1998). http://citeseer.ist.psu.edu/387035.html
  • A Scalable Collaborative Filtering Framework based on Co-clustering (2005). http://hercules.ece.utexas.edu/~srujana/papers/icdm05.pdf
  • Model-based Overlapping Co-Clustering. http://www.siam.org/meetings/sdm06/workproceed/Text%20Mining/shafiei16.pdf
Transitive Associations
  • Applying associative retrieval techniques to alleviate the sparsity problem in collaborative filtering (2004). http://doi.acm.org/10.1145/963770.963775
Trust Inference
  • Improving Collaborative Filtering with Trust-based Metrics (2006). http://doi.acm.org/10.1145/1141277.1141717
  • Alleviating the Sparsity Problem of Collaborative Filtering Using Trust Inferences. http://www.ics.forth.gr/isl/publications/paperlink/LNCS_Formatted_iTrust_34770228.pdf
Perceptron-based
  • Online ranking/collaborative filtering using the perceptron algorithm (2003).
2. Combining Content-based and Collaborative Filtering
  • A Unified Recommendation Framework Based on Probabilistic Relational Models (2005). http://www.stern.nyu.edu/ciio/WorkOnline/IS20042005/0217-01.pdf
  • Unifying Collaborative and Content-Based Filtering (2004). http://www.cs.brown.edu/people/th/publications.html
  • Collaborative Ensemble Learning: Combining Collaborative and Content-Based Information Filtering via Hierarchical Bayes (2003). http://www.dbs.informatik.uni-muenchen.de/~yu_k
  • Content-Boosted Collaborative Filtering (2001). http://citeseer.ist.psu.edu/507656.html
3. Distributed Collaborative Filtering
  • Personalization of a peer-to-peer television system (2006). http://ict.ewi.tudelft.nl/pub/jun/euroitv06.pdf
  • Distributed Collaborative Filtering for Peer-to-Peer File Sharing Systems (2006). http://ict.ewi.tudelft.nl/pub/jun/sac06.pdf
  • Pocketlens: Toward a Personal Recommender System (2004). http://doi.acm.org/10.1145/1010614.1010618
4. Other issues
  • Being Accurate is Not Enough: How Accuracy Metrics have hurt Recommender Systems (2006). http://www.grouplens.org/papers/pdf/mcnee-chi06-acc.pdf
  • A collaborative filtering algorithm and evaluation metric that accurately model the user experience (2004). http://doi.acm.org/10.1145/1008992.1009050
  • Evaluating collaborative filtering recommender systems (2004). http://doi.acm.org/10.1145/963770.963772

Related Information Retrieval Papers

In general, collaborative filtering is formulated as a self-contained problem, separate from classic text retrieval approaches such as RSJ models and language models. However, the collaborative filtering problem can be treated as a prediction problem: predicting the relevance between a user and an item (see the user-item relevance models). Under this view, collaborative filtering can benefit directly from current advances in these text retrieval models. We found the following papers interesting and relevant to the collaborative filtering problem.

  • Query Chains: Learning to Rank from Implicit Feedback (2005). http://www.cs.cornell.edu/%7Efilip/papers/Radlinski05QueryChains.pdf
  • On Event Spaces and Probabilistic Models in Information Retrieval (2005).
  • Probabilistic relevance models based on document and query generation (2003).
  • Novelty and redundancy detection in adaptive filtering (2002). http://doi.acm.org/10.1145/564376.564393
  • Term-specific smoothing for the language modeling approach to information retrieval: the importance of a query term (2002).
  • Exact Maximum Likelihood Estimation for Word Mixtures (2002). http://www-2.cs.cmu.edu/~yiz/research/paper/icml2002.ps
  • A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval (2001).
  • Document language models, query models, and risk minimization for information retrieval (2001).
  • Information Retrieval as Statistical Translation (1999). http://www.informedia.cs.cmu.edu/documents/irast-final.pdf
  • Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval (1998).
  • Relevance weighting of search terms (1976).

Related Machine Learning Papers

  • On Combining Classifiers (1998). http://ieeexplore.ieee.org/iel4/34/14695/00667881.pdf
  • On the Choice of Smoothing Parameters for Parzen Estimators of Probability Density Functions (1976).
  • Spectral clustering for multi-type relational data (2006). http://portal.acm.org/citation.cfm?id=1143918
  • Hierarchical Bayesian Models for Applications in Information Retrieval (2003). http://www.cs.berkeley.edu/~jordan/papers/jordan-valencia.pdf
  • A Hierarchical Latent Variable Model for Data Visualization (1998). http://citeseer.ist.psu.edu/bishop98hierarchical.html
  • Combining Labeled and Unlabeled Data with Co-Training (1998). http://citeseer.ist.psu.edu/47625.html
  • Enhancing Supervised Learning with Unlabeled Data (2000). http://citeseer.ist.psu.edu/goldman00enhancing.html