Recommendation System

Recommender System

2018-01-20. Category & Tags: Recommender System, Recommendation System, Literature, Book, Text, News

This is a detailed reproduction of ref.

Sunny Summary

The pipeline has three steps:

  1. Preprocessing, to extract: author, average sentence length, average word length, punctuation profile, sentiment scores, and part-of-speech profiles/tags (these last ones are computed only in the code and not written to the CSV).
  2. Content-based k-means clustering (on TF-IDF scores), to get: three levels (degrees) of clustering results.
  3. k-NN search over the results of steps 1 and 2, to get: recommendations (k=15 by default).
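To make steps 1 and 3 concrete, here is a minimal sketch in plain Python. It is not the reference code: the function names `style_features` and `nearest_books`, the three toy texts, and the choice of Euclidean distance are all assumptions for illustration. It computes only three of the stylometric features (average sentence length, average word length, punctuation rate) and leaves out sentiment, part-of-speech tags, and the cluster labels from step 2.

```python
import math
import re
import string


def style_features(text):
    """Return [avg sentence length in words, avg word length, punctuation marks per word]."""
    # Naive sentence split on runs of ., ! or ? (an assumption; nltk would be more robust).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return [0.0, 0.0, 0.0]
    punct = sum(1 for ch in text if ch in string.punctuation)
    return [
        len(words) / len(sentences),
        sum(len(w) for w in words) / len(words),
        punct / len(words),
    ]


def nearest_books(query, features, k=15):
    """Return up to k titles closest to `query` by Euclidean distance."""
    target = features[query]
    others = [(math.dist(target, vec), title)
              for title, vec in features.items() if title != query]
    return [title for _, title in sorted(others)[:k]]


# Hypothetical toy corpus: two terse texts and one long-winded one.
books = {
    "short_a": "I came. I saw. I won.",
    "short_b": "He ran. She hid. We sat.",
    "long_c": "Notwithstanding considerable difficulties, the expedition, "
              "undertaken reluctantly, eventually succeeded.",
}
features = {title: style_features(text) for title, text in books.items()}
recommendations = nearest_books("short_a", features, k=1)
```

With real data the feature columns would probably need scaling before the distance computation, since sentence length would otherwise dominate the other features.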

Preprocessing

pip2 install nltk
pip2 install twython  # optional? Without it I got a warning, not an error.
git clone
cd gutenberg/data

Download the books txt data (e.g. the 404M 3k data on Google Drive) and unzip it. Then set this folder as the txt_path.

WARN: run the following in the txt files folder:

rename 's/:/-/g' *
rename 's/,/\./g' *
rename "s/\"/'/g" *

The : character in file names causes problems for Spark's TF-IDF path handling. The , may cause issues in CSV files, as may " occurring together with ' in the same files.
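If the Perl-style rename utility is not available, the same three substitutions can be sketched in Python. This is an assumed equivalent, not part of the reference code; `safe_name` and `rename_all` are hypothetical helper names, and `txt_path` stands for the folder of unzipped txt files.

```python
import os


def safe_name(filename):
    """Apply the same three substitutions as the rename commands: ':'->'-', ','->'.', '"'->"'"."""
    return filename.replace(":", "-").replace(",", ".").replace('"', "'")


def rename_all(txt_path):
    """Rename every file in txt_path whose name changes under safe_name.

    Returns the list of (old, new) pairs that were actually renamed.
    Files whose new name already exists are skipped rather than overwritten.
    """
    renamed = []
    for old in sorted(os.listdir(txt_path)):
        new = safe_name(old)
        if new == old:
            continue
        target = os.path.join(txt_path, new)
        if os.path.exists(target):
            continue  # avoid clobbering an existing file
        os.rename(os.path.join(txt_path, old), target)
        renamed.append((old, new))
    return renamed
```

Calling `rename_all(txt_path)` once after unzipping should leave no file name containing `:`, `,` or `"`, except for any skipped name collisions.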
