
NLP Framework Comparison


Use spaCy when you need something ready to use in production. Use NLTK to try out new things.

ref en

en backup

in cn

cn backup

Machine Learning ML Books


MLAPP (Kevin Murphy), Machine Learning: A Probabilistic Perspective, is more comprehensive, insightful and interesting, and contains more "real" examples/problems. However, the material is presented somewhat out of order, which can make it difficult to follow as a first book.

Recommender System

This is a detailed reproduction of ref.

Sunny Summary

3 steps: preprocessing to extract the author, average sentence length, average word length, punctuation profile, sentiment scores, and part-of-speech profiles/tags (only in code, not written to the csv). content-wise k-means......
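
To make the preprocessing step concrete, here is a minimal sketch of three of the features listed above (average sentence length, average word length, punctuation profile), using only the Python standard library. The function name `style_features`, the regex-based sentence and word splitting, and the sample text are assumptions for illustration, not the code from the reference. Sentiment scores and part-of-speech tags are left out because they need an NLP library.

```python
import re
import string
from collections import Counter


def style_features(text):
    """Compute simple writing-style features for one document (illustrative sketch)."""
    # Rough sentence split: break on whitespace that follows '.', '!' or '?'.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Rough word split: runs of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    # Punctuation profile: how often each punctuation character occurs.
    punct = Counter(ch for ch in text if ch in string.punctuation)
    n_words = len(words)
    return {
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "punctuation_profile": dict(punct),
    }


# Hypothetical sample text, only to show the shape of the output.
features = style_features("Hello, world! This is a test.")
print(features)
```

One such dictionary per article could then be flattened into a csv row and passed on to the clustering step.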

Caffe Installation, Hello World

Note: tested on Ubuntu 16.04.1 as root (using /root). For newer Ubuntu versions (>= 17.04), check here.

Installation & Self-Tests

Use the installation script here.

//(Sunny only added the conditional USE_CUDNN=1; the rest is the same as ref. You may want to set USE_CUDNN to 0 if no GPU is used.)


Hadoop HDFS

(Update: tested v2.7.2 on Ubuntu 18)


OBS: security warning!

Note: change the core-site.xml and hdfs-site.xml content before running.

Note: change the HDUSER username before running.

# w/ java

curl | bash
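
As a hedged illustration of the note about editing core-site.xml and hdfs-site.xml, the sketch below renders the Hadoop `<configuration>` layout from a plain dictionary. The helper name `hadoop_config` and the values shown (`hdfs://localhost:9000`, replication factor 1) are assumptions for a single-node test setup, not the contents used by the install script. `fs.defaultFS` and `dfs.replication` are standard Hadoop property names.

```python
import xml.etree.ElementTree as ET


def hadoop_config(properties):
    """Render a Hadoop-style <configuration> document from a name -> value mapping."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Assumed single-node values; replace them with your own before running anything.
core_site = hadoop_config({"fs.defaultFS": "hdfs://localhost:9000"})
hdfs_site = hadoop_config({"dfs.replication": 1})

print(core_site)
print(hdfs_site)
```

Keeping the NameNode address on localhost, as assumed here, avoids exposing HDFS on a public interface, which is relevant to the security warning above.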


Install R in Ubuntu

UBUNTU 16.04

Option 1:

sudo apt-get install -y r-base r-base-dev libcurl4-openssl-dev libssl-dev build-essential

Option 2 (to get a newer R version):

With a PPA (ref: DigitalOcean).

sudo apt install apt-transport-https && \

sudo apt-key adv --keyserver --recv-keys E298A3A825......