Data Mining

2018-04-26. Category & Tags: Data Mining, DM, Machine Learning, ML

DEFs Relation Overview #

Most of the terms below are well defined, except “data mining”, which is the broadest concept.

Each term has both a broad definition (which includes its newly developed sub-areas) and a narrow definition (which covers only its traditional area).

Data Mining (DM) #

Data mining is the process of analyzing data to discover patterns and/or to derive reactions, solutions, etc. from them.

Data -> Patterns/Reactions/Solutions etc.

[pic]

Artificial Intelligence (AI) #

AI (aka Machine Intelligence (MI)) [DEF wider]: any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals [[Wikipedia_ai]].

AI [DEF narrowed]: a set of traditional AI methods, such as multi-agent systems, robotics, automated planning etc.

“AI” is now becoming a “catch all” term that captures the long term ambition of the field to build machines that emulate and then exceed the full range of human cognition [[stateof.ai-2018]].

Machine Learning (ML) #

Traditional math model: $$y = ax^2 + bx+ c$$

Machine learning model: $$y = ?$$
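The contrast between the two formulas can be sketched in code. This is only a minimal illustration: the coefficients, the training samples, and the choice of a 1-nearest-neighbour regressor are all assumptions made here, not something prescribed by these notes. The point is that the traditional model has its form written down in advance, while the learned model gets its mapping from examples.

```python
def traditional_model(x, a=1.0, b=-2.0, c=0.5):
    """Hand-specified formula: form and coefficients are known up front (values are illustrative)."""
    return a * x**2 + b * x + c


def fit_nearest_neighbour(samples):
    """'Train' a 1-nearest-neighbour regressor: it just memorises (x, y) pairs."""
    data = sorted(samples)

    def predict(x):
        # Answer with the y of the training point whose x is closest to the query.
        _, nearest_y = min(data, key=lambda pair: abs(pair[0] - x))
        return nearest_y

    return predict


# Hypothetical training data, generated from the formula purely for illustration.
train = [(i / 10, traditional_model(i / 10)) for i in range(-30, 31)]
learned_model = fit_nearest_neighbour(train)

print(traditional_model(1.03))  # computed from the known formula
print(learned_model(1.03))      # looked up from the nearest training example
```

The learned model never sees the coefficients; it only approximates the relation as well as its training data allows.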

Three aspects commonly used to categorize machine learning algorithms are: label status, model training dynamics, and model training location.

supervised learning
unsupervised learning
semi-supervised (classification) learning
Typically, only a small portion of the data is labelled [ref]; it is usually used for classification.
active learning (algorithms actively query an oracle)
reinforcement learning (see more on /rl)
Reinforcement learning tries to balance exploration (of unknown knowledge) against exploitation (of what is already known).

[ref]
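The exploration/exploitation balance can be illustrated with an epsilon-greedy strategy on a multi-armed bandit. This is a sketch under assumptions: epsilon-greedy is just one simple strategy chosen here for illustration, and the arm means, noise level, and function name are hypothetical.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, noise=0.1, seed=0):
    """Play a bandit: explore with probability epsilon, otherwise exploit the best estimate."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # exploration
        else:
            arm = max(range(n_arms), key=lambda i: estimates[i])  # exploitation
        reward = rng.gauss(true_means[arm], noise)
        counts[arm] += 1
        # Incremental update of the mean reward of the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return counts, estimates


# Hypothetical arms; the agent does not know these means in advance.
counts, estimates = epsilon_greedy([0.1, 0.5, 0.9])
print(counts)
print(estimates)
```

With `epsilon=0` the agent can lock onto the first arm that happens to pay off; a small positive `epsilon` keeps it sampling the other arms so that better ones can still be found.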

online/offline learning
Online algorithms update the model dynamically while doing inference/evaluation.
Offline models are not updated while processing the input.
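The difference can be sketched with two toy "models" that each predict a single value. This is a minimal illustration; the class names, the `observe` method, and the data are assumptions made here.

```python
class OfflineMean:
    """Fitted once on a training batch; never changes while processing input."""

    def __init__(self, training_values):
        self.mean = sum(training_values) / len(training_values)

    def predict(self):
        return self.mean

    def observe(self, value):
        pass  # offline: new input does not update the model


class OnlineMean:
    """Keeps updating its running mean with every value it processes."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def predict(self):
        return self.mean

    def observe(self, value):
        self.count += 1
        self.mean += (value - self.mean) / self.count


history = [1.0, 2.0, 3.0]        # hypothetical training data
offline = OfflineMean(history)
online = OnlineMean()
for value in history:
    online.observe(value)

for value in [10.0, 10.0, 10.0]:  # hypothetical input stream after deployment
    offline.observe(value)
    online.observe(value)

print(offline.predict())  # still the mean of the training batch
print(online.predict())   # has moved towards the new data
```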
centralized / edge / federated learning
These terms are currently used inconsistently, so their definitions should be made clear before discussing them.

Centralized learning: both training and inference/evaluation are done on center servers or clusters.
Edge computing: (usually) means that edge devices, such as those in IoT, have inference/evaluation capabilities, but the training is still done centrally.
Federated learning: (usually) means that part of the training is also done on edge devices, which produces partial model updates that are either sent to centralized servers to update the global model, or exchanged among the devices so that they can update their own models.

[pic]
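The server-based variant of federated learning described above can be sketched as follows. This is a toy illustration in the spirit of federated averaging, with assumptions stated openly: the one-parameter model y = w * x, the learning rate, the device data, and the function names are all hypothetical.

```python
def local_update(weight, data, lr=0.05, epochs=5):
    """Run a few gradient-descent steps on one device's private data for y = w * x."""
    w = weight
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(global_weight, devices):
    """One round: every device trains locally, the server averages the results."""
    total = sum(len(data) for data in devices)
    updates = [(local_update(global_weight, data), len(data)) for data in devices]
    # Average weighted by sample count; only weights are shared, never raw data.
    return sum(w * n for w, n in updates) / total


# Hypothetical edge devices; all samples follow y = 3 * x.
devices = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]

w = 0.0  # initial global model
for _ in range(20):
    w = federated_round(w, devices)

print(w)  # approaches 3.0
```

In the centralized setting the server would instead collect all samples and run `local_update` once on the pooled data; in the peer-to-peer variant the devices would exchange and average weights among themselves without a central server.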