Data Mining
Read also:
- A 20-Year Community Roadmap for Artificial Intelligence Research in the US - 109 pages, AAAI. (CN intro: A 20-Year Community Roadmap for AI Research in the US (discussion draft))
- Best Paper Awards in Computer Science (since 1996) (cn: DataWhale)
- AI Benchmark: web, cn intro
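- Cheatsheets AI: the most complete mega-list of practical material, with 100+ cheat sheets (github)
- Truly valuable machine learning courses, as recommended by enthusiastic netizens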
- Papers with code/data:
- AI 研习社 paper.yanxishe.com
- PapersWithCode.com (highly automated), esp. SotaBench.com (CN intro)
- g/zziz/pwc
- DatasetList.com by Nikola for CV, NLP, self-driving.
DEFs Relation Overview #
Most of these terms are well defined; the exception is “data mining”, the broadest concept.
Each term has both a broad definition (which includes its newly developed sub-areas) and a narrow definition (which covers only its traditional area).
Data Mining (DM) #
Data mining is the process of analyzing data to discover patterns and/or to act on them, for example by triggering reactions or providing solutions.
Data -> Patterns/Reactions/Solutions etc.
[pic]
Artificial Intelligence (AI) #
AI (aka Machine Intelligence (MI)) [DEF wider]: any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals [[Wikipedia_ai]].
AI [DEF narrowed]: a set of traditional AI methods, such as multi-agent systems, robotics, automated planning etc.
“AI” is now becoming a “catch all” term that captures the long term ambition of the field to build machines that emulate and then exceed the full range of human cognition [[stateof.ai-2018]].
Machine Learning (ML) #
Traditional math model: $$y = ax^2 + bx+ c$$
Machine learning model: $$y = ?$$
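The contrast above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the coefficients `a`, `b`, `c`, the helper names, and the choice of a k-nearest-neighbour predictor as the "learned" model are all assumptions made for the example. The traditional model has its form and coefficients written down by hand; the learned model only ever sees (x, y) samples.

```python
def traditional_model(x, a=1.0, b=-2.0, c=3.0):
    """Hand-specified quadratic; the coefficients are illustrative assumptions."""
    return a * x ** 2 + b * x + c


def fit_knn(xs, ys, k=3):
    """Return a predictor that averages the k training targets nearest to x.

    The predictor never sees the formula, only the (x, y) samples.
    """
    data = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(data, key=lambda pair: abs(pair[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


# Training data: here it happens to be generated by the quadratic,
# but the learner is not told that.
xs = [i / 10 for i in range(-30, 31)]
ys = [traditional_model(x) for x in xs]
learned_model = fit_knn(xs, ys)

print(traditional_model(1.5), learned_model(1.5))
```

The two printed values should be close but not identical: the learned model approximates the unknown function from nearby samples instead of evaluating a known formula.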
Three aspects are commonly used to categorize machine learning algorithms: label status, model training dynamics, and model training location.
- supervised learning
- unsupervised learning
- semi-supervised (classification) learning
Typically, only a small amount of labelled data is available [ref]; this setting is usually used for classification.
- active learning (algorithms actively query an oracle)
- reinforcement learning (see more on /rl)
Reinforcement learning tries to balance exploration (of what is still unknown) against exploitation (of what has already been learned).
[ref]
- online/offline learning
Online algorithms update the model dynamically while doing inference/evaluation.
An offline model is not updated while it processes input.
- centralized / edge / federated learning
These terms are currently used inconsistently, so their definitions should be made clear before discussing them.
Centralized learning: both training and inference/evaluation are done on central servers or clusters.
Edge computing: (usually) means that edge devices, such as those in IoT, have inference/evaluation capabilities, but training is still done centrally.
Federated learning: (usually) means that part of the training is also done on edge devices, producing partial model updates that are either sent to central servers to update the global model, or shared among the devices to update their own models.
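The server-side variant of federated learning described above can be sketched as follows. This is a simplified illustration under stated assumptions: the model is a single weight `w` in y = w * x, the three client datasets are made up, and the function names `local_update` and `federated_round` are hypothetical. Averaging client results weighted by data size is one common choice, not the only one.

```python
def local_update(w, data, lr=0.01, epochs=5):
    """Run a few gradient-descent steps on one device's private data.

    Model: y = w * x, loss: mean squared error. Raw data never leaves the device.
    """
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(global_w, clients):
    """One round: every client trains locally, the server averages the results.

    Each client's weight counts in proportion to its number of samples.
    """
    total = sum(len(data) for data in clients)
    return sum(local_update(global_w, data) * len(data) for data in clients) / total


# Made-up private datasets on three edge devices, all consistent with w = 3.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (4.0, 12.0), (1.5, 4.5)],
]

w = 0.0
for _ in range(50):
    w = federated_round(w, clients)

print(w)
```

Only the locally trained weight is sent to the server in each round, which is the "partial model update" in the description above.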
[pic]
[pic-ref]
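The exploration/exploitation balance noted for reinforcement learning above can be illustrated with an epsilon-greedy strategy on a multi-armed bandit. This is a minimal sketch, not a full reinforcement learning setup: the arm means, the noise level, the value of `epsilon`, and the function name are all assumptions chosen for the example.

```python
import random


def epsilon_greedy(true_means, steps=2000, epsilon=0.1, noise=0.1, seed=0):
    """Pull bandit arms, exploring with probability epsilon, else exploiting."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms          # how often each arm was pulled
    estimates = [0.0] * n_arms     # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                         # explore
        else:
            arm = max(range(n_arms), key=lambda i: estimates[i])  # exploit
        reward = rng.gauss(true_means[arm], noise)
        counts[arm] += 1
        # Incremental update of the running mean.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


counts, estimates = epsilon_greedy([0.2, 0.5, 0.9])
print(counts, estimates)
```

With `epsilon = 0` the agent would only exploit and could stay on the first arm it tried; with `epsilon = 1` it would only explore and never use what it has learned.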
Deep Learning (DL) #
Deep learning uses deep (artificial) neural networks (DNNs) to perform machine learning tasks.
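To make "deep" concrete, here is a minimal sketch of a forward pass through several stacked layers. The layer sizes, the weights, and the use of ReLU between layers are assumptions for illustration; a real network would learn its weights from data and would normally be built with a dedicated library.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def dense(values, weights, biases):
    """One fully connected layer; weights holds one row per output unit."""
    return [
        sum(w * v for w, v in zip(row, values)) + b
        for row, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Apply each (weights, biases) layer in turn, with ReLU between layers."""
    for i, (weights, biases) in enumerate(layers):
        x = dense(x, weights, biases)
        if i < len(layers) - 1:   # no activation after the output layer
            x = relu(x)
    return x


# Hand-picked weights, for illustration only.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),   # 2 inputs -> 2 hidden units
    ([[1.0, 1.0], [-1.0, 2.0]], [0.0, 1.0]),   # 2 hidden -> 2 hidden units
    ([[1.0, 1.0]], [0.5]),                     # 2 hidden -> 1 output
]

print(forward([2.0, 1.0], layers))  # prints [6.0]
```

Depth here simply means that the output is a composition of several such layers rather than a single one.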
See also:
Venn Diagram #
The picture is from [ref], modified by Sunny.
Algorithm Categorizations/Relations #
pic ref
pic ref
pic ref
More #
- Federated Learning, a step closer towards confidential AI
- How To Backdoor Federated Learning
- (This note was originally for the course DL for CV as introduction slides.)