Machine Learning

Time Series Dataset to Tabular Dataset for Machine Learning / Deep Learning

2026-03-03. Category & Tags: Machine Learning, Tabular Dataset, Time Series

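Introduction # Prerequisite for this article: familiarity with the basic concepts of machine learning; see /ml-do-it. Tabular datasets (see the example figure; ref) are the format accepted by most machine learning algorithms, so converting all kinds of data into tables is an important way of thinking. Keywords for tabularization methods: lag OR lead (search for: lag OR lead); the corresponding Python function is shift.

Time-series data, as shown below, generally consist of timestamps and the value recorded at each timestamp. In most datasets, adjacent samples are equally spaced in time, for example 10 minutes per sampling time unit (datasets sampled at lower frequencies may use 1 hour, 1 day, and so on).

| Timestamp or sequence index | Value |
|-----------------------------|-------|
| 2026-01-01 00:00 | 601 |
| 2026-01-01 00:10 | 602 |
| 2026-01-01 00:20 | 603 |
| 2026-01-01 00:30 | 604 |
| 2026-01-01 00:40 | 605 |
| 2026-01-01 00:50 | 606 |
| 2026-01-01 01:00 | 607 |
| 2026-01-01 01:10 | 608 |
| 2026-01-01 01:20 | 609 |
| 2026-01-01 01:30 | 610 |
| 2026-01-01 01:40 | 611 |
| 2026-01-01 01:50 | 612 |
| 2026-01-01 02:00 | 613 |
| 2026-01-01 02:10 | 614 |
| 2026-01-01 02:20 | 615 |
| 2026-01-01 02:30 | 616 |
| 2026-01-01 02:40 | 617 |
| 2026-01-01 02:50 | 618 |
| 2026-01-01 03:00 | 619 |

For this time series, the task is to use the existing data to predict the value one time unit into the future. It is now 03:00, and we want to predict the value at 03:10. As time advances, say to 03:30, the time point to predict advances as well, to 03:40. We therefore need a model that works at every time point (not only for predicting 03:10), i.e., a mapping from "the state x at some moment" to "the value y one time unit later". ...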
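The lag/lead idea and the `shift` function mentioned above can be sketched roughly as follows. This is only an illustration, not necessarily the method the full article uses: the helper name `to_tabular`, the window of 3 values per state, and the column names are assumptions made for the example.

```python
import pandas as pd

# Rebuild the example series: 19 samples, 10 minutes apart, values 601..619.
timestamps = pd.date_range("2026-01-01 00:00", periods=19, freq="10min")
series = pd.DataFrame({"value": range(601, 620)}, index=timestamps)


def to_tabular(df, column="value", n_lags=3, horizon=1):
    """Turn one time-series column into a feature/label table.

    n_lags and horizon are illustrative parameters (assumptions):
    each row holds the current value plus the n_lags - 1 previous values
    as the state x, and the value `horizon` steps ahead as the label y.
    """
    table = pd.DataFrame(index=df.index)
    # Lag features: shift(k) moves the value from k steps earlier onto this row.
    for k in range(n_lags - 1, -1, -1):
        table[f"{column}_lag{k}"] = df[column].shift(k)
    # Lead label: shift(-horizon) pulls the future value back onto this row.
    table["y"] = df[column].shift(-horizon)
    # Rows at either end lack a full window or a label; drop them.
    return table.dropna()


table = to_tabular(series)
print(table.head())

# State available at 03:00, i.e. the input for predicting 03:10.
latest_state = series["value"].iloc[-3:].to_list()
```

Each row of `table` is one training example that any tabular learner can consume; the last timestamp has no label yet, so it is dropped from the training table and used only as the input for the next prediction.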

English-Chinese Terminology: English-Chinese Translations of Common Vocabulary and Technical Terms

2021-12-15. Category & Tags: Writing, English, Chinese, Translation, Engineering, Artificial Intelligence, Machine Learning, Terminology

Common translations (including recognition errors, in the translation software's format): Claude=Cloud; Claude=clawed; Superpower=Superpower; Superpowers=Superpowers; 插件=plugin; supabase=superbase; supabase=super base; 对话=session; 前斜线=forward slash; 斜线=slash; 最大 token 长度=max token; 终端 Terminal=Terminal; 智能体 Agent=agent; 更新=update; 自动补全=auto complete; VS Code=vsco dor; 临时处理 ad-hoc=ad-hoc; 临时处理 ad-hoc=ad hoc; 训练=train; 训练数据=training data; 训练集=training set; 训练集=training dataset; 验证=evaluate; 验证数据=evaluation data; 验证集=evaluation dataset; 验证集=evaluation set; 测试=test; 测试数据=testing data; 测试集=testing dataset; 测试集=testing set; NPM run dev=npm run dev; NPM run build=npm run build. English | Chinese: ad-hoc, ad hoc (fix/method) | 特设的(修改/方法); training (dataset) | 训练(集) n. ...

AI Cloud

2019-11-01. Category & Tags: Cloud Platform, Machine Learning, ML, Deep Learning, DL

“if you plan to use deep learning extensively (>150 hrs/mo), building your own deep learning workstation might be the right move.” [medium] Baidu AI Studio (only for PaddlePaddle); Paperspace (cooperating with fast.ai); Google Colab (cooperating with fast.ai); vast.ai (C2C/P2P sharing, very cheap, but takes a lot of time to init/load/unload); Kaggle (max 6h, good GPU but complex steps to use); MS Azure; Amazon; FloydHub (special CLI). ref: CN intro: Paperspace vs. Colab, 2019 Best Deals in Deep Learning Cloud Providers, 2018 Comparing Cloud GPU Platforms

ML Books Machine Learning (non-DL)

2018-07-07. Category & Tags: Machine Learning, Artificial Intelligence, Book

See also: (hands on) Basics - Machine Learning - Just-do-it, inc. books with codes Math Books Theory, Papers of Deep Learning DL > DEEP LEARNING BOOKS ( & CODES ) Favoured # DSML (Kroese, Botev et al. - Chapman & Hall/CRC 2019-11) 《Data Science and Machine Learning: Mathematical and Statistical Methods》. With public datasets, code and pdf online. ISBN 9781138492530. Official English PDF. Chinese edition: Dirk P. Kroese (迪尔克·P. 克洛泽, Australia) et al., translated by Yu Junwei (于俊伟) and Liu Nan (刘楠), 《数据科学与机器学习: 数学与统计方法》, China Machine Press (机械工业出版社), ISBN 9787111711391, 2023. NNDL, Neural Networks and Deep Learning. ...

Data Mining

2018-04-26. Category & Tags: Data Mining, DM, Machine Learning, ML

Read also: A 20-Year Community Roadmap for Artificial Intelligence Research in the US - 109 pages, AAAI. (CN intro: A 20-Year Community Roadmap for AI Research in the US (discussion draft)) Best Paper Awards in Computer Science (since 1996) (cn: DataWhale) AI Benchmark: web, cn intro Cheatsheets: the most complete mega-list of AI resources, with 100+ cheat sheets (github) Truly valuable machine learning courses recommended by community members Papers with code/data: AI Yanxishe (AI 研习社) paper.yanxishe.com PapersWithCode.com (highly automated), esp. SotaBench.com (CN intro) g/zziz/pwc DatasetList.com by Nikola for CV, NLP, self-driving. DEFs Relation Overview # Most of these terms are well defined, except “data mining”, which is the broadest concept. ...

Machine Learning (ML) (the main item, all in one index)

2018-04-17. Category & Tags: Machine Learning

See also: /ml-do-it: Machine Learning - Just-do-it (hands on) Basics /ml-books Learning Machine Learning, ML Books (may w/ Codes), after some hands-on experiences /ml-understand: ML Understandability / Interpretation / Comprehensibility & Causality dl-theory /dl-do-it: Deep Learning Hands On /Math-Books: Theory, Papers of Deep Learning DL

ML Interpretation, Comprehensibility & Causality

2018-04-17. Category & Tags: Machine Learning, Interpretability, Comprehensibility, Understandability, Explain

See also: /ml-: Machine Learning (ML) (the main item, all in one index) DL-theory > ## cnn visualization/comprehensibility Interpretable machine learning (What) The importance of interpretability (Why) How exactly to interpret (How) Insights which can be extracted from the models Permutation Importance Partial Dependence Plots SHAP Values SHAP Values in Advance LIME (all-purpose), Tree interpreter, etc. Why should I trust you, my CNN model? Thoughts on the interpretability of CNN models, inc.: CAM, Grad-CAM, LIME. Book by Christoph Molnar: Interpretable Machine Learning – A Guide for Making Black Box Models Explainable (GitHub), (CN) ...

ML - Just-do-it (hands on) Basics, of Machine Learning

2017-03-13. Category & Tags: Machine Learning

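See also: ML books. This article collects some useful materials for non-theory learners like engineers. Quick Start # ref For modeling in the general sense, as in the figure above, a series of computational steps (for example, the steps of the least-squares algorithm) applied to the given sampled dataset (the blue data points) can fit a mapping/function from the independent variable x to the dependent variable y, for example $y=0.5 x^2 + 15000$. In machine learning, this concrete mapping/function, which computes the predicted value y directly from the input data (independent variables) x=(x1, x2 … xn), is called the model, and the procedure that computes this model from the given sampled dataset is called the algorithm. (Both input and output may consist of multiple variables, e.g., y=(y1, y2…yn).) Most machine learning algorithms accept data as a tabular dataset, for example a table of house prices: ref A model can use several easily obtained "features" of an object (here, a house), such as area, number of bedrooms, number of bathrooms, location score, and house age, to predict one or more attributes that are not easy to obtain directly, such as the house price. The predicted attribute is called the "label" or "target". Features can be understood as the independent variable x, and the label/target as the dependent variable y. If the label is a continuous value (e.g., house price), the algorithm that computes a model from this dataset is a regression algorithm; if it is a category (e.g., the species of an iris flower), it is a classification algorithm. (PS: ordinal classification algorithms, where the labels are ordered categories, unsupervised learning algorithms, which use no labels, and so on can be ignored for now.) Classic Machine Learning # Hello-world: Titanic survival prediction (classification). Simple algorithms, search for: Titanic machine learning sklearn. sklearn is the scikit-learn package, which contains many commonly used algorithms, for example (reproducing any one of these is enough): Github code, Gitee bak AaronJny/simple_titanic: a simple exercise with scikit-learn on the Kaggle Titanic dataset. You can also draw diagrams with graphviz: plot 1, plot 2. Advanced algorithms, search for: xgb house price prediction (regression). XGB, i.e. xgboost, is general-purpose (handles both classification and regression, allows missing values) and high-performance, which explains its lasting popularity. For example: Regression by XGBoost; Regression prediction in practice with XGBoost, bak; XGBoost regression task: house price prediction - Heywhale. ...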
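The distinction between "algorithm" and "model" in the quick start can be sketched with a small least-squares fit. The data here are synthetic, generated around $y=0.5 x^2 + 15000$ with made-up noise purely for illustration; they are not the data points from the article's figure, and NumPy's `polyfit` is just one convenient least-squares routine.

```python
import numpy as np

# Synthetic sample points (assumption): the curve y = 0.5 x^2 + 15000 plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 100, 50)
y = 0.5 * x**2 + 15000 + rng.normal(0, 20, size=x.shape)

# The "algorithm": a least-squares fit of a degree-2 polynomial to the samples.
coeffs = np.polyfit(x, y, deg=2)  # coefficients ordered [a, b, c] for a*x^2 + b*x + c

# The "model": the fitted function that maps a new input x to a predicted y.
model = np.poly1d(coeffs)
prediction = model(120.0)

print("fitted coefficients:", coeffs)
print("prediction at x = 120:", prediction)
```

The fitted coefficients should land close to 0.5 and 15000, and the resulting `model` can be evaluated at inputs that were not in the sample, which is what makes it usable for prediction.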

DMML Tools Trend & Relationship 2016

2016-06-16. Category & Tags: DMML, Tools, Data Mining, Machine Learning, Artificial Intelligence

This is a summary of KDnuggets blogs: here and here. Pictures are modified for my own notes. Tools Associations # sunny’s conclusion # Possible framework 1: Hadoop + Spark + Python + scikit. Possible framework 2: SQL + Excel + Tableau. Try NOT to use: RapidMiner, KNIME (in any situation). Big Data Related Tools # Deep Learning Related Tools # Big_Data- / Deep_Learning-Related Tools # Conclusion 1: Big_Data & DL are positively related. ...