Kubernetes (k8s)
See also: A Complete Guide to Installing K3s Using China-Domestic Mirrors (使用国内资源安装 K3s 全攻略)
Install a Single Node for Testing #
nix #
# server
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 && sudo install minikube-linux-amd64 /usr/local/bin/minikube
minikube start
# client, via snap (snap will NOT work in Docker-like containers)
sudo snap install kubectl --classic
# or, client via apt
sudo apt-get update && sudo apt-get install -y gnupg2 apt-transport-https && \
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add - && \
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list && \
sudo apt-get update && \
sudo apt-get install -y kubectl
[ref]
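A quick sanity check after installing (a minimal sketch; assumes minikube has already been started as above):
kubectl version --client   # confirm the client binary works
kubectl get nodes          # the minikube node should show as Ready
kubectl cluster-info       # prints the API server address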
win (dated, please use WSL2) #
Note that the commands below are old (spring 2019); use WSL2 with Docker Desktop instead.
Set-ExecutionPolicy Bypass -Scope Process -Force; iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1')) # install choco
choco install minikube -y # "server", which can be replaced by k8s in Docker-Desktop.
choco install kubernetes-cli -y # "client" / "controller".
minikube start --vm-driver hyperv --hyperv-virtual-switch "Default Switch" # needs admin. it will use virtualbox by default, we want to use hyperv.
# A new virtual switch can also be created, see ref.
Note: Docker Desktop is easier to use (Kubernetes can be enabled with one click), but then we do not have the minikube command.
config #
There is a config file on the kube master, in the home directory as $HOME/config. Copy it to the local machine (client) as $HOME/.kube/config.
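For example (a minimal sketch; the master hostname is a placeholder and the source path follows the note above):
mkdir -p ~/.kube
scp <master-host>:~/config ~/.kube/config   # adjust the source path if your cluster stores the config elsewhere
kubectl config view --minify                # verify the client can read the copied config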
basic tests #
Tip: most docker ... commands can be changed to kubectl ... [ref].
kubectl config get-contexts
kubectl get po
kubectl run hello-minikube --image=k8s.gcr.io/echoserver:1.10 --port=8080
kubectl get po
kubectl expose deployment hello-minikube --type=NodePort
# note: what kubectl run created is removed with kubectl delete deployment, not kubectl delete pod.
kubectl delete deployment DEPLOYMENT [-n NAMESPACE]
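To check that the exposed NodePort service actually answers (a minimal sketch; assumes the hello-minikube deployment from above is still running):
kubectl get svc hello-minikube                    # note the assigned NodePort
minikube service hello-minikube --url             # prints a reachable URL
curl "$(minikube service hello-minikube --url)"   # the echoserver should reply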
Handy Commands #
port-forwarding to pods #
kubectl port-forward <pod-name> <local-port>:<pod-port>
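For example, with a hypothetical JupyterHub pod named jupyter-xxxx:
kubectl port-forward jupyter-xxxx 8888:8888 [-n NAMESPACE]   # then open http://localhost:8888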
get shell/bash access to pods #
kubectl exec -it <pod-name> -- /bin/bash
autocompletion #
Assuming bash-completion is already installed (apt-get install bash-completion), we can do either of:
# by normal users:
printf 'if [ -x "$(command -v kubectl)" ]; then\n source <(kubectl completion bash)\nfi' >>~/.bashrc
# or by sudo:
kubectl completion bash |sudo tee /etc/bash_completion.d/kubectl
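Optionally, once the completion above is sourced, it can also be attached to a short alias (a small sketch, per the standard kubectl completion mechanism):
echo 'alias k=kubectl' >>~/.bashrc
echo 'complete -o default -F __start_kubectl k' >>~/.bashrc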
Storage #
configmap #
automated dynamic mounting
JimmySong.io - kubernetes-handbook
matthewpalmer.net - ultimate-configmap-guide
kubectl edit configmap
...
kubectl scale deployment/<my_deployment> --replicas=0 && \
kubectl scale deployment/<my_deployment> --replicas=1
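A minimal sketch of the edit-then-restart cycle (the configmap name my-app-config is hypothetical):
kubectl create configmap my-app-config --from-literal=LOG_LEVEL=info   # create
kubectl get configmap my-app-config -o yaml                            # inspect
kubectl edit configmap my-app-config                                   # change values, then restart the pods as above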
persistent volumes (pv) / mounted storage #
Here uses NFS as an example, more examples can be found in Redhat docs.
First, create the PV & PVC by kubectl create -f pv_pvc.yaml.
See a YAML example in a kube user issue which combines a PV (PersistentVolume) and a PVC (PersistentVolumeClaim) [def].
A detailed explanation is in Red Hat's own system doc (bak).
To mount automatically when pods start, a kind: PodPreset [def] can be used; see examples in a kube user issue and the kube doc example (though the latter is a pod YAML).
A prerequisite is that the pod is started with a label, e.g. preset-working: enabled, so that spec: selector: matchLabels will match.
Mount points are specified in volumeMounts.
In spec: volumes:, the name should match the name in its own volumeMounts, and the persistentVolumeClaim: claimName: should match the name in kind: PersistentVolumeClaim -> metadata: name: (the PVC's name).
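A minimal combined PV + PVC manifest sketch for the NFS case (server address, export path, and names are hypothetical; storageClassName is left empty to force static binding rather than dynamic provisioning):
cat <<'EOF' | kubectl create -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  nfs:
    server: 192.168.1.100   # hypothetical NFS server
    path: /exports/data     # hypothetical export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 10Gi
EOF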
Move / Transfer Data #
- kubectl cp
  First, use kubectl get po to see the NAME of the target container, e.g. jupyter-xxxx if created by a JupyterHub.
  Then kubectl cp /relative/or/abs/path/to/local/file.ext jupyter-xxxx:/abs/path/to/remote/pod/dir/, which has the same format as scp.
  Tip: with kubectl cp /dir1/ remote:/dir2/, dir1's content will be transferred, but MAYBE NOT dir1 itself (see the example commands after this list).
- FTP server/daemon in pods, e.g. from stilliard.
  This method may have trouble when the k8s cluster is protected by a firewall.
- FTP client in pods.
- scp, similar to kubectl cp above.
  Usually runs in pods (as a client), due to possible firewalls.
- rsync, suggested especially when firewalls may cut connections, since it can resume interrupted transfers.
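Example kubectl cp commands (the pod name jupyter-xxxx is hypothetical as above, and the paths assume a typical Jupyter image):
kubectl get po                                              # find the pod NAME first
kubectl cp ./data.csv jupyter-xxxx:/home/jovyan/work/       # single file, scp-like syntax
kubectl cp ./dir1 jupyter-xxxx:/home/jovyan/work/dir1       # name the target dir explicitly so dir1 itself is kept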
Web UI #
Run anywhere that has kubectl available:
kubectl proxy
Open url:
http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/
Note 1: ssh tunnel may be needed.
Note 2: changing the machine that runs “kubectl proxy” may require re-authentication via the config file (or a token).
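If the proxy runs on a remote machine, the ssh tunnel mentioned in Note 1 can expose it locally; a sketch with a hypothetical user/host:
# on the remote machine that has kubectl and the kube config:
kubectl proxy --port=8001
# on the local machine, tunnel local port 8001 to the remote proxy:
ssh -L 8001:localhost:8001 user@remote-host
# then open the dashboard URL above in a local browser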
+Jupyter #
Main Terms/Functions #
Pod, deployment, services, ingress.
A deployment is, roughly, an enhanced pod (it manages a set of replicated pods).
A deployment can then be exposed as a service by creating a cluster service.
Later, an ingress routes related HTTP requests to the specific service (see the sketch below).
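A minimal sketch of that chain with kubectl one-liners (names and host are hypothetical; kubectl create ingress needs kubectl >= 1.19 and a running ingress controller):
kubectl create deployment web --image=nginx                      # deployment managing the pods
kubectl expose deployment web --port=80 --type=ClusterIP         # cluster service in front of the deployment
kubectl create ingress web --rule="web.example.local/*=web:80"   # ingress routing HTTP requests to the service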
Real Examples #
keras + flask + k8s #
cnvrg.io ML Deploy models with Kubernetes (bak)
Spark on K8s #
See more on Memory Leaks on BrainLounge.
container images #
(for spark 2.2, see FAQ below.)
First, build or find a suitable container image (run the commands below from the top level of the Spark distribution folder).
Tip: it is good to tag the binding (Py/R) versions, as the workers should use the same version as the driver.
Scala/Java:
./bin/docker-image-tool.sh -t spark-scala build
Py:
./bin/docker-image-tool.sh -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile -t spark-py368 build
R:
./bin/docker-image-tool.sh -p ./kubernetes/dockerfiles/spark/bindings/R/Dockerfile -t spark-r351 build
I had problems building images with docker-image-tool.sh, so I used the Spark Dockerfiles to build some images instead:
# py
sunnybingome/spark8s:pyspark240py368
# r
sunnybingome/spark8s:official-spark240-r351
# jar
# jar can run using either the py or r image above.
image pull auth #
To let k8s pull images from a protected registry (Docker image hub), we need to create a “kube secret”, which shields sensitive info such as tokens or passwords.
It can contain the username and password (created on the registry side [ref]) used to access the registry.
For GitLab, we can create the username/password pair under the “Deploy Tokens” settings, then create a kube secret of type docker-registry with this command:
kubectl create secret docker-registry <secret_name> --docker-server=<registry-server> --docker-username=<username> --docker-password=<pword> [--docker-email=optional-your-email]
Later, the “secret_name” can be used by k8s to pull images, e.g.:
spark-submit ...\
--conf spark.kubernetes.container.image.pullSecrets=<secret_name> \
or:
kubectl run ... \
--overrides='{"apiVersion": "v1", "spec": {"imagePullSecrets": [{"name": "my-secret-name"}]}}' # note: --overrides takes a partial object (with apiVersion), not a JSON patch; the earlier "op": "add" form did not seem to work
Tip: a secret is only available within its own namespace.
k8s master api address #
kubectl cluster-info | grep 'Kubernetes master'
# out e.g.: Kubernetes master is running at https://192.1.1.1:443
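Putting the pieces together, a hedged spark-submit sketch against the master address above (the image, secret, service account, namespace, and example path come from this note or are placeholders):
spark-submit \
  --master k8s://https://192.1.1.1:443 \
  --deploy-mode cluster \
  --name spark-pi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.namespace=<namespace> \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=<service-account-name> \
  --conf spark.kubernetes.container.image=sunnybingome/spark8s:pyspark240py368 \
  --conf spark.kubernetes.container.image.pullSecrets=<secret_name> \
  local:///opt/spark/examples/src/main/python/pi.py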
Security #
rbac (role-based access control) #
Check permissions (using Spark's doc as an example):
kubectl auth can-i list pods
kubectl auth can-i create pods
kubectl auth can-i edit pods
kubectl auth can-i delete pods
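Permissions can also be checked on behalf of a service account (useful for the Spark RBAC problem in the FAQ below); the namespace and account names here are placeholders:
kubectl auth can-i create pods --as=system:serviceaccount:<namespace>:<service-account-name> -n <namespace>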
see also #
Kong on Kubernetes (authentication & authorization)
(To further check and harden k8s, i.e. make it secure and fix security holes, see ref 1 and ref 2. For more, search “Kubernetes security tools”.)
Multi-Cluster Switching #
contexts #
kubectl config get-contexts
kubectl config use-context <NAME_of_context>
namespaces #
kubectl config set-context --current --namespace=<my_namespace>
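To confirm which context/namespace a command will hit (handy before destructive operations):
kubectl config current-context
kubectl config view --minify --output 'jsonpath={..namespace}'   # empty output means the default namespace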
Advanced Topics #
Stress testing using Chaos Mesh: “Better identify system vulnerabilities and improve resilience: the design and working principles of Chaos Mesh” (压测：更好地识别系统漏洞并提高弹性) by K8S 中文社区
FAQ of Spark on K8s: #
Problem: Error: forbidden: User "system:serviceaccount:serv-account-1:default" cannot get resource "pods" in API group "" in the namespace "ns-1".
Reason: RBAC does not allow the driver container to create workers yet (cluster mode).
Solution:
# config RBAC
kubectl create serviceaccount <service-account-name> # creates the service account (e.g. myspark, spark1) in the current namespace
kubectl create clusterrolebinding <role-name> --clusterrole=edit --serviceaccount=<namespace>:<service-account-name> --namespace=<namespace>
# then run spark-submit cluster mode with the RBAC config:
--conf spark.kubernetes.authenticate.driver.serviceAccountName=<service-account-name>
Ref: official doc
Problem: Exception: Python in worker has different version X.x than that in driver X.x, PySpark cannot run with different minor versions
Reason: Spark requires the driver and the workers to run the same major.minor Python version. With spark-on-k8s, the mismatch can only happen in client mode (see also the Spark docs).
Solution: use cluster mode, or use pipenv --python=3.x in the driver to ensure the same Python version on the driver and the workers.
Problem: Spark version 2.2 is old and it is hard to find container images.
Solution: See Spark 2.2 doc (py 2.7.13). (since Spark 2.3, only Dockerfiles are provided).
Problem: having trouble running the official Spark integration tests with k8s.
Reason: (not sure).
Solution: I use my own simple script. (TODO: github url)
Problem: Spark 2.2 cannot recognize k8s as the master. (ERROR: … must be yarn … etc.)
Reason: Spark 2.2 default distribution does not include k8s support.
Solution: download the k8s-supported Spark 2.2 distribution (from its releases), or wget directly from https://is.gd/OVewKa. Then the official example can be run directly by copy-paste (with the correct k8s IP & corrected namespace).