Install Single-Node for Test

nix

# server
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 && sudo install minikube-linux-amd64 /usr/local/bin/minikube

minikube start

# client, via snap (snap will NOT work in Docker-like containers)
sudo snap install kubectl --classic

# or, client via apt (legacy repo: apt.kubernetes.io has since been deprecated in favor of pkgs.k8s.io)
sudo apt-get update && sudo apt-get install -y gnupg2 apt-transport-https && \
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add - && \
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list && \
sudo apt-get update && \
sudo apt-get install -y kubectl

[ref]

win (dated, plz use WSL2)

Note: the commands below are old (spring 2019); WSL2 with Docker Desktop can be used instead.

Set-ExecutionPolicy Bypass -Scope Process -Force; iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1')) # install choco

choco install minikube -y  # "server"; can be replaced by the k8s bundled with Docker Desktop.
choco install kubernetes-cli  -y  # "client" / "controller".

minikube start --vm-driver hyperv --hyperv-virtual-switch "Default Switch"  # needs admin; minikube defaults to VirtualBox, so Hyper-V is selected explicitly.

// Can also create a new V-switch ref.

Note: Docker Desktop is easier to set up (one click), but it does not provide the minikube command.

config

There is a config file on the kube master, in the home directory as $HOME/config. Copy it to the local machine (client) as $HOME/.kube/config.
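A minimal sketch of the copy step, assuming SSH access to the master. The function name fetch_kubeconfig and the host user@kube-master are placeholders, and the remote path follows the location given above.

```shell
# Copy the kubeconfig from the master to this client machine.
# Usage: fetch_kubeconfig <user@master-host>   (host is a placeholder you supply)
fetch_kubeconfig() {
  local master="$1"
  local target="$HOME/.kube/config"

  mkdir -p "$HOME/.kube"
  # Keep a backup so an existing client config is not lost.
  if [ -f "$target" ]; then
    cp "$target" "$target.bak"
  fi
  # Remote path as stated in these notes ($HOME/config on the master);
  # a relative scp path is resolved against the remote home directory.
  scp "$master:config" "$target" && chmod 600 "$target"
}
```

Usage: fetch_kubeconfig user@kube-master, then kubectl config get-contexts to confirm the client sees the cluster.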

basic tests

Tip: most docker ... cmds can be changed to kubectl ... [ref].

kubectl config get-contexts
kubectl get po
# kubectl 1.18+: "kubectl run" creates a bare pod, so create the deployment explicitly
kubectl create deployment hello-minikube --image=k8s.gcr.io/echoserver:1.10
kubectl get po
kubectl expose deployment hello-minikube --type=NodePort --port=8080

# obs: delete the deployment, not the pod (a deleted pod is recreated by its deployment).
kubectl delete deployment DEPLOYMENT [-n NAMESPACE]

Handy Commands

port-forwarding to pods


kubectl port-forward <pod-name> <local-port>:<pod-port>

get shell/bash access to pods


kubectl exec -it <pod-name> -- /bin/bash

autocompletion

Do either of:

# by normal users:
printf 'if [ -x "$(command -v kubectl)" ]; then\n  source <(kubectl completion bash)\nfi' >>~/.bashrc
# or by sudo:
kubectl completion bash |sudo tee /etc/bash_completion.d/kubectl

(assuming "bash-completion" is already installed: apt-get install bash-completion)
[kubernetes.io]

Storage

JimmySong.io

configmap

automated dynamic mounting
JimmySong.io - kubernetes-handbook
matthewpalmer.net - ultimate-configmap-guide

kubectl edit configmap
...
kubectl scale deployment/<my_deployment> --replicas=0 && \
kubectl scale deployment/<my_deployment> --replicas=1
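As an alternative to scaling to zero and back, kubectl rollout restart (available since kubectl 1.15) replaces the pods gradually. The sketch below assumes the pods need a restart to pick up the edited ConfigMap (e.g. values consumed as environment variables); the function name reload_deployment is made up for illustration.

```shell
# Restart a deployment so its pods pick up an edited ConfigMap,
# then wait until the new pods are ready.
# Usage: reload_deployment <deployment-name> [namespace]
reload_deployment() {
  local name="$1"
  local ns="${2:-default}"
  kubectl rollout restart "deployment/$name" -n "$ns" &&
    kubectl rollout status "deployment/$name" -n "$ns"
}
```

Usage: reload_deployment my_deployment my_namespace.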

persistent volumes (pv) / mounted storage

NFS is used here as an example; more examples can be found in the Redhat docs.
First, create the PV & PVC with kubectl create -f pv_pvc.yaml.
See a yaml example in a kube user issue which combines PV (PersistentVolumes) & PVC (PersistentVolumeClaims) [def].
A detailed explanation is in Redhat's own system doc (bak).

To mount automatically when pods start, a kind: PodPreset [def] can be used (an alpha API that was removed in Kubernetes 1.20); see the example in a kube user issue and the kube doc example (though it is a pod yaml).
A prerequisite is that the pod is started with a label, e.g. preset-working: enabled, so that spec: selector: matchLabels will match.
Mount points are specified in volumeMounts.
In spec: volumes:, the name should match the name in volumeMounts, and persistentVolumeClaim: claimName: should match kind: PersistentVolumeClaim -> metadata: name: (the PVC's name).
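The name matching described above can be sketched with a combined manifest. This is an illustration, not the yaml from the linked issue: all names (nfs-pv, nfs-pvc, nfs-demo, data), the NFS server address and the export path are placeholders, and the pod mounts the claim directly rather than via a PodPreset.

```shell
# Write a combined PV + PVC + Pod manifest to pv_pvc.yaml.
# Every name, the NFS server and the export path are placeholders.
cat > pv_pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs.example.com
    path: /exports/data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc            # referenced by claimName below
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""     # bind to the pre-created PV, no dynamic provisioning
  volumeName: nfs-pv
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: nfs-demo
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data       # must match volumes[].name
          mountPath: /mnt/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: nfs-pvc # must match the PVC's metadata.name
EOF

# Then, on a machine with cluster access:
#   kubectl create -f pv_pvc.yaml
```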

Move / Transfer Data

  • kubectl cp
    First, use kubectl get po to see the NAME of the target container, e.g. jupyter-xxxx if created by a JupyterHub.
    Then kubectl cp /relative/or/abs/path/to/local/file.ext jupyter-xxxx:/abs/path/to/remote/pod/dir/, which has the same format as scp.
    Tip: if kubectl cp /dir1/ remote:/dir2/, dir1's content will be transferred, but MAYBE NOT dir1 itself.

  • FTP server/daemon in pods, e.g. from stilliard.
    This method may have troubles when the k8s cluster is protected by a firewall.

  • FTP client in pods.

  • scp, similar to kubectl cp above.
    Usually runs in pods (as a client), due to possible firewalls.

  • rsync, recommended especially behind firewalls, which may cut long-running connections.
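To remove the doubt about whether dir1 itself is transferred, a directory can be streamed with tar through kubectl exec, which keeps the directory name. A minimal sketch, assuming tar exists in the container; the function name copy_dir_to_pod is made up for illustration.

```shell
# Copy a local directory into a pod, keeping the directory itself.
# Usage: copy_dir_to_pod <local-dir> <pod-name> <remote-parent-dir>
copy_dir_to_pod() {
  local src="${1%/}"   # strip a trailing slash, if any
  local pod="$2"
  local dest="$3"
  # Archive the directory relative to its parent so its name is preserved,
  # then unpack the stream inside the pod (requires tar in the container).
  tar cf - -C "$(dirname "$src")" "$(basename "$src")" |
    kubectl exec -i "$pod" -- tar xf - -C "$dest"
}
```

Usage: copy_dir_to_pod /dir1/ jupyter-xxxx /dir2 leaves the files under /dir2/dir1/ in the pod.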

Web UI

Run anywhere which has kubectl available:

kubectl proxy

Open url:
http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/
Note 1: ssh tunnel may be needed.
Note 2: changing the running machine of "kubectl proxy" may require re-auth by the config file (or token).
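For Note 1, a sketch of the ssh tunnel, assuming kubectl proxy runs on a remote machine on its default port 8001. The function name dashboard_tunnel and the host are placeholders.

```shell
# Forward a local port to a remote machine where `kubectl proxy` is running.
# Usage: dashboard_tunnel <user@remote-host> [port]
dashboard_tunnel() {
  local remote="$1"
  local port="${2:-8001}"
  # -N: run no remote command, -L: forward local port to remote localhost:port
  ssh -N -L "$port:localhost:$port" "$remote"
}
```

With the tunnel open, the dashboard url above can be opened in a local browser.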

+Jupyter

JupyterHub on K8S

Main Terms/Functions

Pod, deployment, services, ingress.

A deployment is a kind of enhanced pod: it manages a set of replicated pods.
A deployment can then be exposed as a service by creating a cluster service.
Finally, an ingress routes the related HTTP requests to the specific service.
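How the three pieces reference each other can be sketched in one manifest. This is an illustration only: the name hello, the host hello.example.com and the ports are placeholders, and the Ingress uses networking.k8s.io/v1, which assumes Kubernetes 1.19 or newer plus an ingress controller in the cluster.

```shell
# Write a Deployment + Service + Ingress manifest to hello.yaml.
# Names, host and ports are placeholders.
cat > hello.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello          # the Service selects pods by this label
    spec:
      containers:
        - name: echo
          image: k8s.gcr.io/echoserver:1.10
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: hello
spec:
  selector:
    app: hello
  ports:
    - port: 80
      targetPort: 8080      # the container port of the pods
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello
spec:
  rules:
    - host: hello.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: hello # the Service above
                port:
                  number: 80
EOF

# Then, on a machine with cluster access:
#   kubectl apply -f hello.yaml
```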

Real Examples

keras + flask + k8s

cnvrg.io ML Deploy models with Kubernetes (bak)

Spark on K8s

See more on Memory Leaks on BrainLounge.

container images

(for spark 2.2, see FAQ below.)
First, build or find a suitable container image (run the cmd in spark distribution folder top level.)

Tip: binding (Py/R) versions are good to tag, as the workers should use the same version with the driver.

Scala/Java

./bin/docker-image-tool.sh -t spark-scala build

Py:

./bin/docker-image-tool.sh -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile -t spark-py368 build

R:

./bin/docker-image-tool.sh -p ./kubernetes/dockerfiles/spark/bindings/R/Dockerfile -t spark-r351 build

I had problems building images with docker-image-tool.sh, but used the Spark Dockerfile to build some images here:

# py
sunnybingome/spark8s:pyspark240py368

# r
sunnybingome/spark8s:official-spark240-r351

# jar
# jar can run using either the py or r image above.

images pull auth:

To let k8s pull images from a protected registry (docker image hub), create a "kube secret", which shields sensitive info such as tokens or passwords.
It can contain the username and password (created by the registry [ref]) for accessing the registry.
For Gitlab, create the username/password pair under the "Deploy Tokens" settings, then create a kube secret of type docker-registry with this command:

kubectl create secret docker-registry <secret_name> --docker-server=<registry-server> --docker-username=<username> --docker-password=<pword> [--docker-email=optional-your-email]

Later, the "secret_name" can be used by k8s to pull images. E.g:

spark-submit ...\
--conf spark.kubernetes.container.image.pullSecrets=<secret_name> \

or:

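kubectl run ... \
--overrides='{ "apiVersion": "v1", "spec": { "imagePullSecrets": [{"name": "<secret_name>"}] } }'
# note: "op": "add" is JSON-patch syntax and does not belong in an overrides object; the override above applies when kubectl run creates a pod.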

Tip: the available scope of a secret is the specific namespace.
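Another option is to attach the secret to a service account, so that every pod using that account pulls with it. A minimal sketch based on kubectl patch serviceaccount; the function name attach_pull_secret is made up, and the default namespace and account names are assumptions.

```shell
# Attach an image pull secret to a service account.
# Usage: attach_pull_secret <secret_name> [namespace] [service-account]
attach_pull_secret() {
  local secret="$1"
  local ns="${2:-default}"
  local sa="${3:-default}"
  # The secret must live in the same namespace as the service account.
  kubectl patch serviceaccount "$sa" -n "$ns" \
    -p "{\"imagePullSecrets\": [{\"name\": \"$secret\"}]}"
}
```

Usage: attach_pull_secret my-secret-name my_namespace.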

k8s master api address

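kubectl cluster-info | grep -E 'Kubernetes (master|control plane)'
# out e.g.: Kubernetes master is running at https://192.1.1.1:443
# (newer kubectl prints "Kubernetes control plane" instead of "Kubernetes master")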

Security

rbac (role-based access control)

Check permissions (using Spark's doc as an example):

kubectl auth can-i list   pods
kubectl auth can-i create pods
kubectl auth can-i edit   pods
kubectl auth can-i delete pods
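The same checks can be run on behalf of a service account (e.g. the one a Spark driver will use) through the --as impersonation flag. A sketch, assuming your own user is allowed to impersonate; the function name check_sa_permissions and the verb list are chosen for illustration.

```shell
# Print whether a service account may get/list/create/delete pods.
# Usage: check_sa_permissions <namespace> <service-account-name>
check_sa_permissions() {
  local ns="$1"
  local sa="$2"
  local verb
  for verb in get list create delete; do
    printf '%-7s pods: ' "$verb"
    # can-i prints yes/no and exits non-zero on "no"; keep looping either way.
    kubectl auth can-i "$verb" pods -n "$ns" \
      --as="system:serviceaccount:$ns:$sa" || true
  done
}
```

Usage: check_sa_permissions my_namespace my_service_account.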

see also

Kong on Kubernetes (authentication & authorization)

(To check & harden (make it secure, fix security holes) k8s more, see ref 1 and ref 2. For more stuff, search "Kubernetes security tools")

Multi-Cluster Switching

contexts

kubectl config get-contexts
kubectl config use-context <NAME_of_context>

namespaces

kubectl config set-context --current --namespace=<my_namespace>

FAQ of Spark on K8s:

Problem: Error: forbidden: User "system:serviceaccount:serv-account-1:default" cannot get resource "pods" in API group "" in the namespace "ns-1".
Reason: RBAC does not yet allow the driver container to create workers (cluster mode).
Solution:

# config RBAC
kubectl create serviceaccount <service-account-name> # service accounts (e.g. myspark, spark1) are namespaced; this creates it in the current namespace (add -n <namespace> otherwise).
kubectl create clusterrolebinding <role-name> --clusterrole=edit --serviceaccount=<namespace>:<service-account-name> --namespace=<namespace>

# then run spark-submit cluster mode with the RBAC config:
--conf spark.kubernetes.authenticate.driver.serviceAccountName=<service-account-name>

Ref: official doc
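Putting the pieces together, a cluster-mode submission might look like the sketch below. The function name submit_spark_pi is made up, and the examples jar path assumes a Spark 2.4.0 image built from the official Dockerfile; adjust both the path and the version to your image.

```shell
# Submit the SparkPi example in cluster mode with the RBAC service account.
# Usage: submit_spark_pi <k8s-api-url> <namespace> <service-account> <image>
submit_spark_pi() {
  local api="$1"
  local ns="$2"
  local sa="$3"
  local image="$4"
  # The jar path below is an assumption about the image layout (Spark 2.4.0).
  spark-submit \
    --master "k8s://$api" \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=2 \
    --conf spark.kubernetes.namespace="$ns" \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName="$sa" \
    --conf spark.kubernetes.container.image="$image" \
    local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar
}
```

Usage: submit_spark_pi https://192.1.1.1:443 ns-1 spark1 sunnybingome/spark8s:pyspark240py368, where the first argument is the address printed by kubectl cluster-info.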

Problem: Exception: Python in worker has different version X.x than that in driver X.x, PySpark cannot run with different minor versions
Reason: Spark checks the major version number and the first minor version number to match the driver with its workers. When using spark-on-k8s, the mis-match can only happen when using client-mode (see also spark.
Solution: Use cluster mode, or use pipenv --python=3.x; in driver to ensure the same py version on the driver and the workers.

Problem: Spark version 2.2 is old and it is hard to find container images.
Solution: See Spark 2.2 doc (py 2.7.13). (since Spark 2.3, only Dockerfiles are provided).

Problem: Problems running the official Spark integration tests with k8s.
Reason: (not sure).
Solution: I use my own simple script. (TODO: github url)

Problem: Spark 2.2 cannot recognize k8s as the master. (ERROR: ... must be yarn ... etc.)
Reason: Spark 2.2 default distribution does not include k8s support.
Solution: download the k8s-supported Spark 2.2 distribution, or wget it directly from https://is.gd/OVewKa. The official example can then be run by copy-paste (with the correct k8s IP and namespace).