Replacing the self-signed Certificate on NSX-T

March 8, 2018, 2:13 pm

≫ Next: PKS and NSX-T: I did everything wrong

≪ Previous: BOSH Stemcell 3541.2 breaks Concourse 3.9.0

Ran into a difficulty trying to use the self-signed certificate that comes pre-configured on the manager for NSX-T. In my case, Pivotal Operations Manager refused to accept the self-signed certificate.

So, for NSX-T 2.1, it looks like the procedure is:

1. Log on to the NSX Manager and navigate to System|Trust
2. Click CSRs tab and then “Generate CSR”, populate the certificate request details and click Save
3. Select the new CSR and click Actions|Download CSR PEM to save the exported CSR in PEM format
4. Submit the CSR to your CA to get it signed and save the new certificate. Be sure to save the root CA and any subordinate CA certificates too<. In this example, certnew.cer is the signed NSX Manager certificate, sub-CA.cer is the subordinate CA certificate and root-CA.cer is the Root CA certificate
5. Open the two (or three) cer files in notepad or notepad++ and concatenate them in order of leaf cert, (subordinate CA cert), root CA cert
6. Back in NSX Manager, select the CSR and click Actions|Import Certificate for CSR. In the Window, paste in the concatenated certificates from above and click save
7. Now you’ll have a new certificate and CA certs listed under Certificates. The GUI only shows a portion of the ID by default, click it to display the full ID and copy it to the clip board
8. Launch RESTClient in Firefox.
  - Click Authentication|Basic Authentication and enter the NSX Manager credentials for Username and Password, click “Okay”
  - For the URL, enter https://<NSX Manager IP or FQDN>api/v1/node/services/http?action=apply_certificate=<certificate ID copied in previous step>
  - Set the method to POST and click SEND button
  - check the Headers to confirm that the status code is 200
9. Refresh browser session to NSX Manager GUI to confirm new certificate is in use

Notes:
I was concerned that replacing the certificate would break the components registered via the certificate thumbprint; this process does not break those things. They remain registered and trust the new certificate

↧

PKS and NSX-T: I did everything wrong

May 15, 2018, 11:24 am

≫ Next: Automating PKS Upgrades

≪ Previous: Replacing the self-signed Certificate on NSX-T

I’ve fought with PKS and NSX-T for a month or so now. I’ll admit it: I did everything wrong, several times. One thing for certain, I know how NOT to configure it. So, now that I’ve finally gotten past my configuration issues, it makes sense to share the ~~pain~~ lessons learned.

Set your expectations correctly. PKS is literally a 1.0 product right now. It’s getting a lot of attention and will make fantastic strides very quickly, but for now, it can be cumbersome and confusing. The documentation is still pretty raw. Similarly, NSX-T is very young. The docs are constantly referring you to the REST API instead of the GUI – this is fine of course, but is a turn-off for many. The GUI has many weird quirks. (when entering a tag, you’ll have to tab off of the value field after entering a value, since it is only checked onBlur)
Use Chrome Incognito NSX-T does not work in Firefox on Windows. It works in Chrome, but I had issues where the cache would problems (the web GUI would indicate that backup is not configured until I closed Chrome, cleared cache and logged in again)
Do not use exclamation point in the NSX-T admin password Yep, learned that the hard way. Supposedly, this is resolved in PKS 1.0.3, but I’m not convinced as my environment did not wholly cooperate until I reset the admin password to something without an exclamation point in it
Tag only one IP Pool with ncp/external I needed to build out several foundations on this environment and wanted to keep them in discrete IP space by created multiple “external IP Pools” and assigning each to its own foundation. Currently the nsx-cli.sh script that accompanies PKS with NSX-T only looks for the “ncp/external” tag on IP Pools, if more than one is found, it quits. I suppose you could work around this by forking the script and passing an additional “cluster” param, but I’m certain that the NSBU is working on something similar
Do not take a snapshot of the NSX Manager This applies to NSX for vSphere and NSX-T, but I have made this mistake and it was costly. If your backup solution relies on snapshots (pretty much all of them do), be sure to exclude the NSX Manager and…
Configure scheduled backups of NSX Manager I found the docs for this to be rather obtuse. Spent a while trying to configure a FileZilla SFTP or even IIS-FTP server until it finally dawned on me that it really is just FTP over SSH. So, the missing detail for me was that you’ll just need a linux machine with plenty of space that the NSX Manager can connect to – over SSH – and dump files to. I started with this procedure, but found that the permissions were too restrictive.
Use concourse pipelines This was an opportunity for me to really dig into concourse pipelines and embrace what can be done. One moment of frustration came when PKS 1.0.3 was released and I discovered that the parameters for vSphere authentication had changed. In PKS 1.0 through 1.0.2, there was a single set of credentials to be used by PKS to communicate with vCenter Server. As of 1.0.3, this was split into credentials for master and credentials for workers. So, the pipeline needed a tweak in order to complete the install. I ended up putting in a conditional to check the release version, so the right params are populated. If interested, my pipelines can be found at https://github.com/BrianRagazzi/concourse-pipelines
Count your Load-Balancers In NSX-T, the load-balancers can be considered a sort of empty appliance that Virtual Servers are attached to and can itself attach to a Logical Router. The load-balancers in-effect require pre-allocated resources that must come from an Edge Cluster. The “small” load-balancer consumes 2 CPU and 4GB RAM and the “Large” edge VM provides 8 CPU and 16GB RAM. So, a 2-node Edge Cluster can support up to FOUR active/standby Load-Balancers. This quickly becomes relevant when you realize that PKS creates a new load-balancer when a new K8s cluster is created. If you get errors in the diego databse with the ncp job when creating your fifth k8s cluster, you might need to add a few more edge nodes to the edge cluster.

Configure your NAT rules as narrow as you can. I wasted a lot of time due to mis-configured NAT rules. The log data from provisioning failures did not point to NAT mis-configuration, so wild geese were chased. Here’s what finally worked for me:

Router	Priority	Action	Source	Destination	Translated	Description
Tier1 PKS Management	512	No NAT	[PKS Management CIDR]	[PKS Service CIDR]	Any	No NAT between management and services
	512	No NAT	[PKS Service CIDR]	[PKS Management CIDR]	Any	No NAT between management and services
	1024	DNAT	Any	[External IP for Ops Manager]	[Internal IP for Ops Manager]	So Ops Manager is reachable
		DNAT	Any	[External IP for PKS Service]	[Internal IP for PKS Service] (obtain from Status tab of PKS in Ops Manager)	So PKS Service (and UAA) is reachable
		SNAT	[Internal IP for PKS Service]	Any	[External IP for PKS Service]	Return Traffic for PKS Service
	2048		[PKS Management CIDR]	[Infrastructure CIDR] (vCenter Server, NSX Manager, DNS Servers)	[External IP for Ops Manager]	So PKS Management can reach infrastructure
	2048		[PKS Management CIDR]	[Additional Infrastructure] (NTP in this case)	[External IP for Ops Manager]	So PKS Management can reach infrastructure
Tier1 PKS Services	512	No NAT	[PKS Service CIDR]	[PKS Management CIDR]	Any	No NAT between management and services
	512	No NAT	[PKS Management CIDR]	[PKS Service CIDR]	Any	No NAT between management and services
	1024	SNAT	[PKS Service CIDR]	[Infrastructure CIDR] (vCenter Server, NSX Manager, DNS Servers)	[External IP] (not the same as Ops Manager and PKS Service, but in the same L3 network)	So PKS Services can reach infrastructure
	1024	SNAT	[PKS Service CIDR]	[Additional Infrastructure] (NTP in this case)	[External IP]	So PKS Services can reach infrastructure

↧

Automating PKS Upgrades

May 22, 2018, 2:44 pm

≫ Next: NSX-T 2.2 – Error 100 when trying to enum Firewall Rules

≪ Previous: PKS and NSX-T: I did everything wrong

Last night, Pivotal announced new versions of PKS and Harbor, so I thought it’s time to simplify the upgrade process. Here is a concourse pipeline that essentially aggregates the upgrade-tile pipeline so that PKS and Harbor are upgraded in one go.

What it does:

Runs on a schedule – you set the time and days it may run
Downloads the latest version of PKS and Harbor from Pivnet- you set the major.minor version range
Uploads the PKS and Harbor releases to your BOSH director
Determines whether the new release is missing a stemcell, downloads it from PivNet and uploads it to BOSH director
Stages the tiles/releases
Applies changes

What you need:

A working Concourse instance that is able to reach the Internet to pull down the binaries and repo
The fly cli and credentials for your Concourse.
A token from your PivNet account
An instance of PKS 1.0.2 or 1.0.3 AND Harbor 1.4.x deployed on Ops Manager
Credentials for your Ops Manager
(optional) A token from your GitHub account

How to use the pipeline:

Download params.yml and pipeline.yml from here.
Edit the params.yml by replacing the values in double-parentheses with the actual value. Each line has a bit explaining what it’s expecting. For example, ((ops_mgr_host)) becomes opsmgr.pcf1.domain.local
- Remove the parens
- If you have a GitHub Token, pop that value in, otherwise remove ((github_token))
- The current pks_major_minor_version regex will get the latest 1.0.x. If you want to pin it to a specific version, or when PKS 1.1.x is available, you can make those changes here.
- The ops_mgr_usr and ops_mgr_pwd credentials are those you use to logon to Ops Manager itself. Typically set when the Ops Manager OVA is deployed.
- The schedule params should be adjusted to a convenient time to apply the upgrade. Remember that in addition to the PKS Service being offline (it’s a singleton) during the upgrade, your Kubernetes clusters may be affected if you have the “Upgrade all Clusters” errand set to run in the PKS configuration, so schedule wisely!
Open your cli and login to concourse with fly

fly -t concourse login -c http://concourse.domain.local:8080 -u username -p password
Set the new pipeline. Here, I’m naming the pipeline “PKS_Upgrade”. You’ll pass the pipeline.yml with the “-c” param and your edited params.yml with the “-l” param

fly -t concourse sp -p PKS_Upgrade -c pipeline.yml -l params.yml

Answer “y” to “Apply Configuration”…
Unpause the pipeline so it can run when in the scheduled window

fly -t concourse up -p PKS_Upgrade
Login to the Concourse web to see our shiny new pipeline!

If you don’t want to deal with the schedule and simply want it to upgrade on-demand, use the pipeline-nosched.yml instead of pipeline.yml, just be aware that when you unpause the pipeline, it’ll start doing its thing. YMMV, but for me, it took about 8 minutes to complete the upgrade.

Behind the scenes
It’s not immediately obvious how the pipeline does what it does. When I first started out, I found it frustrating that there just isn’t much to the pipeline itself. To that end, I tried making pipelines that were entirely self-contained. This was good in that you can read the pipeline and see everything it’s doing; plus it can be made to run in an air-gapped environment. The downside is that there is no separation, one error in any task and you’ll have to edit the whole pipeline file.
As I learned a little more and poked around in what others were doing, it made sense to split the “tasks” out, keep them in a GitHub public repo and pull it down to run on-demand.

Pipelines generally have two main sections; resources and jobs.
Resources are objects that are used by jobs. In this case, the binary installation files, a zip of the GitHub repo and the schedule are resources.
Jobs are (essentially) made up of plans and plans have tasks.
Each task in most pipelines uses another source yml. This task.yml will indicate which image concourse should build a container from and what it should do on that container (typically, run a script). All of these task components are in the GitHub repo, so when the pipeline job runs, it clones the repo and runs the appropriate task script in a container built on an image pulled from dockerhub.

More info
I’ve got a several pipelines in the repo. Some of them do what they’re supposed to. Most of them are derived from others’ work, so many thanks to Pivotal Services and Sabha Parameswaran

↧

NSX-T 2.2 – Error 100 when trying to enum Firewall Rules

August 29, 2018, 12:28 pm

≫ Next: PAS with NSX-T Tip: use a fresh IP Block

≪ Previous: Automating PKS Upgrades

After upgrading to NSX-T 2.2, my environment began throwing this error in the GUI when I tried to navigate to the firewall section or any router. In addition, the nsx-cli shell script for cleanup was failing every time with a similar firewall-rule-related error.

Searching for a bitm I stumbled onto KB 56611: Upgrading NSX-T manager from 2.1.0.0 to 2.2.0.0 reports “General Error has occurred” on Firewall’s General UI section.

Down at the bottom of the KB, it essentially states that if you’ve already upgraded to 2.2 from 2.1, you’ll have to replace a jar file in order to resolve the problem. Oh, and you have to open a ticket to get the .jar.

So, if you run into this – and you receive the nsx-firewall-1.0.jar file – here’s the steps for resolution:

1. SSH into the NSX Manager as root (not admin)
2. Navigate to /opt/vmare/proton-tomcat/webapps/nsxapi/WEB-INF/lib
3. Copy the existing nsx-firewall-1.0.jar file elsewhere (I copied it to home and SCP’d it out from there)
4. Copy the new nsx-firewall-1.0.jar file into this folder. (I put it on an local webserver and pulled it down with wget)
5. Change the owner of the jar to uproton:
  
  chown uproton:uproton nsx-firewall-1.0.jar
6. Change the permissions to match the other files:
  
  chmod o-r nsx-firewall-1.0.jar
7. Reboot the NSX Manager
8. Enjoy being able to see and edit firewall rules again!

↧

PAS with NSX-T Tip: use a fresh IP Block

January 9, 2019, 8:09 am

≫ Next: Using Helm and Dynamic PersistentVolumes with Multi-AZ PKS on vSphere

≪ Previous: NSX-T 2.2 – Error 100 when trying to enum Firewall Rules

I’ve fought with this for an embarrassingly long time. Had a failed PAS (Pivotal Application Services) deployment (missed several of the NSX configuration requirements) but removed the cruft and tried again and again and again. In each case, PAS and NCP would deploy, but fail on the PAS smoke_test errand. The error message said more detail is in the log.

Which Log?!

I ssh’d into the clock_global VM and found the smoke_test logs. They stated that the container for instance {whatever} could not be created and an error of NCP04004. This pointed me to the Diego Cells (where the containers would be created) and I poked around in the /var/vcap/sys/log/garden logs there. They stated that the interface for the instance could not be found. Ok, this is sounding more like an NSX problem.

I ended up parsing through the NSX Manager event log and found this gem:

IP Block Error

Ah-ha! Yup, I’d apparently allocated a couple of /28 subnets from the IP Block. So when the smoke test tried to allocate a /24, the “fixed” subnet size had already been set to /28, causing the error.

Resolution was to simply remove all of the allocated subnets from the IP block. This could have been avoided by either not reusing an existing IP Block or using the settings in the NCP configuration to create a new IP Block with a given CIDR.

↧

Using Helm and Dynamic PersistentVolumes with Multi-AZ PKS on vSphere

January 17, 2019, 1:54 pm

≫ Next: Hands-on with VxRail v4.7.100

≪ Previous: PAS with NSX-T Tip: use a fresh IP Block

So, you’ve installed PKS and created a PKS cluster. Excellent! Now what?

We want to use helm charts to deploy applications. Many of the charts use PersistentVolumes, so getting PVs set up is our first step.

There are a couple of complicating factors to be aware of when it comes to PVs in a multi-AZ/multi-vSphere-Cluster environment. First, you probably have cluster-specific datastores – particularly if you are using Pivotal Ready Architecture and VSAN. These datastores are not suitable for PersistentVolumes consumed by applications deployed to our Kubernetes cluster. To work-around this, we’ll need to provide some shared block storage to each host in each cluster. Probably the simplest way to do this is with an NFS share.

Prerequisites:

Common datastore; NFS share or iSCSI

In production, you’ll want a production-quality fault-tolerant solution for NFS or iSCSI, like Dell EMC Isilon. For this proof-of-concept, I’m going to use an existing NFS server, create a volume and share it to the hosts in the three vSphere clusters where the PKS workload VMs will run. In this case, the NFS datastore is named “sharednfs” ’cause I’m creative like that. Make sure that your hosts have adequate permissions to the share. Using VMFS on iSCSI is supported, just be aware that you may need to cable-up additional NICs if yours are already consumed by N-VDS and/or VSAN.

Workstation Prep

We’ll need a handful of command-line tools, so make sure your workstation has the PKS CLI and Kubectl CLI from Pivotal and you’ve downloaded and extracted Helm.

PKS Cluster
We’ll want to provision a cluster using the PKS CLI tool. This document assumes that your cluster was provisioned successfully, but nothing else has been done to it. For my environment, I configured the “medium” plan to use 3 Masters and 3 Workers in all three AZs, then created the cluster with the command

pks create-cluster pks1cl1 --external-hostname cl1.pks1.lab13.myenv.lab --plan "medium" --num-nodes "3"

Logged-in
Make sure you’re logged into the Kubernetes cluster. In PKS, the easiest way to do this is via the PKS cli:

pks login -a api.pks1.lab13.myenv.lab -u pksadmin -p my_password --skip-ssl-validation pks cluster pks1cl1 pks get-credentials pks1cl1 kubectl config use-context pks1cl1 kubectl get nodes -o wide

Where “pks1cl1″ is replaced by your cluster’s name,”api.pks1.lab13.myenv.lab” is replaced by the FQDN to your PKS API server, “pksadmin” is replaced by the username with admin rights to PKS and “my_password” is replaced with that account’s password.

Procedure:

Create storageclass
- Create storageclass spec yaml. Note that the file is named storageclass-nfs.yml and we’re naming the storage class itself “nfs”:
```
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: nfs
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: thin
  datastore: sharednfs
  fstype: ext3
```
- Apply the yml with kubectl
  
  kubectl create -f storageclass-nfs.yml
- Create a sample PVC (Persistent Volume Claim). Note that the file is names pvc-sample.yml, the PVC name is “pvc-sample” and uses the “nfs” storageclass we created above. This step is not absolutely necessary, but will help confirm we can use the storage.
```
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-sample
  annotations:
    volume.beta.kubernetes.io/storage-class: nfs
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: nfs
```
- Apply the yml with kubectl
  
  kubectl create -f pvc-sample.yml
  
  If you’re watching vSphere closely, you’ll see a VMDK created in the kubevols folder of the NFS datastore
- Check that the PVC was created with
  
  kubectl get pvc
  
  and
  
  kubectl describe pvc pvc-sample
- Remove sample PVC with
  
  kubectl delete -f pvc-sample
Configure Helm and Tiller
- Create Service Account for tiller with
```
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
  name: tiller
  namespace: kube-system
```
- Apply the service account yml with Kubectl
  
  kubectl create -f rbac-config.yml
- Initialize helm and tiller with
  
  helm init --service-account tiller
- Check that tiller is ready
  
  helm version
  
  Look for a version number for the version; note that it might take a few seconds for tiller in the cluster to get ready.
Deploy sample helm chart
- Update helm local chart repository. We do this so that we can be sure that helm can reach the public repo and to cache teh latest information to our local repo.
  
  helm repo update
  
  If this step results in a certificate error, you may have to add the cert to the trusted certificates on the workstation.
- Install helm chart with ingress enabled. Here, I’ve selected the Dokuwiki app. The command below will enable ingress, so we can access it via routable IP and it will use the default storageclass we configured earlier.
  
  helm install --name dokuwiki --set ingress.enabled="true" stable/dokuwiki
- Confirm that the app was deployed
```
helm list
kubectl get pods -n default
kubectl get services -n default
```
  From the get services results, make a note of the external IP address – in the example above, it’s 192.13.6.73
- Point a browser at the external address from the previous step and marvel at your success in deploying Dokuwiki via helm to Kubernetes!
  If you want to actually login to your Dokuwiki instance, first obtain the password for the user account with this command:
```
kubectl get secret -n default dokuwiki-dokuwiki -o jsonpath="{.data.dockuwiki-password}" | base64 --decode
```
  Then login with username “user” and that password.
Additional info
- View Persistent Volume Claims with
  
  kubectl get pvc -n default
  
  This will list the PVCs and the volumes in the “default” namespace. Note the volume corresponds to the name of the VMDK on the datastore.
- Load-Balancer
  Notice that since we are leveraging the NSX-T Container Networking Interface and enabled the ingress when we installed dokuwiki, a load-balancer in NSX-T was automatically created for us to point to the application.

This took me some time to figure out; had to weed through a lot of documentation – some of which contradicted itself and quite a bit of trial-and-error. I hope this helps save someone time later!

↧

Hands-on with VxRail v4.7.100

January 23, 2019, 11:52 am

≫ Next: Manually creating a Kubernetes cluster with kubeadm

≪ Previous: Using Helm and Dynamic PersistentVolumes with Multi-AZ PKS on vSphere

Recently (yesterday!) upgraded the VxRail clusters in a lab my team uses for Pivotal Ready Architecture development and testing and immediately noticed many differences.

When trying to go to the VxRail Manager, I am redirected to the vSphere Web Client This was concerning at first, but quickly realized that the VxRail Manager interface is now embedded in the vSphere Web Client (hence the redirection!)

In this environment, we have a “management” cluster using the VxRail-managed vCenter Server. On this cluster is also another vCenter Server that is used by three additional VxRail clusters – when the “Availability Zone” clusters are deployed, they use this shared “external” vCenter Server.

After upgrading the management cluster, the VxRail extension was registered in the HTML5 vSphere Web Client.

From here, when selecting the “VxRail” choice, you’ll see the VxRail dashboard. This allows you to see a quick status of the selected VxRail cluster, some support information and recent messages from the VxRail community.

Configure/VxRail

The most important features of the VxRail extension is found under the VxRail section of the Configure tab of a selected VxRail vSphere cluster:

The System section displays the currently-installed VxRail version and has a link to perform an update. Clicking the “Update” link will launch a new browser tab and take you to the VxRail Manager web gui where you can perform the bundle update.
Next, the Market item will also launch a new browser tab on the VxRail Manager web gui where you can download available Dell EMC applications. For now, it lists Data Domain Virtual Edition and Isilon SD Edge.
The Add VxRail Hosts item will display any newly-discovered VxRail nodes and allow you to add those nodes to a cluster

The Hosts item displays the hosts in the cluster. One interesting feature here is that it displays the Service Tag and Appliance ID right here! You may not need this information often, but when you do, it’s super-critical.
You’ll notice the “Edit” button on the hosts list; this allows you to set/change the host’s name and management IP.

Monitor/VxRail

On the Monitor tab for the selected vSphere VxRail cluster, the Appliances item provides a link to the VxRail Manager web gui where the hardware is shown and will highlight any components in need of attention. Any faults will also be in the “All Issues” section of the regular web client, so the hardware detail will provide visual clues if needed.

Congratulations to the VxRail team responsible for this important milestone that brings another level of integration with vSphere and a “single-pane-of-glass”!

↧

Manually creating a Kubernetes cluster with kubeadm

April 22, 2019, 8:33 am

≫ Next: Getting Started with CredHub in Concourse

≪ Previous: Hands-on with VxRail v4.7.100

I’ve talked about Pivotal Container Service (PKS) before and now work for Pivotal, so I’ve frequently got K8s on my mind. I’ve discussed at length the benefits of PKS and the creation of K8s clusters, but didn’t have much of a point of reference for alternatives. I know about Kelsey Hightower’s book and was looking for something a little less in-the-weeds.

Enter kubeadm, included in recent versions of K8s. With this tool, you’re better able to understand what goes into a cluster, how the master and workers are related and how the networking is organized. I really wanted to stand up a K8s cluster alongside a PKS-managed cluster in order to better understand the differences (if any). This is also a part of the Linux Foundation training “Kubernetes Fundamentals”. I don’t want to spoil the course for you, but will point to some of the docs on kubernetes.io

Getting Ready

I used VMware Fusion on the Macbook to create and run two Ubuntu 18.04 VMs. Each was a linked clone with 2GB RAM, 1vCPU. Had to make sure that they had different MAC addresses, IP, UUIDs and host names. I’m sure you can use nearly any virtualization tool to get your VMs running. Once running, be sure you can SSH into each.

Install Docker

I thought, “hey I’ve done this before” and just installed Docker as per usual, but that method does not leverage the correct cgroup driver, so we’ll want to install Docker with the script found here.

Install Tools

Once again, the kubernetes.io site provides commands to properly install kubeadm, kubelet and kubectl on our Ubuntu nodes. Use the commands on that page to ensure kubelet is installed and held to the correct version.

Choose your CNI pod network

Ok, what? CNI is the Container Network Interface is a specification for networking add-ons for K8s. Kubeadm requires that we use a pod network addon that uses the CNI spec. The pod network – we may have only one per k8s cluster – is the network that the pods communicate on; think of it as using a NAT rather than the network you’ve actually assigned to the Ubuntu nodes. Further, this can be confusing, because this address space is not what is actually assigned to the pods. This address space is used when we “expose” a service. What pod-network-cidr you assign depends on which network add-on you select. In my case, I went with Canal as it seems to be both powerful and flexible. Also, the pod-network cidr used by Calico is “192.168.0.0/16”, which is already in use in my home lab – it may not have actually been a conflict, but it certainly would be confusing if it were in use twice.

Create the master node
Make sure you’re ssh’d into your designated “Master” Ubuntu VM, make sure that you’ve installed kubeadm, kubelet, kubectl and docker from the steps above. If you also choose canal, you’ll initialize the master node (not on the worker node – we’ll have a different command for that one) by running

kubeadm init –pod-network-cidr=10.244.0.0/16

Exactly that CIDR. It’ll take a few minutes to download, install and configure the k8s components. When the initialization has completed, you’ll see a message like this:

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run “kubectl apply -f [podnetwork].yaml” with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.1.129:6443 –token zdjzrp.5jad4gihqjo46olg \
–discovery-token-ca-cert-hash sha256:90d1b349aa93a7130ee91668e4e763a4c29e5fc1502060191b38ea0e31d3cec8

Using, this, we’ll exit su and run the

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Make a note of the bottom section as we’ll need it in order to join our worker to the newly-formed cluster.

Sanity-check:

On the master, run kubectl get nodes. you’ll notice that we have 1 node and it’s not ready:

Install the network pod add-on

Referencing the docs, you’ll note that Canal has a couple yaml files to be applied to our cluster. So, again on our master node, we’ll run these commands to install and configure Canal:


kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/canal/rbac.yaml
kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/canal/canal.yaml

Sanity-check:

On the master, run kubectl get nodes. You’ll now notice that we still have 1 node but now it’s ready:

Enable pod placement on master
By default, the cluster will not put pods on the master node. If you want to use a single-node cluster or use compute capacity on the master node for pods, we’ll need to remove the taint.

Default taint on master

We’ll remove the taint with the command

kubectl taint nodes –all node-role.kubernetes.io/master-

Join worker to cluster
You saved the output from kubeadm init earlier, right? We’ll need that now to join the worker to the cluster. On the worker VM, become root via sudo su – and run that command:

Join worker to cluster

Now, back on master, we run kubectl get nodes and can see both nodes!

Master and Worker Ready

Summary and Next Steps

At this point, we have a functional kubernetes cluster with 1 master and 1 worker. Next in this series, we’ll deploy some applications and compare the behavior to a PKS-managed kubernetes cluster.

↧

Getting Started with CredHub in Concourse

May 10, 2019, 10:08 am

≫ Next: Logging into a Kubernetes cluster with an OIDC LDAP account

≪ Previous: Manually creating a Kubernetes cluster with kubeadm

First, some background – I promise to keep it short. You should never have credentials in a public github repo. Probably not good to have them in a private repo either. At Pivotal, the github client is configured with credalert which complains when I try to push credentials to github. I maintain compliance, I needed a way to update the stuff in the repo and have my credentials too. Concourse supports a couple of AWS credential managers, Vault and Credhub. Since CredHub is built-in to ops-manager-deployed BOSH director, we don’t have to spin anything else up.

The simplified diagram shows how this will work. CredHub is on the BOSH director, so it’ll need to be reachable from the Concourse Web ATS service and anywhere the credhub cli will be used. If your BOSH director is behind a NAT, you may want to configure a DNAT, so it can be reached.

In this case, we’re using a “management/infrastructure” Operations Manager and BOSH director to deploy and manage concourse and minio. The pipelines on concourse will be used to deploy and maintain other foundations in the environment.

Configure UAA

Logon to the ops manager and navigate to status to record the IP address of the BOSH director. If your BOSH director is behind a NAT, locate it’s DNAT instead.
Navigate to the credentials tab. We’re going to need the uaa_login_client_credentials password and the uaa_admin_client_credentials password.
While here, save the ops manager root ca to your computer. From the installation dashboard, click on your name in the upper right, select Settings. Then click Advanced and Download Root CA.
SSH into your ops manager: ubuntu@<ops manager name or IP>
Set uaac target

uaac target https://<IP of BOSH director>:8443 –ca-cert /var/tempest/workspaces/default/root_ca_certificate
Login to uaac – ok, this gets awkward

uaac token owner get login -s <uaa_login_client_credentials>
- Replace <uaa_login_client_credentials> with the value you saved
- When prompted for a username enter admin
- For password enter the uaa_admin_client_password value you saved
- You should see “Successfully fetched token…”
Create a uaac client for concourse to use with credhub

uaac client add –name concourse-to-credhub –authorized-grant-types client_credentials –authorities credhub.read,credhub.write –access-token-validity 30 –secret MySecretPassword

Please replace MySecretPassword with something else
Create a uaac user for use with the CredHub cli

uaac user add credhub –email credhub@whatever.com -p MySecretPassword

Try out Credhub cli

1. Download and install the credhub cli. On mac, you can use brew install credhub
2. From a terminal/command line run this to point the cli to the credhub instance on the BOSH director:
  
  credhub api -s <IP of BOSH director>:8844 –ca-cert ./root_ca_certificate
  - Replace <IP of BOSH director> with the name or reachable IP of the director
  - root_ca_certificate is the root CA from ops manager you downloaded earlier
3. Login to credhub:
  
  credhub login -u credhub -p MySecretPassword
  
  User and pass are from the User we added to uaa earlier
4. Set a test value:
  
  credhub –type:value –name=/testval –value=hello
  
  Here’s we’re setting a key (aka credential) with the name /testval to the value “hello”. Note that all the things stored in credhub start with a slash and that there are several types of credentials that can be stored, the simplest being “value”
5. Get our value:
  
  credhub –name /testval
  
  This will return the metadata for our key/credential

Configuring Concourse to use CredHub

Concourse TSA must be configured to look to credhub as a credential manager. I’m using BOSH-deployed concourse, so I’ll simply update the deployment manifest with the new params. if you’re using concourse via docker-compose, you’ll want to update the yml with the additional params as described here.

For concourse deployed via BOSH and using concouse-bosh-deployment, we’ll include the /operations/credhub.yml file and the additional params. For me this looks like

bosh -e core deploy -d concourse concourse.yml \
-l ../versions.yml \
–vars-store cluster-creds.yml \
-o operations/static-web.yml \
-o operations/basic-auth.yml \
-o operations/scale.yml \
-o operations/privileged-http.yml \
-o operations/credhub.yml \
–var web_ip=192.168.100.205 \
–var external_url=http://concourse.ragazzilab.com \
–var network_name=INFRA \
–var web_vm_type=small.disk \
–var db_vm_type=small.disk \
–var azs=[BOSH] \
–var db_persistent_disk_type=10240 \
–var worker_vm_type=concourse.worker \
–var deployment_name=concourse \
–var local_user.username=myuser \
–var local_user.password=mypass \
–var web_instances=1 \
–var worker_instances=1 \
–var syslog_address=syslog.ragazzilab.com \
–var syslog_port=514 \
–var syslog_permitted_peer=syslog.ragazzilab.com \
–var credhub_url=”https://192.168.100.200:8844 ” \
–var credhub_client_id=concourse-to-credhub \
–var credhub_client_secret=MySecretPassword \
–var credhub_ca_cert=”$(cat root_ca_certificate)”

Test a pipeline

1. Use credhub cli to create a value
  
  credhub set –name /concourse/main/hello-credhub/hello –value World
  
  Concourse has a default pattern for looking up interpolation values. It’s /concourse/<team name>/<pipeline name>/<key>
2. Get the test pipeline from here.
  
  jobs:
  – name: hello-credhub
  plan:
  – do:
  – task: hello-credhub
  config:
  platform: linux
  image_resource:
  type: docker-image
  source:
  repository: ubuntu
  run:
  path: sh
  args:
  – -exc
  – |
  echo “Hello $WORLD_PARAM”
  params:
  WORLD_PARAM: ((hello))
3. Use fly to set the test pipeline
  
  fly -t concourse login -c http://concourse -u myuser -p mypass -n main
  fly -t core sp -p hello-credhub -c hello-credhub.yml
4. Run the test pipeline in concourse. If all goes well, it should say Hello World”

↧

Logging into a Kubernetes cluster with an OIDC LDAP account

September 9, 2019, 10:34 am

≫ Next: Adding a private Docker registry to a PKS 1.5 Windows Kubernetes cluster

≪ Previous: Getting Started with CredHub in Concourse

I confess, most of my experience with Kubernetes is with Pivotal Container Service (PKS) Enterprise. PKS makes it rather easy to get started and I found that I took some tasks for granted.

In PKS Enterprise, one can use the pks cli to not only life-cycle clusters, but to obtain the credentials to the cluster and automatically update the kubeconfig with the account information. So, administrative/operations users can run the command “pks get-credentials my-cluster” to have a kubeconfig updated with the authentication tokens and parameters to connect to my-cluster.

K8s OIDC using UAA on PKS

The PKS controller includes the User Account and Authentication (UAA) component, which is used to authenticate users into PKS Enterprise. UAA can also be easily configured to connect to an existing LDAP service – this is the desired configuration in most organizations so that users account exist in one place (Active Directory in my example).

So, I found myself wondering “I don’t want to provide the PKS CLI to developers, so how can they connect kubectl to the cluster?”

Assumptions:

PKS Enterprise on vSphere (with or without NSX-T)
Active Directory
Developer user account belongs to the k8s-devs security group in AD

Prerequisite configuration:

UAA on PKS configured a with UAA User Account Store: LDAP Server. This links UAA to LDAP/Active Directory
User Search Filter: userPrincipalName={0} This means that users can login as user@domain.tld
Group Search Filter: member={0} This ensures that AD groups may be used rather than specifying individual users
Configure created clusters to use UAA as the OIDC provider: Enabled This pre-configures the kubernetes API to use OpenID Connect with UAA. If not using PKS Enterprise, you’ll need to provide another OpenID Connect-Compliant endpoint (like Dex), link it to Active Directory and update the kubernetes cluster api manually to use the OpenID Authentication.

Operator: Create Role and RoleBinding:

While authentication is handled by OIDC/UAA/LDAP, Authorization must be configured on the cluster to provide access to resources via RBAC. This is done by defining a Role (or clusterRole) that indicates what actions may be taken on what resources and a RoleBinding which links the Role to one or more “subjects”.

Authenticate to kubernetes cluster with an administrative account (for example, using PKS cli to connect)

Create yaml file for our Role and RoleBinding:

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: developers
rules:
- apiGroups: ["", "extensions", "apps"]
  resources: ["deployments", "replicasets", "pods"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] 
  # You can also use ["*"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: developer-dev-binding
subjects:
- kind: Group
  name: k8s-devs
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developers
  apiGroup: rbac.authorization.k8s.io

In the example above, we’re creating a Role named “developers”, granting access to the core, extensions and apps API groups and several actions against deployments, replicaSets and pods. Notice that developers in this role would have have access to secrets (for example)
The example RoleBinding binds a group named “k8s-devs” to the developers role. Notice that we have not created the k8s-devs group in Kubernetes or UAA; it exists in Active Directory

Use Kubectl to apply the yaml, creating the Role and Rolebinding in the targeted namespace

Creating the kubeconfig – the hard way

To get our developer connected with kubectl, they’ll need a kubeconfig with the authentication and connection details. The Hard way steps are:

Operator obtains the cluster’s certificate authority data. This can be done via curl or by copying the value from the existing kubeconfig.

Operator creates a template kubeconfig, replacing the value specified, then sends it to the developer user

apiVersion: v1
clusters:
- cluster:
  certificate-authority-data: <OBTAINED IN STEP 1 >
  server: < FQDN to Master Node. >
  name: PROVIDED-BY-ADMIN
contexts:
- context:
    cluster: PROVIDED-BY-ADMIN
    user: PROVIDED-BY-USER
  name:  PROVIDED-BY-ADMIN
current-context: PROVIDED-BY-ADMIN
kind: Config
preferences: {}
users:
- name: PROVIDED-BY-USER
  user:
    auth-provider:
      config:
        client-id: pks_cluster_client
        cluster_client_secret: ""
        id-token: PROVIDED-BY-USER
        idp-issuer-url: https://PROVIDED-BY-ADMIN:8443/oauth/token
        refresh-token:  PROVIDED-BY-USER
      name: oidc

The developer user obtains the id_token and refresh_token from UAA, via a curl command
curl 'https://PKS-API:8443/oauth/token' -k -XPOST -H 'Accept: application/json' -d "client_id=pks_cluster_client&client_secret=""&grant_type=password&username=UAA-USERNAME&response_type=id_token" --data-urlencode password=UAA-PASSWORD
The developer user updates the kubeconfig with the id_token and refresh token in the kubeconfig

Creating the kubeconfig – the easy way

Assuming the developer is using Mac or Linux…

Install jq on developer workstation
Download the get-pks-k8s-config.sh script, make it executable (chmod +x get-pks.k8s.config.sh)
Execute the script (replace the params with your own)
```
./get-pks-k8s-config.sh --API=api.pks.mydomain.com \
--CLUSTER=cl1.pks.mydomain.com \
--USER=dev1@mydomain.com \
--NS=scratch
```
- API – FQDN to PKS Controller, for UAA
- CLUSTER – FQDN to master node of k8s cluster
- USER – userPrincipalName for the user
- NS – Namespace to target; optional
After entering the user’s password, the script will set the params in the kubeconfig and switch context automatically

Try it out
Our developer user should able to “see” pods but not namespaces for example:

dev1 can see pods but not namespaces

Creating the kubeconfig – the easiest way

Provide the developer with the PKS CLI tool, remember we have not added them to any group or role with PKS admin permissions.
Provide the developer with the PKS API endpoint FQDN and the cluster name
The developer may run this command to generate the updated kubeconfig and set the current context
pks get-kubeconfig CLUSTERNAME -a API -u USER -k
- CLUSTERNAME is the name of the cluster
- API – FQDN to PKS Controller
- USER – userPrincipalName for the user
You’ll be prompted for the account password. Once entered, the tool will fetch the user-specific kubeconfig.

Use PKS CLI to get the kubeconfig

↧

Adding a private Docker registry to a PKS 1.5 Windows Kubernetes cluster

September 10, 2019, 9:37 am

≫ Next: Replicating images from DockerHub to Harbor

≪ Previous: Logging into a Kubernetes cluster with an OIDC LDAP account

Pivotal Container Service (PKS) 1.5 and Kubernetes 1.14 bring *beta* support for Workers running Windows. This means that we can provide the advantages of Kubernetes to a huge array of applications running on Windows. I see this especially useful for Windows applications that you don’t have the source code for and/or do not want to invest in reworking it for .NET core or languages that run on Linux.

In nearly all cases, you’ll need an image with your applications’ dependencies or configuration and in the real world, we don’t want those in the public space like dockerhub. Enter Private Docker Repositories.

PKS Enterprise includes VMware Harbor as a private registry, it’s very easy to deploy alongside PKS and provides a lot of important functionality. The Harbor interface uses TLS/SSL; you may use a self-signed, enterprise PKI-signed or public CA-signed certificate. If you chose to not use a public CA-signed certificate ($!), the self-signed or PKI-signed certificate must be trusted by the docker engine on each Kubernetes worker node.

Clusters based on Ubuntu Xenial Stemcells:

The operator/administrator simply puts the CA certificate into the “Trusted Certificates” box of the Security section in Ops Manager.
When BOSH creates the VMs for kubernetes clusters, the trusted certificates are added to the certificate store automatically.
If using an enterprise PKI where all of the internal certificates are signed by the Enterprise CA, this method makes it very easy to trust and “un-trust” CAs.

Clusters based on Windows 2019 Stemcells:

This is one of those tasks that is easier to perform on Linux that it is on Windows. Unfortunately, Windows does not automatically add the Trusted Certificates from Ops Manager to the certificate store, so extra steps are required.

1. Obtain the Registry CA Certificate. In Harbor, you may click the “REGISTRY CERTIFICATE” link while in a Project. Save the certificate to where the BOSH cli is installed (Ops Manager typically).
2. Connect BOSH cli to director. This may be done on the Ops Manager.
3. List BOSH-managed vms to identify the service_instance deployment corresponding to the targeted K8s cluster by matching the VM IP address to the IP address of the master node as reported by PKS cluster.
4. Run this command to copy the certificate to the Windows worker
```
bosh -e ENV -d DEPLOYMENT scp root.cer WINDOWS-WORKER:/
```
  - ENV – your environment alias in the BOSH cli
  - DEPLOYMENT – the BOSH deployment that corresponds to the k8s cluster; ex: service-instance_921bd35d-c46d-4e7a-a289-b577ff743e15
  - WINDOWS-WORKER – the instance name of the specific Windows worker VM; ex: windows-worker/277536dd-a7e6-446b-acf7-97770be18144
  This command copies the local file named root.cer to the root folder on the Windows VM
5. Use BOSH to SSH into the Windows Worker.
```
bosh -e ENV -d DEPLOYMENT ssh WINDOWS-WORKER
```
  - ENV – your environment alias in the BOSH cli
  - DEPLOYMENT – the BOSH deployment that corresponds to the k8s cluster; ex: service-instance_921bd35d-c46d-4e7a-a289-b577ff743e15
  - WINDOWS-WORKER – the instance name of the specific Windows worker VM; ex: windows-worker/277536dd-a7e6-446b-acf7-97770be18144
  SSH into Windows node, notice root.cer on the filesystem
6. In the Windows SSH session run “powershell.exe” to enter powershell
7. At the PS prompt, enter
```
Import-certificate -filepath .\root.cer -CertStoreLocation Cert:\LocalMachine\Root
```
  The example above imports the local file “root.cer” into the Trusted Root Certificate Store
8. Type “exit” twice to exit PS and SSH
9. Repeat steps 5-8 for each worker node.

Add docker-registry secret to k8s cluster

Whether the k8s cluster is running Windows workers or not, you’ll want to add credentials for authenticating to harbor. These credentials are stored in a secret. To add the secret, use this command:

kubectl create secret docker-registry harbor \
--docker-server=HARBOR_FQDN \
--docker-username=HARBOR_USER \
--docker-password=USER_PASS \
--docker-email=USER_EMAIL

HARBOR_FQDN – FQDN for local/private Harbor registry
HARBOR_USER – name of user in Harbor with access to project and repos containing the desired images
USER_PASS – username for the above account
USER_EMAIL – email adddress for the above account

Note that this secret is namespaced; it needs to be added to the namespace of the deployments that will reference it

More info

Here’s an example deployment yaml for a Windows K8s cluster that uses a local private docker registry. Note that Windows clusters cannot leverage NSX-T yet, so this example uses a NodePort to expose the service.

↧

Replicating images from DockerHub to Harbor

March 24, 2020, 1:24 pm

≫ Next: Configure Tanzu Kubernetes Grid to use Active Directory

≪ Previous: Adding a private Docker registry to a PKS 1.5 Windows Kubernetes cluster

Harbor Logo I found the documentation for actually replicating images from DockerHub to a local Harbor instance to be missing. So here’s what I’ve found:

Objective: Replicate the images for the Yelb sample application to local Harbor repo

Set-up and Prereqs

A local Harbor instance – I’ll be using an Enterprise PKS foundation with Harbor 1.10
An account for DockerHub

Steps

1. 1. Login to Harbor Web GUI as an administrator. Navigate to Administration/Registries
  2. Add Endpoint for local Harbor by clicking ‘New Endpoint’ and entering the following:
    - Provider: harbor
    - Name: local (or FQDN or whatever)
    - Description: optional
    - Endpoint URL: the actual URL for your harbor instance beginning with https and ending with :443
    - Access ID: username for an admin or user that at least has Project Admin permission to the target Projects/namespaces
    - Access Secret: Password for the account above
    - Verify Remote Cert: typically checked
  3. Add Endpoint for Docker Hub by clicking ‘New Endpoint’ and entering the following:
    - Provider: docker-hub
    - Name: dockerhub (or something equally profound)
    - Description: optional
    - Endpoint URL: pre-populated/li>
    - Access ID: username for your account at dockerhub
    - Access Secret: Password for the account above
    - Verify Remote Cert: typically checked
    Notice that this is for general dockerhub, not targeting a particular repo.
  4. Configure Replications for the Yelb Images
    You may create replications for several images at once using a variety of filters, but I’m going to create a replication rule for each image we need. I think this makes it easier to identify a problem, removes the risk of replicating too much and makes administration easier. Click ‘New Replication Rule‘ enter the following to create our first rule:
    - Name: yelb-db-0.5
    - Description: optional
    - Replication Mode: Pull-based (because we’re pulling the image from DockerHub)
    - Source registry: dockerhub
    - Source Registry Filter – Name: mreferre/yelb-db
    - Source Registry Filter – Tag: 0.5
    - Source Registry Filter – Resource: pre-populated
    - Destination Namespace: yelb (or whatever Project you want the images saved to)
    - Trigger Mode: Select ‘Manual’ for a one-time sync or select ‘Scheduled’ if you want to ensure the image is replicated periodically. Note that the schedule format is cron with seconds, so 0 0 23 * * 5 would trigger the replication to run every Friday at 23:00:00. Scheduled replication makes sense when the tag filter is ‘latest’ for example
    - Override: leave checked to overwrite the image if it already exists
    - Enable rule: leave checked to keep the rule enabled
  5. Add the remaining Replication Rules:
    
    Name Name Filter Tag Filter Dest Namespace
    
    yelb-ui-latest mreferre/yelb-ui latest yelb
    
    yelb-appserver-latest mreferre/yelb-appserver latest yelb
    
    redis-4.0.2 library/redis 4.0.2 yelb
    
    Note that redis is an official image, so we have to include library/

Name	Name Filter	Tag Filter	Dest Namespace
yelb-ui-latest	mreferre/yelb-ui	latest	yelb
yelb-appserver-latest	mreferre/yelb-appserver	latest	yelb
redis-4.0.2	library/redis	4.0.2	yelb

↧

Configure Tanzu Kubernetes Grid to use Active Directory

May 12, 2020, 9:36 am

≫ Next: Use Helm to deploy Harbor with Annotations for Velero

≪ Previous: Replicating images from DockerHub to Harbor

Tanzu Kubernetes Grid includes and supports packages for dex and Gangway. These are used to extend authentication to LDAP and OIDC endpoints. Recall that Kubernetes does not do user-management or traditional authentication. As a K8s cluster admin, you can create service accounts of course, but those are not meant to be used by developers.

Think of dex as a transition layer, it uses ‘connectors’ for upstream Identity providers (IdP) like Active Directory for LDAP or Okta for SAML and presents an OpenID Connect (OIDC) endpoint for k8s to use.

TKG provides not only the packages mentioned above, but also a collection of yaml files and documentation for implementation. The current version (as of May 12, 2020) documentation for configuring authentication is pretty general, the default values in the config files are suitable for OpenLDAP. So, I thought I’d share the specific settings for connecting dex to Active Directory.

Assumptions:

1. TKG Management cluster is deployed
2. Following the VMware documentation
3. Using the TKG-provided tkg-extensions
4. dex will be deployed to management cluster or to a specific workload cluster

Edits to authentication/dex/vsphere/ldap/03-cm.yaml – from Docs

Replace <MGMT_CLUSTER_IP> with the IP address of one of the control plane nodes of your management cluster. This is one of the control plane nodes where we’re putting dex
If the LDAP server is listening on the default port 636, which is the secured configuration, replace <LDAP_HOST> with the IP or DNS address of your LDAP server. If the LDAP server is listening on any other port, replace <LDAP_HOST> with the address and port of the LDAP server, for example 192.168.10.22:389 or ldap.mydomain.com:389. Never, never, never use unencrypted LDAP. You’ll need to specify port 636 unless your targeted AD controller is also a Global Catalog server in which case you’ll specify port 3269. Check with the AD team if you’re unsure.
If your LDAP server is configured to listen on an unsecured connection, uncomment insecureNoSSL: true. Note that such connections are not recommended as they send credentials in plain text over the network. Never, never, never use unencrypted LDAP.
Update the userSearch and groupSearch parameters with your LDAP server configuration. This need much more detail – see steps below

Edits to authentication/dex/vsphere/ldap/03-cm.yaml – AD specific

Obtain the root CA public certificate for your AD controller. Save a base64-encoded version of the certificate: echo root64.cer | base64 > rootcer.b64 for example will write the data from the PEM-encoded root64.cer file into a base64-encoded file named rootcer.b64
Add the base64-encoded certificate content to the rootCAData key. Be sure to remove the leading “#”. This is an alternative to using the rootCA key, where we’ll have to place the file on each Control Plane node

Update the userSearch values as follows:

key	default	set to	notes
baseDN	ou=people, dc=vmware,dc=com	DN of OU in AD under which user accounts are found	Example: ou=User Accounts,DC=ragazzilab,DC=com
filter	“(objectClass= posixAccount)”	“(objectClass=person)”
username	uid	userPrincipalName
idAttr	uid	DN	Case-sensitive
emailAttr	mail	userPrincipalName
nameAttr	givenName	cn

Update the groupSearch values as follows:

key	default	set to	notes
baseDN	ou=people, dc=vmware,dc=com	DN of OU in AD under which security Groups are found	Example: DC=ragazzilab,DC=com
filter	“(objectClass= posixGroup)”	“(objectClass=group)”
userAttr	uid	DN	Case-Sensitive
groupAttr	memberUid	“member:1.2.840.113556.1.4.1941:”	This is necessary to search within nested groups in AD
nameAttr	cn	cn

Other important Notes
When you create the oidc secret in the workload clusters running Gangway, the clientSecret value is base64-encoded, but the corresponding secret for the workload cluster in the staticClients section of the dex configmMap is decoded. This can be confusing since the decoded value is also randomly-generated.

↧

Use Helm to deploy Harbor with Annotations for Velero

August 14, 2020, 5:17 pm

≫ Next: Retrieving the Admin Password for Harbor Image Registry in Tanzu Kubernetes Grid Service

≪ Previous: Configure Tanzu Kubernetes Grid to use Active Directory

So, lets say you want to deploy an instance of Harbor to your “services” kubernetes cluster. The cluster is protected by a scheduled Velero backup Velero pickup all resources in all namespaces by default, but we need to add an annotation to indicate a persistent volume that should be included in the backup. Without this annotation, Velero will not include the PV in the backup.

First, let’s create a namespace we want to install Harbor to:
kubectl create ns harbor
Then, we’ll make sure helm has the chart for Harbor
helm repo add harbor https://helm.goharbor.io helm repo update
Finally, we’ll install harbor
helm install harbor harbor/harbor --namespace harbor \ --set expose.type=loadBalancer,expose.tls.enabled=true,expose.tls.commonName=harbor.ragazzilab.com,\ externalURL=harbor.ragazzilab.com,harborAdminPassword=harbor,\ redis.podAnnotations."backup\.velero\.io/backup-volumes"=data,\ registry.podAnnotations."backup\.velero\.io/backup-volumes"=registry-data,\ trivy.podAnnotations."backup\.velero\.io/backup-volumes"=data,\ database.podAnnotations."backup\.velero\.io/backup-volumes"=database-data,\ chartmuseum.podAnnotations."backup\.velero\.io/backup-volumes"=chartmuseum-data,\ jobservice.podAnnotations."backup\.velero\.io/backup-volumes"=job-logs

Notice a few of the configurations we’re passing here:

expose.tls.commonName is the value that will be used by the gnerated TLS certificate
externalURL is the FQDN that we’ll use to reach Harbor (post deploy, you’ll get the loadBalancer IP and add the DNS record for it)
harborAdminPassword is the password assigned by default to the admin account – clearly this should be changed immediately
The next items are for the podAnnotations; the syntax was unexpectedly different. Notice there’s a dot instead of an equals-sign between the key and the value. Also notice that the dots in the value must be escaped.

Once Harbor is deployed, you can get the loadBalancer’s IP and point your browser at it.

Now, we can wait for the Velero backup job to run or kick off a one-off backup.

Not So Fast...

I noticed that Harbor did not start properly after restore. This was because postgres in the database pod expects a specific set of permissions – which were apparently different as a result of the restore. The log on the database pod only read FATAL: data directory “/var/lib/postgresql/data” has group or world access

To return Harbor to functionality post-restore, I had to take the following steps:

Edit the database statefulSet: kubectl edit StatefulSet harbor-harbor-database -n harbor
Replace the command in the “change-permission-of-directory” initContainer from chown -R 999:999 /var/lib/postgresql/data to chmod -R 0700 /var/lib/postgresql/data
Save changes and bounce the database pod by running kubectl delete po -n harbor harbor-harbor-database-0
Bounce the remaining pods that are in CrashLoopBackup (because they’re trying to connect to the database)

Thanks to my friend and colleague Hemanth AVS for help with the podAnnotations syntax!

↧

Retrieving the Admin Password for Harbor Image Registry in Tanzu Kubernetes Grid Service

February 11, 2021, 9:22 am

≫ Next: Adding trusted certs to nodes on TKGS 7.0 U2

≪ Previous: Use Helm to deploy Harbor with Annotations for Velero

In TKGS on vSphere 7.0 through (at least) 7.0.1d, a Harbor Image Registry may be enabled for the vSphere Cluster (Under Configure|Namespaces| Image Registry). This feature currently (as of 7.0.1d) requires the Pod Service, which in turn requires NSX-T integration.

As of 7.0.1d, the self-signed certificate created for this instance of Harbor is added to the trust for nodes in TKG clusters, making it easier (possible?) to use images from Harbor.

When you login to harbor as a user, you’ll notice that the menu is very sparse. Only the ‘admin’ account can access the “Administration” menu.

To get logged in as the ‘admin’ account, we’ll need to retrieve the password from a secret for the harbor controller in the Supervisor cluster.

Steps:

SSH into the vCenter Server as root, type ‘shell’ to get to bash shell
Type ‘/usr/lib/vmware-wcp/decryptK8Pwd.py‘ to return information about the Supervisor Cluster. The results include the IP for the cluster as well as the node root password
While still in the SSH session on the vCenter Server, ssh into the Supervisor Custer node by entering ‘ssh root@<IP address from above>’. For the password, enter the PWD value from above.
Now, we have a session as root on a supervisor cluster control plane node.
Enter ‘kubectl get ns‘ to see a list of namespaces in the supervisor cluster. You’ll see a number of hidden, system namespaces in addition to those corresponding to the vSphere namespaces. Notice there is a namespace named “vmware-system-registry” in addition to one named “vmware-system-registry-#######”. The namespace with the number is where Harbor is installed.
Run ‘kubectl get secret -n vmware-system-registry-######‘ to get a list of secrets in the namespace. Locate the secret named “harbor-######-controller-registry”.
Run this to return the decoded admin password: kubectl get secret -n vmware-system-registry-###### harbor-######-controller.data.harborAdminPassword}' | base64 -d | base64 -d
In the cases I seen so far, the password is about 16 characters long, if it’s longer than that, you may not have decoded it entirely. Note that the value must be decoded twice.
Once you’ve saved the password, enter “exit” three times to get out of the ssh sessions.

Notes

Don’t manipulate the authentication settings
The process above is not supported; VMware GSS will not help you complete these steps
Some features may remain disabled (vulnerability scanning for example)
As admin, you may configure registries and replication (although it’s probably unsupported with this built-in version of Harbor for now)

↧

Adding trusted certs to nodes on TKGS 7.0 U2

March 22, 2021, 3:00 pm

≫ Next: Getting Started with VMware Tanzu SQL with MySQL for Kubernetes

≪ Previous: Retrieving the Admin Password for Harbor Image Registry in Tanzu Kubernetes Grid Service

A new feature added to TKGS as of 7.0 Update 2 is support for adding private SSL certificates to the “trust” on TKG cluster nodes.

This is very important as it finally provides a supported mechanism to use on-premises Harbor and other image registries.

It’s done by adding the encoded CAs to the “TkgServiceConfiguration”. The template for the TkgServiceConfiguration looks like this:

apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TkgServiceConfiguration
metadata:
  name: tkg-service-configuration
spec:
  defaultCNI: antrea
  proxy:
    httpProxy: http://<user>:<pwd>@<ip>:<port>

  trust:
    additionalTrustedCAs:
      - name: first-cert-name
        data: base64-encoded string of a PEM encoded public cert 1
      - name: second-cert-name
        data: base64-encoded string of a PEM encoded public cert 2

Notice that there are two new sections under spec; one for proxy and one for trust. This article is going to focus on trust for additional CAs.

If your registry uses a self-signed cert, you’ll just encode that cert itself. If you take advantage on an Enterprise CA or similar to sign your certs, you’d encoded and import the “signing”, “intermediate” and/or “root” CA.

Example

Let’s add the certificate for a standalone Harbor (not the built-in Harbor instance in TKGS, its certificate is already trusted)

Download the certificate by clicking the “Registry Certificate” link

Run base64 -i <ca file> to return the base64 encoded content:

Provide a simple name and copy and paste the encoded cert into the data value:

Apply the TkgServiceConfiguration

After setting up your file. Apply it to the Supervisor cluster:

kubectl apply -f ./TanzuServiceConfiguration.yaml

Notes

Existing TKG clusters will not automatically inherit the trust for the certificates
Clusters created after the TKGServiceConfiguration is applied will get the certificates
You can scale an existing TKG cluster to trigger a rollout with the certificates
You can verify the certificates exist by connecting through SSH to the nodes and locating the certs under /etc/ssl/certs:

↧

Getting Started with VMware Tanzu SQL with MySQL for Kubernetes

July 1, 2021, 1:37 pm

≫ Next: Configuring VMware Tanzu SQL with MySQL for Kubernetes for High Availability

≪ Previous: Adding trusted certs to nodes on TKGS 7.0 U2

VMware Tanzu SQL with MySQL for Kubernetes is quite a mouthful. For this post, I’ll refer to the product as Tanzu SQL/MySQL. We’re going to deploy it onto an existing Tanzu Kubernetes Grid cluster.

Objectives:

Deploy Tanzu SQL with MySQL on Kubernetes
Use phpMyAdmin to interact with our databases
Secure database instances with TLS

Cluster Setup

Tanzu SQL/MySQL can run on any conformant kubernetes cluster, if you already have one running, you can skip ahead. If, like me, you want to provision a new TKG cluster for Tanzu SQL/MySQL, you’ll want settings like this:

K8s version 1.18 or 1.19 (not v1.20 yet)
Additional volume on /var/lib/containerd for the images
For a test cluster, best-effort small control-plane nodes (3) and best-effort-medium worker nodes (2) is sufficient to start, YMMV.
Install metrics-server and add appropriate PSPs

Get the images and chart

You’ll need to login to pivnet and registry.pivotal.io, accept the EULA for VMware Tanzu SQL with MySQL for Kubernetes.

At a command-line, run:docker login registry.pivotal.io then, provide your credentials. This is so that docker can pull down the images from VMware. Login to your local container registry as well – you’ll need permissions to push images into your project.

In the following commands, replace “{local repo}” with the FQDN for your local registry and “{project}” with the project name in that repo that you can push images to.

docker pull registry.pivotal.io/tanzu-mysql-for-kubernetes/tanzu-mysql-instance:1.0.0
helm chart pull registry.pivotal.io/tanzu-mysql-for-kubernetes/tanzu-mysql-operator-chart:1.0.0
docker pull registry.pivotal.io/tanzu-mysql-for-kubernetes/tanzu-mysql-operator:1.0.0
docker tag registry.pivotal.io/tanzu-mysql-for-kubernetes/tanzu-mysql-instance:1.0.0 {local repo}/{project}/tanzu-mysql-instance:1.0.0
docker tag registry.pivotal.io/tanzu-mysql-for-kubernetes/tanzu-mysql-operator:1.0.0 {local repo}/{project}/tanzu-mysql-operator:1.0.0
docker push {local repo}/{project}/tanzu-mysql-instance:1.0.0
docker push {local repo}/{project}/tanzu-mysql-operator:1.0.0

Retrieve the helm chart:

export HELM_EXPERIMENTAL_OCI=1 helm chart pull registry.pivotal.io/tanzu-mysql-for-kubernetes/tanzu-mysql-operator-chart:1.0.0

In the tanzu-sql-with-mysql-operator folder, copy values.yaml to values-override.yaml. Edit the keys with the correct values (we haven’t created the harbor secret yet, but we’ll name it the value you provide here). Here’s an example:


imagePullSecret: harbor
operatorImage: {local repo}/{project}/tanzu-mysql-operator:1.0.0"
instanceImage: {local repo}/{project}/tanzu-mysql-instance:1.0.0"
resources:
  limits:
    cpu: 100m
    memory: 128Mi
  requests:
    cpu: 100m
    memory: 128Mi

Deploy Operator

We’ll want to create namespace, a docker-registry secret (named harbor in the example below) and then install the chart.

kubectl create namespace tanzu-mysql
kubectl --namespace tanzu-mysql create secret docker-registry harbor --docker-server=https://{local repo} --docker-username=MYUSERNAME --docker-password=MYPASSWORD
helm install --namespace tanzu-mysql --values=./tanzu-sql-with-mysql-operator/values-override.yaml tanzu-mysql-operator ./tanzu-sql-with-mysql-operator/

Let’s check that the pods are running by running kubectl get po -n tanzu-mysql

Before Creating an Instance…

We’ll need to create a namespace to put our mysql instances, a secret in that namespace in order to pull the images from our local repo, and a way to create TLS certificates and phpMyAdmin. These commands will create the namespace, create the docker-registry secret and install cert-manager:

kubectl create namespace cert-manager
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager --namespace cert-manager --version v1.0.2 --set installCRDs=true
kubectl create namespace tanzu-mysql
kubectl --namespace mysql-instances create secret docker-registry harbor --docker-server=https://<local repo> --docker-username=<username> --docker-password=<password>

Working with cert-manager

Cert-manager uses issuers to create certificates from cert-requests. There are a variety of issuers supported, but we must have the ca certificate included in the resulting certificate secret – something not all issuers do. For example, self-signed and ACME are not suitable as they do not appear to include the ca certificate in the cert secret. Luckily, the CA issuer works fine and can use a self-signed issuer as its own signer. Save the following as a yaml file to create a self-signed issuer, root cert and a CA issuer and apply it with kubectl -n mysql-instances -f cabootstrap.yaml

---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: my-selfsigned-ca
spec:
  isCA: true
  commonName: my-selfsigned-ca
  secretName: root-secret
  privateKey:
    algorithm: ECDSA
    size: 256
  issuerRef:
    name: selfsigned-issuer
    kind: ClusterIssuer
    group: cert-manager.io
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: ca-issuer
spec:
  ca:
    secretName: root-secret

Save the following as cert.yaml and apply it with kubectl -n mysql-instances -f cert.yaml to create a certificate for our instance. Adjust the names to match your environment of course. Notice the issuerRef.name is ca-issuer

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: mysql-tls-secret
spec:
  # Secret names are always required.
  secretName: mysql-tls-secret
  duration: 2160h # 90d
  renewBefore: 360h # 15d
  subject:
    organizations:
    - ragazzilab.com
  # The use of the common name field has been deprecated since 2000 and is
  # discouraged from being used.
  commonName: mysql-tls.mydomain.local
  dnsNames:
  - mysql-tls.mydomain.local
  - mysql-tls
  - mysql-tls.mysql-instances.svc.cluster.local
  # Issuer references are always required.
  issuerRef:
    name: ca-issuer
    # We can reference ClusterIssuers by changing the kind here.
    # The default value is Issuer (i.e. a locally namespaced Issuer)
    kind: Issuer
    # This is optional since cert-manager will default to this value however
    # if you are using an external issuer, change this to that issuer group.
    group: cert-manager.io

Confirm that the corresponding secret contains three files: ca.crt, tls.crt, tls.key by using kubectl describe secret -n mysql-instances mysql-tls-secret

Create an instance and add a user

Here is an example yaml for a MySQL instance. This will create an instance name mysql-tls, using the docker-registry secret named harbor we created earlier and the certificate secret we created above named mysql-tls-secret and use a LoadBalancer IP so we can access it from outside of the cluster.

apiVersion: with.sql.tanzu.vmware.com/v1
kind: MySQL
metadata:
  name: mysql-tls
spec:
  storageSize: 2Gi
  imagePullSecret: harbor

#### Set the storage class name to change storage class of the PVC associated with this resource
  storageClassName: tanzu

#### Set the type of Service used to provide access to the MySQL database.
  serviceType: LoadBalancer # Defaults to ClusterIP

### Set the name of the Secret used for TLS
  tls:
    secret:
      name: mysql-tls-secret

Apply this yaml to the mysql-instances namespace to create the instance: kubectl apply -n mysql-instances -f ./mysqlexample.yaml

Watch for two containers in the pod to be ready

Watch for the mysql-tls-0 pod to be running with 2 containers. When the instance is created, the operator also creates a secret containing the root password. Retrieve the root password with this command: kubectl get secret -n mysql-instances mysql-tls-credentials -o jsonpath='{.data.rootPassword}' | base64 -D
Retrieve the load-balancer address for the MySQL instance with this command: kubectl get svc -n mysql-instances mysql-tls

Login to Pod and run mysql commands

Run this to get into a command prompt on the mysql pod: kubectl -n mysql-instances exec --stdin --tty pod/mysql-tls-0 -c mysql -- /bin/bash
Once in the pod and at a prompt, run this to get into the mysql cli as root: mysql -uroot -p<root password>
Once at the mysql prompt, run this to create a user named “admin” with a password set to “password” (PLEASE use a different password!)

  CREATE USER 'admin'@'%' IDENTIFIED BY 'password';
  GRANT ALL PRIVILEGES ON * . * TO 'admin'@'%';
  FLUSH PRIVILEGES;

Type exit twice to get out of mysql and the pod.

Ok, so now, we have a running instance of mysql and we’ve created a user account that can manage it (cannot login remotely as root).

Deploy, Configure and use phpMyAdmin

There are several ways to do this, but I’m going to go with kubeapps to deploy phpMyAdmin. Run this to install kubeapps with a loadbalancer front-end:

helm repo add bitnami https://charts.bitnami.com/bitnami kubectl create namespace kubeapps helm install kubeapps --namespace kubeapps bitnami/kubeapps --set frontend.service.type=LoadBalancer

Find the External IP address for kubeapps and point a browser at it: kubectl get svc -n kubeapps kubeapps. Get the token from your .kube/config file to paste into the token field in kubeapps and click submit. Once in kubeapps, be sure to select the kubeapps namespace – you should see kubeapps itself under Applications.

logged into kubeapps in the kubeapps namespace

Click “Catalog” and type “phpmyadmin” into the search field. Click on the phpmyadmin box that results. On the next page, describing phpmyadmin, click the Blue deploy button.

Now, you should be looking at a configuration yaml for phpmyadmin. First, set the Name up top to something meaningful, like phpmyadmin, the scroll down to line 256, you should see the service type currently set to ClusterIP, replace ClusterIP with LoadBalancer.

Then scroll the rest of the way to click the blue “Deploy X.Y.Z” button and hang tight. After it deploys, the Access URLs will show the IP address for phpMyAdmin.

Access URLs for phpmyadmin after deployment

Click the Access URL to get to the Login page for phpMyAdmin and supply the IP Address of the mysql instance as well as the admin username and password we created above, then click Go.

Login to instance IP with the account we made earlier

Now you should be able to manage the databases and objects in the mysql instance!

Notes

Kubernetes v1.20. May work fine, but the recent changes to the filesystem permissions in the TKG image prevented me using it today (July 1, 2021)
You don’t have to use cert-manager if you have another source for TLS certificates, just put the leaf cert, private key and ca cert into the secret referenced by the mysql instance yaml.
Looks like you can reuse the TLS cert for multiple databases, just keep in mind that if you connect using a name/fqdn that is not in the cert’s dnsNames, you may be a cert error.
This example uses Tanzu Kubernetes Grid Service in vSphere with Tanzu on vSphere 7 Update 2 using NSX-ALB.

↧

Configuring VMware Tanzu SQL with MySQL for Kubernetes for High Availability

July 7, 2021, 12:43 pm

≫ Next: Configuring Backup in Tanzu SQL with MySQL for Kubernetes

≪ Previous: Getting Started with VMware Tanzu SQL with MySQL for Kubernetes

As a follow up to the getting started post, let’s touch on what it takes to configure a MySQL instance for High Availability in Tanzu SQL/MySQL

Why this is important

In kubernetes, pods are generally treated as transient and ephemeral, they can be restarted quickly and are often stateless. This is certainly not the case with databases. We need to make sure our databases remain online and usable. MySQL itself provides a means to do High Availability with multiple instances and synchronization; we’ll be leveraging this capability today.

High Availability Architecture

Blatantly ripped off from the official docs

Unlike our stand-alone instance when create an instance with HA enabled, the operator creates FIVE pods and two services for us.

You’ll notice that the mysql-ha LoadBalancer uses the proxy pods as its endpoints and the mysql-ha-members uses the database pods themselves.

Create an HA instance

In this example, I’m going to reuse the “harbor” docker-registry secret we created originally, but we’ll want a new tls certificate for this instance.

Create the TLS certificate

Just like previously, save the following as cert-ha.yaml and apply it with kubectl -n mysql-instances -f cert-ha.yaml to create a certificate for our instance. Adjust the names to match your environment of course. Notice the issuerRef.name is ca-issuer

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: mysql-ha-secret
spec:
  # Secret names are always required.
  secretName: mysql-ha-secret
  duration: 2160h # 90d
  renewBefore: 360h # 15d
  subject:
    organizations:
    - ragazzilab.com
  # The use of the common name field has been deprecated since 2000 and is
  # discouraged from being used.
  commonName: mysql-ha.ragazzilab.com
  dnsNames:
  - mysql-ha.ragazzilab.com
  - mysql-ha
  - mysql-ha.mysql-instances.svc.cluster.local
  # Issuer references are always required.
  issuerRef:
    name: ca-issuer
    # We can reference ClusterIssuers by changing the kind here.
    # The default value is Issuer (i.e. a locally namespaced Issuer)
    kind: Issuer
    # This is optional since cert-manager will default to this value however
    # if you are using an external issuer, change this to that issuer group.
    group: cert-manager.io

Create the instance

The only differences are highAvailability.enabled:true and the name of the certificate secret

apiVersion: with.sql.tanzu.vmware.com/v1
kind: MySQL
metadata:
  name: mysql-ha
spec:
  storageSize: 2Gi
  imagePullSecret: harbor
#### Set highAvailability.enabled:true to create three pods; one primary and two standby, plus two proxy pods
  highAvailability:
    enabled: true

#### Set the storage class name to change storage class of the PVC associated with this resource
  storageClassName: tanzu

#### Set the type of Service used to provide access to the MySQL database.
  serviceType: LoadBalancer # Defaults to ClusterIP

### Set the name of the Secret used for TLS
  tls:
    secret:
      name: mysql-ha-secret

Apply this as usual: kubectl apply -n mysql-instances -f ./mysql-ha.yaml

Create a database user

The steps to create the database user in an HA instance are just like those for the standalone instance once we determine which Pod is the primary/active and writable one. I was unable to make the one-liner method in the official docs work, so here’s what I did instead.

Get the MySQL root password: kubectl get secret -n mysql-instances mysql-ha-credentials -o jsonpath='{.data.rootPassword}' | base64 -D
Get a shell on the mysql-ha-0 pod: kubectl -n mysql-instances exec --stdin --tty pod/mysql-ha-0 -c mysql -- /bin/bash
Get into the mysql cli: mysql -uroot -p<root password>
Identify the Primary member: SELECT MEMBER_HOST, MEMBER_ROLE FROM performance_schema.replication_group_members;
If the primary node is mysql-ha-0 (the one we’re on), proceed to the next step. If it is not, go back to step step 2 to get a shell on the pod that is primary.
Now, we should be on the mysql cli on the primary pod/member. Just like with the standalone instance, let’s create a user:

CREATE USER 'admin'@'%' IDENTIFIED BY 'password';
  GRANT ALL PRIVILEGES ON * . * TO 'admin'@'%';
  FLUSH PRIVILEGES;

Type exit twice to get out of mysql and the pod.

Ok, so now, we have a running instance of mysql and we’ve created a user account that can manage it (cannot login remotely as root). We can connect phpMyAdmin to the instance using the admin credentials:

Showing the three members of the instance

↧

Configuring Backup in Tanzu SQL with MySQL for Kubernetes

July 8, 2021, 10:50 am

≪ Previous: Configuring VMware Tanzu SQL with MySQL for Kubernetes for High Availability

Backup & Restore

Prerequisite: A reachable S3 endpoint. Can be local or remote, but the pods must be able to resolve its name or IP. Create or select and existing bucket for your database backups. In this case, I have a minio instance running on-prem with a bucket named backup-mysql.

Create a secret for the S3 endpoint credentials. This account will need to be able to write to the database backup bucket. Here’s an example:

---
apiVersion: v1
kind: Secret
metadata:
  name: minio-creds
stringData:
  # S3 Credentials
  accessKeyId: "MYACCESSKEY"
  secretAccessKey: "MYSECRETKEY"

Create a TanzuMySQLBackupLocation. In the example below, we’re not using SSL with the minio endpoint, so I’m explicitly using port 80. More examples and details are found here. I like to keep the backups organized, so I’ll create a backup location for each instance and specify an bucketPath for each.

---
apiVersion: with.sql.tanzu.vmware.com/v1
kind: MySQLBackupLocation
metadata:
  name: backuplocation-mysql-ha
spec:
  storage:
    # For S3 or Minio:
    s3:
      bucket: "backup-mysql-ha"
      bucketPath: "/mysql-ha/"
      # region: "us-east-1"
      endpoint:  "http://minio.ragazzilab.com:80" # optional, default to AWS
      forcePathStyle: true
      secret:
        name: minio-creds

Test with a one-off backup. Create and apply a yaml like the following to request a backup without a schedule. Here’s an example yaml for a one-off backup for the mysql-ha instance to its corresponding backup location:

---
apiVersion: with.sql.tanzu.vmware.com/v1
kind: MySQLBackup
metadata:
  name: backup-mysql-ha-1off
spec:
  location:
    name: backuplocation-mysql-ha
  instance:
    name: mysql-ha

We can get the MySQLBackups to see that it has completed successfully:

Create a backup Schedule

Now that we’ve confirmed that the backup location and credentials work as expected, we should add a backup schedule. Here’s an example:

---
apiVersion: with.sql.tanzu.vmware.com/v1
kind: MySQLBackupSchedule
metadata:
  name: mysql-ha-daily
spec:
  backupTemplate:
    spec:
      location:
        name: backuplocation-mysql-ha
      instance:
        name:  mysql-ha
  schedule: "@daily"

Apply this kubectl apply -n mysql-instances -f backupschedule-mysql-ha-daily.yaml

I found that (unlike Velero), when applying the MySQLBackupSchedule, a backup does not immediately begin. At the scheduled time however, a pod for the backup schedule will be created to run the backup job. This pod will remain intact to run subsequent backup jobs.

Lastly, regarding backups, keep in mind that the backup data on the S3 endpoint never expires, the backups will remain there until removed manually. This may be important if you have limited capacity.

Restore/Recover

From the docs:

MySQLRestores always restores to a new MySQL instance to avoid overwriting any data on an existing MySQL instance. The MySQL instance is created automatically when the restore is triggered. Tanzu MySQL for Kubernetes does not allow you to restore a backup to an existing MySQL instance. Although you can perform this manually by copying the MySQL data from the backup artifact onto an existing MySQL instance, VMware strongly discourages you from doing this because you might overwrite existing data on the MySQL instance.

So, we should not expect to restore directly to a running database instance. If we need to recover, we’ll create a new instance and restore the backup to it.

To create a restore, we’ll need the name of the MySQLBackup object to restore from and a name of a database to create from that backup as part of the restore. We’ll put that into a yaml like the one below. Notice that we provide a spec for a new database, I wanted a loadbalancer for it although we are able to repoint the existing loadbalancer to the new proxy nodes (for ha) or the new database node (for standalone)

---
apiVersion: with.sql.tanzu.vmware.com/v1
kind: MySQLRestore
metadata:
  name: restore-ha
spec:
  backup:
    name: mysql-ha-daily-20210708-000005
  instanceTemplate:
    metadata:
      name: restored-mysql-database
    spec:
      storageSize: 2Gi
      imagePullSecret: harbor
      serviceType: LoadBalancer
      highAvailability:
        enabled: true

Apply the yaml to create the restore kubectl apply -n mysql-instances -f ./restore-ha.yamlYou should see a new database pending and a MySQLRestore object running:

Restore job succeeded and there is a new mysql instance

Now, the choice if yours to copy data from the restored database back to the original or to point the applications to the new database or to point the loadbalancer at the new database.

If you choose to repoint the existing load-balancer to the new database, here’s an example how to do that:

kubectl patch service -n mysql-instances mysql-ha -p '{"spec":{"selector":{"app.kubernetes.io/instance": "restored-mysql-database"}}}'

↧