Katonic MLOps Platform on Azure
This guide describes how to install, operate, administer, and configure the Katonic Platform in your own Azure Kubernetes cluster. This content is applicable to Katonic users with self-installation licenses.
Hardware Configurations
This configuration is designed for high-availability (HA) deployments or performance testing. It aims to deliver the performance needed for real-time execution of analytics, machine learning (ML), and artificial intelligence (AI) applications in a production pipeline.
Katonic on Azure
Katonic can run on a Kubernetes cluster provided by Azure Kubernetes Service. When running on AKS, the Katonic architecture uses Azure resources to fulfil the Katonic MLOps platform requirements as follows:
Runtime platform:
A: AKS cluster deployed in 2 Availability Zones (AZ), version 1.28.5, Node/instances: Virtual Machine Scale Set.
B: Platform nodes: Node pool (min 2) Standard_DS3_v2
C: Compute nodes: Node pool (Variable) Standard_D8s_v3
D: GPU compute nodes: Node pool (Variable) Standard_NC6s_v3
Storage:
A: Shared filesystem and datasets: Azure Storage Account
B: Backups: Azure Storage Account
C: Environment and model image: Azure Container Registry
Networking:
A: Ingress Load Balancer: Standard SKU Azure Load Balancer
B: Cluster network: Azure Virtual Network with a subnet with 65536 IP addresses (/16 subnet mask).
When running on AKS, Katonic uses Azure resources to fulfil the cluster requirements as follows:
Kubernetes control is handled by the AKS control plane with managed Kubernetes masters
The AKS cluster's node pool labeled katonic.ai/node-pool=platform is configured to host the Katonic platform
Additional AKS node pools provide compute (labeled katonic.ai/node-pool=compute) and GPU (labeled katonic.ai/node-pool=gpu) nodes for user workloads
An Azure storage account stores Katonic blob data and datasets
The kubernetes.io/azure-disk provisioner is used to create persistent volumes for Katonic executions
Ingress to the Katonic application is handled by an SSL-terminating Application Gateway that points to a Kubernetes load balancer
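The node pool labels listed above can be checked with an ordinary Kubernetes nodeSelector. The manifest below is a minimal sketch for verifying that the compute pool is labeled as expected; the pod name, image, and command are hypothetical, and Katonic schedules its own workloads without requiring you to write manifests like this.

```yaml
# Illustrative only: pod name, image, and command are assumptions, not part of Katonic.
apiVersion: v1
kind: Pod
metadata:
  name: example-compute-check
spec:
  # Schedule only onto nodes carrying the compute node pool label
  nodeSelector:
    katonic.ai/node-pool: compute
  containers:
    - name: main
      image: busybox:1.36
      command: ["sh", "-c", "echo scheduled on a compute node && sleep 3600"]
```

If the pod stays in Pending, the usual cause is that no node carries the katonic.ai/node-pool=compute label.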
Setting up an AKS cluster for the Katonic Platform
This section describes how to configure an Azure AKS cluster for use with Katonic. When configuring an AKS cluster for Katonic, you must be familiar with the following Azure services:
- Azure Kubernetes Service (AKS)
- Virtual Networking (Vnet)
- Virtual Machines and Disks
- Azure File System storage
- Azure Blob Storage
Additionally, a basic understanding of Kubernetes concepts like node pools, network CNI, storage classes, autoscaling, and Docker will be useful when deploying the cluster.
Service quotas
Azure maintains default service quotas for each of the services listed above. You can check the default quotas and manage them from the Quotas page in the Azure portal.
Create Azure Kubernetes Service (AKS)
By default, the Katonic installer creates the AKS cluster. If you are going to create the AKS cluster yourself, first create a new, separate resource group and then create the AKS cluster in that resource group.
Dynamic block storage
AKS clusters come equipped with several kubernetes.io/azure-disk backed storage classes by default. Katonic recommends the use of Standard SSD disks for better input and output performance. The Standard SSD-based storage class (kfs) is created by default by the Katonic installer.
If you are creating the cluster yourself, you need to create a storage class named kfs. To create the storage class, use the following YAML.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: kfs
parameters:
  skuname: StandardSSD_LRS
provisioner: disk.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
Note: Make sure that kfs is the only default storage class. Remove the default annotation from any other storage class.
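As a quick way to confirm that the kfs class provisions disks, a claim such as the following can be applied. This is a minimal sketch: the claim name and requested size are hypothetical. Because the class uses volumeBindingMode: WaitForFirstConsumer, the claim remains Pending until a pod that mounts it is scheduled.

```yaml
# Illustrative only: claim name and size are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-kfs-claim
spec:
  storageClassName: kfs      # the block storage class described above
  accessModes:
    - ReadWriteOnce          # an Azure managed disk attaches to one node at a time
  resources:
    requests:
      storage: 10Gi
```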
Dynamic shared storage
AKS clusters come equipped with an Azure file storage class by default. Katonic recommends the use of that Azure file system storage class for better input and output performance.
The Katonic installer has an optional parameter, Shared Storage.create, that creates a kfs-shared storage class based on the Azure file system for the Katonic platform.
If you are creating the cluster yourself and want to use shared storage, you need to create an Azure file system-based storage class named kfs-shared. Use the following YAML to create it; the file.csi.azure.com provisioner and Standard_LRS SKU shown here are the AKS defaults for Azure Files and may need adjusting for your environment.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kfs-shared
parameters:
  skuName: Standard_LRS
provisioner: file.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
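Shared storage differs from block storage mainly in that several pods on different nodes can mount the same volume. The claim below is a minimal sketch of that pattern, assuming the shared class is named kfs-shared as the installer parameter describes; the claim name and size are hypothetical.

```yaml
# Illustrative only: claim name and size are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-shared-claim
spec:
  storageClassName: kfs-shared   # assumed name of the Azure file system-based class
  accessModes:
    - ReadWriteMany              # file shares can be mounted by many nodes at once
  resources:
    requests:
      storage: 50Gi
```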
Domain
Katonic must be configured to serve from a specific FQDN. To serve Katonic securely over HTTPS, you will also need an SSL certificate that covers the chosen name. Record the FQDN for use when installing Katonic.
Katonic offers the default option to use the .katonic.ai domain in all versions of the Katonic Platform. However, if you have your own domain, you can also utilize it across all versions provided by the Katonic Platform.
Resources Provisioned Post-Installation
When the platform is installed, the following resources are created. Take this into account when selecting your installation configuration.
SR NO. | TYPE | AMOUNT | WHEN | NOTES |
---|---|---|---|---|
1 | Network interface | 1 per node | Always | |
2 | OS boot disk (Azure managed disk) | 1 per node | Always | |
3 | Public IP address | 1 per node | The platform has public IP addresses. | |
4 | VNet | 1 | The platform is deployed to a new VNet. | |
5 | Network security group | 1 | Always | See Network Security Groups Configuration (Azure). |
6 | AKS Cluster | 1 | When AKS is used as the application cluster | Version 1.28.5 |
7 | Azure File System | 1 | When you enable shared storage while installing the Katonic platform. | |
Kubernetes (AKS) version
Katonic platform version 4.5 has been validated with Kubernetes (AKS) version 1.28.5 and above.
Network plugin
Katonic relies on Kubernetes network policies to manage secure communication between pods in the cluster. Network policies are implemented by the network plugin, so your cluster must use a networking solution that supports NetworkPolicy, such as Calico.
You must ensure the subnets you use for your cluster have CIDR ranges of sufficient size, as every deployed pod in the cluster will be assigned a network interface and consume a subnet address. Katonic recommends at least a /23 CIDR (512 addresses) for the cluster.
The Katonic-hosting cluster should use the default network plugin created when AKS is deployed.
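To illustrate the kind of object the network plugin has to enforce, here is a minimal NetworkPolicy sketch that only admits traffic from pods in the same namespace. The policy and namespace names are hypothetical, and this is not one of the policies Katonic itself installs. On a cluster whose plugin does not support NetworkPolicy, the object is accepted but has no effect.

```yaml
# Illustrative only: policy and namespace names are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: example-workloads
spec:
  podSelector: {}            # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}    # only pods in this same namespace may connect
```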
Data Visualisation
Katonic MLOps platform 4.5 includes Superset version 2.0.1 for data visualisation.
You require an additional DNS record if you're installing Superset.
Example:
- If the domain name used to access the platform is katonic.tesla.com,
- then the domain for data visualisation would look like dash-katonic.tesla.com.
Connectors
Katonic MLOps platform 4.5 includes Airbyte version 0.40.32 for connectors.
You require an additional DNS record if you're installing Airbyte.
Example:
- If the domain name used to access the platform is katonic.tesla.com,
- then the domain for connectors would look like connectors-katonic.tesla.com.
Katonic Platform Installation
Installation of the Katonic platform has been segmented based on product. When you click the link, you will be redirected to the installation process documentation.