Katonic Adaptive Studio

Node pool requirements

The GKE cluster can be configured as either a single-node cluster or a multi-node cluster, depending on the user's needs:

Multi-Node GKE Cluster:

The GKE cluster must have at least four node pools with the following specifications and distinct node labels:

SR NO.	POOL	MIN-MAX	VM	LABELS	TAINTS
1	Platform	1-4 (With HA) 2-4 (Without HA)	c2-standard-4	katonic.ai/node-pool=platform	katonic.ai /node-pool=platform:NoSchedule
2	Compute	1-10	c2-standard-8	katonic.ai/node-pool=compute
3	Deployment	1-10	c2-standard-8	katonc.ai/node-pool=deployment	katonic.ai/node-pool=deployment:NoSchedule
4	Vectordb	1-4	c2-standard-4	katonic.ai/node-pool=vectordb	katonic.ai/node-pool=vectordb:NoSchedule
5	GPU (Optional)	0-5	Required VM type	katonic.ai/gpu={GPU-type}	katonic.ai/gpu={GPU-type}:NoSchedule

Note: For example we can use GPU type as v100, A30, A100

Note: When backup_enabled = True, then compute_nodes.min_count should be set to 2.

GCP Platform-Node Specifications

Platform nodes in platform GCP cloud deployments must fulfil the following hardware specification requirements according to the deployment type:

Component	Specification
Node count	min 2
Instance type	c2-standard-4
vCPUs	4
Memory	16 GB
Boot disk size	128 GB

GCP Compute-Node Specifications

Instance types that must be used by compute nodes in GCP cloud installations on the Katonic platform include:

Choose the type that best fits your requirements. GCP GKE (Google Kubernetes Engine) is also supported for application nodes, using the instance types listed below. For specification details for each type, refer to the GCP documentation.

Note: Supported compute node configurations

c2-standard-8
c2-standard-16
c2-standard-32
Boot Disk: Min 128GB

GCP Deployment-Node Specifications

Instance types that must be used by deployment nodes in GCP cloud installations on the Katonic platform include:

Choose the type that best fits your requirements. GCP GKE (Google Kubernetes Engine) is also supported for application nodes, using the instance types listed below. The Katonic platform requires at least 1 minimum deployment node for the community version. For specification details for each type, refer to the GCP documentation.

Note: Supported deployment node configurations

c2-standard-8
c2-standard-16
c2-standard-32
Boot Disk: Min 128GB

GCP Vectordb-Node Specifications

Vectordb nodes in GCP cloud deployments must fulfil the following hardware specification requirements:

COMPONENT	SPECIFICATION
Instance type	c2-standard-4

GCP GPU-Node Specifications

As of now, the GPU node pool is supported by Katonic-installer version 5.0.

Choose the instance type that best fits your requirements. Google Kubernetes Engine (GKE) is also supported for application nodes in the GKS (Google Kubernetes Service) platform, utilizing the instance types provided by Google Cloud. For specification details for each type, refer to the GCP documentation.

Note: Supported gpu node configurations

Boot disk size = Min 512GB
Label = katonic.ai/gpu={GPU-type}
Taints = katonic.ai/gpu={GPU-type}:NoSchedule

Note: For example we can use GPU type as v100, A30, A100

Additional node pools with distinct katonic.ai/node-pool labels can be added to make other instance types available for Katonic executions.

Katonic Platform Installation

General completion time: 45 minute

Installation process

The Katonic platform runs on Kubernetes. To simplify the deployment and configuration of Katonic services, Katonic provides an install automation tool called the Katonic-installer that will deploy Katonic into your compatible cluster. The Katonic-installer is an ansible role delivered in a Docker container and can be run locally.

Prerequisites

To install and configure Katonic in your GCP account you must have:

quay.io credentials from Katonic
GCP with enough quota to create:
- At least 2 c2-standard-4 machines for platform nodes and at least 1 c2-standard types EC2 machine for compute nodes
A Linux operating system (Ubuntu/Debian) based machine with the following Steps:
a. A Linux operating system (Ubuntu/Debian) based machine needs 4GB RAM and 2vcpus and The boot disk size should be 50GB.
b. While creating VM select the service account (Katonic) in the Identity and API access section. Skip to step c if you already have the machine with the given specifications.
Note: After the platform is deployed successfully, the VM can be deleted.
c. Switch to the root user inside the machine.
d. gcloud CLI must be installed and logged in to your GCP project and service account using the gcloud init command.
Commands for installing gcloud CLI:
```
apt-get install snapd -y
snap install google-cloud-cli --classic
```
Commands to login using gcloud CLI:
```
gcloud init
gcloud auth application-default login
```

The following tools should be installed:
- Kubectl
- Docker

To install Katonic Platform Adaptive Studio version follow the steps mentioned below:

1. Log in to Quay with the credentials described in the requirements section above.

docker login quay.io

2. Retrieve the Katonic installer image from Quay.

docker pull quay.io/katonic/katonic-installer:v6.0.0

3. Create a directory.

mkdir katonic
cd katonic

4. Adding PEM Encoded Public Key Certificate and Private Key to Directory

Put the PEM encoded public key certificate (having extension .crt) for your domain and private key associated with the given certificate (having extension .key) inside the current directory (katonic).

5. The Katonic Installer can deploy the Katonic Platform Adaptive Studio version in two ways:

Creating GKE and deploying the Katonic Platform Adaptive Studio version.
Install Katonic Platform Adaptive Studio version on existing GKE.

1. Creating Private GKE and deploying the Katonic Platform Adaptive Studio version

Multi-Node GKE Cluster

Initialize the installer application to generate a template configuration file named katonic.yml.

docker run -it --rm --name generating-yaml -v $(pwd):/install quay.io/katonic/katonic-installer:v6.0.0 init gcp katonic_adaptive_studio multi_node deploy_kubernetes private

Edit the configuration file with all necessary details about the target cluster, storage systems, and hosting domain. Read the following configuration reference

PARAMETER	DESCRIPTION	VALUE
katonic_platform_version	It has the value by default regarding the Katonic Platform Version.	katonic_adaptive_studio
deploy_on	Cluster to be deployed on	GCP
create_k8s_cluster	Must be set to True	True
private_cluster	Set "True" when opting for private cluster	False
control_plane_authorized_networks:	List of allowed IP ranges (CIDR) for control plane access.
enable_exposing_genai_applications_to_internet	set "True" if opting for exposing genai applications to internet	False
public_domain_for_genai_applications	Public FQDN of domain for genai applications that will be exposed to the internet	(eg. public-chatbots.google.com)
vpc_name	Enter the name of VPC created for Private Cluster
subnet_name	Enter the name of subnet created for Private Cluster
internal_loadbalancer	Set "True" when opting for internal loadbalancer	False
gke_k8s_version	GKE version	eg. 1.30.4(1.27 and above versions supported)
cluster_name	Cluster name to be	eg. katonic-adaptive_studio-platform-v6-0-0
gcp_region	GCP region name	eg. us-east1
gcp_project_id	Set your GCP project ID	eg. ardent-timm-1000678
service_account_id	Set created service account email ID	eg. katonic-main@ardent-timm-1000678.iam.gserviceaccount.com
zone_1		eg. us-east1-b
zone_2		eg. us-east2-c
compute_nodes.instance_type	Compute node VM size	eg. c2-standard-8
compute_nodes.min_count	Minimum number of compute nodes should not be less than 1	eg. 1
compute_nodes.max_count	Maximum number of compute should be greater than compute nodes min count nodes.	eg. 3
compute_nodes.os_disk_size	Compute Nodes OS Disk Size	eg. 128 GB
vectordb_nodes.instance_type	Vectordb Node VM size	eg. c2-standard-4
vectordb_nodes.min_count	Minimum number of Vectordb nodes should be 1	eg. 1
vectordb_nodes.max_count	Maximum number of Vectordb should be greater than Deployment nodes min count nodes.	eg. 4
vectordb_nodes.os_disk_size	Vectordb Nodes OS Disk Size	eg. 128 GB
gpu_enabled	add GPU nodepool	True or False
gpu_nodes.instance_type	GPU node VM size	eg n1-standard-1
gpu_nodes.gpu_machine_type	Type of machine you need	eg nvidia-tesla-p4
gpu_nodes.gpu_type	Enter the GPU type available on the machine	eg. v100,k80
gpu_nodes.gpu_count		eg 2
gpu_nodes.min_count	Minimum number of GPU nodes	eg. 1
gpu_nodes.max_count	Maximum number of GPU nodes	eg. 2
gpu_nodes.os_disk_size	Enter GPU nodes OS disk size	eg 512 GB
gpu_nodes.gpu_vRAM	Enter GPU node RAM size
gpu_nodes.gpus_per_node	Enter the number of GPUs per node
enable_gpu_workspace	Set it true if you want to use GPU Workspace	True or False
shared_storage_create	Set it True if you want to have shared storage	True or False
genai_nfs_size	nfs storage size for genai	100Gi
milvus_size	Set Milvus Size	128Gi
mlflow_postgres_size	Set Mlflow Postgres Size	8Gi
workspace_timeout_interval	Set the Timeout Interval in Hours	eg. "12"
backup_enabled	Enable backup	True or False
backup_schedule	Backup schedule	0 0 1 * *
backup_expiration	Backup expiration	2160h0m0s
use_custom_domain	Set this to True if you want to host the Katonic platform on your custom domain. Skip if use_katonic_domain: True	True or False
custom_domain_name	Expected a valid domain	eg. katonic.tesla.com
use_katonic_domain	Set this to True if you want to host the Katonic platform on the Katonic Deploy Platform domain. Skip if use_custom_domain: True	True or False
katonic_domain_prefix	One word expected with no special characters and all small alphabets	eg. tesla
enable_pre_checks	Set this to True if you want to perform the Pre-checks	True / False
AD_Group_Management	Set "True" to enable functionality that provides you ability to sign in using Azure AD	False
AD_CLIENT_ID	Client ID of App registered for SSO in client's Azure or Identity Provider
AD_CLIENT_SECRET	Client Secret of App registered for SSO in client's Azure or any other Identity Provider
AD_AUTH_URL	Authorization URL endpoint of app registered for SSO.
AD_TOKEN_URL	Token URL endpoint of app registered for SSO.
quay_username	Username for quay
quay_password	Password for quay
adminUsername	Email for admin user	eg. john@katonic.ai
adminPassword	Password for admin user	At least 1 special character, at least 1 uppercase letter, at least 1 lowercase letter, minimum 8 characters
adminFirstName	Admin first name	eg. john
adminLastName	Admin last name	eg. musk

Installing the Katonic Platform Adaptive Studio version

After configuring the katonic.yml file, run the following command to install the Katonic Platform Adaptive Studio version:

docker run -it --rm --name install-katonic -v /root/.config:/root/.config -v $(pwd):/inventory quay.io/katonic/katonic-installer:v6.0.0

2. Deploying Katonic Platform Adaptive Studio version on existing Private GKE

The steps are similar to Installing the Katonic Platform with GCP Google Kubernetes Engine. Just edit the configuration file with all the details about the target cluster, storage systems, and hosting domain. Read the following configuration reference, these are the only parameters required when installing the Katonic MLOps platform on existing GKE.

Prerequisites

You will need to create a kfs named storage class. Please refer to the main documentation of GCP → Dynamic Block Storage for instructions on how to create the storage class.

Multi-Node GKE Cluster

Initialize the installer application to generate a template configuration file named katonic.yml.

docker run -it --rm --name generating-yaml -v $(pwd):/install quay.io/katonic/katonic-installer:v6.0.0 init gcp katonic_adaptive_studio multi_node  kubernetes_already_exists private

The configuration template includes the following parameters:

PARAMETER	DESCRIPTION	VALUE
katonic_platform_version	It has the value by default regarding the Katonic Platform Version.	katonic_adaptive_studio
deploy_on	Cluster to be deployed on	GCP
single_node_cluster	set "True" if opting for single node cluster	True or False
private_cluster	Set "True" when opting for private cluster	False
control_plane_authorized_networks	List of allowed IP ranges (CIDR) for control plane access.
enable_exposing_genai_applications_to_internet	set "True" if opting for exposing genai applications to internet	False
public_domain_for_genai_applications	Public FQDN of domain for genai applications that will be exposed to the internet	(eg. public-chatbots.google.com)
internal_loadbalancer	Set "True" when opting for internal loadbalancer	False
cluster_name	Enter cluster name that you deploy	eg katonic-adaptive_studio-platform-v6-0-0
gcp_region	GCP region name	eg. us-east1
gcp_project_id	Set your GCP project ID	eg. ardent-timm-1000678
workspace_timeout_interval	Set the Timeout Interval in Hours	eg. "12"
use_custom_domain	Set this to True if you want to host katonic platform on your custom domain. Skip if use_katonic_domain: True	True or False
custom_domain_name	Expected a valid domain.	eg. katonic.tesla.com
use_katonic_domain	Set this to True if you want to host katonic platform on Katonic Deploy Platform domain. Skip if use_custom_domain: True	True or False
katonic_domain_prefix	One word expected with no special characters and all small alphabets	eg. tesla
enable_pre_checks	Set this to True if you want to perform the Pre-checks	True / False
genai_nfs_size	nfs storage size for genai	100Gi
milvus_size	Set Milvus Size	128Gi
mlflow_postgres_size	Set Mlflow Postgres Size	8Gi
AD_Group_Management	Set "True" to enable functionality that provides you ability to sign in using Azure AD	False
AD_CLIENT_ID	Client ID of App registered for SSO in client's Azure or Identity Provider
AD_CLIENT_SECRET	Client Secret of App registered for SSO in client's Azure or any other Identity Provider
AD_AUTH_URL	Authorization URL endpoint of app registered for SSO.
AD_TOKEN_URL	Token URL endpoint of app registered for SSO.
quay_username	Username of quay
quay_password	Password of quay
adminUsername	email for admin user	eg. john@katonic.ai
adminPassword	password for admin user	at least 1 special character at least 1 upper case letter at least 1 lower case letter minimum 8 characters
adminFirstName	Admin first name	eg. john
adminLastName	Admin last name	eg. musk

Installing Katonic Platform Adaptive Studio version

After configuring the katonic.yml file, run the following command to install the Katonic Platform Adaptive Studio version:

docker run -it --rm --name install-katonic -v /root/.config:/root/.config -v $(pwd):/inventory quay.io/katonic/katonic-installer:v6.0.0

Installation Verification

The installation process can take up to one hour to complete fully. The installer will output verbose logs, and commands to take kubectl access of deployed cluster and surface any errors it encounters. After installation, you can use the following commands to check whether all applications are running or not.

kubectl get pods --all-namespace

This will show the status of all pods being created by the installation process. If you see any pods enter a crash loop or hang in a non-ready state, you can get logs from that pod by running:

kubectl logs $POD_NAME --namespace $NAMESPACE_NAME

If the installation completes successfully, you should see a message that says:

TASK [platform-deployment : Credentials to access Katonic Adaptive Studio Platform] *******************************ok: [localhost] => {
    "msg": [
        "Platform Domain: $domain_name",
        "Username: $adminUsername",
        "Password: $adminPassword"
    ]
}

However, the application will only be accessible via HTTPS at that FQDN if you have configured DNS for the name to point to an ingress load balancer with the appropriate SSL certificate that forwards traffic to your platform nodes.

Test and troubleshoot

To verify the successful installation of Katonic, perform the following tests:

If you encounter a 500 or 502 error, take access of your cluster and execute the following command:
```
kubectl rollout restart deploy nodelog-deploy -n application
```
Login to the Katonic application and ensure that all the navigation panel options are operational. If this test fails, please verify that Keycloak was set up properly.
Create a new project and launch a Jupyter/JupyterLab workspace. If this test fails, please check that the default environment images have been loaded in the cluster.
Publish an app with Flask or Shiny. If this test fails, please verify that the environment images have Flask and Shiny installed.

Deleting the Katonic platform from GCP

When you start the installation, in your current directory, you will get the platform deletion script. you just need to run the script.

./gcp-cluster-delete.sh

Node pool requirements​

Multi-Node GKE Cluster:​

GCP Platform-Node Specifications​

GCP Compute-Node Specifications​

GCP Deployment-Node Specifications​

GCP Vectordb-Node Specifications​

GCP GPU-Node Specifications​

Katonic Platform Installation​

Installation process​

Prerequisites​

To install Katonic Platform Adaptive Studio version follow the steps mentioned below:​

1. Log in to Quay with the credentials described in the requirements section above.​

2. Retrieve the Katonic installer image from Quay.​

3. Create a directory.​

4. Adding PEM Encoded Public Key Certificate and Private Key to Directory​

5. The Katonic Installer can deploy the Katonic Platform Adaptive Studio version in two ways:​

1. Creating Private GKE and deploying the Katonic Platform Adaptive Studio version​

Multi-Node GKE Cluster​

Installing the Katonic Platform Adaptive Studio version​

2. Deploying Katonic Platform Adaptive Studio version on existing Private GKE​

Multi-Node GKE Cluster​

Installing Katonic Platform Adaptive Studio version​

Installation Verification​

Test and troubleshoot​

Deleting the Katonic platform from GCP​

Node pool requirements

Multi-Node GKE Cluster:

GCP Platform-Node Specifications

GCP Compute-Node Specifications

GCP Deployment-Node Specifications

GCP Vectordb-Node Specifications

GCP GPU-Node Specifications

Katonic Platform Installation

Installation process

Prerequisites

To install Katonic Platform Adaptive Studio version follow the steps mentioned below:

1. Log in to Quay with the credentials described in the requirements section above.

2. Retrieve the Katonic installer image from Quay.

3. Create a directory.

4. Adding PEM Encoded Public Key Certificate and Private Key to Directory

5. The Katonic Installer can deploy the Katonic Platform Adaptive Studio version in two ways:

1. Creating Private GKE and deploying the Katonic Platform Adaptive Studio version

Multi-Node GKE Cluster

Installing the Katonic Platform Adaptive Studio version

2. Deploying Katonic Platform Adaptive Studio version on existing Private GKE

Multi-Node GKE Cluster

Installing Katonic Platform Adaptive Studio version

Installation Verification

Test and troubleshoot

Deleting the Katonic platform from GCP