Version: 4.4

Katonic GenAI

Node pool requirements

The GKE cluster must have at least three node pools with the following specifications and distinct node labels:

| SR NO. | POOL | MIN-MAX | VM | LABELS | TAINTS |
|---|---|---|---|---|---|
| 1 | Platform | 1-4 (With HA), 2-4 (Without HA) | c2-standard-4 | katonic.ai/node-pool=platform | katonic.ai/node-pool=platform:NoSchedule |
| 2 | Compute | 1-10 | c2-standard-8 | katonic.ai/node-pool=compute | |
| 3 | Deployment | 1-10 | c2-standard-8 | katonic.ai/node-pool=deployment | katonic.ai/node-pool=deployment:NoSchedule |
| 4 | GPU (Optional) | 0-5 | Required VM type | katonic.ai/node-pool=gpu-{GPU-type} | nvidia.com/gpu=present:NoSchedule |

Note: Example GPU types include v100, A30, and A100.

Note: When backup_enabled = True, then compute_nodes.min_count should be set to 2.
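For a cluster that is prepared by hand, a pool matching the table above could be created with gcloud roughly as follows. This is a minimal sketch under assumptions, not the installer's own method: the function name node_pool_cmd, the 128 GB disk size default, and the cluster name and region in the usage line are illustrative. The function only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Print (do not run) a gcloud command that creates one labelled node pool.
# Args: POOL CLUSTER REGION MACHINE_TYPE MIN MAX [TAINT]
# Note: in a regional cluster, the min/max node counts apply per zone.
node_pool_cmd() {
    pool=$1 cluster=$2 region=$3 machine=$4 min=$5 max=$6 taint=${7:-}
    cmd="gcloud container node-pools create $pool"
    cmd="$cmd --cluster=$cluster --region=$region"
    cmd="$cmd --machine-type=$machine --disk-size=128"
    cmd="$cmd --enable-autoscaling --min-nodes=$min --max-nodes=$max"
    cmd="$cmd --node-labels=katonic.ai/node-pool=$pool"
    # The compute pool has no taint in the table, so the flag is optional.
    if [ -n "$taint" ]; then
        cmd="$cmd --node-taints=$taint"
    fi
    printf '%s\n' "$cmd"
}

# Example (cluster name and region are placeholders):
# node_pool_cmd platform my-cluster us-east1 c2-standard-4 2 4 \
#     katonic.ai/node-pool=platform:NoSchedule
```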

GCP Platform-Node Specifications

Platform nodes in GCP cloud deployments must meet the following hardware specifications:

| Component | Specification |
|---|---|
| Node count | min 2 |
| Instance type | c2-standard-4 |
| vCPUs | 4 |
| Memory | 16 GB |
| Boot disk size | 128 GB |

GCP Compute-Node Specifications

Compute nodes in GCP cloud installations on the Katonic platform must use one of the instance types listed below.

Choose the type that best fits your requirements. GCP GKE (Google Kubernetes Engine) is also supported for application nodes, using the same instance types. For specification details for each type, refer to the GCP documentation.

Note: Supported compute node configurations

  • c2-standard-8
  • c2-standard-16
  • c2-standard-32
  • Boot Disk: Min 128GB

GCP Deployment-Node Specifications

Deployment nodes in GCP cloud installations on the Katonic platform must use one of the instance types listed below.

Choose the type that best fits your requirements. GCP GKE (Google Kubernetes Engine) is also supported for application nodes, using the same instance types. The Katonic platform requires at least one deployment node for the community version. For specification details for each type, refer to the GCP documentation.

Note: Supported deployment node configurations

  • c2-standard-8
  • c2-standard-16
  • c2-standard-32
  • Boot Disk: Min 128GB

GCP GPU-Node Specifications

As of now, the GPU node pool is supported by Katonic-installer version 4.4.

Choose the instance type that best fits your requirements. Google Kubernetes Engine (GKE) is also supported for application nodes, using the instance types provided by Google Cloud. For specification details for each type, refer to the GCP documentation.

Note: Supported GPU node configurations

  • Boot disk size = Min 512GB
  • Label = katonic.ai/node-pool=gpu-{gpu-type}
  • Taints = nvidia.com/gpu=present:NoSchedule

Note: Example GPU types include v100, A30, and A100.

Additional node pools with distinct katonic.ai/node-pool labels can be added to make other instance types available for Katonic executions.
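The label convention above can be turned into a small helper for checking which nodes belong to which pool. This is only a sketch: the function name gpu_pool_label is hypothetical, and the kubectl lines in the comments assume kubectl access to the cluster.

```shell
#!/bin/sh
# Print the node label for a GPU pool, given a GPU type such as v100.
gpu_pool_label() {
    if [ -z "$1" ]; then
        echo "usage: gpu_pool_label GPU_TYPE" >&2
        return 1
    fi
    printf 'katonic.ai/node-pool=gpu-%s\n' "$1"
}

# Usage:
# List the nodes of one GPU pool:
#   kubectl get nodes -l "$(gpu_pool_label v100)"
# Show the pool label of every node as an extra column:
#   kubectl get nodes -L katonic.ai/node-pool
```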

Katonic Platform Installation

General completion time: 45 minutes

Installation process

The Katonic platform runs on Kubernetes. To simplify the deployment and configuration of Katonic services, Katonic provides an install automation tool called the Katonic-installer that deploys Katonic into your compatible cluster. The Katonic-installer is an Ansible role delivered in a Docker container and can be run locally.

Prerequisites

To install and configure Katonic in your GCP account you must have:

  • quay.io credentials from Katonic.

  • A GCP project with enough quota to create:

    • At least 2 c2-standard-4 machines for platform nodes and at least 1 c2-standard type machine for compute nodes
  • A Linux (Ubuntu/Debian) machine, prepared with the following steps:

    a. The machine needs 4 GB RAM, 2 vCPUs, and a 50 GB boot disk.

    b. While creating the VM, select the service account (Katonic) in the Identity and API access section. Skip to step c if you already have a machine with the given specifications.

    Note: After the platform is deployed successfully, the VM can be deleted.

    c. Switch to the root user inside the machine.

    d. Install the gcloud CLI and log in to your GCP project and service account using the gcloud init command.

    Commands for installing gcloud CLI:

    apt-get install snapd -y
    snap install google-cloud-cli --classic

    Commands to log in using the gcloud CLI:

    gcloud init
    gcloud auth application-default login
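After logging in, it can help to confirm which account and project gcloud will use before starting the installer. The following is a minimal sketch, not part of the Katonic tooling; the function name check_gcloud_login is hypothetical.

```shell
#!/bin/sh
# Print the active gcloud account and project; fail if either is not set.
check_gcloud_login() {
    account=$(gcloud auth list --filter=status:ACTIVE --format='value(account)')
    project=$(gcloud config get-value project 2>/dev/null)
    if [ -z "$account" ] || [ -z "$project" ] || [ "$project" = "(unset)" ]; then
        echo "gcloud is not fully configured: run gcloud init" >&2
        return 1
    fi
    echo "account=$account project=$project"
}

# Usage:
# check_gcloud_login || exit 1
```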

To install the Katonic Platform GenAI version, follow the steps below:

1. Log in to Quay with the credentials described in the prerequisites section above.

docker login quay.io

2. Retrieve the Katonic installer image from Quay.

docker pull quay.io/katonic/katonic-installer:v4.4.1

3. Create a directory.

mkdir katonic
cd katonic

4. Add the PEM-encoded public key certificate and private key to the directory.

Put the PEM encoded public key certificate (having extension .crt) for your domain and private key associated with the given certificate (having extension .key) inside the current directory (katonic).
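Before running the installer, the certificate and key can be sanity-checked from the shell. This is a sketch under assumptions: the function names are hypothetical, the requirement of exactly one .crt and one .key file is an assumption made for this check (the installer may accept other layouts), and the second function needs the openssl command-line tool.

```shell
#!/bin/sh
# Succeed only if DIR (default: current directory) holds exactly one .crt
# file and exactly one .key file. "Exactly one" is an assumption of this check.
check_cert_files() {
    dir=${1:-.}
    crt_count=$(find "$dir" -maxdepth 1 -type f -name '*.crt' | wc -l | tr -d ' ')
    key_count=$(find "$dir" -maxdepth 1 -type f -name '*.key' | wc -l | tr -d ' ')
    [ "$crt_count" -eq 1 ] && [ "$key_count" -eq 1 ]
}

# Succeed if the certificate ($1) and the private key ($2) carry the same
# public key, i.e. the key belongs to the certificate.
cert_matches_key() {
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 1
    key_pub=$(openssl pkey -in "$2" -pubout) || return 1
    [ "$cert_pub" = "$key_pub" ]
}

# Usage (file names are placeholders):
# check_cert_files . && cert_matches_key example.crt example.key
```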

5. The Katonic Installer can deploy the Katonic Platform GenAI version in two ways:

  1. Creating GKE and deploying the Katonic Platform GenAI version.
  2. Install Katonic Platform GenAI version on existing GKE.

1. Creating GKE and deploying the Katonic Platform GenAI version

Initialize the installer application to generate a template configuration file named katonic.yml.

docker run -it --rm --name generating-yaml -v $(pwd):/install quay.io/katonic/katonic-installer:v4.4.1 init gcp katonic_genai deploy_kubernetes public
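The two deployment paths differ only in one argument of the init command (deploy_kubernetes or kubernetes_already_exists). A small wrapper, sketched here with the hypothetical names KATONIC_IMAGE and katonic_init_cmd, prints the command for the chosen mode and rejects anything else. It prints rather than runs the command so that the output can be inspected first.

```shell
#!/bin/sh
KATONIC_IMAGE="quay.io/katonic/katonic-installer:v4.4.1"

# Print the init command for the given cluster mode.
katonic_init_cmd() {
    case "$1" in
        deploy_kubernetes|kubernetes_already_exists) ;;
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
    # $(pwd) is kept literal here; it is expanded when the command is run.
    printf 'docker run -it --rm --name generating-yaml -v $(pwd):/install %s init gcp katonic_genai %s public\n' \
        "$KATONIC_IMAGE" "$1"
}

# Usage:
# katonic_init_cmd deploy_kubernetes
```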

Edit the configuration file with all necessary details about the target cluster, storage systems, and hosting domain. Refer to the following configuration reference:

| SR. NO. | PARAMETER | DESCRIPTION | VALUE |
|---|---|---|---|
| 1 | katonic_platform_version | Katonic Platform version; set by default | katonic_genai |
| 2 | deploy_on | Cluster to be deployed on | GCP |
| 3 | create_k8s_cluster | Must be set to True | True |
| 4 | gke_k8s_version | GKE version (1.25 and above supported) | e.g. 1.27.3-gke.100 |
| 5 | cluster_name | Cluster name | e.g. katonic-genai-platform-v4-4 |
| 6 | gcp_region | GCP region name | e.g. us-east1 |
| 7 | gcp_project_id | Set your GCP project ID | e.g. ardent-timm-1000678 |
| 8 | service_account_id | Email ID of the created service account | e.g. katonic-main@ardent-timm-1000678.iam.gserviceaccount.com |
| 9 | high_availability | | True or False |
| 10 | zone_1 | | e.g. us-east1-b |
| 11 | zone_2 | | e.g. us-east1-c |
| 12 | platform_nodes.instance_type | Platform node VM size | e.g. c2-standard-4 |
| 13 | platform_nodes.min_count | Minimum number of platform nodes; should be 2 | e.g. 2 |
| 14 | platform_nodes.max_count | Maximum number of platform nodes; should be greater than the platform nodes min count | e.g. 3 |
| 15 | platform_nodes.os_disk_size | Platform nodes OS disk size | e.g. 128 GB |
| 16 | compute_nodes.instance_type | Compute node VM size | e.g. c2-standard-8 |
| 17 | compute_nodes.min_count | Minimum number of compute nodes; should not be less than 1 | e.g. 1 |
| 18 | compute_nodes.max_count | Maximum number of compute nodes; should be greater than the compute nodes min count | e.g. 3 |
| 19 | compute_nodes.os_disk_size | Compute nodes OS disk size | e.g. 128 GB |
| 20 | deployment_nodes.instance_type | Deployment node VM size | e.g. c2-standard-8 |
| 21 | deployment_nodes.min_count | Minimum number of deployment nodes; should be 1 | e.g. 1 |
| 22 | deployment_nodes.max_count | Maximum number of deployment nodes; should be greater than the deployment nodes min count | e.g. 4 |
| 23 | deployment_nodes.os_disk_size | Deployment nodes OS disk size | e.g. 128 GB |
| 24 | gpu_enabled | Add a GPU node pool | True or False |
| 25 | gpu_nodes.instance_type | GPU node VM size | e.g. n1-standard-1 |
| 26 | gpu_nodes.gpu_machine_type | Type of machine you need | e.g. nvidia-tesla-p4 |
| 27 | gpu_nodes.gpu_type | Enter the GPU type available on the machine | e.g. v100, k80 |
| 28 | gpu_nodes.gpu_count | | e.g. 2 |
| 29 | gpu_nodes.min_count | Minimum number of GPU nodes | e.g. 1 |
| 30 | gpu_nodes.max_count | Maximum number of GPU nodes | e.g. 2 |
| 31 | gpu_nodes.os_disk_size | Enter GPU nodes OS disk size | e.g. 512 GB |
| 32 | gpu_nodes.gpu_vRAM | Enter GPU node RAM size | |
| 33 | gpu_nodes.gpus_per_node | Enter the number of GPUs per node | |
| 34 | enable_gpu_workspace | Set to True if you want to use GPU Workspace | True or False |
| 35 | genai_nfs_size | NFS storage size for GenAI | 100Gi |
| 36 | backup_enabled | Enable backup | True or False |
| 37 | backup_schedule | Backup schedule | 0 0 1 * * |
| 38 | backup_expiration | Backup expiration | 2160h0m0s |
| 39 | use_custom_domain | Set this to True if you want to host the Katonic platform on your custom domain. Skip if use_katonic_domain: True | True or False |
| 40 | custom_domain_name | Expects a valid domain | e.g. katonic.tesla.com |
| 41 | use_katonic_domain | Set this to True if you want to host the Katonic platform on the Katonic Deploy Platform domain. Skip if use_custom_domain: True | True or False |
| 42 | katonic_domain_prefix | Expects one word with no special characters, all lowercase letters | e.g. tesla |
| 43 | AD_Group_Management | Set "True" to enable sign-in using Azure AD | False |
| 44 | AD_CLIENT_ID | Client ID of the app registered for SSO in the client's Azure or other identity provider | |
| 45 | AD_CLIENT_SECRET | Client secret of the app registered for SSO in the client's Azure or other identity provider | |
| 46 | AD_AUTH_URL | Authorization URL endpoint of the app registered for SSO | |
| 47 | AD_TOKEN_URL | Token URL endpoint of the app registered for SSO | |
| 48 | quay_username | Username for the Quay registry | |
| 49 | quay_password | Password for the Quay registry | |
| 50 | adminUsername | Email for the admin user | e.g. john@katonic.ai |
| 51 | adminPassword | Password for the admin user | At least 1 special character, at least 1 uppercase letter, at least 1 lowercase letter, minimum 8 characters |
| 52 | adminFirstName | Admin first name | e.g. john |
| 53 | adminLastName | Admin last name | e.g. musk |
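The node count rules in the reference above (a minimum floor per pool, and a maximum greater than the minimum) are easy to get wrong when editing katonic.yml by hand. The following sketch checks them from the shell before the installer is run. The function name check_counts is hypothetical, and the values are passed as arguments rather than read from katonic.yml, since the exact layout of that file is not shown here.

```shell
#!/bin/sh
# check_counts NAME MIN MAX FLOOR
# Succeed if MIN is at least FLOOR and MAX is greater than MIN.
check_counts() {
    name=$1 min=$2 max=$3 floor=$4
    if [ "$min" -lt "$floor" ]; then
        echo "$name: min_count $min is below the required minimum $floor" >&2
        return 1
    fi
    if [ "$max" -le "$min" ]; then
        echo "$name: max_count $max must be greater than min_count $min" >&2
        return 1
    fi
}

# Usage with the example values from the reference:
# check_counts platform_nodes 2 3 2
# check_counts compute_nodes 1 3 1      # use a floor of 2 when backup_enabled is True
# check_counts deployment_nodes 1 4 1
```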

Installing the Katonic Platform GenAI version

docker run -it --rm --name install-katonic -v /root/.config:/root/.config -v $(pwd):/inventory quay.io/katonic/katonic-installer:v4.4.1

2. Deploying Katonic Platform GenAI version on existing GKE

The steps are similar to Installing the Katonic Platform with GCP Google Kubernetes Engine. Edit the configuration file with all the details about the target cluster, storage systems, and hosting domain. The configuration reference below lists the only parameters required when installing the Katonic platform on an existing GKE cluster.

Prerequisites

You will need to create a storage class named kfs. Refer to the main GCP documentation (GCP → Dynamic Block Storage) for instructions on how to create the storage class.
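Whether the storage class is already present can be checked with kubectl before the installer is started. A minimal sketch, assuming kubectl is already configured for the target cluster; the function name has_storage_class is hypothetical.

```shell
#!/bin/sh
# Succeed if a storage class with the given name (default: kfs) exists.
has_storage_class() {
    kubectl get storageclass "${1:-kfs}" >/dev/null 2>&1
}

# Usage:
# has_storage_class kfs || echo "storage class kfs is missing" >&2
```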

Initialize the installer application to generate a template configuration file named katonic.yml.

docker run -it --rm --name generating-yaml -v $(pwd):/install quay.io/katonic/katonic-installer:v4.4.1 init gcp katonic_genai kubernetes_already_exists public

| SR. NO. | PARAMETER | DESCRIPTION | VALUE |
|---|---|---|---|
| 1 | katonic_platform_version | Katonic Platform version; set by default | katonic_genai |
| 2 | deploy_on | Cluster to be deployed on | GCP |
| 3 | cluster_name | Enter the name of the cluster that you deployed | e.g. katonic-genai-platform-v4-4 |
| 4 | gcp_region | GCP region name | e.g. us-east1 |
| 5 | gcp_project_id | Set your GCP project ID | e.g. ardent-timm-1000678 |
| 6 | backup_enabled | Enable backup | True or False |
| 7 | backup_schedule | Backup schedule | 0 0 1 * * |
| 8 | backup_expiration | Backup expiration | 2160h0m0s |
| 9 | use_custom_domain | Set this to True if you want to host the Katonic platform on your custom domain. Skip if use_katonic_domain: True | True or False |
| 10 | custom_domain_name | Expects a valid domain | e.g. katonic.tesla.com |
| 11 | use_katonic_domain | Set this to True if you want to host the Katonic platform on the Katonic Deploy Platform domain. Skip if use_custom_domain: True | True or False |
| 12 | katonic_domain_prefix | Expects one word with no special characters, all lowercase letters | e.g. tesla |
| 13 | AD_Group_Management | Set "True" to enable sign-in using Azure AD | False |
| 14 | AD_CLIENT_ID | Client ID of the app registered for SSO in the client's Azure or other identity provider | |
| 15 | AD_CLIENT_SECRET | Client secret of the app registered for SSO in the client's Azure or other identity provider | |
| 16 | AD_AUTH_URL | Authorization URL endpoint of the app registered for SSO | |
| 17 | AD_TOKEN_URL | Token URL endpoint of the app registered for SSO | |
| 18 | quay_username | Username for the Quay registry | |
| 19 | quay_password | Password for the Quay registry | |
| 20 | adminUsername | Email for the admin user | e.g. john@katonic.ai |
| 21 | adminPassword | Password for the admin user | At least 1 special character, at least 1 uppercase letter, at least 1 lowercase letter, minimum 8 characters |
| 22 | adminFirstName | Admin first name | e.g. john |
| 23 | adminLastName | Admin last name | e.g. musk |
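The adminPassword and katonic_domain_prefix rules in the reference above can be checked locally before the installer is run. This is a sketch with hypothetical function names. It treats any non-alphanumeric character as a "special character", which is an assumption; the installer's own definition may be narrower.

```shell
#!/bin/sh
# Succeed if the password has at least 8 characters, one uppercase letter,
# one lowercase letter and one special (here: non-alphanumeric) character.
valid_admin_password() {
    pw=$1
    [ "${#pw}" -ge 8 ] || return 1
    case "$pw" in *[[:upper:]]*) ;; *) return 1 ;; esac
    case "$pw" in *[[:lower:]]*) ;; *) return 1 ;; esac
    case "$pw" in *[![:alnum:]]*) ;; *) return 1 ;; esac
}

# Succeed if the prefix is a single word made of lowercase letters only.
valid_domain_prefix() {
    case "$1" in
        ""|*[![:lower:]]*) return 1 ;;
    esac
}

# Usage:
# valid_admin_password 'Example#Pass1' && valid_domain_prefix tesla
```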

Installing Katonic Platform GenAI version

docker run -it --rm --name install-katonic -v /root/.config:/root/.config -v $(pwd):/inventory quay.io/katonic/katonic-installer:v4.4.1

Installation Verification

The installation process can take up to one hour to complete. The installer outputs verbose logs, prints the commands needed to get kubectl access to the deployed cluster, and surfaces any errors it encounters. After installation, use the following command to check whether all applications are running:

kubectl get pods --all-namespaces

This will show the status of all pods being created by the installation process. If you see any pods enter a crash loop or hang in a non-ready state, you can get logs from that pod by running:

kubectl logs $POD_NAME --namespace $NAMESPACE_NAME
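On a cluster with many pods, it is easier to list only the ones that need attention. The filter below is a sketch, not part of the installer; the function name unhealthy_pods is hypothetical. It reads the output of kubectl get pods --all-namespaces --no-headers on standard input and relies on the default column order (NAMESPACE, NAME, READY, STATUS).

```shell
#!/bin/sh
# Print "namespace/pod status" for every pod that is not Completed and is
# either not Running or Running with fewer ready containers than expected.
unhealthy_pods() {
    awk '{
        if ($4 == "Completed") next
        split($3, ready, "/")
        if ($4 != "Running" || ready[1] != ready[2]) print $1 "/" $2, $4
    }'
}

# Usage:
# kubectl get pods --all-namespaces --no-headers | unhealthy_pods
```

Each reported pod can then be inspected with the kubectl logs command shown above.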

If the installation completes successfully, you should see a message that says:

TASK [platform-deployment : Credentials to access Katonic GenAI Platform] *******************************
ok: [localhost] => {
"msg": [
"Platform Domain: $domain_name",
"Username: $adminUsername",
"Password: $adminPassword"
]
}

However, the application will only be accessible via HTTPS at that FQDN if you have configured DNS for the name to point to an ingress load balancer with the appropriate SSL certificate that forwards traffic to your platform nodes.

Test and troubleshoot

To verify the successful installation of Katonic, perform the following tests:

  • If you encounter a 500 or 502 error, connect to your cluster and execute the following command:

    kubectl rollout restart deploy nodelog-deploy -n application

  • Log in to the Katonic application and ensure that all the navigation panel options are operational. If this test fails, please verify that Keycloak was set up properly.

  • Create a new project and launch a Jupyter/JupyterLab workspace. If this test fails, please check that the default environment images have been loaded in the cluster.

  • Publish an app with Flask or Shiny. If this test fails, please verify that the environment images have Flask and Shiny installed.
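The first check in the list above can be scripted: fetch the HTTP status of the platform URL and apply the restart only for a 500 or 502 response. This is a sketch under assumptions; the function names and the domain in the usage comment are hypothetical, and it needs curl on the machine running it.

```shell
#!/bin/sh
# Print the HTTP status code returned by https://DOMAIN.
platform_status() {
    curl -sS -o /dev/null -w '%{http_code}' --max-time 15 "https://$1"
}

# Succeed if the status code is one that the restart step applies to.
needs_nodelog_restart() {
    case "$1" in
        500|502) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (domain is a placeholder):
# code=$(platform_status katonic.example.com)
# if needs_nodelog_restart "$code"; then
#     kubectl rollout restart deploy nodelog-deploy -n application
# fi
```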

Deleting the Katonic platform from GCP

When you start the installation, the platform deletion script is created in your current directory. To delete the platform, run the script:

./gcp-cluster-delete.sh