Katonic Companion

Node pool requirements

A. Single-Node EKS Cluster:

The EKS cluster must have the node pools listed below, which produce worker nodes with the following specifications and distinct node labels.

| SR NO. | POOL | MIN-MAX | INSTANCE | LABELS | TAINTS |
|---|---|---|---|---|---|
| 1 | Compute | 1-10 | m5.2xlarge | katonic.ai/node-pool=compute | |
| 2 | Vectordb | 1-4 | m5.xlarge | katonic.ai/node-pool=vectordb | katonic.ai/node-pool=vectordb:NoSchedule |

B. Multi-Node EKS Cluster:

The EKS cluster must have at least three node pools that produce worker nodes with the following specifications and distinct node labels. It may also include an optional GPU pool:

| SR NO. | POOL | MIN-MAX | INSTANCE | LABELS | TAINTS |
|---|---|---|---|---|---|
| 1 | Platform | 2-4 | m5.large | katonic.ai/node-pool=platform | katonic.ai/node-pool=platform:NoSchedule |
| 2 | Compute | 1-10 | m5.2xlarge | katonic.ai/node-pool=compute | |
| 3 | Deployment | 1-10 | m5.2xlarge | katonic.ai/node-pool=deployment | katonic.ai/node-pool=deployment:NoSchedule |
| 4 | Vectordb | 1-4 | m5.xlarge | katonic.ai/node-pool=vectordb | katonic.ai/node-pool=vectordb:NoSchedule |
| 5 | GPU (Optional) | 0-5 | p2.xlarge | katonic.ai/node-pool=gpu-{GPU-type} | nvidia.com/gpu=gpu-{GPU-type}:NoSchedule |

Note: {GPU-type} stands for the GPU model, for example v100, A30, or A100.
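As an illustration of how the {GPU-type} placeholder expands, the sketch below builds the label and taint strings for a given GPU type. The function names gpu_pool_label and gpu_pool_taint are hypothetical helpers, not part of the Katonic installer; the string formats are copied from the GPU row of the table above.

```shell
# Hypothetical helpers: expand the {GPU-type} placeholder from the GPU row.
# $1 is the GPU type, e.g. v100, A30, A100.
gpu_pool_label() {
  printf 'katonic.ai/node-pool=gpu-%s\n' "$1"
}

gpu_pool_taint() {
  printf 'nvidia.com/gpu=gpu-%s:NoSchedule\n' "$1"
}

# Example:
#   gpu_pool_label v100
#   gpu_pool_taint v100
```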

Note: When backup_enabled = True, then compute_nodes.min_count should be set to 2.

If you want Katonic to run with some components deployed as highly available ReplicaSets, you must use 2 availability zones (AZs). All compute node pools you use must have corresponding Auto Scaling groups (ASGs) in every AZ used by other node pools. Setting up an isolated node pool in one zone can cause volume affinity issues.

To run the node pools across multiple availability zones, you will need duplicate ASGs in each zone with the same configuration, including the same labels, to ensure pods are delivered to the zone where the required ephemeral volumes are available.

The easiest way to get suitable drivers onto GPU nodes is to use the EKS-optimized AMI distributed by Amazon as the machine image for the GPU node pool.

Additional ASGs with distinct katonic.ai/node-pool labels can be added to make other instance types available for Katonic executions.

The Katonic installer can set up all configurations of ASG and zones for the Katonic platform.

AWS Platform-Node Specifications

Platform nodes in Katonic platform AWS cloud deployments must meet the following hardware specification requirements according to the deployment type:

| COMPONENT | SPECIFICATION |
|---|---|
| Node count | Min 2 |
| Instance type | m5.large |
| vCPUs | 4 |
| Memory | 16 GB |
| Boot disk size | 128 GB |

AWS Compute-Node Specifications

Compute nodes in Katonic platform AWS cloud deployments must use one of the instance types listed below.

Choose the type that best fits your requirements. AWS Elastic Kubernetes Service (EKS) is also supported for application nodes, using the instance types listed below. The Katonic platform requires at least 1 Compute node for the teams version. For specification details of each type, refer to the AWS documentation.

Note: Supported compute node configurations

  • m5.xlarge
  • m5.2xlarge (default configuration)
  • m5.4xlarge
  • m5.8xlarge
  • m5.12xlarge
  • Boot disk: 128 GB

AWS Deployment-Node Specifications

Deployment nodes in Katonic platform AWS cloud deployments must use one of the instance types listed below.

Choose the type that best fits your requirements. AWS Elastic Kubernetes Service (EKS) is also supported for application nodes, using the instance types listed below. The Katonic platform requires at least 1 Deployment node for the teams version. For specification details of each type, refer to the AWS documentation.

Note: Supported deployment node configurations

  • m5.xlarge
  • m5.2xlarge (default configuration)
  • m5.4xlarge
  • m5.8xlarge
  • m5.12xlarge
  • Boot disk: 128 GB

AWS Vectordb-Node Specifications

Vectordb nodes in Katonic platform AWS cloud deployments must meet the following hardware specification requirements:

| COMPONENT | SPECIFICATION |
|---|---|
| Instance type | m5.xlarge |

AWS GPU-Node Specifications

GPU nodes in Katonic platform AWS cloud deployments must use one of the instance types listed below.

Choose the type that best fits your requirements. AWS Elastic Kubernetes Service (EKS) is also supported for application nodes, using the instance types listed below. For specification details of each type, refer to the AWS documentation.

Note: Supported GPU node configurations

  • p2.xlarge (default configuration)
  • p2.8xlarge
  • p2.16xlarge
  • Boot disk: 512 GB

Additional node pools with distinct katonic.ai/node-pool labels can be added to make other instance types available for Katonic executions.

Prerequisites

To install and configure Katonic in your AWS account you must have:

  • Quay credentials from Katonic.

  • Required: PEM encoded public key certificate for your domain and private key associated with the given certificate.

  • AWS region with enough quota to create:

    • At least 4 m5.2xlarge EC2 machines.
    • p2.xlarge or greater (All P instances) VMs, if you want to use GPU.
  • A Linux (Ubuntu/Debian) machine, prepared with the following steps:

    a. Provision a Linux (Ubuntu/Debian) machine with 4 GB RAM and 2 vCPUs. Skip to step b if you already have a machine with these specifications.

    Note: After the platform is deployed successfully, the VM can be deleted.

    b. Switch to the root user inside the machine.

    c. Install the AWS CLI and configure it for your AWS account using the aws configure command, with a user that has the IAM policies required to create the resources listed above.

    Commands for installing AWS CLI v2:

    curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
    unzip awscliv2.zip
    sudo ./aws/install
  • The following tools should be installed:

To install the Katonic Platform Companion version, follow the steps below:

1. Connect to the jump host and configure the AWS CLI.
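Before continuing, it can help to confirm that the command-line tools this guide invokes (docker, aws, kubectl) are on the PATH of the jump host. The following is a minimal sketch; the function name missing_tools is hypothetical and not part of the Katonic installer.

```shell
# Print the name of every given command that is not found on PATH.
# Returns non-zero if at least one command is missing.
missing_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example (run on the jump host):
#   missing_tools docker aws kubectl || echo "install the tools printed above"
```

An empty result means every listed tool was found.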

2. Log in to Quay with the credentials described in the prerequisites section above.

docker login quay.io

3. Retrieve the Katonic installer image from Quay.

docker pull quay.io/katonic/katonic-installer:v5.0.9

4. Create a directory.

mkdir katonic
cd katonic

5. Add the PEM-encoded public key certificate and private key to the directory.

Put the PEM encoded public key certificate (having extension .crt) for your domain and private key associated with the given certificate (having extension .key) inside the current directory (katonic).
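A mismatched certificate and key is easier to catch before the installer runs. The sketch below compares the public key embedded in the certificate with the public key derived from the private key. It assumes openssl is installed on the machine; the function name cert_matches_key and the file names in the example are placeholders.

```shell
# Succeeds (exit 0) when the certificate ($1) and private key ($2)
# share the same public key. Returns 2 if either file cannot be read.
cert_matches_key() {
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey 2>/dev/null) || return 2
  key_pub=$(openssl pkey -in "$2" -pubout 2>/dev/null) || return 2
  [ "$cert_pub" = "$key_pub" ]
}

# Example, run inside the katonic directory (placeholder file names):
#   cert_matches_key example.crt example.key && echo "certificate and key match"
```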

6. The Katonic Installer can deploy the Katonic Platform Companion version in two ways:

  1. Creating Private EKS and deploying the Katonic Platform Companion version.
  2. Install Katonic Platform on existing Private AWS Elastic Kubernetes Service.

1. Creating Private EKS and deploying the Katonic Platform Companion version

A. Single-Node EKS Cluster

Initialize the installer application to generate a template configuration file named katonic.yml.

docker run -it --rm --name generating-yaml -v $(pwd):/install quay.io/katonic/katonic-installer:v5.0.9 init aws katonic_companion single_node deploy_kubernetes private

Edit the configuration file with all necessary details about the target cluster, storage systems, and hosting domain. Read the following configuration reference:

| PARAMETER | DESCRIPTION | VALUE |
|---|---|---|
| katonic_platform_version | Katonic Platform version; set by default. | katonic_companion |
| deploy_on | Cluster to be deployed on | AWS |
| create_k8s_cluster | Must be set to True if EKS is not deployed | True |
| single_node_cluster | Set "True" if opting for a single-node cluster | True or False |
| eks_version | EKS version | e.g. 1.30 (versions 1.27 and above are supported) |
| cluster_name | Name of the cluster you deploy | e.g. katonic-companion-platform-v5-0 |
| aws_region | AWS region name | e.g. us-east-1 |
| private_cluster | Set "True" if opting for a private cluster | True |
| vpc_id | VPC ID, if opting for a private cluster | |
| subnet_1_id | ID of one of the private subnets, if opting for a private cluster | |
| subnet_2_id | ID of another private subnet, if opting for a private cluster | |
| jump_server_name | Name of the jump server (jump host) created for the private cluster | |
| enable_exposing_genai_applications_to_internet | Set "True" if opting to expose genai applications to the internet | False |
| public_domain_for_genai_applications | Public FQDN of the domain for genai applications that will be exposed to the internet | e.g. public-chatbots.google.com |
| internal_loadbalancer | Set "True" if opting for an internal load balancer | False |
| compute_nodes.instance_type | Compute node VM size | e.g. m5.2xlarge |
| compute_nodes.min_count | Minimum number of compute nodes; should be at least 1 | e.g. 1 |
| compute_nodes.max_count | Maximum number of compute nodes; should be greater than compute_nodes.min_count | e.g. 4 |
| compute_nodes.os_disk_size | Compute node OS disk size | e.g. 128 GB |
| vectordb_nodes.instance_type | Vectordb node VM size | e.g. m5.xlarge |
| vectordb_nodes.min_count | Minimum number of Vectordb nodes; should be at least 1 | e.g. 1 |
| vectordb_nodes.max_count | Maximum number of Vectordb nodes; should be greater than vectordb_nodes.min_count | e.g. 4 |
| vectordb_nodes.os_disk_size | Vectordb node OS disk size | e.g. 128 GB |
| gpu_enabled | Add a GPU node pool | True or False |
| gpu_nodes.instance_type | GPU node VM size | e.g. p2.xlarge |
| gpu_nodes.gpu_type | Type of GPU available on the machine | e.g. v100, k80, none |
| gpu_nodes.min_count | Minimum number of GPU nodes | e.g. 1 |
| gpu_nodes.max_count | Maximum number of GPU nodes | e.g. 4 |
| gpu_nodes.os_disk_size | GPU node OS disk size | e.g. 512 GB |
| gpu_nodes.gpu_vRAM | GPU node RAM size | |
| gpu_nodes.gpus_per_node | Number of GPUs per node | |
| enable_gpu_workspace | Set to True if you want to use GPU Workspace | True or False |
| shared_storage_create | Set to True if you want to use shared storage | True or False |
| genai_nfs_size | NFS storage size for genai | 100Gi |
| workspace_timeout_interval | Timeout interval in hours | e.g. "12" |
| backup_enabled | Enables the backup | True or False |
| s3_bucket_name | Name of the S3 bucket | e.g. katonic-backup |
| s3_bucket_region | Region of the S3 bucket | e.g. us-east-1 |
| backup_schedule | Schedule of the backup | 0 0 1 * * |
| backup_expiration | Expiration of the backup | 2160h0m0s |
| use_custom_domain | Set to True if you want to host the Katonic platform on your custom domain. Skip if use_katonic_domain: True | True or False |
| custom_domain_name | A valid domain name | e.g. katonic.tesla.com |
| use_katonic_domain | Set to True if you want to host the Katonic platform on the Katonic Deploy Platform domain. Skip if use_custom_domain: True | True or False |
| katonic_domain_prefix | A single word in lowercase letters with no special characters | e.g. tesla |
| enable_pre_checks | Set to True if you want to perform the pre-checks | True / False |
| AD_Group_Management | Set "True" to enable signing in using Azure AD | False |
| AD_CLIENT_ID | Client ID of the app registered for SSO in the client's Azure or other identity provider | |
| AD_CLIENT_SECRET | Client secret of the app registered for SSO in the client's Azure or other identity provider | |
| AD_AUTH_URL | Authorization URL endpoint of the app registered for SSO | |
| AD_TOKEN_URL | Token URL endpoint of the app registered for SSO | |
| quay_username | Username for Quay | |
| quay_password | Password for Quay | |
| adminUsername | Email for the admin user | e.g. john@katonic.ai |
| adminPassword | Password for the admin user | At least 1 special character, 1 uppercase letter, and 1 lowercase letter; minimum 8 characters |
| adminFirstName | Admin first name | e.g. john |
| adminLastName | Admin last name | e.g. musk |
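The adminPassword rules in the table above can be checked locally before running the installer. The sketch below is one possible check; the function name valid_admin_password is hypothetical, and "special character" is assumed here to mean any character that is neither a letter nor a digit, which may differ from the installer's own definition.

```shell
# Succeeds when the password ($1) satisfies the rules from the table:
# at least 8 characters, 1 uppercase letter, 1 lowercase letter,
# and 1 special character (assumed: any non-alphanumeric character).
valid_admin_password() {
  pw=$1
  [ "${#pw}" -ge 8 ] || return 1
  printf '%s' "$pw" | grep -q '[[:upper:]]' || return 1
  printf '%s' "$pw" | grep -q '[[:lower:]]' || return 1
  printf '%s' "$pw" | grep -q '[^[:alnum:]]' || return 1
}

# Example:
#   valid_admin_password 'Example#Pass1' && echo "password accepted"
```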

B. Multi-Node EKS Cluster

Initialize the installer application to generate a template configuration file named katonic.yml.

docker run -it --rm --name generating-yaml -v $(pwd):/install quay.io/katonic/katonic-installer:v5.0.9 init aws katonic_companion multi_node deploy_kubernetes private

Edit the configuration file with all necessary details about the target cluster, storage systems, and hosting domain. Read the following configuration reference:

| PARAMETER | DESCRIPTION | VALUE |
|---|---|---|
| katonic_platform_version | Katonic Platform version; set by default. | katonic_companion |
| deploy_on | Cluster to be deployed on | AWS |
| create_k8s_cluster | Must be set to True if EKS is not deployed | True |
| eks_version | EKS version | e.g. 1.30 (versions 1.27 and above are supported) |
| cluster_name | Name of the cluster you deploy | e.g. katonic-companion-platform-v5-0 |
| aws_region | AWS region name | e.g. us-east-1 |
| private_cluster | Set "True" if opting for a private cluster | True |
| vpc_id | VPC ID, if opting for a private cluster | |
| subnet_1_id | ID of one of the private subnets, if opting for a private cluster | |
| subnet_2_id | ID of another private subnet, if opting for a private cluster | |
| jump_server_name | Name of the jump server (jump host) created for the private cluster | |
| enable_exposing_genai_applications_to_internet | Set "True" if opting to expose genai applications to the internet | False |
| public_domain_for_genai_applications | Public FQDN of the domain for genai applications that will be exposed to the internet | e.g. public-chatbots.google.com |
| internal_loadbalancer | Set "True" if opting for an internal load balancer | False |
| compute_nodes.instance_type | Compute node VM size | e.g. m5.2xlarge |
| compute_nodes.min_count | Minimum number of compute nodes; should be at least 1 | e.g. 1 |
| compute_nodes.max_count | Maximum number of compute nodes; should be greater than compute_nodes.min_count | e.g. 4 |
| compute_nodes.os_disk_size | Compute node OS disk size | e.g. 128 GB |
| vectordb_nodes.instance_type | Vectordb node VM size | e.g. m5.xlarge |
| vectordb_nodes.min_count | Minimum number of Vectordb nodes; should be at least 1 | e.g. 1 |
| vectordb_nodes.max_count | Maximum number of Vectordb nodes; should be greater than vectordb_nodes.min_count | e.g. 4 |
| vectordb_nodes.os_disk_size | Vectordb node OS disk size | e.g. 128 GB |
| gpu_enabled | Add a GPU node pool | True or False |
| gpu_nodes.instance_type | GPU node VM size | e.g. p2.xlarge |
| gpu_nodes.gpu_type | Type of GPU available on the machine | e.g. v100, k80, none |
| gpu_nodes.min_count | Minimum number of GPU nodes | e.g. 1 |
| gpu_nodes.max_count | Maximum number of GPU nodes | e.g. 4 |
| gpu_nodes.os_disk_size | GPU node OS disk size | e.g. 512 GB |
| gpu_nodes.gpu_vRAM | GPU node RAM size | |
| gpu_nodes.gpus_per_node | Number of GPUs per node | |
| enable_gpu_workspace | Set to True if you want to use GPU Workspace | True or False |
| shared_storage_create | Set to True if you want to use shared storage | True or False |
| genai_nfs_size | NFS storage size for genai | 100Gi |
| workspace_timeout_interval | Timeout interval in hours | e.g. "12" |
| backup_enabled | Enables the backup | True or False |
| s3_bucket_name | Name of the S3 bucket | e.g. katonic-backup |
| s3_bucket_region | Region of the S3 bucket | e.g. us-east-1 |
| backup_schedule | Schedule of the backup | 0 0 1 * * |
| backup_expiration | Expiration of the backup | 2160h0m0s |
| use_custom_domain | Set to True if you want to host the Katonic platform on your custom domain. Skip if use_katonic_domain: True | True or False |
| custom_domain_name | A valid domain name | e.g. katonic.tesla.com |
| use_katonic_domain | Set to True if you want to host the Katonic platform on the Katonic Deploy Platform domain. Skip if use_custom_domain: True | True or False |
| katonic_domain_prefix | A single word in lowercase letters with no special characters | e.g. tesla |
| enable_pre_checks | Set to True if you want to perform the pre-checks | True / False |
| AD_Group_Management | Set "True" to enable signing in using Azure AD | False |
| AD_CLIENT_ID | Client ID of the app registered for SSO in the client's Azure or other identity provider | |
| AD_CLIENT_SECRET | Client secret of the app registered for SSO in the client's Azure or other identity provider | |
| AD_AUTH_URL | Authorization URL endpoint of the app registered for SSO | |
| AD_TOKEN_URL | Token URL endpoint of the app registered for SSO | |
| quay_username | Username for Quay | |
| quay_password | Password for Quay | |
| adminUsername | Email for the admin user | e.g. john@katonic.ai |
| adminPassword | Password for the admin user | At least 1 special character, 1 uppercase letter, and 1 lowercase letter; minimum 8 characters |
| adminFirstName | Admin first name | e.g. john |
| adminLastName | Admin last name | e.g. musk |
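The count rules for compute nodes (a minimum of at least 1, a maximum greater than the minimum, and a minimum of 2 when backups are enabled, as noted in the node pool requirements) can be sanity-checked before installation. The following is a sketch only: the function name check_compute_counts is hypothetical, the three values are passed as arguments rather than read from katonic.yml, and the backup rule is interpreted here as "at least 2".

```shell
# check_compute_counts MIN MAX BACKUP_ENABLED
# Prints "ok" and succeeds when the values satisfy the documented rules;
# otherwise prints the violated rule and returns 1.
check_compute_counts() {
  min=$1
  max=$2
  backup=$3
  if [ "$min" -lt 1 ]; then
    echo "compute_nodes.min_count should be at least 1"
    return 1
  fi
  if [ "$max" -le "$min" ]; then
    echo "compute_nodes.max_count should be greater than compute_nodes.min_count"
    return 1
  fi
  # Assumption: "should be set to 2" is read as "at least 2".
  if [ "$backup" = "True" ] && [ "$min" -lt 2 ]; then
    echo "compute_nodes.min_count should be 2 when backup_enabled is True"
    return 1
  fi
  echo "ok"
}

# Example:
#   check_compute_counts 2 4 True
```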

Installing the Katonic Platform Companion version

After configuring the katonic.yml file, run the following command to install the Katonic Platform Companion version:

docker run -it --rm --name install-katonic -v /root/.aws:/root/.aws -v $(pwd):/inventory quay.io/katonic/katonic-installer:v5.0.9

2. Install Katonic Platform on existing Private AWS Elastic Kubernetes Service

The steps are similar to those for creating a private EKS cluster and deploying the Katonic Platform, described above. Edit the configuration file with the details about the target cluster, storage systems, and hosting domain. The configuration reference below lists the only parameters required when installing the Katonic Platform on an existing private EKS cluster.

Prerequisites

You will need to create an EBS gp3-based storage class named kfs. To do this, you need to install and configure the EBS CSI driver in the EKS cluster. Refer to the documentation for instructions on creating the GP3 based storage class.

SINGLE NODE

Initialize the installer application to generate a template configuration file named katonic.yml.

docker run -it --rm --name generating-yaml -v $(pwd):/install quay.io/katonic/katonic-installer:v5.0.9 init aws katonic_companion single_node kubernetes_already_exists private

MULTI NODE

Initialize the installer application to generate a template configuration file named katonic.yml.

docker run -it --rm --name generating-yaml -v $(pwd):/install quay.io/katonic/katonic-installer:v5.0.9 init aws katonic_companion multi_node kubernetes_already_exists private

For both single node and multi-node clusters, the configuration template includes the following parameters:

| PARAMETER | DESCRIPTION | VALUE |
|---|---|---|
| katonic_platform_version | Katonic Platform version; set by default. | katonic_companion |
| deploy_on | Cluster to be deployed on | AWS |
| single_node_cluster | Set "True" if opting for a single-node cluster | True or False |
| cluster_name | Name of the cluster you deploy | e.g. katonic-companion-platform-v5-0 |
| aws_region | Region where the EKS cluster is deployed | e.g. us-east-1 |
| private_cluster | Set "True" for a private cluster | True |
| enable_exposing_genai_applications_to_internet | Set "True" if opting to expose genai applications to the internet | False |
| public_domain_for_genai_applications | Public FQDN of the domain for genai applications that will be exposed to the internet | e.g. public-chatbots.google.com |
| internal_loadbalancer | Set "True" if opting for an internal load balancer | False |
| genai_nfs_size | NFS storage size for genai | 100Gi |
| workspace_timeout_interval | Timeout interval in hours | e.g. "12" |
| backup_enabled | Enables the backup | True or False |
| s3_bucket_name | Name of the S3 bucket | e.g. katonic-backup |
| s3_bucket_region | Region of the S3 bucket | e.g. us-east-1 |
| backup_schedule | Schedule of the backup | 0 0 1 * * |
| backup_expiration | Expiration of the backup | 2160h0m0s |
| use_custom_domain | Set to True if you want to host the Katonic platform on your custom domain. Skip if use_katonic_domain: True | True or False |
| custom_domain_name | A valid domain name | e.g. katonic.tesla.com |
| use_katonic_domain | Set to True if you want to host the Katonic platform on the Katonic Deploy Platform domain. Skip if use_custom_domain: True | True or False |
| katonic_domain_prefix | A single word in lowercase letters with no special characters | e.g. tesla |
| enable_pre_checks | Set to True if you want to perform the pre-checks | True / False |
| AD_Group_Management | Set "True" to enable signing in using Azure AD | False |
| AD_CLIENT_ID | Client ID of the app registered for SSO in the client's Azure or other identity provider | |
| AD_CLIENT_SECRET | Client secret of the app registered for SSO in the client's Azure or other identity provider | |
| AD_AUTH_URL | Authorization URL endpoint of the app registered for SSO | |
| AD_TOKEN_URL | Token URL endpoint of the app registered for SSO | |
| quay_username | Username for the Quay registry | |
| quay_password | Password for the Quay registry | |
| adminUsername | Email for the admin user | e.g. john@katonic.ai |
| adminPassword | Password for the admin user | At least 1 special character, 1 uppercase letter, and 1 lowercase letter; minimum 8 characters |
| adminFirstName | Admin first name | e.g. john |
| adminLastName | Admin last name | e.g. musk |

Note: In the katonic.yml template, the single_node_cluster parameter is set to True for single-node clusters and is omitted for multi-node clusters.
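The example values for backup_schedule and backup_expiration in the table above can be hard to read at a glance. Assuming backup_schedule uses standard five-field cron syntax (minute, hour, day of month, month, day of week) and backup_expiration is a duration whose leading component is in hours, the sketch below labels the cron fields and converts the expiration to days. Both function names are hypothetical.

```shell
# Label the five fields of a cron expression ($1).
describe_cron() {
  printf '%s\n' "$1" | {
    read -r minute hour dom month dow
    echo "minute=$minute hour=$hour day-of-month=$dom month=$month day-of-week=$dow"
  }
}

# Convert a duration such as 2160h0m0s ($1) to whole days,
# using only the hours component before the first "h".
expiration_days() {
  hours=${1%%h*}
  echo $(( hours / 24 ))
}

# Example:
#   describe_cron '0 0 1 * *'    # midnight on the first day of every month
#   expiration_days 2160h0m0s    # 90
```

Under these assumptions, the defaults shown in the table take a backup at 00:00 on the first day of each month and keep it for 90 days.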

Installing the Katonic Platform Companion version

After configuring the katonic.yml file, run the following command to install the Katonic Platform Companion version:

docker run -it --rm --name install-katonic -v /root/.aws:/root/.aws -v $(pwd):/inventory quay.io/katonic/katonic-installer:v5.0.9

Installation Verification

The installation process can take up to 60 minutes to complete. The installer outputs verbose logs, including the commands needed to gain kubectl access to the deployed cluster, and surfaces any errors it encounters. After installation, use the following commands to check whether all applications are in a running state.

cd /root/katonic

aws eks --region $(cat /root/katonic/katonic.yml | grep aws_region | awk '{print $2}') update-kubeconfig --name $(cat /root/katonic/katonic.yml | grep cluster_name | awk '{print $2}')-$(cat /root/katonic/katonic.yml | grep random_value | awk '{print $2}')

kubectl get pods --all-namespaces

This shows the status of all pods created by the installation process. If any pod enters a crash loop or hangs in a non-ready state, you can get logs from that pod by running:

kubectl logs $POD_NAME --namespace $NAMESPACE_NAME
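Two small helpers can make the verification commands above easier to reuse. The sketch below is illustrative only and both function names are hypothetical: yml_value reads a value from a "key: value" line in katonic.yml by exact key match (the grep-based lookup above matches substrings), and not_running_pods filters the output of kubectl get pods --all-namespaces down to pods whose status is neither Running nor Completed.

```shell
# Print the value of the first "key: value" line whose key is exactly $1
# in the YAML file $2. Only simple single-line scalar values are handled.
yml_value() {
  awk -v key="$1" '$1 == (key ":") { print $2; exit }' "$2"
}

# Read `kubectl get pods --all-namespaces` output on stdin and print
# "namespace/pod status" for pods that are not Running or Completed.
# Columns assumed: NAMESPACE NAME READY STATUS RESTARTS AGE.
not_running_pods() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Example:
#   yml_value aws_region /root/katonic/katonic.yml
#   kubectl get pods --all-namespaces | not_running_pods
```

An empty result from not_running_pods suggests that every pod has reached the Running or Completed state.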

If the installation completes successfully, you should see a message that says:

TASK [platform-deployment : Credentials to access Katonic Companion Platform] *******************************
ok: [localhost] => {
    "msg": [
        "Platform Domain: $domain_name",
        "Username: $adminUsername",
        "Password: $adminPassword"
    ]
}

However, the application will only be accessible via HTTPS at that FQDN if you have configured DNS for the name to point to an ingress load balancer with the appropriate SSL certificate that forwards traffic to your platform nodes.

Test and troubleshoot

To verify the successful installation of Katonic, perform the following tests:

  • If you encounter a 500 or 502 error, connect to your cluster and execute the following command:

    kubectl rollout restart deploy nodelog-deploy -n application
  • Log in to the Katonic application and ensure that all the navigation panel options are operational. If this test fails, verify that Keycloak was set up properly.

  • Create a new project and launch a Jupyter/JupyterLab workspace. If this test fails, check that the default environment images have been loaded in the cluster.

  • Publish an app with Flask or Shiny. If this test fails, verify that the environment images have Flask and Shiny installed.

Deleting the Katonic platform from AWS

When you start the installation, the installer places a platform deletion script in your current directory. To delete the platform, run the script:

./aws-cluster-delete.sh