Katonic OneClick

This guide describes how to install the Community version of the Katonic MLOps platform on AWS using a One-click installation strategy.

Hardware Configurations

A scalable cluster implementation is composed of a standard set of master nodes, a set of worker nodes dedicated to hosting Katonic platform services, and a set of worker nodes dedicated to hosting compute workloads. This configuration is designed to achieve superior performance that enables real-time execution of analytics, machine learning (ML), and artificial intelligence (AI) applications in a production pipeline.

Katonic on EKS

Katonic can run on a Kubernetes cluster provided by AWS Elastic Kubernetes Service. When running on EKS, the Katonic architecture uses AWS resources to fulfill the Katonic Generative AI platform requirements as follows:

  • The Kubernetes control plane is handled by EKS, which offers managed Kubernetes masters.

  • Katonic uses a dedicated Auto Scaling Group (ASG) of EKS workers to host the Katonic platform.

  • ASGs of EKS workers host elastic compute for Katonic executions.

  • The kubernetes.io/aws-ebs provisioner is used to create persistent volumes for Katonic executions.

  • Katonic cannot be installed on EKS Fargate since Fargate does not support stateful workloads with persistent volumes.

All AWS services listed previously are required except GPU compute instances.

Your annual Katonic license fee will not include any charges incurred from using AWS services.

You can find detailed pricing information for the Amazon services listed above at https://aws.amazon.com/pricing

Security considerations

To provision an EKS cluster, it is essential to create IAM policies in the AWS console. Katonic recommends following the standard security practice of granting the least privilege when creating IAM policies. It is advised to start with minimal privileges and only grant elevated privileges when necessary.

For more information, refer to the AWS guidance on granting least privilege.

IAM permissions for user

The only permission required for a user to carry out the installation is IAMFullAccess.

The following permissions are granted to the EC2 instance that performs the installation:

  • AmazonEC2FullAccess

  • IAMLimitedAccess

  • AmazonVPCFullAccess

  • AmazonEKSAllAccessPolicy

  • AWSCloudFormationFullAccess

  • AmazonElasticFileSystemFullAccess

Service quotas

Amazon maintains default service quotas for each of the services listed previously. You can check the default service quotas and manage your quotas by logging in to the AWS Service Quotas console.

Domain

Katonic allocates a *.katonic.cloud domain for accessing the Katonic Generative AI Platform. The domain name consists of the prefix you request plus a random suffix, e.g. tesla-07092023.katonic.cloud

Calculating Required Infrastructure Resources (AWS)

Allocated Infrastructure Resources

When the platform is installed, it creates the following resources. Take this into account when selecting your installation configuration.

| Sr. No. | Type | Amount | When | Notes |
| --- | --- | --- | --- | --- |
| 1 | Classic Elastic Load Balancer | 1 | Always | Only 1 is required. Automatically created by EKS when required. |
| 2 | Network interface | 1 per node | Always | |
| 3 | OS boot disk (AWS EBS) | 1 per node | Always | |
| 4 | Public IP address | 1 per node | The platform has public IP addresses. | |
| 5 | VPC | 1 | The platform is deployed to a new VPC. | |
| 6 | Security Group | 1 | Always | See Security Groups Configuration (AWS). |
| 7 | SNS | 1 | Always | The installation requires user confirmation via an SNS confirmation email before it starts. The email is valid for only 10 minutes. The installation fails if you do not subscribe within 10 minutes of receiving the email. |
| 8 | EKS Cluster | 1 | EKS is used as the application cluster. | Version 1.27 |
| 9 | AWS EFS | 1 | When you enable shared storage while installing the Katonic platform. | |

Kubernetes (EKS) version

Version 4.4 of the Katonic Generative AI platform has been validated with Kubernetes (EKS) version 1.27.

Node pools

The EKS cluster consists of three node pools, each providing worker nodes with its own specifications and unique node labels.

| Sr. No. | Pool | Min-Max | Instance | Labels | Taints |
| --- | --- | --- | --- | --- | --- |
| 1 | platform | 2-4 | m5.xlarge | katonic.ai/node-pool=platform | katonic.ai/node-pool=platform:NoSchedule |
| 2 | compute | 1-10 | m5.2xlarge | katonic.ai/node-pool=compute | |
| 3 | deployment | 1-10 | m5.2xlarge | katonic.ai/node-pool=deployment | katonic.ai/node-pool=deployment:NoSchedule |
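
As a rough aid for quota planning, the pool sizes in the node pool table above can be converted into a vCPU range. The following is a minimal sketch, assuming the standard AWS vCPU counts for the m5 family and the default instance types from the table. The function names are illustrative and are not part of the Katonic installer.

```shell
# vCPUs per supported instance type (standard AWS m5 figures).
vcpus_for() {
  case "$1" in
    m5.xlarge)   echo 4 ;;
    m5.2xlarge)  echo 8 ;;
    m5.4xlarge)  echo 16 ;;
    m5.8xlarge)  echo 32 ;;
    m5.12xlarge) echo 48 ;;
    *) echo "unknown instance type: $1" >&2; return 1 ;;
  esac
}

# pool_vcpus NODE_COUNT INSTANCE_TYPE
# Prints the total vCPUs for NODE_COUNT nodes of INSTANCE_TYPE.
pool_vcpus() {
  per_node=$(vcpus_for "$2") || return 1
  echo $(( $1 * per_node ))
}

# cluster_vcpus PLATFORM_NODES COMPUTE_NODES DEPLOYMENT_NODES
# Assumes the default instance types from the node pool table.
cluster_vcpus() {
  p=$(pool_vcpus "$1" m5.xlarge)
  c=$(pool_vcpus "$2" m5.2xlarge)
  d=$(pool_vcpus "$3" m5.2xlarge)
  echo $(( p + c + d ))
}

# Example:
#   cluster_vcpus 2 1 1     # pool minimums
#   cluster_vcpus 4 10 10   # pool maximums
```

With the pool minimums (2, 1, 1) this prints 24, and with the pool maximums (4, 10, 10) it prints 176. The Katonic-installer instance is not counted.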

AWS Platform-Node Specifications

For platform nodes in AWS cloud deployments, hardware specifications will align with the following requirements based on the deployment type:

| Component | Specification |
| --- | --- |
| Node count | min 2 |
| Instance type | m5.xlarge |
| vCPUs | 4 |
| Memory | 16 GB |
| Boot disk size | 128 GB |

AWS Compute-Node Specifications

In the context of one-click installation, the Katonic Generative AI platform streamlines the setup process for AWS cloud deployments.

For compute nodes, you can select the instance type that best suits your needs from the options listed below. AWS Elastic Kubernetes Service (EKS) is also supported for application nodes, using the specified instance types. The community version of Katonic requires a minimum of 1 compute node, which is automatically provisioned during the one-click setup. For detailed specifications of each instance type, refer to the AWS documentation.

Note: Supported compute node configurations

  • m5.xlarge

  • m5.2xlarge (default configuration)

  • m5.4xlarge

  • m5.8xlarge

  • m5.12xlarge

Boot disk size:

  • Boot Disk: 128GB

AWS Deployment-Node Specifications

For deployment nodes, you can select the instance type that best suits your needs from the options listed below. AWS Elastic Kubernetes Service (EKS) is also supported for application nodes, using the specified instance types. The community version of Katonic requires a minimum of 1 deployment node, which is automatically provisioned during the one-click setup. For detailed specifications of each instance type, refer to the AWS documentation.

Note: Supported deployment node configurations

  • m5.xlarge

  • m5.2xlarge (default configuration)

  • m5.4xlarge

  • m5.8xlarge

  • m5.12xlarge

Boot disk size:

  • Boot Disk: 128GB

Katonic Platform Installation

Completion Time

General completion time: 1 hour

Prerequisites

To install and configure Katonic in your AWS account you must have:

  • AWS region with enough quota to create:

    • At least 4 m5.2xlarge EC2 machines
  • A user with the IAMFullAccess permission.

  • At least one EC2 key pair (RSA type, .pem file format) in the region where you want to deploy the Katonic Generative AI Platform. If you do not have one, create it by following the AWS guide: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/create-key-pairs.html
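
The key pair prerequisite can be checked from the command line before starting the installation. The sketch below uses the AWS CLI command aws ec2 describe-key-pairs and assumes the CLI is already configured with credentials. The helper names and the example key name and region are placeholders.

```shell
# key_pair_type KEY_NAME REGION
# Prints the type of the named EC2 key pair (for example "rsa").
# Fails if the key pair does not exist in the region.
key_pair_type() {
  aws ec2 describe-key-pairs \
    --key-names "$1" \
    --region "$2" \
    --query 'KeyPairs[0].KeyType' \
    --output text
}

# is_rsa_key_pair KEY_NAME REGION
# Succeeds only if the key pair exists and is of type RSA.
is_rsa_key_pair() {
  [ "$(key_pair_type "$1" "$2" 2>/dev/null)" = "rsa" ]
}

# Example with placeholder values:
#   is_rsa_key_pair oneclick us-east-1 && echo "RSA key pair found"
```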

Installation Process

  • First, click on the One-Click install button on the katonic.ai website.

Note: Make sure the AWS user satisfies the prerequisites.

  • Fill in the CloudFormation Stack Template with all necessary details about the target cluster, storage systems, and hosting domain. Read the following configuration reference:

| No. | Parameter | Description | Value |
| --- | --- | --- | --- |
| 1 | Stack Name | Name of the stack | default: katonic |
| 2 | Instance Type | Type of EC2 instance | e.g. t3.medium |
| 3 | SSH Location | CIDR range that is allowed SSH access | default: 0.0.0.0 |
| 4 | KeyPairName | EC2 key pair name | e.g. oneclick. Note: If you do not already have a key pair, create one first. |
| 5 | Region | AWS region name | e.g. us-east-1 |
| 6 | Platform Nodes Type | Platform node VM size | e.g. m5.xlarge |
| 7 | PlatformNodesMinCount | Minimum number of platform nodes; should be 2 | e.g. 2 |
| 8 | PlatformNodesMaxCount | Maximum number of platform nodes | e.g. 4 |
| 9 | PlatformNodesOsSize | Platform nodes OS disk size | e.g. 128 GB |
| 10 | Compute Nodes Type | Compute node VM size | e.g. m5.2xlarge |
| 11 | ComputeNodesMinCount | Minimum number of compute nodes; should not be less than 1 | e.g. 1 |
| 12 | ComputeNodesMaxCount | Maximum number of compute nodes | e.g. 4 |
| 13 | ComputeNodesOsSize | Compute nodes OS disk size | e.g. 128 GB |
| 14 | Deployment Nodes Type | Deployment node VM size | e.g. m5.xlarge |
| 15 | DeploymentNodesMinCount | Minimum number of deployment nodes; should be 1 | e.g. 1 |
| 16 | DeploymentNodesMaxCount | Maximum number of deployment nodes | e.g. 4 |
| 17 | DeploymentNodesOsSize | Deployment nodes OS disk size | e.g. 128 GB |
| 18 | GenerativeAIstoragesize | Generative AI storage size in GiB | e.g. 64 |
| 19 | OpenAI_Key | Valid OpenAI key | Format: "^sk-[A-Za-z0-9]{20}T3BlbkFJ[A-Za-z0-9]{20}$" |
| 20 | AD_Group_Management | Set to "True" to enable sign-in using Azure AD | False |
| 21 | AD_CLIENT_ID | Client ID of the app registered for SSO in the client's Azure or other identity provider | |
| 22 | AD_CLIENT_SECRET | Client secret of the app registered for SSO in the client's Azure or other identity provider | |
| 23 | AD_AUTH_URL | Authorization URL endpoint of the app registered for SSO | |
| 24 | AD_TOKEN_URL | Token URL endpoint of the app registered for SSO | |
| 25 | Admin Username | Email for the admin user | e.g. john@katonic.ai |
| 26 | Admin Password | Password for the admin user | At least 1 special character, at least 1 upper case letter, at least 1 lower case letter, minimum 8 characters |
| 27 | Admin First Name | Admin first name | e.g. john |
| 28 | Admin Last Name | Admin last name | e.g. musk |
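
Two of the parameters above have formats that can be checked locally before the stack is created. The following is a minimal sketch with illustrative function names. The OpenAI key pattern is the one given in the configuration reference. Treating "special character" as any non-alphanumeric character is an assumption, since the reference does not define it.

```shell
# valid_openai_key KEY
# Checks KEY against the format given in the configuration reference.
valid_openai_key() {
  printf '%s\n' "$1" |
    LC_ALL=C grep -Eq '^sk-[A-Za-z0-9]{20}T3BlbkFJ[A-Za-z0-9]{20}$'
}

# valid_admin_password PASSWORD
# Requires at least 8 characters, one upper case letter, one lower case
# letter, and one special character (assumed here: any non-alphanumeric).
valid_admin_password() {
  [ "${#1}" -ge 8 ] || return 1
  case "$1" in *[[:upper:]]*) ;; *) return 1 ;; esac
  case "$1" in *[[:lower:]]*) ;; *) return 1 ;; esac
  case "$1" in *[![:alnum:]]*) ;; *) return 1 ;; esac
  return 0
}

# Example:
#   valid_admin_password 'Str0ng!pw' && echo "password accepted"
```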

Note: The Permissions section sets the IAM role that CloudFormation uses for all operations performed on the stack. Important: Do not alter the IAM role or its settings in this section. Retain the default values and configurations.

Note: This one-click installation currently does not support the ap-south-1 (Mumbai) region. Please ensure that you select a supported region.

  • After entering the values, check the acknowledgment box before proceeding to create the stack.

  • Finally, click on Create Stack.

Note: You will receive an email asking you to subscribe to the AWS SNS topic created for this installation.

You have 10 minutes to subscribe to the SNS topic after the email arrives at the address provided in Admin Username. The installation fails if you do not subscribe within 10 minutes of receiving the email.

  • The installation then takes about an hour to complete.

  • Access the deployed platform.

Note: You will receive an email once the installation is complete.

Open the platform in the browser and use the credentials for logging in.

Installation Verification

Accessing Deployed Cluster

Step 1: Connect to the EC2 machine that installed the platform

In the EC2 service of the AWS console, you will see an instance named Katonic-installer in the region where the Katonic stack is deployed.

Click Connect.

Step 2: Open CloudShell

Leave the previous page open and, in a new tab, open the AWS CloudShell service.

Open the service and wait for the shell to be ready.

Step 3: Uploading SSH keypair to CloudShell

Click Actions in the top right corner and select the Upload file option.

Select the .pem file for the SSH key pair assigned to the Katonic-installer instance.

For example, if you passed an SSH key pair named ohio to the template, the corresponding file saved on your local machine is ohio.pem.

Click Upload.

Step 4: SSH into the machine using AWS CloudShell

Restrict the permissions on the uploaded .pem file so that only your user can read it. SSH rejects private key files that other users can read.
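
A common way to secure a key file is chmod 400, which leaves it readable only by its owner. The sketch below assumes that this is the intended command for this step. The helper name secure_key_file is illustrative.

```shell
# secure_key_file PEM_FILE
# Makes the private key readable only by its owner (mode 400).
secure_key_file() {
  chmod 400 "$1"
}

# Example, using the key file name from the upload step:
#   secure_key_file ohio.pem
```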

Finally, copy the SSH command from the EC2 tab and paste it into AWS CloudShell. Once connected, switch to the root user using the sudo -i command.

Use the following commands to get kubectl access to the deployed AWS EKS cluster.

cd /root/katonic
aws eks --region $(cat /root/katonic/katonic.yml | grep aws_region | awk '{print $2}') update-kubeconfig --name $(cat /root/katonic/katonic.yml | grep cluster_name | awk '{print $2}')-$(cat /root/katonic/katonic.yml | grep random_value | awk '{print $2}')
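
The one-line command above reads three values from /root/katonic/katonic.yml and joins the cluster name and random value with a hyphen. The same steps can be sketched with small helpers, which makes it easier to inspect each value on its own. The function names are illustrative. The sketch assumes, as the original grep and awk calls imply, that each setting sits on a line of the form "key: value".

```shell
# Path to the installer configuration; override CONFIG_FILE to test elsewhere.
CONFIG_FILE="${CONFIG_FILE:-/root/katonic/katonic.yml}"

# yml_value KEY
# Prints the second whitespace-separated field of each line containing KEY,
# exactly as the grep/awk pipeline in the original command does.
yml_value() {
  grep "$1" "$CONFIG_FILE" | awk '{print $2}'
}

# eks_cluster_name
# Builds the EKS cluster name as <cluster_name>-<random_value>.
eks_cluster_name() {
  echo "$(yml_value cluster_name)-$(yml_value random_value)"
}

# update_kubeconfig
# Writes the kubeconfig entry for the deployed cluster.
update_kubeconfig() {
  aws eks update-kubeconfig \
    --region "$(yml_value aws_region)" \
    --name "$(eks_cluster_name)"
}

# Example:
#   eks_cluster_name      # print the name first to check it
#   update_kubeconfig
```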

Verification

  • First, connect to the EC2 instance named Katonic-installer, which is created by the One-Click CloudFormation Stack.

  • The installation process can take up to one hour to complete. The installer outputs verbose logs, including the commands for getting kubectl access to the deployed cluster, and surfaces any errors it encounters. After installation, use the following command to check whether all applications are running.

kubectl get pods --all-namespaces
  • This will show the status of all pods being created by the installation process. If you see any pods enter a crash loop or hang in a non-ready state, you can get logs from that pod by running:
kubectl logs $POD_NAME --namespace $NAMESPACE_NAME
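
On a cluster with many pods, it can help to filter the listing down to the pods that are not ready. The following is a minimal sketch, assuming the default column layout of kubectl get pods --all-namespaces --no-headers (NAMESPACE, NAME, READY, STATUS, ...). The function name is illustrative.

```shell
# not_ready_pods
# Reads `kubectl get pods --all-namespaces --no-headers` output on stdin and
# prints "NAMESPACE/NAME STATUS" for every pod whose READY count is
# incomplete, skipping pods that finished normally (STATUS Completed).
not_ready_pods() {
  awk '{
    split($3, ready, "/")
    if ($4 != "Completed" && ready[1] != ready[2])
      print $1 "/" $2, $4
  }'
}

# Example:
#   kubectl get pods --all-namespaces --no-headers | not_ready_pods
```

Each line of output gives the namespace and pod name to pass to the kubectl logs command above.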

Note that the application is only accessible via HTTPS at its FQDN if you have configured DNS for that name to point to an ingress load balancer with the appropriate SSL certificate that forwards traffic to your platform nodes.

Note: You can stop the Katonic-installer instance after the installation is complete. Do not terminate the Katonic-installer EC2 instance.

Test and troubleshoot

Run the following tests to verify that your Katonic installation was successful:

  • If you get a 500 or 502 error, connect to your cluster and run the following command:
kubectl rollout restart deploy nodelog-deploy -n application
  • Log in to the Katonic application and check that all the navigation panel options are operational.

Failure of this test means you must check that Keycloak was set up properly.

  • Create a new project and launch a Jupyter/JupyterLab workspace.

Failure of this test means you must check that default environment images have been loaded in the cluster.

  • Publish an app with Flask or Shiny.

Failure of this test means you must check that the environment images have Flask and Shiny installed.

Deleting Katonic Generative AI Platform from AWS

After the one-click installation completes, the platform deletion script is available on the Katonic-installer instance in /root/katonic. To delete the platform, run the script from that directory:

./aws-cluster-delete.sh