
Introduction

Katonic Python SDK for Complete ML Model Life Cycle.

The Katonic Python SDK is a comprehensive package for performing machine learning and data science operations across the model life cycle, from loading data to monitoring deployed models.

For a complete list of examples and notebooks, please take a look at the Examples.

Minimum Requirements

  • Python 3.8 or higher.

Install the base package using pip

pip install katonic==1.6.2

The base Katonic package includes the log package, which logs your standard or custom-built models to the Katonic model registry; see the example here.

This page covers the following topics:

  • Connectors
  • Filemanager
  • Feature Store
  • Experiment Operations
  • Registry Operations
  • Pipeline Operations
  • Drift

Connectors

A typical AI model life cycle begins with loading data into your workspace and analyzing it for useful insights. The Katonic SDK helps with this step: it contains several connectors that load data from a source and place it wherever you need it. Supported sources include Azure Blob, MySQL, and Postgres, among others.

Install Connectors

pip install katonic[connectors]

You can explore the Connectors examples here.
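This page does not show the connector API itself. The following is only a plain-Python sketch of the load-and-place pattern described above. Python's built-in `sqlite3` in-memory database stands in for a source such as MySQL or Postgres, and an in-memory text buffer stands in for the destination. The `patients` table, its columns, and the helper names `load_rows` and `write_csv` are hypothetical and are not part of the Katonic SDK.

```python
import csv
import io
import sqlite3
from typing import Any, Dict, List, TextIO


def load_rows(conn: sqlite3.Connection, query: str) -> List[Dict[str, Any]]:
    """Load: run a query and return each row as a {column: value} dict."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def write_csv(rows: List[Dict[str, Any]], sink: TextIO) -> None:
    """Place: write the rows as CSV to any text sink (a local file, a buffer, ...)."""
    writer = csv.DictWriter(sink, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical stand-in source: an in-memory SQLite table instead of MySQL/Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER, glucose REAL, outcome INTEGER)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?)",
    [(1, 148.0, 1), (2, 85.0, 0), (3, 183.0, 1)],
)

rows = load_rows(conn, "SELECT id, glucose, outcome FROM patients ORDER BY id")
buffer = io.StringIO()  # stand-in destination
write_csv(rows, buffer)
print(buffer.getvalue())
```

Swapping the connection object and the sink is all that changes between sources and destinations. That separation is the reason to keep loading and placing as two small functions.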

Filemanager

Once you have the data, you can use the Katonic Filemanager to get, store, update, or otherwise manipulate objects in the file manager through the Katonic SDK.

Install Filemanager

pip install katonic[filemanager]

You can explore the Filemanager examples here.
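To make the get/store/update operations concrete, here is a toy object store backed by a local directory. It is a hypothetical sketch and not the Katonic Filemanager API. The class name `LocalObjectStore` and its `put`, `get`, and `update` methods are assumptions chosen only to mirror the three operations named above, and keys are assumed to be flat file names.

```python
import json
import tempfile
from pathlib import Path


class LocalObjectStore:
    """Hypothetical object store: each object is a file named by its (flat) key."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        """Store a new object; refuse to overwrite an existing one."""
        path = self.root / key
        if path.exists():
            raise FileExistsError(f"object {key!r} already exists; use update()")
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        """Fetch an object's contents."""
        return (self.root / key).read_bytes()

    def update(self, key: str, data: bytes) -> None:
        """Replace the contents of an object that already exists."""
        path = self.root / key
        if not path.exists():
            raise FileNotFoundError(f"object {key!r} does not exist; use put()")
        path.write_bytes(data)


with tempfile.TemporaryDirectory() as tmp:
    store = LocalObjectStore(Path(tmp))
    store.put("config.json", json.dumps({"version": 1}).encode())
    store.update("config.json", json.dumps({"version": 2}).encode())
    result = json.loads(store.get("config.json"))

print(result)  # {'version': 2}
```

Keeping `put` and `update` separate makes accidental overwrites an explicit error rather than a silent replacement.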

Feature Store

Once you have loaded the data you want to work with, the next step is preprocessing: handling missing values, removing outliers, scaling the data, encoding features, and so on. After preprocessing, ingest the data into a feature store.
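As a rough illustration of those preprocessing steps, here is a plain-Python sketch. It makes these choices:

- median imputation for missing values;
- clipping outliers to Tukey's 1.5 × IQR fences, rather than dropping rows;
- min-max scaling;
- one-hot encoding.

These are illustrative assumptions, not a description of how the Katonic SDK or feature store preprocesses data. The `glucose` and `gender` columns and their values are hypothetical.

```python
from statistics import median, quantiles
from typing import Dict, List, Optional


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Missing values: replace None with the median of the observed values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def clip_outliers(values: List[float]) -> List[float]:
    """Outliers: clip to Tukey's fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [min(max(v, low), high) for v in values]


def min_max_scale(values: List[float]) -> List[float]:
    """Scaling: map values into [0, 1]; a constant column becomes all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def one_hot(categories: List[str]) -> List[Dict[str, int]]:
    """Encoding: one 0/1 indicator column per distinct category."""
    levels = sorted(set(categories))
    return [{f"is_{level}": int(c == level) for level in levels} for c in categories]


glucose = [148.0, None, 85.0, 183.0, 89.0, 900.0]  # None = missing, 900.0 = outlier
gender = ["F", "M", "F", "F", "M", "M"]

clean_glucose = min_max_scale(clip_outliers(impute_median(glucose)))
encoded_gender = one_hot(gender)
print(clean_glucose)
print(encoded_gender[0])  # {'is_F': 1, 'is_M': 0}
```

The order matters: imputing before clipping keeps `None` out of the quartile computation, and clipping before scaling stops one extreme value from compressing every other value toward zero.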

By uploading the clean data to a feature store, you can share it across the organization, so that other teams and data scientists working on the same problem can reuse it. This is how you achieve feature reusability.

Training models and making predictions from the same feature store data keeps the training data and the serving data consistent. Without that consistency, the mismatch between the two leads to training-serving skew.
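A small example shows how training-serving skew arises. The sketch below is plain Python and not the Katonic feature store API; the names `ScalerParams`, `fit_scaler`, and `transform` are hypothetical. A scaling feature is fitted once on training data, and its parameters are reused at serving time. Re-fitting on the live batch instead gives the same raw values a different feature meaning.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScalerParams:
    """Feature-transformation parameters computed once, at training time."""
    lo: float
    hi: float


def fit_scaler(values: List[float]) -> ScalerParams:
    return ScalerParams(lo=min(values), hi=max(values))


def transform(value: float, params: ScalerParams) -> float:
    """The single feature definition shared by training and serving."""
    span = params.hi - params.lo
    return 0.0 if span == 0 else (value - params.lo) / span


train_glucose = [85.0, 89.0, 148.0, 183.0]
params = fit_scaler(train_glucose)  # stored alongside the features
train_features = [transform(v, params) for v in train_glucose]

serving_batch = [120.0, 130.0]
# Consistent: reuse the stored training-time parameters.
consistent = [transform(v, params) for v in serving_batch]
# Skewed: re-fitting on the serving batch changes what the feature means.
skewed = [transform(v, fit_scaler(serving_batch)) for v in serving_batch]
print(consistent)
print(skewed)  # [0.0, 1.0]
```

In the skewed version, glucose 120 is mapped to the same feature value (0.0) that glucose 85 received during training. The model therefore sees inputs it was never trained to interpret that way.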

Install Feature Store

pip install katonic[fs]

You can explore the feature store examples here.

Experiment Operations

The AutoML component of the Katonic SDK lets you train machine learning models with just one or two lines of code.

The SDK also catalogs the metrics for classification and regression models. The available metrics are accuracy score, F1 score, precision, recall, log loss, mean squared error (MSE), mean absolute error (MAE), and root mean squared error (RMSE).
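For reference, here is a plain-Python sketch of the standard definitions of the listed metrics, for binary classification and regression. It is not the SDK's implementation. The function names, the zero-division convention (returning 0.0), and the probability clipping in log loss are assumptions; the example labels and predictions are made up.

```python
import math
from typing import Dict, List


def classification_metrics(
    y_true: List[int], y_pred: List[int], y_prob: List[float]
) -> Dict[str, float]:
    """Binary classification metrics; y_prob holds the predicted probability of class 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    eps = 1e-15  # clip probabilities so that log(0) is never evaluated
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    log_loss = -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p) for t, p in zip(y_true, clipped)
    ) / len(y_true)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
        "log_loss": log_loss,
    }


def regression_metrics(y_true: List[float], y_pred: List[float]) -> Dict[str, float]:
    """MSE, MAE, and RMSE (the square root of MSE)."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    return {
        "mse": mse,
        "mae": sum(abs(e) for e in errors) / len(errors),
        "rmse": math.sqrt(mse),
    }


cls_scores = classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1], [0.9, 0.2, 0.4, 0.8, 0.6])
reg_scores = regression_metrics([3.0, 5.0, 2.0], [2.5, 5.0, 4.0])
print(cls_scores)
print(reg_scores)
```

Log loss is the only classification metric here that uses the predicted probabilities rather than the hard labels. It rewards well-calibrated confidence even when accuracy is unchanged.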

You can explore the AutoML examples here.

Registry Operations

Once you have finished training models on your data, the Katonic SDK keeps track of all the models and stores the model metadata and metrics in the Experiment Registry. From there, you can choose the best model and send it to the Model Registry.
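Choosing "the best model" means ranking logged runs by a chosen metric, and whether higher or lower is better depends on that metric. The sketch below is plain Python, not the Katonic registry API. The `runs` records, their field names, and the metric values are made up for illustration.

```python
from typing import Any, Dict, List

# Hypothetical run records; real ones would come from the Experiment Registry.
runs: List[Dict[str, Any]] = [
    {"run_id": "run-1", "model_name": "xgboost", "metrics": {"f1_score": 0.71, "log_loss": 0.47}},
    {"run_id": "run-2", "model_name": "random_forest", "metrics": {"f1_score": 0.74, "log_loss": 0.49}},
    {"run_id": "run-3", "model_name": "logistic_regression", "metrics": {"f1_score": 0.69}},
]


def best_run(runs: List[Dict[str, Any]], metric: str, higher_is_better: bool = True) -> Dict[str, Any]:
    """Return the run with the best value of `metric`, skipping runs that did not log it."""
    candidates = [r for r in runs if metric in r["metrics"]]
    if not candidates:
        raise ValueError(f"no run logged {metric!r}")
    pick = max if higher_is_better else min
    return pick(candidates, key=lambda r: r["metrics"][metric])


print(best_run(runs, "f1_score")["run_id"])  # run-2
print(best_run(runs, "log_loss", higher_is_better=False)["run_id"])  # run-1
```

The two calls pick different winners, so the selection metric should be fixed before comparing runs.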

Install Log

pip install katonic==1.6.2

Logging ML Model Examples

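from katonic.log.logmodel import LogModel

from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Placeholder data so the example runs end to end; replace it with your own dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Create a new experiment (or reuse an existing one) by instantiating LogModel.
exp_name = "diabetes_prediction"
lm = LogModel(exp_name, source_name="xgboost_model.ipynb")

clf = XGBClassifier(random_state=0)
clf.fit(X_train, y_train)

# [Optional] custom artifact path name (str); this value is only an example.
artifact_path = "xgboost_model"
# [Optional] custom metrics in dictionary form; the key name is your choice.
model_metrics = {"accuracy_score": accuracy_score(y_test, clf.predict(X_test))}

# Log the trained model together with its metrics.
lm.model_logging(
    model_name="xgboost",
    model_type="xgboost",
    model=clf,
    artifact_path=artifact_path,
    current_working_dir="xgboost_model.ipynb",
    metrics=model_metrics,
)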

You can explore the log examples here.

In the Model Registry, you can store the best models according to your performance metrics. The Model Registry lets you tag models as staging or production. Models tagged production can be deployed to production, while models tagged staging can be reviewed by the QA team before moving on to later stages.
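The staging/production workflow can be modelled as a small state machine. The sketch below is a hypothetical in-memory registry, not Katonic's Model Registry API. The class `ToyModelRegistry`, the `Stage` enum, and especially the rule that a model must pass through staging before production are assumptions made for illustration.

```python
from enum import Enum
from typing import Dict, List


class Stage(Enum):
    NONE = "none"
    STAGING = "staging"
    PRODUCTION = "production"


# Assumed promotion rules: production is reachable only from staging (after QA review),
# and any tagged model can be untagged again.
ALLOWED = {
    Stage.NONE: {Stage.STAGING},
    Stage.STAGING: {Stage.PRODUCTION, Stage.NONE},
    Stage.PRODUCTION: {Stage.NONE},
}


class ToyModelRegistry:
    """Hypothetical in-memory registry keyed by 'name:vN'; not Katonic's API."""

    def __init__(self) -> None:
        self._stages: Dict[str, Stage] = {}

    @staticmethod
    def _key(name: str, version: int) -> str:
        return f"{name}:v{version}"

    def register(self, name: str, version: int) -> None:
        self._stages[self._key(name, version)] = Stage.NONE

    def stage(self, name: str, version: int) -> Stage:
        return self._stages[self._key(name, version)]

    def transition(self, name: str, version: int, target: Stage) -> None:
        current = self.stage(name, version)
        if target not in ALLOWED[current]:
            raise ValueError(
                f"cannot move {self._key(name, version)} from {current.value} to {target.value}"
            )
        self._stages[self._key(name, version)] = target

    def deployable(self) -> List[str]:
        """Models tagged production are the ones eligible for deployment."""
        return sorted(k for k, s in self._stages.items() if s is Stage.PRODUCTION)


registry = ToyModelRegistry()
for version in (1, 2, 3):
    registry.register("diabetes_xgboost", version)

registry.transition("diabetes_xgboost", 1, Stage.STAGING)     # QA reviews v1 ...
registry.transition("diabetes_xgboost", 1, Stage.PRODUCTION)  # ... and approves it
registry.transition("diabetes_xgboost", 2, Stage.STAGING)     # v2 awaits review

try:
    registry.transition("diabetes_xgboost", 3, Stage.PRODUCTION)  # tries to skip staging
    rejected = False
except ValueError:
    rejected = True

print(registry.deployable())  # ['diabetes_xgboost:v1']
```

Encoding the allowed transitions in one table keeps the review policy in a single place instead of scattering checks across the code.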

Pipeline Operations

No data scientist wants to repeat the same work again and again; instead, they want to reuse the work they have already done. The same applies within an AI model life cycle.

You can convert all the work done so far into a scalable pipeline using the Pipelines component of the Katonic SDK. To perform the same operations on different data, you only need to change the data source and rerun the pipeline; everything else runs automatically and at scale.
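The core idea, that the steps stay fixed while only the data source changes, can be sketched in plain Python. This sequential, single-process version does not show the Katonic Pipelines API or any scaling behaviour. The step functions, the CSV-text data source, and the glucose threshold of 140 are all hypothetical.

```python
import csv
import io
from typing import Callable, Dict, List

Row = Dict[str, str]
Step = Callable[[List[Row]], List[Row]]


def load_csv(source: str) -> List[Row]:
    """Data-source step; `source` is CSV text here, standing in for a real connector."""
    return list(csv.DictReader(io.StringIO(source)))


def drop_missing(rows: List[Row]) -> List[Row]:
    """Preprocessing step: drop rows with any empty field."""
    return [r for r in rows if all(v != "" for v in r.values())]


def flag_high_glucose(rows: List[Row]) -> List[Row]:
    """Feature step: add a 0/1 flag for glucose above a hypothetical threshold of 140."""
    return [{**r, "high_glucose": str(int(float(r["glucose"]) > 140))} for r in rows]


def run_pipeline(source: str, steps: List[Step]) -> List[Row]:
    """Load the data, then apply each step in order."""
    rows = load_csv(source)
    for step in steps:
        rows = step(rows)
    return rows


STEPS: List[Step] = [drop_missing, flag_high_glucose]

# Same pipeline, two different data sources: only the input changes.
jan_out = run_pipeline("id,glucose\n1,148\n2,\n3,85\n", STEPS)
feb_out = run_pipeline("id,glucose\n4,183\n5,99\n", STEPS)
print(jan_out)
print(feb_out)
```

Because each step takes rows and returns rows, steps can be reordered, tested in isolation, or run as separate distributed tasks by a real pipeline engine.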

Install Pipelines

pip install katonic[pipeline]

You can explore the pipeline examples here.

Drift

An AI model life cycle does not end with model deployment. You need to monitor the model's performance continuously to detect model deterioration or degradation. The Drift component of the Katonic SDK helps you find drift in your data: it runs statistical analyses on incoming data to check whether it contains outliers or is otherwise abnormal, and reports the results through a visual representation.
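This page does not say which statistical tests the Drift component uses. The sketch below therefore uses one common choice, the two-sample Kolmogorov-Smirnov (KS) test, to compare a feature's training-time distribution with recent data. It uses only the standard library and the large-sample critical value. The function names and the example data are hypothetical and are not Katonic's API.

```python
import math
from bisect import bisect_right
from typing import List


def ks_statistic(reference: List[float], current: List[float]) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    # Both empirical CDFs are step functions that only change at observed values,
    # so checking every observed value finds the maximum gap.
    return max(
        abs(bisect_right(ref, x) / len(ref) - bisect_right(cur, x) / len(cur))
        for x in set(ref) | set(cur)
    )


def detect_drift(reference: List[float], current: List[float], alpha: float = 0.05) -> bool:
    """Flag drift when D exceeds the large-sample critical value c(alpha) * sqrt((n + m) / (n * m))."""
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)  # about 1.358 for alpha = 0.05
    return ks_statistic(reference, current) > c_alpha * math.sqrt((n + m) / (n * m))


reference = [float(x) for x in range(100, 140)]  # e.g. glucose values seen at training time
current = [float(x) for x in range(120, 160)]    # the same feature in recent production data
drifted = detect_drift(reference, current)
print(round(ks_statistic(reference, current), 3), drifted)  # 0.5 True
```

The KS test only compares whole distributions of a single feature. Detecting individual outlying records, which the paragraph above also mentions, needs a separate check.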

Install Drift

pip install katonic[drift]

You can explore the drift examples here.