
Model Training Experiments

Data scientists have access to many libraries and packages that help with model development. Some of the most common for Python are XGBoost, Keras, and scikit-learn. These packages are already available in the Katonic SDK (Auto ML).

Once text preprocessing is done (removing unnecessary punctuation and symbols, and converting the text into numerical features), we use those features to train machine learning models. For that we use Katonic's Auto ML tool, which keeps track of every experiment you run inside a Notebook and stores it in an Experiments Registry. From there we can compare the models and pick the best one.
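As a reminder of what that preprocessing step involves, here is a minimal sketch using only the Python standard library. The sample plots and the helper names (clean_text, build_vocabulary, vectorize) are hypothetical and are not part of the Katonic SDK. A simple bag-of-words count stands in for whichever feature extraction you actually used.

```python
import string
from collections import Counter

# Translation table that turns every punctuation character into a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_text(text):
    """Lowercase the text, drop punctuation, and collapse repeated whitespace."""
    without_punct = text.lower().translate(_PUNCT_TO_SPACE)
    return " ".join(without_punct.split())


def build_vocabulary(documents):
    """Map each distinct word to a column index, in alphabetical order."""
    words = sorted({word for doc in documents for word in doc.split()})
    return {word: index for index, word in enumerate(words)}


def vectorize(document, vocabulary):
    """Turn one cleaned document into a list of word counts (bag of words)."""
    counts = Counter(document.split())
    return [counts.get(word, 0) for word in vocabulary]


# Hypothetical sample plots, for illustration only.
raw_plots = ["A detective hunts a killer!", "Two friends fall in love..."]
cleaned = [clean_text(plot) for plot in raw_plots]
vocabulary = build_vocabulary(cleaned)
features = [vectorize(doc, vocabulary) for doc in cleaned]
```

Each row of features has one column per vocabulary word, which is the kind of numerical input the classifiers below expect.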

Before training any model in the Notebook, you need to set an experiment using the Katonic Auto ML package.

Setting the experiment name to catalogue all the models:

from katonic.ml.client import set_exp
exp_name = set_exp("movie_genre_usecase")

The models that you train are stored under the experiment name you provided. In this case, the experiment name is movie_genre_usecase.

We are using three of the most commonly used machine learning algorithms for this text classification use case:

  • Decision Tree Classifier.

  • Random Forest Classifier.

  • K Nearest Neighbor Classifier.
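To give a feel for the last of these, the idea behind a k-nearest-neighbor classifier can be sketched in plain Python. This is only an illustration of the technique with hypothetical two-dimensional points and labels. It is not how the Katonic SDK implements it, and real text features would have many more dimensions.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most frequent label among the k nearest neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical feature vectors and genre labels, for illustration only.
train_points = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
train_labels = ["comedy", "comedy", "drama", "drama", "drama"]

prediction = knn_predict(train_points, train_labels, (1.0, 0.9), k=3)
```

Decision trees and random forests work differently (they learn splitting rules over the features), but all three take the same kind of input: numerical feature vectors and their labels.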

Training


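# Import path assumed from the Katonic SDK; adjust it if your version differs.
from katonic.ml.classification import Classifier

# Create one Classifier bound to the train/test split and the experiment.
classifier = Classifier(X_train, X_test, Y_train, Y_test, experiment_name="movie_genre_usecase", average="weighted")

# Decision Tree Classifier
classifier.DecisionTreeClassifier()

# Random Forest Classifier
classifier.RandomForestClassifier()

# K Nearest Neighbors Classifier
classifier.KNeighborsClassifier()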
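
The average = "weighted" argument above is not explained here. Assuming it follows the scikit-learn convention for multi-class metrics (an assumption, not something this page confirms), each class's score is weighted by how many true samples that class has. The following standard-library sketch shows that calculation for F1; the function names and sample labels are hypothetical.

```python
from collections import Counter


def f1_per_class(y_true, y_pred, label):
    """F1 score for one class, treating that class as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # No correct predictions for this class.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighted by each class's true count."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        f1_per_class(y_true, y_pred, label) * count / total
        for label, count in support.items()
    )


# Hypothetical labels, for illustration only.
y_true = ["drama", "drama", "drama", "comedy"]
y_pred = ["drama", "drama", "comedy", "comedy"]
score = weighted_f1(y_true, y_pred)
```

Weighting by class frequency keeps a rare genre from dominating the reported score, which matters when the genres in the dataset are imbalanced.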