Version: 4.5

Building an Entire Pipeline

Katonic custom components are pre-built components based on a low-code/no-code architecture. They help you build a full end-to-end pipeline without writing a single line of code.

Here we'll take the tips dataset and apply different preprocessing steps to prepare it for training.
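As a point of reference before building anything on the canvas, the data can be inspected locally with pandas. This is only a quick sketch: the file name tips.pkl is the one used later in this walkthrough, and the exact columns depend on how the file was prepared.

```python
import pandas as pd

# Load the pickled tips dataset (tips.pkl is the file used later in this walkthrough).
df = pd.read_pickle("tips.pkl")

# Quick look at shape, dtypes and a few rows to decide which preprocessing is needed.
print(df.shape)
print(df.dtypes)
print(df.head())
```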

All of the components are listed inside the Katonic pipeline canvas.

canvas

To build the pipeline, one component (Input Config) needs to be the first element; it takes the necessary parameters from the user, and these parameters are then passed through the entire pipeline.

input

Follow the steps below to design and run a pipeline using custom components

  • Drag and drop the components into the canvas in order to use them.

    input_config

  • After you have imported the Input Config component onto the canvas, right-click on it and select the option to open its properties.

    • There are three kinds of settings: pipeline properties, pipeline parameters, and node properties.

    • Pipeline properties apply to the entirety of the pipeline.

    • Pipeline parameters are user-defined parameters that can be used anywhere inside the pipeline.

    • Node properties are the properties of an individual component.

      input_config_rc

  • A panel will appear on the right side of the screen with various input fields for you to fill in.

    input_config_prop

  • Provide suitable values for each of the input fields. (The example values used in this walkthrough are also summarised in a short sketch after the screenshot at the end of this step.)

    • DATA_PATH: Input data file path inside the private or shared bucket on the Katonic platform. (Provided input: tips.pkl)

    • IS_TRAINING: Tells the system whether the pipeline will run for training or inference. (Default: True)

    • STORAGE_BUCKET: The Katonic File Manager bucket name. You can find it in the File Manager section of the platform. The private bucket is used here, as the data is stored in it.

      [image for file manager]

    • FOLDER_NAME: The folder name that will contain all the artifacts related to the pipeline. (Provided input: Automl_test)

    • TARGET_COLUMN: The target column name in the dataset. (Provided input: total_bill)

    • ID_COLUMN: The unique ID column in the data. If one doesn't exist, create it, as it is required for the pipeline to run properly. (Provided input: ID)

    • ACCESS_KEY: The access key of your File Manager. It can be generated by going to the File Manager section of the Katonic platform and clicking on Access Token+.

    • SECRET_KEY: The secret key of your File Manager. It can be generated by going to the File Manager section of the Katonic platform and clicking on Access Token+.

      input_config_with_param
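For reference, the example values entered above amount to the following configuration. This is purely an illustration of the inputs used in this walkthrough, not an API call; the bucket name and the access/secret keys are placeholders.

```python
# Illustrative summary of the Input Config values used in this walkthrough.
# The bucket name and keys are placeholders, not real credentials.
input_config = {
    "DATA_PATH": "tips.pkl",                    # file inside the chosen bucket
    "IS_TRAINING": True,                        # switched to False later for the inference run
    "STORAGE_BUCKET": "<your-private-bucket>",  # from the File Manager section
    "FOLDER_NAME": "Automl_test",               # folder that will hold the pipeline artifacts
    "TARGET_COLUMN": "total_bill",
    "ID_COLUMN": "ID",
    "ACCESS_KEY": "<your-access-key>",          # generated via Access Token+ in File Manager
    "SECRET_KEY": "<your-secret-key>",
}
```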

  • Now drag and drop the Mean Imputer component onto the canvas from the catalogue.

    mean_imputer

  • Connect the output port of Input Config to the input port of Mean Imputer.

    input_mean_connect

  • Populate the imputer component with suitable values, using the output from the Input Config component as its input parameters. (A short pandas sketch of what this step does follows the screenshot below.)

    • INPUT_PARAMETERS: Contains all the necessary parameters required to run the pipeline; here we take them from the output of Input Config.

    • COLUMN_NAMES: The names of the columns to be imputed (comma-separated if multiple).

      mean_imputer_params
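As a rough illustration of what this step does, mean imputation can be written in a few lines of pandas. This is not the component's actual implementation, and the column names below are assumptions based on the public tips data.

```python
import pandas as pd

df = pd.read_pickle("tips.pkl")

# Assumed numeric columns to impute; in the component these come from COLUMN_NAMES.
columns_to_impute = ["total_bill", "tip"]

# Replace missing values in each listed column with that column's mean.
for col in columns_to_impute:
    df[col] = df[col].fillna(df[col].mean())
```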

  • Likewise, bring the One-Hot Encoding component onto the canvas to encode the categorical attributes, and connect it to the Mean Imputer.

    one_hot

  • Utilize the output from the mean imputer component as the input parameters for the one-hot encoding component.

    one_hot_input_params

  • Provide suitable values for each of the input fields. (A short pandas sketch of the encoding step follows the screenshot below.)

    one_hot_params
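Again purely as an illustration, one-hot encoding of categorical attributes can be sketched with pandas. The categorical column names are assumptions based on the public tips data; the component's actual behaviour is configured through its input fields.

```python
import pandas as pd

df = pd.read_pickle("tips.pkl")

# Assumed categorical columns in the tips data.
categorical_columns = ["sex", "smoker", "day", "time"]

# Expand each categorical column into one binary indicator column per category.
df = pd.get_dummies(df, columns=categorical_columns)
```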

  • Since we are using the private bucket as our storage destination, we have to attach it to the pipeline.

  • Open the properties panel of any component, then click on pipeline properties.

    pipeline_properties

  • Click on the Add button in the data volume section.

    data_volume_add

  • Provide the details for attaching the data volume.

    • Mount Path: Provide /kfs_private if the private bucket is used and /kfs_public if the shared bucket is used.

    • Persistent Volume Claim Name: Provide your bucket name.

      data_volume_params

  • Click on the run button.

    run_button

  • Assign a name to the pipeline and choose the runtime.

    pipeline_run_prop

  • Run the pipeline and click on run details to view it.

    run_details

  • Once the pipeline has finished running, it will look like the image below.

    pipeline_complete

  • To execute the pipeline for making predictions, copy the name of the newly created folder from the logs.

    created_folder_name

  • Paste it into the folder name field of the Input Config component.

    inference_pipe_foldername

  • Set the IS_TRAINING parameter to False.

    inference_pipe_istrain

  • Provide the inference data file name.

    inference_pipe_data_path

  • Run the pipeline by clicking on the run button.

    run_button

  • The path of the final output file can be found in the logs of the last component that ran.

    inference_data_path_output

You can create any pipeline that fits your requirements using Katonic custom components.
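To close, the sketch below strings the same preprocessing steps together in plain Python, showing roughly how a training run and a later inference run (IS_TRAINING set to False, reusing the folder created during training) relate to each other. It is only a sketch under assumed column names and an assumed artifact layout (pickled column means inside the folder); the real components manage their artifacts through the configured bucket and folder name, and the inference file name here is a placeholder.

```python
import os
import pandas as pd

def preprocess(data_path, folder_name, is_training,
               columns_to_impute=("total_bill", "tip"),
               categorical_columns=("sex", "smoker", "day", "time")):
    """Sketch of the walkthrough's preprocessing: mean imputation + one-hot encoding."""
    df = pd.read_pickle(data_path)

    means_path = os.path.join(folder_name, "column_means.pkl")
    if is_training:
        # Training run: compute the column means and store them as an artifact.
        os.makedirs(folder_name, exist_ok=True)
        means = df[list(columns_to_impute)].mean()
        means.to_pickle(means_path)
    else:
        # Inference run: reuse the means saved by the training run.
        means = pd.read_pickle(means_path)

    for col in columns_to_impute:
        df[col] = df[col].fillna(means[col])

    # One-hot encode the categorical attributes.
    return pd.get_dummies(df, columns=list(categorical_columns))

# Training run, then an inference run that reuses the saved artifacts.
train_df = preprocess("tips.pkl", "Automl_test", is_training=True)
infer_df = preprocess("inference_data.pkl", "Automl_test", is_training=False)  # placeholder file name
```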