
Ingesting Features into the Feature Store

Once we have preprocessed all the features and transformed them into the required state, we can store them inside a Feature Store, so that anyone from the same organization or project can reuse them at any time.

Overview of Feature Store

Katonic Feature Store is a Python library that we can use in various phases of a Machine Learning project. We can use it to collect the defined features from online and offline data sources and write them to online and offline stores. We can then use these features across the organization with low latency and high consistency. It is also very useful with real-time data, letting us retrieve the necessary features in real time.

Feature Store Advantages:

  • Feature Consistency:

    You can keep consistency between the training and the inference data by using the Feature Store: both the training and the inference features are retrieved from the Feature Store itself.

  • Feature Reusability:

    Once you preprocess the data and store it in the feature store, you can reuse these features across teams within the organization, for any number of projects.

In order to ingest the features, we need to initialize the Feature Store with the following details.

from katonic.fs.feature_store import FeatureStore
from katonic.fs.entities import Entity, FeatureView
from katonic.fs.value_type import ValueType
from katonic.fs.core.offline_stores import DataFrameSource

fs = FeatureStore(
    user_name = "username",
    project_name = "default_loan_prediction",
    description = "project which will predict bank defaulters",
)

Now that we've initialized the feature store, all the operations that we perform using it will be stored under this specific project and user.

Next, we define an Entity key, which will act as a primary key for the feature view that we are going to create.

entity_key = Entity(name = "id", value_type = ValueType.INT64)

Note: Before creating a Feature View, please make sure that your event timestamp column's data type is accurate. Only then can we perform point-in-time joins in later stages.
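
For example, if the event timestamp column of your pandas DataFrame is stored as plain strings, you can convert it to a proper datetime type before registering the source. The snippet assumes the DataFrame is named dataframe and the column is event_timestamp, matching the batch source definition below.

import pandas as pd

# Ensure the event timestamp column has a datetime dtype rather than strings,
# so that point-in-time joins behave correctly later.
dataframe["event_timestamp"] = pd.to_datetime(dataframe["event_timestamp"])
print(dataframe["event_timestamp"].dtype)  # expected: datetime64[ns]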

After defining the Entity key, we will create a batch source to provide the data to the feature store.

We can provide the data through different source types, such as a DataFrameSource or a FileSource.

batch_source = DataFrameSource(
    dataframe,  # Provide your dataframe
    event_timestamp_column = "event_timestamp",  # Event column name
)
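
If your data lives in a file rather than an in-memory DataFrame, a FileSource can be used instead. The snippet below is only a sketch: the import path and the constructor arguments (a file path plus the event timestamp column name) are assumptions modeled on the DataFrameSource example above, so check the Katonic Feature Store reference for the exact signature.

from katonic.fs.core.offline_stores import FileSource  # assumed import path

batch_source = FileSource(
    path = "data/loan_features.parquet",  # assumed parameter: path to your feature file
    event_timestamp_column = "event_timestamp",  # Event column name
)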

Once we have defined the data source, we need to specify the columns that we want to store inside the feature store.

cols = [
'annual_inc', 'short_emp', 'emp_length_num',
'dti', 'last_delinq_none', 'revol_util', 'total_rec_late_fee',
'od_ratio', 'grade_A', 'grade_B', 'grade_C', 'grade_D',
'grade_E', 'grade_F', 'grade_G', 'home_ownership_MORTGAGE',
'home_ownership_OWN', 'home_ownership_RENT', 'purpose_car',
'purpose_credit_card', 'purpose_debt_consolidation',
'purpose_home_improvement', 'purpose_house', 'purpose_major_purchase',
'purpose_medical', 'purpose_moving', 'purpose_other',
'purpose_small_business', 'purpose_vacation', 'purpose_wedding',
'term_ 36 months', 'term_ 60 months'
]

After that we will create a Feature View, which defines the structure of the table that is going to be stored in the Offline Store.

default_view = FeatureView(
    name = "default_loan_feature_view",  # A name for your feature view
    entities = ["id"],  # Entity Key
    ttl = '2d',
    features = cols,  # Columns you want in Feature Store
    batch_source = batch_source,  # Your source object, e.g. FileSource or DataFrameSource
)

Now we have successfully defined everything required to ingest the data into the feature store. With this, we can write the feature data to the Offline Store.

fs.write_table([entity_key, default_view])

Now that we have successfully written the data to the offline store, these files will be stored in your private bucket.
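
To sanity-check the ingestion, you would typically read a few features back for a set of entity ids and event timestamps. The snippet below is only a hypothetical sketch: the get_historical_features call and its arguments are assumptions modeled on common feature store APIs, so verify the exact retrieval method in the Katonic Feature Store reference before using it.

# Hypothetical verification step -- the retrieval method and its arguments are assumptions.
entity_df = dataframe[["id", "event_timestamp"]]  # entity keys with event timestamps

training_df = fs.get_historical_features(  # assumed retrieval API
    entity_df = entity_df,
    feature_view = default_view,
)
print(training_df.head())  # assuming a DataFrame is returned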