
Feature Store

Once we have preprocessed all the features and transformed them into the required state, we can store them inside a Feature Store, so that anyone in the same organization can use them at any time without repeating the preprocessing.

Overview of Feature Store

Katonic Feature Store is a Python library that can be used in various phases of a machine learning project. It lets us define features from online and offline data sources and write them to online and offline stores. These features can then be used across the organization with low latency and high consistency. It is also very useful for serving the necessary features from real-time data in real time.

Feature Store Advantages:

  • Feature Consistency:

    Using the Feature Store keeps your training and inference data consistent, since both training and inference features are retrieved from the same store.

  • Feature Reusability:

    Once you preprocess the data and store it in the Feature Store, you can reuse these features across teams within the organization, for any number of projects.

In order to ingest features into the Feature Store, we need to initialize the FeatureStore object with the following details.

python
from katonic.fs import FeatureStore, Entity, ValueType, DataFrameSource, FeatureView

fs = FeatureStore(
    user_name = "username",
    project_name = "default_loan_prediction",
    description = "project which will predict bank defaulters",
)

Now that we've initialized the feature store, all the operations we perform with it will be stored under this specific project and user.

Next, we define an Entity key, which will act as the primary key for the feature view that we are going to create.

python
entity_key = Entity(name = "id", value_type = ValueType.INT64)

Before creating a Feature View, please make sure that your event timestamp column has an accurate data type (a proper datetime type, not a plain string); only then can point-in-time joins be performed in later stages.
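
As a minimal sketch, assuming your data sits in a pandas DataFrame with a column named event_timestamp (the toy data below is purely illustrative), you can verify and convert the column like this:

python
import pandas as pd

# Hypothetical toy data: the timestamp column arrives as plain strings.
dataframe = pd.DataFrame({
    "id": [1, 2],
    "event_timestamp": ["2021-01-01 00:00:00", "2021-01-02 00:00:00"],
})

# Convert it to a proper datetime dtype so point-in-time joins work.
dataframe["event_timestamp"] = pd.to_datetime(dataframe["event_timestamp"])
print(dataframe["event_timestamp"].dtype)  # datetime64[ns]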

After defining the Entity key, we will create a batch source to provide the data to the feature store.

We can provide the data in various formats, such as a DataFrameSource or a FileSource.
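
The dataframe passed to the DataFrameSource below can come from anywhere; as a sketch, here it is loaded from a CSV file (the file name default_loans.csv is only a placeholder):

python
import pandas as pd

# Hypothetical input file; replace with your own preprocessed dataset.
# parse_dates gives the event timestamp column a datetime dtype.
dataframe = pd.read_csv("default_loans.csv", parse_dates=["event_timestamp"])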

python
batch_source = DataFrameSource(
    dataframe,  # Provide your dataframe
    event_timestamp_column = "event_timestamp",  # Event timestamp column name
)

Once we have defined the data source, we need to define the columns that we want to store inside the feature store.

python
cols = [
    'annual_inc', 'short_emp', 'emp_length_num',
    'dti', 'last_delinq_none', 'revol_util', 'total_rec_late_fee',
    'od_ratio', 'grade_A', 'grade_B', 'grade_C', 'grade_D',
    'grade_E', 'grade_F', 'grade_G', 'home_ownership_MORTGAGE',
    'home_ownership_OWN', 'home_ownership_RENT', 'purpose_car',
    'purpose_credit_card', 'purpose_debt_consolidation',
    'purpose_home_improvement', 'purpose_house', 'purpose_major_purchase',
    'purpose_medical', 'purpose_moving', 'purpose_other',
    'purpose_small_business', 'purpose_vacation', 'purpose_wedding',
    'term_ 36 months', 'term_ 60 months'
]
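
Rather than listing every column by hand, a small sketch like the following can derive the same list, assuming the only non-feature columns in the dataframe are the entity key id and the event_timestamp column:

python
# Hypothetical shortcut: treat every column except the entity key and
# the event timestamp column as a feature.
cols = [c for c in dataframe.columns if c not in ("id", "event_timestamp")]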

After that, we will create a FeatureView, which defines the structure of the table that will be stored in the offline store.

python
default_view = FeatureView(
    name = "default_loan_feature_view",  # A name for your feature view
    entities = ["id"],  # Entity key
    ttl = '2d',  # Time-to-live for the features (two days)
    features = cols,  # Columns you want in the Feature Store
    batch_source = batch_source,  # Your source object, e.g. FileSource or DataFrameSource
)

Now we have successfully defined everything required to ingest the data into the feature store. With these objects, we can write the feature data to the offline store.

python
fs.write_table([entity_key, default_view])

Now that we have successfully written the data to the offline store, these files will be stored in your private bucket.
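
To use these features later for training or inference, they can be read back from the store. The retrieval call below is an assumption based on Feast-style feature store APIs, not confirmed by this page; check the katonic.fs documentation for the exact method and signature:

python
# Hypothetical retrieval sketch, assuming a Feast-style API;
# verify the actual method name and arguments in the katonic.fs docs.
entity_df = dataframe[["id", "event_timestamp"]]  # entity keys to look up

training_df = fs.get_historical_features(
    entity_df = entity_df,
    feature_view = [default_view],
)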