Advanced Topics
Passing specific features into a task
Certain features of the same data type may need to be processed differently than others.
For example, suppose you are working on a problem whose dataset contains two text features: one lends itself well to word-grams for preprocessing, while the other is better suited to char-grams.
When using Composable ML in DataRobot, a user may pass one or more specific features to another task.
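To make the word-gram versus char-gram distinction concrete, here is a minimal plain-Python sketch. It is not DataRobot's preprocessing. The helper names `word_ngrams` and `char_ngrams` and the two sample values are assumptions made purely for illustration.

```python
# Illustrative only: plain-Python n-gram extraction, not DataRobot's preprocessing.


def word_ngrams(text, n):
    """Return contiguous runs of n whitespace-separated words."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def char_ngrams(text, n):
    """Return contiguous runs of n characters, punctuation included."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Hypothetical values for two text features in the same dataset.
review = "battery life is great"  # free text: word order carries meaning
part_code = "AB-1029"             # short code: sub-word patterns carry meaning

print(word_ngrams(review, 2))
print(char_ngrams(part_code, 3))
```

Word-grams keep phrases such as "battery life" intact, whereas char-grams pick up fragments such as "ab-" that would be lost if the code were treated as a single token. This is why two features of the same data type can call for different tasks.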
Any time project-specific functionality is being used, make sure to:
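from datarobot_bp_workshop import Workshop
w = Workshop()
w.set_project(project_id="<project_id>")
# or, equivalently, set the project when constructing the Workshop:
# w = Workshop(project_id="<project_id>")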
Here we select only the “Age” feature, perform missing value imputation, and pass it to the Keras neural network classifier. As with other pieces of functionality, you can auto-complete the available feature names by typing w.Features.<tab>.
features = w.FeatureSelection(w.Features.Age)
pni = w.Tasks.PNI2(features)
keras = w.Tasks.KERASC(pni)
keras_blueprint = w.BlueprintGraph(keras)
You may link a blueprint to a specific project, if desired. This will ensure the blueprint is validated based on the _linked_ project, e.g. ensuring the selected features exist in the dataset associated with the project.
# Make sure it is saved at least once, or pass `user_blueprint_id` to `link_to_project`
keras_blueprint.save()
keras_blueprint.link_to_project(project_id="<project_id>")
In the UI, to pass only the desired column(s) into a task, add the “Single Column Converter” or “Multiple Column Converter” task. Then set its “column_name” or “column_names” parameter to the column name(s) from the original dataset. The task(s) that follow will receive only the selected column(s).
Click Update and then Save Blueprint to see the new task referencing the chosen column. In a blueprint that performs specific preprocessing on certain columns, each selected column name is shown on the task that uses it.
Continuing with this example, you can also pass all columns to another task. To do so, add a new connection from “Numeric Variables” to the desired task.
Features may also be excluded, which is particularly useful when one feature should be processed one way and everything else another way.
without_insurance_type = w.FeatureSelection(w.Features.Insurance_Type, exclude=True)
only_insurance_type = w.FeatureSelection(w.Features.Insurance_Type)
one_hot = w.Tasks.PDM3(without_insurance_type)
ordinal = w.Tasks.ORDCAT2(only_insurance_type)
keras = w.Tasks.KERASC(one_hot, ordinal)
keras_blueprint = w.BlueprintGraph(keras)
To process certain features in different ways, add the “Multiple Column Converter” task. Then set its “column_names” parameter to the column from the original dataset that should receive its own processing.
Next, create an edge from the categorical data to the modeler (categorical is used for this example, but any data type can be chosen) and insert the alternative processing task on that edge. Then add a second “Multiple Column Converter”, pick the same column name, and change “method” to “exclude”.
Now one column will be processed using one task, and all others will be processed using a different task.
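The include/exclude split described above can be sketched in plain Python. This is only an illustration of the routing idea: `select_columns` is a hypothetical helper, not part of the Workshop API, and every column in the sample record except `Insurance_Type` is made up.

```python
# Illustrative only: mimics the include/exclude split in plain Python.
# `select_columns` is a hypothetical helper, not part of the Workshop API.


def select_columns(row, names, exclude=False):
    """Keep the named columns, or everything except them when exclude=True."""
    if exclude:
        return {k: v for k, v in row.items() if k not in names}
    return {k: v for k, v in row.items() if k in names}


# Hypothetical record; only Insurance_Type comes from the example above.
row = {"Insurance_Type": "HMO", "Gender": "F", "Admission_Type": "Urgent"}

only_insurance_type = select_columns(row, {"Insurance_Type"})
without_insurance_type = select_columns(row, {"Insurance_Type"}, exclude=True)

print(only_insurance_type)
print(without_insurance_type)

# The two selections are disjoint and together cover every column.
assert set(only_insurance_type) | set(without_insurance_type) == set(row)
assert not set(only_insurance_type) & set(without_insurance_type)
```

Because the two selections never overlap and together cover the whole record, each column reaches the modeler through exactly one preprocessing path.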