Walkthrough

This walkthrough covers the most commonly used functionality in the Blueprint Workshop.

Separate examples cover using custom tasks and selecting specific columns from a project’s dataset.

[1]:
import datarobot as dr
[2]:
from datarobot_bp_workshop import Workshop, Visualize
[3]:
with open('../../../api.token', 'r') as f:
    token = f.read().strip()  # strip any trailing newline from the token file

dr.Client(token=token, endpoint='https://app.datarobot.com/api/v2')
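
Token files often end with a newline, which should not be sent as part of the credential. A minimal sketch of a more defensive loader follows; the helper name load_api_token and the fallback to a DATAROBOT_API_TOKEN environment variable are illustrative assumptions, not part of the Workshop:

```python
import os
from pathlib import Path
from typing import Optional


def load_api_token(path: str, env_var: str = "DATAROBOT_API_TOKEN") -> Optional[str]:
    """Return the API token stored at `path`, else the value of `env_var`.

    Hypothetical helper: the environment-variable fallback is an illustrative
    choice. Surrounding whitespace (such as a trailing newline) is stripped,
    and None is returned when no non-empty token is found.
    """
    token_file = Path(path)
    if token_file.is_file():
        token = token_file.read_text().strip()
    else:
        token = os.environ.get(env_var, "").strip()
    return token or None
```

In the notebook this would replace the file-reading cell: token = load_api_token('../../../api.token'), followed by the same dr.Client(...) call.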

Initialize

[4]:
w = Workshop()

Construct a Blueprint

[5]:
w.Task('PNI2')
[5]:
Missing Values Imputed (quick median) (PNI2)

Input Summary: (None)
Output Method: TaskOutputMethod.TRANSFORM
[6]:
w.Tasks.PNI2()
[6]:
Missing Values Imputed (quick median) (PNI2)

Input Summary: (None)
Output Method: TaskOutputMethod.TRANSFORM
[7]:
pni = w.Tasks.PNI2(w.TaskInputs.NUM)
rdt = w.Tasks.RDT5(pni)
binning = w.Tasks.BINNING(pni)
keras = w.Tasks.KERASC(rdt, binning)
keras.set_task_parameters_by_name(learning_rate=0.123)
keras_blueprint = w.BlueprintGraph(keras, name='A blueprint I made with the Python API').save()
[8]:
user_blueprint_id = keras_blueprint.user_blueprint_id
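
Each task call takes its upstream tasks as inputs, so the final KERASC task is the root of a small directed acyclic graph in which PNI2 feeds both RDT5 and BINNING. The toy model below is not the Workshop's implementation; ToyTask and execution_order are hypothetical names used only to show how such a graph can be walked so that every task comes after its inputs:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyTask:
    """Hypothetical stand-in for a Workshop task: a task code plus its upstream inputs."""
    code: str
    inputs: List["ToyTask"] = field(default_factory=list)


def execution_order(final: ToyTask) -> List[str]:
    """Return task codes so that every task appears after all of its inputs.

    Uses a depth-first post-order walk; a task shared by several consumers
    (like PNI2 feeding both RDT5 and BINNING) is visited only once.
    """
    seen = set()
    order: List[str] = []

    def visit(task: ToyTask) -> None:
        if id(task) in seen:
            return
        seen.add(id(task))
        for upstream in task.inputs:
            visit(upstream)
        order.append(task.code)

    visit(final)
    return order


# Mirror the graph built above; toy_* names avoid shadowing the notebook's variables.
toy_num = ToyTask("NUM")
toy_pni = ToyTask("PNI2", [toy_num])
toy_rdt = ToyTask("RDT5", [toy_pni])
toy_binning = ToyTask("BINNING", [toy_pni])
toy_keras = ToyTask("KERASC", [toy_rdt, toy_binning])
print(execution_order(toy_keras))  # ['NUM', 'PNI2', 'RDT5', 'BINNING', 'KERASC']
```

A post-order walk is the simplest way to get a dependency-respecting order from a single final node; the shared PNI2 task is emitted once because visited tasks are tracked by identity.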

Visualize It

[9]:
keras_blueprint.show()

Inspecting Tasks

[10]:
pni
[10]:
Missing Values Imputed (quick median) (PNI2)

Input Summary: Numeric Data
Output Method: TaskOutputMethod.TRANSFORM
[11]:
rdt
[11]:
Smooth Ridit Transform (RDT5)

Input Summary: Missing Values Imputed (quick median) (PNI2)
Output Method: TaskOutputMethod.TRANSFORM
[12]:
binning
[12]:
Binning of numerical variables (BINNING)

Input Summary: Missing Values Imputed (quick median) (PNI2)
Output Method: TaskOutputMethod.TRANSFORM
[13]:
keras
[13]:
Keras Neural Network Classifier (KERASC)

Input Summary: Smooth Ridit Transform (RDT5) | Binning of numerical variables (BINNING)
Output Method: TaskOutputMethod.PREDICT

Task Parameters:
  learning_rate (learning_rate) = 0.123
[14]:
keras.task_parameters.learning_rate
[14]:
0.123
[15]:
keras.task_parameters.batch_size = 32
[16]:
keras
[16]:
Keras Neural Network Classifier (KERASC)

Input Summary: Smooth Ridit Transform (RDT5) | Binning of numerical variables (BINNING)
Output Method: TaskOutputMethod.PREDICT

Task Parameters:
  batch_size (batch_size) = 32
  learning_rate (learning_rate) = 0.123
[17]:
keras_blueprint
[17]:
Name: 'A blueprint I made with the Python API'

Input Data: Numeric
Tasks: Missing Values Imputed (quick median) | Smooth Ridit Transform | Binning of numerical variables | Keras Neural Network Classifier

Validation

Here we intentionally feed the wrong input data type: categorical data (w.TaskInputs.CAT) into PNI2, a numeric imputation task.

[18]:
pni = w.Tasks.PNI2(w.TaskInputs.CAT)
rdt = w.Tasks.RDT5(pni)
binning = w.Tasks.BINNING(pni)
keras = w.Tasks.KERASC(rdt, binning)
keras.set_task_parameters_by_name(learning_rate=0.123)
invalid_keras_blueprint = w.BlueprintGraph(keras)
[19]:
invalid_keras_blueprint.save('A blueprint with warnings (PythonAPI)', user_blueprint_id=user_blueprint_id).show()
[20]:
binning.set_task_parameters_by_name(max_bins=-22)
[20]:
Binning of numerical variables (BINNING)

Input Summary: Missing Values Imputed (quick median) (PNI2)
Output Method: TaskOutputMethod.TRANSFORM

Task Parameters:
  max_bins (b) = -22
[21]:
invalid_keras_blueprint.save('A blueprint with warnings (PythonAPI)', user_blueprint_id=user_blueprint_id).show()
Binning of numerical variables (BINNING)

  Invalid value(s) supplied
    max_bins (b) = -22
      - Must be a 'intgrid' parameter defined by: [2, 500]

Failed to save: parameter validation failed.
[22]:
keras.validate_task_parameters()
Keras Neural Network Classifier (KERASC)

All parameters valid!

Restore the Original Valid Blueprint

[23]:
pni = w.Tasks.PNI2(w.TaskInputs.NUM)
rdt = w.Tasks.RDT5(pni)
binning = w.Tasks.BINNING(pni)
keras = w.Tasks.KERASC(rdt, binning)
keras.set_task_parameters_by_name(learning_rate=0.123)
keras_blueprint = w.BlueprintGraph(keras)
blueprint_graph = keras_blueprint.save('A blueprint I made with the Python API', user_blueprint_id=user_blueprint_id)

Help with Tasks

[24]:
help(w.Tasks.PNI2)
Help on PNI2 in module datarobot_bp_workshop.factories object:

class PNI2(datarobot_bp_workshop.friendly_repr.FriendlyRepr)
 |  Missing Values Imputed (quick median)
 |
 |  Impute missing values on numeric variables with their median and create indicator variables to mark imputed records
 |
 |  Parameters
 |  ----------
 |  output_method: string, one of (TaskOutputMethod.TRANSFORM).
 |  task_parameters: dict, which may contain:
 |
 |    scale_small (s): select, (Default=0)
 |      Possible Values: [False, True]
 |
 |    threshold (t): int, (Default=10)
 |      Possible Values: [1, 99999]
 |
 |  Method resolution order:
 |      PNI2
 |      datarobot_bp_workshop.friendly_repr.FriendlyRepr
 |      builtins.object
 |
 |  Methods defined here:
 |
 |  __call__(zelf, *inputs, output_method=None, task_parameters=None, output_method_parameters=None, x_transformations=None, y_transformations=None, freeze=False, version=None)
 |
 |  __friendly_repr__(zelf)
 |
 |  documentation(zelf, auto_open=False)
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |
 |  description = 'Impute missing values on numeric variables with ...eate...
 |
 |  label = 'Missing Values Imputed (quick median)'
 |
 |  task_code = 'PNI2'
 |
 |  task_parameters = scale_small (s): select, (Default=0)
 |
 |  threshold (t):...
 |
 |  ----------------------------------------------------------------------
 |  Methods inherited from datarobot_bp_workshop.friendly_repr.FriendlyRepr:
 |
 |  __repr__(self)
 |      Return repr(self).
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from datarobot_bp_workshop.friendly_repr.FriendlyRepr:
 |
 |  __dict__
 |      dictionary for instance variables (if defined)
 |
 |  __weakref__
 |      list of weak references to the object (if defined)

List Task Categories

[25]:
w.list_categories(show_tasks=False)
Custom

Preprocessing

  Numeric Preprocessing

    Data Quality

    Dimensionality Reducer

    Scaling

  Categorical Preprocessing

  Text Preprocessing

  Image Preprocessing

  Summarized Categorical Preprocessing

  Geospatial Preprocessing

Models

  Regression

  Binary Classification

  Multi-class Classification

  Boosting

  Unsupervised

    Anomaly Detection

    Clustering

Calibration

Other

  Column Selection

  Automatic Feature Selection

Search for Tasks by Name

[26]:
w.search_tasks('keras')
[26]:
Keras Autoencoder with Calibration: [KERAS_AUTOENCODER_CAL]
  - Keras Autoencoder for Anomaly Detection with Calibration


Keras Autoencoder: [KERAS_AUTOENCODER]
  - Keras Autoencoder for Anomaly Detection


Keras Neural Network Classifier: [KERASC]
  - Keras Neural Network Classifier


Keras Neural Network Classifier: [KERASMULTIC]
  - Keras Neural Network Multi-Class Classifier


Keras Neural Network Regressor: [KERASR]
  - Keras Neural Network Regressor


Keras Variational Autoencoder with Calibration: [KERAS_VARIATIONAL_AUTOENCODER_CAL]
  - Keras Variational Autoencoder for Anomaly Detection with Calibration


Keras Variational Autoencoder: [KERAS_VARIATIONAL_AUTOENCODER]
  - Keras Variational Autoencoder for Anomaly Detection


Keras encoding of text variables: [KERAS_TOKENIZER]
  - Text encoding based on Keras Tokenizer class


Regularized Quantile Regressor with Keras: [KERAS_REGULARIZED_QUANTILE_REG]
  - Regularized Quantile Regression implemented in Keras
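
The query matched regardless of case, returning every task whose label contains "Keras". A minimal sketch of that kind of case-insensitive substring search over a hypothetical in-memory catalog (TOY_CATALOG and search_catalog are our names; the Workshop's real matching rules may differ):

```python
from typing import Dict, List, Tuple

# Hypothetical mini-catalog: task code -> (label, description), copied from outputs in this walkthrough.
TOY_CATALOG: Dict[str, Tuple[str, str]] = {
    "KERASC": ("Keras Neural Network Classifier", "Keras Neural Network Classifier"),
    "KERAS_TOKENIZER": ("Keras encoding of text variables",
                        "Text encoding based on Keras Tokenizer class"),
    "PNI2": ("Missing Values Imputed (quick median)",
             "Impute missing values on numeric variables with their median "
             "and create indicator variables to mark imputed records"),
}


def search_catalog(term: str, catalog: Dict[str, Tuple[str, str]] = TOY_CATALOG) -> List[str]:
    """Return task codes whose label or description contains `term`, ignoring case."""
    needle = term.casefold()
    return sorted(
        code
        for code, (label, description) in catalog.items()
        if needle in label.casefold() or needle in description.casefold()
    )


print(search_catalog("keras"))   # ['KERASC', 'KERAS_TOKENIZER']
print(search_catalog("median"))  # ['PNI2']
```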

Search Custom Tasks

[27]:
w.search_tasks('Awesome')
[27]:
Awesome Model: [CUSTOMR_6019ae978cc598a46199cee1]
  - This is the best model ever.

Quick Description

[32]:
w.Tasks.PDM3.description
[32]:
'One-Hot (or dummy-variable) transformation of categorical features'

View Documentation for a Task

[33]:
binning.documentation()
[33]:
'https://app.datarobot.com/model-docs/tasks/BINNING-Binning-of-numerical-variables.html'

View Task Parameter Values

As an example, let’s look at the Binning task:

[34]:
binning.get_task_parameter_by_name('max_bins')
[34]:
20

Modify a Task Parameter

[35]:
binning.set_task_parameters_by_name(max_bins=22)
[35]:
Binning of numerical variables (BINNING)

Input Summary: Missing Values Imputed (quick median) (PNI2)
Output Method: TaskOutputMethod.TRANSFORM

Task Parameters:
  max_bins (b) = 22

Or set the parameter directly by its key (short name):

[36]:
binning.task_parameters.b = 22
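
Each parameter has a long name (max_bins) and a short key (b), and the two spellings address the same value. A toy sketch of that aliasing, assuming nothing about the Workshop's internals (AliasedParams is hypothetical):

```python
from typing import Any, Dict


class AliasedParams:
    """Hypothetical parameter holder addressable by long name or short key.

    Values are stored under the short key, so both spellings see the same value.
    """

    def __init__(self, aliases: Dict[str, str]):
        # aliases maps long name -> short key, e.g. {"max_bins": "b"}.
        # object.__setattr__ bypasses our own __setattr__ for internal state.
        object.__setattr__(self, "_to_key", dict(aliases))
        object.__setattr__(self, "_values", {})

    def _resolve(self, name: str) -> str:
        if name in self._to_key:            # long name -> short key
            return self._to_key[name]
        if name in self._to_key.values():   # already a short key
            return name
        raise AttributeError(f"unknown parameter: {name!r}")

    def __setattr__(self, name: str, value: Any) -> None:
        self._values[self._resolve(name)] = value

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails, i.e. for parameter names.
        key = self._resolve(name)
        if key not in self._values:
            raise AttributeError(f"parameter {name!r} has not been set")
        return self._values[key]


toy_params = AliasedParams({"max_bins": "b"})
toy_params.max_bins = 22
print(toy_params.b)         # 22
toy_params.b = 30
print(toy_params.max_bins)  # 30
```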

Validate Parameters

[37]:
binning.task_parameters.b = -22
[38]:
binning.validate_task_parameters()
Binning of numerical variables (BINNING)

  Invalid value(s) supplied
    max_bins (b) = -22
      - Must be a 'intgrid' parameter defined by: [2, 500]

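The message reports that max_bins must be an 'intgrid' value in [2, 500]. Reading that as a single integer within inclusive bounds (our interpretation; DataRobot defines the exact semantics), a check can be sketched as follows, using a hypothetical helper check_intgrid:

```python
from typing import Any, List


def check_intgrid(name: str, value: Any, low: int, high: int) -> List[str]:
    """Return a list of problems with `value`; an empty list means it is valid.

    Assumption: 'intgrid [low, high]' is read as a single integer with
    low <= value <= high (grid-search lists of values are not handled here).
    """
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(value, bool) or not isinstance(value, int):
        return [f"{name} must be an integer, got {value!r}"]
    if not low <= value <= high:
        return [f"{name} = {value} is outside [{low}, {high}]"]
    return []


print(check_intgrid("max_bins", -22, 2, 500))  # ['max_bins = -22 is outside [2, 500]']
print(check_intgrid("max_bins", 22, 2, 500))   # []
```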

[39]:
binning.set_task_parameters(b=22)
[39]:
Binning of numerical variables (BINNING)

Input Summary: Missing Values Imputed (quick median) (PNI2)
Output Method: TaskOutputMethod.TRANSFORM

Task Parameters:
  max_bins (b) = 22

Confirm that it’s now valid:

[40]:
binning.validate_task_parameters()
Binning of numerical variables (BINNING)

All parameters valid!

Update an existing blueprint in your personal repository by passing its user_blueprint_id:

[41]:
blueprint_graph = keras_blueprint.save('A blueprint I made with the Python API (updated)', user_blueprint_id=user_blueprint_id)
[42]:
assert user_blueprint_id == blueprint_graph.user_blueprint_id
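
Passing user_blueprint_id makes save update the existing blueprint instead of creating a new one, which is what the assert confirms. The same create-or-update pattern, sketched against a hypothetical in-memory store (ToyRepository is not part of the Workshop, and its error handling is our own choice):

```python
import uuid
from typing import Dict, Optional


class ToyRepository:
    """Hypothetical in-memory stand-in for a personal blueprint repository."""

    def __init__(self) -> None:
        self._names: Dict[str, str] = {}

    def save(self, name: str, user_blueprint_id: Optional[str] = None) -> str:
        """Create a new entry, or overwrite an existing one when an id is given."""
        if user_blueprint_id is None:
            user_blueprint_id = uuid.uuid4().hex   # create: mint a fresh id
        elif user_blueprint_id not in self._names:
            # This toy refuses to update an id it does not know about.
            raise KeyError(f"no blueprint with id {user_blueprint_id!r}")
        self._names[user_blueprint_id] = name      # create or update in place
        return user_blueprint_id

    def get(self, user_blueprint_id: str) -> str:
        return self._names[user_blueprint_id]

    def delete(self, user_blueprint_id: str) -> None:
        del self._names[user_blueprint_id]


toy_repo = ToyRepository()
toy_id = toy_repo.save("A blueprint I made with the Python API")
updated_id = toy_repo.save("A blueprint I made with the Python API (updated)",
                           user_blueprint_id=toy_id)
assert updated_id == toy_id      # same id: updated, not duplicated
print(toy_repo.get(toy_id))      # A blueprint I made with the Python API (updated)
```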

Retrieve a Blueprint from Your Saved Blueprints

[43]:
w.get(user_blueprint_id).show()

Retrieve Blueprints from Your Personal Blueprints Repository

[44]:
for bp in w.list(limit=3):
    bp.show()

Delete a Blueprint from Your Personal Repository

[45]:
w.delete(user_blueprint_id)
Blueprints deleted.

Use the Existing Blueprints API to Retrieve Leaderboard Blueprints

[46]:
project_id = '5eb9656901f6bb026828f14e'
project = dr.Project.get(project_id)
menu = project.get_blueprints()
[47]:
for bp in menu[6:9]:
    Visualize.show_dr_blueprint(bp)

Clone a Blueprint From a Leaderboard

[48]:
ridge = menu[7]
blueprint_graph = w.clone(blueprint_id=ridge.id, project_id=project_id)
blueprint_graph.show()
[49]:
ridge.id, project_id
[49]:
('1774086bd8bfd4e1f45c5ff503a99ee2', '5eb9656901f6bb026828f14e')

Any Blueprint Can Be Used as a Tutorial

[50]:
source_code = blueprint_graph.to_source_code(to_stdout=True)
w = Workshop(user_blueprint_id='61d5db87e0f01fe2a6ce3335')

rst = w.Tasks.RST(w.TaskInputs.DATE)

pdm3 = w.Tasks.PDM3(w.TaskInputs.CAT)
pdm3.set_task_parameters(cm=500, sc=25)

gs = w.Tasks.GS(w.TaskInputs.NUM)

enetcd = w.Tasks.ENETCD(rst, pdm3, gs)
enetcd.set_task_parameters(a=0)

enetcd_blueprint = w.BlueprintGraph(enetcd, name='Ridge Regressor')

Execute It

[51]:
eval(compile(source_code, 'blueprint', 'exec'))
[52]:
enetcd_blueprint.show()
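
Because eval(compile(...)) runs the generated code in the notebook's global namespace, it rebinds w and defines rst, pdm3, gs, enetcd and enetcd_blueprint there, which is why the cell above can call enetcd_blueprint.show(). To keep those names contained, a standard-library alternative is to exec the code in an explicit dictionary. A minimal sketch; run_source is a hypothetical helper, and demo_source stands in for real to_source_code() output so the snippet runs without DataRobot:

```python
from typing import Any, Dict, Optional


def run_source(source: str, preload: Optional[Dict[str, Any]] = None,
               filename: str = "blueprint") -> Dict[str, Any]:
    """Execute `source` in a fresh namespace seeded with `preload`; return that namespace."""
    namespace: Dict[str, Any] = dict(preload or {})  # copy, so the caller's dict is untouched
    exec(compile(source, filename, "exec"), namespace)
    # exec() inserts __builtins__ into the dict; drop it so only real names remain.
    namespace.pop("__builtins__", None)
    return namespace


# Stand-in for to_source_code() output: plain Python that relies on a preloaded name.
demo_source = "alpha = 0\nlabel = name.upper()\n"
ns = run_source(demo_source, preload={"name": "ridge regressor"})
print(ns["label"])  # RIDGE REGRESSOR
```

With the walkthrough's objects this would become ns = run_source(source_code, preload={'Workshop': Workshop}) followed by ns['enetcd_blueprint'].show(). The preload is needed because the printed source above calls Workshop(...) without an import statement.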

Delete the Original (or Any) Blueprint Directly

[53]:
blueprint_graph.delete()
Blueprint deleted.

Modify the Source Code

[54]:
# w = Workshop()  # left commented out, so the w defined by the executed source code is reused

rst = w.Tasks.RST(w.TaskInputs.DATE)

# Use Numeric Data Cleansing (NDC) in place of the GS task
ndc = w.Tasks.NDC(w.TaskInputs.NUM)

pdm3 = w.Tasks.PDM3(w.TaskInputs.CAT)
pdm3.set_task_parameters(cm=500, sc=25)

enetcd = w.Tasks.ENETCD(rst, ndc, pdm3)
enetcd.set_task_parameters(a=0.0)

enetcd_blueprint = w.BlueprintGraph(enetcd, name='Ridge Regressor')
[55]:
enetcd_blueprint.show()

Add the Blueprint to a Project and Train It

[56]:
project_id = '5eb9656901f6bb026828f14e'
[57]:
enetcd_blueprint.save()
[57]:
Name: 'Ridge Regressor'

Input Data: Date | Categorical | Numeric
Tasks: Standardize | One-Hot Encoding | Numeric Data Cleansing | Elastic-Net Regressor (L1 / Least-Squares Loss)
[58]:
enetcd_blueprint.train(project_id=project_id)
Training requested! Blueprint Id: fa329535f1e5f5465e2c55024aacb910
[58]:
Name: 'Ridge Regressor'

Input Data: Date | Categorical | Numeric
Tasks: Standardize | One-Hot Encoding | Numeric Data Cleansing | Elastic-Net Regressor (L1 / Least-Squares Loss)
[59]:
enetcd_blueprint.delete()
Blueprint deleted.