Steps

This page contains detailed reference documentation for working with Steps in Graphbook. You can create steps with decorators and functions, or by extending any of the following base classes.

See also

The decorators graphbook.step(), graphbook.batch(), graphbook.source(), and graphbook.prompt(), which create steps in a functional way.

class graphbook.steps.Step(item_key=None)

The base class for an executable workflow node, called a step. All other step classes should descend from this class.

forward_note(note: Note) str | Dict[str, List[Note]]

Routes a Note. Must return the corresponding output key or a dictionary that contains Notes. Is called after on_after_item().

Parameters:

note (Note) – The Note input

Returns:

The output key (a string) that the note is routed to, or, if multiple notes are being processed at a time, a StepOutput.
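As an illustration of the routing contract described above, here is a minimal sketch of a subclass that returns an output key from forward_note(). The Note and Step classes below are hypothetical stand-ins written for this example so that it runs without Graphbook installed; they are not the library's implementation. The "score" field, the "high" and "low" slot names, and the default "out" key are likewise assumptions.

```python
from typing import Any, Dict


class Note:
    """Hypothetical stand-in for a Note: a thin wrapper around a dict."""

    def __init__(self, items: Dict[str, Any]):
        self.items = items

    def __getitem__(self, key: str) -> Any:
        return self.items[key]


class Step:
    """Hypothetical stand-in exposing only the hook used in this sketch."""

    def forward_note(self, note: Note) -> str:
        # Assumed default: send every Note to a single slot named "out".
        return "out"


class ThresholdRouter(Step):
    """Routes each Note to an output slot named "high" or "low"."""

    def __init__(self, threshold: float):
        self.threshold = threshold

    def forward_note(self, note: Note) -> str:
        # Return the output key that this Note should be sent to.
        return "high" if note["score"] >= self.threshold else "low"


router = ThresholdRouter(threshold=0.5)
routes = [router.forward_note(Note({"score": s})) for s in (0.9, 0.1)]
```

In a real workflow the returned key would need to match a slot that a child step is bound to (see set_child()).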

log(message: str, type: str = 'info')

Logs a message

Parameters:
  • message (str) – message to log

  • type (str) – type of log

on_after_item(note: Note)

Executes upon receiving a Note and after processing items

Parameters:

note (Note) – The Note input

on_clear()

Executes when a request to clear the step is made. This is useful for steps that have internal states that need to be reset.
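To make the "internal state" case concrete, the following sketch shows a step-shaped class that counts the Notes it has seen and resets that counter in on_clear(). RunningCount is a hypothetical, standalone class for illustration; it does not import or subclass anything from Graphbook.

```python
class RunningCount:
    """Hypothetical step-like class that keeps a counter as internal state."""

    def __init__(self):
        self.count = 0

    def on_note(self, note):
        # Internal state grows as Notes arrive.
        self.count += 1

    def on_clear(self):
        # Reset the state so the next run starts from zero.
        self.count = 0


counter = RunningCount()
for note in ({"id": 1}, {"id": 2}, {"id": 3}):
    counter.on_note(note)
count_before_clear = counter.count
counter.on_clear()
count_after_clear = counter.count
```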

on_end()

Executes upon end of graph execution

on_item(item: Any, note: Note)

Executes upon receiving an item. Is called after on_note() and before on_after_item().

Parameters:
  • item (Any) – The item to process

  • note (Note) – The Note that the item belongs to
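The call order stated in this reference (on_note(), then on_item() for each item, then on_after_item(), then forward_note()) can be sketched with a small driver. Everything below is a hypothetical stand-in: the Note and Step classes, the run_once() driver, and the assumption that item_key names a list of items inside the Note are made up for illustration and may differ from how Graphbook actually dispatches items.

```python
class Note:
    """Hypothetical stand-in for a Note holding a dict of items."""

    def __init__(self, items):
        self.items = items


class Step:
    """Hypothetical stand-in with the documented hooks as no-ops."""

    def __init__(self, item_key=None):
        self.item_key = item_key

    def on_note(self, note):
        pass

    def on_item(self, item, note):
        pass

    def on_after_item(self, note):
        pass

    def forward_note(self, note):
        return "out"


def run_once(step, note):
    """Hypothetical driver that calls the hooks in the documented order."""
    step.on_note(note)
    if step.item_key is not None:
        # Assumption: item_key selects a list of items stored in the Note.
        for item in note.items.get(step.item_key, []):
            step.on_item(item, note)
    step.on_after_item(note)
    return step.forward_note(note)


class Recorder(Step):
    """Records each hook call so the order can be inspected."""

    def __init__(self, item_key=None):
        super().__init__(item_key)
        self.calls = []

    def on_note(self, note):
        self.calls.append("on_note")

    def on_item(self, item, note):
        self.calls.append(f"on_item:{item}")

    def on_after_item(self, note):
        self.calls.append("on_after_item")

    def forward_note(self, note):
        self.calls.append("forward_note")
        return "out"


recorder = Recorder(item_key="images")
output_key = run_once(recorder, Note({"images": ["a.png", "b.png"]}))
```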

on_note(note: Note)

Executes upon receiving a Note

Parameters:

note (Note) – The Note input

on_start()

Executes upon start of graph execution

remove_children()

Removes all children steps

set_child(child: Step, slot_name: str = 'out')

Sets a child step

Parameters:
  • child (Step) – child step

  • slot_name (str) – slot to bind the child to

class graphbook.steps.BatchStep(batch_size, item_key)

A Step used for batch processing. By default, this step consumes PyTorch tensor batches loaded by the worker pool.

dump_data(note: Note, output)

Dumps data to be processed by the worker pool. This is called after the step processes the batch of items. The class must have a dump_fn method that takes in the output data and returns a string path to the dumped data.

Parameters:
  • note (Note) – The Note input

  • item_key (str) – The item key to dump

  • output (Any) – The output data to dump

static dump_fn(**args)

The dump function to be overridden by BatchSteps that write outputs to disk.

in_q(note: Note | None)

Enqueue a note to be processed by the step

Parameters:

note (Note) – The Note input

static load_fn(**args)

The load function to be overridden by BatchSteps that will forward preprocessed data to on_item_batch.

on_clear()

Executes when a request to clear the step is made. This is useful for steps that have internal states that need to be reset.

on_item_batch(outputs, items, notes) List[Tuple[Any]] | None

Called when B items are loaded and ready to be processed, where B is batch_size. This is meant to be overridden by subclasses.

Parameters:
  • outputs (List[Any]) – The list of loaded outputs of length B

  • items (List[Any]) – The list of items of length B associated with outputs. This list has the same order as outputs does along the batch dimension

  • notes (List[Note]) – The list of Notes of length B associated with outputs. This list has the same order as outputs does along the batch dimension

Returns:

The output data to be dumped as a list of parameters to be passed to dump_fn. If None is returned, nothing will be dumped.

Return type:

List[Tuple[Any]] | None
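The relationship between load_fn, the three parallel lists, and on_item_batch can be sketched as follows. This is a standalone illustration rather than Graphbook's worker pool: LengthBatch and run_batches() are hypothetical, Notes are modelled as plain dicts, and passing each item to load_fn as a keyword argument named text is an assumption. The sketch returns None from on_item_batch, so nothing is dumped.

```python
from typing import Any, Dict, List


class LengthBatch:
    """Hypothetical class shaped like a BatchStep subclass."""

    batch_size = 2

    def __init__(self):
        self.seen_batches: List[List[int]] = []

    @staticmethod
    def load_fn(**args) -> str:
        # Preprocess a single item before it is batched.
        return args["text"].strip()

    def on_item_batch(self, outputs, items, notes):
        # outputs, items, and notes are aligned along the batch dimension.
        lengths = [len(output) for output in outputs]
        for note, length in zip(notes, lengths):
            note["length"] = length
        self.seen_batches.append(lengths)
        return None  # Returning None means nothing is dumped.


def run_batches(step, notes: List[Dict[str, Any]], item_key: str) -> None:
    """Hypothetical driver: group notes into batches of step.batch_size."""
    for start in range(0, len(notes), step.batch_size):
        chunk = notes[start:start + step.batch_size]
        items = [note[item_key] for note in chunk]
        outputs = [step.load_fn(text=item) for item in items]
        step.on_item_batch(outputs, items, chunk)


step = LengthBatch()
notes = [{"text": " a "}, {"text": "bb"}, {"text": " ccc"}, {"text": "dddd "}]
run_batches(step, notes, item_key="text")
```

With four notes and a batch_size of 2, on_item_batch is called twice, each time with lists of length 2.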

class graphbook.steps.PromptStep

A Step that is capable of prompting the user for input. This is useful for interactive workflows where data labeling, model evaluation, or any other human input is required. Once the prompt is handled, the execution lifecycle of the Step will proceed normally.

get_prompt(note: Note) dict

Returns the prompt to be displayed to the user. This method can be overridden by the subclass. By default, it returns a boolean prompt. If None is returned, the prompt is skipped for this note. A list of available prompts can be found in graphbook.prompts.

Parameters:

note (Note) – The Note input to display to the user

on_clear()

Clears any awaiting prompts and the prompt queue. If you plan on overriding this method, make sure to call super().on_clear() to ensure the prompt queue is cleared.

on_prompt_response(note: Note, response: Any)

Called when the user responds to the prompt. This method must be overridden by the subclass.

Parameters:
  • note (Note) – The Note input that was prompted

  • response (Any) – The user’s response
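The interplay between get_prompt() and on_prompt_response() might look like the sketch below. It is a standalone illustration: LabelReview and review() are hypothetical, Notes are modelled as plain dicts, and the prompt dictionary's "type" and "msg" keys are made up for this example rather than taken from graphbook.prompts. User answers are simulated with a list.

```python
class LabelReview:
    """Hypothetical class shaped like a PromptStep subclass."""

    def get_prompt(self, note):
        # Returning None skips the prompt for Notes that are already labeled.
        if note.get("label") is not None:
            return None
        # Hypothetical prompt shape; real prompts come from graphbook.prompts.
        return {"type": "bool", "msg": f"Keep {note['path']}?"}

    def on_prompt_response(self, note, response):
        # Store the user's answer on the Note that was prompted.
        note["label"] = bool(response)


def review(step, notes, answers):
    """Hypothetical driver that feeds simulated answers to the step."""
    answers = iter(answers)
    prompted = 0
    for note in notes:
        prompt = step.get_prompt(note)
        if prompt is None:
            continue
        step.on_prompt_response(note, next(answers))
        prompted += 1
    return prompted


notes = [
    {"path": "a.png", "label": None},
    {"path": "b.png", "label": True},
]
prompted = review(LabelReview(), notes, answers=[False])
```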

See also

graphbook.prompts for available user prompts.

class graphbook.steps.SourceStep

A Step that accepts no input but produces outputs. This class will attempt to load all data at once, so GeneratorSourceStep is recommended instead, especially for large datasets.

load() Dict[str, List[Note]]

Function to load data and convert into Notes. Must output a dictionary of Notes.

class graphbook.steps.GeneratorSourceStep

A Step that accepts no input but produces outputs.

load() Generator[Dict[str, List[Note]], None, None]

Function to load data and convert into Notes. Must output a generator that yields a dictionary of Notes.
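The difference between the two load() return shapes can be sketched with plain functions. Both functions below are hypothetical and model Notes as plain dicts; the "out" key, the "path" field, and the chunk_size parameter are assumptions for illustration. The eager version builds one dictionary containing every Note, while the generator version yields one small dictionary at a time.

```python
from typing import Any, Dict, Generator, List

NoteLike = Dict[str, Any]  # Hypothetical stand-in for a Note.


def load_all(paths: List[str]) -> Dict[str, List[NoteLike]]:
    """Eager loading: every Note is created before anything is returned."""
    return {"out": [{"path": path} for path in paths]}


def load_lazily(
    paths: List[str], chunk_size: int = 2
) -> Generator[Dict[str, List[NoteLike]], None, None]:
    """Lazy loading: yield one dictionary of Notes per chunk of paths."""
    for start in range(0, len(paths), chunk_size):
        chunk = paths[start:start + chunk_size]
        yield {"out": [{"path": path} for path in chunk]}


paths = ["a.png", "b.png", "c.png"]
eager = load_all(paths)
chunks = list(load_lazily(paths))
```

Because the generator only materializes chunk_size Notes per yield, downstream steps can begin work before the whole dataset has been read.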

on_clear()

Executes when a request to clear the step is made. This is useful for steps that have internal states that need to be reset.