Steps
Below is detailed reference documentation for working with Steps in Graphbook. You can create steps functionally, with decorators, or by extending one of the base classes below.
See also
Decorators graphbook.step(), graphbook.batch(), graphbook.source(), and graphbook.prompt() to create steps in a functional way.
- class graphbook.steps.Step(item_key=None)
The base class for steps, the executable nodes of a workflow. All other step classes should descend from this class.
- forward_note(note: Note) → str | Dict[str, List[Note]]
Routes a Note. Must return the corresponding output key, or a dictionary mapping output keys to lists of Notes. Called after on_after_item().
- Parameters:
note (Note) – The Note input
- Returns:
The output key (a string) that the note is routed to, or, when multiple notes are being processed at once, a StepOutput: a dictionary mapping output keys to lists of Notes.
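The two return forms of forward_note can be sketched as follows. To keep the example runnable without Graphbook installed, it uses a hypothetical dict-based stand-in for Note and plain classes instead of subclassing graphbook.steps.Step; the output keys ("confident", "uncertain", "labels") and the score threshold are illustrative assumptions.

```python
from typing import Dict, List, Union


class Note(dict):
    """Hypothetical stand-in for graphbook's Note: a dict of item keys to values."""


class ConfidenceRouter:
    """Routes each note to one of two hypothetical output keys by score.
    In Graphbook this would subclass graphbook.steps.Step."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def forward_note(self, note: Note) -> Union[str, Dict[str, List[Note]]]:
        # Returning a plain string sends the note to the output with that key.
        return "confident" if note["score"] >= self.threshold else "uncertain"


class LabelSplitter:
    """Returns a dictionary instead, emitting several notes at once."""

    def forward_note(self, note: Note) -> Dict[str, List[Note]]:
        # One new note per label, all placed under the hypothetical "labels" key.
        return {"labels": [Note(label=label) for label in note["labels"]]}


router = ConfidenceRouter(threshold=0.8)
print(router.forward_note(Note(score=0.9)))  # confident
print(router.forward_note(Note(score=0.3)))  # uncertain
print(LabelSplitter().forward_note(Note(labels=["cat", "dog"])))
```

Returning a string is the simple case of one note, one destination; the dictionary form is for steps that produce several notes from a single input.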
- log(message: str, type: str = 'info')
Logs a message.
- Parameters:
message (str) – The message to log
type (str) – The log type; defaults to 'info'
- on_after_item(note: Note)
Executes upon receiving a Note, after its items have been processed.
- Parameters:
note (Note) – The Note input
- on_clear()
Executes when a request to clear the step is made. This is useful for steps with internal state that needs to be reset.
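A stateful step pairs naturally with on_after_item and on_clear: state accumulates per Note and is reset on request. The sketch below is a minimal illustration, assuming a hypothetical dict-based Note stand-in and a plain class in place of a graphbook.steps.Step subclass; the counter itself is an invented example of internal state.

```python
class Note(dict):
    """Hypothetical stand-in for graphbook's Note."""


class NoteCounter:
    """Stateful sketch; in Graphbook this would subclass graphbook.steps.Step."""

    def __init__(self):
        self.seen = 0  # internal state that on_clear must reset

    def on_after_item(self, note: Note):
        # Runs for each incoming Note, after its items have been processed.
        self.seen += 1

    def on_clear(self):
        # Runs when a request to clear the step is made: drop accumulated state.
        self.seen = 0


step = NoteCounter()
for note in [Note(x=1), Note(x=2), Note(x=3)]:
    step.on_after_item(note)
print(step.seen)  # 3
step.on_clear()
print(step.seen)  # 0
```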
- class graphbook.steps.BatchStep(batch_size, item_key)
A Step for batch and parallel processing with custom function definitions. Override the load_fn and dump_fn methods with your custom logic to load and dump data, respectively.
- dump_data(note: Note, output)
Dumps data to be processed by the worker pool. This is called after the step processes the batch of items. The class must have a dump_fn method that takes in the output data and returns a string path to the dumped data.
- Parameters:
note (Note) – The Note input
item_key (str) – The item key to dump
output (Any) – The output data to dump
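A dump_fn only has to persist its input and return a string path to the dumped data. The sketch below writes JSON; the (data, path) parameter order, the static method, and the JSON format are all assumptions for illustration, and the class is not subclassed from graphbook.steps.BatchStep so that it runs standalone.

```python
import json
import os
import tempfile
from typing import Any


class JsonBatchDumper:
    """Sketch of a dump_fn for a graphbook.steps.BatchStep subclass.
    The (data, path) signature is hypothetical."""

    @staticmethod
    def dump_fn(data: Any, path: str) -> str:
        # Create the parent directory if needed, write the data as JSON,
        # and return the path as a string, as dump_data expects.
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(data, f)
        return path


out_path = JsonBatchDumper.dump_fn({"label": "cat"}, os.path.join(tempfile.mkdtemp(), "0001.json"))
print(out_path)
```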
- in_q(note: Note | None)
Enqueues a note to be processed by the step.
- Parameters:
note (Note) – The Note input
- static load_fn(**args)
The load function to be overridden by BatchSteps; it forwards preprocessed data to on_item_batch.
- on_clear()
Executes when a request to clear the step is made. This is useful for steps with internal state that needs to be reset.
- on_item_batch(outputs, items, notes) → List[Tuple[Any]] | None
Called when B items have been loaded and are ready to be processed, where B is batch_size. Meant to be overridden by subclasses.
- Parameters:
outputs (List[Any]) – The list of loaded outputs of length B
items (List[Any]) – The list of items of length B associated with outputs, in the same order as outputs along the batch dimension
notes (List[Note]) – The list of Notes of length B associated with outputs, in the same order as outputs along the batch dimension
- Returns:
The output data to be dumped, as a list of parameter tuples, each passed to one dump_fn call. If None is returned, nothing is dumped.
- Return type:
List[Tuple[Any]] | None
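The load_fn → on_item_batch → dump_fn flow can be simulated sequentially to show how outputs, items, and notes stay aligned. This is a simplified sketch, not Graphbook's scheduler: run_batches is a hypothetical driver that calls load_fn in a loop (Graphbook uses a worker pool), it assumes each note stores a list of items under item_key, it flushes a final partial batch, and the (data, note id) dump arguments are invented.

```python
from typing import Any, List, Optional, Tuple


class Note(dict):
    """Hypothetical stand-in for graphbook's Note."""


class StripAndUpper:
    """Sketch of BatchStep overrides; would subclass graphbook.steps.BatchStep."""

    @staticmethod
    def load_fn(item: str) -> str:
        # Preprocess a single item.
        return item.strip()

    def on_item_batch(self, outputs: List[str], items: List[str], notes: List[Note]) -> Optional[List[Tuple[Any, ...]]]:
        if not outputs:
            return None  # nothing to dump
        # outputs and notes share the batch order; items is unused here.
        # Each tuple holds hypothetical dump_fn arguments: (data, note id).
        return [(out.upper(), note["id"]) for out, note in zip(outputs, notes)]


def run_batches(step, notes: List[Note], item_key: str, batch_size: int) -> List[Tuple[Any, ...]]:
    """Hypothetical sequential driver collecting the dump_fn argument tuples."""
    # Flatten to (item, note) pairs so a note appears once per item.
    pairs = [(item, note) for note in notes for item in note[item_key]]
    dump_args: List[Tuple[Any, ...]] = []
    for start in range(0, len(pairs), batch_size):
        chunk = pairs[start:start + batch_size]
        items = [item for item, _ in chunk]
        batch_notes = [note for _, note in chunk]
        outputs = [step.load_fn(item) for item in items]
        result = step.on_item_batch(outputs, items, batch_notes)
        if result is not None:
            dump_args.extend(result)
    return dump_args


notes = [Note(id="a", text=[" x ", "y"]), Note(id="b", text=["z "])]
print(run_batches(StripAndUpper(), notes, item_key="text", batch_size=2))
# [('X', 'a'), ('Y', 'a'), ('Z', 'b')]
```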
- class graphbook.steps.PromptStep
A Step that can prompt the user for input. This is useful for interactive workflows where data labeling, model evaluation, or any other human input is required. Once the prompt is handled, the Step's execution lifecycle proceeds normally.
- get_prompt(note: Note) → dict
Returns the prompt to be displayed to the user. This method can be overridden by the subclass. By default, it returns a boolean prompt. If None is returned, the prompt is skipped for this note. A list of available prompts can be found in graphbook.prompts.
- Parameters:
note (Note) – The Note input to display to the user
See also
graphbook.prompts for available user prompts.
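A get_prompt override typically decides per note whether to ask at all, returning None to skip. The sketch assumes a hypothetical Note stand-in and a hypothetical prompt builder, hypothetical_bool_prompt, whose dict keys are invented; in real code you would use a helper from graphbook.prompts and subclass graphbook.steps.PromptStep. The 0.9 score cutoff is likewise illustrative.

```python
from typing import Optional


class Note(dict):
    """Hypothetical stand-in for graphbook's Note."""


def hypothetical_bool_prompt(note: Note, msg: str) -> dict:
    # Stand-in for a graphbook.prompts helper; the dict keys are assumptions.
    return {"type": "bool", "msg": msg, "note": note}


class ReviewLowConfidence:
    """Sketch of a get_prompt override; would subclass graphbook.steps.PromptStep."""

    def get_prompt(self, note: Note) -> Optional[dict]:
        # High-confidence notes need no human review: None skips the prompt.
        if note["score"] >= 0.9:
            return None
        # Otherwise ask a yes/no question about the predicted label.
        return hypothetical_bool_prompt(note, msg=f"Is '{note['label']}' correct?")


step = ReviewLowConfidence()
print(step.get_prompt(Note(score=0.95, label="cat")))  # None
print(step.get_prompt(Note(score=0.40, label="cat"))["msg"])  # Is 'cat' correct?
```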
- class graphbook.steps.SourceStep
A Step that accepts no input but produces outputs. This class attempts to load all data at once, so GeneratorSourceStep is recommended, especially for large datasets.
- load() → Dict[str, List[Note]]
Loads data and converts it into Notes. Must return a dictionary mapping output keys to lists of Notes.
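A SourceStep.load returns every Note in a single dictionary. The sketch parses CSV text with the standard csv module, one Note per row; the inline CSV (instead of a file), the "out" output key, and the Note stand-in are assumptions, and the class is not subclassed from graphbook.steps.SourceStep so it runs standalone.

```python
import csv
import io
from typing import Dict, List


class Note(dict):
    """Hypothetical stand-in for graphbook's Note."""


class CsvSource:
    """Sketch of SourceStep.load; would subclass graphbook.steps.SourceStep."""

    def __init__(self, csv_text: str):
        self.csv_text = csv_text

    def load(self) -> Dict[str, List[Note]]:
        # Every row is materialized at once, which is why GeneratorSourceStep
        # is preferable for large datasets.
        reader = csv.DictReader(io.StringIO(self.csv_text))
        return {"out": [Note(row) for row in reader]}


result = CsvSource("path,label\na.png,cat\nb.png,dog\n").load()
print(len(result["out"]))  # 2
```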
- class graphbook.steps.GeneratorSourceStep
A Step that accepts no input but produces outputs.
- load() → Generator[Dict[str, List[Note]], None, None]
Loads data and converts it into Notes. Must return a generator that yields dictionaries mapping output keys to lists of Notes.
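Because GeneratorSourceStep.load yields incrementally, only one chunk of Notes needs to be in memory at a time. The sketch reads lines lazily from a text stream and yields them in chunks; the chunk size, the "out" key, and the Note stand-in are illustrative assumptions, and the class is not subclassed from graphbook.steps.GeneratorSourceStep so it runs standalone.

```python
import io
from typing import Dict, Generator, List, TextIO


class Note(dict):
    """Hypothetical stand-in for graphbook's Note."""


class LineSource:
    """Sketch of GeneratorSourceStep.load; would subclass graphbook.steps.GeneratorSourceStep."""

    def __init__(self, stream: TextIO, chunk_size: int = 2):
        self.stream = stream
        self.chunk_size = chunk_size

    def load(self) -> Generator[Dict[str, List[Note]], None, None]:
        chunk: List[Note] = []
        # Iterating the stream reads one line at a time instead of the whole file.
        for line in self.stream:
            chunk.append(Note(text=line.rstrip("\n")))
            if len(chunk) == self.chunk_size:
                yield {"out": chunk}
                chunk = []
        if chunk:  # flush the final partial chunk
            yield {"out": chunk}


batches = list(LineSource(io.StringIO("a\nb\nc\n"), chunk_size=2).load())
print(len(batches))  # 2
```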