histocartography.pipeline module

Pipeline utilities

Summary

Classes:

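`BatchPipelineRunner`	Helper that runs the pipeline for multiple inputs with multiprocessing support
`PipelineRunner`	Pipeline runner for a given configuration
`PipelineStep`	Base pipeline step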

class PipelineStep(save_path: Union[None, str, pathlib.Path] = None, precompute: bool = True, link_path: Union[None, str, pathlib.Path] = None, precompute_path: Union[None, str, pathlib.Path] = None)

Bases: abc.ABC

Base pipeline step

__init__(save_path: Union[None, str, pathlib.Path] = None, precompute: bool = True, link_path: Union[None, str, pathlib.Path] = None, precompute_path: Union[None, str, pathlib.Path] = None) → None

Abstract class that helps with saving and loading precomputed results

Parameters

save_path (Union[None, str, Path], optional) – Base path to save results to. When set to None, the results are not saved to disk. Defaults to None.
precompute (bool, optional) – Whether to perform the precomputation necessary for the step. Defaults to True.
link_path (Union[None, str, Path], optional) – Path to link the output directory to. When None, no link is created. Only supported when save_path is not None. Defaults to None.
precompute_path (Union[None, str, Path], optional) – Path to save the output of the precomputation to. If not specified it defaults to the output directory of the step when save_path is not None. Defaults to None.

precompute(link_path: Union[None, str, pathlib.Path] = None, precompute_path: Union[None, str, pathlib.Path] = None) → None

Precompute all necessary information for this step

Parameters

link_path (Union[None, str, Path], optional) – Path to link the output to. Defaults to None.
precompute_path (Union[None, str, Path], optional) – Path to load/save the precomputation outputs. Defaults to None.

process(*args: Any, output_name: Optional[str] = None, **kwargs: Any) → Any

Main processing function of the step; returns the result. Tries to save the output when output_name is passed.

Parameters: output_name (Optional[str], optional) – Unique identifier of the passed datapoint. Defaults to None.
Returns: Result of the pipeline step
Return type: Any
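The save-on-`output_name` behaviour described above can be illustrated with a small stand-in class. This is a minimal sketch, not histocartography's implementation: `CachedStep`, `_compute`, the pickle file format, and the decision to reload an existing file are all assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Optional, Union


class CachedStep:
    """Hypothetical stand-in, not histocartography's PipelineStep."""

    def __init__(self, save_path: Union[None, str, Path] = None) -> None:
        self.save_path = None if save_path is None else Path(save_path)
        self.calls = 0  # counts how often the computation actually runs

    def _compute(self, value: int) -> int:
        # Placeholder for an expensive computation (assumed helper name).
        self.calls += 1
        return value * value

    def process(self, value: int, output_name: Optional[str] = None) -> Any:
        # Without a save path or an identifier, just compute and return.
        if self.save_path is None or output_name is None:
            return self._compute(value)
        self.save_path.mkdir(parents=True, exist_ok=True)
        target = self.save_path / f"{output_name}.pkl"
        if target.exists():
            # Assumption: reuse the result already stored for this datapoint.
            with target.open("rb") as handle:
                return pickle.load(handle)
        result = self._compute(value)
        with target.open("wb") as handle:
            pickle.dump(result, handle)
        return result


with tempfile.TemporaryDirectory() as tmp:
    step = CachedStep(save_path=tmp)
    first = step.process(4, output_name="sample_a")
    second = step.process(4, output_name="sample_a")  # read back from disk
    unsaved = step.process(3)  # no output_name, so nothing is written
    calls = step.calls
    saved_files = sorted(p.name for p in Path(tmp).iterdir())
```

In this sketch the second call with the same `output_name` does not recompute, which is why `output_name` needs to be unique per datapoint.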

class PipelineRunner(output_path: Optional[str] = None, inputs: Optional[Iterable[str]] = None, outputs: Optional[Iterable[str]] = None, stages: Iterable[dict] = [], save_intermediate: bool = False, precompute: bool = True)

Bases: object

__init__(output_path: Optional[str] = None, inputs: Optional[Iterable[str]] = None, outputs: Optional[Iterable[str]] = None, stages: Iterable[dict] = [], save_intermediate: bool = False, precompute: bool = True) → None

Create a pipeline runner for a given configuration

Parameters

output_path (Optional[str], optional) – Path to the output and intermediate files. When set to None the runner does not save the outputs. Defaults to None.
inputs (Optional[Iterable[str]], optional) – Inputs to the pipeline. Defaults to None.
outputs (Optional[Iterable[str]], optional) – Outputs of the pipeline. Defaults to None.
stages (Iterable[dict], optional) – Stages to complete. Defaults to [].
save_intermediate (bool, optional) – Whether to save the intermediate steps. Defaults to False.
precompute (bool, optional) – Whether to perform the precomputation steps. Defaults to True.

precompute(save_intermediate: bool) → None

Run the precomputation step of the pipeline.

Parameters: save_intermediate (bool) – Whether to save intermediate outputs

run(output_name: Optional[str] = None, **inputs: Dict[str, Any]) → Dict[str, Any]

Run the preprocessing pipeline for a given name and input parameters and return the specified outputs

Parameters: output_name (Optional[str], optional) – Unique identifier of the datapoint. Defaults to None.
Returns: Output of the pipeline as defined in the configuration
Return type: Dict[str, Any]
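The relationship between named inputs, stages, and named outputs can be sketched with a toy runner. The stage dictionary layout used here (`function`, `inputs`, `output`) and the `run_stages` helper are hypothetical and chosen only for illustration; the stage format that `PipelineRunner` expects is not described in this section.

```python
from typing import Any, Dict, Iterable, List


def run_stages(
    stages: Iterable[Dict[str, Any]],
    outputs: Iterable[str],
    **inputs: Any,
) -> Dict[str, Any]:
    """Hypothetical runner: each stage reads and writes named variables."""
    variables: Dict[str, Any] = dict(inputs)
    for stage in stages:
        # Look up the stage's arguments by name among the known variables.
        args = [variables[name] for name in stage["inputs"]]
        variables[stage["output"]] = stage["function"](*args)
    # Return only the variables requested as pipeline outputs.
    return {name: variables[name] for name in outputs}


# Assumed stage layout, for illustration only.
stages: List[Dict[str, Any]] = [
    {"function": str.strip, "inputs": ["raw"], "output": "clean"},
    {"function": str.upper, "inputs": ["clean"], "output": "shouted"},
    {"function": len, "inputs": ["clean"], "output": "length"},
]

result = run_stages(stages, outputs=["shouted", "length"], raw="  tissue  ")
```

Intermediate variables such as `clean` stay internal to the run; only the names listed in `outputs` appear in the returned dictionary.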

class BatchPipelineRunner(pipeline_config: Dict[str, Any], save_path: Optional[str], save_intermediate: bool = False)

Bases: object

__init__(pipeline_config: Dict[str, Any], save_path: Optional[str], save_intermediate: bool = False) → None

Helper that runs the pipeline for multiple inputs with multiprocessing support

Parameters

pipeline_config (Dict[str, Any]) – Configuration of the pipeline
save_path (Optional[str]) – Path to save the outputs to
save_intermediate (bool, optional) – Whether to save intermediate outputs. Defaults to False.

link_output(link_directory: str) → None

Creates a symlink between the output directory of the pipeline and the provided path. Overwrites the link if it already exists.

Parameters: link_directory (str) – Path to link the output directory to
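Replacing an existing symlink, as described for `link_output`, can be done with the standard library alone. The sketch below is an assumption about one reasonable way to do it, not the library's code; `link_directory` (the function) and the directory names are hypothetical.

```python
import tempfile
from pathlib import Path


def link_directory(output_dir: Path, link_path: Path) -> None:
    """Point link_path at output_dir, replacing an existing symlink."""
    if link_path.is_symlink():
        # Remove only the link itself, never the directory it points to.
        link_path.unlink()
    link_path.symlink_to(output_dir.resolve(), target_is_directory=True)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first_run = root / "run_1"
    second_run = root / "run_2"
    first_run.mkdir()
    second_run.mkdir()
    link = root / "latest"

    link_directory(first_run, link)
    link_directory(second_run, link)  # replaces the earlier link
    is_link = link.is_symlink()
    points_to_second = link.resolve() == second_run.resolve()
    first_still_exists = first_run.is_dir()
```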

precompute() → None

Precompute all necessary information for all stages

run(metadata: pandas.core.frame.DataFrame, cores: int = 1, return_out: bool = False) → Optional[Dict[str, Dict[str, Any]]]

Runs the pipeline for the provided metadata dataframe and a specified number of cores for multiprocessing. Does not support saving of outputs

Parameters

metadata (pd.DataFrame) – Dataframe with the columns as defined in the config inputs
cores (int, optional) – Number of cores to use for multiprocessing. Defaults to 1.
return_out (bool, optional) – Whether the method should also return the output batch data. If True, make sure you have enough memory. Only supported for single-core processing. Defaults to False.

Returns

If return_out is True, returns the processed output. Otherwise returns None

Return type

batched_out (Optional[Dict[str, Dict[str, Any]]])
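The shape of the returned value, a dictionary keyed by datapoint name whose values are per-datapoint output dictionaries, can be sketched with a single-core loop. Here a list of dictionaries stands in for the rows of the metadata dataframe, and the `name` key, `run_batch`, and `measure` are hypothetical names introduced only for this illustration.

```python
from typing import Any, Callable, Dict, List, Optional


def run_batch(
    rows: List[Dict[str, Any]],
    run_one: Callable[..., Dict[str, Any]],
    return_out: bool = False,
) -> Optional[Dict[str, Dict[str, Any]]]:
    """Hypothetical single-core loop over metadata rows."""
    collected: Dict[str, Dict[str, Any]] = {}
    for row in rows:
        name = row["name"]  # assumed identifier column
        inputs = {key: value for key, value in row.items() if key != "name"}
        output = run_one(output_name=name, **inputs)
        if return_out:
            # Keeping every output in memory is why return_out can be costly.
            collected[name] = output
    return collected if return_out else None


def measure(output_name: str, text: str) -> Dict[str, Any]:
    # Placeholder for running one pipeline on one datapoint.
    return {"length": len(text)}


rows = [
    {"name": "slide_1", "text": "abc"},
    {"name": "slide_2", "text": "hello"},
]
batched = run_batch(rows, measure, return_out=True)
nothing = run_batch(rows, measure)
```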

Reference

If you use histocartography in your projects, please cite the following:

@inproceedings{pati2021,
    title = {Hierarchical Graph Representations for Digital Pathology},
    author = {Pushpak Pati and Guillaume Jaume and Antonio Foncubierta and Florinda Feroce and Anna Maria Anniciello and Giosuè Scognamiglio and Nadia Brancati and Maryse Fiche and Estelle Dubruc and Daniel Riccio and Maurizio Di Bonito and Giuseppe De Pietro and Gerardo Botti and Jean-Philippe Thiran and Maria Frucci and Orcun Goksel and Maria Gabrani},
    booktitle = {https://arxiv.org/pdf/2102.11057},
    year = {2021}
}