histocartography.pipeline module

Pipeline utilities

Summary

Classes:

BatchPipelineRunner

PipelineRunner

PipelineStep

Base pipeline step

class PipelineStep(save_path: Union[None, str, pathlib.Path] = None, precompute: bool = True, link_path: Union[None, str, pathlib.Path] = None, precompute_path: Union[None, str, pathlib.Path] = None)

Bases: abc.ABC

Base pipeline step

__init__(save_path: Union[None, str, pathlib.Path] = None, precompute: bool = True, link_path: Union[None, str, pathlib.Path] = None, precompute_path: Union[None, str, pathlib.Path] = None) -> None

Abstract class that helps with saving and loading precomputed results

Parameters
  • save_path (Union[None, str, Path], optional) – Base path to save results to. When set to None, the results are not saved to disk. Defaults to None.

  • precompute (bool, optional) – Whether to perform the precomputation necessary for the step. Defaults to True.

  • link_path (Union[None, str, Path], optional) – Path to link the output directory to. When None, no link is created. Only supported when save_path is not None. Defaults to None.

  • precompute_path (Union[None, str, Path], optional) – Path to save the output of the precomputation to. If not specified it defaults to the output directory of the step when save_path is not None. Defaults to None.

precompute(link_path: Union[None, str, pathlib.Path] = None, precompute_path: Union[None, str, pathlib.Path] = None) -> None

Precompute all necessary information for this step

Parameters
  • link_path (Union[None, str, Path], optional) – Path to link the output to. Defaults to None.

  • precompute_path (Union[None, str, Path], optional) – Path to load/save the precomputation outputs. Defaults to None.

process(*args: Any, output_name: Optional[str] = None, **kwargs: Any) -> Any

Main processing function of the step; returns the result. Tries to save the output when output_name is passed.

Parameters

output_name (Optional[str], optional) – Unique identifier of the passed datapoint. Defaults to None.

Returns

Result of the pipeline step

Return type

Any
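The save-and-load behaviour described for PipelineStep can be illustrated with a small stand-alone sketch. It does not import histocartography and is not the library's implementation: the class names CachedStep and Doubler, the _process hook, the per-class output directory, and the pickle file layout are all assumptions made for illustration. It also assumes that a result saved under an output_name is loaded again on the next call with the same name.

```python
import pickle
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Optional, Union


class CachedStep(ABC):
    """Hypothetical stand-in for a pipeline step (not histocartography code)."""

    def __init__(self, save_path: Union[None, str, Path] = None) -> None:
        # With save_path=None nothing is ever written to disk.
        self.output_dir = None
        if save_path is not None:
            # Assumption: one sub-directory per step class.
            self.output_dir = Path(save_path) / type(self).__name__
            self.output_dir.mkdir(parents=True, exist_ok=True)

    @abstractmethod
    def _process(self, *args: Any, **kwargs: Any) -> Any:
        """Actual computation, supplied by the subclass."""

    def process(
        self, *args: Any, output_name: Optional[str] = None, **kwargs: Any
    ) -> Any:
        # Caching needs both an output directory and an identifier.
        if self.output_dir is None or output_name is None:
            return self._process(*args, **kwargs)
        cache_file = self.output_dir / f"{output_name}.pkl"
        if cache_file.exists():
            with cache_file.open("rb") as handle:
                return pickle.load(handle)
        result = self._process(*args, **kwargs)
        with cache_file.open("wb") as handle:
            pickle.dump(result, handle)
        return result


class Doubler(CachedStep):
    """Toy step that counts how often it actually computes."""

    def __init__(self, save_path: Union[None, str, Path] = None) -> None:
        super().__init__(save_path)
        self.calls = 0

    def _process(self, value: int) -> int:
        self.calls += 1
        return 2 * value


with tempfile.TemporaryDirectory() as tmp:
    step = Doubler(save_path=tmp)
    first = step.process(21, output_name="sample_a")   # computed, then saved
    second = step.process(21, output_name="sample_a")  # loaded from disk
    computed_calls = step.calls

uncached = Doubler().process(3)  # no save_path, so nothing is stored
```

Keeping the computation in a separate hook means subclasses only implement the work itself, while the base class decides when to touch the disk.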

class PipelineRunner(output_path: Optional[str] = None, inputs: Optional[Iterable[str]] = None, outputs: Optional[Iterable[str]] = None, stages: Iterable[dict] = [], save_intermediate: bool = False, precompute: bool = True)

Bases: object

__init__(output_path: Optional[str] = None, inputs: Optional[Iterable[str]] = None, outputs: Optional[Iterable[str]] = None, stages: Iterable[dict] = [], save_intermediate: bool = False, precompute: bool = True) -> None

Create a pipeline runner for a given configuration

Parameters
  • output_path (Optional[str], optional) – Path to the output and intermediate files. When set to None the runner does not save the outputs. Defaults to None.

  • inputs (Optional[Iterable[str]], optional) – Inputs to the pipeline. Defaults to None.

  • outputs (Optional[Iterable[str]], optional) – Outputs of the pipeline. Defaults to None.

  • stages (Iterable[dict], optional) – Stages to complete. Defaults to [].

  • save_intermediate (bool, optional) – Whether to save the intermediate steps. Defaults to False.

  • precompute (bool, optional) – Whether to perform the precomputation steps. Defaults to True.

precompute(save_intermediate: bool) -> None

Run the precomputation step of the pipeline.

Parameters

save_intermediate (bool) – Whether to save intermediate outputs

run(output_name: Optional[str] = None, **inputs: Dict[str, Any]) -> Dict[str, Any]

Run the preprocessing pipeline for a given name and input parameters and return the specified outputs

Parameters

output_name (Optional[str], optional) – Unique identifier of the datapoint. Defaults to None.

Returns

Output of the pipeline as defined in the configuration

Return type

Dict[str, Any]
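As a rough illustration of what running a staged pipeline with named inputs and outputs can look like, the sketch below threads a dictionary of variables through a list of stage dictionaries and returns only the requested names. The stage keys "step", "inputs" and "outputs" and the helper run_stages are hypothetical; the configuration schema that PipelineRunner actually expects is not described in this section and may differ.

```python
from typing import Any, Callable, Dict, Iterable, List


def run_stages(
    stages: Iterable[Dict[str, Any]],
    outputs: Iterable[str],
    **inputs: Any,
) -> Dict[str, Any]:
    """Thread named variables through the stages and return the requested ones."""
    variables: Dict[str, Any] = dict(inputs)
    for stage in stages:
        # Hypothetical stage layout: a callable plus input and output names.
        step: Callable[..., Any] = stage["step"]
        args = [variables[name] for name in stage["inputs"]]
        result = step(*args)
        names: List[str] = stage["outputs"]
        if len(names) == 1:
            # A single output is stored as-is rather than unpacked.
            result = (result,)
        variables.update(zip(names, result))
    return {name: variables[name] for name in outputs}


stages = [
    {"step": str.strip, "inputs": ["raw"], "outputs": ["clean"]},
    {"step": str.split, "inputs": ["clean"], "outputs": ["tokens"]},
    {"step": len, "inputs": ["tokens"], "outputs": ["count"]},
]
result = run_stages(stages, outputs=["tokens", "count"], raw="  a b c  ")
```

Here the intermediate variable clean is computed but not returned, which mirrors the distinction between intermediate results and the declared pipeline outputs.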

class BatchPipelineRunner(pipeline_config: Dict[str, Any], save_path: Optional[str], save_intermediate: bool = False)

Bases: object

__init__(pipeline_config: Dict[str, Any], save_path: Optional[str], save_intermediate: bool = False) -> None

Helper that runs the pipeline for multiple inputs, with multiprocessing support.

Parameters
  • pipeline_config (Dict[str, Any]) – Configuration of the pipeline

  • save_path (Optional[str]) – Path to save the outputs to

  • save_intermediate (bool, optional) – Whether to save intermediate outputs. Defaults to False.

Creates a symlink between the output directory of the pipeline and the provided path.

Overwrites link if it already exists.

Parameters

link_directory (str) – Path to link the output directory to
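The overwrite-on-relink behaviour can be sketched with the standard library alone. The function name link_output_directory is hypothetical, and the sketch assumes that link_directory is the location of the symlink and that it points at the output directory; the library's own method may differ. Refusing to replace a real file or directory is a choice made in this sketch.

```python
import os
import tempfile
from pathlib import Path


def link_output_directory(output_dir: Path, link_directory: Path) -> None:
    """Point link_directory at output_dir, replacing an existing symlink."""
    # is_symlink() is also true for dangling links, which exists() would miss.
    if link_directory.is_symlink():
        link_directory.unlink()
    elif link_directory.exists():
        # Sketch-level safety choice: never delete real files or directories.
        raise FileExistsError(f"{link_directory} exists and is not a symlink")
    os.symlink(output_dir, link_directory, target_is_directory=True)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first_run, second_run = root / "run_1", root / "run_2"
    first_run.mkdir()
    second_run.mkdir()
    link = root / "latest"

    link_output_directory(first_run, link)
    link_output_directory(second_run, link)  # replaces the first link
    resolved_name = link.resolve().name
    link_is_symlink = link.is_symlink()
```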

precompute() -> None

Precompute all necessary information for all stages

run(metadata: pandas.core.frame.DataFrame, cores: int = 1, return_out: bool = False) -> Optional[Dict[str, Dict[str, Any]]]

Runs the pipeline for the provided metadata dataframe and a specified number of cores for multiprocessing. Does not support saving of outputs.

Parameters
  • metadata (pd.DataFrame) – Dataframe with the columns as defined in the config inputs

  • cores (int, optional) – Number of cores to use for multiprocessing. Defaults to 1.

  • return_out (bool, optional) – Whether the method should also return the output batch data. If True, make sure you have enough memory. Only supported for single-core processing. Defaults to False.

Returns

If return_out is True, returns the processed output. Otherwise returns None.

Return type

batched_out (Optional[Dict[str, Dict[str, Any]]])
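The interplay of cores and return_out can be sketched without the library. Everything below is an assumption for illustration: process_row stands in for a full pipeline run, a plain dictionary of rows stands in for the metadata dataframe (with pandas, metadata.to_dict("index") produces a similar mapping), and raising an error when return_out is combined with several cores is this sketch's choice, not documented library behaviour.

```python
from multiprocessing import Pool
from typing import Any, Dict, Optional, Tuple


def process_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical per-datapoint pipeline: it only measures a text length."""
    return {"length": len(row["text"])}


def _process_named(item: Tuple[str, Dict[str, Any]]) -> Tuple[str, Dict[str, Any]]:
    # Module-level function so that worker processes can pickle it.
    name, row = item
    return name, process_row(row)


def run_batch(
    rows: Dict[str, Dict[str, Any]], cores: int = 1, return_out: bool = False
) -> Optional[Dict[str, Dict[str, Any]]]:
    """Process every row; collect the results only on the single-core path."""
    if cores == 1:
        collected: Dict[str, Dict[str, Any]] = {}
        for name, row in rows.items():
            output = process_row(row)
            if return_out:
                collected[name] = output
        return collected if return_out else None
    if return_out:
        # Assumption of this sketch: reject the unsupported combination.
        raise ValueError("return_out requires cores=1 in this sketch")
    with Pool(processes=cores) as pool:
        # Results are discarded here; a real pipeline would persist them.
        for _ in pool.imap_unordered(_process_named, list(rows.items())):
            pass
    return None


metadata = {"slide_a": {"text": "abc"}, "slide_b": {"text": "hello"}}
batched_out = run_batch(metadata, cores=1, return_out=True)
silent = run_batch(metadata, cores=1)
```

Collecting outputs only in the sequential branch keeps memory use predictable, which matches the warning on return_out about having enough memory.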

Reference

If you use histocartography in your projects, please cite the following:

@inproceedings{pati2021,
    title = {Hierarchical Graph Representations for Digital Pathology},
    author = {Pushpak Pati, Guillaume Jaume, Antonio Foncubierta, Florinda Feroce, Anna Maria Anniciello, Giosuè Scognamiglio, Nadia Brancati, Maryse Fiche, Estelle Dubruc, Daniel Riccio, Maurizio Di Bonito, Giuseppe De Pietro, Gerardo Botti, Jean-Philippe Thiran, Maria Frucci, Orcun Goksel, Maria Gabrani},
    booktitle = {https://arxiv.org/pdf/2102.11057},
    year = {2021}
}