histocartography.pipeline module
Pipeline utilities
Summary
Classes:
PipelineStep – Base pipeline step
- class PipelineStep(save_path: Union[None, str, pathlib.Path] = None, precompute: bool = True, link_path: Union[None, str, pathlib.Path] = None, precompute_path: Union[None, str, pathlib.Path] = None)
Bases: abc.ABC
Base pipeline step.
- __init__(save_path: Union[None, str, pathlib.Path] = None, precompute: bool = True, link_path: Union[None, str, pathlib.Path] = None, precompute_path: Union[None, str, pathlib.Path] = None) → None
Abstract class that helps with saving and loading precomputed results.
- Parameters
save_path (Union[None, str, Path], optional) – Base path to save results to. When set to None, the results are not saved to disk. Defaults to None.
precompute (bool, optional) – Whether to perform the precomputation necessary for the step. Defaults to True.
link_path (Union[None, str, Path], optional) – Path to link the output directory to. When None, no link is created. Only supported when save_path is not None. Defaults to None.
precompute_path (Union[None, str, Path], optional) – Path to save the output of the precomputation to. If not specified, it defaults to the output directory of the step when save_path is not None. Defaults to None.
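The constructor rules above can be sketched in plain Python. An output directory exists only when save_path is given, link_path is rejected without save_path, and precompute_path falls back to the step's output directory. This is a hypothetical, simplified stand-in and not histocartography's implementation. The names MiniStep and Doubler, the per-class subdirectory, and the `_process` hook are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Optional, Union

PathLike = Union[None, str, Path]


class MiniStep(ABC):
    """Hypothetical, simplified stand-in for PipelineStep (not the library code)."""

    def __init__(
        self,
        save_path: PathLike = None,
        precompute: bool = True,
        link_path: PathLike = None,
        precompute_path: PathLike = None,
    ) -> None:
        # link_path only makes sense when results are written to disk.
        if link_path is not None and save_path is None:
            raise ValueError("link_path is only supported when save_path is not None")
        self.output_dir: Optional[Path] = None
        if save_path is not None:
            # Assumption: each step writes into a subdirectory named after its class.
            self.output_dir = Path(save_path) / type(self).__name__
            self.output_dir.mkdir(parents=True, exist_ok=True)
            if precompute_path is None:
                precompute_path = self.output_dir
        self.precompute_path = None if precompute_path is None else Path(precompute_path)
        if precompute:
            self.precompute(link_path=link_path, precompute_path=self.precompute_path)

    def precompute(self, link_path: PathLike = None, precompute_path: PathLike = None) -> None:
        """Default hook: this step needs no precomputation."""

    @abstractmethod
    def _process(self, *args: Any, **kwargs: Any) -> Any:
        """Hypothetical hook that performs the actual computation."""

    def process(self, *args: Any, output_name: Optional[str] = None, **kwargs: Any) -> Any:
        # Saving under output_name is omitted to keep the sketch short.
        return self._process(*args, **kwargs)


class Doubler(MiniStep):
    """Example concrete step."""

    def _process(self, x: int) -> int:
        return 2 * x
```

With save_path set, `Doubler(save_path="out").output_dir` is `out/Doubler` in this sketch. Without it, nothing is written and `output_dir` stays None.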
- precompute(link_path: Union[None, str, pathlib.Path] = None, precompute_path: Union[None, str, pathlib.Path] = None) → None
Precompute all necessary information for this step.
- Parameters
link_path (Union[None, str, Path], optional) – Path to link the output to. Defaults to None.
precompute_path (Union[None, str, Path], optional) – Path to load/save the precomputation outputs. Defaults to None.
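A step that needs expensive one-off preparation would override precompute and load or save its result under precompute_path. The sketch below is hypothetical and is written without histocartography so that it runs on its own. MeanCenteringStep, the JSON cache file name, and the "subtract a reference mean" task are all assumptions. Only the precompute(link_path, precompute_path) signature follows the documentation above.

```python
import json
from pathlib import Path
from typing import List, Optional, Union

PathLike = Union[None, str, Path]


class MeanCenteringStep:
    """Hypothetical step: subtracts a reference mean that is precomputed once."""

    CACHE_NAME = "reference_mean.json"  # assumed file name

    def __init__(self, reference: List[float], precompute_path: PathLike = None) -> None:
        self.reference = reference
        self.mean: Optional[float] = None
        self.precompute(precompute_path=precompute_path)

    def precompute(self, link_path: PathLike = None, precompute_path: PathLike = None) -> None:
        # link_path is accepted for signature parity but unused in this sketch.
        cache = None if precompute_path is None else Path(precompute_path) / self.CACHE_NAME
        if cache is not None and cache.exists():
            # Reuse the stored result instead of recomputing it.
            self.mean = json.loads(cache.read_text())["mean"]
            return
        self.mean = sum(self.reference) / len(self.reference)
        if cache is not None:
            cache.parent.mkdir(parents=True, exist_ok=True)
            cache.write_text(json.dumps({"mean": self.mean}))

    def process(self, values: List[float]) -> List[float]:
        assert self.mean is not None, "precompute() must run first"
        return [v - self.mean for v in values]
```

The cached value is reused whenever the file exists. A second instance built from different reference data therefore still picks up the first mean, so a real step would need to key or invalidate its cache accordingly.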
- process(*args: Any, output_name: Optional[str] = None, **kwargs: Any) → Any
Main processing function of the step; returns the result. Tries to save the output when output_name is passed.
- Parameters
output_name (Optional[str], optional) – Unique identifier of the passed datapoint. Defaults to None.
- Returns
Result of the pipeline step
- Return type
Any
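Saving the output under output_name behaves like a cache keyed by that name. If a result for the name already exists on disk, it is loaded. Otherwise it is computed and written. The sketch below assumes pickle files named `<output_name>.pkl`, because the storage format is not stated in this documentation. CachedStep, `_process`, and the `calls` counter are hypothetical names.

```python
import pickle
from pathlib import Path
from typing import Any, Optional, Union


class CachedStep:
    """Hypothetical step that caches results per output_name (not the library code)."""

    def __init__(self, save_path: Union[None, str, Path] = None) -> None:
        self.output_dir = None if save_path is None else Path(save_path) / type(self).__name__
        if self.output_dir is not None:
            self.output_dir.mkdir(parents=True, exist_ok=True)
        self.calls = 0  # counts real computations, to make caching observable

    def _process(self, x: int) -> int:
        self.calls += 1
        return x * x

    def process(self, *args: Any, output_name: Optional[str] = None, **kwargs: Any) -> Any:
        # Without a name or a save directory there is nothing to cache against.
        if output_name is None or self.output_dir is None:
            return self._process(*args, **kwargs)
        path = self.output_dir / f"{output_name}.pkl"
        if path.exists():
            with path.open("rb") as fh:
                return pickle.load(fh)
        result = self._process(*args, **kwargs)
        with path.open("wb") as fh:
            pickle.dump(result, fh)
        return result
```

The cache is keyed only by name, so output_name must be unique per datapoint. This matches the parameter description above.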
- class PipelineRunner(output_path: Optional[str] = None, inputs: Optional[Iterable[str]] = None, outputs: Optional[Iterable[str]] = None, stages: Iterable[dict] = [], save_intermediate: bool = False, precompute: bool = True)
Bases: object
- __init__(output_path: Optional[str] = None, inputs: Optional[Iterable[str]] = None, outputs: Optional[Iterable[str]] = None, stages: Iterable[dict] = [], save_intermediate: bool = False, precompute: bool = True) → None
Create a pipeline runner for a given configuration.
- Parameters
output_path (Optional[str], optional) – Path to the output and intermediate files. When set to None, the runner does not save the outputs. Defaults to None.
inputs (Optional[Iterable[str]], optional) – Inputs to the pipeline. Defaults to None.
outputs (Optional[Iterable[str]], optional) – Outputs of the pipeline. Defaults to None.
stages (Iterable[dict], optional) – Stages to complete. Defaults to [].
save_intermediate (bool, optional) – Whether to save the intermediate steps. Defaults to False.
precompute (bool, optional) – Whether to perform the precomputation steps. Defaults to True.
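The constructor arguments describe a dataflow. The inputs name the values supplied to the pipeline, each stage consumes and produces named values, and the outputs select what is returned. The stage dictionary layout below (`name`, `inputs`, `outputs` keys) is an assumption for illustration, not histocartography's configuration schema. The hypothetical helper check_dataflow verifies that every value a stage needs is available by the time that stage runs.

```python
from typing import Any, Dict, Iterable, List


def check_dataflow(inputs: Iterable[str], outputs: Iterable[str], stages: List[Dict[str, Any]]) -> None:
    """Raise ValueError if a stage or requested output uses an unavailable name."""
    available = set(inputs)
    for stage in stages:
        missing = [name for name in stage["inputs"] if name not in available]
        if missing:
            raise ValueError(f"stage {stage['name']!r} needs unavailable inputs {missing}")
        available.update(stage["outputs"])
    unknown = [name for name in outputs if name not in available]
    if unknown:
        raise ValueError(f"requested outputs {unknown} are never produced")


# Hypothetical configuration: load -> normalize -> summarize.
stages = [
    {"name": "load", "inputs": ["image_path"], "outputs": ["image"]},
    {"name": "normalize", "inputs": ["image"], "outputs": ["normalized"]},
    {"name": "summarize", "inputs": ["normalized"], "outputs": ["summary"]},
]
check_dataflow(["image_path"], ["summary"], stages)  # passes silently
```

Because stages are checked in order, listing them in a different order would fail here even though the same names are involved.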
- precompute(save_intermediate: bool) → None
Run the precomputation step of the pipeline.
- Parameters
save_intermediate (bool) – Whether to save intermediate outputs.
- run(output_name: Optional[str] = None, **inputs: Dict[str, Any]) → Dict[str, Any]
Run the preprocessing pipeline for a given name and input parameters, and return the specified outputs.
- Parameters
output_name (Optional[str], optional) – Unique identifier of the datapoint. Defaults to None.
- Returns
Output of the pipeline as defined in the configuration
- Return type
Dict[str, Any]
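The run method threads named values through the stages in order and returns only the configured outputs. Below is a minimal in-memory sketch. It assumes each stage is a dict holding a callable under a hypothetical `fn` key, and that the callable returns one value per declared output. It ignores output_name and never saves intermediates, so it mirrors only the dataflow part of the documented method. MiniRunner is a hypothetical name.

```python
from typing import Any, Callable, Dict, Iterable, List


class MiniRunner:
    """Hypothetical in-memory runner; not histocartography's PipelineRunner."""

    def __init__(self, inputs: Iterable[str], outputs: Iterable[str], stages: List[Dict[str, Any]]) -> None:
        self.inputs = list(inputs)
        self.outputs = list(outputs)
        self.stages = stages

    def run(self, **inputs: Any) -> Dict[str, Any]:
        missing = set(self.inputs) - set(inputs)
        if missing:
            raise ValueError(f"missing pipeline inputs: {sorted(missing)}")
        values: Dict[str, Any] = dict(inputs)
        for stage in self.stages:
            fn: Callable[..., Any] = stage["fn"]
            result = fn(*(values[name] for name in stage["inputs"]))
            # A single declared output is stored directly; several are unpacked.
            names = stage["outputs"]
            results = (result,) if len(names) == 1 else tuple(result)
            values.update(zip(names, results))
        # Only the configured outputs are returned; intermediates are dropped.
        return {name: values[name] for name in self.outputs}


runner = MiniRunner(
    inputs=["numbers"],
    outputs=["total", "count"],
    stages=[
        {"inputs": ["numbers"], "outputs": ["cleaned"], "fn": lambda xs: [x for x in xs if x >= 0]},
        {"inputs": ["cleaned"], "outputs": ["total", "count"], "fn": lambda xs: (sum(xs), len(xs))},
    ],
)
print(runner.run(numbers=[3, -1, 4]))  # {'total': 7, 'count': 2}
```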
- class BatchPipelineRunner(pipeline_config: Dict[str, Any], save_path: Optional[str], save_intermediate: bool = False)
Bases: object
- __init__(pipeline_config: Dict[str, Any], save_path: Optional[str], save_intermediate: bool = False) → None
Helper that runs the pipeline for multiple inputs with multiprocessing support.
- Parameters
pipeline_config (Dict[str, Any]) – Configuration of the pipeline.
save_path (Optional[str]) – Path to save the outputs to.
save_intermediate (bool, optional) – Whether to save intermediate outputs. Defaults to False.
- link_output(link_directory: str) → None
Creates a symlink between the output directory of the pipeline and the provided path. Overwrites the link if it already exists.
- Parameters
link_directory (str) – Path to link the output directory to.
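Creating a symlink that overwrites an existing one can be sketched with the standard library. This is a hypothetical helper (link_directory_to), not the library's code. It assumes a platform where os.symlink is permitted for the current user. As a design choice, it replaces an existing symlink but refuses to delete a real file or directory at the link location.

```python
import os
from pathlib import Path
from typing import Union


def link_directory_to(target: Union[str, Path], link: Union[str, Path]) -> None:
    """Point `link` at `target`, replacing an existing symlink at `link`."""
    target, link = Path(target).resolve(), Path(link)
    if link.is_symlink():
        link.unlink()  # overwrite: drop the old link, not what it points to
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(target, link, target_is_directory=True)
```

is_symlink is checked before exists so that a dangling link, for which exists() returns False, is still replaced.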
- run(metadata: pandas.core.frame.DataFrame, cores: int = 1, return_out: bool = False) → Optional[Dict[str, Dict[str, Any]]]
Runs the pipeline for the provided metadata DataFrame with the specified number of cores for multiprocessing. Does not support saving of outputs.
- Parameters
metadata (pd.DataFrame) – DataFrame with the columns defined in the config inputs.
cores (int, optional) – Number of cores to use for multiprocessing. Defaults to 1.
return_out (bool, optional) – Whether the method should also return the output batch data. If True, make sure you have enough memory. Only supported for single-core processing. Defaults to False.
- Returns
If return_out is True, returns the processed output; otherwise returns None.
- Return type
batched_out (Optional[Dict[str, Dict[str, Any]]])
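The batch run method maps the single-datapoint pipeline over the rows of the metadata. Below is a minimal sketch of that control flow. It takes a mapping from datapoint name to input values rather than a DataFrame, so it runs without pandas. The function names, the toy "area" computation, and the choice to raise when return_out is combined with several cores are assumptions, not the library's behavior.

```python
from multiprocessing import Pool
from typing import Any, Dict, Optional, Tuple


def process_row(item: Tuple[str, Dict[str, Any]]) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical stand-in for running the full pipeline on one datapoint."""
    name, inputs = item
    return name, {"area": inputs["width"] * inputs["height"]}


def batch_run(
    rows: Dict[str, Dict[str, Any]], cores: int = 1, return_out: bool = False
) -> Optional[Dict[str, Dict[str, Any]]]:
    """Run process_row for every named datapoint, optionally in parallel."""
    if return_out and cores != 1:
        # This sketch enforces the documented single-core restriction by raising.
        raise ValueError("return_out is only supported with cores=1")
    items = list(rows.items())
    if cores == 1:
        results = [process_row(item) for item in items]
        return dict(results) if return_out else None
    with Pool(processes=cores) as pool:
        # Results are not collected in the multi-core path of this sketch.
        pool.map(process_row, items)
    return None
```

With pandas, `metadata.to_dict(orient="index")` produces the `{index: {column: value}}` mapping this sketch expects. On platforms that start worker processes with spawn, such as Windows and macOS, call batch_run with cores > 1 from under an `if __name__ == "__main__":` guard.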
Reference
If you use histocartography in your projects, please cite the following:
@inproceedings{pati2021,
title = {Hierarchical Graph Representations for Digital Pathology},
author = {Pushpak Pati and Guillaume Jaume and Antonio Foncubierta and Florinda Feroce and Anna Maria Anniciello and Giosuè Scognamiglio and Nadia Brancati and Maryse Fiche and Estelle Dubruc and Daniel Riccio and Maurizio Di Bonito and Giuseppe De Pietro and Gerardo Botti and Jean-Philippe Thiran and Maria Frucci and Orcun Goksel and Maria Gabrani},
booktitle = {https://arxiv.org/pdf/2102.11057},
year = {2021}
}