src package

Subpackages

Submodules

src.analyzer module

This module defines the Analyzer class, responsible for orchestrating the entire genomic data analysis pipeline.

It manages data preparation, alignment, variant calling, annotation, and conversion steps, using various components like SequenceAligner, BamGrouper, BQSRPerformer, and variant callers. It also handles command execution, logging, and file management throughout the process.

Main Features:
  • Initializes with a configurator and command caller.

  • Prepares data by performing sequence alignment, grouping reads, and recalibration.

  • Analyzes samples by calling variants, annotating them, converting formats, and generating reports.

  • Manages paths, logs, and subprocess execution.

class src.analyzer.Analyzer(configurator: Configurator, cmd_caller: CommandExecutor | callable = None)

Bases: Protocol

Protocol class for designing your own analysis stage, which manages the entire genomic data analysis pipeline.

This class orchestrates the steps involved in processing sequencing data, including data preparation, read alignment, variant calling, annotation, and format conversion. It leverages various components such as SequenceAligner, BamGrouper, BQSRPerformer, variant callers, and annotation tools to perform the analysis systematically.

configurator

Configuration object containing paths, parameters, and logger.

Type:

Configurator

cmd_caller

Function or object to execute system commands.

Type:

Union[CommandExecutor, callable]

prepare_data(sample)

Prepares raw sequencing data by trimming, aligning, and recalibrating.

analyze(sample)

Performs variant calling, annotation, and converts formats for reporting.

prepare_data(sample: SampleDataContainer) → SampleDataContainer

Prepares raw sequencing data for analysis, including alignment, read grouping, and recalibration.

Parameters:

sample (SampleDataContainer) – Sample with raw data and metadata.

Returns:

Updated sample with paths to intermediate and final files.

Return type:

SampleDataContainer

analyze(sample: SampleDataContainer) → SampleDataContainer

Performs variant calling, annotation, and format conversion.

Parameters:

sample (SampleDataContainer) – Sample with aligned data.

Returns:

Updated sample with annotated variants and reports.

Return type:

SampleDataContainer
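Because Analyzer is a typing.Protocol, any class that provides prepare_data and analyze with these signatures satisfies it structurally, without inheriting from it. The sketch below uses stand-ins only: the SampleDataContainer dataclass and its fields, AnalyzerLike, EchoAnalyzer, and run_stage are hypothetical and do not reflect the real src types; the documented constructor arguments (configurator, cmd_caller) are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@dataclass
class SampleDataContainer:
    """Hypothetical stand-in for the pipeline's sample container."""
    name: str
    files: dict[str, str] = field(default_factory=dict)


@runtime_checkable
class AnalyzerLike(Protocol):
    """Mirrors the two methods documented for src.analyzer.Analyzer."""

    def prepare_data(self, sample: SampleDataContainer) -> SampleDataContainer: ...

    def analyze(self, sample: SampleDataContainer) -> SampleDataContainer: ...


class EchoAnalyzer:
    """Hypothetical stage: records placeholder output paths instead of running tools."""

    def prepare_data(self, sample: SampleDataContainer) -> SampleDataContainer:
        # A real stage would align, group reads, and recalibrate here.
        sample.files["bam"] = f"{sample.name}.recal.bam"
        return sample

    def analyze(self, sample: SampleDataContainer) -> SampleDataContainer:
        # A real stage would call, annotate, and convert variants here.
        sample.files["vcf"] = f"{sample.name}.annotated.vcf"
        return sample


def run_stage(analyzer: AnalyzerLike, sample: SampleDataContainer) -> SampleDataContainer:
    # Preparation runs first because analyze() expects aligned data.
    return analyzer.analyze(analyzer.prepare_data(sample))
```

Note that isinstance checks against a runtime_checkable protocol only verify that the methods exist, not their signatures.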

class src.analyzer.BRCAAnalyzer(configurator: Configurator, cmd_caller: CommandExecutor | callable = None)

Bases: Analyzer

Main analyzer class that manages the entire genomic data analysis pipeline.

This class orchestrates the steps involved in processing sequencing data, including data preparation, read alignment, variant calling, annotation, and format conversion. It leverages various components such as SequenceAligner, BamGrouper, BQSRPerformer, variant callers, and annotation tools to perform the analysis systematically.

configurator

Configuration object containing paths, parameters, and logger.

Type:

Configurator

cmd_caller

Function or object to execute system commands.

Type:

Union[CommandExecutor, callable]

prepare_data(sample)

Prepares raw sequencing data by trimming, aligning, and recalibrating.

analyze(sample)

Performs variant calling, annotation, and converts formats for reporting.

prepare_data(sample: SampleDataContainer) → SampleDataContainer

Prepares raw sequencing data for analysis, including alignment, read grouping, and recalibration.

Parameters:

sample (SampleDataContainer) – Sample with raw data and metadata.

Returns:

Updated sample with paths to intermediate and final files.

Return type:

SampleDataContainer

analyze(sample: SampleDataContainer) → SampleDataContainer

Performs variant calling, annotation, and format conversion.

Parameters:

sample (SampleDataContainer) – Sample with aligned data.

Returns:

Updated sample with annotated variants and reports.

Return type:

SampleDataContainer

class src.analyzer.TestAnalyzerhg19(configurator: Configurator, cmd_caller: CommandExecutor | callable = None)

Bases: Analyzer

prepare_data(sample: SampleDataContainer) → SampleDataContainer

Prepares raw sequencing data for analysis, including alignment, read grouping, and recalibration.

Parameters:

sample (SampleDataContainer) – Sample with raw data and metadata.

Returns:

Updated sample with paths to intermediate and final files.

Return type:

SampleDataContainer

analyze(sample: SampleDataContainer) → SampleDataContainer

Performs variant calling, annotation, and format conversion.

Parameters:

sample (SampleDataContainer) – Sample with aligned data.

Returns:

Updated sample with annotated variants and reports.

Return type:

SampleDataContainer

src.configurator module

This module contains the implementation of the Configurator class, which manages the configuration setup for an analysis pipeline.

It handles command-line argument parsing, logging configuration, output directory validation/creation, and loading configuration parameters from a configuration file.

The Configurator class is designed as a singleton to ensure a single point of configuration management throughout the application.

It utilizes other components such as PathValidator, LoggingConfigurator, and ConfigLoader to perform its tasks.

Usage:

Instantiate the Configurator class to initialize configuration, logging, and output directories.
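The Configurator(*args, **kwargs) signature is typical of a singleton implemented with a metaclass. The source does not say which mechanism src.configurator uses, so the sketch below assumes a metaclass; SingletonMeta and DemoConfigurator are hypothetical names. It shows the resulting behavior: the first instantiation runs __init__, and every later call returns the same object.

```python
class SingletonMeta(type):
    """Hypothetical metaclass: caches one instance per class."""

    _instances: dict[type, object] = {}

    def __call__(cls, *args, **kwargs):
        # Construct the instance only on the first call; later calls reuse it
        # and ignore their arguments.
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class DemoConfigurator(metaclass=SingletonMeta):
    """Hypothetical configurator holding a single setting."""

    def __init__(self, output_dir: str = "output"):
        self.output_dir = output_dir
```

One consequence of this design is that arguments passed on later instantiations are silently ignored, so configuration should be supplied on first use.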

class src.configurator.Configurator(*args, **kwargs)

Bases: object

Singleton class responsible for managing configuration, logging, and output directory setup for the analysis pipeline.

This class handles parsing command-line arguments, setting up logging, validating and creating the output directory, and loading configuration parameters from a configuration file.

args

Parsed command-line arguments.

Type:

argparse.Namespace

log_path

Absolute path to the log file.

Type:

str

logger

Logger instance for logging messages.

Type:

logging.Logger

output_dir

Path to the output directory where results will be stored.

Type:

str

config

Dictionary containing configuration parameters.

Type:

dict

_parse_args()

Parses command-line arguments.

_setup_logger(log_filename)

Sets up the logging configuration.

_setup_output_directory(output_dir)

Validates or creates the output directory.

_load_configuration(config_path)

Loads configuration parameters.

parse_configuration(base_config_filepath, target_section)

Loads specific configuration sections.

static _parse_args() → Namespace

Parses command-line arguments using argparse.

Returns:

Parsed arguments object containing command-line parameters.

Return type:

argparse.Namespace

static _setup_logger(log_filename: PathLike, args: Namespace = None) → tuple[PathLike, Logger]

Sets up the logging system with the specified log file.

Parameters:

log_filename (PathLike[AnyStr]) – Path to the log file.

Returns:

A tuple containing the absolute path to the log file and the configured Logger object.

Return type:

tuple
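A minimal sketch of the documented contract (return the absolute log-file path together with a configured Logger), using only the standard logging module. The function name setup_logger, the logger name "pipeline", the format string, and the INFO level are assumptions; the optional args parameter of the real method is omitted.

```python
from __future__ import annotations

import logging
import os


def setup_logger(log_filename: str | os.PathLike) -> tuple[str, logging.Logger]:
    """Hypothetical stand-in for Configurator._setup_logger."""
    log_path = os.path.abspath(log_filename)
    os.makedirs(os.path.dirname(log_path), exist_ok=True)

    logger = logging.getLogger("pipeline")  # assumed logger name
    logger.setLevel(logging.INFO)
    # Avoid stacking duplicate handlers if the function is called twice.
    already_attached = any(
        isinstance(h, logging.FileHandler) and h.baseFilename == log_path
        for h in logger.handlers
    )
    if not already_attached:
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s [%(levelname)s] %(message)s"))
        logger.addHandler(handler)
    return log_path, logger
```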

_setup_output_directory(output_dir: PathLike) → PathLike

Validates the output directory path, creates it if it doesn’t exist, and handles existing directory conflicts based on user input.

Parameters:

output_dir (PathLike[AnyStr]) – Path to the desired output directory.

Returns:

Absolute path to the validated or created output directory.

Return type:

PathLike[AnyStr]
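The conflict handling is described only as "based on user input", so the sketch below assumes a yes/no prompt for a non-empty existing directory and raises FileExistsError when the user declines. The function name and the ask parameter (a hook replacing input() so the logic can be exercised without a terminal) are hypothetical.

```python
from __future__ import annotations

import os
from typing import Callable


def setup_output_directory(output_dir: str | os.PathLike,
                           ask: Callable[[str], str] = input) -> str:
    """Hypothetical stand-in for Configurator._setup_output_directory."""
    path = os.path.abspath(output_dir)
    if os.path.isfile(path):
        raise NotADirectoryError(f"{path} exists and is not a directory")
    if os.path.isdir(path) and os.listdir(path):
        # Conflict: the directory already holds files. The real prompt wording
        # and choices are not documented; a yes/no answer is assumed here.
        answer = ask(f"{path} is not empty. Reuse it? [y/N] ").strip().lower()
        if answer not in ("y", "yes"):
            raise FileExistsError(f"refusing to reuse non-empty directory {path}")
    os.makedirs(path, exist_ok=True)
    return path
```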

parse_configuration(base_config_filepath: PathLike | None, target_section: AnyStr = 'Pathes') → dict

Loads a specific section of the configuration from a base configuration file.

Parameters:
  • base_config_filepath (PathLike[AnyStr], optional) – Path to the base configuration file. If None, defaults to ‘src/conf/config.ini’ relative to the current working directory.

  • target_section (str, optional) – The section within the configuration file to load. Defaults to ‘Pathes’.

Returns:

Dictionary of configuration parameters from the specified section.

Return type:

dict
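Since the default file has an .ini extension, a reasonable assumption is that it is read with the standard configparser module. The sketch below follows the documented defaults (‘src/conf/config.ini’ and section ‘Pathes’); the error types raised for a missing file or section are assumptions, not documented behavior.

```python
from __future__ import annotations

import configparser
import os

DEFAULT_CONFIG = os.path.join("src", "conf", "config.ini")


def parse_configuration(base_config_filepath: str | os.PathLike | None = None,
                        target_section: str = "Pathes") -> dict:
    """Hypothetical stand-in for Configurator.parse_configuration."""
    path = base_config_filepath or DEFAULT_CONFIG
    parser = configparser.ConfigParser()
    # read() silently skips missing files and returns the files it parsed.
    if not parser.read(path, encoding="utf-8"):
        raise FileNotFoundError(f"configuration file not found: {path}")
    if not parser.has_section(target_section):
        raise KeyError(f"section [{target_section}] missing in {path}")
    return dict(parser[target_section])
```

By default configparser lowercases option names, so keys in the returned dictionary are lowercase regardless of how they appear in the file.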

src.demultiplexor_adapter module

Module for demultiplexor adapter functionality.

src.demultiplexor_adapter.main()

Main function responsible for creating and executing the demultiplexor adapter as an autonomous component outside the pipeline.

src.dependency_handler module

This module provides the DependencyHandler class, which is responsible for managing dependencies, checking the existence of reference files or archives, extracting archives, verifying module installation, attempting to install missing modules via pip, and verifying or creating paths.

Classes:
  • DependencyHandler:

    Singleton class to handle dependencies and reference files.

Functions:
  • Various static methods for module loading, installation, path verification, and file extension resolution.

Usage:

Instantiate the DependencyHandler singleton to perform dependency management tasks, such as checking reference files, installing modules, and verifying paths.

class src.dependency_handler.DependencyHandler(*args, **kwargs)

Bases: object

Singleton class to manage dependencies and reference files.

Provides methods to set loggers, check and resolve reference files (including archives), verify and install modules, and verify or create filesystem paths.

set_logger(new_logger: Logger) → None

Sets a new logger for the handler, replacing the current one.

Parameters:

new_logger (logging.Logger) – The new logger to set.

Raises:

RuntimeError – If the current logger is not set or if the new logger is None.

check_reference(ref_filepath: PathLike, ref_dirpath: PathLike = '.') → PathLike

Checks that the reference sequence file, or an archive containing it, exists. If only the archive is present, extracts it into the reference directory.

Returns the path to the reference file if it exists; raises FileNotFoundError otherwise.
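The archive format is not documented. The sketch below assumes a gzip archive named <reference>.gz next to the expected file and extracts it into ref_dirpath; the real method may accept other formats or naming schemes.

```python
from __future__ import annotations

import gzip
import os
import shutil


def check_reference(ref_filepath: str | os.PathLike,
                    ref_dirpath: str | os.PathLike = ".") -> str:
    """Hypothetical stand-in for DependencyHandler.check_reference (gzip only)."""
    ref_filepath = os.fspath(ref_filepath)
    if os.path.isfile(ref_filepath):
        return ref_filepath
    archive = ref_filepath + ".gz"  # assumed archive naming
    if os.path.isfile(archive):
        os.makedirs(ref_dirpath, exist_ok=True)
        target = os.path.join(ref_dirpath, os.path.basename(ref_filepath))
        # Stream-decompress so large reference genomes are not loaded into memory.
        with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        return target
    raise FileNotFoundError(f"neither {ref_filepath} nor {archive} exists")
```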

static is_module_loaded(module_name: AnyStr) → bool

Checks if a module is loaded in the current environment.

Parameters:

module_name (str) – Name of the module to check.

Returns:

True if the module is loaded, False otherwise.

Return type:

bool
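"Loaded in the current environment" could mean either already imported or importable. The sketch below assumes the broader reading: it returns True if the module is in sys.modules or can be located with importlib.util.find_spec, which does not import the module itself.

```python
import importlib.util
import sys


def is_module_loaded(module_name: str) -> bool:
    """Hypothetical stand-in: True if the module is imported or importable."""
    if module_name in sys.modules:
        return True
    # For dotted names find_spec imports the parent package first; a missing
    # parent raises ModuleNotFoundError, treated here as "not available".
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False
```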

static try_to_install_module(module_name: AnyStr, logger: Logger | None = None) → bool

Attempts to install a module via pip.

Parameters:
  • module_name (str) – Name of the module to install.

  • logger (logging.Logger, optional) – A logger for the function call.

Returns:

True if installation succeeded, False otherwise.

Return type:

bool

static fetch_dependency(module_name: AnyStr, logger: Logger | None = None) → None

Attempts to fetch a dependency module from pip, prompting the user if needed.

Parameters:
  • module_name (str) – Name of the module to fetch.

  • logger (logging.Logger, optional) – A logger for the function call.

Exits:
  • Exits with code os.EX_SOFTWARE if installation fails.

  • Exits with code os.EX_OK if user chooses not to install.

  • Exits with code os.EX_USAGE if command is unrecognized.
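The sketch below mirrors the documented exit codes for try_to_install_module and fetch_dependency. Invoking pip as `sys.executable -m pip install` is a common approach but an assumption here. The ask and installer parameters are hypothetical hooks that replace the interactive prompt and the real installation so the branching can be exercised; the logger parameter is omitted. The os.EX_* constants exist only on Unix.

```python
import os
import subprocess
import sys
from typing import Callable


def try_to_install_module(module_name: str) -> bool:
    """Install a package with the current interpreter's pip; True on exit code 0."""
    result = subprocess.run([sys.executable, "-m", "pip", "install", module_name],
                            check=False)
    return result.returncode == 0


def fetch_dependency(module_name: str,
                     ask: Callable[[str], str] = input,
                     installer: Callable[[str], bool] = try_to_install_module) -> None:
    """Hypothetical stand-in mirroring the documented exit codes (Unix only)."""
    answer = ask(f"Module '{module_name}' is missing. Install it via pip? [y/n] ")
    answer = answer.strip().lower()
    if answer in ("y", "yes"):
        if not installer(module_name):
            sys.exit(os.EX_SOFTWARE)  # installation failed
        # Successful installation: return normally (assumed behavior).
    elif answer in ("n", "no"):
        sys.exit(os.EX_OK)  # user declined to install
    else:
        sys.exit(os.EX_USAGE)  # unrecognized answer
```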

static verify_path(src: str, logger: Logger | None = None) → bool

Checks the existence of the file or directory at the given path src.

If the path does not exist, creates the necessary directories and the file.

Parameters:
  • src (str) – The path to the file or directory to check or create.

  • logger (logging.Logger, optional) – A logger for the function call.

Returns:

True if the path exists or was successfully created, otherwise False.

Return type:

bool
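Following the description literally, the sketch below treats src as a file path: if it is missing, it creates the parent directories and then an empty file, returning False only when the filesystem refuses. The fallback to a module-level logger when none is passed is an assumption.

```python
from __future__ import annotations

import logging
import os


def verify_path(src: str, logger: logging.Logger | None = None) -> bool:
    """Hypothetical stand-in: ensure src exists, creating parents and an empty file."""
    log = logger or logging.getLogger(__name__)
    if os.path.exists(src):
        return True
    try:
        os.makedirs(os.path.dirname(os.path.abspath(src)), exist_ok=True)
        open(src, "a").close()  # create the file without truncating anything
        log.info("Created %s", src)
        return True
    except OSError as exc:
        log.error("Could not create %s: %s", src, exc)
        return False
```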

static resolve_file_path_by_extensions(base_name: PathLike, extension_list: list[str]) → PathLike

Searches for the first existing file that matches the base name with any of the provided extensions.

Parameters:
  • base_name (PathLike[AnyStr]) – The base file path without extension.

  • extension_list (list[str]) – A list of file extensions to check (including the dot, e.g., ‘.txt’).

Returns:

The full path to the first existing file that matches the base name with one of the extensions.

Return type:

PathLike[AnyStr]

Raises:

FileNotFoundError – If no file matching the base name with any of the provided extensions is found.
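A straightforward sketch, assuming each extension is appended verbatim to base_name (as the ‘.txt’ example implies) and that the order of extension_list sets the priority.

```python
from __future__ import annotations

import os


def resolve_file_path_by_extensions(base_name: str | os.PathLike,
                                    extension_list: list[str]) -> str:
    """Hypothetical stand-in: return base_name + ext for the first existing file."""
    base = os.fspath(base_name)
    for ext in extension_list:  # earlier extensions take priority
        candidate = base + ext
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(f"no file found for {base} with extensions {extension_list}")
```

Listing the more specific extension first (for example ‘.vcf.gz’ before ‘.vcf’) controls which file wins when several exist.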

src.main module

src.table_manager module

Module contents