src.core package

Subpackages

Submodules

src.core.base module

This module provides utility classes and functions for system command execution, file management, archive extraction, platform detection, and logging.

Main components include:
  • SingletonMeta:

    A metaclass implementing the singleton pattern.

  • LoggerMixin:

    A mixin class to add logging capabilities to other classes.

  • ICommandExecutor:

    Protocol defining an interface for command execution.

  • CommandExecutor:

    A class to execute system commands via a callable.

  • execute:

    Utility function to run commands with an executor.

  • touch:

    Creates or updates the timestamp of a file.

  • insert_processing_infix:

    Inserts a string into a filename before its extension.

  • extract_archive:

    Extracts various archive formats (zip, tar, gzip).

  • get_platform:

    Detects the current operating system platform.

This module facilitates building robust scripts and applications that require system command execution, file handling, archive processing, and environment detection.

class src.core.base.SingletonMeta

Bases: type

Metaclass implementing the Singleton pattern.

Ensures that only one instance of a class is created.
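
A minimal sketch of how such a metaclass is commonly written, assuming instances are cached per class in the _instances dictionary. The actual implementation in src.core.base may differ, and Config is a hypothetical example class.

```python
class SingletonMeta(type):
    """Metaclass that caches one instance per class (illustrative sketch)."""

    _instances = {}

    def __call__(cls, *args, **kwargs):
        # Create the instance only on the first call; reuse it afterwards.
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class Config(metaclass=SingletonMeta):  # hypothetical example class
    def __init__(self, name="default"):
        self.name = name


first = Config("a")
second = Config("b")  # arguments are ignored; the cached instance is returned
```

With this approach, constructor arguments passed after the first call have no effect, which is worth keeping in mind when a singleton takes configuration.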

_instances = {}

class src.core.base.LoggerMixin(logger: Logger = None)

Bases: object

Mixin class providing logging capabilities.

logger

Custom logger instance.

Type:

logging.Logger

set_logger(logger: Logger = None) → None

Sets the logger instance.

Parameters:

logger (Optional[logging.Logger]) – New logger to set.
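
The following sketch shows one way a logging mixin of this shape could behave. The fallback to a module-level logger when None is passed is an assumption, not documented behavior, and Worker is a hypothetical consumer class.

```python
import logging


class LoggerMixin:
    """Illustrative mixin; the default-logger fallback is an assumption."""

    def __init__(self, logger: logging.Logger = None):
        self.logger = None
        self.set_logger(logger)

    def set_logger(self, logger: logging.Logger = None) -> None:
        # Assumed fallback: use a named default logger when none is given.
        self.logger = logger if logger is not None else logging.getLogger(__name__)


class Worker(LoggerMixin):  # hypothetical consumer class
    def work(self) -> str:
        self.logger.info("working")
        return "done"


worker = Worker(logging.getLogger("demo"))
```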

class src.core.base.ICommandExecutor(*args, **kwargs)

Bases: Protocol

Protocol for command executor classes.

Defines the interface for executing system commands.

run(command: list[str] | str | dict[str, str]) → bool

Executes a command.

Parameters:

command (Union[list[str], str, dict[str, str]]) – Command to execute.

class src.core.base.CommandExecutor(caller: callable = os.system, logger: logging.Logger | None = None)

Bases: LoggerMixin, ICommandExecutor

Executes system commands using a provided callable.

caller

Function that executes commands, defaults to os.system.

Type:

callable

logger

Logger instance for logging.

Type:

logging.Logger

run(command: list[str] | str | dict[str, str]) → bool

Executes the given command.

Parameters:

command (Union[list[str], str, dict[str, str]]) – Command to run.

Returns:

True if command executed successfully, False otherwise.

Return type:

bool
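
To illustrate why the executor takes a callable, here is a hedged sketch of the injection pattern. SketchExecutor and fake_caller are hypothetical names; joining list commands with spaces and treating a zero return value as success are assumptions, and dict commands are left out because their handling is not described here.

```python
import os
from typing import Callable, Union


class SketchExecutor:
    """Illustrative executor; not the package's CommandExecutor."""

    def __init__(self, caller: Callable[[str], int] = os.system):
        self.caller = caller

    def run(self, command: Union[list, str]) -> bool:
        # Assumption: list commands are joined with spaces before the call.
        line = " ".join(command) if isinstance(command, list) else command
        # Assumption: a zero return value from the caller means success.
        return self.caller(line) == 0


calls = []


def fake_caller(line: str) -> int:
    """Records the command instead of running it; returns 0 for success."""
    calls.append(line)
    return 0


executor = SketchExecutor(caller=fake_caller)
ok = executor.run(["echo", "hello"])
```

Passing a recording callable like this lets tests check the command line without spawning a process.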

src.core.base.execute(executor, command) → None

Executes a command using the provided executor.

Parameters:
  • executor – Executor used to run the command.

  • command – Command to execute.

src.core.base.touch(path: PathLike) → None

Creates an empty file or updates the timestamp if it exists.

Parameters:

path (PathLike[AnyStr]) – Path to the file.
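
A minimal sketch of the same behavior using the standard library, assuming pathlib semantics are acceptable. touch_sketch is a hypothetical name and not the package function.

```python
import tempfile
from pathlib import Path


def touch_sketch(path) -> None:
    """Create an empty file, or refresh its modification time if it exists."""
    Path(path).touch(exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "marker.txt"
    touch_sketch(target)          # first call creates the file
    created = target.exists()
    touch_sketch(target)          # second call only updates the timestamp
    size = target.stat().st_size  # content is untouched, so the file stays empty
```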

src.core.base.insert_processing_infix(infix_str: str, filename: PathLike) → PathLike

Inserts an infix string into a filename before its extension.

Parameters:
  • infix_str (str) – String to insert.

  • filename (PathLike[AnyStr]) – Original filename.

Returns:

Modified filename with infix inserted.

Return type:

PathLike
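
The sketch below shows one plausible reading of "before its extension", assuming the infix is inserted verbatim before the last suffix only. insert_infix_sketch is a hypothetical name, and the real function may treat separators or multi-part extensions differently.

```python
from pathlib import Path


def insert_infix_sketch(infix_str: str, filename) -> Path:
    """Insert infix_str between the file stem and its last suffix."""
    path = Path(filename)
    # Path.stem drops only the last suffix, so "a.tar.gz" keeps ".tar" in the stem.
    return path.with_name(f"{path.stem}{infix_str}{path.suffix}")


result = insert_infix_sketch("_trimmed", "reads/sample_R1.fastq")
```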

src.core.base.extract_archive(archive_filepath: PathLike) → PathLike

Extracts an archive file (zip, tar, gzip).

Parameters:

archive_filepath (PathLike[AnyStr]) – Path to archive file.

Returns:

List of extracted file names for zip and tar archives, or the single extracted filename for gzip.

Return type:

PathLike

Raises:
  • FileNotFoundError – If the archive file does not exist.

  • IOError – If the archive format is unsupported or extraction fails.
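
As a rough illustration of format dispatch with the standard library, the sketch below detects zip and tar archives by content and gzip by file suffix. extract_archive_sketch is a hypothetical name; extracting next to the archive and the detection order are assumptions, not the package's documented behavior.

```python
import gzip
import shutil
import tarfile
import zipfile
from pathlib import Path


def extract_archive_sketch(archive_filepath):
    """Extract a zip, tar, or gzip file into the archive's own directory."""
    archive = Path(archive_filepath)
    if not archive.is_file():
        raise FileNotFoundError(f"Archive not found: {archive}")
    target_dir = archive.parent  # assumption: extract alongside the archive

    if zipfile.is_zipfile(archive):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target_dir)
            return zf.namelist()

    if tarfile.is_tarfile(archive):
        with tarfile.open(archive) as tf:
            names = tf.getnames()
            tf.extractall(target_dir)
            return names

    if archive.suffix == ".gz":
        # Plain gzip holds a single file; drop the ".gz" suffix for the output.
        out_path = archive.with_suffix("")
        with gzip.open(archive, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        return out_path

    raise IOError(f"Unsupported archive format: {archive}")
```

For untrusted tar archives, recent Python versions offer extraction filters on tarfile that are worth enabling; the sketch omits them for brevity.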

src.core.base.get_platform() → str

Detects the current operating system platform.

Returns:

Platform name: one of ‘linux’, ‘freebsd’, ‘aix’, ‘macos’, ‘windows’, or ‘unknown’.

Return type:

str
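
One way to produce these names is to map sys.platform prefixes, as sketched below. get_platform_sketch and its optional platform_id argument are hypothetical and exist only to make the mapping testable; the package function takes no arguments and may detect the platform differently.

```python
import sys


def get_platform_sketch(platform_id: str = None) -> str:
    """Map a sys.platform value to a short platform name."""
    platform_id = sys.platform if platform_id is None else platform_id
    prefixes = {
        "linux": "linux",
        "freebsd": "freebsd",
        "aix": "aix",
        "darwin": "macos",
        "win32": "windows",
    }
    # Prefix matching covers versioned values such as "freebsd13".
    for prefix, name in prefixes.items():
        if platform_id.startswith(prefix):
            return name
    return "unknown"


current = get_platform_sketch()
```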

src.core.base.get_unique_path(parent_dir: PathLike = '.') → Path

Produces a Path object whose directory name is based on a timestamp.
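
A hedged sketch of timestamp-based naming follows. The timestamp format, placing the result under parent_dir, and not creating the directory on disk are all assumptions; get_unique_path_sketch is a hypothetical name.

```python
from datetime import datetime
from pathlib import Path


def get_unique_path_sketch(parent_dir=".") -> Path:
    """Return parent_dir joined with a timestamp-derived directory name."""
    # Microseconds (%f) reduce the chance of two calls colliding.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
    return Path(parent_dir) / stamp


unique = get_unique_path_sketch("runs")
```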

src.core.sample_data_container module

This module defines a container class for storing sample-related data paths and identifiers.

It includes attributes for R1 and R2 file paths, sample identifiers, and processing log paths.

class src.core.sample_data_container.SampleDataContainer(r1_source: PathLike, r2_source: PathLike = None, sid: str = '1', processing_path: PathLike = None, processing_logpath: PathLike = None, target_regions: list[tuple[str, str]] = None, bam_filepath: PathLike | None = None, vcf_filepath: PathLike | None = None, report_path: PathLike = None)

Bases: object

A container class for storing sample-related data paths and identifiers.

r1_source

Path to the R1 file.

Type:

PathLike[AnyStr]

r2_source

Path to the R2 file (optional).

Type:

Optional[PathLike[AnyStr]]

sid

Patient identifier.

Type:

str

processing_path

Path for storing processing logs.

Type:

PathLike[AnyStr]

processing_logpath

Path to processing logs.

Type:

PathLike[AnyStr]

bam_filepath

Path to BAM file (optional).

Type:

Optional[PathLike[AnyStr]]

vcf_filepath

Path to VCF file (optional).

Type:

Optional[PathLike[AnyStr]]

report_path

Path to the report directory.

Type:

PathLike[AnyStr]

r1_source
r2_source
sid
processing_path
processing_logpath
report_path
target_regions
bam_filepath
vcf_filepath

parse_regions(configurator: Configurator, path: PathLike = None, logger: Logger = None)

Parses target regions from a SAM file and updates the object’s target_regions attribute.

This method reads a SAM file (defaulting to a path based on the object’s processing_path and sid) and extracts chromosome information from sequence headers (@SQ lines). It formats the chromosome identifiers into interval strings (e.g., ‘chr01-interval’) and generates corresponding region tuples using the provided configurator.

Parameters:
  • configurator (Configurator) – An instance used to generate region tuples from interval strings.

  • path (PathLike[AnyStr], optional) – Path to the SAM file. If None, a default path based on the object’s processing_path and sid is used.

  • logger (logging.Logger, optional) – Logger for logging critical errors encountered during file processing.

Raises:

FileNotFoundError, PermissionError, IOError, OSError – If the file cannot be opened or read. When a logger is provided, the error is logged before the exception is raised.

Side Effects:

Updates the object’s target_regions attribute with a list of region tuples generated from parsed chromosome intervals.
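
The header-parsing step described above can be sketched as follows. sq_interval_names is a hypothetical helper; appending "-interval" directly to the SN value is an assumption about the formatting, and the Configurator step is omitted because its interface is not described here.

```python
def sq_interval_names(sam_lines):
    """Collect interval strings from the @SQ header lines of a SAM file."""
    names = []
    for line in sam_lines:
        if not line.startswith("@SQ"):
            continue  # skip other header lines and alignment records
        # Header fields are tab-separated TAG:VALUE pairs; SN holds the name.
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(f"{field[3:]}-interval")
    return names


header = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "@SQ\tSN:chr01\tLN:248956422\n",
    "@SQ\tSN:chr02\tLN:242193529\n",
    "read1\t0\tchr01\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF\n",
]
intervals = sq_interval_names(header)
```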

src.core.sample_data_factory module

This module defines an interface for a sample data factory and a base class for sample data containers.

It provides a way to parse sample data from various sources, storing the information in a structured manner.

The ISampleDataFactory protocol defines the required method for parsing sample data, and SampleDataContainer provides a common structure for storing sample data paths and identifiers.

The module utilizes the logging module for logging operations.

class src.core.sample_data_factory.ISampleDataFactory(*args, **kwargs)

Bases: Protocol

Interface for a sample data factory.

Defines the method parse_sample_data to be implemented by concrete classes.

parse_sample_data(path: PathLike, sample_id: AnyStr) → SampleDataContainer

Parses sample data from the given path for the specified sample ID.

Parameters:
  • path (PathLike[AnyStr]) – Directory path containing sample files.

  • sample_id (AnyStr) – Identifier for the sample.

Returns:

An instance containing parsed sample data.

Return type:

SampleDataContainer

class src.core.sample_data_factory.SampleDataFactory(logger: Logger = None, outpath: PathLike = None)

Bases: LoggerMixin, ISampleDataFactory

Concrete implementation of the ISampleDataFactory interface.

Parses sample data from files and uses logging for error reporting.

parse_sample_data(path: PathLike, sample_id: AnyStr) → SampleDataContainer

Parses sample data files from a directory based on the sample ID.

Looks for files containing the sample ID and ‘R1’ or ‘R2’ in their names.

Parameters:
  • path (PathLike[AnyStr]) – Directory path containing sample files.

  • sample_id (AnyStr) – Identifier for the sample.

Returns:

An instance with source paths for R1 and R2, or None if files are not found.

Return type:

SampleDataContainer
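
The file lookup described above might look like the sketch below. find_read_files is a hypothetical helper that returns a plain tuple instead of a SampleDataContainer; taking the first match in sorted order and using simple substring checks are assumptions.

```python
import tempfile
from pathlib import Path


def find_read_files(path, sample_id: str):
    """Return (r1, r2) paths for files naming the sample; None where missing."""
    r1 = r2 = None
    for candidate in sorted(Path(path).iterdir()):
        if not candidate.is_file() or sample_id not in candidate.name:
            continue
        # Keep the first R1 and the first R2 file found for this sample.
        if "R1" in candidate.name and r1 is None:
            r1 = candidate
        elif "R2" in candidate.name and r2 is None:
            r2 = candidate
    return r1, r2


with tempfile.TemporaryDirectory() as tmp:
    for name in ("S7_R1.fastq", "S7_R2.fastq", "S8_R1.fastq"):
        (Path(tmp) / name).touch()
    r1, r2 = find_read_files(tmp, "S7")
    found = (r1.name, r2.name)
```

Substring matching is simple but can misfire when one sample ID is a prefix of another (for example S7 and S70), so a stricter pattern may be preferable in practice.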

Module contents