pmotools.pmo_builder.panel_information_to_pmo module

class pmotools.pmo_builder.panel_information_to_pmo.PMOPanelBuilder(target_table: DataFrame, panel_name: str, target_name_col: str = 'target_name', forward_primers_seq_col: str = 'fwd_primer', reverse_primers_seq_col: str = 'rev_primer', reaction_name_col: str | None = None, reaction_name_col_delimiter: str = ',', forward_primers_start_col: int | None = None, forward_primers_end_col: int | None = None, reverse_primers_start_col: int | None = None, reverse_primers_end_col: int | None = None, insert_start_col: int | None = None, insert_end_col: int | None = None, chrom_col: str | None = None, strand_col: str | None = None, ref_seq_col: str | None = None, gene_name_col: str | None = None, target_attributes_col: str | None = None, target_attributes_col_delimiter: str = ',', additional_target_info_cols: list | None = None)

Bases: object

Build PMO target_info and panel_info structures from a target table.

Wraps a dataframe of one-row-per-target panel data and converts it into the nested dictionaries a PMO expects. Most users should call panel_info_table_to_pmo() instead of using this class directly.

Parameters:

target_table – dataframe with one row per target
panel_name – name assigned to the panel
target_name_col – column holding the target names. Default: target_name
forward_primers_seq_col – column holding the forward primer sequence. Default: fwd_primer
reverse_primers_seq_col – column holding the reverse primer sequence. Default: rev_primer
reaction_name_col – optional column naming which reaction each target belongs to; if omitted, all targets go in a single reaction
reaction_name_col_delimiter – delimiter splitting the reaction column into multiple reactions. Default: ,
forward_primers_start_col – optional column with the 0-based forward primer start
forward_primers_end_col – optional column with the 0-based forward primer end
reverse_primers_start_col – optional column with the 0-based reverse primer start
reverse_primers_end_col – optional column with the 0-based reverse primer end
insert_start_col – optional column with the 0-based insert start
insert_end_col – optional column with the 0-based insert end
chrom_col – optional chromosome column; required if any location columns are set
strand_col – optional strand column
ref_seq_col – optional reference-sequence column for the insert
gene_name_col – optional gene-name column
target_attributes_col – optional column of target attribute classifications
target_attributes_col_delimiter – delimiter splitting the attributes column into multiple attributes. Default: ,
additional_target_info_cols – optional list of extra column names to copy verbatim into each target dict

build_panel_info_dict(targets_dict)

Build the panel_info dictionary, grouping targets into reactions.

If no reaction column was configured, all targets are placed in a single reaction named full.

Parameters: targets_dict – the target_info list from build_target_info_dict()
Returns: a panel_info dictionary with panel_name and reactions, where each reaction lists target indices into targets_dict
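
The grouping described above can be sketched in plain Python. This is an illustrative stand-in rather than the library's implementation: the function name group_targets_into_reactions, the reaction_names argument, and the reaction_name and panel_targets keys are all assumptions made for the example. Only panel_name, reactions, the index-based references, and the fallback reaction name full come from the description.

```python
from collections import defaultdict


def group_targets_into_reactions(targets, panel_name, reaction_names=None, delimiter=","):
    """Group target indices by reaction name (hypothetical helper).

    targets: list of target dicts; only their positions are used here.
    reaction_names: optional list of strings parallel to targets. Each
        string may name several reactions separated by delimiter.
    """
    reactions = defaultdict(list)
    for index, _target in enumerate(targets):
        if reaction_names is None:
            # Documented fallback: one reaction named "full".
            names = ["full"]
        else:
            names = [n.strip() for n in reaction_names[index].split(delimiter) if n.strip()]
        for name in names:
            reactions[name].append(index)
    return {
        "panel_name": panel_name,
        # "reaction_name" and "panel_targets" are assumed key names.
        "reactions": [
            {"reaction_name": name, "panel_targets": indices}
            for name, indices in reactions.items()
        ],
    }


targets = [{"target_name": "t1"}, {"target_name": "t2"}, {"target_name": "t3"}]
single = group_targets_into_reactions(targets, "demo_panel")
split = group_targets_into_reactions(
    targets, "demo_panel", reaction_names=["pool1", "pool1,pool2", "pool2"]
)
```

In this sketch a target whose reaction cell contains several delimited names (t2 above) is listed in each of those reactions.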

build_target_info_dict(genome_id_col: str | None = None)

Build the list of target_info dictionaries from the target table.

Validates target-name uniqueness and primer/location uniqueness, then assembles one dict per target including primer sequences and, where available, insert and primer genomic locations.

Parameters: genome_id_col – optional column holding the genome id for each target; if omitted, a genome_id of 0 is used
Returns: a list of target_info dictionaries

check_location_columns()

Validate the optional genomic-location column configuration.

If any location column is set, enforces that chrom_col is present and that primer/insert start and end columns are supplied as pairs.

Raises: ValueError – if location columns are set inconsistently
Returns: the list of location columns if any were provided, otherwise None
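
A minimal sketch of the pairing rule, assuming the configuration is passed as keyword arguments named like the constructor parameters. The function check_location_columns_sketch is hypothetical, and whether the real method includes chrom_col in its returned list is not stated here, so this version returns only the start and end columns.

```python
def check_location_columns_sketch(chrom_col=None, **location_cols):
    """Validate start/end pairing for location columns (hypothetical helper).

    location_cols maps names such as insert_start_col / insert_end_col
    to a column name, or to None when the column is not configured.
    """
    provided = {key: col for key, col in location_cols.items() if col is not None}
    if not provided:
        return None  # no location columns configured
    if chrom_col is None:
        raise ValueError("chrom_col is required when any location column is set")
    for prefix in ("forward_primers", "reverse_primers", "insert"):
        start = location_cols.get(f"{prefix}_start_col")
        end = location_cols.get(f"{prefix}_end_col")
        # A start without an end (or the reverse) is an inconsistent setup.
        if (start is None) != (end is None):
            raise ValueError(f"{prefix} start and end columns must be given together")
    return list(provided.values())


cols = check_location_columns_sketch(
    chrom_col="chrom", insert_start_col="ins_start", insert_end_col="ins_end"
)

try:
    check_location_columns_sketch(chrom_col="chrom", insert_start_col="ins_start")
    unpaired_rejected = False
except ValueError:
    unpaired_rejected = True
```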

check_target_names_are_unique()

Raise an exception if the target names are not unique.

Returns: Nothing

check_unique_target_info(columns_to_check)

Raise an exception if the target info is not unique across the given columns.

Parameters: columns_to_check – the columns to check to ensure the target info is unique
Returns: Nothing
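
The uniqueness checks can be illustrated with a small stand-alone sketch over a list of row dicts. The helper check_unique_sketch is hypothetical, and raising ValueError is an assumption, since the exception type is not specified above.

```python
from collections import Counter


def check_unique_sketch(rows, columns_to_check):
    """Raise ValueError if any combination of values in columns_to_check repeats."""
    keys = [tuple(row[col] for col in columns_to_check) for row in rows]
    duplicates = [key for key, count in Counter(keys).items() if count > 1]
    if duplicates:
        raise ValueError(
            f"duplicate target info for columns {columns_to_check}: {duplicates}"
        )


rows = [
    {"target_name": "t1", "fwd_primer": "ACGT", "rev_primer": "TTGA"},
    {"target_name": "t2", "fwd_primer": "ACGT", "rev_primer": "CCAT"},
    {"target_name": "t3", "fwd_primer": "ACGT", "rev_primer": "TTGA"},
]

# Passes silently: every target name is distinct.
check_unique_sketch(rows, ["target_name"])

try:
    check_unique_sketch(rows, ["fwd_primer", "rev_primer"])
    duplicate_found = False
except ValueError:
    # t1 and t3 share both primer sequences.
    duplicate_found = True
```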

summarise_targets_missing_optional_info()

Warn about targets missing optional location fields.

For each of insert, forward-primer, and reverse-primer locations that was requested, finds targets with empty coordinate fields and emits a warning. Targets listed here are skipped when their location block is built.

Returns: a tuple (missing_insert_loc, missing_fwd_primer_loc, missing_rev_primer_loc); each element is a list of target names, or None if that location type was not requested
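
As a rough sketch of how one element of that tuple could be computed, the hypothetical helper below scans row dicts for empty coordinates. It treats None and the empty string as empty; a dataframe-based implementation would more likely test for missing values such as NaN, which is not modelled here.

```python
import warnings


def targets_missing_location(rows, name_col, start_col, end_col):
    """Return names of targets lacking a start or end coordinate.

    Returns None when this location type was not requested, i.e. when
    either column name is None (hypothetical helper).
    """
    if start_col is None or end_col is None:
        return None
    missing = [
        row[name_col]
        for row in rows
        # 0 is a valid 0-based coordinate, so compare against None and "" only.
        if row.get(start_col) in (None, "") or row.get(end_col) in (None, "")
    ]
    if missing:
        warnings.warn(f"targets missing {start_col}/{end_col}: {missing}")
    return missing


rows = [
    {"target_name": "t1", "ins_start": 0, "ins_end": 150},
    {"target_name": "t2", "ins_start": None, "ins_end": 300},
]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    missing_insert = targets_missing_location(rows, "target_name", "ins_start", "ins_end")
    not_requested = targets_missing_location(rows, "target_name", None, None)
```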

pmotools.pmo_builder.panel_information_to_pmo.check_genome_info(genome_info)

Validate that genome info contains the required keys.

Accepts either a single genome dict or a list of them, and checks each for the keys name, genome_version, taxon_id, and url.

Parameters:

genome_info – a genome dict or list of genome dicts

Raises:

TypeError – if genome_info is not a dict or list, or a list element is not a dict
ValueError – if the list is empty or any entry is missing required keys

Returns:

Nothing
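
The validation rules above can be sketched as follows. The function check_genome_info_sketch is a stand-in written for illustration, not the library's code, and the genome values are placeholders.

```python
REQUIRED_GENOME_KEYS = ("name", "genome_version", "taxon_id", "url")


def check_genome_info_sketch(genome_info):
    """Validate one genome dict or a non-empty list of genome dicts."""
    if isinstance(genome_info, dict):
        entries = [genome_info]
    elif isinstance(genome_info, list):
        if not genome_info:
            raise ValueError("genome_info list is empty")
        entries = genome_info
    else:
        raise TypeError("genome_info must be a dict or a list of dicts")
    for position, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise TypeError(f"genome_info[{position}] is not a dict")
        missing = [key for key in REQUIRED_GENOME_KEYS if key not in entry]
        if missing:
            raise ValueError(f"genome_info[{position}] is missing keys: {missing}")


# Placeholder values, chosen only to satisfy the required keys.
genome = {
    "name": "example_genome",
    "genome_version": "v1",
    "taxon_id": 1,
    "url": "https://example.org/genome.fasta",
}
check_genome_info_sketch(genome)    # single dict is accepted
check_genome_info_sketch([genome])  # list of dicts is accepted

try:
    check_genome_info_sketch({"name": "incomplete"})
    error_message = None
except ValueError as err:
    error_message = str(err)
```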

pmotools.pmo_builder.panel_information_to_pmo.merge_panel_info_dicts(panel_info_dicts: list[dict]) → dict

Merge multiple panel_info dictionaries produced by panel_info_table_to_pmo.

Target lists are concatenated (deduplicated by target_name) and all genome references are collapsed so that genome identifiers remain valid across the merged structure.

Parameters: panel_info_dicts – a list of panel_info dicts, each with target_info and panel_info (and optionally targeted_genomes)
Raises: ValueError – if the list is empty, a dict lacks target_info, or a target has location data without accompanying targeted_genomes
Returns: a merged dict with panel_info and target_info keys, plus targeted_genomes if any genomes were present
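
To illustrate only the deduplication-by-target_name step, here is a simplified sketch. It assumes the first occurrence of a name wins, and it returns an old-to-new index mapping per input so that index-based reaction references could be rewritten. Both choices, and the helper name merge_target_lists, are assumptions; genome collapsing is left out entirely.

```python
def merge_target_lists(target_lists):
    """Concatenate target lists, keeping the first dict seen per target_name.

    Returns the merged list and, for each input list, a dict mapping the
    old index of every target to its index in the merged list.
    """
    merged = []
    index_by_name = {}
    remaps = []
    for targets in target_lists:
        remap = {}
        for old_index, target in enumerate(targets):
            name = target["target_name"]
            if name not in index_by_name:
                index_by_name[name] = len(merged)
                merged.append(target)
            remap[old_index] = index_by_name[name]
        remaps.append(remap)
    return merged, remaps


panel_a = [{"target_name": "t1"}, {"target_name": "t2"}]
panel_b = [{"target_name": "t2"}, {"target_name": "t3"}]
merged, remaps = merge_target_lists([panel_a, panel_b])
merged_names = [target["target_name"] for target in merged]
```

Here t2 appears in both inputs but only once in the merged list, and the second mapping points panel_b's index 0 at merged index 1.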

pmotools.pmo_builder.panel_information_to_pmo.panel_info_table_to_pmo(target_table: DataFrame, panel_name: str, genome_info: dict | list | None = None, target_name_col: str = 'target_name', forward_primers_seq_col: str = 'fwd_primer', reverse_primers_seq_col: str = 'rev_primer', reaction_name_col: str | None = None, reaction_name_col_delimiter: str = ',', forward_primers_start_col: str | None = None, forward_primers_end_col: str | None = None, reverse_primers_start_col: str | None = None, reverse_primers_end_col: str | None = None, insert_start_col: str | None = None, insert_end_col: str | None = None, chrom_col: str | None = None, strand_col: str | None = None, ref_seq_col: str | None = None, gene_name_col: str | None = None, genome_id_col: str | None = None, target_attributes_col: str | None = None, target_attributes_col_delimiter: str = ',', additional_target_info_cols: list | None = None)

Convert a dataframe containing panel information into a dictionary of target and reference information.

Parameters:

target_table (pd.DataFrame) – the dataframe containing the target information
panel_name (str) – the panel ID assigned to the panel
genome_info (dict or list, optional) – reference genome information, needed if the target info contains genome location
target_name_col (str) – the name of the column containing the target IDs. Default: target_name
forward_primers_seq_col (str) – the name of the column containing the sequence of the forward primer. Default: fwd_primer
reverse_primers_seq_col (str) – the name of the column containing the sequence of the reverse primer. Default: rev_primer
reaction_name_col (str, optional) – the name of the column containing which reaction the target was part of. By default they will all be put in one reaction.
reaction_name_col_delimiter (str) – the delimiter used to split the reaction name column into multiple reactions. Default is a comma.
forward_primers_start_col (str, optional) – the name of the column containing the 0-based start coordinate of the forward primer
forward_primers_end_col (str, optional) – the name of the column containing the 0-based end coordinate of the forward primer
reverse_primers_start_col (str, optional) – the name of the column containing the 0-based start coordinate of the reverse primer
reverse_primers_end_col (str, optional) – the name of the column containing the 0-based end coordinate of the reverse primer
insert_start_col (str, optional) – the name of the column containing the 0-based start coordinate of the insert
insert_end_col (str, optional) – the name of the column containing the 0-based end coordinate of the insert
chrom_col (str, optional) – the name of the column containing the chromosome for the target
gene_name_col (str, optional) – the name of the column containing the gene id
strand_col (str, optional) – the name of the column containing the strand for the target
ref_seq_col (str, optional) – the name of the column containing the reference sequence for the insert
target_attributes_col (str, optional) – the name of the column containing the attribute classifications for each target
target_attributes_col_delimiter (str) – the delimiter used to split the target attributes column into multiple attributes. Default is a comma.
genome_id_col (str, optional) – the name of the column containing the genome ID for each target; if omitted, a genome ID of 0 is used
additional_target_info_cols (list, optional) – a list of additional column names to copy verbatim into each target information dictionary

Returns:

a dict of the panel information, containing target_info and panel_info (and optionally targeted_genomes)

Return type:

dict