pmotools.pmo_builder.panel_information_to_pmo module
- class pmotools.pmo_builder.panel_information_to_pmo.PMOPanelBuilder(target_table: DataFrame, panel_name: str, target_name_col: str = 'target_name', forward_primers_seq_col: str = 'fwd_primer', reverse_primers_seq_col: str = 'rev_primer', reaction_name_col: str | None = None, reaction_name_col_delimiter: str = ',', forward_primers_start_col: int | None = None, forward_primers_end_col: int | None = None, reverse_primers_start_col: int | None = None, reverse_primers_end_col: int | None = None, insert_start_col: int | None = None, insert_end_col: int | None = None, chrom_col: str | None = None, strand_col: str | None = None, ref_seq_col: str | None = None, gene_name_col: str | None = None, target_attributes_col: str | None = None, target_attributes_col_delimiter: str = ',', additional_target_info_cols: list | None = None)[source]
Bases: object
Build PMO target_info and panel_info structures from a target table.
Wraps a dataframe of one-row-per-target panel data and converts it into the nested dictionaries a PMO expects. Most users should call panel_info_table_to_pmo() instead of using this class directly.
- Parameters:
target_table – dataframe with one row per target
panel_name – name assigned to the panel
target_name_col – column holding the target names. Default: target_name
forward_primers_seq_col – column holding the forward primer sequence. Default: fwd_primer
reverse_primers_seq_col – column holding the reverse primer sequence. Default: rev_primer
reaction_name_col – optional column naming which reaction each target belongs to; if omitted, all targets go in a single reaction
reaction_name_col_delimiter – delimiter splitting the reaction column into multiple reactions. Default: ,
forward_primers_start_col – optional column with the 0-based forward primer start
forward_primers_end_col – optional column with the 0-based forward primer end
reverse_primers_start_col – optional column with the 0-based reverse primer start
reverse_primers_end_col – optional column with the 0-based reverse primer end
insert_start_col – optional column with the 0-based insert start
insert_end_col – optional column with the 0-based insert end
chrom_col – optional chromosome column; required if any location columns are set
strand_col – optional strand column
ref_seq_col – optional reference-sequence column for the insert
gene_name_col – optional gene-name column
target_attributes_col – optional column of target attribute classifications
target_attributes_col_delimiter – delimiter splitting the attributes column into multiple attributes. Default: ,
additional_target_info_cols – optional list of extra column names to copy verbatim into each target dict
- build_panel_info_dict(targets_dict)[source]
Build the panel_info dictionary, grouping targets into reactions.
If no reaction column was configured, all targets are placed in a single reaction named full.
- Parameters:
targets_dict – the target_info list from build_target_info_dict()
- Returns:
a panel_info dictionary with panel_name and reactions, where each reaction lists target indices into targets_dict
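The grouping described above can be illustrated with a small standalone sketch. It is not the library's implementation: the helper name, the `reaction` field stored on each target, and the output keys `reaction_name` and `panel_targets` are all hypothetical, chosen only to show how a delimited reaction value maps to lists of target indices.

```python
def group_targets_into_reactions(targets, panel_name, reaction_key=None, delimiter=","):
    """Group target indices by reaction name (illustrative sketch only)."""
    reactions = {}
    for index, target in enumerate(targets):
        if reaction_key is None:
            # No reaction column configured: everything goes in one reaction.
            names = ["full"]
        else:
            names = [n.strip() for n in str(target[reaction_key]).split(delimiter)]
        for name in names:
            reactions.setdefault(name, []).append(index)
    return {
        "panel_name": panel_name,
        # "reaction_name" and "panel_targets" are hypothetical key names.
        "reactions": [
            {"reaction_name": name, "panel_targets": indices}
            for name, indices in reactions.items()
        ],
    }


targets = [
    {"target_name": "t1", "reaction": "pool1"},
    {"target_name": "t2", "reaction": "pool1,pool2"},
    {"target_name": "t3", "reaction": "pool2"},
]
grouped = group_targets_into_reactions(targets, "example_panel", reaction_key="reaction")
single = group_targets_into_reactions(targets, "example_panel")
```

Here `t2` appears in both reactions because its reaction value contains the delimiter, and `single` holds one reaction named `full` covering every index.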
- build_target_info_dict(genome_id_col: str | None = None)[source]
Build the list of target_info dictionaries from the target table.
Validates target-name uniqueness and primer/location uniqueness, then assembles one dict per target including primer sequences and, where available, insert and primer genomic locations.
- Parameters:
genome_id_col – optional column holding the genome id for each target; if omitted, a genome_id of 0 is used
- Returns:
a list of target_info dictionaries
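For readers who do use the class directly, the two build methods are meant to be chained: the list returned by build_target_info_dict() is the argument to build_panel_info_dict(). The sketch below assumes pandas and pmotools are installed and relies only on the signatures documented on this page; the primer sequences are made-up placeholders and the wrapper function name is hypothetical.

```python
# Placeholder data using the default column names; sequences are invented.
rows = {
    "target_name": ["t1", "t2"],
    "fwd_primer": ["ACGTACGTAC", "TTGACCATGA"],
    "rev_primer": ["GGATCCATGC", "CATGGTTAAC"],
}


def build_with_class():
    """Chain the two builder methods (hypothetical wrapper for illustration)."""
    # Imported inside the function so the data above can be inspected
    # even where pandas or pmotools is not installed.
    import pandas as pd
    from pmotools.pmo_builder.panel_information_to_pmo import PMOPanelBuilder

    builder = PMOPanelBuilder(pd.DataFrame(rows), panel_name="example_panel")
    target_info = builder.build_target_info_dict()
    panel_info = builder.build_panel_info_dict(target_info)
    return target_info, panel_info
```

In most cases panel_info_table_to_pmo() is the simpler entry point, as noted in the class description.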
- check_location_columns()[source]
Validate the optional genomic-location column configuration.
If any location column is set, enforces that chrom_col is present and that primer/insert start and end columns are supplied as pairs.
- Raises:
ValueError – if location columns are set inconsistently
- Returns:
the list of location columns if any were provided, otherwise None
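The pairing rule can be sketched as a standalone function. This is an approximation of the documented behaviour, not the method's source: the function name is hypothetical, and whether the real return value includes the chromosome column is not stated here, so the sketch returns only the start and end columns.

```python
def check_location_pairs(chrom_col=None, **location_cols):
    """Return the configured start/end columns, or None if none were given."""
    pairs = [
        ("forward_primers_start_col", "forward_primers_end_col"),
        ("reverse_primers_start_col", "reverse_primers_end_col"),
        ("insert_start_col", "insert_end_col"),
    ]
    configured = []
    for start_key, end_key in pairs:
        start = location_cols.get(start_key)
        end = location_cols.get(end_key)
        # Exactly one of the pair being set is an inconsistent configuration.
        if (start is None) != (end is None):
            raise ValueError(f"{start_key} and {end_key} must be set together")
        if start is not None:
            configured.extend([start, end])
    if not configured:
        return None
    if chrom_col is None:
        raise ValueError("chrom_col is required when location columns are set")
    return configured


no_locations = check_location_pairs()
insert_only = check_location_pairs(
    chrom_col="chrom", insert_start_col="insert_start", insert_end_col="insert_end"
)
```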
- check_target_names_are_unique()[source]
Raise an exception if the target names are not unique
- Returns:
Nothing
- check_unique_target_info(columns_to_check)[source]
Raise an exception if the target info is not unique
- Parameters:
columns_to_check – the columns to check to ensure the target info is unique
- Returns:
Nothing
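Both uniqueness checks amount to looking for repeated values across one or more columns. A minimal sketch over plain dictionaries follows; the helper names are hypothetical, and raising ValueError is an assumption, since the exception type is not stated above.

```python
from collections import Counter


def find_duplicate_rows(rows, columns_to_check):
    """Return the value combinations that occur more than once."""
    keys = [tuple(row[col] for col in columns_to_check) for row in rows]
    counts = Counter(keys)
    return [key for key, count in counts.items() if count > 1]


def check_unique(rows, columns_to_check):
    """Raise if any combination of the given columns is repeated."""
    duplicates = find_duplicate_rows(rows, columns_to_check)
    if duplicates:
        # ValueError is an assumed exception type for this sketch.
        raise ValueError(f"non-unique values for {columns_to_check}: {duplicates}")


rows = [
    {"target_name": "t1", "fwd_primer": "ACGT", "rev_primer": "TTGA"},
    {"target_name": "t2", "fwd_primer": "ACGT", "rev_primer": "TTGA"},
]
name_duplicates = find_duplicate_rows(rows, ["target_name"])
primer_duplicates = find_duplicate_rows(rows, ["fwd_primer", "rev_primer"])
```

The two rows have distinct names but share a primer pair, so only the second check reports a duplicate.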
- summarise_targets_missing_optional_info()[source]
Warn about targets missing optional location fields.
For each of insert, forward-primer, and reverse-primer locations that was requested, finds targets with empty coordinate fields and emits a warning. Targets listed here are skipped when their location block is built.
- Returns:
a tuple (missing_insert_loc, missing_fwd_primer_loc, missing_rev_primer_loc); each element is a list of target names, or None if that location type was not requested
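The "list of names, or None if not requested" convention can be sketched for a single location type as below. The helper names are hypothetical, and treating None, empty strings, and NaN as "empty" is an assumption about what the real method considers missing.

```python
import math
import warnings


def is_empty(value):
    """Treat None, empty strings, and NaN as missing (an assumption)."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def targets_missing_fields(rows, name_col, fields, label):
    """Return names of targets lacking any of `fields`; None if not requested."""
    if not fields:
        return None
    missing = [
        row[name_col]
        for row in rows
        if any(is_empty(row.get(field)) for field in fields)
    ]
    if missing:
        warnings.warn(f"{len(missing)} target(s) missing {label} location: {missing}")
    return missing


rows = [
    {"target_name": "t1", "insert_start": 100, "insert_end": 250},
    {"target_name": "t2", "insert_start": None, "insert_end": None},
    {"target_name": "t3", "insert_start": float("nan"), "insert_end": 300},
]
missing_insert = targets_missing_fields(
    rows, "target_name", ["insert_start", "insert_end"], "insert"
)
missing_fwd = targets_missing_fields(rows, "target_name", [], "forward primer")
```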
- pmotools.pmo_builder.panel_information_to_pmo.check_genome_info(genome_info)[source]
Validate that genome info contains the required keys.
Accepts either a single genome dict or a list of them, and checks each for the keys name, genome_version, taxon_id, and url.
- Parameters:
genome_info – a genome dict or list of genome dicts
- Raises:
TypeError – if genome_info is not a dict or list, or a list element is not a dict
ValueError – if the list is empty or any entry is missing required keys
- Returns:
Nothing
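The validation rules listed above translate fairly directly into code. The following is a standalone sketch of those rules under a hypothetical function name, not the library's source; the example genome values are placeholders.

```python
REQUIRED_GENOME_KEYS = ("name", "genome_version", "taxon_id", "url")


def validate_genome_info(genome_info):
    """Check a genome dict, or a list of them, for the required keys."""
    if isinstance(genome_info, dict):
        entries = [genome_info]
    elif isinstance(genome_info, list):
        if not genome_info:
            raise ValueError("genome_info list is empty")
        entries = genome_info
    else:
        raise TypeError("genome_info must be a dict or a list of dicts")
    for entry in entries:
        if not isinstance(entry, dict):
            raise TypeError("each genome_info element must be a dict")
        missing = [key for key in REQUIRED_GENOME_KEYS if key not in entry]
        if missing:
            raise ValueError(f"genome_info entry is missing keys: {missing}")


# Placeholder values; only the key names come from the description above.
example_genome = {
    "name": "example_genome",
    "genome_version": "v1",
    "taxon_id": 0,
    "url": "https://example.org/genome.fasta",
}
validate_genome_info(example_genome)
validate_genome_info([example_genome])
```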
- pmotools.pmo_builder.panel_information_to_pmo.merge_panel_info_dicts(panel_info_dicts: list[dict]) → dict[source]
Merge multiple panel_info dictionaries produced by panel_info_table_to_pmo.
Target lists are concatenated (deduplicated by target_name) and all genome references are collapsed so that genome identifiers remain valid across the merged structure.
- Parameters:
panel_info_dicts – a list of panel_info dicts, each with target_info and panel_info (and optionally targeted_genomes)
- Raises:
ValueError – if the list is empty, a dict lacks target_info, or a target has location data without accompanying targeted_genomes
- Returns:
a merged dict with panel_info and target_info keys, plus targeted_genomes if any genomes were present
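The deduplication step alone can be sketched as follows. This covers only the "concatenate and deduplicate by target_name" part; the genome-reference handling is not shown. The helper name is hypothetical, and keeping the first occurrence of a repeated target is an assumption.

```python
def merge_target_lists(target_lists):
    """Concatenate target lists, keeping the first target seen for each name."""
    merged = []
    index_by_name = {}
    for targets in target_lists:
        for target in targets:
            name = target["target_name"]
            if name not in index_by_name:
                # Record where this target lands in the merged list so that
                # reaction indices could be remapped afterwards.
                index_by_name[name] = len(merged)
                merged.append(target)
    return merged, index_by_name


panel_a = [{"target_name": "t1"}, {"target_name": "t2"}]
panel_b = [{"target_name": "t2"}, {"target_name": "t3"}]
merged, index_by_name = merge_target_lists([panel_a, panel_b])
```

Because reactions refer to targets by index, any merge of this kind also has to translate old indices to positions in the merged list, which is what `index_by_name` would support.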
- pmotools.pmo_builder.panel_information_to_pmo.panel_info_table_to_pmo(target_table: DataFrame, panel_name: str, genome_info: dict | list | None = None, target_name_col: str = 'target_name', forward_primers_seq_col: str = 'fwd_primer', reverse_primers_seq_col: str = 'rev_primer', reaction_name_col: str | None = None, reaction_name_col_delimiter: str = ',', forward_primers_start_col: str | None = None, forward_primers_end_col: str | None = None, reverse_primers_start_col: str | None = None, reverse_primers_end_col: str | None = None, insert_start_col: str | None = None, insert_end_col: str | None = None, chrom_col: str | None = None, strand_col: str | None = None, ref_seq_col: str | None = None, gene_name_col: str | None = None, genome_id_col: str | None = None, target_attributes_col: str | None = None, target_attributes_col_delimiter: str = ',', additional_target_info_cols: list | None = None)[source]
Convert a dataframe containing panel information into a dictionary of targets and reference information.
- Parameters:
target_table (pd.DataFrame) – the dataframe containing the target information
panel_name (str) – the panel ID assigned to the panel
genome_info (dict or list, optional) – reference genome information, needed if the target info contains genome location
target_name_col (str) – the name of the column containing the target IDs. Default: target_name
forward_primers_seq_col (str) – the name of the column containing the sequence of the forward primer. Default: fwd_primer
reverse_primers_seq_col (str) – the name of the column containing the sequence of the reverse primer. Default: rev_primer
reaction_name_col (str, optional) – the name of the column containing which reaction the target was part of. By default they will all be put in one reaction.
reaction_name_col_delimiter (str) – the delimiter used to split the reaction name column into multiple reactions. Default is a comma.
forward_primers_start_col (str, optional) – the name of the column containing the 0-based start coordinate of the forward primer
forward_primers_end_col (str, optional) – the name of the column containing the 0-based end coordinate of the forward primer
reverse_primers_start_col (str, optional) – the name of the column containing the 0-based start coordinate of the reverse primer
reverse_primers_end_col (str, optional) – the name of the column containing the 0-based end coordinate of the reverse primer
insert_start_col (str, optional) – the name of the column containing the 0-based start coordinate of the insert
insert_end_col (str, optional) – the name of the column containing the 0-based end coordinate of the insert
chrom_col (str, optional) – the name of the column containing the chromosome for the target
gene_name_col (str, optional) – the name of the column containing the gene id
strand_col (str, optional) – the name of the column containing the strand for the target
ref_seq_col (str, optional) – the name of the column containing the reference sequence for the insert
target_attributes_col (str, optional) – the name of the column containing the classification types (attributes) for the primer target
target_attributes_col_delimiter (str) – the delimiter used to split the target attributes column into multiple attributes. Default is a comma.
genome_id_col (str, optional) – the name of the column containing the genome ID for each target; if omitted, a genome ID of 0 is used
additional_target_info_cols (list, optional) – a list of additional column names to copy verbatim into each target information dictionary
- Returns:
a dict of the panel information
- Return type:
dict
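A minimal call might look like the sketch below, which uses the default column names plus an optional reaction column and no genomic locations, so no genome_info is passed. It assumes pandas and pmotools are installed; the primer sequences, the `reaction` column name, and the wrapper function are illustrative placeholders rather than anything defined by the library.

```python
# Placeholder panel data; sequences and reaction names are invented.
rows = {
    "target_name": ["t1", "t2", "t3"],
    "fwd_primer": ["ACGTACGTAC", "TTGACCATGA", "GGCATTAGCA"],
    "rev_primer": ["GGATCCATGC", "CATGGTTAAC", "TTAGGCATCC"],
    "reaction": ["pool1", "pool1,pool2", "pool2"],
}


def build_example_panel():
    """Build panel information from the placeholder table (hypothetical wrapper)."""
    # Imported inside the function so the data above can be inspected
    # even where pandas or pmotools is not installed.
    import pandas as pd
    from pmotools.pmo_builder.panel_information_to_pmo import panel_info_table_to_pmo

    target_table = pd.DataFrame(rows)
    return panel_info_table_to_pmo(
        target_table,
        panel_name="example_panel",
        reaction_name_col="reaction",  # split on the default "," delimiter
    )
```

If location columns such as insert_start_col and insert_end_col were added, chrom_col and genome_info would also need to be supplied, as described in the parameter list.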