pmotools.pmo_engine.pmo_processor module
- class pmotools.pmo_engine.pmo_processor.PMOProcessor[source]
Bases: object
A class to extract info out of a loaded PMO object
- static count_library_samples_per_target(pmodata, min_reads: float = 0.0, collapse_across_runs: bool = False) DataFrame[source]
Count the number of library samples per target, optionally collapsing across bioinformatics runs.
- Parameters:
pmodata – the loaded PMO
min_reads – the minimum number of reads for a target in order for it to be counted
collapse_across_runs – if True, sums across bioinformatics_run_id per target
- Returns:
a pandas DataFrame:
if collapse_across_runs=False: columns are bioinformatics_run_id, target_name, sample_count
if collapse_across_runs=True: columns are target_name, sample_count
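The effect of collapse_across_runs can be illustrated on a toy table. This is a minimal sketch in plain Python, not the pmotools implementation: the rows and the helper collapse_across_runs_sketch are hypothetical, and it assumes that collapsing simply sums sample_count over bioinformatics_run_id for each target, as the parameter description states.

```python
from collections import defaultdict

# Hypothetical per-run rows, shaped like the collapse_across_runs=False output
per_run_rows = [
    {"bioinformatics_run_id": 0, "target_name": "t1", "sample_count": 3},
    {"bioinformatics_run_id": 1, "target_name": "t1", "sample_count": 2},
    {"bioinformatics_run_id": 0, "target_name": "t2", "sample_count": 4},
]


def collapse_across_runs_sketch(rows):
    """Sum sample_count per target_name, dropping bioinformatics_run_id."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["target_name"]] += row["sample_count"]
    # Sort by target name so the result order is deterministic
    return [
        {"target_name": name, "sample_count": count}
        for name, count in sorted(totals.items())
    ]


collapsed = collapse_across_runs_sketch(per_run_rows)
```

Here t1 ends up with a single row whose sample_count is the sum over both runs.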
- static count_specimen_by_field_value(pmodata, meta_fields: list[str]) DataFrame[source]
Count the values of the meta fields. If a specimen doesn’t have a field, it is marked as ‘NA’. Groups are combinations of all given meta fields.
- Parameters:
pmodata – the pmo to count from
meta_fields (list[str]) – a list of meta fields to count
- Returns:
counts for all sub-field groups, with metadata
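The grouping rule described above (missing fields become 'NA', groups are combinations of all requested fields) can be sketched as follows. This is an illustration only: it assumes pmodata["specimen_info"] is a list of dictionaries, the toy data is made up, and count_by_fields_sketch is a hypothetical helper that returns a Counter rather than the DataFrame the real method returns.

```python
from collections import Counter

# Hypothetical stand-in for a loaded PMO; real PMO files carry many more fields
toy_pmo = {
    "specimen_info": [
        {"specimen_name": "spec1", "country": "Kenya", "year": "2020"},
        {"specimen_name": "spec2", "country": "Kenya"},  # no "year" field
        {"specimen_name": "spec3", "country": "Mali", "year": "2020"},
    ]
}


def count_by_fields_sketch(pmo, meta_fields):
    """Count specimens per combination of meta_fields, using 'NA' when a field is absent."""
    return Counter(
        tuple(str(specimen.get(field, "NA")) for field in meta_fields)
        for specimen in pmo["specimen_info"]
    )


group_counts = count_by_fields_sketch(toy_pmo, ["country", "year"])
```

The specimen without a year is counted under the group ("Kenya", "NA") rather than being dropped.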
- static count_specimen_per_meta_fields(pmodata) DataFrame[source]
Get a pandas dataframe of counts of the meta fields within the specimen_info section
- Parameters:
pmodata – the pmo to count from
- Returns:
a pandas dataframe of counts with the following columns: field, present_in_specimens_count, total_specimen_count
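The three output columns can be read as "in how many specimens does this field occur, out of how many specimens in total". A minimal sketch under the assumption that pmodata["specimen_info"] is a list of dictionaries; the toy data and count_fields_sketch are hypothetical, and plain dictionaries stand in for DataFrame rows.

```python
from collections import Counter

# Hypothetical stand-in for a loaded PMO
toy_pmo = {
    "specimen_info": [
        {"specimen_name": "spec1", "country": "Kenya", "year": "2020"},
        {"specimen_name": "spec2", "country": "Kenya"},
        {"specimen_name": "spec3", "country": "Mali", "year": "2020"},
    ]
}


def count_fields_sketch(pmo):
    """Return one row per meta field with its presence count and the specimen total."""
    specimens = pmo["specimen_info"]
    present = Counter(field for specimen in specimens for field in specimen)
    total = len(specimens)
    return [
        {
            "field": field,
            "present_in_specimens_count": count,
            "total_specimen_count": total,
        }
        for field, count in sorted(present.items())
    ]


field_rows = count_fields_sketch(toy_pmo)
```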
- static count_targets_per_library_sample(pmodata, min_reads: float = 0.0) DataFrame[source]
Count the number of targets per library sample.
- Parameters:
pmodata – the loaded PMO
min_reads – a minimum number of reads for a target in order for it to be counted
- Returns:
a pandas DataFrame, columns = [bioinformatics_run_id, library_sample_name, target_number]
- static count_targets_per_panel(pmodata) DataFrame[source]
Count the targets per panel.
- Parameters:
pmodata – the pmo to count from
- Returns:
counts for each panel
- static extract_allele_counts_freq_from_pmo(pmodata, bioinformatics_run_ids: list[int] = None, library_sample_names: list[str] = None, target_names: list[str] = None, collapse_across_runs: bool = False) DataFrame[source]
Extract allele counts from PMO data into a single DataFrame.
- Parameters:
pmodata – the pmo data structure
bioinformatics_run_ids – optional list of bioinformatics_run_ids to include
library_sample_names – optional list of library_sample_names to include
target_names – optional list of target_names to include
collapse_across_runs – whether to collapse count/freqs across bioinformatics_run_id runs
- Returns:
DataFrame with columns: bioinformatics_run_id (if not collapsing), target_name, mhap_id, count, freq, target_total
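The relationship between the count, freq and target_total columns is not spelled out above. The sketch below assumes the most natural reading, namely that target_total is the sum of count over all microhaplotypes of a target and that freq is count divided by target_total. The counts are made up and the code does not call pmotools.

```python
# Hypothetical read counts per microhaplotype id for a single target
counts_for_target = {0: 30, 1: 10}

# Assumption: target_total is the sum of counts across the target's microhaplotypes
target_total = sum(counts_for_target.values())

allele_rows = [
    {
        "target_name": "t1",
        "mhap_id": mhap_id,
        "count": count,
        "freq": count / target_total,  # assumption: freq = count / target_total
        "target_total": target_total,
    }
    for mhap_id, count in counts_for_target.items()
]
```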
- static extract_from_pmo_samples_with_meta_groupings(pmodata, meta_fields_values: str)[source]
Extract out of a PMO the data associated with specimens that belong to specific meta data groupings
- Parameters:
pmodata – the PMO to extract from
meta_fields_values – the meta fields to include, given either as a table with the columns field and values (comma-separated values), and optionally group, or as a command-line string of the form field1=value1,value2,value3:field2=value1,value2;field1=value5,value6, where groups are separated by semicolons and the fields within a group by colons
- Returns:
a tuple of (filtered PMO, group counts dataframe)
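The command-line form of meta_fields_values can be unpacked as shown below. This is a sketch of the string format only: parse_meta_fields_values_sketch is a hypothetical helper, not a pmotools function, and it assumes (from the example string above) that colons separate the fields within a group.

```python
def parse_meta_fields_values_sketch(spec: str) -> list[dict[str, list[str]]]:
    """Split 'f1=a,b:f2=c;f1=d' into one {field: [values]} dict per group."""
    groups = []
    for group_text in spec.split(";"):  # groups are separated by semicolons
        group = {}
        for field_text in group_text.split(":"):  # fields within a group by colons
            field, _, values = field_text.partition("=")
            group[field] = values.split(",")  # values are comma-separated
        groups.append(group)
    return groups


parsed_groups = parse_meta_fields_values_sketch(
    "country=Kenya,Mali:year=2020;country=Ghana"
)
```

The example string yields two groups: one restricted by both country and year, and one restricted by country alone.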
- static extract_from_pmo_with_read_filter(pmodata, read_filter: float)[source]
Extract out data from the PMO with inconclusive read filter
- Parameters:
pmodata – the pmo to extract data from
read_filter – the read filter to use, inconclusive filter
- Returns:
a new pmodata containing only the detected microhaplotypes above this read filter
- static filter_pmo_by_library_sample_ids(pmodata, library_sample_ids: set[int])[source]
Extract out of a loaded PMO the data associated with select library_sample_ids
- Parameters:
pmodata – the loaded PMO
library_sample_ids – the library_sample_ids to extract the info for
- Returns:
a new PMO with only the data associated with the supplied library_sample_ids
- static filter_pmo_by_library_sample_names(pmodata, library_sample_names: set[str])[source]
Filters pmodata by library sample names
- Parameters:
pmodata – the pmodata object
library_sample_names – set of library sample names, will be converted into indexes to extract out
- Returns:
filtered pmodata object containing only the data for the supplied library sample names
- static filter_pmo_by_specimen_ids(pmodata, specimen_ids: set[int])[source]
Extract out of a loaded PMO the data associated with select specimen_ids
- Parameters:
pmodata – the loaded PMO
specimen_ids – the specimen_ids to extract the info for
- Returns:
a new PMO with only the data associated with the supplied specimen_ids
- static filter_pmo_by_specimen_names(pmodata, specimen_names: set[str])[source]
Extract out of a loaded PMO the data associated with select specimen_names
- Parameters:
pmodata – the loaded PMO
specimen_names – the specimen_names to extract the info for
- Returns:
a new PMO with only the data associated with the supplied specimen_names
- static filter_pmo_by_target_ids(pmodata, target_ids: set[int])[source]
Extract out data from the PMO for only select target IDs
- Parameters:
pmodata – the pmo to extract data from
target_ids – the target_ids to extract
- Returns:
a new pmo with the data for only the targets supplied
- static filter_pmo_by_target_names(pmodata, target_names: set[str])[source]
Extract out data from the PMO for only select target names
- Parameters:
pmodata – the pmo to extract data from
target_names – the target_names to extract
- Returns:
a new pmo with the data for only the targets supplied
- static get_bioinformatics_run_names(pmodata) list[str][source]
Get a list of bioinformatics_run_names in pmodata[“bioinformatics_run_info”] in the order they appear
- Parameters:
pmodata – the PMO to get bioinformatics_run_names from
- Returns:
a list of all bioinformatics_run_names
- static get_index_key_of_bioinformatics_run_names(pmodata)[source]
Get key of bioinformatics_run_name to index in pmodata[“bioinformatics_run_info”]
- Parameters:
pmodata – the PMO to get indexes from
- Returns:
a dictionary of indexes keyed by bioinformatics_run_name
- static get_index_key_of_library_sample_names(pmodata)[source]
Get key of library_sample_name to index in pmodata[“library_sample_info”]
- Parameters:
pmodata – the PMO to get indexes from
- Returns:
a dictionary of indexes keyed by library_sample_name
- static get_index_key_of_panel_names(pmodata)[source]
Get key of panel_name to index in pmodata[“panel_info”]
- Parameters:
pmodata – the PMO to get indexes from
- Returns:
a dictionary of indexes keyed by panel_name
- static get_index_key_of_specimen_names(pmodata)[source]
Get key of specimen_name to index in pmodata[“specimen_info”]
- Parameters:
pmodata – the PMO to get indexes from
- Returns:
a dictionary of indexes keyed by specimen_name
- static get_index_key_of_target_in_representative_microhaplotypes(pmodata)[source]
Get key of target_name to index for the representative microhaplotypes for the target_name in pmodata[“representative_microhaplotypes”]
- Parameters:
pmodata – the PMO to get indexes from
- Returns:
a dictionary of indexes keyed by target_name
- static get_index_key_of_target_names(pmodata)[source]
Get key of target_name to index in pmodata[“target_info”]
- Parameters:
pmodata – the PMO to get indexes from
- Returns:
a dictionary of indexes keyed by target_name
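The get_index_key_of_* methods all describe the same idea: a dictionary from a name to its position within a section of the PMO. A minimal sketch, assuming the section is a list of records that each carry a name field; the toy data and index_key_of_names_sketch are hypothetical and do not reflect how pmotools builds these dictionaries internally.

```python
# Hypothetical stand-in for pmodata["target_info"]
toy_pmo = {
    "target_info": [
        {"target_name": "t3"},
        {"target_name": "t1"},
        {"target_name": "t2"},
    ]
}


def index_key_of_names_sketch(records, name_field):
    """Map each record's name to its position in the list."""
    return {record[name_field]: index for index, record in enumerate(records)}


target_index_key = index_key_of_names_sketch(toy_pmo["target_info"], "target_name")
```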
- static get_index_of_bioinformatics_run_names(pmodata, bioinformatics_run_names: list[str])[source]
Get index of bioinformatics_run_name in pmodata[“bioinformatics_run_info”]
- Parameters:
pmodata – the PMO to get indexes from
bioinformatics_run_names – a list of bioinformatics_run_names
- Returns:
the index of bioinformatics_run_names in pmodata[“bioinformatics_run_info”] returned in the same order as bioinformatics_run_names
- static get_index_of_library_sample_names(pmodata, library_sample_names: list[str])[source]
Get index of library_sample_name in pmodata[“library_sample_info”]
- Parameters:
pmodata – the PMO to get indexes from
library_sample_names – a list of library_sample_names
- Returns:
the index of library_sample_names in pmodata[“library_sample_info”] returned in the same order as library_sample_names
- static get_index_of_panel_names(pmodata, panel_names: list[str])[source]
Get index of panel_name in pmodata[“panel_info”]
- Parameters:
pmodata – the PMO to get indexes from
panel_names – a list of panel_names
- Returns:
the index of panel_names in pmodata[“panel_info”] returned in the same order as panel_names
- static get_index_of_specimen_names(pmodata, specimen_names: list[str])[source]
Get index of specimen_name in pmodata[“specimen_info”]
- Parameters:
pmodata – the PMO to get indexes from
specimen_names – a list of specimen_names
- Returns:
the index of specimen_names in pmodata[“specimen_info”] returned in the same order as specimen_names
- static get_index_of_target_in_representative_microhaplotypes(pmodata, target_names: list[str])[source]
Get index of target_name in pmodata[“representative_microhaplotypes”][“targets”]
- Parameters:
pmodata – the PMO to get indexes from
target_names – a list of target_names
- Returns:
the index of target_names in pmodata[“representative_microhaplotypes”][“targets”] returned in the same order as target_names
- static get_index_of_target_names(pmodata, target_names: list[str])[source]
Get index of target_name in pmodata[“target_info”]
- Parameters:
pmodata – the PMO to get indexes from
target_names – a list of target_names
- Returns:
the index of target_names in pmodata[“target_info”] returned in the same order as target_names
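The get_index_of_* methods return indexes in the order of the names passed in, not in the order of the PMO section. A sketch of that contract, assuming a list-of-records section; indexes_of_names_sketch is hypothetical, and its handling of unknown names (a KeyError) is an assumption that may differ from the real methods.

```python
# Hypothetical stand-in for pmodata["target_info"]
toy_pmo = {
    "target_info": [
        {"target_name": "t3"},
        {"target_name": "t1"},
        {"target_name": "t2"},
    ]
}


def indexes_of_names_sketch(records, name_field, names):
    """Return each requested name's index, preserving the order of `names`."""
    key = {record[name_field]: index for index, record in enumerate(records)}
    # Assumption: an unknown name raises KeyError here
    return [key[name] for name in names]


requested_indexes = indexes_of_names_sketch(
    toy_pmo["target_info"], "target_name", ["t2", "t3"]
)
```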
- static get_library_ids_for_specimen_ids(pmodata, specimen_ids: set[int])[source]
Get a dictionary that lists the library_ids for each specimen_id
- Parameters:
pmodata – the PMO to get indexes from
specimen_ids – a set of specimen_ids
- Returns:
a dictionary that lists the library_ids for a specimen_id
- static get_library_sample_names(pmodata) list[str][source]
Get a list of library_sample_names in pmodata[“library_sample_info”] in the order they appear
- Parameters:
pmodata – the PMO to get library_sample_names from
- Returns:
a list of all library_sample_names
- static get_panel_names(pmodata) list[str][source]
Get a list of panel_names in pmodata[“panel_info”] in the order they appear
- Parameters:
pmodata – the PMO to get panel_names from
- Returns:
a list of all panel_names
- static get_sorted_bioinformatics_run_names(pmodata) list[str][source]
Get a name sorted list of bioinformatics_run_names in pmodata[“bioinformatics_run_info”]
- Parameters:
pmodata – the PMO to get bioinformatics_run_names from
- Returns:
a list of all bioinformatics_run_names
- static get_sorted_library_sample_names(pmodata) list[str][source]
Get a name sorted list of library_sample_names in pmodata[“library_sample_info”]
- Parameters:
pmodata – the PMO to get library_sample_names from
- Returns:
a list of all library_sample_names
- static get_sorted_panel_names(pmodata) list[str][source]
Get a name sorted list of panel_names in pmodata[“panel_info”]
- Parameters:
pmodata – the PMO to get panel_names from
- Returns:
a list of all panel_names
- static get_sorted_specimen_names(pmodata) list[str][source]
Get a name sorted list of specimen_names in pmodata[“specimen_info”]
- Parameters:
pmodata – the PMO to get specimen_names from
- Returns:
a list of all specimen_names
- static get_sorted_target_names(pmodata) list[str][source]
Get a name sorted list of target_names in pmodata[“target_info”]
- Parameters:
pmodata – the PMO to get target_names from
- Returns:
a list of all target_names
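The get_*_names and get_sorted_*_names methods differ only in ordering: the former keep the order of the PMO section, the latter sort by name. A minimal sketch on hypothetical toy data, assuming the section is a list of records and that sorting is plain string sorting.

```python
# Hypothetical stand-in for pmodata["target_info"]
toy_pmo = {
    "target_info": [
        {"target_name": "t3"},
        {"target_name": "t1"},
        {"target_name": "t2"},
    ]
}

# Order of appearance in the section
names_in_order = [target["target_name"] for target in toy_pmo["target_info"]]

# Name-sorted variant (assumption: ordinary lexicographic string order)
names_sorted = sorted(names_in_order)
```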