pmotools.pmo_engine.pmo_processor module

class pmotools.pmo_engine.pmo_processor.PMOProcessor[source]

Bases: object

A collection of static methods for extracting information from a loaded PMO object.

static count_library_samples_per_target(pmodata, min_reads: float = 0.0, collapse_across_runs: bool = False) DataFrame[source]

Count the number of library samples per target, optionally collapsing across bioinformatics runs.

Parameters:
  • pmodata – the loaded PMO

  • min_reads – the minimum number of reads for a target in order for it to be counted

  • collapse_across_runs – if True, sums across bioinformatics_run_id per target

Returns:

a pandas dataframe:

  • if collapse_across_runs=False: columns are bioinformatics_run_id, target_name, sample_count

  • if collapse_across_runs=True: columns are target_name, sample_count
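
The effect of collapse_across_runs can be pictured with a small, library-independent sketch. The rows and the helper collapse_sample_counts below are hypothetical, and plain Python stands in for the pandas DataFrame that the method returns; the sketch only illustrates what "sums across bioinformatics_run_id per target" means.

```python
from collections import defaultdict

# Hypothetical per-run rows: (bioinformatics_run_id, target_name, sample_count)
per_run_rows = [
    (0, "t1", 3),
    (0, "t2", 1),
    (1, "t1", 2),
]


def collapse_sample_counts(rows):
    """Sum sample_count per target_name, ignoring the run id."""
    totals = defaultdict(int)
    for _run_id, target_name, sample_count in rows:
        totals[target_name] += sample_count
    return dict(totals)


collapsed = collapse_sample_counts(per_run_rows)
```

With collapse_across_runs=False the run id stays as a column, so the three input rows would be reported as they are.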

static count_specimen_by_field_value(pmodata, meta_fields: list[str]) DataFrame[source]

Count specimens by the values of the given meta fields. A specimen that lacks a field is recorded as 'NA' for that field. Groups are formed from the combinations of values across all given meta fields.

Parameters:
  • pmodata – the pmo to count from

  • meta_fields (list[str]) – a list of meta fields to count

Returns:

counts for all sub-field groups, with metadata
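
A minimal sketch of the grouping described above, assuming each specimen is a dictionary of meta fields. The records, the field names country and year, and the helper count_by_fields are hypothetical; the method itself returns a pandas DataFrame and may carry extra columns.

```python
from collections import Counter

# Hypothetical specimen records; the second one has no "year" field
specimens = [
    {"specimen_name": "s1", "country": "Kenya", "year": "2020"},
    {"specimen_name": "s2", "country": "Kenya"},
    {"specimen_name": "s3", "country": "Kenya", "year": "2020"},
]


def count_by_fields(specimen_records, meta_fields):
    """Count specimens per combination of meta field values, using 'NA' for missing fields."""
    return Counter(
        tuple(str(record.get(field, "NA")) for field in meta_fields)
        for record in specimen_records
    )


group_counts = count_by_fields(specimens, ["country", "year"])
```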

static count_specimen_per_meta_fields(pmodata) DataFrame[source]

Get a pandas dataframe of counts of the meta fields within the specimen_info section

Parameters:

pmodata – the pmo to count from

Returns:

a pandas dataframe of counts with the following columns: field, present_in_specimens_count, total_specimen_count

static count_targets_per_library_sample(pmodata, min_reads: float = 0.0) DataFrame[source]

Count the number of targets per library sample.

Parameters:
  • pmodata – the loaded PMO

  • min_reads – a minimum number of reads for a target in order for it to be counted

Returns:

a pandas DataFrame, columns = [bioinformatics_run_id, library_sample_name, target_number]

static count_targets_per_panel(pmodata) DataFrame[source]

Count the targets per panel.

Parameters:

pmodata – the pmo to count from

Returns:

counts for each panel

static extract_allele_counts_freq_from_pmo(pmodata, bioinformatics_run_ids: list[int] = None, library_sample_names: list[str] = None, target_names: list[str] = None, collapse_across_runs: bool = False) DataFrame[source]

Extract allele counts from PMO data into a single DataFrame.

Parameters:
  • pmodata – the pmo data structure

  • bioinformatics_run_ids – optional list of bioinformatics_run_ids to include

  • library_sample_names – optional list of library_sample_names to include

  • target_names – optional list of target_names to include

  • collapse_across_runs – whether to collapse count/freqs across bioinformatics_run_id runs

Returns:

DataFrame with columns: bioinformatics_run_id (if not collapsing), target_name, mhap_id, count, freq, target_total
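
The relationship between the count, freq, and target_total columns is not spelled out here. The sketch below assumes that target_total is the sum of count over all microhaplotypes of a target and that freq is count divided by target_total; both are assumptions, and the input rows and the helper add_freq are hypothetical.

```python
from collections import defaultdict

# Hypothetical allele counts: (target_name, mhap_id, count)
allele_counts = [("t1", 0, 6), ("t1", 1, 2), ("t2", 0, 4)]


def add_freq(rows):
    """Attach target_total and freq to each row (assumed: freq = count / target_total)."""
    target_totals = defaultdict(int)
    for target_name, _mhap_id, count in rows:
        target_totals[target_name] += count
    return [
        {
            "target_name": target_name,
            "mhap_id": mhap_id,
            "count": count,
            "freq": count / target_totals[target_name],
            "target_total": target_totals[target_name],
        }
        for target_name, mhap_id, count in rows
    ]


freq_rows = add_freq(allele_counts)
```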

static extract_from_pmo_samples_with_meta_groupings(pmodata, meta_fields_values: str)[source]

Extract from a PMO the data associated with specimens that belong to specific metadata groupings

Parameters:
  • pmodata – the PMO to extract from

  • meta_fields_values – the meta fields to include. Either a table with the columns field and values (comma-separated values), and optionally group, or a command-line string of the form field1=value1,value2,value3:field2=value1,value2;field1=value5,value6, in which groups are separated by semicolons and the fields within a group by colons

Returns:

a tuple of (filtered PMO, group counts dataframe)
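
To make the command-line form of meta_fields_values concrete, here is a small parser written from the example string alone. The helper parse_meta_fields_values is hypothetical and is not the library's own parser, which may handle whitespace, table input, or malformed entries differently.

```python
def parse_meta_fields_values(spec):
    """Split 'f1=a,b:f2=c;f1=d' into one {field: [values]} dictionary per group."""
    groups = []
    for group_text in spec.split(";"):        # groups are separated by semicolons
        group = {}
        for field_text in group_text.split(":"):  # fields within a group by colons
            field, _, values = field_text.partition("=")
            group[field] = values.split(",")   # values are comma separated
        groups.append(group)
    return groups


parsed = parse_meta_fields_values(
    "field1=value1,value2,value3:field2=value1,value2;field1=value5,value6"
)
```

The example string therefore describes two groups: the first constrains both field1 and field2, the second only field1.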

static extract_from_pmo_with_read_filter(pmodata, read_filter: float)[source]

Extract data from the PMO, applying an inconclusive read filter

Parameters:
  • pmodata – the pmo to extract data from

  • read_filter – the read filter (inconclusive filter) to apply

Returns:

a new pmodata containing only the detected microhaplotypes above this read filter

static filter_pmo_by_library_sample_ids(pmodata, library_sample_ids: set[int])[source]

Extract from a loaded PMO the data associated with select library_sample_ids

Parameters:
  • pmodata – the loaded PMO

  • library_sample_ids – the library_sample_ids to extract the info for

Returns:

a new PMO with only the data associated with the supplied library_sample_ids

static filter_pmo_by_library_sample_names(pmodata, library_sample_names: set[str])[source]

Filter pmodata by library sample names

Parameters:
  • pmodata – the pmodata object

  • library_sample_names – the set of library sample names to keep; the names are converted into indexes for extraction

Returns:

a filtered pmodata object containing only the data for those indexes

static filter_pmo_by_specimen_ids(pmodata, specimen_ids: set[int])[source]

Extract from a loaded PMO the data associated with select specimen_ids

Parameters:
  • pmodata – the loaded PMO

  • specimen_ids – the specimen_ids to extract the info for

Returns:

a new PMO with only the data associated with the supplied specimen_ids

static filter_pmo_by_specimen_names(pmodata, specimen_names: set[str])[source]

Extract from a loaded PMO the data associated with select specimen_names

Parameters:
  • pmodata – the loaded PMO

  • specimen_names – the specimen_names to extract the info for

Returns:

a new PMO with only the data associated with the supplied specimen_names

static filter_pmo_by_target_ids(pmodata, target_ids: set[int])[source]

Extract data from the PMO for only the selected target IDs

Parameters:
  • pmodata – the pmo to extract data from

  • target_ids – the target_ids to extract

Returns:

a new pmo with the data for only the targets supplied

static filter_pmo_by_target_names(pmodata, target_names: set[str])[source]

Extract data from the PMO for only the selected target names

Parameters:
  • pmodata – the pmo to extract data from

  • target_names – the target_names to extract

Returns:

a new pmo with the data for only the targets supplied

static get_bioinformatics_run_names(pmodata) list[str][source]

Get a list of bioinformatics_run_names in pmodata["bioinformatics_run_info"] in the order they appear

Parameters:

pmodata – the PMO to get bioinformatics_run_names from

Returns:

a list of all bioinformatics_run_names

static get_index_key_of_bioinformatics_run_names(pmodata)[source]

Get a mapping from each bioinformatics_run_name to its index in pmodata["bioinformatics_run_info"]

Parameters:

pmodata – the PMO to get indexes from

Returns:

a dictionary of indexes keyed by bioinformatics_run_name
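
The get_index_key_of_* methods all follow the same name-to-index idea. A minimal sketch, assuming pmodata["bioinformatics_run_info"] is a list of records that each carry a bioinformatics_run_name key; that layout, the sample data, and the helper index_key_of_names are assumptions made for illustration.

```python
# Assumed layout: a list of records, each with a "bioinformatics_run_name" key
pmodata = {
    "bioinformatics_run_info": [
        {"bioinformatics_run_name": "run_b"},
        {"bioinformatics_run_name": "run_a"},
    ]
}


def index_key_of_names(records, name_field):
    """Map each record's name to its position in the list."""
    return {record[name_field]: index for index, record in enumerate(records)}


run_index = index_key_of_names(
    pmodata["bioinformatics_run_info"], "bioinformatics_run_name"
)
```

The indexes reflect the position in the section, not alphabetical order, so run_b maps to 0 here.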

static get_index_key_of_library_sample_names(pmodata)[source]

Get a mapping from each library_sample_name to its index in pmodata["library_sample_info"]

Parameters:

pmodata – the PMO to get indexes from

Returns:

a dictionary of indexes keyed by library_sample_name

static get_index_key_of_panel_names(pmodata)[source]

Get a mapping from each panel_name to its index in pmodata["panel_info"]

Parameters:

pmodata – the PMO to get indexes from

Returns:

a dictionary of indexes keyed by panel_name

static get_index_key_of_specimen_names(pmodata)[source]

Get a mapping from each specimen_name to its index in pmodata["specimen_info"]

Parameters:

pmodata – the PMO to get indexes from

Returns:

a dictionary of indexes keyed by specimen_name

static get_index_key_of_target_in_representative_microhaplotypes(pmodata)[source]

Get a mapping from each target_name to the index of that target's representative microhaplotypes in pmodata["representative_microhaplotypes"]

Parameters:

pmodata – the PMO to get indexes from

Returns:

a dictionary of indexes keyed by target_name

static get_index_key_of_target_names(pmodata)[source]

Get a mapping from each target_name to its index in pmodata["target_info"]

Parameters:

pmodata – the PMO to get indexes from

Returns:

a dictionary of indexes keyed by target_name

static get_index_of_bioinformatics_run_names(pmodata, bioinformatics_run_names: list[str])[source]

Get the indexes of the given bioinformatics_run_names in pmodata["bioinformatics_run_info"]

Parameters:
  • pmodata – the PMO to get indexes from

  • bioinformatics_run_names – a list of bioinformatics_run_names

Returns:

the indexes of bioinformatics_run_names in pmodata["bioinformatics_run_info"], returned in the same order as bioinformatics_run_names
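
The "same order" guarantee means the result follows the order of the requested names, not the order of the section. A minimal sketch under the assumption that the section is a list of records with a bioinformatics_run_name key; the data and the helper indexes_of_names are hypothetical.

```python
# Assumed layout of pmodata["bioinformatics_run_info"]
run_records = [
    {"bioinformatics_run_name": "run_b"},
    {"bioinformatics_run_name": "run_a"},
    {"bioinformatics_run_name": "run_c"},
]


def indexes_of_names(records, name_field, names):
    """Return the index of each requested name, in the order the names were given."""
    name_to_index = {record[name_field]: i for i, record in enumerate(records)}
    return [name_to_index[name] for name in names]


found = indexes_of_names(run_records, "bioinformatics_run_name", ["run_c", "run_b"])
```

In this sketch an unknown name raises a KeyError; how the library treats missing names is not stated here.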

static get_index_of_library_sample_names(pmodata, library_sample_names: list[str])[source]

Get the indexes of the given library_sample_names in pmodata["library_sample_info"]

Parameters:
  • pmodata – the PMO to get indexes from

  • library_sample_names – a list of library_sample_names

Returns:

the indexes of library_sample_names in pmodata["library_sample_info"], returned in the same order as library_sample_names

static get_index_of_panel_names(pmodata, panel_names: list[str])[source]

Get the indexes of the given panel_names in pmodata["panel_info"]

Parameters:
  • pmodata – the PMO to get indexes from

  • panel_names – a list of panel_names

Returns:

the indexes of panel_names in pmodata["panel_info"], returned in the same order as panel_names

static get_index_of_specimen_names(pmodata, specimen_names: list[str])[source]

Get the indexes of the given specimen_names in pmodata["specimen_info"]

Parameters:
  • pmodata – the PMO to get indexes from

  • specimen_names – a list of specimen_names

Returns:

the indexes of specimen_names in pmodata["specimen_info"], returned in the same order as specimen_names

static get_index_of_target_in_representative_microhaplotypes(pmodata, target_names: list[str])[source]

Get the indexes of the given target_names in pmodata["representative_microhaplotypes"]["targets"]

Parameters:
  • pmodata – the PMO to get indexes from

  • target_names – a list of target_names

Returns:

the indexes of target_names in pmodata["representative_microhaplotypes"]["targets"], returned in the same order as target_names

static get_index_of_target_names(pmodata, target_names: list[str])[source]

Get the indexes of the given target_names in pmodata["target_info"]

Parameters:
  • pmodata – the PMO to get indexes from

  • target_names – a list of target_names

Returns:

the indexes of target_names in pmodata["target_info"], returned in the same order as target_names

static get_library_ids_for_specimen_ids(pmodata, specimen_ids: set[int])[source]

Get a dictionary that lists the library_ids for each specimen_id

Parameters:
  • pmodata – the PMO to get indexes from

  • specimen_ids – a set of specimen_ids

Returns:

a dictionary that lists the library_ids for each specimen_id
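
A minimal sketch of the specimen-to-library lookup, assuming each entry of pmodata["library_sample_info"] records the specimen_id it was prepared from and that a library id is the entry's position in that list. Both points are assumptions, as are the sample data and the helper library_ids_for_specimen_ids.

```python
# Assumed layout of pmodata["library_sample_info"]
library_sample_info = [
    {"library_sample_name": "lib0", "specimen_id": 0},
    {"library_sample_name": "lib1", "specimen_id": 1},
    {"library_sample_name": "lib2", "specimen_id": 0},
]


def library_ids_for_specimen_ids(records, specimen_ids):
    """Collect, for each requested specimen_id, the positions of its library samples."""
    result = {specimen_id: [] for specimen_id in specimen_ids}
    for library_id, record in enumerate(records):
        if record["specimen_id"] in result:
            result[record["specimen_id"]].append(library_id)
    return result


lookup = library_ids_for_specimen_ids(library_sample_info, {0})
```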

static get_library_sample_names(pmodata) list[str][source]

Get a list of library_sample_names in pmodata["library_sample_info"] in the order they appear

Parameters:

pmodata – the PMO to get library_sample_names from

Returns:

a list of all library_sample_names

static get_panel_names(pmodata) list[str][source]

Get a list of panel_names in pmodata["panel_info"] in the order they appear

Parameters:

pmodata – the PMO to get panel_names from

Returns:

a list of all panel_names

static get_sorted_bioinformatics_run_names(pmodata) list[str][source]

Get a name-sorted list of bioinformatics_run_names in pmodata["bioinformatics_run_info"]

Parameters:

pmodata – the PMO to get bioinformatics_run_names from

Returns:

a list of all bioinformatics_run_names

static get_sorted_library_sample_names(pmodata) list[str][source]

Get a name-sorted list of library_sample_names in pmodata["library_sample_info"]

Parameters:

pmodata – the PMO to get library_sample_names from

Returns:

a list of all library_sample_names

static get_sorted_panel_names(pmodata) list[str][source]

Get a name-sorted list of panel_names in pmodata["panel_info"]

Parameters:

pmodata – the PMO to get panel_names from

Returns:

a list of all panel_names

static get_sorted_specimen_names(pmodata) list[str][source]

Get a name-sorted list of specimen_names in pmodata["specimen_info"]

Parameters:

pmodata – the PMO to get specimen_names from

Returns:

a list of all specimen_names

static get_sorted_target_names(pmodata) list[str][source]

Get a name-sorted list of target_names in pmodata["target_info"]

Parameters:

pmodata – the PMO to get target_names from

Returns:

a list of all target_names

static get_specimen_names(pmodata) list[str][source]

Get a list of specimen_names in pmodata["specimen_info"] in the order they appear

Parameters:

pmodata – the PMO to get specimen_names from

Returns:

a list of all specimen_names

static get_target_names(pmodata) list[str][source]

Get a list of target_names in pmodata["target_info"] in the order they appear

Parameters:

pmodata – the PMO to get target_names from

Returns:

a list of all target_names