pmotools.pmo_engine.pmo_exporter module

class pmotools.pmo_engine.pmo_exporter.BedLoc(chrom: str, start: int, end: int, name: str, score: float, strand: str, ref_seq: str, extra_info: str)

Bases: NamedTuple

A single BED-format genomic location.

Used when extracting target / panel insert locations out of a PMO so they can be written to a BED file.

Variables:

chrom – chromosome / contig name
start – 0-based start position
end – end position (exclusive)
name – target name
score – BED score column; here the insert length (end - start)
strand – + or -
ref_seq – reference sequence for the insert, empty string if not loaded
extra_info – free-text key/value annotation, e.g. genome name/version

chrom: str: Alias for field number 0

end: int: Alias for field number 2

extra_info: str: Alias for field number 7

name: str: Alias for field number 3

ref_seq: str: Alias for field number 6

score: float: Alias for field number 4

start: int: Alias for field number 1

strand: str: Alias for field number 5

class pmotools.pmo_engine.pmo_exporter.PMOExporter

Bases: object

A collection of functions to export information out of a PMO.

class SheetConfig(sheet_name: str, df: DataFrame, max_row_check: int | None = None, specific_cols: list[str] | None = None)

Bases: object

Configuration for writing a DataFrame to an Excel sheet.

df: DataFrame

max_row_check: int | None = None

sheet_name: str

specific_cols: list[str] | None = None

static export_bioinformatics_methods_info_meta_table(pmodata, separator: str = ',') → DataFrame

Export the bioinformatics_methods_info meta information of a PMO to a dataframe.

Parameters:

pmodata – the PMO to export the information from
separator – the separator to use for list values

Returns:

a pandas dataframe of the bioinformatics_methods_info metadata
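
The separator parameter, shared by all the export_*_meta_table functions, suggests that list-valued fields are collapsed into a single string per table cell. A minimal sketch of that convention follows, assuming plain dict records. The helper flatten_record and the field names are hypothetical and are not part of pmotools.

```python
def flatten_record(record: dict, separator: str = ",") -> dict:
    """Join list values into one string so each field fits in a single table cell."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, list):
            flat[key] = separator.join(str(item) for item in value)
        else:
            flat[key] = value
    return flat


# Hypothetical metadata record with one list-valued field.
record = {"program": "example_tool", "additional_arguments": ["--min-len", "50"]}
flat_default = flatten_record(record)
flat_semicolon = flatten_record(record, separator=";")
```

A separator other than the default is worth choosing when the list values themselves may contain commas.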

static export_bioinformatics_run_info_meta_table(pmodata, separator: str = ',') → DataFrame

Export the bioinformatics_run_info meta information of a PMO to a dataframe.

Parameters:

pmodata – the PMO to export the information from
separator – the separator to use for list values

Returns:

a pandas dataframe of the bioinformatics_run_info metadata

static export_library_sample_meta_table(pmodata, separator: str = ',') → DataFrame

Export the library_sample meta information of a PMO to a dataframe.

Parameters:

pmodata – the PMO to export the information from
separator – the separator to use for list values

Returns:

a pandas dataframe of the library_sample metadata

static export_panel_info_meta_table(pmodata, separator: str = ',') → DataFrame

Export the panel meta information of a PMO to a dataframe.

Parameters:

pmodata – the PMO to export the information from
separator – the separator to use for list values

Returns:

a pandas dataframe of the panel metadata

static export_pmo_header_table(pmodata, separator: str = ',') → DataFrame

Export the PMO header meta information of a PMO to a dataframe.

Parameters:

pmodata – the PMO to export the information from
separator – the separator to use for list values

Returns:

a pandas dataframe of the PMO header metadata

static export_project_info_meta_table(pmodata, separator: str = ',') → DataFrame

Export the project_info meta information of a PMO to a dataframe.

Parameters:

pmodata – the PMO to export the information from
separator – the separator to use for list values

Returns:

a pandas dataframe of the project_info metadata

static export_sequencing_info_meta_table(pmodata, separator: str = ',') → DataFrame

Export the sequencing_info meta information of a PMO to a dataframe.

Parameters:

pmodata – the PMO to export the information from
separator – the separator to use for list values

Returns:

a pandas dataframe of the sequencing_info metadata

static export_specimen_meta_table(pmodata, separator: str = ',') → DataFrame

Export the specimen meta information of a PMO to a dataframe. Values of complex object types such as TravelInfo or parasite densities are currently not exported; such values are best exported in their own tables.

Parameters:

pmodata – the PMO to export the information from
separator – the separator to use for list values

Returns:

a pandas dataframe of the specimen metadata

static export_specimen_travel_meta_table(pmodata, separator: str = ',') → DataFrame

Export the specimen travel meta information of a PMO to a dataframe.

Parameters:

pmodata – the PMO to export the information from
separator – the separator to use for list values

Returns:

a pandas dataframe of the specimen travel metadata

static export_target_info_meta_table(pmodata, separator: str = ',') → DataFrame

Export the target meta information of a PMO to a dataframe.

Parameters:

pmodata – the PMO to export the information from
separator – the separator to use for list values

Returns:

a pandas dataframe of the target metadata

static export_targeted_genomes_meta_table(pmodata, separator: str = ',') → DataFrame

Export the targeted genomes meta information of a PMO to a dataframe.

Parameters:

pmodata – the PMO to export the information from
separator – the separator to use for list values

Returns:

a pandas dataframe of the targeted genomes metadata

static export_to_excel(pmo, output_path: str) → None

Export a PMO object to a multi-sheet Excel file.

Parameters:

pmo – The PMO object to export.
output_path – The path to write the Excel file to.

static extract_alleles_per_sample_table(pmodata, additional_specimen_info_fields: list[str] = None, additional_library_sample_info_fields: list[str] = None, additional_microhap_fields: list[str] = None, additional_representative_info_fields: list[str] = None, default_base_col_names: list[str] = ['library_sample_name', 'target_name', 'seq'], jsonschema_fnp='/home/runner/work/pmotools-python/pmotools-python/src/schemas/portable_microhaplotype_object_v1.1.0.schema.json', validate_pmo: bool = False) → DataFrame

Create a pandas DataFrame of sample, target and allele. Any other additional fields can optionally be added.

Parameters:

pmodata – the data to write from
additional_specimen_info_fields – any additional fields to write from the specimen_info object
additional_library_sample_info_fields – any additional fields to write from the library_samples object
additional_microhap_fields – any additional fields to write from the microhap object
additional_representative_info_fields – any additional fields to write from the representative_microhaplotype_sequences object
default_base_col_names – the default column names for the library_sample_name, target_name and seq columns
jsonschema_fnp – path to the jsonschema schema file to validate the PMO against
validate_pmo – whether to validate the PMO with a jsonschema

Returns:

pandas dataframe

static extract_panels_insert_bed_loc(pmodata, select_panel_ids: list[int] = None, sort_output: bool = True)

Extract the insert locations for panels out of a PMO. The reference sequence is added if it is loaded into the PMO.

Parameters:

pmodata – the PMO to extract from
select_panel_ids – a list of panel ids to select; if None, all panels are selected
sort_output – whether to sort output by genomic location

Returns:

a list of target inserts as named tuples with fields: chrom, start, end, name, score, strand, ref_seq, extra_info

static extract_targets_insert_bed_loc(pmodata, select_target_ids: list[int] = None, sort_output: bool = True)

Extract the insert locations for targets out of a PMO. The reference sequence is added if it is loaded into the PMO.

Parameters:

pmodata – the PMO to extract from
select_target_ids – a list of target ids to select; if None, all targets are selected
sort_output – whether to sort output by genomic location

Returns:

a list of target inserts as named tuples with fields: chrom, start, end, name, score, strand, ref_seq, extra_info
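
Sorting by genomic location, as sort_output describes, could look like the sketch below. The exact sort key used by pmotools is not stated here, so the (chrom, start, end) ordering is an assumption, and Loc is a hypothetical trimmed-down tuple used only for this illustration.

```python
from typing import NamedTuple


class Loc(NamedTuple):
    """Hypothetical trimmed-down location with only the fields used for sorting."""
    chrom: str
    start: int
    end: int
    name: str


locs = [
    Loc("chr2", 500, 700, "t3"),
    Loc("chr1", 900, 1100, "t2"),
    Loc("chr1", 100, 250, "t1"),
]

# Assumed ordering: chromosome name first, then start, then end.
sorted_locs = sorted(locs, key=lambda loc: (loc.chrom, loc.start, loc.end))
sorted_names = [loc.name for loc in sorted_locs]
```

With this key, chromosome names compare as plain strings, so a contig named chr10 would sort before chr2.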

static list_library_sample_names_per_specimen_name(pmodata, select_specimen_ids: list[int] = None, select_specimen_names: list[str] = None) → DataFrame

List all the library_sample_names per specimen_name

Parameters:

pmodata – the PMO
select_specimen_ids – a list of specimen_ids to select, if None, all specimen_ids are used
select_specimen_names – a list of specimen_names to select, if None, all specimen_names are used

Returns:

a pandas dataframe with 3 columns: specimen_id, library_sample_id, and library_sample_id_count (the number of library_sample_ids per specimen_id)
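
The per-specimen count in the third column can be pictured with the sketch below. It uses plain tuples instead of a dataframe and hypothetical specimen and library sample names; it illustrates the shape of the result, not the pmotools implementation.

```python
from collections import defaultdict

# Hypothetical (specimen, library sample) pairs; real values come from the PMO.
pairs = [
    ("spec1", "spec1_rep1"),
    ("spec1", "spec1_rep2"),
    ("spec2", "spec2_rep1"),
]

samples_by_specimen = defaultdict(list)
for specimen, library_sample in pairs:
    samples_by_specimen[specimen].append(library_sample)

# One row per (specimen, library sample), carrying the per-specimen count.
rows = [
    (specimen, library_sample, len(samples))
    for specimen, samples in samples_by_specimen.items()
    for library_sample in samples
]
```

Each library sample keeps its own row, and the count is repeated on every row that belongs to the same specimen.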

static write_bed_locs(bed_locs: list[pmotools.pmo_engine.pmo_exporter.BedLoc], fnp, add_header: bool = False)

Write a list of BedLoc to a file, overwriting the file if it already exists.

Parameters:

bed_locs – a list of BedLoc
fnp – output file path, will be overwritten if it exists
add_header – whether to add a header line with the columns #chrom, start, end, name, score, strand, ref_seq, extra_info; the line starts with # so that tools treat it as a comment line
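
The overwrite and header behaviour can be sketched with the standard library alone. The helper write_rows below is hypothetical and is not the pmotools function; tab-separated output is assumed because that is the usual BED convention, and the row values are made up.

```python
import os
import tempfile

# Hypothetical rows in the documented column order.
rows = [("chr1", 100, 250, "target_A", 150.0, "+", "", "genome=example")]
header = ["#chrom", "start", "end", "name", "score", "strand", "ref_seq", "extra_info"]


def write_rows(rows, path, add_header=False):
    """Write tab-separated rows; mode "w" truncates any existing file."""
    with open(path, "w") as handle:
        if add_header:
            handle.write("\t".join(header) + "\n")
        for row in rows:
            handle.write("\t".join(str(field) for field in row) + "\n")


with tempfile.TemporaryDirectory() as tmp_dir:
    bed_path = os.path.join(tmp_dir, "inserts.bed")
    write_rows(rows, bed_path, add_header=True)
    with open(bed_path) as handle:
        lines = handle.read().splitlines()
```

Because the header begins with #, BED-aware tools that skip comment lines should ignore it while it still documents the column order for human readers.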