Portable Microhaplotype Object (PMO)
Contents

  • Extract basic info counts from PMO
    • list_specimen_meta_fields
    • count_specimen_meta
    • list_library_sample_names_per_specimen_name
    • list_bioinformatics_run_names
    • count_targets_per_library_sample
    • count_library_samples_per_target

Getting basic info out of PMO using pmotools-python

Extract basic info counts from PMO

pmotools-python provides commands for getting simple counts out of a PMO: the number of targets per sample, the number of samples per target, and counts of specimen meta fields and their values.

Most of these basic info extractors can be found under the extract_basic_info_from_pmo group.

Code
pmotools-python
pmotools-python v0.1.0 - A suite of tools for interacting with Portable Microhaplotype Object (PMO) file format

Available functions organized by groups are
convertors_to_json
    text_meta_to_json_meta - Convert text file meta to JSON Meta
    excel_meta_to_json_meta - Convert Excel file meta to JSON Meta
    microhaplotype_table_to_json_file - Convert microhaplotype table to a JSON file
    terra_amp_output_to_json - Convert Terra output to JSON sequence table

extractors_from_pmo
    extract_pmo_with_selected_meta - Extract samples + haplotypes using selected meta
    extract_pmo_with_select_specimen_names - Extract specific samples from the specimens table
    extract_pmo_with_select_library_sample_names - Extract experiment sample names from experiment_info table
    extract_pmo_with_select_targets - Extract specific targets
    extract_pmo_with_read_filter - Extract with a read filter
    extract_allele_table - Extract allele tables for tools like dcifer or moire
    extract_insert_of_panels - Extract inserts of panels from a PMO
    extract_refseq_of_inserts_of_panels - Extract ref_seq of panel inserts from a PMO

working_with_multiple_pmos
    combine_pmos - Combine multiple PMOs of the same panel

extract_basic_info_from_pmo
    list_library_sample_names_per_specimen_name - List experiment_sample_ids per specimen_id
    list_specimen_meta_fields - List specimen meta fields in the specimen_info section
    list_bioinformatics_run_names - List all tar_amp_bioinformatics_info_ids in a PMO
    count_specimen_meta - Count values of selected specimen meta fields
    count_targets_per_library_sample - Count number of targets per sample
    count_library_samples_per_target - Count number of samples per target

validation
    validate_pmo - Validate a PMO file against a JSON Schema

Getting files for examples

Code
cd example 

wget https://plasmogenepi.github.io/PMO_Docs/format/moz2018_PMO.json.gz
wget https://plasmogenepi.github.io/PMO_Docs/format/PathWeaverHeome1_PMO.json.gz

list_specimen_meta_fields

This lists all the meta fields within the specimen_infos section of a PMO file. Since not every meta field is present in every specimen, the output gives, for each field, the number of specimens it appears in alongside the total number of specimens.
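As a rough illustration of what this count represents, the sketch below tallies field presence over a list of plain dictionaries. The records, their values, and the helper name `count_field_presence` are hypothetical stand-ins; this is not the pmotools-python implementation, which works on the loaded PMO.

```python
from collections import Counter


def count_field_presence(specimens):
    """Return rows of (field, present_in_specimens_count, total_specimen_count)."""
    presence = Counter()
    for specimen in specimens:
        # dictionary keys are unique, so each field counts once per specimen
        presence.update(specimen.keys())
    total = len(specimens)
    return [(field, presence[field], total) for field in sorted(presence)]


# hypothetical specimen records; only the field names mirror the real output
specimens = [
    {"specimen_name": "s1", "collection_country": "Mozambique", "storage_plate_info": "plate1"},
    {"specimen_name": "s2", "collection_country": "Mozambique"},
    {"specimen_name": "s3", "collection_country": "Mozambique"},
]

for field, present, total in count_field_presence(specimens):
    print(field, present, total, sep="\t")
```

A field that is missing from some specimens, like storage_plate_info here, ends up with a present count below the total.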

Code
pmotools-python list_specimen_meta_fields -h 
usage: pmotools-python list_specimen_meta_fields [-h] --file FILE
                                                 [--output OUTPUT]
                                                 [--delim DELIM] [--overwrite]

options:
  -h, --help       show this help message and exit
  --file FILE      PMO file
  --output OUTPUT  output file
  --delim DELIM    the delimiter of the output text file, examples input
                   tab,comma but can also be the actual delimiter
  --overwrite      If output file exists, overwrite it

The Python code for the list_specimen_meta_fields script is below

Code
pmotools-python/src/pmotools/scripts/extract_info_from_pmo/list_specimen_meta_fields.py
#!/usr/bin/env python3
import argparse
import sys


from pmotools.pmo_engine.pmo_processor import PMOProcessor
from pmotools.pmo_engine.pmo_reader import PMOReader
from pmotools.utils.small_utils import Utils


def parse_args_list_specimen_meta_fields():
    parser = argparse.ArgumentParser()
    parser.add_argument("--file", type=str, required=True, help="PMO file")
    parser.add_argument(
        "--output", type=str, default="STDOUT", required=False, help="output file"
    )
    parser.add_argument(
        "--delim",
        default="tab",
        type=str,
        required=False,
        help="the delimiter of the output text file, examples input tab,comma but can also be the actual delimiter",
    )
    parser.add_argument(
        "--overwrite", action="store_true", help="If output file exists, overwrite it"
    )

    return parser.parse_args()


def list_specimen_meta_fields():
    args = parse_args_list_specimen_meta_fields()

    # check files
    output_delim, output_extension = Utils.process_delimiter_and_output_extension(
        args.delim, gzip=args.output.endswith(".gz")
    )
    args.output = (
        args.output
        if "STDOUT" == args.output
        else Utils.appendStrAsNeeded(args.output, output_extension)
    )
    Utils.inputOutputFileCheck(args.file, args.output, args.overwrite)

    # read in PMO
    pmo = PMOReader.read_in_pmo(args.file)

    # count fields
    counts_df = PMOProcessor.count_specimen_per_meta_fields(pmo)

    # output
    counts_df.to_csv(
        sys.stdout if "STDOUT" == args.output else args.output,
        sep=output_delim,
        index=False,
    )


if __name__ == "__main__":
    list_specimen_meta_fields()
Code
cd example 
pmotools-python list_specimen_meta_fields --file ../../format/moz2018_PMO.json.gz
field   present_in_specimens_count  total_specimen_count
collection_country  124 124
collection_date 124 124
geo_admin3  124 124
host_taxon_id   124 124
lat_lon 124 124
parasite_density_info   123 124
project_id  124 124
specimen_collect_device 124 124
specimen_name   124 124
specimen_store_loc  124 124
specimen_taxon_id   124 124
storage_plate_info  81  124
Code
cd example 
pmotools-python list_specimen_meta_fields --file ../../format/moz2018_PMO.json.gz --output spec_fields_moz2018_PMO.tsv --overwrite

count_specimen_meta

This counts the specimens for each value (or each combination of values, when several fields are given) of the selected meta fields within the specimen_infos section of a PMO file.
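The idea behind the output columns can be sketched in plain Python: group specimens by the values of the selected fields and divide each group's count by the total number of specimens. The records and the helper name `count_by_field_values` are hypothetical, and reporting a missing field as NA is an assumption made for this sketch rather than confirmed library behaviour.

```python
from collections import Counter


def count_by_field_values(specimens, meta_fields):
    """Count specimens per combination of values of the selected meta fields."""
    combos = Counter(
        # assumption: a specimen lacking a field is grouped under "NA"
        tuple(str(specimen.get(field, "NA")) for field in meta_fields)
        for specimen in specimens
    )
    total = len(specimens)
    # one row per combination: values..., count, frequency, total
    return [
        (*combo, count, count / total, total)
        for combo, count in sorted(combos.items())
    ]


# hypothetical records; the last one has no collection metadata at all
specimens = [
    {"collection_country": "Mozambique", "geo_admin3": "Inhassoro"},
    {"collection_country": "Mozambique", "geo_admin3": "Namaacha"},
    {"collection_country": "Mozambique", "geo_admin3": "Namaacha"},
    {},
]

for row in count_by_field_values(specimens, ["collection_country", "geo_admin3"]):
    print(*row, sep="\t")
```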

Code
pmotools-python count_specimen_meta -h 
usage: pmotools-python count_specimen_meta [-h] --file FILE [--output OUTPUT]
                                           [--delim DELIM] [--overwrite]
                                           --meta_fields META_FIELDS

options:
  -h, --help            show this help message and exit
  --file FILE           PMO file
  --output OUTPUT       output file
  --delim DELIM         the delimiter of the output text file, examples input
                        tab,comma but can also be the actual delimiter
  --overwrite           If output file exists, overwrite it
  --meta_fields META_FIELDS
                        the fields to count the subfields of, can supply
                        multiple separated by commas, e.g. --meta_fields
                        collection_country,collection_date

The Python code for the count_specimen_meta script is below

Code
pmotools-python/src/pmotools/scripts/extract_info_from_pmo/count_specimen_meta.py
#!/usr/bin/env python3
import argparse
import sys


from pmotools.pmo_engine.pmo_processor import PMOProcessor
from pmotools.pmo_engine.pmo_reader import PMOReader
from pmotools.utils.small_utils import Utils


def parse_args_count_specimen_meta():
    parser = argparse.ArgumentParser()
    parser.add_argument("--file", type=str, required=True, help="PMO file")
    parser.add_argument(
        "--output", type=str, default="STDOUT", required=False, help="output file"
    )
    parser.add_argument(
        "--delim",
        default="tab",
        type=str,
        required=False,
        help="the delimiter of the output text file, examples input tab,comma but can also be the actual delimiter",
    )
    parser.add_argument(
        "--overwrite", action="store_true", help="If output file exists, overwrite it"
    )
    parser.add_argument(
        "--meta_fields",
        type=str,
        required=True,
        help="the fields to count the subfields of, can supply multiple separated by commas, e.g. --meta_fields collection_country,collection_date",
    )

    return parser.parse_args()


def count_specimen_meta():
    args = parse_args_count_specimen_meta()

    # check files
    output_delim, output_extension = Utils.process_delimiter_and_output_extension(
        args.delim, gzip=args.output.endswith(".gz")
    )
    args.output = (
        args.output
        if "STDOUT" == args.output
        else Utils.appendStrAsNeeded(args.output, output_extension)
    )
    Utils.inputOutputFileCheck(args.file, args.output, args.overwrite)

    # process the meta_fields argument
    meta_fields_toks = args.meta_fields.split(",")

    # read in PMO
    pmo = PMOReader.read_in_pmo(args.file)

    # count sub-fields
    counts_df = PMOProcessor.count_specimen_by_field_value(pmo, meta_fields_toks)

    # write out
    counts_df.to_csv(
        sys.stdout if "STDOUT" == args.output else args.output,
        sep=output_delim,
        index=False,
    )


if __name__ == "__main__":
    count_specimen_meta()
Code
cd example 
pmotools-python count_specimen_meta --file ../../format/moz2018_PMO.json.gz --meta_fields collection_country
collection_country  specimens_count specimens_freq  total_specimen_count
Mozambique  81  0.6532258064516129  124
NA  43  0.3467741935483871  124
Code
cd example 
pmotools-python count_specimen_meta --file ../../format/moz2018_PMO.json.gz --meta_fields collection_country --overwrite --output collection_country_count_moz2018_PMO.tsv.gz 
Code
cd example 
pmotools-python count_specimen_meta --file ../../format/moz2018_PMO.json.gz --meta_fields collection_country,geo_admin3
collection_country  geo_admin3  specimens_count specimens_freq  total_specimen_count
Mozambique  Inhassoro   27  0.21774193548387097 124
Mozambique  Mandlakazi  28  0.22580645161290322 124
Mozambique  Namaacha    26  0.20967741935483872 124
NA  NA  43  0.3467741935483871  124
Code
cd example 
pmotools-python count_specimen_meta --file ../../format/PathWeaverHeome1_PMO.json.gz --meta_fields collection_country,collection_date | head
collection_country  collection_date specimens_count specimens_freq  total_specimen_count
Bangladesh  2008    15  0.0007718828796377296   19433
Bangladesh  2009    16  0.000823341738280245    19433
Bangladesh  2012    8   0.0004116708691401225   19433
Bangladesh  2012-04-19  1   5.145885864251531e-05   19433
Bangladesh  2012-06-05  2   0.00010291771728503062  19433
Bangladesh  2012-06-13  1   5.145885864251531e-05   19433
Bangladesh  2012-06-17  1   5.145885864251531e-05   19433
Bangladesh  2012-07-17  1   5.145885864251531e-05   19433
Bangladesh  2012-07-23  1   5.145885864251531e-05   19433
The raw output of this command ends with a BrokenPipeError traceback (omitted here). It is a side effect of head closing the pipe after ten lines while pandas is still writing to stdout; the rows above are unaffected.
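One common way for a command line script to exit quietly when its reader, such as head, stops early is adapted below from the note on SIGPIPE in the Python signal module documentation. The function names are hypothetical and this is not part of pmotools-python.

```python
import os
import sys


def write_lines(lines, stream):
    """Write lines to stream; return False if the reader closed the pipe early."""
    try:
        for line in lines:
            stream.write(line + "\n")
        # flush inside the try block so a closed pipe is detected here
        stream.flush()
    except BrokenPipeError:
        return False
    return True


def main():
    # hypothetical large output, standing in for a long table
    lines = (f"row{i}\t{i}" for i in range(100000))
    if not write_lines(lines, sys.stdout):
        # Python flushes stdout again at exit; point it at devnull so that
        # flush does not raise a second BrokenPipeError
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, sys.stdout.fileno())
        sys.exit(1)
```

A script that calls main() and is piped into head should then stop without printing a traceback.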

list_library_sample_names_per_specimen_name

A specimen can have multiple library samples, so it can be helpful to list all the library_sample_names for each specimen.
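Conceptually this is a group-by from specimen to library samples. A minimal sketch over hypothetical (library_sample_name, specimen_name) pairs, with a made-up helper name, could look like the following. Repeating the count on every row of a specimen is an assumption, since the example file only has one library sample per specimen.

```python
from collections import defaultdict


def library_samples_per_specimen(pairs):
    """Return rows of (specimen_name, library_sample_name, library_sample_count)."""
    by_specimen = defaultdict(list)
    for library_sample_name, specimen_name in pairs:
        by_specimen[specimen_name].append(library_sample_name)
    rows = []
    for specimen_name, library_samples in by_specimen.items():
        for library_sample_name in library_samples:
            # assumption: the count is repeated on each row of the specimen
            rows.append((specimen_name, library_sample_name, len(library_samples)))
    return rows


# hypothetical pairs: specA was sequenced twice, specB once
pairs = [("libA-rep1", "specA"), ("libA-rep2", "specA"), ("libB", "specB")]
for row in library_samples_per_specimen(pairs):
    print(*row, sep="\t")
```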

Code
pmotools-python list_library_sample_names_per_specimen_name -h 
usage: pmotools-python list_library_sample_names_per_specimen_name
       [-h] --file FILE [--output OUTPUT] [--delim DELIM] [--overwrite]

options:
  -h, --help       show this help message and exit
  --file FILE      PMO file
  --output OUTPUT  output file
  --delim DELIM    the delimiter of the output text file, examples input
                   tab,comma but can also be the actual delimiter
  --overwrite      If output file exists, overwrite it

The Python code for the list_library_sample_names_per_specimen_name script is below

Code
pmotools-python/src/pmotools/scripts/extract_info_from_pmo/list_library_sample_names_per_specimen_name.py
#!/usr/bin/env python3
import argparse
import sys


from pmotools.pmo_engine.pmo_processor import PMOProcessor
from pmotools.pmo_engine.pmo_reader import PMOReader
from pmotools.utils.small_utils import Utils


def parse_args_list_library_sample_names_per_specimen_name():
    parser = argparse.ArgumentParser()
    parser.add_argument("--file", type=str, required=True, help="PMO file")
    parser.add_argument(
        "--output", type=str, default="STDOUT", required=False, help="output file"
    )
    parser.add_argument(
        "--delim",
        default="tab",
        type=str,
        required=False,
        help="the delimiter of the output text file, examples input tab,comma but can also be the actual delimiter",
    )
    parser.add_argument(
        "--overwrite", action="store_true", help="If output file exists, overwrite it"
    )

    return parser.parse_args()


def list_library_sample_names_per_specimen_name():
    args = parse_args_list_library_sample_names_per_specimen_name()

    # check files
    output_delim, output_extension = Utils.process_delimiter_and_output_extension(
        args.delim, gzip=args.output.endswith(".gz")
    )
    args.output = (
        args.output
        if "STDOUT" == args.output
        else Utils.appendStrAsNeeded(args.output, output_extension)
    )
    Utils.inputOutputFileCheck(args.file, args.output, args.overwrite)

    # read in PMO
    pmo = PMOReader.read_in_pmo(args.file)

    # count fields
    info_df = PMOProcessor.list_library_sample_names_per_specimen_name(pmo)

    # output
    info_df.to_csv(
        sys.stdout if "STDOUT" == args.output else args.output,
        sep=output_delim,
        index=False,
    )


if __name__ == "__main__":
    list_library_sample_names_per_specimen_name()
Code
cd example 
pmotools-python list_library_sample_names_per_specimen_name --file ../../format/moz2018_PMO.json.gz 
specimen_name   library_sample_name library_sample_count
8025874217  8025874217  1
8025874237  8025874237  1
8025874250  8025874250  1
8025874300  8025874300  1
8025874321  8025874321  1
8025874349  8025874349  1
8025874366  8025874366  1
8025874377  8025874377  1
8025874411  8025874411  1
8025874421  8025874421  1
8025874463  8025874463  1
8025874482  8025874482  1
8025874507  8025874507  1
8025874578  8025874578  1
8025874627  8025874627  1
8025874637  8025874637  1
8025874665  8025874665  1
8025874669  8025874669  1
8025874714  8025874714  1
8025874729  8025874729  1
8025874809  8025874809  1
8025874865  8025874865  1
8025874899  8025874899  1
8025874940  8025874940  1
8034209589  8034209589  1
8034209790  8034209790  1
8034209228  8034209228  1
8025874253  8025874253  1
8025874261  8025874261  1
8025874382  8025874382  1
8025874457  8025874457  1
8025874502  8025874502  1
8025874526  8025874526  1
8025874537  8025874537  1
8025874586  8025874586  1
8025874589  8025874589  1
8025874636  8025874636  1
8025874829  8025874829  1
8025874849  8025874849  1
8025874875  8025874875  1
8025874975  8025874975  1
8025874988  8025874988  1
8025875052  8025875052  1
8025875059  8025875059  1
8025875140  8025875140  1
8025875144  8025875144  1
8025875145  8025875145  1
8025875146  8025875146  1
8025875166  8025875166  1
8034209115  8034209115  1
8034209281  8034209281  1
8034209465  8034209465  1
8034209834  8034209834  1
8025874494  8025874494  1
8025874536  8025874536  1
8025874266  8025874266  1
8025874271  8025874271  1
8025874316  8025874316  1
8025874330  8025874330  1
8025874340  8025874340  1
8025874357  8025874357  1
8025874376  8025874376  1
8025874380  8025874380  1
8025874387  8025874387  1
8025874419  8025874419  1
8025874435  8025874435  1
8025874447  8025874447  1
8025874672  8025874672  1
8025874706  8025874706  1
8025874738  8025874738  1
8025874928  8025874928  1
8025874933  8025874933  1
8025874973  8025874973  1
8025875029  8025875029  1
8025875042  8025875042  1
8025875121  8025875121  1
8025875170  8025875170  1
8034209773  8034209773  1
8034209803  8034209803  1
8034209818  8034209818  1
8025874297  8025874297  1
8025874231  8025874231  1
8025874234  8025874234  1
8025874286  8025874286  1
8025874343  8025874343  1
8025874348  8025874348  1
8025874352  8025874352  1
8025874396  8025874396  1
8025874405  8025874405  1
8025874437  8025874437  1
8025874452  8025874452  1
8025874454  8025874454  1
8025874484  8025874484  1
8025874491  8025874491  1
8025874499  8025874499  1
8025874568  8025874568  1
8025874585  8025874585  1
8025874591  8025874591  1
8025874613  8025874613  1
8025874662  8025874662  1
8025874675  8025874675  1
8025874676  8025874676  1
8025874701  8025874701  1
8025874705  8025874705  1
8025874720  8025874720  1
8025874872  8025874872  1
8025874877  8025874877  1
8025874878  8025874878  1
8025874879  8025874879  1
8025874886  8025874886  1
8025874888  8025874888  1
8025874903  8025874903  1
8025874931  8025874931  1
8025874948  8025874948  1
8025874956  8025874956  1
8025875065  8025875065  1
8025875112  8025875112  1
8025875165  8025875165  1
SS1-10K-C1  SS1-10K-C1  1
SS2-10K-C1  SS2-10K-C1  1
SS3-10K-C1  SS3-10K-C1  1
SS4-10K-C1  SS4-10K-C1  1
SS5-10K-C1  SS5-10K-C1  1
8025875168  8025875168  1
Code
cd example 
pmotools-python list_library_sample_names_per_specimen_name --file ../../format/moz2018_PMO.json.gz --overwrite --output library_samples_per_specimen_moz2018_PMO.tsv.gz 

list_bioinformatics_run_names

This simply lists all the analyses (all the bioinformatics_run_names) stored within a PMO.

Code
pmotools-python list_bioinformatics_run_names -h 
usage: pmotools-python list_bioinformatics_run_names [-h] --file FILE
                                                     [--output OUTPUT]
                                                     [--overwrite]

options:
  -h, --help       show this help message and exit
  --file FILE      PMO file
  --output OUTPUT  output file
  --overwrite      If output file exists, overwrite it

The Python code for the list_bioinformatics_run_names script is below

Code
pmotools-python/src/pmotools/scripts/extract_info_from_pmo/list_bioinformatics_run_names.py
#!/usr/bin/env python3
import argparse
import sys


from pmotools.pmo_engine.pmo_processor import PMOProcessor
from pmotools.pmo_engine.pmo_reader import PMOReader
from pmotools.utils.small_utils import Utils


def parse_args_list_bioinformatics_run_names():
    parser = argparse.ArgumentParser()
    parser.add_argument("--file", type=str, required=True, help="PMO file")
    parser.add_argument(
        "--output", type=str, default="STDOUT", required=False, help="output file"
    )
    parser.add_argument(
        "--overwrite", action="store_true", help="If output file exists, overwrite it"
    )

    return parser.parse_args()


def list_bioinformatics_run_names():
    args = parse_args_list_bioinformatics_run_names()

    # check files
    Utils.inputOutputFileCheck(args.file, args.output, args.overwrite)

    # read in PMO
    pmo = PMOReader.read_in_pmo(args.file)

    # extract all bio run names
    bio_run_names = PMOProcessor.get_bioinformatics_run_names(pmo)

    # write
    output_target = sys.stdout if args.output == "STDOUT" else open(args.output, "w")
    with output_target as f:
        f.write("\n".join(bio_run_names) + "\n")


if __name__ == "__main__":
    list_bioinformatics_run_names()
Code
cd example 
pmotools-python list_bioinformatics_run_names --file ../../format/moz2018_PMO.json.gz  
Mozambique2018-SeekDeep
Code
cd example 
pmotools-python list_bioinformatics_run_names --file ../../format/PathWeaverHeome1_PMO.json.gz 
PathWeaver-Heome1

This can be helpful after combining PMOs, since the combined file lists the runs from every input PMO.

Code
cd example 

pmotools-python combine_pmos --pmo_files ../../format/moz2018_PMO.json.gz,../../format/PathWeaverHeome1_PMO.json.gz --output combined_Heome1_PMO.json.gz --overwrite

pmotools-python list_bioinformatics_run_names --file combined_Heome1_PMO.json.gz 
Mozambique2018-SeekDeep
PathWeaver-Heome1

count_targets_per_library_sample

Count the number of targets each library_sample_id has. A read filter can be applied to see how many targets would be kept if such a filter were applied.

Code
pmotools-python count_targets_per_library_sample -h 
usage: pmotools-python count_targets_per_library_sample [-h] --file FILE
                                                        [--output OUTPUT]
                                                        [--delim DELIM]
                                                        [--overwrite]
                                                        [--read_count_minimum READ_COUNT_MINIMUM]

options:
  -h, --help            show this help message and exit
  --file FILE           PMO file
  --output OUTPUT       output file
  --delim DELIM         the delimiter of the output text file, examples input
                        tab,comma but can also be the actual delimiter
  --overwrite           If output file exists, overwrite it
  --read_count_minimum READ_COUNT_MINIMUM
                        the minimum read count (inclusive) to be counted as
                        covered by sample

The Python code for the count_targets_per_library_sample script is below

Code
pmotools-python/src/pmotools/scripts/extract_info_from_pmo/count_targets_per_library_sample.py
#!/usr/bin/env python3
import argparse
import sys


from pmotools.pmo_engine.pmo_processor import PMOProcessor
from pmotools.pmo_engine.pmo_reader import PMOReader
from pmotools.utils.small_utils import Utils


def parse_args_count_targets_per_library_sample():
    parser = argparse.ArgumentParser()
    parser.add_argument("--file", type=str, required=True, help="PMO file")
    parser.add_argument(
        "--output", type=str, default="STDOUT", required=False, help="output file"
    )
    parser.add_argument(
        "--delim",
        default="tab",
        type=str,
        required=False,
        help="the delimiter of the output text file, examples input tab,comma but can also be the actual delimiter",
    )
    parser.add_argument(
        "--overwrite", action="store_true", help="If output file exists, overwrite it"
    )
    parser.add_argument(
        "--read_count_minimum",
        default=0.0,
        type=float,
        required=False,
        help="the minimum read count (inclusive) to be counted as covered by sample",
    )

    return parser.parse_args()


def count_targets_per_library_sample():
    args = parse_args_count_targets_per_library_sample()

    # check files
    output_delim, output_extension = Utils.process_delimiter_and_output_extension(
        args.delim, gzip=args.output.endswith(".gz")
    )
    args.output = (
        args.output
        if "STDOUT" == args.output
        else Utils.appendStrAsNeeded(args.output, output_extension)
    )
    Utils.inputOutputFileCheck(args.file, args.output, args.overwrite)

    # read in PMO
    pmo = PMOReader.read_in_pmo(args.file)

    # count
    counts_df = PMOProcessor.count_targets_per_library_sample(
        pmo, args.read_count_minimum
    )

    # write out
    counts_df.to_csv(
        sys.stdout if "STDOUT" == args.output else args.output,
        sep=output_delim,
        index=False,
    )


if __name__ == "__main__":
    count_targets_per_library_sample()
Code
cd example 

pmotools-python count_targets_per_library_sample --file ../../format/moz2018_PMO.json.gz  | head
bioinformatics_run_id   library_sample_name target_number
0   8025875168  47
0   8025874536  10
0   8025874494  23
0   8025874297  8
0   SS4-10K-C1  99
0   SS3-10K-C1  98
0   SS2-10K-C1  98
0   8034209818  99
0   8034209790  98

Apply a read count minimum filter (this is the total read count summed for a target, not a per-haplotype count)
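To make the distinction concrete, the sketch below sums haplotype reads per target before comparing against the minimum, so a target whose individual haplotypes each fall below the cutoff can still be counted. The input tuples and the function name are hypothetical; pmotools-python reads these values from the PMO itself.

```python
from collections import defaultdict


def count_targets_per_sample(haplotype_reads, read_count_minimum=0.0):
    """Count targets per sample whose summed reads meet the minimum (inclusive)."""
    totals = defaultdict(float)
    for sample, target, reads in haplotype_reads:
        # sum over all haplotypes of the same target within a sample
        totals[(sample, target)] += reads
    # start every sample at zero so samples with no passing target are kept
    counts = {sample: 0 for sample, _target in totals}
    for (sample, _target), total in totals.items():
        if total >= read_count_minimum:
            counts[sample] += 1
    return counts


# hypothetical (sample, target, haplotype read count) records
haplotype_reads = [
    ("s1", "t1", 2000),
    ("s1", "t1", 1500),  # second haplotype of t1 in s1
    ("s1", "t2", 500),
    ("s2", "t1", 100),
]

print(count_targets_per_sample(haplotype_reads))
print(count_targets_per_sample(haplotype_reads, read_count_minimum=3000))
```

With a minimum of 3000, target t1 in s1 passes because its two haplotypes sum to 3500, even though neither reaches 3000 alone.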

Code
cd example 

pmotools-python count_targets_per_library_sample --read_count_minimum 3000 --file ../../format/moz2018_PMO.json.gz  | head
bioinformatics_run_id   library_sample_name target_number
0   8025875168  0
0   8025874536  0
0   8025874494  0
0   8025874297  0
0   SS4-10K-C1  97
0   SS3-10K-C1  96
0   SS2-10K-C1  97
0   8034209818  97
0   8034209790  97

count_library_samples_per_target

Count the number of library_sample_ids each target has. A read filter can be applied to see how many samples a target would have if the filter were applied.

Code
pmotools-python count_library_samples_per_target -h 
usage: pmotools-python count_library_samples_per_target [-h] --file FILE
                                                        [--output OUTPUT]
                                                        [--delim DELIM]
                                                        [--overwrite]
                                                        [--read_count_minimum READ_COUNT_MINIMUM]

options:
  -h, --help            show this help message and exit
  --file FILE           PMO file
  --output OUTPUT       output file
  --delim DELIM         the delimiter of the output text file, examples input
                        tab,comma but can also be the actual delimiter
  --overwrite           If output file exists, overwrite it
  --read_count_minimum READ_COUNT_MINIMUM
                        the minimum read count (inclusive) to be counted as
                        covered by sample

The Python code for the count_library_samples_per_target script is below

Code
pmotools-python/src/pmotools/scripts/extract_info_from_pmo/count_library_samples_per_target.py
#!/usr/bin/env python3
import argparse
import sys


from pmotools.pmo_engine.pmo_processor import PMOProcessor
from pmotools.pmo_engine.pmo_reader import PMOReader
from pmotools.utils.small_utils import Utils


def parse_args_count_library_samples_per_target():
    parser = argparse.ArgumentParser()
    parser.add_argument("--file", type=str, required=True, help="PMO file")
    parser.add_argument(
        "--output", type=str, default="STDOUT", required=False, help="output file"
    )
    parser.add_argument(
        "--delim",
        default="tab",
        type=str,
        required=False,
        help="the delimiter of the output text file, examples input tab,comma but can also be the actual delimiter",
    )
    parser.add_argument(
        "--overwrite", action="store_true", help="If output file exists, overwrite it"
    )
    parser.add_argument(
        "--read_count_minimum",
        default=0.0,
        type=float,
        required=False,
        help="the minimum read count (inclusive) to be counted as covered by sample",
    )

    return parser.parse_args()


def count_library_samples_per_target():
    args = parse_args_count_library_samples_per_target()

    # check files
    output_delim, output_extension = Utils.process_delimiter_and_output_extension(
        args.delim, gzip=args.output.endswith(".gz")
    )
    args.output = (
        args.output
        if "STDOUT" == args.output
        else Utils.appendStrAsNeeded(args.output, output_extension)
    )
    Utils.inputOutputFileCheck(args.file, args.output, args.overwrite)

    # read in PMO
    pmo = PMOReader.read_in_pmo(args.file)

    # count
    counts_df = PMOProcessor.count_library_samples_per_target(
        pmo, args.read_count_minimum
    )

    # write out
    counts_df.to_csv(
        sys.stdout if "STDOUT" == args.output else args.output,
        sep=output_delim,
        index=False,
    )


if __name__ == "__main__":
    count_library_samples_per_target()
Code
cd example 

pmotools-python count_library_samples_per_target --file ../../format/moz2018_PMO.json.gz  | head
bioinformatics_run_id   target_name sample_count
0   t1  119
0   t10 117
0   t100    124
0   t11 120
0   t12 119
0   t13 124
0   t14 118
0   t15 119
0   t16 121
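Because the command writes a delimited table, its output is easy to post-process. The sketch below parses a few rows copied from the output above (assuming the default tab delimiter) and lists targets found in fewer than a chosen number of samples. The helper name `targets_below` and the threshold of 120 are hypothetical.

```python
import csv
import io


def targets_below(tsv_text, min_samples):
    """Return (target_name, sample_count) rows with sample_count < min_samples."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        (row["target_name"], int(row["sample_count"]))
        for row in reader
        if int(row["sample_count"]) < min_samples
    ]


# rows copied from the example output, rejoined with tabs
example_output = (
    "bioinformatics_run_id\ttarget_name\tsample_count\n"
    "0\tt1\t119\n"
    "0\tt10\t117\n"
    "0\tt100\t124\n"
    "0\tt11\t120\n"
)

print(targets_below(example_output, 120))
```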

Apply a read count minimum filter (this is the total read count summed for a target, not a per-haplotype count)

Code
cd example 

pmotools-python count_library_samples_per_target --read_count_minimum 3000 --file ../../format/moz2018_PMO.json.gz  | head
bioinformatics_run_id   target_name sample_count
0   t1  108
0   t10 107
0   t100    107
0   t11 111
0   t12 104
0   t13 105
0   t14 110
0   t15 110
0   t16 106
Source Code
---
title: Getting basic info out of PMO using pmotools-python
---

```{r setup, echo=F}
source("../common.R")
```

# Extract basic info counts from PMO 


To get simple counts of number of targets with sample counts, samples with target counts, the counts of meta fields 

Most of these basic info extractor can be found underneath `extract_basic_info_from_pmo`

```{bash, eval = F}
pmotools-python
```

```{bash, echo = F}
pmotools-python | perl -pe 's/\e\[[0-9;]*m(?:\e\[K)?//g'
```


Getting files for examples 


```{bash, eval = F}
cd example 

wget https://plasmogenepi.github.io/PMO_Docs/format/moz2018_PMO.json.gz
wget https://plasmogenepi.github.io/PMO_Docs/format/PathWeaverHeome1_PMO.json.gz



```




## list_specimen_meta_fields


This will list all the meta fields within the `specimen_infos` section of a PMO file. Since not all meta fields are always present in all specimens, this will list the count of samples each field appears in and the number of total specimens  

```{bash}
pmotools-python list_specimen_meta_fields -h 
```


The Python code for the `list_specimen_meta_fields` script is below

```{python}
#| echo: true
#| eval: false
#| code-fold: true
#| code-line-numbers: true
#| filename: pmotools-python/src/pmotools/scripts/extract_info_from_pmo/list_specimen_meta_fields.py
#| file: ../pmotools-python/src/pmotools/scripts/extract_info_from_pmo/list_specimen_meta_fields.py
```

```{bash}
cd example 
pmotools-python list_specimen_meta_fields --file ../../format/moz2018_PMO.json.gz

```

```{bash}
cd example 
pmotools-python list_specimen_meta_fields --file ../../format/moz2018_PMO.json.gz --output spec_fields_moz2018_PMO.tsv --overwrite

```
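
The counting that `list_specimen_meta_fields` reports can be illustrated with a minimal sketch. This is not the pmotools-python implementation: the `specimens` list, its values, and the `count_meta_fields` helper are hypothetical stand-ins for entries of `specimen_infos`, used only to show why a per-field count and a total are both useful.

```python
from collections import Counter

# Hypothetical toy records standing in for entries of `specimen_infos`;
# real PMO files carry more fields and a richer structure.
specimens = [
    {"specimen_name": "s1", "collection_country": "Mozambique", "geo_admin3": "Maputo"},
    {"specimen_name": "s2", "collection_country": "Mozambique"},
    {"specimen_name": "s3", "collection_country": "Mozambique", "collection_date": "2018-05"},
]


def count_meta_fields(records):
    """Return {field: number of records that contain that field}."""
    counts = Counter()
    for record in records:
        counts.update(record.keys())
    return dict(counts)


field_counts = count_meta_fields(specimens)
total = len(specimens)

# One row per field: field name, specimens with the field, total specimens.
for field, present_in in sorted(field_counts.items()):
    print(f"{field}\t{present_in}\t{total}")
```

A field whose count is lower than the total (here `geo_admin3` and `collection_date`) is missing from some specimens, which is worth knowing before grouping by that field.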




## count_specimen_meta


This will list all the meta values (and the combinations) for the meta fields within the `specimen_infos` section of a PMO file.


```{bash}
pmotools-python count_specimen_meta -h 
```


The Python code for the `count_specimen_meta` script is below

```{python}
#| echo: true
#| eval: false
#| code-fold: true
#| code-line-numbers: true
#| filename: pmotools-python/src/pmotools/scripts/extract_info_from_pmo/count_specimen_meta.py
#| file: ../pmotools-python/src/pmotools/scripts/extract_info_from_pmo/count_specimen_meta.py
```


```{bash}
cd example 
pmotools-python count_specimen_meta --file ../../format/moz2018_PMO.json.gz --meta_fields collection_country

```

```{bash}
cd example 
pmotools-python count_specimen_meta --file ../../format/moz2018_PMO.json.gz --meta_fields collection_country --overwrite --output collection_country_count_moz2018_PMO.tsv.gz 

```

```{bash}
cd example 
pmotools-python count_specimen_meta --file ../../format/moz2018_PMO.json.gz --meta_fields collection_country,geo_admin3

```

```{bash}
cd example 
pmotools-python count_specimen_meta --file ../../format/PathWeaverHeome1_PMO.json.gz --meta_fields collection_country,collection_date | head

```
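
Counting value combinations across several meta fields, as in the `--meta_fields collection_country,geo_admin3` example, amounts to tallying one tuple per specimen. The sketch below assumes hypothetical toy records and a hypothetical `count_meta_values` helper; the `"NA"` placeholder for a missing field is also an assumption and may not match how pmotools-python reports missing values.

```python
from collections import Counter

# Hypothetical toy records standing in for entries of `specimen_infos`.
specimens = [
    {"specimen_name": "s1", "collection_country": "Mozambique", "geo_admin3": "Maputo"},
    {"specimen_name": "s2", "collection_country": "Mozambique", "geo_admin3": "Maputo"},
    {"specimen_name": "s3", "collection_country": "Mozambique", "geo_admin3": "Inhassoro"},
    {"specimen_name": "s4", "collection_country": "Mozambique"},
]


def count_meta_values(records, meta_fields, missing="NA"):
    """Count how many records share each combination of values.

    `missing` is the placeholder used when a record lacks a field
    (an assumption made for this sketch).
    """
    return Counter(
        tuple(record.get(field, missing) for field in meta_fields)
        for record in records
    )


combo_counts = count_meta_values(specimens, ["collection_country", "geo_admin3"])
for combo, n in sorted(combo_counts.items()):
    print("\t".join(combo), n, sep="\t")
```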


## list_library_sample_names_per_specimen_name

A specimen can have multiple library samples, so it can be helpful to list all the library sample names for each specimen

```{bash}
pmotools-python list_library_sample_names_per_specimen_name -h 
```


The Python code for the `list_library_sample_names_per_specimen_name` script is below

```{python}
#| echo: true
#| eval: false
#| code-fold: true
#| code-line-numbers: true
#| filename: pmotools-python/src/pmotools/scripts/extract_info_from_pmo/list_library_sample_names_per_specimen_name.py
#| file: ../pmotools-python/src/pmotools/scripts/extract_info_from_pmo/list_library_sample_names_per_specimen_name.py
```


```{bash}
cd example 
pmotools-python list_library_sample_names_per_specimen_name --file ../../format/moz2018_PMO.json.gz 

```

```{bash}
cd example 
pmotools-python list_library_sample_names_per_specimen_name --file ../../format/moz2018_PMO.json.gz --overwrite --output library_samples_per_specimen_moz2018_PMO.tsv.gz 

```
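
The one-to-many relationship between specimens and library samples can be sketched as a simple grouping. The records and the `group_library_samples` helper below are hypothetical and only illustrate the idea; they do not reproduce the PMO layout or the pmotools-python code.

```python
from collections import defaultdict

# Hypothetical toy records: each library sample points back to its specimen.
library_samples = [
    {"library_sample_name": "s1_rep1", "specimen_name": "s1"},
    {"library_sample_name": "s1_rep2", "specimen_name": "s1"},
    {"library_sample_name": "s2_rep1", "specimen_name": "s2"},
]


def group_library_samples(records):
    """Return {specimen_name: sorted list of its library sample names}."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["specimen_name"]].append(record["library_sample_name"])
    return {specimen: sorted(names) for specimen, names in grouped.items()}


per_specimen = group_library_samples(library_samples)
for specimen, names in sorted(per_specimen.items()):
    print(specimen, len(names), ",".join(names), sep="\t")
```

Specimens with more than one entry (here `s1`) are the ones sequenced in replicate or re-run, which matters when deciding whether to analyze per specimen or per library sample.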

## list_bioinformatics_run_names 


This will simply list all the analyses (all the `bioinformatics_run_names`) stored within a PMO


```{bash}
pmotools-python list_bioinformatics_run_names -h 
```


The Python code for the `list_bioinformatics_run_names` script is below

```{python}
#| echo: true
#| eval: false
#| code-fold: true
#| code-line-numbers: true
#| filename: pmotools-python/src/pmotools/scripts/extract_info_from_pmo/list_bioinformatics_run_names.py
#| file: ../pmotools-python/src/pmotools/scripts/extract_info_from_pmo/list_bioinformatics_run_names.py
```



```{bash}
cd example 
pmotools-python list_bioinformatics_run_names --file ../../format/moz2018_PMO.json.gz  

```

```{bash}
cd example 
pmotools-python list_bioinformatics_run_names --file ../../format/PathWeaverHeome1_PMO.json.gz 

```

This can be helpful after combining PMOs, for example to check which runs ended up in the combined file


```{bash, eval = F}
cd example 

pmotools-python combine_pmos --pmo_files ../../format/moz2018_PMO.json.gz,../../format/PathWeaverHeome1_PMO.json.gz --output combined_Heome1_PMO.json.gz --overwrite

pmotools-python list_bioinformatics_run_names --file combined_Heome1_PMO.json.gz 

```

```{bash, echo = F}
cd example 
pmotools-python list_bioinformatics_run_names --file combined_Heome1_PMO.json.gz 

```



## count_targets_per_library_sample 

Count the number of targets each library_sample_id has. A read count filter can be applied to see how many targets would be kept under that filter

```{bash}
pmotools-python count_targets_per_library_sample -h 
```


The Python code for the `count_targets_per_library_sample` script is below

```{python}
#| echo: true
#| eval: false
#| code-fold: true
#| code-line-numbers: true
#| filename: pmotools-python/src/pmotools/scripts/extract_info_from_pmo/count_targets_per_library_sample.py
#| file: ../pmotools-python/src/pmotools/scripts/extract_info_from_pmo/count_targets_per_library_sample.py
```

```{bash}
cd example 

pmotools-python count_targets_per_library_sample --file ../../format/moz2018_PMO.json.gz  | head

```

Apply a read count minimum filter (this is the total read count summed across a target, not a per-haplotype count)

```{bash}
cd example 

pmotools-python count_targets_per_library_sample --read_count_minimum 3000 --file ../../format/moz2018_PMO.json.gz  | head

```
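
To make the "summed across a target" behaviour concrete, here is a minimal sketch on hypothetical toy rows. The row layout, the `count_targets_per_sample` helper, and the use of `>=` for the threshold comparison are all assumptions made for illustration, not the pmotools-python implementation.

```python
from collections import defaultdict

# Hypothetical toy rows: (library_sample_name, target_name, haplotype, reads).
# This is not the PMO layout, just enough structure to show the filter.
rows = [
    ("sampleA", "t1", "hap1", 2500),
    ("sampleA", "t1", "hap2", 800),   # t1 total for sampleA: 3300
    ("sampleA", "t2", "hap1", 1200),  # t2 total for sampleA: 1200
    ("sampleB", "t1", "hap1", 2900),  # t1 total for sampleB: 2900
    ("sampleB", "t2", "hap1", 4000),  # t2 total for sampleB: 4000
]


def count_targets_per_sample(rows, read_count_minimum=0):
    """Count targets per sample whose summed read count reaches the minimum."""
    # Sum reads over all haplotypes of each (sample, target) pair first.
    totals = defaultdict(int)
    for sample, target, _haplotype, reads in rows:
        totals[(sample, target)] += reads
    # Then count the targets that pass (">=" is an assumption of this sketch).
    counts = defaultdict(int)
    for (sample, _target), total in totals.items():
        if total >= read_count_minimum:
            counts[sample] += 1
    return dict(counts)


unfiltered = count_targets_per_sample(rows)
filtered = count_targets_per_sample(rows, read_count_minimum=3000)
print(unfiltered)
print(filtered)
```

In this toy data, target `t1` in `sampleA` passes a minimum of 3000 even though neither of its haplotypes does on its own, because the filter is applied to the target total.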

## count_library_samples_per_target  


Count the number of library_sample_ids each target has. A read count filter can be applied to see how many samples a target would have under that filter


```{bash}
pmotools-python count_library_samples_per_target -h 
```


The Python code for the `count_library_samples_per_target` script is below

```{python}
#| echo: true
#| eval: false
#| code-fold: true
#| code-line-numbers: true
#| filename: pmotools-python/src/pmotools/scripts/extract_info_from_pmo/count_library_samples_per_target.py
#| file: ../pmotools-python/src/pmotools/scripts/extract_info_from_pmo/count_library_samples_per_target.py
```



```{bash}
cd example 

pmotools-python count_library_samples_per_target --file ../../format/moz2018_PMO.json.gz  | head

```

Apply a read count minimum filter (this is the total read count summed across a target, not a per-haplotype count)

```{bash}
cd example 

pmotools-python count_library_samples_per_target --read_count_minimum 3000 --file ../../format/moz2018_PMO.json.gz  | head

```
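
Comparing the filtered and unfiltered outputs is essentially a sweep over thresholds. The sketch below runs such a sweep on hypothetical toy rows; the row layout, the `count_samples_per_target` helper, and the `>=` comparison are assumptions for illustration and are not taken from pmotools-python.

```python
from collections import defaultdict

# Hypothetical toy rows: (library_sample_name, target_name, haplotype, reads).
rows = [
    ("sampleA", "t1", "hap1", 2500),
    ("sampleA", "t1", "hap2", 800),
    ("sampleB", "t1", "hap1", 2900),
    ("sampleC", "t1", "hap1", 5000),
    ("sampleA", "t2", "hap1", 1200),
    ("sampleB", "t2", "hap1", 4000),
]


def count_samples_per_target(rows, read_count_minimum=0):
    """Count samples per target whose summed read count reaches the minimum."""
    # Total reads per (sample, target), summed over haplotypes.
    totals = defaultdict(int)
    for sample, target, _haplotype, reads in rows:
        totals[(sample, target)] += reads
    # Count the samples that pass for each target (">=" is an assumption).
    counts = defaultdict(int)
    for (_sample, target), total in totals.items():
        if total >= read_count_minimum:
            counts[target] += 1
    return dict(counts)


# Sweep a few thresholds to see how quickly each target loses samples.
sweep = {m: count_samples_per_target(rows, m) for m in (0, 1000, 3000)}
for minimum, counts in sweep.items():
    for target in sorted(counts):
        print(f"{minimum}\t{target}\t{counts[target]}")
```

A target whose sample count drops sharply between two thresholds is one where many samples sit at low coverage, which can help when choosing a `--read_count_minimum` value.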