Portable Microhaplotype Object (PMO)

Extracting allele tables using pmotools-python

pmotools-runner.py extract_allele_table

To extract allele table information from a PMO, use the command line script pmotools-runner.py extract_allele_table.

  • Required arguments

    • --file - the PMO file to extract from
    • --bioid - the tar_amp_bioinformatics_info_id to extract the data for
    • --output - the output file name (stub) for the allele table; use STDOUT to write to standard output
  • Optional arguments

By default, the extractor writes only three columns: 1) sampleID (experiment_sample_id), 2) locus (target_id), and 3) allele (microhaplotype_id). To change these header names, pass three comma-separated values to --default_base_col_names.
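The three-name requirement for --default_base_col_names can be sketched as follows. The script itself splits the value on commas; the helper name `parse_base_col_names` and its error handling are hypothetical illustrations, not part of pmotools-python, and the real validation may differ.

```python
# Default header used by extract_allele_table (from its argument definition).
DEFAULT_BASE_COL_NAMES = "sampleID,locus,allele"


def parse_base_col_names(value=DEFAULT_BASE_COL_NAMES):
    """Hypothetical helper: split a comma-separated header and enforce length 3."""
    names = value.split(",")
    if len(names) != 3 or any(name == "" for name in names):
        raise ValueError(f"expected 3 non-empty column names, got {names!r}")
    return names


print(parse_base_col_names())
print(parse_base_col_names("sample,target,hapid"))
```

Passing only two names, for example `sample,target`, raises a `ValueError` in this sketch.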

You can also add values from other portions of the PMO file to the table with the following arguments. Each takes a comma-separated list of field names.

  • Arguments for adding fields
    • --specimen_info_meta_fields - meta fields, if any, to include from the specimen table
    • --experiment_info_meta_fields - meta fields, if any, to include from the experiment table
    • --microhap_fields - additional optional fields to include from the detected microhaplotype object
    • --representative_haps_fields - additional optional fields to include from the detected representative object

The remaining optional arguments control overwriting of the output file and the delimiter used. Use -h to see all arguments.

Code
pmotools-runner.py extract_allele_table -h
usage: pmotools-runner.py extract_allele_table [-h] --bioid BIOID --file FILE
                                               [--delim DELIM] --output OUTPUT
                                               [--overwrite]
                                               [--allele_freqs_output ALLELE_FREQS_OUTPUT]
                                               [--specimen_info_meta_fields SPECIMEN_INFO_META_FIELDS]
                                               [--experiment_info_meta_fields EXPERIMENT_INFO_META_FIELDS]
                                               [--microhap_fields MICROHAP_FIELDS]
                                               [--representative_haps_fields REPRESENTATIVE_HAPS_FIELDS]
                                               [--default_base_col_names DEFAULT_BASE_COL_NAMES]

options:
  -h, --help            show this help message and exit
  --bioid BIOID         bio ID to extract for
  --file FILE           PMO file
  --delim DELIM         the delimiter of the input text file, examples
                        tab,comma
  --output OUTPUT       Output allele table file name path
  --overwrite           If output file exists, overwrite it
  --allele_freqs_output ALLELE_FREQS_OUTPUT
                        if also writing out allele frequencies, write to this
                        file
  --specimen_info_meta_fields SPECIMEN_INFO_META_FIELDS
                        Meta Fields if any to include from the specimen table
  --experiment_info_meta_fields EXPERIMENT_INFO_META_FIELDS
                        Meta Fields if any to include from the experiment
                        table
  --microhap_fields MICROHAP_FIELDS
                        additional optional fields from the detected
                        microhaplotype object to include
  --representative_haps_fields REPRESENTATIVE_HAPS_FIELDS
                        additional optional fields from the detected
                        representative object to include
  --default_base_col_names DEFAULT_BASE_COL_NAMES
                        default base column names, must be length 3

The Python code for the extract_allele_table script is below.

Code
pmotools-python/scripts/extractors_from_pmo/extract_allele_table.py
#!/usr/bin/env python3
import argparse

from pmotools.pmo_utils.PMOReader import PMOReader
from pmotools.utils.small_utils import Utils
from pmotools.pmo_utils.PMOChecker import PMOChecker
from pmotools.pmo_utils.PMOExtractor import PMOExtractor



def parse_args_extract_for_allele_table():
    parser = argparse.ArgumentParser()
    parser.add_argument('--bioid', type=str, required=True, help='bio ID to extract for')
    parser.add_argument('--file', type=str, required=True, help='PMO file')
    parser.add_argument('--delim', default="tab", type=str, required=False, help='the delimiter of the input text file, examples tab,comma')
    parser.add_argument('--output', type=str, required=True, help='Output allele table file name path')
    parser.add_argument('--overwrite', action = 'store_true', help='If output file exists, overwrite it')
    parser.add_argument('--allele_freqs_output',type=str, help='if also writing out allele frequencies, write to this file')

    parser.add_argument('--specimen_info_meta_fields', type=str, required=False, help='Meta Fields if any to include from the specimen table')
    parser.add_argument('--experiment_info_meta_fields', type=str, required=False, help='Meta Fields if any to include from the experiment table')
    parser.add_argument('--microhap_fields', type=str, required=False, help='additional optional fields from the detected microhaplotype object to include')
    parser.add_argument('--representative_haps_fields', type=str, required=False, help='additional optional fields from the detected representative object to include')
    parser.add_argument('--default_base_col_names', type=str, required=False, default="sampleID,locus,allele", help='default base column names, must be length 3')

    return parser.parse_args()

def extract_for_allele_table():
    args = parse_args_extract_for_allele_table()

    output_delim, output_extension = Utils.process_delimiter_and_output_extension(args.delim, gzip=args.output.endswith('.gz'))

    allele_per_sample_table_out_fnp = args.output if "STDOUT" == args.output else Utils.appendStrAsNeeded(args.output, output_extension)
    Utils.inputOutputFileCheck(args.file, allele_per_sample_table_out_fnp, args.overwrite)

    allele_freq_output = ""
    if args.allele_freqs_output is not None:
        allele_freq_output = Utils.appendStrAsNeeded(args.allele_freqs_output, output_extension)
        Utils.inputOutputFileCheck(args.file, allele_freq_output, args.overwrite)

    checker = PMOChecker()
    pmodata = PMOReader.read_in_pmo(args.file)

    checker.check_for_required_base_fields(pmodata)
    checker.check_bioinformatics_ids_consistency(pmodata)
    checker.check_for_bioinformatics_id(pmodata, args.bioid)

    if args.specimen_info_meta_fields is not None:
        args.specimen_info_meta_fields = Utils.parse_delimited_input_or_file(args.specimen_info_meta_fields, ",")
    if args.microhap_fields is not None:
        args.microhap_fields = Utils.parse_delimited_input_or_file(args.microhap_fields, ",")
    if args.experiment_info_meta_fields is not None:
        args.experiment_info_meta_fields = Utils.parse_delimited_input_or_file(args.experiment_info_meta_fields, ",")
    if args.representative_haps_fields is not None:
        args.representative_haps_fields = Utils.parse_delimited_input_or_file(args.representative_haps_fields, ",")

    PMOExtractor.write_alleles_per_sample_table(pmodata, args.bioid, allele_per_sample_table_out_fnp,
                                                 additional_specimen_infos_fields = args.specimen_info_meta_fields,
                                                 additional_experiment_infos_fields = args.experiment_info_meta_fields,
                                                 additional_microhap_fields = args.microhap_fields,
                                                 additional_representative_infos_fields = args.representative_haps_fields,
                                                 output_delimiter=output_delim,
                                                 default_base_col_names=args.default_base_col_names.split(","))
    if args.allele_freqs_output is not None:
        allele_counts, allele_freqs, target_totals = PMOExtractor.extract_allele_counts_freq_from_pmo(pmodata, args.bioid)
        PMOExtractor.write_allele_freq_output(allele_freq_output, allele_freqs, output_delimiter=output_delim)

if __name__ == "__main__":
    extract_for_allele_table()

Example PMOs can be downloaded with wget:

Code
wget https://plasmogenepi.github.io/PMO_Docs/format/moz2018_PMO.json.gz 

wget https://plasmogenepi.github.io/PMO_Docs/format/PathWeaverHeome1_PMO.json.gz
Code
mkdir -p example
cd example 

# default 
pmotools-runner.py extract_allele_table --file ../../format/moz2018_PMO.json.gz --bioid Mozambique2018-SeekDeep --output extraction --overwrite
Code
cd example 

# changing default column names  
pmotools-runner.py extract_allele_table --file ../../format/moz2018_PMO.json.gz --bioid Mozambique2018-SeekDeep --output extraction --overwrite --default_base_col_names sample,target,hapid

Changing the output file delimiter

Code
cd example 

pmotools-runner.py extract_allele_table --file ../../format/moz2018_PMO.json.gz --bioid Mozambique2018-SeekDeep --output extraction --overwrite --delim ,

Adding on additional columns from the specimen_infos section

Code
cd example 

# adding other PMO fields 
pmotools-runner.py extract_allele_table --file ../../format/moz2018_PMO.json.gz --bioid Mozambique2018-SeekDeep --output STDOUT --specimen_info_meta_fields collection_country,parasite_density | head  
sampleID    locus   allele  collection_country  parasite_density
8025874217  t1  t1.2    Mozambique  477719.34375
8025874217  t1  t1.0    Mozambique  477719.34375
8025874217  t10 t10.0   Mozambique  477719.34375
8025874217  t100    t100.05 Mozambique  477719.34375
8025874217  t11 t11.09  Mozambique  477719.34375
8025874217  t12 t12.1   Mozambique  477719.34375
8025874217  t12 t12.2   Mozambique  477719.34375
8025874217  t13 t13.5   Mozambique  477719.34375
8025874217  t14 t14.00  Mozambique  477719.34375
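Once the table is written, it can be read back with any delimited-text reader. As a minimal sketch, the snippet below loads the rows shown above with Python's standard csv module and counts the distinct alleles per sample and locus. The function name `alleles_per_locus` is a hypothetical helper, not part of pmotools-python, and the sketch assumes the default tab delimiter and default column names.

```python
import csv
import io
from collections import defaultdict

# The rows shown above, tab-delimited (the default output delimiter).
TABLE = (
    "sampleID\tlocus\tallele\tcollection_country\tparasite_density\n"
    "8025874217\tt1\tt1.2\tMozambique\t477719.34375\n"
    "8025874217\tt1\tt1.0\tMozambique\t477719.34375\n"
    "8025874217\tt10\tt10.0\tMozambique\t477719.34375\n"
    "8025874217\tt100\tt100.05\tMozambique\t477719.34375\n"
    "8025874217\tt11\tt11.09\tMozambique\t477719.34375\n"
    "8025874217\tt12\tt12.1\tMozambique\t477719.34375\n"
    "8025874217\tt12\tt12.2\tMozambique\t477719.34375\n"
    "8025874217\tt13\tt13.5\tMozambique\t477719.34375\n"
    "8025874217\tt14\tt14.00\tMozambique\t477719.34375\n"
)


def alleles_per_locus(handle, delimiter="\t"):
    """Hypothetical helper: count distinct alleles per (sampleID, locus)."""
    alleles = defaultdict(set)
    for row in csv.DictReader(handle, delimiter=delimiter):
        alleles[(row["sampleID"], row["locus"])].add(row["allele"])
    return {key: len(values) for key, values in alleles.items()}


counts = alleles_per_locus(io.StringIO(TABLE))
for (sample, locus), n_alleles in sorted(counts.items()):
    print(sample, locus, n_alleles)
```

For a file on disk, replace `io.StringIO(TABLE)` with an opened file handle, and pass `delimiter=","` if the table was written with `--delim ,`.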

You can continue to add columns from other sections, for example the seq field of the representative haplotypes.

Code
cd example 

# adding other PMO fields including seq field 
pmotools-runner.py extract_allele_table --file ../../format/moz2018_PMO.json.gz --bioid Mozambique2018-SeekDeep --output STDOUT --specimen_info_meta_fields collection_country,parasite_density --representative_haps_fields seq | head 
sampleID    locus   allele  collection_country  parasite_density    seq
8025874217  t1  t1.2    Mozambique  477719.34375    AACTTTTTTTATTTTTTTTGTCAATAGATAAATGATCAATATTTTCTATATTTAATCTATCAAGTATTTTTATATATCTATTATTTCTTTCTTCGATGGATAAATTATATGAATCAATATCCTTTCTTTCATCAACAAACTTTTTTATTGTTAACTCCATTTTTTTATTTA
8025874217  t1  t1.0    Mozambique  477719.34375    AACTTTTTTTATTTTTTTTGTCAATAGATAAATGATCAATATTTTCTATATTTAATCTATCAAGTATTTTTATATATCTATTATTTCTTTCTTCGATGGATAAATTATAAGAATCAATATCCTTTCTTTCATCAACAAACTTTTTTATTGTTAACTCCATTTTTTTATTTA
8025874217  t10 t10.0   Mozambique  477719.34375    ACTTATGTTCATGAGCTAATTTCCCACAAATACTCCATAACGAACTTTTCATTTTATTAAATTTATCTCTCAAAAGAGAATGACTATAATGCCATATTAAATACATATCTTTCCTTTCTAATTTTCCTGGTAATTCTATTATCATTCTTTCTAAATCTTCTTCTGTAACTTTT
8025874217  t100    t100.05 Mozambique  477719.34375    TCGTTTTGAATTGTTAGAATTTAAAATGACGGAGGATTGTTATACAAAAATGTGGTTTGATTTTATGATTGATTTTGGAATAGCTACAATGAATGAAAACGAACATACTAGATCTTTTTATGGCT
8025874217  t11 t11.09  Mozambique  477719.34375    GAATTTTCTTTTTTATGACTTTCTTCTCCTTGTTCAGAAGCTTCTTTTTCATCCTTTTTTTCTGCTGCGTCAGATAAATTGGGGGAAGCACTTGAAGATTCATTTCCTCCAGGAGTATTACTAGTACTTACTCCTTCCACATTTGGTTTTTCTTCCCCTAGAATTCTCA
8025874217  t12 t12.1   Mozambique  477719.34375    TACACATAAGAAAAAAAAAATTTATTTATTCTTACAAAAAGAATATAAAAACAAAATTTTGGGATTTATAAATTTTTATAAACATATAACACACAAAATAAAAAAGAAACAAGAAAATGTTCATGATAAAATCACTTTTTTAAAATGTCTAAAGGAACTCTTTTTTGTCACACATACAAATG
8025874217  t12 t12.2   Mozambique  477719.34375    TACACATAAGAAAAAAAAAATTTATTTATTCTTACAAAAAGAATATAAAAACAAAATTTTGGGATTTATAAATTTTTATAAACATATAACACACAAAATAAAAAAGAAACAAGAAAATGTTCATGATAAAATCACTTTTTTAAAATGTCTAAAGGAACTCTTTTTTGTCACACATACAAATA
8025874217  t13 t13.5   Mozambique  477719.34375    TAATTATGAAGACAGTCTCACGACTGCATGTTATATTGATGAAAACAAATCCGATTCATCCTATAAAACTGAAGAAGAAAATGTAAACTATAATAATAAAATGGGTAAACGCAAAAATTTA
8025874217  t14 t14.00  Mozambique  477719.34375    ACTTTTTAACACTATCATTATAATTATGTCTTTTATTTTCATATTTTTCTTTATAATAATTTATATCCTTTAATTTTTCTTTCATCAAATTTAACCATTTATCATTTAAATTCTCTTTTTCCACAGCTCCAGCATTTTTATTTATATCATCTACAACTACATCTTCCTTCACATAATTATTTATATAAAAATTATTATCATCTA
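With the seq column included, alleles at the same locus can be compared directly. The sketch below is a hypothetical helper (not part of pmotools-python) that lists the positions where two equal-length sequences differ. It is applied to a 25-base excerpt taken from the t1.2 and t1.0 rows above, and it assumes the sequences are already aligned and of equal length.

```python
def mismatches(seq_a, seq_b):
    """Return (position, base_a, base_b) for each differing 0-based position."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length for a position-wise comparison")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Excerpts from the t1.2 and t1.0 sequences in the output above.
T1_2_EXCERPT = "GGATAAATTATATGAATCAATATCC"
T1_0_EXCERPT = "GGATAAATTATAAGAATCAATATCC"

diffs = mismatches(T1_2_EXCERPT, T1_0_EXCERPT)
print(diffs)
```

Alleles of different lengths would need a proper alignment first, which this sketch does not attempt.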

Creating output for MOIRE

MOIRE is a program that estimates complexity of infection (COI) and other population-level quantities from a set of samples. See its GitHub repository (https://github.com/EPPIcenter/moire) for full usage.

Code
mkdir -p example
cd example 

# default table is all moire needs 
pmotools-runner.py extract_allele_table --file ../../format/moz2018_PMO.json.gz --bioid Mozambique2018-SeekDeep --output extraction --overwrite
Code
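library(moire)

df <- read.csv("example/extraction_allele_table.tsv", sep = "\t")

# load_long_form_data() expects the columns sample_id, locus and allele,
# so rename the default sampleID column before loading
names(df)[names(df) == "sampleID"] <- "sample_id"
data <- load_long_form_data(df)

# With data in appropriate format, run MCMC as follows
mcmc_results <- moire::run_mcmc(data, is_missing = data$is_missing)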

Creating output for dcifer

dcifer is a program that estimates identity by descent (IBD), even between mixed infections. See its GitHub repository (https://github.com/EPPIcenter/dcifer) for full usage.

Code
mkdir -p example
cd example 

# default 
pmotools-runner.py extract_allele_table --file ../../format/moz2018_PMO.json.gz --bioid Mozambique2018-SeekDeep --output extraction --overwrite --delim ,

# dcifer can calculate allele frequencies if not provided or you can have extract_allele_table write them as well 
pmotools-runner.py extract_allele_table --file ../../format/moz2018_PMO.json.gz --bioid Mozambique2018-SeekDeep --output extraction --overwrite --allele_freqs_output allele_freqs_extraction --delim ,
Code
library(dcifer)

dsmp <- readDat("example/extraction_allele_table.csv", svar = "sampleID", lvar = "locus", avar = "allele")

lrank <- 2
coi   <- getCOI(dsmp, lrank = lrank)

afreq <- calcAfreq(dsmp, coi, tol = 1e-5) 

dres0 <- ibdDat(dsmp, coi, afreq, pval = TRUE, confint = TRUE, rnull = 0, 
                alpha = 0.05, nr = 1e3)   
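To illustrate what a per-locus allele frequency table contains, the sketch below computes a naive estimate from a long-form allele table: for each locus, the number of samples in which an allele is observed divided by the total number of allele observations at that locus. This is an assumption made for illustration only. It is not necessarily how --allele_freqs_output computes frequencies, and it ignores COI, which calcAfreq takes as an input. The sample IDs and the helper name `naive_allele_freqs` are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical comma-delimited allele table with the default column names.
ROWS = (
    "sampleID,locus,allele\n"
    "S1,t1,t1.0\n"
    "S1,t1,t1.2\n"
    "S2,t1,t1.0\n"
    "S3,t1,t1.0\n"
    "S1,t10,t10.0\n"
    "S2,t10,t10.0\n"
)


def naive_allele_freqs(handle, delimiter=","):
    """Hypothetical helper: per-locus allele frequencies from presence counts."""
    seen = set()
    counts = defaultdict(Counter)
    for row in csv.DictReader(handle, delimiter=delimiter):
        key = (row["sampleID"], row["locus"], row["allele"])
        if key in seen:
            continue  # count each allele at most once per sample and locus
        seen.add(key)
        counts[row["locus"]][row["allele"]] += 1
    freqs = {}
    for locus, allele_counts in counts.items():
        total = sum(allele_counts.values())
        freqs[locus] = {allele: n / total for allele, n in allele_counts.items()}
    return freqs


freqs = naive_allele_freqs(io.StringIO(ROWS))
for locus in sorted(freqs):
    print(locus, freqs[locus])
```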