To extract allele table information from a PMO file, use the command-line script pmotools-runner.py extract_allele_table.
pmotools-runner.py extract_allele_table
Required arguments
--file - the PMO file to extract from
--bioid - the tar_amp_bioinformatics_info_id id to extract the data from
--output - the output stub of the files to be created
Optional arguments
By default, the extractor writes only three fields: 1) sampleID (experiment_sample_id), 2) locus (target_id), and 3) allele (microhaplotype_id), using those default column names. To change the header, pass three comma-separated values to --default_base_col_names.
You can also add values from other portions of the PMO file to the table by using the following arguments.
Arguments for adding fields
--specimen_info_meta_fields - Meta Fields if any to include from the specimen table
--experiment_info_meta_fields - Meta Fields if any to include from the experiment table
--microhap_fields - additional optional fields from the detected microhaplotype object to include
--representative_haps_fields - additional optional fields from the detected representative object to include
The other optional arguments control overwriting of the output file and the delimiter used. Use -h to see all arguments.
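As a rough illustration of how the required and optional arguments above fit together, the following Python sketch assembles a command line for the extractor. The helper name `build_extract_command` is hypothetical and is not part of pmotools-python; it uses only flags listed above, and nothing is executed until the resulting list is passed to something like `subprocess.run`.

```python
from typing import List, Optional, Sequence


def build_extract_command(
    pmo_file: str,
    bioid: str,
    output: str,
    base_col_names: Optional[Sequence[str]] = None,
    specimen_fields: Sequence[str] = (),
    overwrite: bool = False,
) -> List[str]:
    """Assemble argv for extract_allele_table (hypothetical helper, for illustration)."""
    cmd = [
        "pmotools-runner.py", "extract_allele_table",
        "--file", pmo_file,
        "--bioid", bioid,
        "--output", output,
    ]
    if base_col_names is not None:
        # The help text states the base column names must be length 3.
        if len(base_col_names) != 3:
            raise ValueError("--default_base_col_names needs exactly 3 values")
        cmd += ["--default_base_col_names", ",".join(base_col_names)]
    if specimen_fields:
        cmd += ["--specimen_info_meta_fields", ",".join(specimen_fields)]
    if overwrite:
        cmd.append("--overwrite")
    return cmd


cmd = build_extract_command(
    "../../format/moz2018_PMO.json.gz",
    "Mozambique2018-SeekDeep",
    "extraction",
    base_col_names=["sample", "target", "hapid"],
    overwrite=True,
)
print(" ".join(cmd))
```

Building the arguments as a list rather than a single string avoids shell quoting problems when field names are joined with commas.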
Code
pmotools-runner.py extract_allele_table -h
usage: pmotools-runner.py extract_allele_table [-h] --bioid BIOID --file FILE
[--delim DELIM] --output OUTPUT
[--overwrite]
[--allele_freqs_output ALLELE_FREQS_OUTPUT]
[--specimen_info_meta_fields SPECIMEN_INFO_META_FIELDS]
[--experiment_info_meta_fields EXPERIMENT_INFO_META_FIELDS]
[--microhap_fields MICROHAP_FIELDS]
[--representative_haps_fields REPRESENTATIVE_HAPS_FIELDS]
[--default_base_col_names DEFAULT_BASE_COL_NAMES]
options:
-h, --help show this help message and exit
--bioid BIOID bio ID to extract for
--file FILE PMO file
--delim DELIM the delimiter of the input text file, examples
tab,comma
--output OUTPUT Output allele table file name path
--overwrite If output file exists, overwrite it
--allele_freqs_output ALLELE_FREQS_OUTPUT
if also writing out allele frequencies, write to this
file
--specimen_info_meta_fields SPECIMEN_INFO_META_FIELDS
Meta Fields if any to include from the specimen table
--experiment_info_meta_fields EXPERIMENT_INFO_META_FIELDS
Meta Fields if any to include from the experiment
table
--microhap_fields MICROHAP_FIELDS
additional optional fields from the detected
microhaplotype object to include
--representative_haps_fields REPRESENTATIVE_HAPS_FIELDS
additional optional fields from the detected
representative object to include
--default_base_col_names DEFAULT_BASE_COL_NAMES
default base column names, must be length 3
The Python code for the extract_allele_table script is below
You can continue to add more columns from other sections
Code
cd example
# adding other PMO fields including seq field
pmotools-runner.py extract_allele_table --file ../../format/moz2018_PMO.json.gz --bioid Mozambique2018-SeekDeep --output STDOUT --specimen_info_meta_fields collection_country,parasite_density --representative_haps_fields seq | head
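Once extra columns are present, the table can be read like any other delimited file. The sketch below is an illustration under stated assumptions: the rows are made-up placeholders rather than real output, and it assumes the added column is named after the requested field (`collection_country`), which may differ in the actual output. The function name `samples_by_field` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows for illustration only; real rows come from extract_allele_table.
example_table = (
    "sampleID\tlocus\tallele\tcollection_country\n"
    "S1\tt1\tt1.0\tMozambique\n"
    "S1\tt2\tt2.1\tMozambique\n"
    "S2\tt1\tt1.0\tMozambique\n"
)


def samples_by_field(handle, field, sample_col="sampleID"):
    """Group distinct sample IDs by the value of one added metadata column."""
    groups = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        groups[row[field]].add(row[sample_col])
    # Sort the IDs so the result is deterministic.
    return {key: sorted(ids) for key, ids in groups.items()}


grouped = samples_by_field(io.StringIO(example_table), "collection_country")
print(grouped)
```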
Creating output for MOIRE
MOIRE is a program that can be used to estimate complexity of infection (COI) and other population-level parameters. See its GitHub for full usage.
Code
mkdir -p example
cd example
# default table is all moire needs
pmotools-runner.py extract_allele_table --file ../../format/moz2018_PMO.json.gz --bioid Mozambique2018-SeekDeep --output extraction --overwrite
Code
df <- read.csv("example/extraction_allele_table.tsv", sep = "\t")
data <- load_long_form_data(df)
# With data in appropriate format, run MCMC as follows
mcmc_results <- moire::run_mcmc(data, is_missing = data$is_missing)
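For intuition about what a COI estimate starts from, the sketch below computes a naive lower bound per sample from a long-form allele table: the largest number of distinct alleles seen at any single locus. This is not MOIRE's method (MOIRE runs MCMC, as shown above); it is only a simple baseline, and the rows and the function name `naive_coi` are made up for illustration.

```python
from collections import defaultdict


def naive_coi(rows):
    """Return {sample: max number of distinct alleles at any one locus}."""
    alleles = defaultdict(set)  # (sample, locus) -> distinct alleles observed
    for sample, locus, allele in rows:
        alleles[(sample, locus)].add(allele)

    coi = defaultdict(int)
    for (sample, _locus), seen in alleles.items():
        coi[sample] = max(coi[sample], len(seen))
    return dict(coi)


# Placeholder (sampleID, locus, allele) rows, not real data.
rows = [
    ("S1", "t1", "t1.0"),
    ("S1", "t1", "t1.1"),
    ("S1", "t2", "t2.0"),
    ("S2", "t1", "t1.0"),
    ("S2", "t2", "t2.0"),
]
coi_lower_bound = naive_coi(rows)
print(coi_lower_bound)
```

A sample carrying two alleles at one locus must contain at least two distinct strains, which is why this count is a lower bound rather than an estimate.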
Creating output for dcifer
dcifer is a program that can estimate identity by descent (IBD) even from mixed infections. See its GitHub for full usage.
Code
mkdir -p example
cd example
# default
pmotools-runner.py extract_allele_table --file ../../format/moz2018_PMO.json.gz --bioid Mozambique2018-SeekDeep --output extraction --overwrite --delim ,
# dcifer can calculate allele frequencies if not provided or you can have extract_allele_table write them as well
pmotools-runner.py extract_allele_table --file ../../format/moz2018_PMO.json.gz --bioid Mozambique2018-SeekDeep --output extraction --overwrite --allele_freqs_output allele_freqs_extraction --delim ,
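To make the idea of an allele frequency table concrete, the following sketch derives naive per-locus frequencies from a long-form allele table by counting, for each allele, the number of samples in which it was detected and normalizing within the locus. This is an assumption about one simple way to define frequencies; it is not necessarily how `--allele_freqs_output` or dcifer computes them. The rows and the name `naive_allele_freqs` are placeholders.

```python
from collections import Counter, defaultdict


def naive_allele_freqs(rows):
    """Return {locus: {allele: frequency}} from (sample, locus, allele) rows."""
    counts = defaultdict(Counter)
    # De-duplicate so each allele is counted once per sample.
    for _sample, locus, allele in set(rows):
        counts[locus][allele] += 1

    freqs = {}
    for locus, allele_counts in counts.items():
        total = sum(allele_counts.values())
        freqs[locus] = {a: n / total for a, n in allele_counts.items()}
    return freqs


# Placeholder rows, not real data.
rows = [
    ("S1", "t1", "t1.0"),
    ("S1", "t1", "t1.1"),
    ("S2", "t1", "t1.0"),
    ("S3", "t1", "t1.0"),
]
freqs = naive_allele_freqs(rows)
print(freqs["t1"])
```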
---
title: Extracting allele tables using pmotools-python
---

```{r setup, echo=F}
source("../common.R")
```

# pmotools-runner.py extract_allele_table

To extract allele table information from a PMO file, use the command-line script `pmotools-runner.py extract_allele_table`.

* pmotools-runner.py extract_allele_table
  * **Required arguments**
    * **\-\-file** - the PMO file to extract from
    * **\-\-bioid** - the `tar_amp_bioinformatics_info_id` id to extract the data from
    * **\-\-output** - the output stub of the files to be created
  * **Optional arguments**

By default, the extractor writes only three fields: 1) sampleID (experiment_sample_id), 2) locus (target_id), and 3) allele (microhaplotype_id), using those default column names. To change the header, pass three comma-separated values to **\-\-default_base_col_names**.

You can also add values from other portions of the PMO file to the table by using the following arguments.

* Arguments for adding fields
  * **\-\-specimen_info_meta_fields** - Meta Fields if any to include from the specimen table
  * **\-\-experiment_info_meta_fields** - Meta Fields if any to include from the experiment table
  * **\-\-microhap_fields** - additional optional fields from the detected microhaplotype object to include
  * **\-\-representative_haps_fields** - additional optional fields from the detected representative object to include

The other optional arguments control overwriting of the output file and the delimiter used. Use `-h` to see all arguments.

```{bash}
pmotools-runner.py extract_allele_table -h
```

The Python code for the `extract_allele_table` script is below

```{python}
#| echo: true
#| eval: false
#| code-fold: true
#| code-line-numbers: true
#| filename: pmotools-python/scripts/extractors_from_pmo/extract_allele_table.py
#| file: ../pmotools-python/scripts/extractors_from_pmo/extract_allele_table.py
```

You can download example PMOs here

```{bash, eval = F}
wget https://plasmogenepi.github.io/PMO_Docs/format/moz2018_PMO.json.gz
wget https://plasmogenepi.github.io/PMO_Docs/format/PathWeaverHeome1_PMO.json.gz
```

```{bash}
mkdir -p example
cd example
# default
pmotools-runner.py extract_allele_table --file ../../format/moz2018_PMO.json.gz --bioid Mozambique2018-SeekDeep --output extraction --overwrite
```

```{bash}
cd example
# changing default column names
pmotools-runner.py extract_allele_table --file ../../format/moz2018_PMO.json.gz --bioid Mozambique2018-SeekDeep --output extraction --overwrite --default_base_col_names sample,target,hapid
```

Changing the output file delimiter

```{bash}
cd example
pmotools-runner.py extract_allele_table --file ../../format/moz2018_PMO.json.gz --bioid Mozambique2018-SeekDeep --output extraction --overwrite --delim ,
```

Adding on additional columns from the specimen_infos section

```{bash, eval = F}
cd example
# adding other PMO fields
pmotools-runner.py extract_allele_table --file ../../format/moz2018_PMO.json.gz --bioid Mozambique2018-SeekDeep --output STDOUT --specimen_info_meta_fields collection_country,parasite_density | head
```

```{bash, echo = F}
cd example
# adding other PMO fields
pmotools-runner.py extract_allele_table --file ../../format/moz2018_PMO.json.gz --bioid Mozambique2018-SeekDeep --output STDOUT --specimen_info_meta_fields collection_country,parasite_density 2>/dev/null | head
```

You can continue to add more columns from other sections

```{bash, eval = F}
cd example
# adding other PMO fields including seq field
pmotools-runner.py extract_allele_table --file ../../format/moz2018_PMO.json.gz --bioid Mozambique2018-SeekDeep --output STDOUT --specimen_info_meta_fields collection_country,parasite_density --representative_haps_fields seq | head
```

```{bash, echo = F}
cd example
# adding other PMO fields including seq field
pmotools-runner.py extract_allele_table --file ../../format/moz2018_PMO.json.gz --bioid Mozambique2018-SeekDeep --output STDOUT --specimen_info_meta_fields collection_country,parasite_density --representative_haps_fields seq 2>/dev/null | head
```

## Creating output for MOIRE

MOIRE is a program that can be used to estimate complexity of infection (COI) and other population-level parameters. See its [github](https://github.com/EPPIcenter/moire) for full usage.

```{bash, eval = F}
mkdir -p example
cd example
# default table is all moire needs
pmotools-runner.py extract_allele_table --file ../../format/moz2018_PMO.json.gz --bioid Mozambique2018-SeekDeep --output extraction --overwrite
```

```{r, eval = F}
df <- read.csv("example/extraction_allele_table.tsv", sep = "\t")
data <- load_long_form_data(df)
# With data in appropriate format, run MCMC as follows
mcmc_results <- moire::run_mcmc(data, is_missing = data$is_missing)
```

## Creating output for dcifer

dcifer is a program that can estimate identity by descent (IBD) even from mixed infections. See its [github](https://github.com/EPPIcenter/dcifer) for full usage.

```{bash}
mkdir -p example
cd example
# default
pmotools-runner.py extract_allele_table --file ../../format/moz2018_PMO.json.gz --bioid Mozambique2018-SeekDeep --output extraction --overwrite --delim ,
# dcifer can calculate allele frequencies if not provided or you can have extract_allele_table write them as well
pmotools-runner.py extract_allele_table --file ../../format/moz2018_PMO.json.gz --bioid Mozambique2018-SeekDeep --output extraction --overwrite --allele_freqs_output allele_freqs_extraction --delim ,
```

```{r, eval = F}
dsmp <- readDat("example/extraction_allele_table.csv", svar = "sampleID", lvar = "locus", avar = "allele")
lrank <- 2
coi <- getCOI(dsmp, lrank = lrank)
afreq <- calcAfreq(dsmp, coi, tol = 1e-5)
dres0 <- ibdDat(dsmp, coi, afreq, pval = TRUE, confint = TRUE, rnull = 0, alpha = 0.05, nr = 1e3)
```