PMO fields overview
Goal
Fields are created with the aim of being consistent with the MIxS standards, which the Sequence Read Archive (SRA) uses to validate metadata upon submission. This also helps the data adhere to the FAIR principles (Findable, Accessible, Interoperable, and Reusable).
The format was developed to be efficient and lightweight: it contains the minimum amount of information about a targeted amplicon analysis without losing any important data. Tools are built around this format to generate fields that are important but do not need to be stored permanently in this base class (e.g. SNP/INDEL calls). To increase portability and to keep data internally consistent, the format was designed to be contained within a single JSON file, which removes the limitation of storing data only in a tabular format. Output generated from this file can be a table for downstream usage, but storing in the flexible JSON format allows a non-redundant organization (e.g. storing an ID only once while storing other data in lists).
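The non-redundant organization described above can be illustrated with a minimal Python sketch. The row layout, the `nest_rows` helper, and the `results` key are hypothetical and are not part of the PMO format; they only show how a nested JSON structure stores an identifier once where a table repeats it on every row.

```python
import json

# Hypothetical tabular rows: the specimen name is repeated on every row.
rows = [
    {"specimen_name": "spec1", "target_name": "t1", "reads": 120},
    {"specimen_name": "spec1", "target_name": "t2", "reads": 80},
    {"specimen_name": "spec2", "target_name": "t1", "reads": 45},
]


def nest_rows(rows):
    """Group rows so each specimen name is stored once with a list of results."""
    grouped = {}
    for row in rows:
        results = grouped.setdefault(row["specimen_name"], [])
        results.append({"target_name": row["target_name"], "reads": row["reads"]})
    return [{"specimen_name": name, "results": res} for name, res in grouped.items()]


nested = nest_rows(rows)
print(json.dumps(nested, indent=2))
```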
The format is defined using LinkML, which produces a general data schema and, from it, various validation outputs such as a JSON Schema for validation tools. LinkML also generates a website for viewing all fields defined in the format, https://github.com/PlasmoGenEpi/portable-microhaplotype-object.
Other notable users of LinkML/MIxS: National Microbiome Data Collaborative Schema
Overview
Below is an overview of the entire format, which is under active development and optimization. Please send questions to info@plasmogenepi.org
PortableMicrohaplotypeObject
https://plasmogenepi.github.io/portable-microhaplotype-object/PortableMicrohaplotypeObject/
Required
- experiment_info (type=list of ExperimentInfo)
- a list of all the sequencing/amplification experiments on the specimens within this project
- specimen_info (type=list of SpecimenInfo)
- a list of all the specimens within this project
- sequencing_info (type=list of SequencingInfo)
- a list of sequencing info for this project
- panel_info (type=list of PanelInfo)
- a list of info on the panels
- target_info (type=list of TargetInfo)
- a list of info on the targets
- targeted_genomes (type=list of GenomeInfo)
- a list of genomes that the targets in TargetInfo refer to
- microhaplotypes_info (type=RepresentativeMicrohaplotypes)
- a list of the information on the representative microhaplotypes
- bioinformatics_methods_info (type=list of BioinformaticsMethodInfo)
- the bioinformatics pipeline/methods used to generate the amplicon analysis for this project
- bioinformatics_run_info (type=list of BioinformaticsRunInfo)
- the runtime info for the bioinformatics pipeline used to generate the amplicon analysis for this project
- microhaplotypes_detected (type=list of MicrohaplotypesDetected)
- the microhaplotypes detected in this project
- pmo_header (type=PmoHeader)
- the PMO information for this file including version etc
Optional
- read_counts_by_stage (type=list of ReadCountsByStage)
- the read counts for different stages of the pipeline
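As a quick sanity check before full validation, the required top-level fields listed above can be tested for presence. This is a minimal sketch: the helper name `missing_required_fields` is hypothetical, and it is not a substitute for validating against the JSON Schema generated from the LinkML definition.

```python
# Required top-level fields, copied from the list above.
REQUIRED_TOP_LEVEL_FIELDS = [
    "experiment_info",
    "specimen_info",
    "sequencing_info",
    "panel_info",
    "target_info",
    "targeted_genomes",
    "microhaplotypes_info",
    "bioinformatics_methods_info",
    "bioinformatics_run_info",
    "microhaplotypes_detected",
    "pmo_header",
]


def missing_required_fields(pmo: dict) -> list:
    """Return the required top-level field names that are absent from a PMO dict."""
    return [field for field in REQUIRED_TOP_LEVEL_FIELDS if field not in pmo]


# A deliberately incomplete, made-up PMO for illustration.
incomplete_pmo = {"pmo_header": {"pmo_version": "v1.0.0"}, "specimen_info": []}
missing = missing_required_fields(incomplete_pmo)
print(missing)
```

In practice the dict would come from `json.load` on a PMO file.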
BioMethod
https://plasmogenepi.github.io/portable-microhaplotype-object/BioMethod/
Required
- program_version (type=string)
- the version of generation method, should be in the format of v[MAJOR].[MINOR].[PATCH]
- program (type=string)
- name of the program used for this portion of the pipeline
Optional
- additional_argument (type=list of string)
- any additional arguments that differ from the default
- program_description (type=string)
- a short description of what this method does
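The v[MAJOR].[MINOR].[PATCH] format required for program_version can be checked with a regular expression. This is a minimal sketch that assumes each component is a non-negative integer; the helper name `is_valid_version` is hypothetical.

```python
import re

# Assumption: MAJOR, MINOR and PATCH are non-negative integers, e.g. v1.2.3.
VERSION_PATTERN = re.compile(r"v\d+\.\d+\.\d+")


def is_valid_version(version: str) -> bool:
    """Return True if the whole string matches v[MAJOR].[MINOR].[PATCH]."""
    return VERSION_PATTERN.fullmatch(version) is not None


for candidate in ("v1.2.3", "1.2.3", "v1.2"):
    print(candidate, is_valid_version(candidate))
```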
BioinformaticsMethodInfo
https://plasmogenepi.github.io/portable-microhaplotype-object/BioinformaticsMethodInfo/
Required
Optional
- additional_methods (type=list of BioMethod)
- any additional methods used to analyze the data
- bioinformatics_method_name (type=string)
- the name of the collection of methods, e.g. the pipeline name
BioinformaticsRunInfo
https://plasmogenepi.github.io/portable-microhaplotype-object/BioinformaticsRunInfo/
Required
- bioinformatics_methods_id (type=integer)
- the index into the bioinformatics_methods_info list
- run_date (type=string)
- the date when the run was done, should be YYYY-MM-DD
Optional
- bioinformatics_run_name (type=string)
- a name for this run
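The two required fields above can be checked together: run_date should be a real calendar date in YYYY-MM-DD form, and bioinformatics_methods_id should point at an existing entry of bioinformatics_methods_info. This is a minimal sketch; the helper name `check_run_info` is hypothetical, and treating the index as 0-based is an assumption.

```python
import re
from datetime import date


def check_run_info(run: dict, n_methods: int) -> list:
    """Return a list of problems found in one BioinformaticsRunInfo dict."""
    problems = []
    run_date = run["run_date"]
    # Require the zero-padded YYYY-MM-DD shape first, then a real calendar date.
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", run_date) is None:
        problems.append("run_date is not in YYYY-MM-DD format")
    else:
        try:
            date.fromisoformat(run_date)
        except ValueError:
            problems.append("run_date is not a valid calendar date")
    # Assumption: indexes into bioinformatics_methods_info are 0-based.
    if not 0 <= run["bioinformatics_methods_id"] < n_methods:
        problems.append("bioinformatics_methods_id is out of range")
    return problems


good = check_run_info({"run_date": "2024-05-01", "bioinformatics_methods_id": 0}, 1)
bad = check_run_info({"run_date": "2024-13-01", "bioinformatics_methods_id": 2}, 1)
print(good, bad)
```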
ExperimentInfo
https://plasmogenepi.github.io/portable-microhaplotype-object/ExperimentInfo/
Required
- sequencing_info_id (type=integer)
- the index into the sequencing_info list
- specimen_id (type=integer)
- the index into the specimen_info list
- panel_id (type=integer)
- the index into the panel_info list
- experiment_sample_name (type=string)
- a unique identifier for this sequence/amplification run on a specimen_name
Optional
GenomeInfo
https://plasmogenepi.github.io/portable-microhaplotype-object/GenomeInfo/
Required
- name (type=string)
- name of the genome
- genome_version (type=string)
- the genome version
- taxon_id (type=integer)
- the NCBI taxonomy number
- url (type=string)
- a link to where this genome file can be downloaded
Optional
- chromosomes (type=list of string)
- a list of chromosomes found within this genome
- gff_url (type=string)
- a link to where this genome's annotation file can be downloaded
GenomicLocation
https://plasmogenepi.github.io/portable-microhaplotype-object/GenomicLocation/
Required
- genome_id (type=integer)
- the index to the genome in the targeted_genomes list that this location refers to
- chrom (type=string)
- the chromosome name
- start (type=integer)
- the start of the location, 0-based positioning
- end (type=integer)
- the end of the location, 0-based positioning
Optional
- ref_seq (type=string)
- the reference sequence of this genomic location
- strand (type=string)
- which strand the location is on, either + for the plus strand or - for the minus strand
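To show how start, end, and strand might be used together, here is a minimal Python sketch that extracts the sequence of a location from a chromosome string. It assumes that end is exclusive (a half-open, 0-based interval); the field descriptions above only state that both positions are 0-based, so this should be confirmed against the schema. The helper name `location_sequence` is hypothetical.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def location_sequence(chrom_seq: str, start: int, end: int, strand: str = "+") -> str:
    """Return the sequence for a location, reverse-complemented on the - strand.

    Assumption: start is inclusive and end is exclusive, both 0-based.
    """
    segment = chrom_seq[start:end]
    if strand == "-":
        return segment.translate(COMPLEMENT)[::-1]
    return segment


chrom = "AACCGGTT"
plus_seq = location_sequence(chrom, 2, 5)
minus_seq = location_sequence(chrom, 2, 5, strand="-")
print(plus_seq, minus_seq)
```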
MarkerOfInterest
https://plasmogenepi.github.io/portable-microhaplotype-object/MarkerOfInterest/
Required
- marker_location (type=GenomicLocation)
- the genomic location
Optional
- associations (type=list of string)
- a list of associations with this marker, e.g. SP resistance, etc
MaskingInfo
https://plasmogenepi.github.io/portable-microhaplotype-object/MaskingInfo/
Required
- seq_start (type=integer)
- the start of the masking
- seq_segment_size (type=integer)
- the size of the masking
- replacement_size (type=integer)
- the size of replacement mask
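One possible reading of these three fields is that the segment of length seq_segment_size starting at seq_start is replaced by a mask of length replacement_size. The sketch below implements that reading only as an illustration: the 0-based seq_start, the use of N as the mask character, and the helper name `apply_masking` are all assumptions, not something the field descriptions confirm.

```python
def apply_masking(seq: str, masks: list, mask_char: str = "N") -> str:
    """Replace each masked segment of seq with replacement_size mask characters.

    Assumptions: seq_start is 0-based and the mask character is N.
    """
    # Apply masks from right to left so earlier coordinates stay valid
    # when the replacement length differs from the segment length.
    for mask in sorted(masks, key=lambda m: m["seq_start"], reverse=True):
        start = mask["seq_start"]
        stop = start + mask["seq_segment_size"]
        seq = seq[:start] + mask_char * mask["replacement_size"] + seq[stop:]
    return seq


# A made-up sequence with a 6-base homopolymer run masked down to 2 bases.
masked = apply_masking(
    "ACGTTTTTTACG",
    [{"seq_start": 3, "seq_segment_size": 6, "replacement_size": 2}],
)
print(masked)
```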
MicrohaplotypeForTarget
https://plasmogenepi.github.io/portable-microhaplotype-object/MicrohaplotypeForTarget/
Required
- mhap_id (type=integer)
- the index for a microhaplotype for a target in the microhaplotypes_info list, e.g. microhaplotypes_info[mhaps_target_id][mhap_id]
- reads (type=integer)
- the read count associated with this microhaplotype
Optional
- umis (type=integer)
- the unique molecular identifier (umi) count associated with this microhaplotype
MicrohaplotypesDetected
https://plasmogenepi.github.io/portable-microhaplotype-object/MicrohaplotypesDetected/
Required
- bioinformatics_run_id (type=integer)
- the index into bioinformatics_run_info list
- experiment_samples (type=list of MicrohaplotypesForSample)
- a list of the microhaplotypes detected for a sample by targets
MicrohaplotypesForSample
https://plasmogenepi.github.io/portable-microhaplotype-object/MicrohaplotypesForSample/
Required
- experiment_sample_id (type=integer)
- the index into the experiment_info list
- target_results (type=list of MicrohaplotypesForTarget)
- a list of the microhaplotypes detected for a list of targets
MicrohaplotypesForTarget
https://plasmogenepi.github.io/portable-microhaplotype-object/MicrohaplotypesForTarget/
Required
- mhaps_target_id (type=integer)
- the index for a target in the microhaplotypes_info list
- haps (type=list of MicrohaplotypeForTarget)
- a list of the microhaplotypes detected for this target
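The nested indexes in MicrohaplotypesDetected, MicrohaplotypesForSample, and MicrohaplotypesForTarget can be resolved into a flat table for downstream use. This is a minimal sketch on a made-up PMO fragment. It assumes that mhaps_target_id indexes the targets list inside microhaplotypes_info, that mhap_id indexes that target's microhaplotypes list, and that all indexes are 0-based; the helper name `flatten_detected` is hypothetical.

```python
def flatten_detected(pmo: dict) -> list:
    """Resolve index fields into one row per detected microhaplotype."""
    rows = []
    rep_targets = pmo["microhaplotypes_info"]["targets"]
    for detected in pmo["microhaplotypes_detected"]:
        for sample in detected["experiment_samples"]:
            experiment = pmo["experiment_info"][sample["experiment_sample_id"]]
            for target_result in sample["target_results"]:
                rep_target = rep_targets[target_result["mhaps_target_id"]]
                target = pmo["target_info"][rep_target["target_id"]]
                for hap in target_result["haps"]:
                    rep_hap = rep_target["microhaplotypes"][hap["mhap_id"]]
                    rows.append({
                        "experiment_sample_name": experiment["experiment_sample_name"],
                        "target_name": target["target_name"],
                        "seq": rep_hap["seq"],
                        "reads": hap["reads"],
                    })
    return rows


# Made-up fragment containing only the fields this sketch reads.
pmo = {
    "experiment_info": [{"experiment_sample_name": "exp1"}],
    "target_info": [{"target_name": "t1"}],
    "microhaplotypes_info": {"targets": [
        {"target_id": 0, "microhaplotypes": [{"seq": "ACGT"}, {"seq": "ACGA"}]},
    ]},
    "microhaplotypes_detected": [{
        "bioinformatics_run_id": 0,
        "experiment_samples": [{
            "experiment_sample_id": 0,
            "target_results": [
                {"mhaps_target_id": 0, "haps": [{"mhap_id": 1, "reads": 30}]},
            ],
        }],
    }],
}
rows = flatten_detected(pmo)
print(rows)
```

Each sequence is stored once in microhaplotypes_info and referenced by index, so the flat table is derived on demand instead of being stored.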
PanelInfo
https://plasmogenepi.github.io/portable-microhaplotype-object/PanelInfo/
Required
- reactions (type=list of ReactionInfo)
- a list of one or more reactions that this panel contains; each reaction lists the targets that were amplified in that reaction, e.g. pool1, pool2
- panel_name (type=string)
- a name for the panel
ParasiteDensity
https://plasmogenepi.github.io/portable-microhaplotype-object/ParasiteDensity/
Required
- method (type=string)
- the method of how this density was obtained
- density (type=number)
- the parasite density, in parasites per microliter
PlateInfo
https://plasmogenepi.github.io/portable-microhaplotype-object/PlateInfo/
Required
Optional
- plate_col (type=integer)
- the column the specimen was in
- plate_name (type=string)
- the name of the plate the specimen was in
- plate_row (type=string)
- the row the specimen was in
PmoGenerationMethod
https://plasmogenepi.github.io/portable-microhaplotype-object/PmoGenerationMethod/
Required
- program_version (type=string)
- the version of the generation method, should be in the format of v[MAJOR].[MINOR].[PATCH]
- program_name (type=string)
- the name of the program
PmoHeader
https://plasmogenepi.github.io/portable-microhaplotype-object/PmoHeader/
Required
- pmo_version (type=string)
- the version of the PMO file, should be in the format of v[MAJOR].[MINOR].[PATCH]
Optional
- creation_date (type=string)
- the date when the PMO file was created or modified, should be YYYY-MM-DD
- generation_method (type=PmoGenerationMethod)
- the generation method to create this PMO
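A header with these fields could be assembled as in the following minimal sketch. The helper name `make_pmo_header` and the program name `my_pmo_builder` are hypothetical; the field names come from the PmoHeader and PmoGenerationMethod lists.

```python
from datetime import date


def make_pmo_header(pmo_version: str, program_name: str, program_version: str) -> dict:
    """Build a PmoHeader dict with today's date in YYYY-MM-DD form."""
    return {
        "pmo_version": pmo_version,
        # date.isoformat() yields the YYYY-MM-DD form the field asks for.
        "creation_date": date.today().isoformat(),
        "generation_method": {
            "program_name": program_name,
            "program_version": program_version,
        },
    }


# "my_pmo_builder" is a placeholder, not a real tool.
header = make_pmo_header("v1.0.0", "my_pmo_builder", "v0.1.0")
print(header)
```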
PrimerInfo
https://plasmogenepi.github.io/portable-microhaplotype-object/PrimerInfo/
Required
- seq (type=string)
- the DNA sequence
Optional
- location (type=GenomicLocation)
- what the intended genomic location of the primer is
ReactionInfo
https://plasmogenepi.github.io/portable-microhaplotype-object/ReactionInfo/
Required
- panel_targets (type=list of integer)
- a list of the target indexes in the target_info list
- reaction_name (type=string)
- a name for this reaction
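Because panel_targets holds indexes into the target_info list, the target names for each reaction of a panel can be recovered by lookup. This is a minimal sketch on made-up data, assuming 0-based indexes; the helper name `targets_by_reaction` is hypothetical.

```python
def targets_by_reaction(panel: dict, target_info: list) -> dict:
    """Map each reaction name to the names of the targets amplified in it."""
    return {
        reaction["reaction_name"]: [
            target_info[index]["target_name"] for index in reaction["panel_targets"]
        ]
        for reaction in panel["reactions"]
    }


# Made-up example data.
target_info = [{"target_name": "t1"}, {"target_name": "t2"}, {"target_name": "t3"}]
panel = {
    "panel_name": "example_panel",
    "reactions": [
        {"reaction_name": "pool1", "panel_targets": [0, 2]},
        {"reaction_name": "pool2", "panel_targets": [1]},
    ],
}
pools = targets_by_reaction(panel, target_info)
print(pools)
```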
ReadCountsByStage
https://plasmogenepi.github.io/portable-microhaplotype-object/ReadCountsByStage/
Required
- bioinformatics_run_id (type=integer)
- the index into bioinformatics_run_info list
- read_counts_by_experimental_sample_by_stage (type=list of ReadCountsByStageForExperimentalSample)
- a list by experiment_sample for the counts at each stage
ReadCountsByStageForExperimentalSample
https://plasmogenepi.github.io/portable-microhaplotype-object/ReadCountsByStageForExperimentalSample/
Required
- experiment_sample_id (type=integer)
- the index into the experiment_info list
- total_raw_count (type=integer)
- the raw counts off the sequencing machine that a sample began with
Optional
- read_counts_for_targets (type=list of ReadCountsByStageForTarget)
- a list of counts by stage for a target
ReadCountsByStageForTarget
https://plasmogenepi.github.io/portable-microhaplotype-object/ReadCountsByStageForTarget/
Required
- target_id (type=integer)
- the index into the target_info list
- stages (type=list of StageReadCounts)
- the read counts by each stage
RepresentativeMicrohaplotype
https://plasmogenepi.github.io/portable-microhaplotype-object/RepresentativeMicrohaplotype/
Required
- seq (type=string)
- the DNA sequence
Optional
- alt_annotations (type=list of string)
- a list of additional annotations associated with this microhaplotype, e.g. wildtype, amino acid changes etc
- masking (type=list of MaskingInfo)
- masking info for the sequence
- microhaplotype_name (type=string)
- an optional name for this microhaplotype
- pseudocigar (type=string)
- the pseudocigar of the haplotype
- quality (type=string)
- the ANSI FASTQ per-base quality scores for this sequence
RepresentativeMicrohaplotypes
https://plasmogenepi.github.io/portable-microhaplotype-object/RepresentativeMicrohaplotypes/
Required
- targets (type=list of RepresentativeMicrohaplotypesForTarget)
- a list of the microhaplotypes for each target
RepresentativeMicrohaplotypesForTarget
https://plasmogenepi.github.io/portable-microhaplotype-object/RepresentativeMicrohaplotypesForTarget/
Required
- target_id (type=integer)
- the index into the target_info list
- microhaplotypes (type=list of RepresentativeMicrohaplotype)
- a list of the microhaplotypes detected for a target
SequencingInfo
https://plasmogenepi.github.io/portable-microhaplotype-object/SequencingInfo/
Required
- sequencing_info_name (type=string)
- a name for the sequencing done, e.g. batch1
- seq_platform (type=string)
- the sequencing technology used to sequence the run, e.g. ILLUMINA, NANOPORE, PACBIO
- seq_instrument_model (type=string)
- the sequencing instrument model used to sequence the run, e.g. NextSeq 2000, MinION, Revio
- seq_date (type=string)
- the date of sequencing, should be YYYY-MM or YYYY-MM-DD
- library_layout (type=string)
- Specify the configuration of reads, e.g. paired-end, single
- library_strategy (type=string)
- what the nucleic acid sequencing/amplification strategy was (common names are AMPLICON, WGS)
- library_source (type=string)
- Source of amplification material (common names are GENOMIC, TRANSCRIPTOMIC)
- library_selection (type=string)
- how amplification was done (common values are PCR = source material was selected by designed primers, RANDOM = random selection by shearing or other method)
Optional
- library_kit (type=string)
- Name, version, and applicable cell or cycle numbers for the kit used to prepare libraries and load cells or chips for sequencing. If possible, include a part number, e.g. MiSeq Reagent Kit v3 (150-cycle), MS-102-3001
- library_screen (type=string)
- Describe enrichment, screening, or normalization methods applied during amplification or library preparation, e.g. size selection 390bp, diluted to 1 ng DNA/sample
- nucl_acid_amp (type=string)
- Link to a reference or kit that describes the enzymatic amplification of nucleic acids
- nucl_acid_amp_date (type=string)
- the date of the nucleic acid amplification
- nucl_acid_ext (type=string)
- Link to a reference or kit that describes the recovery of nucleic acids from the sample
- nucl_acid_ext_date (type=string)
- the date of the nucleic acid extraction
- pcr_cond (type=string)
- the method/conditions for PCR; list the PCR cycles used to amplify the target
- seq_center (type=string)
- Name of facility where sequencing was performed (lab, core facility, or company)
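Several date fields accept reduced precision: seq_date allows YYYY-MM or YYYY-MM-DD, and collection_date in SpecimenInfo additionally allows YYYY. A minimal sketch of a checker that covers both cases follows; the helper name `is_valid_partial_date` and its `allow_year_only` parameter are hypothetical.

```python
import re
from datetime import date

# Year, optionally followed by -MM, optionally followed by -DD.
PARTIAL_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def is_valid_partial_date(value: str, allow_year_only: bool = False) -> bool:
    """Accept YYYY-MM or YYYY-MM-DD, and YYYY when allow_year_only is True."""
    match = PARTIAL_DATE.fullmatch(value)
    if match is None:
        return False
    year, month, day = match.groups()
    if month is None:
        return allow_year_only
    try:
        # Use day 1 when only year and month are given.
        date(int(year), int(month), int(day or 1))
    except ValueError:
        return False
    return True


for candidate in ("2024-05", "2024-05-17", "2024", "2024-13", "2024-02-30"):
    print(candidate, is_valid_partial_date(candidate))
```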
SpecimenInfo
https://plasmogenepi.github.io/portable-microhaplotype-object/SpecimenInfo/
Required
- specimen_name (type=string)
- an identifier for the specimen, should be unique within this sample set
- specimen_taxon_id (type=list of integer)
- the NCBI taxonomy number of the organism in specimen, can list multiple if a mixed sample
- host_taxon_id (type=integer)
- the NCBI taxonomy number of the host that the specimen was collected from
- collection_date (type=string)
- the date of the specimen collection, can be YYYY, YYYY-MM, or YYYY-MM-DD
- collection_country (type=string)
- the name of the country the specimen was collected in; this is the same as admin level 0
- project_name (type=string)
- a name of the project under which the specimen is organized
Optional
- alternate_identifiers (type=list of string)
- a list of optional alternative names for the specimens
- collector_chief_scientist (type=string)
- can be a collection of names separated by semicolons if multiple people were involved, or just the name of the primary person managing the specimen
- drug_usage (type=list of string)
- Any drug used by subject and the frequency of usage; can include multiple drugs used
- env_broad_scale (type=string)
- the broad environment from which the specimen was collected, e.g. highlands, lowlands, mountainous region
- env_local_scale (type=string)
- the local environment from which the specimen was collected, e.g. jungle, urban, rural
- env_medium (type=string)
- the environment medium from which the specimen was collected
- geo_admin1 (type=string)
- geographical admin level 1, the secondary large demarcation of a nation (nation = admin level 0)
- geo_admin2 (type=string)
- geographical admin level 2, the third large demarcation of a nation (nation = admin level 0)
- geo_admin3 (type=string)
- geographical admin level 3, the fourth large demarcation of a nation (nation = admin level 0)
- host_age (type=number)
- if specimen is from a person, the age of the person in years; can be a float value, e.g. 0.25 for a 3-month-old
- host_sex (type=string)
- if specimen is from a person, the sex listed for that person
- host_subject_id (type=integer)
- an identifier for the individual a specimen was collected from
- lat_lon (type=string)
- the latitude and longitude of the collection site of the specimen
- parasite_density_info (type=list of ParasiteDensity)
- one or more parasite densities (in parasites per microliter) for this specimen
- plate_info (type=PlateInfo)
- plate location of where specimen is stored if stored in a plate
- specimen_collect_device (type=string)
- the way the specimen was collected, e.g. whole blood, dried blood spot
- specimen_comments (type=list of string)
- any additional comments about the specimen
- specimen_store_loc (type=string)
- the specimen store site, address or facility name
- specimen_type (type=string)
- what type of specimen this is, e.g. negative_control, positive_control, field_sample
- travel_out_six_month (type=list of string)
- Specification of the countries travelled in the last six months; can include multiple travels
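A specimen entry with the required fields and a fractional host_age might look like the following minimal sketch. All values are made up for illustration; the helper name `age_in_years` is hypothetical. The taxon IDs used are the NCBI taxonomy numbers for Plasmodium falciparum (5833) and Homo sapiens (9606).

```python
# Required SpecimenInfo fields, copied from the list above.
REQUIRED_SPECIMEN_FIELDS = (
    "specimen_name",
    "specimen_taxon_id",
    "host_taxon_id",
    "collection_date",
    "collection_country",
    "project_name",
)


def age_in_years(years: int = 0, months: int = 0) -> float:
    """Convert an age given in years and months to fractional years."""
    return years + months / 12


# Made-up specimen for illustration.
specimen = {
    "specimen_name": "spec1",
    "specimen_taxon_id": [5833],  # Plasmodium falciparum
    "host_taxon_id": 9606,  # Homo sapiens
    "collection_date": "2023-07",
    "collection_country": "Mozambique",
    "project_name": "example_project",
    "host_age": age_in_years(months=3),
}
missing = [field for field in REQUIRED_SPECIMEN_FIELDS if field not in specimen]
print(specimen["host_age"], missing)
```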
StageReadCounts
https://plasmogenepi.github.io/portable-microhaplotype-object/StageReadCounts/
Required
- read_count (type=integer)
- the read counts
- stage (type=string)
- the stage of the pipeline, e.g. demultiplexed, denoised, etc
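Stage-level counts can be summed across targets to see how many reads a sample retained at each stage. This is a minimal sketch operating on one made-up ReadCountsByStageForExperimentalSample dict; the helper name `reads_per_stage` is hypothetical.

```python
from collections import Counter


def reads_per_stage(sample_counts: dict) -> dict:
    """Sum read_count over all targets of one sample, grouped by stage name."""
    totals = Counter()
    # read_counts_for_targets is optional, so default to an empty list.
    for target in sample_counts.get("read_counts_for_targets", []):
        for stage in target["stages"]:
            totals[stage["stage"]] += stage["read_count"]
    return dict(totals)


# Made-up counts for one sample and two targets.
sample_counts = {
    "experiment_sample_id": 0,
    "total_raw_count": 1000,
    "read_counts_for_targets": [
        {"target_id": 0, "stages": [
            {"stage": "demultiplexed", "read_count": 400},
            {"stage": "denoised", "read_count": 350},
        ]},
        {"target_id": 1, "stages": [
            {"stage": "demultiplexed", "read_count": 500},
            {"stage": "denoised", "read_count": 420},
        ]},
    ],
}
totals = reads_per_stage(sample_counts)
print(totals)
```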
TargetInfo
https://plasmogenepi.github.io/portable-microhaplotype-object/TargetInfo/
Required
- target_name (type=string)
- an identifier for this target
- forward_primers (type=list of PrimerInfo)
- A list of forward primers associated with this target
- reverse_primers (type=list of PrimerInfo)
- A list of reverse primers associated with this target
Optional
- gene_name (type=string)
- an identifier of the gene, if any, covered by this target
- insert_location (type=GenomicLocation)
- the intended genomic location of the insert of the amplicon (the location between the end of the forward primer and the beginning of the reverse primer)
- markers_of_interest (type=list of MarkerOfInterest)
- a list of covered markers of interest
- target_attributes (type=list of string)
- a list of classification types for the primer target
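Since insert_location is described as the region between the end of the forward primer and the beginning of the reverse primer, it can be derived from the primer locations when they are recorded. The sketch below assumes half-open, 0-based coordinates (end exclusive) and that the forward primer lies upstream of the reverse primer on the same chromosome; the helper name `insert_location_from_primers` is hypothetical.

```python
def insert_location_from_primers(forward: dict, reverse: dict) -> dict:
    """Derive the insert GenomicLocation from two PrimerInfo dicts.

    Assumptions: 0-based, end-exclusive coordinates; forward primer upstream.
    """
    fwd_loc, rev_loc = forward["location"], reverse["location"]
    if (fwd_loc["genome_id"], fwd_loc["chrom"]) != (rev_loc["genome_id"], rev_loc["chrom"]):
        raise ValueError("primers are on different genomes or chromosomes")
    return {
        "genome_id": fwd_loc["genome_id"],
        "chrom": fwd_loc["chrom"],
        "start": fwd_loc["end"],  # first base after the forward primer
        "end": rev_loc["start"],  # stops just before the reverse primer
    }


# Made-up primers for illustration.
forward = {"seq": "ACGTAC", "location": {"genome_id": 0, "chrom": "chr1", "start": 100, "end": 106}}
reverse = {"seq": "TTGCAA", "location": {"genome_id": 0, "chrom": "chr1", "start": 250, "end": 256}}
insert = insert_location_from_primers(forward, reverse)
print(insert, insert["end"] - insert["start"])
```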