IgPhyML - B cell phylogenetic inference package¶
IgPhyML is a program designed to build phylogenetic trees and test evolutionary hypotheses regarding B cell affinity maturation.
The biology of B cell somatic hypermutation (SHM) violates important assumptions in most standard phylogenetic substitution models; further, while most phylogenetics programs are designed to analyze single lineages, B cell repertoires typically contain thousands of lineages. IgPhyML addresses both of these issues by implementing substitution models that correct for the context-sensitive nature of SHM, and combines information from multiple lineages to give more precisely estimated repertoire-wide model parameter estimates.
IgPhyML is integrated into the Immcantation framework, meaning users can easily build and visualize B cell lineage trees and test hypotheses about mutation and selection in B cell repertoires.
Overview¶
This is the instruction guide for using the new, repertoire-wide version of IgPhyML.
The most up-to-date description of IgPhyML can be found here:
Hoehn, KB, Vander Heiden, JA, Zhou JQ, Lunter, G, Pybus, OG, & Kleinstein SH. Repertoire-wide phylogenetic models of B cell molecular evolution reveal evolutionary signatures of aging and vaccination. Proceedings of the National Academy of Sciences, 116(45), 22664-22672. doi: https://doi.org/10.1073/pnas.1906020116
and here:
Hoehn, KB, Lunter, G, & Pybus, OG. A phylogenetic codon substitution model for antibody lineages. Genetics, 206(1), 417-427. doi: http://dx.doi.org/10.1534/genetics.116.196303
Installation¶
Docker Image (recommended for non-Linux systems)¶
The simplest way to use IgPhyML is via the
Immcantation Docker image.
4.3.0
is used for these instructions, but you should use the latest version on the Immcantation site.
Briefly, all the example commands on rest of this site can be run by first installing Docker and downloading the Immcantation Docker image. Note for some operating systems it may be necessary to have Docker Desktop running before entering these commands. In a terminal, enter:
# pull Immcantation Docker image
docker pull immcantation/suite:4.3.0
# clone IgPhyML repository to get example files
git clone https://bitbucket.org/kleinstein/igphyml
You may also need to install git
to complete the last command. Then, move to the examples directory and load it into the Docker image depending on your operating system:
# move to examples directory
cd igphyml/examples
# load Docker, Linux/ Mac OS X:
docker run -it --workdir /data -v $(pwd):/data:z immcantation/suite:4.3.0 bash
# or load Docker, Windows:
docker run -it --workdir /data -v %cd%:/data:z immcantation/suite:4.3.0 bash
Once inside the container, check everything is properly configured:
# should give example.fasta example.tab part.example.txt
ls
# should be IgPhyML 1.1.3 052020
igphyml --version
More generally, use this command to load the Docker image on the current directory of a Linux/Max OS X system:
docker run -it --workdir /data -v $(pwd):/data:z immcantation/suite:4.3.0 bash
or for a Windows Command Prompt:
docker run -it --workdir /data -v %cd%:/data:z immcantation/suite:4.3.0 bash
For further information on using the Immcantation Docker image, see Immcantation Docker image.
Compiling from source (recommended for Linux)¶
If using Linux, or if the Docker image is not possible or preferable, the source code of the current development version can be downloaded using git and compiled:
git clone https://bitbucket.org/kleinstein/igphyml
cd igphyml
./make_phyml_omp
If compilation (the last step) was successful, you should see the executable file src/igphyml
.
If this step fails, there is still hope. See below for more detailed requirements and
options.
Requirements¶
- GNU autoconf
- GNU automake
- (optional) Change-O >0.4.6
- (optional) Alakazam >0.3.0
- (optional) OpenMP-enabled C compiler (e.g. gcc or LLVM)
- (optional) BLAS and LAPACK optimization libraries
For Ubuntu systems, you can install automake tools and BLAS/LAPACK packages using:
apt-get install automake autoconf libblas-dev liblapack-dev libatlas-base-dev
This make require sudo permission. Other Linux distros should have similar pacakges.
Linux¶
On Linux operating systems with the above requirements installed, you can usually just run:
./make_phyml_omp
If you have BLAS and LAPACK installed,
which provide libraries for faster and more accurate matrix exponentiation
operations. In Ubuntu Linux, these are provided in the packages
libblas-dev
, liblapack-dev
, and libatlas-base-dev
. Other distros probably have
similar package names. To compile IgPhyML with BLAS and LAPACK
support, run:
./make_phyml_blas_omp
If OpenMP is also not available, IgPhyML may be compiled without it, but this will substantially reduce performance on multicore machines:
./make_phyml
Once everything is compiled, add the igphyml/src directory to your
PATH
variable, and IgPhyML should work. Importantly, some directory
files are hardcoded in at compilation, so re-compile IgPhyML if you move
the installation directory. Alternatively, you can set the IGPHYML_PATH
environment variable to the location of the igphyml/src/motifs folder for
the same effect.
Mac OS X¶
Installation on Mac OS X is trickier, but possible. The primary issue
is gaining OpenMP support, and installing some GNU command line tools.
The best way is to just install the latest version of llvm
available through homebrew
, as well as autoconf
and
automake
. To do these you’ll need to install
homebrew. If it’s already installed be
sure it’s at the latest version (brew update
). You may need to install
Xcode as well. Next, install autoconf
, automake
, and llvm
:
brew install autoconf
brew install automake
brew install llvm
Specify the llvm
version of clang
in Makefile.am
and
src/Makefile.am
by adding the line CC=<path to llvm clang>
to the beginning of both files. You will also need to add
MACOMP=<path to omp.h>
and MACLLVM=<path to llvm lib>
to
src/Makefile.am
. For instance, if you’ve install llvm 3.9.1
via homebrew, you will likely need to add the line
CC=/usr/local/Cellar/llvm/3.9.1/bin/clang
to Makefile.am
and the lines:
CC=/usr/local/Cellar/llvm/3.9.1/bin/clang
MACOMP=/usr/local/Cellar/llvm/3.9.1/lib/clang/3.9.1/include/omp.h
MACLLVM=/usr/local/Cellar/llvm/3.9.1/lib
to src/Makefile.am
.
Your specific path may look different, but you can check locations
of these files and folders by looking around in
/usr/local/Cellar/llvm/
. The directory structure should be
similar. Run ./make_blas_phyml_omp
, or other versions, as desired, and add
the src
folder to your PATH
variable.
On some versions of OS X it may be necessary to install XCode command line tools using:
xcode-select --install
cd /Library/Developer/CommandLineTools/Packages/
open macOS_SDK_headers_for_macOS_<OS X version>.pkg
Quick start¶
IgPhyML is easiest to use when run indirectly through the Change-O program
BuildTrees
by specifying the --igphyml
option.
Most of these instructions require Change-O 0.4.6 or higher, Alakazam 0.3.0 or higher,
and IgPhyML to be installed, with the executable in your PATH
variable.
If these aren’t possible, see IgPhyML standalone operation
To view all options for BuildTrees , run the command:
BuildTrees.py --help
The following commands should work as a first pass on many reasonably sized datasets, but if you really want to understand what’s going on or make sure what you’re doing makes sense, please check out the rest of the website.
Build trees and estimate model parameters¶
Move to the examples
subfolder and run:
BuildTrees.py -d example.tsv --outname ex --log ex.log --collapse \
--sample 3000 --igphyml --clean all --nproc 1
This command processes an AIRR-formatted dataset of BCR sequences that have been
clonally clustered
with germlines reconstructed.
It then quickly builds trees using the GY94 model and, using these
fixed topologies, estimates HLP19 model parameters. This can be sped up by
increasing the --nproc
option. Subsampling using the --sample
option in isn’t
strictly necessary, but IgPhyML will run slowly when applied to large datasets.
Here, the --collapse
flag is used to collapse identical sequences. This is
highly recommended because identical sequences slow down calculations without
affecting likelihood values in IgPhyML.
Visualize results¶
The output file of the above command can be read using the
readIgphyml
function of
Alakazam.
After opening an R
session, enter the following commands. Note that
when using the Docker container, you’ll need to run dev.off()
after
plotting the tree to create a pdf plot in the examples
directory:
library(alakazam)
library(igraph)
db = readIgphyml("ex_igphyml-pass.tab")
#plot largest lineage tree
plot(db$trees[[1]],layout=layout_as_tree)
#show HLP10 parameters
print(t(db$param[1,]))
CLONE "REPERTOIRE"
NSEQ "4"
NSITE "107"
TREE_LENGTH "0.286"
LHOOD "-290.7928"
KAPPA_MLE "2.266"
OMEGA_FWR_MLE "0.5284"
OMEGA_CDR_MLE "2.3324"
WRC_2_MLE "4.8019"
GYW_0_MLE "3.4464"
WA_1_MLE "5.972"
TW_0_MLE "0.8131"
SYC_2_MLE "-0.99"
GRS_0_MLE "0.2583"
Using Dowser¶
For simply building trees, the easiest way to use IgPhyML is through the R package Dowser. Dowser supports multiple methods for building B cell lineage trees, including IgPhyML.
To build trees with Dowser, follow this tutorial. For building IgPhyML trees specifically, follow this link once formatting steps are complete.
In either case, IgPhyML will need to either have already been compiled from source code or contained within the Docker container.
While BuildTrees currently offers more building and model options, and also masks codons split by V-gene reference alignment, these features also under in development within Dowser and should be available soon.
Using BuildTrees¶
IgPhyML can be run indirectly through the Change-O program
BuildTrees
by specifying the --igphyml
option.
Most of these instructions require Change-O 0.4.6 or higher, Alakazam 0.3.0 or higher,
and IgPhyML to be installed, with the executable in your PATH
variable.
If these aren’t possible, see IgPhyML standalone operation
To view all options for BuildTrees , run the command:
BuildTrees.py --help
Controlling input¶
The process begins with an AIRR formatted data file, in
which each sequence has been clustered into a clonal group,
which has subsequently had its unmutated V and J sequence predicted germlines.
The following column names are required in the input file: fields: sequence_id
,
sequence
, sequence_alignment
, germline_alignment_d_mask
,
v_call
, j_call
, and clone_id
. productive
is recommended.
Use BuildTrees.py to break this file into separate sequence alignment files that can be used with IgPhyML. This program will:
- Filter out nonfunctional sequences.
- Mask codons split by insertions.
- Separate clonal groups into separate alignment files (aligned by IMGT site) and information files
- Create the repertoire files for this dataset.
Create IgPhyML input files from examples/example.tsv
without running IgPhyML:
cd examples
BuildTrees.py -d example.tsv --outname ex --log ex.log --collapse
Here the --collapse
flag is used to
collapse identical sequences. This is highly recommended because
identical sequences slow down calculations without actually affecting
likelihood values in IgPhyML.
Note
IgPhyML requires at least three sequences in a lineage, so in the case that there is only one observed sequence within a clone, that sequence is duplicated. This will not affect the likelihood calculation because these sequences will have a branch length of zero, but it will affect metrics that take sequence frequency into account.
Subsampling AIRR datasets¶
IgPhyML runs slowly with more than a few thousand sequences. You can
subsample your dataset using the --sample
and --minseq
options,
which will subsample your dataset to the specified depth and then remove
all clones below the specified size cutoff:
BuildTrees.py -d example.tsv --collapse --sample 5 --minseq 2 --igphyml
Selecting individual clones¶
Often only particular clones are of interest for lineage tree analysis. To
restrict IgPhyML analysis to particular clones, use the --clones
option:
#Only use sequences from clone 2:
BuildTrees.py -d example.tsv --collapse --clone 2 --igphyml
#Only use sequences from clones 1 and 2:
BuildTrees.py -d example.tsv --collapse --clone 1 2 --igphyml
Cleaning up intermediates¶
BuildTrees produces many intermediate files that are usually not useful to the user.
To delete them after IgPhyML is run, use --clean all
:
BuildTrees.py -d example.tsv --log ex.log --collapse --igphyml --clean all
Removing the CDR3 region¶
If you plan to analyze model parameters to study things such as SHM and
selection, it’s important to remove
the CDR3 region to avoid known model biases in estimating \(\omega\). To
do this, use --ncdr3
:
BuildTrees.py -d example.tsv --log ex.log --collapse --ncdr3 --igphyml
Building lineage trees¶
If you’re simply interested in getting some tree topologies, the fastest
option is to just use the GY94 and not estimate any parameters
under HLP19. This is done using the --optimize n
option:
BuildTrees.py -d example.tsv --collapse --igphyml --clean all --optimize n
The trees can then be visualized using
igraph and Alakazam.
Open an R
session:
library(alakazam)
library(igraph)
db = readIgphyml("example_igphyml-pass.tab")
#plot largest lineage tree
plot(db$trees[[1]],layout=layout_as_tree)
Run dev.off()
after plotting if using the Docker image to create a pdf.
In these plots, edge labels represent the expected number of substitutions between
nodes in the tree. See the Alakazam
documentation for plotting this style of trees.
Alternatively, more traditional bifurcating tree topologies can
be used:
library(alakazam)
library(ape)
db = readIgphyml("example_igphyml-pass.tab",format="phylo")
#plot largest lineage tree
plot(db$trees[[1]])
Of course, these are quite simple trees. A more interesting tree can be visualized from a different provided dataset:
library(alakazam)
library(ape)
db = readIgphyml("sample1_igphyml-pass.tab",format="phylo")
#plot largest lineage tree
plot(ladderize(db$trees[[1]]),cex=0.7,no.margin=TRUE)
Alternatively, to estimate ML tree topologies using the HLP19 model, use:
BuildTrees.py -d example.tsv --collapse --igphyml --clean all --optimize tlr
This will be slower than using the GY94 model but does return meaningful HLP19 parameter estimates. These results can be visualized in the same manner using Alakazam.
Evolutionary hypothesis testing¶
The HLP19 model¶
The HLP19 model is the heart of IgPhyML and adjusts for features of affinity maturation that violate the assumptions of most other phylogenetic models. It uses four sets of parameters to characterize the types of mutations the occurred over a lineage’s development, and to help build the tree.
\(\omega\): Also called dN/dS, or the ratio of nonsynonymous (amino acid replacement) and synonymous (silent) mutation rates. This parameter generally relates to clonal selection, with totally neutral amino acid evolution having an \(\omega \approx 1\), negative selection indicated by \(\omega < 1\) and diversifying selection indicated by \(\omega > 1\). Generally, we find a lower \(\omega\) for FWRs than CDRs, presumably because FWRs are more structurally constrained.
\(\kappa\): Ratio of transitions (within purines/pyrimidines) to transversions (between purines/pyrimidines). For normal somatic hypermutation this ratio is usually \(\approx 2\).
Motif mutability (e.g. \(h^{WRC}\)): Mutability parameters for specified hot- and coldspot motifs. These estimates are equivalent to the fold-change in mutability for that motif compared to regular motifs, minus one. So, \(h^{WRC} > 0\) indicates at hotspot, \(h^{WRC} < 0\) indicates a coldspot, and \(h^{WRC} = 2\) indicates a 3x increase in WRC substitution rate. The HLP19 model by default estimates six motif mutability parameters: four hotspots (WRC, GYW, WA, and TW) and two coldspots (SYC and GRS).
Specifying parameters¶
Substitution parameters are specified using the -t
for \(\kappa\)
(transition/transverion rate), --omega
for \(\omega\)
(nonsynonymous/synonymous mutation rate), and --motifs
and
--hotness
for specifying the motif mutability models. The default
for all of these is to estimate shared parameter values across all
lineages, which is also specified by e
.
Due to default parameter settings, the following two commands are equivalent:
BuildTrees.py -d example.tsv --collapse --igphyml
BuildTrees.py -d example.tsv --collapse --igphyml -t e --omega e,e \
--motifs WRC_2:0,GYW_0:1,WA_1:2,TW_0:3,SYC_2:4,GRS_0:5 \
--hotness e,e,e,e,e,e --optimize lr
Note that here we use --optimize lr
, which will keep tree topologies the
same and only estimate branch lengths and substitution parameters. This will keep topologies
the same as the GY94, but will estimate substitution parameters much
more quickly. Using --optimize tlr
will also optimize tree topology, using
--optimize r
will only optimize model parameters, and --optimize n
will not
optimize topology, branch lengths, or model parameters.
The default setting is to estimate a separate \(\omega\) parameter for FWR and CDR regions. If you want one \(\omega\) for all regions, use:
BuildTrees.py -d example.tsv --collapse --igphyml --omega e
You can also constrain motifs to have the same mutabilities
by altering the indexes after the ‘:’ in the --motifs
option.
For motif mutability, each value
in the --hotness
option corresponds to the index specified in
the --motifs
option. For example, to estimate a model in
which WRC/GYW, WA/TW, and SYC/GRS motifs are respectively constrained
to have the same mutabilities, use:
BuildTrees.py -d example.tsv --collapse --igphyml \
--motifs WRC_2:0,GYW_0:0,WA_1:1,TW_0:1,SYC_2:2,GRS_0:2 \
--hotness e,e,e
Confidence interval estimation¶
It is possible to estimate 95% confidence intervals for any of these parameters by adding a ‘c’ to the parameter specification. For example, to estimate a 95% confidence interval for \(\omega _{CDR}\) but not \(\omega _{FWR}\), use:
BuildTrees.py -d example.tsv --collapse --ncdr3 --clean all --igphyml --omega e,ce
To estimate a 95% confidence interval for \(\omega _{FWR}\) but not \(\omega _{CDR}\), use:
BuildTrees.py -d example.tsv --collapse --ncdr3 --clean all --igphyml --omega ce,e
Any combination of confidence interval specifications can be used for the above parameter options. For instance, to estimate confidence intervals for GYW mutability, use:
BuildTrees.py -d example.tsv --collapse --ncdr3 --clean all --igphyml --hotness e,ce,e,e,e,e
which is equivalent to:
BuildTrees.py -d example.tsv --collapse --ncdr3 --clean all --igphyml \
--motifs WRC_2:0,GYW_0:1,WA_1:2,TW_0:3,SYC_2:4,GRS_0:5 \
--hotness e,ce,e,e,e,e
Remember it is important to remove the CDR3 region for this kind of analysis. You can find further explanation of the different options in the commandline help page of BuildTrees, including controlling output directories and file names.
Visualizing results¶
Model hypothesis testing can be easily accomplished with the Alakazam functions readIgphyml and combineIgphyml. In this example, we first run IgPhyML on an example file and estimate confidence intervals on \(\omega _{CDR}\):
BuildTrees.py -d example.tsv --collapse --nproc 2 --ncdr3 --clean all --igphyml --omega e,ce
Then, open an R
session, where we load the example result and two other samples.
To compare maximum likelihood parameter estimates for all samples, use (run
dev.off()
after plotting if using the Docker image to create a pdf):
#!/usr/bin/R
library(alakazam)
library(ggplot2)
#read in three different samples
ex = readIgphyml("example_igphyml-pass.tab",id="EX")
s1 = readIgphyml("sample1_igphyml-pass.tab",id="S1")
s2 = readIgphyml("sample2_igphyml-pass.tab",id="S2")
#print out parameter values
print(ex$param[1,])
#combine objects into a dataframe
comb = combineIgphyml(list(ex,s1,s2),format="long")
ggplot(comb[grepl("MLE",comb$variable),],
aes(x=ID,y=variable,fill=value)) + geom_tile() +
theme_bw() + scale_fill_distiller(palette="RdYlBu")
Maximum likelihood point estimates of each parameter are specified with “_MLE”, while upper and lower confidence interval bounds of a parameter are specified with “_UCI” and “_LCI” respectively. Which estimates are available is controlled by the model specified and whether confidence intervals were estimated when running IgPhyML.
properly test the hypothesis that \(\omega _{CDR}\) parameter estimates are significantly different among these datasets, use:
#!/usr/bin/R
library(alakazam)
library(ggplot2)
#read in three different samples
ex = readIgphyml("example_igphyml-pass.tab",id="EX")
s1 = readIgphyml("sample1_igphyml-pass.tab",id="S1")
s2 = readIgphyml("sample2_igphyml-pass.tab",id="S2")
#combine objects into a dataframe
comb = combineIgphyml(list(ex,s1,s2),format="wide")
#compare CDR dN/dS for three samples
ggplot(comb,aes(x=ID,y=OMEGA_CDR_MLE, ymin=OMEGA_CDR_LCI,
ymax=OMEGA_CDR_UCI)) + geom_point() +
geom_errorbar(width=0.1) + theme_bw()
Where we can see that the confidence interval for Sample 1 does not overlap with the confidence interval for Sample 2, meanging we conclude Sample 1 has significantly lower \(\omega _{CDR}\) than Sample 2. However, the confidence intervals for our example file (ex) are too wide to reach any firm conclusion.
Optimizing performance¶
IgPhyML is a computationally intensive program. There are some ways to make calculations more practical, detailed below.
Data subsampling: IgPhyML runs slowly with more than a few thousand sequences. You can
subsample your dataset using the --sample
and --minseq
options in
BuildTrees.py, which will subsample your dataset to the specified depth and
then remove all clones below the specified size cutoff (see Subsampling
Change-O datasets).
Analyzing specific clones: The --clone
option can be used to analyze only the
specified clones.
Parallelizing computations: It is possible to parallelize likelihood
calculations by splitting computations across multiple cores using the
--nproc
option. Currently, calculations are parallelized by tree,
so there is no use in using more threads than lineages.
Enforcing minimum lineage size: Many repertoires often contain huge
numbers of small lineages that can make computations impractical. To
limit the size of lineages being analyzed, specify a cutoff with
--minseq
when running BuildTrees.py.
Standalone operation¶
While IgPhyML is easiest to use when run indirectly through the Change-O program
BuildTrees,
this is not always possible or desirable. This
section details how IgPhyML may be run directly as a standalone program. As before,
this section requires IgPhyML to be installed, with the executable in your PATH
variable.
This is already done in the Docker image.
Model specification and confidence interval estimation are the same as when running IgPhyML from BuildTrees. The means of specifying input, however, are different.
To view all options for IgPhyML directly, run the command:
igphyml --help
Analyzing a single lineage¶
In its most basic and flexible operation, IgPhyML operates on single lineage. Its only required input is an in-frame multiple sequence alignment, without any stop codons, in FASTA or PHYLIP format. The model must be specified. The most basic is GY94, which is fast but doesn’t account for SHM context sensitivity. The HLP19 model corrects for SHM context sensitivity but is slower requires the root sequence be specified:
cd examples
#Estimate parameters and topology using GY94 model
igphyml -i example.fasta -m GY --run_id gy
#Estimate parameters and topology using HLP19 model
igphyml -i example.fasta -m HLP --root V4-59 --run_id hlp
It is also possible to use a fixed tree topology:
igphyml -i example.fasta -m HLP --root V4-59 -u example.fasta_igphyml_tree_gy.txt
Or estimate separate \(\omega\) values for CDR and FWR partitions:
igphyml -i example.fasta -m HLP --root V4-59 --partfile part.example.txt
Generally, all the following features of repertoire-wide phylogenetic analysis can also be performed on single lineages by specifying input files in the above manner.
Analyzing repertoires¶
These commands should work as a first pass on many reasonably sized datasets, but if you really want to understand what’s going on or make sure what you’re doing makes sense, please check out the rest of the manual.
Convert Change-O files into IgPhyML inputs
Move to the examples
subfolder and run, in order:
BuildTrees.py -d example.tsv --outname ex --log ex.log --collapse
This will create the directory ex
and the file
ex_lineages.tsv
. Each ex/<clone ID>.fasta
contains the IMGT
mutliple sequence alignemt for a particular clone, and each
ex/<clone ID>.part.txt
file contains information about V and J
germline assignments, as well as IMGT unique numbering for each site.
The file ex.log
will contain information about whether or not each
sequence was included in the analysis. The file ex_lineages.tsv
is
the direct input to IgPhyML. Each line represents a clone and shows
the multiple sequence alignment, starting tree topology (N if
ignored), germline sequence ID in alignment file, and partition file
(N if ignored). These repertoire files start with the number of
lineages in the repertoire, and lineages are arranged from most to
least number of sequences. Here, it is important to not use
--igphyml
or --clean all
to prevent IgPhyML from being run
by BuildTrees, and to prevent these intermediate files from being deleted.
Build lineage trees using the GY94 model
This option is fast and makes good starting topologies, but doesn’t correct
for SHM mutation biases. Use the --outrep
option to make a modified
igphyml --repfile ex_lineages.tsv -m GY --outrep ex_lineages_gy.tsv --run_id gy
Build lineage trees using the HLP19 model with GY94 starting trees
This option is slower but corrects for mutation biases of SHM:
#HLP topology, branch lengths, and parameters (slow)
igphyml --repfile ex_lineages_gy.tsv -m HLP --threads 1
#GY94 topology, HLP19 branch lengths and parameters (faster)
igphyml --repfile ex_lineages_gy.tsv -m HLP --optimize lr --threads 1
Both of these can be parallelized by modifying the
--threads <thread count>
option. Trees files are listed as
ex/<clone id>.fasta_igphyml_tree.txt
, and can be viewed with most
tree viewers (I recommend
FigTree). Parameter
estimates are in ex_lineages.tsv_igphyml_stats.txt
.
Controlling output format¶
Alternatively, run using --oformat tab
to create input readable by
the
readIgphyml
function of
Alakazam.:
#Output can be read using readIgphyml function
igphyml --repfile ex_lineages_gy.tsv -m HLP --optimize lr --threads 1 --oformat tab
Open an R
session, and run the following commands. Note the results are the same as in the quickstart example
library(alakazam)
library(igraph)
db = readIgphyml("ex_lineages_gy.tsv_igphyml_stats.tab")
#plot largest lineage tree
plot(db$trees[[1]],layout=layout_as_tree)
#show HLP10 parameters
print(t(db$param[1,]))
CLONE "REPERTOIRE"
NSEQ "4"
NSITE "107"
TREE_LENGTH "0.286"
LHOOD "-290.7928"
KAPPA_MLE "2.266"
OMEGA_FWR_MLE "0.5284"
OMEGA_CDR_MLE "2.3324"
WRC_2_MLE "4.8019"
GYW_0_MLE "3.4464"
WA_1_MLE "5.972"
TW_0_MLE "0.8131"
SYC_2_MLE "-0.99"
GRS_0_MLE "0.2583"
Release Notes¶
Version 2.0.0: Sept. 22, 2023 + Added support for combined light chain tree building, and scaled branch models + Compatible in Dowser 2.0.0
Version 1.1.4: April 13, 2022¶
- Added new parsimony reconstruction methods and maximum ambiguity polytomy resolution for Dowser compatability.
Version 1.1.3: May 20, 2020¶
- Small bug fixes in preparation for Immcantation 4.0.0 release
Version 1.1.2: May 13, 2020¶
- Added basic intermediate sequence reconstruction function using –ASR or –ASRc 0.9
Version 1.1.1: Jan. 27, 2020¶
- Added more general maximum parsimony reconstruction methods.
Version 1.1.0: Sept. 16, 2019¶
- Added parsimony reconstruction methods for cell subsets. Information for Austin, Buckner et al. (2019) hardcoded. More general methods and documentation to follow.
Version 1.0.7: Feb. 19, 2019¶
- Added HLP19 substitution model, specified by HLP or HLP19.
- Constant B matrix approximation (–cbmat; default for HLP) dramatically improves runtime.
- Default for –motifs is now FCH (free coldspots and hotspots).
- FWR/CDR omega optimization is default behavior if partition files included.
- Output file cleaned up, as well as confidence interval prediction output.
Version 1.0.6¶
- Repertoire-wide HLP17 parameter optimization
- Parallelization by tree using –threads
- Integration to Change-O through BuildTrees.py
Contact¶
If you have questions you can email the Immcantation Group.
If you’ve discovered a bug or have a feature request, you can create an issue on Bitbucket using the Issue Tracker.
Citation¶
To cite IgPhyML in publications please use:
Hoehn, KB, Vander Heiden, JA, Zhou JQ, Lunter, G, Pybus, OG, & Kleinstein SH. Repertoire-wide phylogenetic models of B cell molecular evolution reveal evolutionary signatures of aging and vaccination. Proceedings of the National Academy of Sciences, 116(45), 22664-22672. doi: https://doi.org/10.1073/pnas.1906020116
Hoehn, KB, Lunter, G, & Pybus, OG. A phylogenetic codon substitution model for antibody lineages. Genetics, 206(1), 417-427. doi: http://dx.doi.org/10.1534/genetics.116.196303
References¶
IgPhyML is build using the source code of codonPhyML:
Gil, M, Zanetti, MS, Zoller, S, Anisimova, M. CodonPhyML: fast maximum likelihood phylogeny estimation under codon substitution models Mol Biol Evol. 2013 Jun;30(6):1270-80. doi: 10.1093/molbev/mst034
which is based off of PhyML:
Guindon, S, Delsuc F, Dufayard JF, Gascuel, O Estimating maximum likelihood phylogenies with PhyML Methods Mol Biol. 2009;537:113-37. doi: 10.1007/978-1-59745-251-9_6
The HLP17 and HLP19 models are based off of the GY94 model:
Nielsen R, Yang, Z Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics. 1998 Mar;148(3):929-36
License¶
This work is licensed under the GNU General Public License, verion 2 (GNU GLP v2.0).