From alignments to concordance factors with the TICR pipeline

1. Estimating gene trees with MrBayes

1.1 Input for MrBayes

You need to write a MrBayes block with the parameters for MrBayes. Here, we provide a MrBayes block in mbblock.txt from the TICR repo:

$ cd TICR/example
$ cat mbblock.txt
begin mrbayes;
	set nowarnings=yes;
	set autoclose=yes;
	lset nst=2;
	mcmcp ngen=200000 burninfrac=.25 samplefreq=50 printfreq=10000 [increase these for real]
  diagnfreq=10000 nruns=2 nchains=2 temp=0.40 swapfreq=10;       [increase for real analysis]
  sumt;
	mcmc; 
end;

Make sure to increase the number of generations (ngen) and to modify the other chain parameters for your real analysis.

1.2 Running MrBayes

We are ready to analyze all loci with MrBayes.

Recall that the main working directory (scratch or my-analysis) has the structure:

  • scratch (or my-analysis)
    • snaq-tutorial (with the new analysis folder)
    • PhyloUtilities (with the scripts)

We want to be in the folder with the data:

cd snaq-tutorial/analysis

In this folder, we want to copy the MrBayes block:

cp ../../PhyloUtilities/scripts/mbblock.txt .
$ ../../PhyloUtilities/scripts/mb.pl nexus.tar.gz -m mbblock.txt -o mb-output

Script was called as follows:
perl mb.pl nexus.tar.gz -m mbblock.txt -o mb-output

Appending MrBayes block to each gene... done.

Job server successfully created.

  Analyses complete: 372/372.
  All connections closed.
Total execution time: 12 minutes, 27 seconds.

1.3 Output for MrBayes

The script created a new directory named mb-output (as specified in the command), which contains a compressed tarball of all MrBayes output: mb-output/nexus.mb.tar. We won’t look at the output files, but if you want instructions on how to untar to see them, check the PhyloUtilities website.

We want to check convergence of the runs with (details of this command can be found in the TICR repo):

../../PhyloUtilities/scripts/mb.pl mb-output -c 0.05

We get the output:

MrBayes results available for 372 total genes:
  0.nex.tar.gz  : 0.008404
  1.nex.tar.gz  : 0.003972
  2.nex.tar.gz  : 0.010326
  3.nex.tar.gz  : 0.006065
  4.nex.tar.gz  : 0.004270
  5.nex.tar.gz  : 0.012918
  6.nex.tar.gz  : 0.005891
  7.nex.tar.gz  : 0.007911
  8.nex.tar.gz  : 0.011182
  9.nex.tar.gz  : 0.006456
  10.nex.tar.gz : 0.020304
  11.nex.tar.gz : 0.002545
  12.nex.tar.gz : 0.000892
  13.nex.tar.gz : 0.004311
  14.nex.tar.gz : 0.015637
  15.nex.tar.gz : 0.006034
  16.nex.tar.gz : 0.045463
  17.nex.tar.gz : 0.005502
  18.nex.tar.gz : 0.012064
  19.nex.tar.gz : 0.006001
  20.nex.tar.gz : 0.006912
  21.nex.tar.gz : 0.004215
  22.nex.tar.gz : 0.000471
  23.nex.tar.gz : 0.007386
  24.nex.tar.gz : 0.005687
  25.nex.tar.gz : 0.005101
  26.nex.tar.gz : 0.004211
  27.nex.tar.gz : 0.002796
  28.nex.tar.gz : 0.003882
  29.nex.tar.gz : 0.007405
  30.nex.tar.gz : 0.000785
  31.nex.tar.gz : 0.003428
  32.nex.tar.gz : 0.008115
  33.nex.tar.gz : 0.016229
  34.nex.tar.gz : 0.006179
  35.nex.tar.gz : 0.006960
  36.nex.tar.gz : 0.015773
  37.nex.tar.gz : 0.004938
  38.nex.tar.gz : 0.006480
  39.nex.tar.gz : 0.005311
  40.nex.tar.gz : 0.005746
  41.nex.tar.gz : 0.002670
  42.nex.tar.gz : 0.004320
  43.nex.tar.gz : 0.005487
  44.nex.tar.gz : 0.003707
  45.nex.tar.gz : 0.008095
  46.nex.tar.gz : 0.001948
  47.nex.tar.gz : 0.006394
  48.nex.tar.gz : 0.003927
  49.nex.tar.gz : 0.073622
  50.nex.tar.gz : 0.004490
  51.nex.tar.gz : 0.005794
  52.nex.tar.gz : 0.024522
  53.nex.tar.gz : 0.001744
  54.nex.tar.gz : 0.007799
  55.nex.tar.gz : 0.004663
  56.nex.tar.gz : 0.008674
  57.nex.tar.gz : 0.007055
  58.nex.tar.gz : 0.007422
  59.nex.tar.gz : 0.024262
  60.nex.tar.gz : 0.007691
  61.nex.tar.gz : 0.004271
  62.nex.tar.gz : 0.005841
  63.nex.tar.gz : 0.003888
  64.nex.tar.gz : 0.006199
  65.nex.tar.gz : 0.011478
  66.nex.tar.gz : 0.002003
  67.nex.tar.gz : 0.004661
  68.nex.tar.gz : 0.017043
  69.nex.tar.gz : 0.007706
  70.nex.tar.gz : 0.007038
  71.nex.tar.gz : 0.003634
  72.nex.tar.gz : 0.006210
  73.nex.tar.gz : 0.001704
  74.nex.tar.gz : 0.005985
  75.nex.tar.gz : 0.258608
  76.nex.tar.gz : 0.002633
  77.nex.tar.gz : 0.026598
  78.nex.tar.gz : 0.004639
  79.nex.tar.gz : 0.010828
  80.nex.tar.gz : 0.006947
  81.nex.tar.gz : 0.007614
  82.nex.tar.gz : 0.005631
  83.nex.tar.gz : 0.330515
  84.nex.tar.gz : 0.005616
  85.nex.tar.gz : 0.005627
  86.nex.tar.gz : 0.012935
  87.nex.tar.gz : 0.166260
  88.nex.tar.gz : 0.005143
  89.nex.tar.gz : 0.004061
  90.nex.tar.gz : 0.006205
  91.nex.tar.gz : 0.020998
  92.nex.tar.gz : 0.001445
  93.nex.tar.gz : 0.006037
  94.nex.tar.gz : 0.006861
  95.nex.tar.gz : 0.010697
  96.nex.tar.gz : 0.004464
  97.nex.tar.gz : 0.005593
  98.nex.tar.gz : 0.012156
  99.nex.tar.gz : 0.001822
  100.nex.tar.gz: 0.002089
  101.nex.tar.gz: 0.008157
  102.nex.tar.gz: 0.004877
  103.nex.tar.gz: 0.006332
  104.nex.tar.gz: 0.003313
  105.nex.tar.gz: 0.003186
  106.nex.tar.gz: 0.010525
  107.nex.tar.gz: 0.006692
  108.nex.tar.gz: 0.006362
  109.nex.tar.gz: 0.007441
  110.nex.tar.gz: 0.004290
  111.nex.tar.gz: 0.004849
  112.nex.tar.gz: 0.004435
  113.nex.tar.gz: 0.002755
  114.nex.tar.gz: 0.001602
  115.nex.tar.gz: 0.002388
  116.nex.tar.gz: 0.006761
  117.nex.tar.gz: 0.015125
  118.nex.tar.gz: 0.041567
  119.nex.tar.gz: 0.004015
  120.nex.tar.gz: 0.007964
  121.nex.tar.gz: 0.003676
  122.nex.tar.gz: 0.006461
  123.nex.tar.gz: 0.007245
  124.nex.tar.gz: 0.005796
  125.nex.tar.gz: 0.020264
  126.nex.tar.gz: 0.005581
  127.nex.tar.gz: 0.004589
  128.nex.tar.gz: 0.006057
  129.nex.tar.gz: 0.007540
  130.nex.tar.gz: 0.007983
  131.nex.tar.gz: 0.003396
  132.nex.tar.gz: 0.009551
  133.nex.tar.gz: 0.003240
  134.nex.tar.gz: 0.011938
  135.nex.tar.gz: 0.015596
  136.nex.tar.gz: 0.028200
  137.nex.tar.gz: 0.000634
  138.nex.tar.gz: 0.005074
  139.nex.tar.gz: 0.007229
  140.nex.tar.gz: 0.008499
  141.nex.tar.gz: 0.006705
  142.nex.tar.gz: 0.003756
  143.nex.tar.gz: 0.008511
  144.nex.tar.gz: 0.011433
  145.nex.tar.gz: 0.005089
  146.nex.tar.gz: 0.004175
  147.nex.tar.gz: 0.033313
  148.nex.tar.gz: 0.005147
  149.nex.tar.gz: 0.011441
  150.nex.tar.gz: 0.009593
  151.nex.tar.gz: 0.007187
  152.nex.tar.gz: 0.010866
  153.nex.tar.gz: 0.004449
  154.nex.tar.gz: 0.009454
  155.nex.tar.gz: 0.005355
  156.nex.tar.gz: 0.000652
  157.nex.tar.gz: 0.004830
  158.nex.tar.gz: 0.010633
  159.nex.tar.gz: 0.006209
  160.nex.tar.gz: 0.002859
  161.nex.tar.gz: 0.006403
  162.nex.tar.gz: 0.021987
  163.nex.tar.gz: 0.003489
  164.nex.tar.gz: 0.005756
  165.nex.tar.gz: 0.008196
  166.nex.tar.gz: 0.007049
  167.nex.tar.gz: 0.006362
  168.nex.tar.gz: 0.002190
  169.nex.tar.gz: 0.000652
  170.nex.tar.gz: 0.027653
  171.nex.tar.gz: 0.007181
  172.nex.tar.gz: 0.006205
  173.nex.tar.gz: 0.004973
  174.nex.tar.gz: 0.002827
  175.nex.tar.gz: 0.005827
  176.nex.tar.gz: 0.005446
  177.nex.tar.gz: 0.003534
  178.nex.tar.gz: 0.003204
  179.nex.tar.gz: 0.005581
  180.nex.tar.gz: 0.008524
  181.nex.tar.gz: 0.006611
  182.nex.tar.gz: 0.006951
  183.nex.tar.gz: 0.005850
  184.nex.tar.gz: 0.005289
  185.nex.tar.gz: 0.007582
  186.nex.tar.gz: 0.002341
  187.nex.tar.gz: 0.002686
  188.nex.tar.gz: 0.012507
  189.nex.tar.gz: 0.004061
  190.nex.tar.gz: 0.005529
  191.nex.tar.gz: 0.002199
  192.nex.tar.gz: 0.003344
  193.nex.tar.gz: 0.000673
  194.nex.tar.gz: 0.006899
  195.nex.tar.gz: 0.007742
  196.nex.tar.gz: 0.006886
  197.nex.tar.gz: 0.022006
  198.nex.tar.gz: 0.003968
  199.nex.tar.gz: 0.005581
  200.nex.tar.gz: 0.004323
  201.nex.tar.gz: 0.006676
  202.nex.tar.gz: 0.008141
  203.nex.tar.gz: 0.006625
  204.nex.tar.gz: 0.006440
  205.nex.tar.gz: 0.003841
  206.nex.tar.gz: 0.009490
  207.nex.tar.gz: 0.005738
  208.nex.tar.gz: 0.001498
  209.nex.tar.gz: 0.001665
  210.nex.tar.gz: 0.004214
  211.nex.tar.gz: 0.002315
  212.nex.tar.gz: 0.003635
  213.nex.tar.gz: 0.009543
  214.nex.tar.gz: 0.003755
  215.nex.tar.gz: 0.002345
  216.nex.tar.gz: 0.000580
  217.nex.tar.gz: 0.002283
  218.nex.tar.gz: 0.002469
  219.nex.tar.gz: 0.003521
  220.nex.tar.gz: 0.003215
  221.nex.tar.gz: 0.004178
  222.nex.tar.gz: 0.004539
  223.nex.tar.gz: 0.002121
  224.nex.tar.gz: 0.005905
  225.nex.tar.gz: 0.003487
  226.nex.tar.gz: 0.121647
  227.nex.tar.gz: 0.004724
  228.nex.tar.gz: 0.014636
  229.nex.tar.gz: 0.007938
  230.nex.tar.gz: 0.005395
  231.nex.tar.gz: 0.001555
  232.nex.tar.gz: 0.005667
  233.nex.tar.gz: 0.005648
  234.nex.tar.gz: 0.006205
  235.nex.tar.gz: 0.001834
  236.nex.tar.gz: 0.009178
  237.nex.tar.gz: 0.009624
  238.nex.tar.gz: 0.013141
  239.nex.tar.gz: 0.003181
  240.nex.tar.gz: 0.005983
  241.nex.tar.gz: 0.016494
  242.nex.tar.gz: 0.003063
  243.nex.tar.gz: 0.004114
  244.nex.tar.gz: 0.019036
  245.nex.tar.gz: 0.006715
  246.nex.tar.gz: 0.008731
  247.nex.tar.gz: 0.002971
  248.nex.tar.gz: 0.007335
  249.nex.tar.gz: 0.002749
  250.nex.tar.gz: 0.006541
  251.nex.tar.gz: 0.011128
  252.nex.tar.gz: 0.006588
  253.nex.tar.gz: 0.002814
  254.nex.tar.gz: 0.002827
  255.nex.tar.gz: 0.002003
  256.nex.tar.gz: 0.001555
  257.nex.tar.gz: 0.004383
  258.nex.tar.gz: 0.006201
  259.nex.tar.gz: 0.009979
  260.nex.tar.gz: 0.014990
  261.nex.tar.gz: 0.011752
  262.nex.tar.gz: 0.011274
  263.nex.tar.gz: 0.003243
  264.nex.tar.gz: 0.005207
  265.nex.tar.gz: 0.008594
  266.nex.tar.gz: 0.005920
  267.nex.tar.gz: 0.005757
  268.nex.tar.gz: 0.007471
  269.nex.tar.gz: 0.005666
  270.nex.tar.gz: 0.006597
  271.nex.tar.gz: 0.005726
  272.nex.tar.gz: 0.007832
  273.nex.tar.gz: 0.005364
  274.nex.tar.gz: 0.005600
  275.nex.tar.gz: 0.030423
  276.nex.tar.gz: 0.003741
  277.nex.tar.gz: 0.007501
  278.nex.tar.gz: 0.004536
  279.nex.tar.gz: 0.006362
  280.nex.tar.gz: 0.006435
  281.nex.tar.gz: 0.006287
  282.nex.tar.gz: 0.003842
  283.nex.tar.gz: 0.004663
  284.nex.tar.gz: 0.001582
  285.nex.tar.gz: 0.004957
  286.nex.tar.gz: 0.027391
  287.nex.tar.gz: 0.004385
  288.nex.tar.gz: 0.003299
  289.nex.tar.gz: 0.013534
  290.nex.tar.gz: 0.003786
  291.nex.tar.gz: 0.006461
  292.nex.tar.gz: 0.005778
  293.nex.tar.gz: 0.005140
  294.nex.tar.gz: 0.031477
  295.nex.tar.gz: 0.003393
  296.nex.tar.gz: 0.004055
  297.nex.tar.gz: 0.004355
  298.nex.tar.gz: 0.003988
  299.nex.tar.gz: 0.012357
  300.nex.tar.gz: 0.002315
  301.nex.tar.gz: 0.005251
  302.nex.tar.gz: 0.002577
  303.nex.tar.gz: 0.006806
  304.nex.tar.gz: 0.007930
  305.nex.tar.gz: 0.012183
  306.nex.tar.gz: 0.004129
  307.nex.tar.gz: 0.005166
  308.nex.tar.gz: 0.004047
  309.nex.tar.gz: 0.005882
  310.nex.tar.gz: 0.007147
  311.nex.tar.gz: 0.045486
  312.nex.tar.gz: 0.008107
  313.nex.tar.gz: 0.009618
  314.nex.tar.gz: 0.003637
  315.nex.tar.gz: 0.002289
  316.nex.tar.gz: 0.006670
  317.nex.tar.gz: 0.005779
  318.nex.tar.gz: 0.001750
  319.nex.tar.gz: 0.018932
  320.nex.tar.gz: 0.004887
  321.nex.tar.gz: 0.008093
  322.nex.tar.gz: 0.006452
  323.nex.tar.gz: 0.015659
  324.nex.tar.gz: 0.003386
  325.nex.tar.gz: 0.008097
  326.nex.tar.gz: 0.002962
  327.nex.tar.gz: 0.000956
  328.nex.tar.gz: 0.007952
  329.nex.tar.gz: 0.013827
  330.nex.tar.gz: 0.005720
  331.nex.tar.gz: 0.001704
  332.nex.tar.gz: 0.005920
  333.nex.tar.gz: 0.011864
  334.nex.tar.gz: 0.012763
  335.nex.tar.gz: 0.006663
  336.nex.tar.gz: 0.004560
  337.nex.tar.gz: 0.005724
  338.nex.tar.gz: 0.025426
  339.nex.tar.gz: 0.005296
  340.nex.tar.gz: 0.111822
  341.nex.tar.gz: 0.006921
  342.nex.tar.gz: 0.007060
  343.nex.tar.gz: 0.021951
  344.nex.tar.gz: 0.006315
  345.nex.tar.gz: 0.018638
  346.nex.tar.gz: 0.001696
  347.nex.tar.gz: 0.017267
  348.nex.tar.gz: 0.007676
  349.nex.tar.gz: 0.005320
  350.nex.tar.gz: 0.009351
  351.nex.tar.gz: 0.015969
  352.nex.tar.gz: 0.005057
  353.nex.tar.gz: 0.004182
  354.nex.tar.gz: 0.003452
  355.nex.tar.gz: 0.000308
  356.nex.tar.gz: 0.004061
  357.nex.tar.gz: 0.006352
  358.nex.tar.gz: 0.006899
  359.nex.tar.gz: 0.005364
  360.nex.tar.gz: 0.016444
  361.nex.tar.gz: 0.011251
  362.nex.tar.gz: 0.008269
  363.nex.tar.gz: 0.006016
  364.nex.tar.gz: 0.009048
  365.nex.tar.gz: 0.006813
  366.nex.tar.gz: 0.004241
  367.nex.tar.gz: 0.006854
  368.nex.tar.gz: 0.002055
  369.nex.tar.gz: 0.005239
  370.nex.tar.gz: 0.006021
  371.nex.tar.gz: 0.004851
6 gene(s) failed to meet the threshold of 0.05 (1.61%).

We can run the chain for longer to allow all genes to reach convergence, but for now, we will ignore the lack of convergence of 6 genes.

The TICR repo has the option to delete the genes that did not converge, but it has an error and it can end up deleting the whole mb-output folder, so we are not running that option.

We are using MrBayes here, but we could use any method to estimate gene trees as described in the PhyloUtilities website.

2. Running BUCKy on the posterior distributions of gene trees

We now run BUCKy to estimate the concordance factors from the posterior distributions of gene trees from MrBayes. This script will run BUCKy on every 4-taxon subset (1820 in this case for 16 taxa). Note that this analysis will take ~1 hour on a single laptop.

$ ../../TICR/scripts/bucky.pl mb-output/nexus.mb.tar -o bucky-output

Checking for BUCKy version >= 1.4.4...
  BUCKy version: 1.4.4.
  BUCKy version check passed.

Script was called as follows:
perl bucky.pl mb-output/nexus.mb.tar -o bucky-output

Found 16 taxa shared across all genes in this archive, 1820 of 1820 possible quartets will be run using output from 372 total genes.
Summarizing MrBayes output for 372 genes.
Could not determine external IP address, only local clients will be created.
Job server successfully created.

  Analyses complete: 1820/1820.
  All connections closed.
Total execution time: 48 minutes, 10 seconds.

2.1 Output for BUCKy

The script created a new directory named bucky-output (like we specified in the command above), which contains several tarballs (results from MrBayes, mbsum and from bucky).

$ ls bucky-output/
nexus.CFs.csv  nexus.mb.tar  nexus.mbsum.tar.gz

The .csv file (spreadsheet) lists all the 4-taxon sets and their estimated quartet concordance factors which will be the input for SNaQ.

We use BUCKy to account for gene tree estimation error, but we could skip this step and use gene trees directly as input in SNaQ as described in the PhyloUtilities website.

3. Estimating a population tree with Quartet MaxCut

The optimization algorithm within SNaQ is complex, so a good starting point to help the search in network space would improve the accuracy and running time.

We will use Quartet MaxCut (QMC) to estimate a starting population tree because the input data for QMC is the same table of concordance factors.

$ ../../PhyloUtilities/scripts/get-pop-tree.pl bucky-output/nexus.CFs.csv

Script was called as follows:
perl get-pop-tree.pl bucky-output/nexus.CFs.csv

Parsing major resolution of each 4-taxon set... done.
Running Quartet Max Cut...


Quartet MaxCut version 2.10 by Sagi Snir, University of Haifa

quartet file is nexus.QMC.txt, 

Number of quartets is 1820, max element 16

Number of quartets read: 1820, max ele 16

Quartet Max Cut complete, tree located in 'nexus.QMC.tre'.

We have a new file nexus.QMC.tre with our QMC tree.

Final input files for SNaQ

We ran the following pipeline: From loci alignments in nexus file to estimated posterior distributions of gene trees with MrBayes, and then estimated concordance factors with BUCKy. The CFs were also used to estimate a starting topology for SNaQ with QMC.

The input for SNaQ will then be:

  • starting topology in nexus.QMC.tre
  • table of concordance factors in nexus.CFs.csv

We note that there is an alternative pipeline in which estimated gene trees are used as input directly for SNaQ. In this approach, we can estimate the gene trees with any method like RAxML or IQ-Tree, and it is described in the PhyloUtilities website.

TICR goodness of fit test

As part of the TICR pipeline, we can test whether a tree is a good enough fit for the gene trees. More information about this test can be found in the PhyloUtilities website.

Moving files out of the Docker container into local machine

For those running the commands in a Docker container, we need to bring back these output files into your local machine.

In your local machine (not inside the Docker container), type docker ps to get the Contained ID:

% docker ps
CONTAINER ID   IMAGE                      COMMAND       CREATED        STATUS        PORTS     NAMES
f9331b6cad1a   solislemus/ticr-docker:1   "/bin/bash"   18 hours ago   Up 18 hours             mystifying_banach

We are going to copy only the csv table and the starting tree with the command docker cp containerID:/path/to/find/files /path/to/put/copy. Recall that you need to be outside of the Docker. We will copy these files inside snaq-tutorial/analysis

pwd ## snaq-tutorial/analysis
docker cp f9331b6cad1a:/scratch/snaq-tutorial/analysis/nexus.QMC.tre .
docker cp f9331b6cad1a:/scratch/snaq-tutorial/analysis/bucky-output/nexus.CFs.csv .

Once you exit the Docker container, it does not disappear (unless you ran it with the rm option which we did not). So, you can always log back into the Docker if you know the container ID. More information, see here or in the Docker tutorial.


Copyright Solis-Lemus lab © 2023. Distributed by an MIT license.