Pre-process data before model input

PhyloClustering.standardize_tree - Function
standardize_tree(tree::AbstractMatrix{<:Real})

Standardize the tree Matrix returned by split_weight. Standardizing the data before passing it to a model is recommended.

Arguments

  • tree: an N * B Matrix containing trees (each row is a B-dimensional tree in bipartition format).

Output

A standardized B * N tree Matrix with a mean of about 0 and a standard deviation of about 1. This tree Matrix can be used as input to the models.
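
To make the described output concrete, here is a minimal sketch of z-score standardization in plain Julia. It is not the package's implementation: the helper name zscore_sketch is hypothetical, and it assumes that each bipartition is standardized across trees and that constant bipartitions are mapped to 0.

```julia
using Statistics

# Hypothetical helper for illustration only; not the PhyloClustering code.
# Input:  N x B matrix (one tree per row).
# Output: B x N matrix (one tree per column), each row z-scored.
function zscore_sketch(tree::AbstractMatrix{<:Real})
    X = permutedims(float.(tree))   # B x N, floating point copy
    mu = mean(X, dims=2)            # per-bipartition mean, B x 1
    sigma = std(X, dims=2)          # per-bipartition standard deviation, B x 1
    Z = (X .- mu) ./ sigma
    # Assumption: a constant bipartition gives 0/0 = NaN, which is set to 0.
    Z[isnan.(Z)] .= 0.0
    return Z
end

# Four trees (rows) described by three bipartitions (columns).
tree = [1.0 5.0 2.0;
        2.0 5.0 4.0;
        3.0 5.0 6.0;
        4.0 5.0 8.0]
Z = zscore_sketch(tree)
println(size(Z))
```

The result has one column per tree, which matches the B * N layout that distance and plot_clusters expect.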

PhyloClustering.distance - Function
distance(tree::AbstractMatrix{<:Real})

Get the distance Matrix of a tree Matrix returned by split_weight.

Arguments

  • tree: a B * N tree Matrix (each column is a B-dimensional tree in bipartition format).

Output

A pairwise distance Matrix that can be the input of hc_label.
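
As an illustration of what a pairwise distance Matrix over the columns of a B * N tree Matrix looks like, the sketch below computes Euclidean distances between trees. The metric is an assumption, since this page does not state which distance the package uses, and pairwise_euclidean_sketch is a hypothetical name.

```julia
using LinearAlgebra

# Hypothetical helper for illustration only; the Euclidean metric is an assumption.
# Input:  B x N matrix (one tree per column).
# Output: N x N symmetric matrix with zeros on the diagonal.
function pairwise_euclidean_sketch(tree::AbstractMatrix{<:Real})
    n = size(tree, 2)
    D = zeros(Float64, n, n)
    for j in 1:n, i in (j + 1):n
        d = norm(view(tree, :, i) .- view(tree, :, j))
        D[i, j] = d
        D[j, i] = d
    end
    return D
end

# Three trees (columns) described by two bipartitions (rows).
tree = [0.0 3.0 0.0;
        0.0 4.0 1.0]
D = pairwise_euclidean_sketch(tree)
println(size(D))
```

Entry D[i, j] is the distance between tree i and tree j, so the result is always N * N regardless of B.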


Visualize results

PhyloClustering.plot_clusters - Function
plot_clusters(tree::AbstractMatrix{<:Real}, label::Vector{Int64})

Visualize the clustering result of the models.

Arguments

  • tree: a B * N tree Matrix (each column is a B-dimensional tree in bipartition format).
  • label: a Vector of length N containing the predicted label of each tree. The output of the models can be used directly.

Output

A scatter plot showing tree clusters.
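
Because the tree Matrix stores one tree per column, the label Vector needs one entry per column. The check below is a small sketch of that convention. check_plot_inputs is a hypothetical helper, not part of PhyloClustering, and whether plot_clusters performs such a check itself is not stated here.

```julia
# Hypothetical helper for illustration only; not part of PhyloClustering.
# Verifies that there is exactly one label per tree (per column of `tree`).
function check_plot_inputs(tree::AbstractMatrix{<:Real}, label::Vector{Int64})
    n_trees = size(tree, 2)
    length(label) == n_trees ||
        throw(DimensionMismatch("got $(length(label)) labels for $n_trees trees"))
    return n_trees
end

tree = rand(5, 8)           # B = 5 bipartitions, N = 8 trees
label = ones(Int64, 8)      # one label per tree
println(check_plot_inputs(tree, label))
```

Running such a check before plotting catches a common mistake: passing the N * B layout (one tree per row) where the B * N layout is expected.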
