🌳 Create Phylogenetic Tree

This module is a dedicated tool for evolutionary analysis. It processes multiple DNA or protein sequences in FASTA format, calculates a distance matrix using Levenshtein distance, and constructs a phylogenetic tree using hierarchical clustering to represent the evolutionary relationships between the sequences.


1. Input and Generation

Input Area

The main input area requires raw sequence data in FASTA format (maximum 50 sequences).

Action | Function
Generate Tree | Compares every pair of sequences using Levenshtein distance, computes a normalized distance matrix, and builds a hierarchical clustering tree using the average linkage method.
Clear | Clears all input sequences from the text editor and resets the visualization.
Example | Pre-fills the input box with sample FASTA sequences to demonstrate the required format.

Sequence Validation

  • DNA Sequences: Only characters A, C, G, T, N are allowed
  • Protein Sequences: Only the 20 standard amino acid one-letter codes plus X (unknown residue) are allowed: ARNDCQEGHILKMFPSTWYVX
  • Duplicate Headers: Sequence names must be unique
  • Minimum: At least 2 sequences required to build a tree
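The input handling and validation rules above can be sketched in TypeScript as follows. This is a minimal illustration, not the module's source: `parseFasta`, `detectType`, and `validateRecords` are hypothetical names, and upper-casing residues before validation is an assumption.

```typescript
// Hypothetical helpers illustrating the documented input rules.
interface FastaRecord {
  header: string;
  sequence: string;
}

type SequenceType = "dna" | "protein" | "invalid";

const MAX_SEQUENCES = 50;

// Split FASTA text into records: a ">" line starts a new record and the
// following lines are concatenated into its sequence.
function parseFasta(text: string): FastaRecord[] {
  const records: FastaRecord[] = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue;
    if (line.startsWith(">")) {
      records.push({ header: line.slice(1).trim(), sequence: "" });
    } else if (records.length > 0) {
      // Assumption: residues are upper-cased and inner whitespace is dropped.
      records[records.length - 1].sequence += line.replace(/\s+/g, "").toUpperCase();
    } else {
      throw new Error("Sequence data found before the first '>' header");
    }
  }
  return records;
}

// A, C, G, T and N are also valid protein codes, so DNA is tested first.
function detectType(seq: string): SequenceType {
  if (/^[ACGTN]+$/.test(seq)) return "dna";
  if (/^[ARNDCQEGHILKMFPSTWYVX]+$/.test(seq)) return "protein";
  return "invalid";
}

// Returns one message per violated rule; an empty array means the input is usable.
function validateRecords(records: FastaRecord[]): string[] {
  const errors: string[] = [];
  if (records.length < 2) errors.push("At least 2 sequences are required");
  if (records.length > MAX_SEQUENCES) errors.push(`At most ${MAX_SEQUENCES} sequences are allowed`);
  const seen = new Set<string>();
  for (const { header, sequence } of records) {
    if (seen.has(header)) errors.push(`Duplicate header: ${header}`);
    seen.add(header);
    if (detectType(sequence) === "invalid") errors.push(`Invalid characters in: ${header}`);
  }
  return errors;
}
```

Because every DNA letter is also a valid amino acid code, the order of the two checks in `detectType` decides how an ambiguous sequence such as `ACGT` is classified.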

2. Tree Customization Options

The middle panel provides controls for adjusting the visual display of the generated tree.

Plotting Options

Control | Type | Description
Show Labels | Switch | Toggles the visibility of sequence labels on terminal nodes.
Node Color | Color Picker | Sets the color for the internal and external nodes of the tree.
Line Color | Color Picker | Sets the color for the branches/lines connecting the nodes.
Line Style | Dropdown | Sets the visual style of the branches (Solid or Dotted).

Size and Scale

Sliders allow for dynamic adjustment of various visual properties of the tree rendering:

Slider | Range | Description
Link Width | 1-5 | Controls the thickness of the branches/lines in the tree.
Resize | 0.5-2.0 | Scales the entire tree visualization (zoom in/out).
Node Size | 1-10 | Controls the size of the points representing the nodes.
Font Size | 5-20 | Controls the size of the text labels for sequence names.
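The Plotting Options and Size and Scale controls can be modeled as a single settings object whose numeric fields are clamped to the documented slider ranges. This is a hypothetical sketch: the `TreeStyle` type, `withSlider`, and the dash pattern used for the Dotted style are illustrative assumptions, not the component's actual state or rendering code.

```typescript
// Hypothetical settings object mirroring the documented controls.
type LineStyle = "solid" | "dotted";

interface TreeStyle {
  showLabels: boolean;
  nodeColor: string;
  lineColor: string;
  lineStyle: LineStyle;
  linkWidth: number; // 1-5
  resize: number;    // 0.5-2.0
  nodeSize: number;  // 1-10
  fontSize: number;  // 5-20
}

// Slider ranges taken from the Size and Scale table.
const RANGES = {
  linkWidth: [1, 5],
  resize: [0.5, 2.0],
  nodeSize: [1, 10],
  fontSize: [5, 20],
} as const;

type SliderKey = keyof typeof RANGES;

function clamp(value: number, [min, max]: readonly [number, number]): number {
  return Math.min(max, Math.max(min, value));
}

// Returns a new style with one slider value updated and kept in range.
function withSlider(style: TreeStyle, key: SliderKey, value: number): TreeStyle {
  const next: TreeStyle = { ...style };
  next[key] = clamp(value, RANGES[key]);
  return next;
}

// Assumption: "Dotted" maps to an SVG stroke-dasharray; the pattern is illustrative.
function strokeDasharray(style: LineStyle): string | null {
  return style === "dotted" ? "2,3" : null;
}
```

Returning a fresh object from `withSlider` keeps updates immutable, which suits UI frameworks that re-render when the settings object changes.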

Download

The download buttons export the rendered tree in three formats:

  • SVG - Scalable Vector Graphics format for high-quality scaling
  • PNG - Portable Network Graphics format for web use
  • JPEG - Joint Photographic Experts Group format for general use
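For SVG, exporting amounts to serializing the rendered `<svg>` element and wrapping it as a file. The sketch below assumes the markup has already been obtained (in a browser, typically with `new XMLSerializer().serializeToString(svgElement)`); `svgMarkupToDataUrl` is a hypothetical helper, not part of the module.

```typescript
// Hypothetical helper: turn serialized SVG markup into a downloadable data URL.
const SVG_NS = "http://www.w3.org/2000/svg";

function svgMarkupToDataUrl(markup: string): string {
  let svg = markup.trim();
  // A standalone .svg file needs the SVG namespace on its root element.
  if (!svg.includes("xmlns=")) {
    svg = svg.replace(/^<svg\b/, `<svg xmlns="${SVG_NS}"`);
  }
  const file = `<?xml version="1.0" encoding="UTF-8"?>\n${svg}`;
  return `data:image/svg+xml;charset=utf-8,${encodeURIComponent(file)}`;
}
```

PNG and JPEG exports usually go one step further: the SVG is drawn onto a `<canvas>` and encoded with `canvas.toDataURL("image/png")` or `canvas.toDataURL("image/jpeg")`. JPEG has no alpha channel, so painting a solid background first avoids transparent areas turning black.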

3. Algorithm Details

Distance Calculation

  • Uses Levenshtein distance (edit distance) to measure sequence similarity
  • Distances are normalized by the maximum sequence length
  • Handles both DNA and protein sequences automatically
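A minimal sketch of the distance step, using the standard dynamic-programming Levenshtein algorithm with unit costs for insertion, deletion, and substitution. The text says distances are normalized by the maximum sequence length; this sketch assumes that means the longer sequence of each pair (which keeps every value between 0 and 1). The function names are hypothetical.

```typescript
// Levenshtein distance with unit costs, keeping only two rows of the DP table.
function levenshtein(a: string, b: string): number {
  if (a.length < b.length) [a, b] = [b, a]; // rows span the shorter string
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      // deletion, insertion, substitution (or match)
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Symmetric matrix of normalized distances in [0, 1].
// Assumption: each pair is normalized by the longer of its two sequences.
function distanceMatrix(seqs: string[]): number[][] {
  const n = seqs.length;
  const d = Array.from({ length: n }, () => new Array<number>(n).fill(0));
  for (let i = 0; i < n; i++) {
    for (let j = i + 1; j < n; j++) {
      const maxLen = Math.max(seqs[i].length, seqs[j].length);
      const dist = maxLen === 0 ? 0 : levenshtein(seqs[i], seqs[j]) / maxLen;
      d[i][j] = dist;
      d[j][i] = dist;
    }
  }
  return d;
}
```

For example, `ACGT` and `ACGA` differ by one substitution, giving a normalized distance of 1/4 = 0.25.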

Tree Construction

  • Hierarchical Clustering with average linkage method
  • Distance-Matrix Based: Clustering operates on the normalized pairwise distance matrix
  • Fallback Mechanism: Creates a star tree when sequences are identical
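Average-linkage hierarchical clustering on a distance matrix is the method known as UPGMA. The sketch below is a straightforward O(n³) version that records each merge, following the convention used by SciPy's linkage output: leaves are clusters `0..n-1`, and merge `k` (counting from 0) creates cluster `n + k`. Storing half the linkage distance as the node height is the usual UPGMA convention; treating "identical sequences" as "every pairwise distance is zero" is an assumption, and all names are hypothetical.

```typescript
// Hypothetical UPGMA sketch: average-linkage clustering over a distance matrix.
interface Merge {
  left: number;   // id of the first cluster joined
  right: number;  // id of the second cluster joined
  height: number; // UPGMA node height: half the average-linkage distance
  size: number;   // number of sequences in the new cluster
}

// Assumption: the star-tree fallback applies when every pairwise distance is zero.
function allIdentical(dist: number[][]): boolean {
  return dist.every((row, i) => row.every((d, j) => i === j || d === 0));
}

function averageLinkage(dist: number[][]): Merge[] {
  const n = dist.length;
  // Active clusters, keyed by id, mapped to the leaf indices they contain.
  const members = new Map<number, number[]>();
  for (let i = 0; i < n; i++) members.set(i, [i]);
  const merges: Merge[] = [];
  let nextId = n;
  while (members.size > 1) {
    const ids = Array.from(members.keys());
    let best: [number, number] = [ids[0], ids[1]];
    let bestDist = Infinity;
    for (let x = 0; x < ids.length; x++) {
      for (let y = x + 1; y < ids.length; y++) {
        const a = members.get(ids[x])!;
        const b = members.get(ids[y])!;
        // Average linkage: mean distance over all cross-cluster leaf pairs.
        let sum = 0;
        for (const i of a) for (const j of b) sum += dist[i][j];
        const d = sum / (a.length * b.length);
        if (d < bestDist) {
          bestDist = d;
          best = [ids[x], ids[y]];
        }
      }
    }
    const [left, right] = best;
    const merged = members.get(left)!.concat(members.get(right)!);
    members.delete(left);
    members.delete(right);
    members.set(nextId++, merged);
    merges.push({ left, right, height: bestDist / 2, size: merged.length });
  }
  return merges;
}
```

A caller would test `allIdentical(dist)` first and, if it returns true, attach every sequence directly to the root (the star tree) instead of calling `averageLinkage`.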

Tree Structure

  • Root Node: The starting point labeled "Root" representing the common ancestor
  • Internal Nodes: Branch points labeled "Node_1", "Node_2", etc. representing inferred common ancestors
  • Terminal Nodes: Leaves of the tree showing the original sequence headers from FASTA input
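A merge list from clustering can be turned into the labeled tree described above. In this hypothetical sketch the last merge becomes "Root", the remaining internal nodes are numbered `Node_1`, `Node_2`, and so on in merge order (the module's actual numbering order is not documented), and each branch length is the parent's height minus the child's height. `buildTree` and the `TreeNode` shape are assumptions for illustration.

```typescript
// Merge ids follow the convention: leaves are 0..n-1, merge k creates id n + k.
interface Merge {
  left: number;
  right: number;
  height: number;
}

interface TreeNode {
  name: string;
  branchLength: number; // distance to the parent (0 for the root)
  children: TreeNode[];
}

function buildTree(headers: string[], merges: Merge[]): TreeNode {
  const n = headers.length;
  const nodes = new Map<number, TreeNode>();
  const height = new Map<number, number>();
  // Terminal nodes carry the original FASTA headers and sit at height 0.
  headers.forEach((h, i) => {
    nodes.set(i, { name: h, branchLength: 0, children: [] });
    height.set(i, 0);
  });
  merges.forEach((m, k) => {
    const isRoot = k === merges.length - 1;
    const children = [m.left, m.right].map((id) => {
      const child = nodes.get(id)!;
      child.branchLength = m.height - height.get(id)!;
      return child;
    });
    nodes.set(n + k, { name: isRoot ? "Root" : `Node_${k + 1}`, branchLength: 0, children });
    height.set(n + k, m.height);
  });
  return nodes.get(n + merges.length - 1)!;
}
```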

4. Phylogenetic Tree Output

The bottom section displays the computed tree visualization using D3.js.

  • Tree Layout: Horizontal dendrogram layout with root on the left and leaves on the right
  • Interactive Scaling: Use the resize slider to zoom in/out for better viewing
  • Branch Relationships: Sequences that share closer branching points are more evolutionarily related
  • Distance Representation: Branch lengths represent evolutionary distance based on sequence similarity
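Making horizontal distance reflect branch length (a phylogram) takes a small custom layout, because D3's built-in `d3.tree()` and `d3.cluster()` layouts position nodes by depth rather than by branch length. The sketch below is one way to do it and is an assumption, not the module's rendering code: x is the cumulative branch length from the root scaled by a hypothetical `pxPerUnit`, leaves take consecutive rows `rowHeight` apart, and each internal node is centered vertically on its children.

```typescript
interface TreeNode {
  name: string;
  branchLength: number;
  children: TreeNode[];
}

interface Placed {
  name: string;
  x: number; // root at x = 0, leaves extend to the right
  y: number; // SVG convention: y grows downward
}

function layoutPhylogram(root: TreeNode, pxPerUnit: number, rowHeight: number): Placed[] {
  const out: Placed[] = [];
  let nextRow = 0;
  // Depth-first walk; returns the node's y so parents can center on children.
  function visit(node: TreeNode, distFromRoot: number): number {
    let y: number;
    if (node.children.length === 0) {
      y = nextRow++ * rowHeight;
    } else {
      const ys = node.children.map((c) => visit(c, distFromRoot + c.branchLength));
      y = ys.reduce((s, v) => s + v, 0) / ys.length;
    }
    out.push({ name: node.name, x: distFromRoot * pxPerUnit, y });
    return y;
  }
  visit(root, 0);
  return out;
}
```

The resulting coordinates could then be bound to SVG elements with D3, for example drawing each branch as an elbow from the parent's (x, y) to the child's.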

Visualization Features

  • Responsive Design: Automatically adjusts to container size
  • High-Quality Export: SVG export maintains vector quality for publications
  • Customizable Appearance: Full control over colors, sizes, and styles
  • Real-time Updates: Changes to customization options update the visualization immediately

5. Technical Notes

  • Maximum Sequences: Limited to 50 for performance reasons
  • Sequence Length: Handles sequences of varying lengths through normalization
  • Memory Efficient: Processes sequences without requiring extensive computational resources
  • Browser-Based: All computations happen client-side in the web browser
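The 50-sequence cap is easier to see with a rough count of the work involved. For n sequences, the distance step compares every pair once, and the textbook Levenshtein algorithm fills a table of size L_i × L_j for each pair:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad \binom{50}{2} = \frac{50 \cdot 49}{2} = 1225 \text{ pairs}

\text{DP cells} \approx \binom{n}{2}\, L^2 \quad \text{for sequences of typical length } L
```

Doubling the typical sequence length therefore roughly quadruples the distance-matrix time, while a straightforward average-linkage pass adds work on the order of n³ (125,000 operations at n = 50). These are estimates for the textbook algorithms, not measurements of this module.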