Julia implementation of the Needleman-Wunsch pairwise sequence alignment algorithm, along with the Hirschberg space-efficient divide-and-conquer version of the algorithm and a heuristic implementation that approximates the score for the alignment of two sequences.
The project can be loaded into the Julia environment by running
julia --project=.
inside the project root directory. The source code can then be precompiled and brought into the global namespace with
using Edist
The project has three modules, Full, Hirschberg, and Bounded, corresponding respectively to the full dynamic programming implementation,
the Hirschberg divide-and-conquer implementation, and the spatially bounded heuristic. For the most part these internal implementations can be ignored,
except when tuning implementation-specific parameters.
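To make the difference between these strategies more concrete, here is a minimal sketch, not the project's actual code, of two of the underlying ideas: computing the score in linear space (the building block that Hirschberg's method applies recursively) and restricting the dynamic programming table to a diagonal band. The function names linear_score and banded_score, the band half-width k, and the unit-cost scoring (match 0, mismatch -1, gap -1) are all assumptions made for illustration. Whether Bounded uses exactly this kind of band is likewise an assumption.

```julia
# Hypothetical sketch. Scoring defaults are assumed: match 0, mismatch -1, gap -1.

# Linear-space score: only two rows of the DP table are kept at any time.
function linear_score(a::AbstractString, b::AbstractString; mismatch::Int = -1, gap::Int = -1)
    s, t = collect(a), collect(b)
    n = length(t)
    prev = collect(gap .* (0:n))      # row 0: aligning the empty prefix of `a`
    curr = similar(prev)
    for i in eachindex(s)
        curr[1] = gap * i             # column 0: aligning the empty prefix of `b`
        for j in 1:n
            diag = prev[j] + (s[i] == t[j] ? 0 : mismatch)
            curr[j + 1] = max(diag, prev[j + 1] + gap, curr[j] + gap)
        end
        prev, curr = curr, prev       # reuse the two buffers
    end
    return prev[n + 1]
end

# Banded score: only cells with |i - j| <= k are filled, so the result is a
# lower bound on the optimal score. A full matrix is allocated here for
# clarity; a memory-conscious version would store only the band.
function banded_score(a::AbstractString, b::AbstractString, k::Integer;
                      mismatch::Int = -1, gap::Int = -1)
    s, t = collect(a), collect(b)
    m, n = length(s), length(t)
    k = max(k, abs(m - n))            # the band must reach the final cell
    neg = typemin(Int) ÷ 2            # stands in for "unreachable"
    D = fill(neg, m + 1, n + 1)
    D[1, 1] = 0
    for i in 0:m, j in max(0, i - k):min(n, i + k)
        (i == 0 && j == 0) && continue
        best = neg
        if i > 0 && j > 0
            best = max(best, D[i, j] + (s[i] == t[j] ? 0 : mismatch))
        end
        i > 0 && (best = max(best, D[i, j + 1] + gap))
        j > 0 && (best = max(best, D[i + 1, j] + gap))
        D[i + 1, j + 1] = best
    end
    return D[m + 1, n + 1]
end
```

Under these assumed costs, linear_score returns the same value a full table would while using memory proportional to the length of the second string. banded_score can never exceed that value, and it equals it whenever an optimal alignment stays inside the band.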
The main functionality is exposed through the align and score functions, which wrap the various submodules
so that alignment and scoring can be called in an implementation-agnostic way. Both take a module as the first argument, followed by two strings,
and return the alignment or score generated by the implementation in that module, e.g.
julia> align(Bounded, "CACTAG", "ATCA")
(score = -4, seq_alignment = "CACTAG", query_alignment = "-A-TCA", memory_used = 376)
score functions similarly but returns only the score.
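For readers curious how a result of this shape can be produced, the following is a minimal sketch of full-table Needleman-Wunsch with traceback. It is not the code in Full.jl: the name nw_align and the unit-cost scoring (match 0, mismatch -1, gap -1) are assumptions, chosen because they are consistent with the score of -4 shown above. The memory_used field is left out because the source does not say how it is measured.

```julia
# Hypothetical sketch of full-table Needleman-Wunsch with traceback.
# Scoring defaults are assumed: match 0, mismatch -1, gap -1.
function nw_align(seq::AbstractString, query::AbstractString;
                  mismatch::Int = -1, gap::Int = -1)
    s, q = collect(seq), collect(query)
    m, n = length(s), length(q)
    sub(x, y) = x == y ? 0 : mismatch

    # Fill the (m+1) x (n+1) score table.
    D = zeros(Int, m + 1, n + 1)
    D[:, 1] .= gap .* (0:m)
    D[1, :] .= gap .* (0:n)
    for i in 1:m, j in 1:n
        D[i + 1, j + 1] = max(D[i, j] + sub(s[i], q[j]),
                              D[i, j + 1] + gap,
                              D[i + 1, j] + gap)
    end

    # Trace back from the bottom-right cell, preferring diagonal moves.
    top, bottom = Char[], Char[]
    i, j = m, n
    while i > 0 || j > 0
        if i > 0 && j > 0 && D[i + 1, j + 1] == D[i, j] + sub(s[i], q[j])
            push!(top, s[i])
            push!(bottom, q[j])
            i -= 1
            j -= 1
        elseif i > 0 && D[i + 1, j + 1] == D[i, j + 1] + gap
            push!(top, s[i])          # gap in the query
            push!(bottom, '-')
            i -= 1
        else
            push!(top, '-')           # gap in the sequence
            push!(bottom, q[j])
            j -= 1
        end
    end

    return (score = D[m + 1, n + 1],
            seq_alignment = String(reverse(top)),
            query_alignment = String(reverse(bottom)))
end
```

With these assumed costs, nw_align("CACTAG", "ATCA").score evaluates to -4. Several alignments tie at that score, so the gap placement returned by this sketch may differ from the one printed above.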
.
├── data
│   ├── graphics
│   └── TP53_cross_species.fasta
├── docs
├── Manifest.toml
├── nbs
│   └── Analysis.ipynb
├── Project.toml
├── README.md
├── src
│   ├── Bounded.jl
│   ├── Edist.jl
│   ├── Full.jl
│   └── Hirschberg.jl
└── test
data contains the data sources for the code, in this case a FASTA file with coding sequences for the TP53 protein across species
docs contains LaTeX source and/or PDF slide decks and papers documenting the research and its presentation
nbs contains Jupyter notebooks for analysis of the project
src contains the project source code