AltaiR

AltaiR: a C toolkit for alignment-free and spatial-temporal analysis of multi-FASTA data.

This method provides alignment-free and spatial-temporal analysis of multi-FASTA data through a highly flexible C toolkit designed for large-scale data, namely extensive collections of genomes/proteomes. The toolkit is ideal for scenarios involving multiple sequences from epidemic and pandemic events. AltaiR is implemented in C, uses multi-threading to increase computational speed, is flexible for multiple applications, and has no external dependencies. The tool accepts any sequence(s) in (multi-)FASTA format.

The AltaiR toolkit has one main menu (command: AltaiR) with six sub-menus, one for each feature it provides:

  • average: applies a moving-average filter to one column of floating-point values in a CSV file (the column is a parameter);
  • filter: filters FASTA reads by alphabet, completeness, length, CG quantity, presence of multiple string patterns, and pattern absence;
  • frequency: computes the alphabet frequencies for each FASTA read (it enables alphabet filtering);
  • nc: computes the Normalized Compression (NC) for all FASTA reads according to a compression level;
  • ncd: computes the Normalized Compression Distance (NCD) for all FASTA reads according to a reference;
  • raw: computes Relative Absent Words (RAWs) with CG quantity estimation for all RAWs.