Genskew_univiecube library documentation

The PyPI package genskew_univiecube is the backend of all other GenSkew programs and can also be used as a library. It can be installed from PyPI.

Capabilities

With this library you can:

1. scan file and folder paths for fasta and genbank files

2. generate sequences from these files

3. analyze these sequences for their nucleotide skew

4. create graphs displaying the skew

Scanning Directories

With the function input_files(filenames) you can scan the paths in the filenames list for Fasta and Genbank files and their respective zipped versions. It is used as follows:

import Genskew_univiecube as gs
  
filenames = ['/home/user/sequence.fasta', '/home/user/Downloads']

parsed_filenames = gs.input_files(filenames)

In this case, the paths of the Fasta and Genbank files that were found are stored in the list parsed_filenames.
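
To make the idea of scanning concrete, here is a minimal sketch in plain Python of how a mixed list of file and folder paths could be reduced to sequence files. It is not the implementation of input_files: the names scan_paths and is_sequence_file, the extension sets, and the choice to scan folders non-recursively are all assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical extension sets; the library's actual lists may differ.
SEQUENCE_SUFFIXES = {".fasta", ".fa", ".gb", ".gbk"}
COMPRESSED_SUFFIXES = {".gz", ".zip"}


def is_sequence_file(path):
    """Return True if the file name looks like a (possibly zipped) Fasta or Genbank file."""
    suffixes = [s.lower() for s in path.suffixes]
    # Ignore one trailing compression suffix, e.g. "genome.gb.gz" -> ".gb".
    if suffixes and suffixes[-1] in COMPRESSED_SUFFIXES:
        suffixes = suffixes[:-1]
    return bool(suffixes) and suffixes[-1] in SEQUENCE_SUFFIXES


def scan_paths(paths):
    """Expand folders (non-recursively, an assumption) and keep only sequence files."""
    found = []
    for entry in map(Path, paths):
        if entry.is_dir():
            candidates = sorted(p for p in entry.iterdir() if p.is_file())
        elif entry.is_file():
            candidates = [entry]
        else:
            candidates = []  # skip paths that do not exist
        found.extend(str(p) for p in candidates if is_sequence_file(p))
    return found
```

In this sketch a file that is listed directly and also sits inside a listed folder appears twice in the result; whether input_files removes such duplicates is not stated here.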

Generating Sequences

With the gen_sequence(file) function you can generate a string containing the nucleotides of a file. It uses Biopython to parse the Fasta and Genbank files. It is used as follows:

import Genskew_univiecube as gs
  
filenames = ['/home/user/sequence.fasta', '/home/user/Downloads']

parsed_filenames = gs.input_files(filenames)

sequences = []

for file in parsed_filenames:
    sequences.append(gs.gen_sequence(file))

Here the sequences are stored in the list sequences. The for loop is only needed for multiple files; if you have a single file path, gen_sequence can be applied directly to the path string.
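
To illustrate what "a string with the nucleotides" means, the following sketch reads the first record of an uncompressed Fasta file using only the standard library. It is a stand-in for illustration, not how gen_sequence works (the library uses Biopython); the function name read_first_fasta_record and the decision to stop after the first record are assumptions.

```python
def read_first_fasta_record(path):
    """Return the sequence of the first Fasta record as a single string."""
    parts = []
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:
                    break  # a second record starts; this sketch stops here
                seen_header = True
            elif line and seen_header:
                # Sequence lines are concatenated without the line breaks.
                parts.append(line)
    return "".join(parts)
```

The header line (starting with ">") is dropped and the wrapped sequence lines are joined, so the result can be passed on as one continuous string.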

Analyzing Sequences

First generate the sequence you want to analyze:

import Genskew_univiecube as gs
  
filename = '/home/user/sequence.fasta'

sequence = gs.gen_sequence(filename)

In order to analyze the sequence, you first need to create an object:

sequence_object = gs.Object(sequence, nucleotide1, nucleotide2, stepsize, windowsize)

The stepsize and windowsize are optional arguments. If they are not specified, they are calculated by dividing the length of the sequence by 1000. You should not set the stepsize or windowsize lower than 100: with smaller values a window may contain none of the specified nucleotides, and the formula used to calculate the skew would then divide by 0. Then you generate the results:

result = gs.Object.gen_results(sequence_object)
skew = result.skew

The data can be read from the result variable as in the last line of the snippet above: result.skew holds the y values and .x the x values. .cumulative holds the scaled cumulative y values and .cumulative_unscaled the unscaled ones. .max_cumulative and .min_cumulative hold the y values of the maximum and minimum of the cumulative skew, and .max_cm_position and .min_cm_position the corresponding x values. .stepsize and .windowsize hold the respective values, and .nuc_1 and .nuc_2 the first and second nucleotide. Multiple values are stored in lists, single values as numbers or strings. Here is the full code:

import Genskew_univiecube as gs
  
filename = '/home/user/sequence.fasta'

sequence = gs.gen_sequence(filename)

nucleotide1 = 'G'
nucleotide2 = 'C'
stepsize = None
windowsize = None

sequence_object = gs.Object(sequence, nucleotide1, nucleotide2, stepsize, windowsize)

result = gs.Object.gen_results(sequence_object)

skew = result.skew

Note that you have to specify the two nucleotides. The stepsize and windowsize are optional arguments.
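
The following sketch shows the kind of calculation behind these values, using the standard skew formula (n1 - n2) / (n1 + n2) per window. It is not the library's code: the function names, the placement of each window at its start position, the integer division for the default sizes, and the choice to return 0.0 for a window without either nucleotide are assumptions. The cumulative values here are a plain running sum and are not scaled.

```python
from itertools import accumulate


def window_skew(sequence, nuc_1, nuc_2, stepsize=None, windowsize=None):
    """Return (x, skew) lists for a sliding-window nucleotide skew."""
    sequence = sequence.upper()
    # Assumed default: sequence length divided by 1000, but at least 1.
    default = max(len(sequence) // 1000, 1)
    stepsize = stepsize or default
    windowsize = windowsize or default

    x, skew = [], []
    for start in range(0, len(sequence) - windowsize + 1, stepsize):
        window = sequence[start:start + windowsize]
        n1, n2 = window.count(nuc_1), window.count(nuc_2)
        # Guard against the division by zero that occurs when neither
        # nucleotide is present in the window.
        skew.append((n1 - n2) / (n1 + n2) if n1 + n2 else 0.0)
        x.append(start)
    return x, skew


def cumulative_extremes(x, skew):
    """Return the running sum of the skew and the x positions of its max and min."""
    cumulative = list(accumulate(skew))
    max_i = max(range(len(cumulative)), key=cumulative.__getitem__)
    min_i = min(range(len(cumulative)), key=cumulative.__getitem__)
    return cumulative, x[max_i], x[min_i]
```

For example, window_skew("GGGGCCCCATAT", "G", "C", 4, 4) yields one window of pure G (skew 1.0), one of pure C (skew -1.0) and one containing neither nucleotide, which is the case the guard handles.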

Plotting Graphs

Graphs are plotted with matplotlib.pyplot. Calling plot_sequence plots a graph and saves it to a file. The skewi parameter changes the plot to a SkewIT¹ plot:

import Genskew_univiecube as gs
  
filename = '/home/user/sequence.fasta'

sequence = gs.gen_sequence(filename)

nucleotide1 = 'G'
nucleotide2 = 'C'
stepsize = None
windowsize = None

sequence_object = gs.Object(sequence, nucleotide1, nucleotide2, stepsize, windowsize)

result = gs.Object.gen_results(sequence_object)

output_folder = None
out_filetype = 'png'
dpi = None
skewi = False

gs.plot_sequence(result, filename, output_folder, out_filetype, dpi, skewi)

You can save the plot in many different formats, primarily png and jpg, but every file type that matplotlib.pyplot supports can be used. The dpi, the out_filetype and the output_folder are optional arguments.

References

1: Jennifer Lu, Steven L. Salzberg, SkewIT: The Skew Index Test for large scale GC Skew analysis of bacterial genomes, https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008439 (06.04.2022)