RNAstructure Command Line Help
DecoyFinder.py

DecoyFinder applies machine learning trained with features determined by TurboFold to detect sequences that are not homologous with the other sequences in the set.

USAGE: python3 DecoyFinder.py <configuration file> [options]

example: python3 /home/username/RNAstructure/scripts/DecoyFinder.py /home/username/TurboFold.conf

Required parameters:

<configuration file> The name of a file containing the required configuration data. This is the same TurboFold configuration file that was used for the TurboFold calculation.

Options that do not require added values:

--smp Run TurboFold in parallel. This option applies only when TurboFold has not yet been run and DecoyFinder has to run it first.

Options that require added values:

--FPR False Positive Rate. Sets the threshold for the classification probability. Lower thresholds are more stringent and lower the False Positive Rate. Default is 0.05.

Notes:

DecoyFinder post-processes the result of a TurboFold calculation, using the same configuration file as TurboFold. If the TurboFold output is missing or incomplete, DecoyFinder first calls TurboFold with that configuration file.

The configuration file must include both the SaveFiles option and the StartingSaveFiles option. These save the partition function results that DecoyFinder processes. The configuration file must also specify OutAln, the alignment file name, which DecoyFinder also uses.
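
As a quick pre-flight check, the three required options can be looked up before DecoyFinder is run. The following is a minimal sketch, not part of RNAstructure. It assumes the configuration file uses "Name = value" lines with "#" starting a comment and that option names are matched exactly; the function names and the sample values are hypothetical.

```python
# Options that DecoyFinder needs in the TurboFold configuration file.
REQUIRED_OPTIONS = ("SaveFiles", "StartingSaveFiles", "OutAln")


def parse_config(text):
    """Parse 'Name = value' lines into a dict (assumed file format)."""
    options = {}
    for raw_line in text.splitlines():
        # Assumption: everything after '#' is a comment.
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        options[name.strip()] = value.strip()
    return options


def missing_options(text, required=REQUIRED_OPTIONS):
    """Return the required option names that are absent or have no value."""
    options = parse_config(text)
    return [name for name in required if not options.get(name)]


# Hypothetical configuration text, for illustration only.
example = """
InSeq = {a.seq;b.seq;}
SaveFiles = {a.pfs;b.pfs;}
"""
print(missing_options(example))
```

An empty list means that all three options are present with a value; it does not verify that the referenced files exist.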

Output is written to standard output. Identified decoys (non-homologous sequences) are listed. They are named according to the sequence name inside the sequence file.
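
Because the result goes to standard output, DecoyFinder can be driven from another Python script. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the value for --FPR is assumed to be passed as the next command-line argument, and the output is returned as raw non-empty lines because its exact layout is not specified here.

```python
import subprocess


def build_command(script, config, smp=False, fpr=None):
    """Assemble the DecoyFinder command line as a list of arguments."""
    command = ["python3", script, config]
    if smp:
        command.append("--smp")
    if fpr is not None:
        # Assumption: the threshold follows the flag as a separate argument.
        command += ["--FPR", str(fpr)]
    return command


def run_decoyfinder(script, config, **options):
    """Run DecoyFinder and return the non-empty lines it prints."""
    result = subprocess.run(
        build_command(script, config, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return [line for line in result.stdout.splitlines() if line.strip()]
```

For example, run_decoyfinder("/home/username/RNAstructure/scripts/DecoyFinder.py", "/home/username/TurboFold.conf", smp=True) would return the printed lines for further processing.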

Required Libraries: DecoyFinder requires installation of the following Python libraries: joblib, numpy, and scikit-learn. We highly recommend the use of conda. A good reference for installing scikit-learn is available at: https://scikit-learn.org/stable/install.html.
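
To confirm that the libraries are visible to the Python interpreter that will run DecoyFinder, a short check such as the following can be used. It is a sketch only; the function name is hypothetical. Note that scikit-learn is imported under the module name sklearn.

```python
import importlib.util

# Package name as installed -> module name as imported.
REQUIRED_MODULES = {
    "joblib": "joblib",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the package names whose module cannot be found."""
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries were found.")
```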

Set up RNAstructure: DecoyFinder calls components of the RNAstructure package. Make sure to add the RNAstructure executables to your global path. The following steps show how to do so in Linux:
1- Use a command-line text editor such as nano to open the bash profile in the user's home directory. For example, use "nano ~/.bashrc"
2- Add this line to the file: export PATH=/path/to/RNAstructure/exe/:${PATH}
3- Add this line to the file: export DATAPATH=/path/to/RNAstructure/data_tables
4- Save the changes and source the updated profile. You can use: source ~/.bashrc
On macOS, the steps are the same, but edit the file ~/.zshrc instead.
See Thermodynamics.html.
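
After the profile has been sourced, the setup can be verified from Python. The following is a minimal sketch; the function name is hypothetical, and the executable name "TurboFold" is an assumption about what is installed in the RNAstructure exe directory.

```python
import os
import shutil


def check_environment(environ=None, which=shutil.which, executable="TurboFold"):
    """Return a list of setup problems; an empty list means none were found."""
    environ = os.environ if environ is None else environ
    problems = []
    # Assumption: the executable is named "TurboFold" and must be on PATH.
    if which(executable) is None:
        problems.append(f"{executable} not found on PATH")
    datapath = environ.get("DATAPATH")
    if not datapath:
        problems.append("DATAPATH is not set")
    elif not os.path.isdir(datapath):
        problems.append(f"DATAPATH is not a directory: {datapath}")
    return problems


for problem in check_environment():
    print(problem)
```

The environ and which parameters exist only so that the check can be exercised without changing the real environment.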

DecoyFinder Training:

DecoyFinder also provides code that lets users train their own model. DecoyFinder training generates the configuration files needed for training and outputs the trained AdaBoost model.

USAGE: python3 DecoyFinder_training.py <sequence file directory> <output file directory> <Model output>

Required parameters:

<sequence file directory> The directory that contains all sequence files.

Note: Please make sure that the sequence files of different RNA families are saved in different directories, so that the script is able to build the training set.

<output file directory> The directory to which all configuration files and TurboFold results are written.

<Model output> The name under which the trained machine learning model is output.
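
Before training, it can help to inspect how the sequence files are grouped. The sketch below is illustrative only and makes two loud assumptions that this page does not confirm: that each RNA family has its own subdirectory directly under the sequence file directory, and that at least two families are needed to build a training set. The function names are hypothetical.

```python
from pathlib import Path


def sequences_by_family(root):
    """Map each subdirectory name (assumed: one per family) to its file names."""
    families = {}
    for family_dir in sorted(Path(root).iterdir()):
        if family_dir.is_dir():
            families[family_dir.name] = sorted(
                path.name for path in family_dir.iterdir() if path.is_file()
            )
    return families


def check_training_layout(root, minimum_families=2):
    """Return True if enough non-empty family directories are present."""
    families = sequences_by_family(root)
    non_empty = [name for name, files in families.items() if files]
    # Assumption: at least two families are required for training.
    return len(non_empty) >= minimum_families
```

Calling sequences_by_family on the sequence file directory gives a quick overview of which files would be attributed to which family under these assumptions.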

 

References:

  1. Reuter, J.S. and Mathews, D.H.
    "RNAstructure: software for RNA secondary structure prediction and analysis."
    BMC Bioinformatics, 11:129. (2010).
  2. Mathews, D.H., Disney, M.D., Childs, J.L., Schroeder, S.J., Zuker, M. and Turner, D.H.
    "Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure."
    Proc. Natl. Acad. Sci. USA, 101:7287-7292. (2004).