Commit b21763bd authored by Jana Schor

Example cleaned and tested

parent f67c4f54
Here you find example code for running `deepFPlearn` in all three modes.
The input for each of these scripts can be found in the `example/data` folder.
The pre-computed output of the `train` mode can be found in the assets of the release; for the `predict` mode, it is stored in the respective `example/results_predict` folder.
Trained models that are used in the prediction mode are stored in the `example/models` folder.

**NOTE**: Before calling `deepFPlearn`, activate the conda environment or use the container as described in the main `README.md` of the repository.
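If the `dfpl_env` conda environment does not exist yet, you can create it from the environment file in the repository; the helper scripts below perform the same check before running. A minimal sketch, run from the repository root (`conda activate` is used here in place of the scripts' older `source activate`):

```
# create the conda environment once, then activate it
conda env create -f singularity_container/environment.yml
conda activate dfpl_env
```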
## Train

Use this mode to train a specific autoencoder with a provided data set, and subsequently train feed-forward networks for the targets in this data set.
The training data contains three targets, and you may train models for each of them using the following command; the helper script `deepFPlearn_train.sh` (shown below) wraps the same call.
Training with the configuration from the `example/train.json` file takes approximately 4 minutes on a single CPU.
```
python -m dfpl train -f example/train.json
```
The trained models (the weights for all neurons, stored as `.hdf5` files), the training histories and respective plots, as well as the predictions on the test data, are stored in the `example/results_train` folder as defined in the `example/train.json` file (you may change this).
Pre-computed results can be found in the GitHub release assets.
## Predict

Use this mode to predict associations to the androgen receptor for the provided set of compounds, using generic feature compression and the best AR model.
This will take only a few seconds on a single CPU; the helper script `deepFPlearn_predict.sh` (shown below) wraps the same call.
```
python -m dfpl predict -f example/predict.json
```
The compounds are predicted with the (provided) AR model, and the results are returned as floating point numbers between 0 and 1.
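If you need binary class labels rather than raw scores, you can threshold the prediction column. A minimal sketch; the output file name, its location and the assumption that the score is the last column are illustrative, not fixed by `deepFPlearn`:

```
# turn scores into 0/1 calls at a 0.5 cutoff (file layout is assumed)
awk -F',' 'NR == 1 {print $0 ",label"; next} {print $0 "," (($NF >= 0.5) ? 1 : 0)}' \
    example/results_predict/predictions.csv > predictions_labeled.csv
```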
## Convert

This mode is used to convert `.csv` or `.tsv` files into `.pkl` files for easy access in Python and to reduce the memory footprint on disk.
The `.pkl` files then already contain the binary fingerprints and are ready to use for training or predicting; the helper script `deepFPlearn_convert.sh` (shown below) wraps the same call.

**Note:** The Train and Predict modes do not require their inputs to be in `.pkl` format; `.csv` is also fine, just a bit slower.
```
python -m dfpl convert -f example/data
```
The `.pkl` files are stored in the `example/data` folder, next to their input files.
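To verify a converted file, you can load it with pandas. The file name below assumes that `train_data.csv` was among the converted inputs; it is an illustration, not a guaranteed output name:

```
# quick sanity check of a converted file (file name is an assumption)
python -c "import pandas as pd; df = pd.read_pickle('example/data/train_data.pkl'); print(df.head())"
```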
`deepFPlearn_convert.sh`:
#!/usr/bin/env bash
script_dir=$(dirname "$0")
D="$script_dir/data/"

# check if conda env exists, create if not, activate it
env=$(conda env list | grep 'dfpl_env' | wc -l)
if [[ $env -ne 1 ]]; then
    conda env create -f "$script_dir/../singularity_container/environment.yml"
fi
source activate dfpl_env

# convert the data in the given directory
if [ -d "$D" ]; then
    export PYTHONPATH="$script_dir/.."
    python -m dfpl convert -f "$D"
fi
`deepFPlearn_predict.sh`:

#!/usr/bin/env bash
script_dir=$(dirname "$0")
F="$script_dir/predict.json"

# check if conda env exists, create if not, activate it
env=$(conda env list | grep 'dfpl_env' | wc -l)
if [[ $env -ne 1 ]]; then
    conda env create -f "$script_dir/../singularity_container/environment.yml"
fi
source activate dfpl_env

# predict with the model as described in the .json file
if [ -f "$F" ]; then
    export PYTHONPATH="$script_dir/.."
    python -m dfpl predict -f "$F"
fi
conda deactivate
`deepFPlearn_train.sh`:

#!/usr/bin/env bash
script_dir=$(dirname "$0")
F="$script_dir/train.json"

# check if conda env exists, create if not, activate it
env=$(conda env list | grep 'dfpl_env' | wc -l)
if [[ $env -ne 1 ]]; then
    conda env create -f "$script_dir/../singularity_container/environment.yml"
fi
source activate dfpl_env

# train the models as described in the .json file
if [ -f "$F" ]; then
    export PYTHONPATH="$script_dir/.."
    python -m dfpl train -f "$F"
fi
conda deactivate
#!/usr/bin/env bash
# This script needs to be run from the deepFPlearn directory.
# Importantly, the conda environment needs to be set up and activated! For certain
# machines/HPC, we have a batch-job that does exactly that and then calls this file.

function log_error() {
    echo "$@" 1>&2
}

function call_convert() {
    if [ -d "$1" ]; then
        python -m dfpl convert -f "$1"
    else
        log_error "Could not find directory for data conversion $1"
    fi
}

function call_train() {
    if [ -f "$1" ]; then
        python -m dfpl train -f "$1"
    else
        log_error "Could not find training file $1"
    fi
}

function call_predict() {
    if [ -f "$1" ]; then
        python -m dfpl predict -f "$1"
    else
        log_error "Could not find prediction file $1"
    fi
}

call_convert "data"
call_train "train.json"
call_predict "predict.json"
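The header comment above mentions a batch job that sets up the environment and then calls this script. A minimal sketch of such a job, assuming a SLURM cluster; the resource values, repository path and script name are placeholders, not part of the repository:

```
#!/usr/bin/env bash
#SBATCH --job-name=dfpl-example
#SBATCH --cpus-per-task=1
#SBATCH --time=01:00:00

# activate the environment, then run the combined script above
source activate dfpl_env
cd /path/to/deepFPlearn        # placeholder: your repository checkout
bash run_dfpl_example.sh       # placeholder name for the script above
```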
`example/train.json` (excerpt):

{
    "py/object": "dfpl.options.Options",
    "inputFile": "example/data/train_data.csv",
    "outputDir": "example/results_train/",
    "ecModelDir": "example/models/generic_encoder/",
    "type": "smiles",
    "fpType": "topological",
    "fpSize": 2048,
    "encFPSize": 256,
    ...
}
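The `py/object` entry indicates that this file maps directly onto `dfpl.options.Options`, but you can still read or tweak individual values as plain JSON; a small sketch (the `fpSize` key is taken from the excerpt above):

```
# print a single option from the training configuration
python -c "import json; print(json.load(open('example/train.json'))['fpSize'])"
```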