Commit 64dd88dc authored by mai00fti

Merge branch 'master' of https://github.com/yigbt/deepFPlearn

parents 6a65d83f 0031fd56
name: Push changes to Gitlab
on: [ push ]
jobs:
  job1:
    runs-on: ubuntu-latest
    steps:
      - name: Install Dependencies
        run: |
          sudo apt-get update && sudo apt-get install -y build-essential \
            git
          sudo apt-get clean
      - name: Push to GitLab
        env:
          ACCESS_TOKEN: ${{ secrets.GITLAB_ACCESS_TOKEN }}
        run: |
          git clone -b master https://gitlab.hzdr.de/Normo/deepFPlearn.git/
          cd deepFPlearn
          git config --global user.name "$GITHUB_ACTOR"
          git remote add github https://github.com/yigbt/deepFPlearn.git
          git pull github master
          git push https://Normo:$ACCESS_TOKEN@gitlab.hzdr.de/Normo/deepFPlearn.git/
validation
modeltraining
predictions
.idea
.pytest_cache
.RDataFiles
tests/DeepChem_results
tests/DFPLwithDC_results
tests/try*
data/B_dataset*
data/combinedSUN-BDB.dataset.4training.csv
data/convert.log
data/*.pkl
data/muv.*
data/S_dataset.pkl
data/S_dataset_extended.pkl
data/Sun_etal_dataset.ARonly.csv
data/Sun_etal_dataset.cids.predictionSet.*
data/SunBDBTox21.merged4training.csv
data/T_dataset*
data/T_tox21ChallengeData_4*
data/tox21*
data/MoleculeNet
singularity_container/dfpl.sif
dfpl/testDFPLwithDCdata.py
__pycache__
dfpl/__pycache__
tests/__pycache__
example/results_*
example/data/convert.log
results
@@ -604,7 +604,7 @@ def train_nn_models(df: pd.DataFrame, opts: options.TrainOptions) -> None:
             row_df = pd.DataFrame([[fold_no,
                                     hist.history['loss'][idx], hist.history['val_loss'][idx],
-                                    hist.history['accuracy'][idx], hist.history['val_accuracy'][idx],
+                                    hist.history['my_acc'][idx], hist.history['val_my_acc'][idx],
                                     scores[0], scores[1], scores[2]]],
                                    columns=["fold_no",  # fold number of k-fold CV
                                             "loss", "val_loss", "acc", "val_acc",  # FNN training
@@ -136,14 +136,9 @@ def importDstoxTSV(tsvfilename: str) -> pd.DataFrame:
 conversion_rules = {
     # "S_dataset.csv": importSmilesCSV,
     # "S_dataset_extended.csv": importSmilesCSV,
-    "D_dataset.tsv": importDstoxTSV,
-    # MoleculeNet Benchmark data
-    # "tox21.csv": importSmilesCSV,
-    # "bace_cols1-3.csv": importSmilesCSV,
-    # "HIV.csv": importSmilesCSV,
-    # "muv.csv": importSmilesCSV,
-    # "pcba.csv": importSmilesCSV,
-    # "sider_convertedHeader.csv": importSmilesCSV,
+    # "D_dataset.tsv": importDstoxTSV,
+    "train_data.csv": importSmilesCSV,
+    "predict_data.csv": importDstoxTSV
 }
Here you can find example code for running `deepFPlearn` in all three modes.
The input for each of these scripts can be found in the `data` folder.
The pre-computed output can be found in the `results_[train,predict,convert]` folders.
Trained models that are used in prediction mode are stored in the `models` folder.
## Train
**Script to use:** `deepFPlearn_train.sh`
Use this script to train a specific autoencoder on a provided data set and subsequently train feed-forward networks
for the targets in this data set.
This script should be called from the `example` folder:
```
cd example
./deepFPlearn_train.sh
```
The trained models, i.e. the stored weights of all neurons as `.hdf5` files, as well as history plots and model scores,
are stored in the `results_train` folder when you run this script.
Pre-computed results can be found in the GitHub release assets.
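Such an `.hdf5` file can be inspected with Keras. A minimal sketch, assuming the file is a complete saved model like the `AR.model.hdf5` shipped in `example/models` (if only weights were stored, you would instead rebuild the architecture and call `model.load_weights`):
```
from tensorflow.keras.models import load_model

# Load a trained model from its .hdf5 file; the path is an example,
# use one produced by your own run or from example/models.
model = load_model("example/models/AR.model.hdf5")
model.summary()  # prints the layers and parameter counts
```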
## Predict
**Script to use:** `deepFPlearn_predict.sh`
Use this script to predict associations to the androgen receptor for the provided set of compounds, using the generic
feature compression and the best AR model.
This script should be called from the `example` folder:
```
cd example
./deepFPlearn_predict.sh
```
The compounds are predicted with a *random* model (column `random` in the output) and the *trained* model
(column `trained` in the output).
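A quick way to screen the output is to look for compounds where the trained model departs from the random baseline. A minimal sketch, assuming the predictions land in a `.csv` file inside `results_predict` (the exact file name depends on your run) with the `random` and `trained` columns mentioned above:
```
import pandas as pd

# Hypothetical output path; adjust to the file your run produced.
preds = pd.read_csv("results_predict/predict_data.csv")

# Keep compounds the trained model scores well above the random baseline.
hits = preds[preds["trained"] - preds["random"] > 0.5]
print(hits.head())
```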
## Convert
**Script to use:** `deepFPlearn_convert.sh`
This mode is used to convert `.csv` or `.tsv` files into `.pkl` files for fast access in Python and a smaller footprint on disk.
The `.pkl` files already contain the binary fingerprints and are ready to use for training or predicting.
**Note:** The train and predict modes do not require their inputs to be `.pkl`; `.csv` files work as well, just a bit more slowly.
This script should be called from the `example` folder:
```
cd example
./deepFPlearn_convert.sh
```
The `.pkl` files are stored in the `example/data` folder, next to their input files.
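The converted files can be loaded directly in Python. A minimal sketch, assuming the converter writes a pickled pandas DataFrame next to the input file (the exact column layout depends on the converter):
```
import pandas as pd

# Hypothetical output of the convert step for example/data/train_data.csv.
df = pd.read_pickle("example/data/train_data.pkl")

print(df.columns)  # inspect which columns (incl. fingerprints) were created
print(df.shape)    # number of compounds and columns
```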
If you do this with a custom file that has a different file name, you have to edit the code as follows (see the sketch after this list):
1. open the file `dfpl/fingerprint.py`
2. search for the `conversion_rules` dictionary
3. add your file name in the same style
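For instance, to register a hypothetical `my_compounds.csv` containing SMILES, the dictionary gains one entry next to the existing ones (a sketch; `importSmilesCSV` and `importDstoxTSV` are the import functions already defined in `dfpl/fingerprint.py`):
```
conversion_rules = {
    # ... existing entries ...
    "train_data.csv": importSmilesCSV,
    "predict_data.csv": importDstoxTSV,
    "my_compounds.csv": importSmilesCSV,  # added: hypothetical SMILES .csv
}
```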
#!/bin/bash
D="example/data/"
# check if the conda env exists, create it if not, then activate it
env=$(conda env list | grep 'dfpl_env' | wc -l)
if [[ $env -ne 1 ]]; then
    conda env create -f ../singularity_container/environment.yml
fi
source activate dfpl_env
cd ..
# convert the .csv/.tsv files in the data folder
if [ -d "$D" ]; then
    python -m dfpl convert -f "$D"
fi
conda deactivate
#!/bin/bash
F="example/predict.json"
# check if the conda env exists, create it if not, then activate it
env=$(conda env list | grep 'dfpl_env' | wc -l)
if [[ $env -ne 1 ]]; then
    conda env create -f ../singularity_container/environment.yml
fi
source activate dfpl_env
cd ..
# predict with the models as described in the .json file
if [ -f "$F" ]; then
    python -m dfpl predict -f "$F"
fi
conda deactivate
#!/bin/bash
F="example/train.json"
# check if the conda env exists, create it if not, then activate it
env=$(conda env list | grep 'dfpl_env' | wc -l)
if [[ $env -ne 1 ]]; then
    conda env create -f ../singularity_container/environment.yml
fi
source activate dfpl_env
cd ..
# train the models as described in the .json file
if [ -f "$F" ]; then
    python -m dfpl train -f "$F"
fi
conda deactivate
{
    "py/object": "dfpl.options.PredictOptions",
    "inputFile": "example/data/predict_data.csv",
    "outputDir": "example/results_predict",
    "ecWeightsFile": "example/models/generic_encoder.hdf5",
    "model": "example/models/AR.model.hdf5",
    "type": "smiles",
    "fpType": "topological"
}
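The `py/object` key marks these option files as jsonpickle-encoded `dfpl.options` objects. A minimal sketch of how such a file could be decoded (an assumption based on the `py/object` marker; it requires the `dfpl` package to be importable):
```
import jsonpickle

with open("example/predict.json") as f:
    opts = jsonpickle.decode(f.read())  # reconstructs a PredictOptions object

print(opts.inputFile)  # example/data/predict_data.csv
```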
 {
     "py/object": "dfpl.options.TrainOptions",
-    "inputFile": "data/D_dataset.pkl",
-    "outputDir": "validation/case_00/results_AC_D/",
-    "ecWeightsFile": "ac_D.encoder.hdf5",
+    "inputFile": "example/data/train_data.csv",
+    "outputDir": "example/results_train/",
+    "ecWeightsFile": "encoder.hdf5",
     "type": "smiles",
     "fpType": "topological",
     "epochs": 3000,
@@ -13,6 +13,6 @@
     "enableMultiLabel": false,
     "verbose": 2,
     "trainAC": true,
-    "trainFNN": false,
-    "compressFeatures": false
-}
\ No newline at end of file
+    "trainFNN": true,
+    "compressFeatures": true
+}
name: rdkit2019
channels:
  - defaults
  - conda-forge
  - bioconda
  - r
  - anaconda
dependencies:
  - tensorflow
  - ncurses
  - numpy
  - pcre
  - matplotlib
  - keras
  - pytest
  - pyyaml
  - python
  - rdkit
  - seaborn
  - tensorboard
  - yaml
  - markdown
  - scipy
  - jsonpickle
  - pip
  - scikit-learn
  - pandas
prefix: /data/conda-envs/rdkit2019