Predicting the bioactivity of 2-alkoxycarbonylallyl esters as potential antiproliferative agents against pancreatic cancer (MiaPaCa-2) cell lines: GFA-based QSAR and ELM-based models with molecular docking

Oyeneyin, Oluwatoba Emmanuel; Obadawo, Babatunde Samuel; Olanrewaju, Adesoji Alani; Owolabi, Taoreed Olakunle; Gbadamosi, Fahidat Adedamola; Ipinloju, Nureni; Modamori, Helen Omonipo

doi:10.1186/s43141-021-00133-2

Research
Open access
Published: 10 March 2021

Predicting the bioactivity of 2-alkoxycarbonylallyl esters as potential antiproliferative agents against pancreatic cancer (MiaPaCa-2) cell lines: GFA-based QSAR and ELM-based models with molecular docking

Oluwatoba Emmanuel Oyeneyin^1,2,
Babatunde Samuel Obadawo ORCID: orcid.org/0000-0001-7292-6564^1,2,
Adesoji Alani Olanrewaju⁴,
Taoreed Olakunle Owolabi³,
Fahidat Adedamola Gbadamosi⁵,
Nureni Ipinloju^1,2 &
…
Helen Omonipo Modamori⁶

Journal of Genetic Engineering and Biotechnology volume 19, Article number: 38 (2021) Cite this article

1965 Accesses
18 Citations
2 Altmetric
Metrics details

Abstract

Background

The number of cancer-related deaths is on the increase, combating this deadly disease has proved difficult owing to resistance and some serious side effects associated with drugs used to combat it. Therefore, scientists continue to probe into the mechanism of action of cancer cells and designing novel drugs that could combat this disease more safely and effectively. Here, we developed a genetic function approximation model to predict the bioactivity of some 2-alkoxyecarbonyl esters and probed into the mode of interaction of these molecules with an epidermal growth factor receptor (3POZ) using the three-dimensional quantitative structure activity relationship (QSAR), extreme learning machine (ELM), and molecular docking techniques.

Results

The developed QSAR model with predicted (R²_pred) of 0.756 showed that the model was fit to be validated parameter for a built model and also proved that the developed model could be used in practical situation, R² for training set (0.9929) and test set (0.8397) confirmed that the model could successfully predict the activity of new compounds due to its correlation with the experimental activity, the models generated with ELM models showed improved prediction of the activity of the molecules. The lead compounds (22 and 23) had binding energies of −6.327 and −7.232 kcalmol⁻¹ for 22 and 23 respectively and displayed better inhibition at the binding sites of 3POZ when compared with that of the standard drug, chlorambucil (−6.0 kcalmol⁻¹). This could be attributed to the presence of double bonds and the α-ester groups.

Conclusion

The QSAR and ELM models had good prognostic ability and could be used to predict the bioactivity of novel anti-proliferative drugs.

Background

In the world, pancreatic cancer (PC) is ranked the fourth highest cause of cancer-related deaths and the fourteenth most common cancer [1]. By the year 2030, it is predicted that it would become the second-most common cause of cancer-related deaths [2]. Pancreatic cancer is mainly divided into two types: (i) pancreatic adenocarcinoma (PA), the most common (85% of cases), arises in exocrine glands of the pancreas, and (ii) pancreatic neuroendocrine tumor (Pan-NET) which occurs in the endocrine tissue of the pancreas and less common [3]. MIAPaCa-2 and PANC-1 are two commonly used cancer cell lines for studies of PA [4]. PA is usually advanced at the time of diagnosis as it is relatively symptom-free [5]. With its poor prognosis, only 24% of people survive 1 year and 9% live for 5 years [3].

Late disease diagnosis, attainment of resistant characteristics, the paucity of effective therapies, and metastatic nature among others are some of the proposed reasons for the poor survival rate of PC patients [6]. Cytotoxic treatments presently used have failed, example of such is gemcitabine [7], with patients hardly survive more than 6 months after therapy, according to study [6]. 5-Fluorouracil, however, has been shown to be effective and boosts survival rate of patients [6]. In the last decades, the repurposing of approved drugs to treat cancer birthed drugs like celecoxib, metformin, sulindac, or TriFluoroPerazine (TFP). Although, there are limitations associated with them, an instance with TFP is that patients develop neurological side effects when it is used to treat cancer [8].

Some compounds such as methyl protodioscin could inhibit proliferation and promote apoptosis of MIAPaCa-2 cells [9]. A small-molecule, trisubstituted naphthalene diimide compound, is potent against PC cell line MIAPaCa-2 [10]. Another compound is ursolic acid, a popular anti-inflammatory and immunosuppressive agent which inhibited growth and induced apoptosis in a dose-dependent manner [11]. Due to this, the necessity of developing better and more effective therapeutics with improved activity at low concentrations that will ensure a longer survival rate and inhibit resistance against the MIAPa-Ca cancer cell lines cannot be overemphasized.

Quantitative structure activity relationship (QSAR) is one of the commonest approaches in ligand-based drug design process among Scaffold hopping, pseudo-receptor and pharmacophore modeling. The key goal of QSAR studies is to determine a mathematical model between the property under investigation, and one or more molecular descriptors [12,13,14]. Using the model, similar bioactivities of compounds not involved in the training set can be predicted from their structural descriptors [15]. Here, we used genetic function approximation (GFA) and ELM models to predict the bioactivity of the molecules under investigation.

ELM belongs to a class of feed forward neural networks characterized with a single hidden layer. The algorithm attains its uniqueness through random determination of the biases as well as the weights connecting hidden and input layers [16]. These features result into fast convergence, high training speed while the simple structure of ELM algorithm is maintained and thereby leads to enhanced computational efficiency coupled with impressive robustness [17].

Molecular docking analyzes the pose of molecules into the binding site of a macromolecular target and also probe into the mechanisms of binding of bioactive compounds and biological targets/receptors [3, 18, 19]. Therefore, this work is aimed at using QSAR to create a mathematical model from the selected training set of 2-alkoxycarbonylallyl esters derivatives (available in the literature) [20] to predict the activity of these compounds as anticancer agents against MIAPaCa-2 cancer cell lines and elucidate the interaction of these compounds as MIAPaCa-2 cancer cell line inhibitor via molecular docking studies.

Methods

QSAR Studies

Data collection

A series of twenty-four compounds of 2-alkoxycarbonylallyl esters [20] as a potential anticancer were collected from the literature. The curative activities of the compounds against given in IC₅₀ (μM) were converted into their corresponding pIC₅₀ values (i.e., - log IC₅₀ = pIC₅₀) in order to make the activities fit to a range of values and also to suit normal distribution curve. The experimental activities, pIC₅₀ values, and compounds are presented (Table 1).

Table 1 2-alkoxycarbonylallyl esters and their (pIC₅₀)

Full size table

The 2D structures of the compounds (Table 1) were drawn using ChemDraw Ultra 12.0 [21] and then saved as cdx file. The cdx file format of the compounds drawn was moved to the Spartan 14 software [22] and optimized using molecular mechanics with the molecular mechanics force field (MMFF) to generate their most stable conformers followed by a restricted hybrid Hartree Fock—density functional theory self-consistent field (HF-DFT SCF) calculation using Pulay’s DIIS and geometric direct minimization with the 6-31G* basis set [23]. The molecular structures were saved as sdf file format after optimizing the structures.

Molecular descriptor calculation

The optimized compounds were subjected into PaDEL-Descriptor software version 2.20 [24] in order to calculate the 1D, 2D, and 3D descriptors of the compounds. After removing salt, detecting tautomer and retaining the file name as molecule name, a total of 1875 descriptors were generated saved as Microsoft Excel Comma Separated value (csv) file.

Mathematical description of the proposed ELM algorithm

ELM formalisms allow three layers with connected neurons. Pre-supposing that the number of neurons in the output, hidden, and input layers are indicated by ξ,η and μ, respectively with the threshold number of neurons for the input layer set as I = [I₁, I₂, .…I_μ]^T. Equation 1 defines the weights linking the hidden with input layer as represented by ψ while Eq. 2 presents the weights linking the hidden with output layer is χ.

$$ \boldsymbol{\uppsi} ={\left[\begin{array}{cccc}{\psi}_{11}& {\psi}_{12}& \dots & {\psi}_{1\eta}\\ {}{\psi}_{21}& {\psi}_{21}& \dots & {\psi}_{2\eta}\\ {}:& :& :& :\\ {}{\psi}_{\mu 1}& {\psi}_{\mu 2}& \dots & {\psi}_{\mu \eta}\end{array}\right]}_{\mu x\eta} $$

(1)

$$ \boldsymbol{\upchi} ={\left[\begin{array}{cccc}{\chi}_{11}& {\chi}_{12}& \dots & {\chi}_{1\xi}\\ {}{\chi}_{21}& {\chi}_{21}& \dots & {\chi}_{2\xi}\\ {}:& :& :& :\\ {}{\chi}_{\mu 1}& {\chi}_{\mu 2}& \dots & {\chi}_{\mu \xi}\end{array}\right]}_{\mu x\xi} $$

(2)

The experimental activities of the investigated compounds as well as the implemented descriptors are presented in matrix form as depicted in Eq. 3 and Eq. 4, respectively.

$$ \mathbf{D}={\left[\begin{array}{cccc}{d}_{11}& {d}_{12}& \dots & {d}_{1j}\\ {}{d}_{21}& {d}_{21}& \dots & {d}_{2j}\\ {}:& :& :& :\\ {}{d}_{\mu 1}& {d}_{\mu 2}& \dots & {d}_{\eta j}\end{array}\right]}_{\eta xj} $$

(3)

$$ A={\left[\begin{array}{cccc}{A}_{11}& {A}_{12}& \dots & {A}_{1j}\\ {}{A}_{21}& {A}_{21}& \dots & Aj\\ {}:& :& :& :\\ {}{A}_{\mu 1}& {A}_{\mu 2}& \dots & {A}_{\mu j}\end{array}\right]}_{\mu xj} $$

(4)

where j is the number of training samples.

With inclusion of a differentiable activation function γ(.)and network output α, the mathematical expression representing the working principles of ELM algorithm is presented in Eq. 5 [25, 26].

$$ \mathbf{P}\boldsymbol{\upchi } ={\boldsymbol{\upalpha}}^T $$

(5)

Equations 6 and 7 respectively presents the matrix form of hidden layer output (P) and network output.

$$ \mathbf{P}={\left[\begin{array}{cccc}\gamma \left({\psi}_1{d}_1+{I}_1\right)& \gamma \left({\psi}_2{d}_1+{I}_2\right)& \dots & \gamma \left({\psi}_{\mu }{d}_1+{I}_{\mu}\right)\\ {}\gamma \left({\psi}_1{d}_2+{I}_1\right)& \gamma \left({\psi}_2{d}_2+{I}_2\right)& \dots & \gamma \left({\psi}_{\mu }{d}_2+{I}_{\mu}\right)\\ {}:& :& :& :\\ {}\gamma \left({\psi}_1{d}_j+{I}_1\right)& \gamma \left({\psi}_2{d}_j+{I}_2\right)& \dots & \gamma \left({\psi}_{\mu }{d}_j+{I}_{\mu}\right)\end{array}\right]}_{jx\mu} $$

(6)

$$ \alpha ={\left[{\alpha}_1,{\alpha}_2,.\dots, {\alpha}_j\right]}_{\mu xj} $$

(7)

$$ {\boldsymbol{\upalpha}}_k=\left[\begin{array}{c}{\alpha}_{1k}\\ {}{\alpha}_{2k}\\ {}:\\ {}{\alpha}_{\mu k}\end{array}\right]={\left[\begin{array}{c}\sum \limits_{i=1}^{\mu }{\lambda}_{i1}\varphi \left({\boldsymbol{\upsigma}}_k{\mathbf{r}}_i+{z}_i\right)\\ {}\sum \limits_{i=1}^{\mu }{\lambda}_{i2}\varphi \left({\boldsymbol{\upsigma}}_k{\mathbf{r}}_i+{z}_i\right)\\ {}:\\ {}\sum \limits_{i=1}^{\mu }{\lambda}_{i\xi}\varphi \left({\boldsymbol{\upsigma}}_k{\mathbf{r}}_i+{z}_i\right)\end{array}\right]}_{\xi}\left(k=1,2,\dots, j\right) $$

Minimization of ‖Pχ − α^T‖ yields Eq. 8.

$$ \boldsymbol{\upchi} ={\mathbf{P}}^{+}{\boldsymbol{\upalpha}}^{\mathbf{T}} $$

(8)

while P⁺stands for the Moore-penrose generalized inverse of P.

Computational details

QSAR methods

Normalization and data pre-treatment

The descriptors were normalized (Eq. 9) [27] and pre-treated using the Data Pre-treatment software [28].

$$ X=\frac{X_1-{X}_{\mathrm{min}}}{X_{\mathrm{max}}-{X}_{\mathrm{min}}} $$

(9)

where X₁ is the descriptor’s value for each molecule, X_min and X_max are minimum and maximum value for each descriptor. This is done in order to filter descriptors with redundant data, highly correlated data, reduce colinearity thereby improving the performance of the model.

Data division

The pre-treated dataset was split into training and test sets by employing Kennard and Stone’s algorithm [29]. The training set, 70% of the data sets (16 compounds) was used to build the model and validated internally while 30% of the data sets (8 compounds) were used to externally validate the built model.

Model development

Material studio 2017 software was used to build the model employing the GFA method while fixing the biological activities (pIC₅₀) as the dependent variable and the physiochemical properties (descriptors) as the independent variables.

Internal validation of model

The training compounds were validated internally using material studio. The validation parameters include the following:

Friedman’s lack of fit (LOF)

The models generated were appraised using LOF to obtain their fitness score (Eq. 10) [30]

$$ LOF=\frac{SEE}{{\left(1-\frac{c+\left(d\times p\right)}{M}\right)}^2} $$

(10)

SEE, the standard error of estimation; p, the number of descriptors used; d is a user-defined smoothing parameter, c is the number of model terms, and M is the number of compound in the training set [31].

For a good model, SEE value must be low, it is given as (Eq. 11):

$$ SEE=\sqrt{\frac{{\left({Y}_{\mathrm{exp}}-{Y}_{\mathrm{pred}}\right)}^2}{n-p-1}} $$

(11)

where n is the number compounds that made up the training set, Y_exp is experimental activity and Y_pred is the predicted activity in the training set The correlation coefficient (R ²)

This is the mostly used for internal assessment of QSAR models. The R² (Eq. 12) is expected to be very close to 1 to generate a good model.

$$ {R}^2=1-\frac{\sum {\left({Y}_{\mathrm{exp}}-{Y}_{\mathrm{pred}}\right)}^2}{\sum {\left({Y}_{\mathrm{exp}}-{Y}_{\mathrm{training}}\right)}^2} $$

(12)

where Y_training is the mean of the experimental activity.

Adjusted (R ²)

(Eq. 13) value varies directly with the increase in number of descriptors, it is important because it measures the reliability and stability of a model unlike R².

$$ {R}_{\mathrm{adj}}^2=\frac{\left({R}^2-p\right)\times \left(n-1\right)}{n-p-1} $$

(13)

The cross validation coefficient $ \left({Q}_{cv}^2\right) $

The strength of the QSAR model was cross-validated (Eq. 14).

$$ \left({Q}_{cv}^2\right)=1-\left\{\frac{\sum {\left({Y}_{\mathrm{pred}}-{Y}_{\mathrm{exp}}\right)}^2}{\sum {\left({Y}_{\mathrm{exp}}-{Y}_{\mathrm{training}}\right)}^2}\right\} $$

(14)

Where Y_training, Y_exp, and Y_pred were defined earlier

External validation of the model

The built model was validated externally by the R²_predicted value. The R²_predicted value is the most commonly used parameter to validate a built model. The R²_test is defined by (Eq. 15):

$$ {R}^2=1-\frac{\sum {\left({Y_{\mathrm{pred}}}_{\mathrm{test}}-{Y_{\mathrm{exp}}}_{\mathrm{test}}\right)}^2}{\sum {\left({Y_{\mathrm{pred}}}_{\mathrm{test}}-{\overline{Y}}_{\mathrm{training}}\right)}^2} $$

(15)

Statistical analysis of the descriptor

Variance inflation factor

VIF (Eq. 16) measures the multicolinearity of the descriptors among each other and also measures the degree at which one descriptor correlates with the others in a model.

$$ \mathrm{VIF}=\frac{1}{1-{R}^2} $$

(16)

R² is the multiple correlation coefficient between all variables used in the model. If the VIF = 1, it signifies that there is no intercorrelation in each variable, and if it is between 1 and 5, it is acceptable. VIF value > 10 makes the model unstable.

Mean effect

This correlates the effect of a particular molecular descriptor on the activities of the compounds. A change in the descriptors’ values improves the activity of the compounds. It is defined (Eq. 17):

$$ \mathrm{Mean}\ \mathrm{Effect}=\frac{\ {B}_j{\sum}_i^n{D}_j}{\sum_j^m\left(\ {B}_j{\sum}_i^n{D}_j\right)} $$

(17)

where B_j and D_j are the j-descriptor coefficient in the model and the values of each descriptor in training set, while m and n stand for the number of molecular descriptors and number of molecules in a training set respectively. Therefore, the mean effect of each descriptor used in building the model was calculated to assess the significance of the model [32].

Evaluation of the applicability domain of the model

This is a vital step in establishing that the model is good to make predictions [33]. The leverage approach [34] was applied here. Leverage of a given chemical compound hi, is defined as (Eq. 18):

$$ hi={X}_i{\left({X}^TX\right)}^{-1}{X}_i^T $$

(18)

X_i is the training compounds matrix of i. X is the m × k descriptor matrix of the training set compound and X^T is the transpose matrix of X used to build the model. The warning leverage, h* (Eq. 19) is the limit of normal values for X outliers:

$$ {h}^{\ast }=3\frac{\left(k+1\right)}{n} $$

(19)

where n and k are the descriptors and the training set compounds respectively.

Quality assurance of the model

The validation parameters test the strength, dependability, and predictiveness of a built-in QSAR model. Consequently, Table 2 establishes general minimum specifications for both internal and external validation parameters to validate a QSAR model [34]. The list of descriptors, their descriptions, and dimension are presented (Table 3).

Table 2 Generally recommended values for the validation parameters for a built model

Full size table

Table 3 List of descriptors, their constructors, description and dimension used in building the QSAR model

Full size table

ELM-based models

The modeling and simulation that leads to the development of the proposed ELM-based models was implemented on MATLAB. The descriptors to the model as well as the experimental activities of the investigated compounds were initially randomized prior to separation into training and testing sets of data in the ratio of 4:1, respectively. Randomization enhances even distribution of data points and improves computational efficiency of the algorithm. The training dataset was employed in generating weights which are further validated using testing set of data. The procedures for the computational implementation of the proposed ELM-based models are detained as follow:

Step I: Initial random number generation: Pseudo random number that controls the randomly generated biases as well as the weights linking the hidden with input layers were initiated with the aid of seeding in MATLAB.
Step II: Optimum activation function and corresponding hidden nodes: One activation function was selected after the other within a pool of functions which include sine function, hardlim function, and sigmoid function triangular basis function among others. Hidden nodes were optimized within search space of 1 to 100.
Step III: Computation of hidden layer output matrix: Equation (6) was implemented on training dataset for the calculation of the hidden layer output matrix.
Step IV: Output weight computation: with the aid of the least square solution to the set of the generated linear system of equations, the output weights were computed.
Step V: Evaluation of the developed models: The developed models during the training phase of the simulation were evaluated using testing set of data. The performance of the model was assessed using four different performance measuring parameters which include MAE, RMSE, CC, and MAPD. The models characterized with lowest error (MAE, RMSE, and MAPD) and highest CC was saved as the best models. The hyper-parameters of the best model as well as the weights were also saved for ensuring reproducibility of the results.

Molecular docking and ADME/Tox screening

Protein and ligands preparation

The 3D crystal structure of an epidermal growth factor receptor (EGFR), a kinase domain (PDB ID: 3POZ) was downloaded from the protein data bank (Fig. 1) [35]. The receptor has two ligands (O3P) and sulfate ion (SO₄) in its active sites [36]. The receptor was chosen because it is an EGFR which best works with cancer lines caused by cell proliferation and its high resolution of 1.5 Å [36, 37]. The downloaded protein was refined using the Protein Preparation Wizard [37, 38] by assigning charges and bond orders with the removal of water molecules and addition of hydrogens to the heavy atoms. Energy minimization was done using OPLS3 [38, 39].

Ligand preparation

The lead molecules (22 and 23) were selected for docking and prepared using ligand preparation (ligprep) in Schrödinger Suite 2017-1 with an OPLS3 force field [39] in order to create three dimensional geometries and to assign proper bond orders. Epik 2.2 in Schrödinger Suite at pH 7.0 ± 2.0 was used to generate the ionization state [40]. They, alongside with a standard drug, chlorambucil were docked at the active site of the receptor (x = 20.3785826772, y = 32.8299212598, and z = 14.8903149606).

ADME/Tox screening

The compounds were further screened for their drug-likeness by calculating the absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) properties using the QikProp program [41]. These properties check drug fitness and save cost involved in bioassay studies [42]. Descriptors like molecular weight (Mw), number of hydrogen bond donors (donorHB) and number of hydrogen bond acceptors (acceptHB), solvent accessible surface area (SASA), octanol/water partition coefficient (QPlog Po/w), and value for serum protein binding (QPlogKHsa) were considered (Table 11).

Results

QSAR results of 2-alkoxycarbonylallyl esters

A QSAR model was developed to predict the activity of MIAPaCa-2 cancer cell lines in 2-alcoxycarbonylallyl esters (pIC50). The multiple linear regression (MLR) for the built model is shown in Eq. 20.

$$ \mathbf{pIC}\mathbf{5}\mathbf{0}=\left(2.115051910\times \boldsymbol{ATSC}\mathbf{3}\boldsymbol{c}\right)-\left(0.917421961\times \boldsymbol{MATS}\mathbf{5}\boldsymbol{p}\right)-\left(\ 0.160590092\times \boldsymbol{minHBint}\mathbf{5}\right)+\left(724494169\times \boldsymbol{ETA}\_\boldsymbol{Shape}\_\boldsymbol{P}\right)+3.419964012 $$

(20)

The list of descriptors, their constructors, description, and dimension used in building the QSAR model are reported in Table 3. Table 4 represents the comparison between experimental activity (pIC50), predicted activity (pIC50), and residual of the developed model which is used in validating the model externally as reported in Tables 5 and 6.

Table 4 Comparison of experimental activity (pIC₅₀), predicted activity (pIC₅₀), and residual of developed model

Full size table

Table 5 External validation of developed model

Full size table

Table 6 Calculation of the predicted R² of developed model

Full size table

Table 7 presents the Pearson’s correlation, mean effect, and variance inflation factor of the descriptors used in building the model. The experimental activity is plotted against the predicted activity for both training set, and test set shown in Fig. 2. The scatter plot between standardized residual activity and experimental activity which explains the randomness of the activities on both negative and positive sides of y-axis is in Fig. 3. While Fig. 4 presents the Williams plot of standardized residual against leverages for the developed model.

Table 7 Pearson’s correlation matrix, VIF and ME of the descriptors used in the built model

Full size table

Comparison of the estimates of developed QSAR and ELM-based models

The performance of the developed ELM-based models was compared with that of QSAR model using four different performance measuring parameters including mean absolute error (MAE), root mean square error (RMSE), correlation coefficient (CC), and mean absolute percentage deviation (MAPD). The outcomes of the comparison are presented in Table 9 and Fig. 5a-d. The comparison of the outcomes of each of the developed models with inclusion of percentage error for each of the investigated compounds is presented in Table 10.

Table 8 Validation parameter for the built model

Full size table

Table 9 Performance comparison of the developed QSAR and ELM-based models

Full size table

Table 10 Comparison of the estimates of the developed models

Full size table

Molecular docking

Molecular docking studies were carried out to understand the mechanism of action of some 2-alkoxycarbonylallyl esters against pancreatic cancer (MiaPaCa-2) cell line targeting the epidermal growth factor receptor (EGFR). The structure of the prepared receptor is shown in Fig. 1. The reference drug (chlorambucil) and lead compounds (22 and 23) were docked with the epidermal receptor growth factor (3POZ) to elucidate the interaction and the binding mode. The binding affinities and ADME/Tox properties of chlorambucil and the lead compounds (22 and 23) are presented in Table 11. While the interactions between the epidermal receptor growth factor (3POZ) with the chlorambucil and lead compounds (22 and 23) are presented respectively in Figs. 5, 6, and 7.

Table 11 Binding affinities and ADME/Tox properties of some of the lead molecules and chlorambucil

Full size table

Discussion

QSAR results of 2-alkoxycarbonylallyl esters

Four different models were generated using the genetic function approximation technique. The first model was selected to be the best model due to its statistical significance and the fact that it satisfied the recommended standard for a stable and reliable model as outlined (Table 2).

2D descriptors played a vital role in predicting the activity of new molecules that can inhibit against MIAPaCa-2 cancer cell line. The positive coefficient of ATSC3c, MATS5p, minHBint5, and ETA_Shape_P descriptors in the model inferred that increase in the coefficient of the descriptor will improve the activity (pIC₅₀) of 2-alkoxycarbonylallyl esters against MIAPaCa-2 cancer cell line. Hence, to design potent compounds with high pIC₅₀ value, the positive coefficient of the descriptors will have to be increased.

The low values of residual recorded in Table 4 affirmed that there is high correlation between the experimental activities and predicted activities. The value of R²_pred (0.7560) signifies that the model has passed the minimum recommended value for validating parameter for a built model presented in Table 2. The idea that once the R²_predicted value is considered satisfied, the remaining parameters will also be satisfied is not always true as additional statistical analyses can help validate the built model such as variance inflation factor (VIF) and mean effect (ME). The closer the value of R²_test to 1.0, the better the stability the model generated. Therefore, in prediction of the behavior of a new compound, the stability will take into account model reliability.

The Pearson’s correlation, ME, and VIF of the descriptors are presented in Table 7. The low correlation values ≤ 0.5 in most descriptors inferred that the descriptors do not correlate with one another. This ascertains that there is no bias in the prediction made by the model. The mean effect of the descriptors shown in Table 7 signifies the effect of molecular descriptors on the activity pIC₅₀ of the compounds. Thus, the order of decreasing effect is ETA_Shape_P > ATSC3c > MATS5p> minHBint5.

The experimental activity was plotted against predicted activity for both training set and test set is reported in Fig. 2. The high value of correlation coefficient R² for training set (0.9929) and test set (0.8397) confirmed that the model can successfully predict the activity of a new compound due to its correlation with the experimental activity. The randomness of the activities on both negative and positive sides of y-axis shown on the scatter plot between standardized residual activity and the experimental activity reported in Fig. 3 confirmed the built model was free from systematic error. To discover outliers and influential compounds in the built model, the standardized residual activity for the entire data set was plotted against the leverages. Williams plot (Fig. 4) confirmed that there was only one influential compound (20) with leverage value of 1.00 which is greater than the warning value (h= 0.9375).

Comparison of the estimates of developed QSAR and ELM-based models

Using MAE metric, the ELM-Sig built model was more efficient than ELM-Sine and QSAR models with a performance improvement of 4.09% and 14.92% respectively. The model performed higher with a performance improvement of 9.53% and 30.68% of the RMSE performance assessment parameter respectively. For CC and MAPD metrics, the developed ELM-Sig model performed better than ELM-Sine and QSAR models with performance enhancement of 0.24% and 0.57% as well as 4.86% and 17.38%, respectively. The developed ELM-Sine model also performed better than QSAR-based model with improved performance of 10.41%, 19.30%, 0.33%, and 11.93%, respectively using MAE, RMSE, CC, and MAPD as performance measuring parameters.

From Table 10, the results of the developed ELM-sig model show persistence closeness with the measured values. The superiority of the developed ELM-based model over the QSAR model can be attributed to the intrinsic feature of ELM algorithm in approximating non-linear and complex relationship linking the descriptors to the target. Ability of sigmoid activation function to approximate well is reveled from the performance of ELM-Sig model over that of ELM-Sine model.

Molecular docking

The lead compounds (22 and 23) showed good docking score in the receptor’s active sites and are better than chlorambucil, a standard drug (Table 11). Compound 22 (−6.327 kcalmol⁻¹) showed conventional hydrogen bond with CYT 737 and ASP 855 from its carbonyl oxygen from the α-ester group (Fig. 5). Compound 23 (−7.232 kcalmol⁻¹) also showed conventional hydrogen bond with ASP 855 from its carbonyl oxygen from the α-ester group (Fig. 6). Chlorambucil (−5.826 kcalmol⁻¹) showed no conventional hydrogen bond with any of the amino acid groups (Fig. 7). The lead molecules have better activity than chlorambucil as reported by Conor et al. [20]; the bonds responsible for their bioactivity according are the double bond and the α-ester groups. The ADME/Tox properties (Table 11) showed that these compounds are fit as drugs, according to Lipinski’s rule of five [19, 43, 44].

Conclusion

In this study, a GFA model, together with two EML models were used to predict the potential activity of 2-alkoxycarbonylallyl esters as anticancer against MIAPaCa-2 pancreatic cancer cell lines while also probing into the mechanisms of interaction of the lead compounds against an epidermal growth factor receptor kinase domain, 3POZ via molecular docking approach. The GFA model generated was thoroughly validated; this model was compared to with ELM-Sig model and ELM-Sine model, with the ELM-Sig model proving the best in bioactivity prediction of these molecules. Binding of the lead compounds with the receptor showed they had better inhibitory potentials than chlorambucil. In the 2D interaction diagrams, it was seen that the compounds bind to the receptor predominantly through the hydrogen bond interaction and also mainly the carbonyl oxygen atoms of the α-esther group were responsible for their interaction, which is in line with what was observed in the experiment.

Availability of data and materials

Structures, experimental activity, molecular descriptors, and statistical analysis during the study are included in this published article.

Abbreviations

GFA:: Genetic function approximation
EGFR:: Epidermal growth factor receptor
ELM:: Extreme learning machine
PA:: Pancreatic adenocarcinoma
Pan-NET:: Pancreatic neuroendocrine tumor
TFP:: TriFluoroPerazine
QSAR:: Quantitative structure activity relationship

References

McGuigan A, Kelly P, Turkington R, Jones C, Coleman H, McCain (2018) Pancreatic cancer: a review of clinical diagnosis, epidemiology, treatment and outcomes. World J Gastroenterol 24(43):4846–4861. https://doi.org/10.3748/wjg.v24.i43.4846
Article Google Scholar
Shiomi Y, Yoshimura M, Kuki K, Hori Y, Tanaka T, Co ZP (2017) Z-360 suppresses tumor growth in MIA PaCa-2-bearing mice via inhibition of gastrin-induced anti-apoptotic effects. Anticancer Res 37:4127–4137. https://doi.org/10.21873/anticanres.11800
Article Google Scholar
Rawla P, Barsouk A (2019) Epidemiology of gastric cancer: global trends , risk factors and prevention. Gastroenterol Rev 14(1):26–38. https://doi.org/10.5114/pg.2018.80001
Article Google Scholar
Shen Y, Pu K, Zheng K, Ma X, Qin J, Jiang L (2019) Differentially expressed microRNAs in MIA PaCa-2 and PANC-1 pancreas ductal adenocarcinoma cell lines are involved in cancer stem cell regulation. Int J Mol Sci 20:1–16. https://doi.org/10.3390/ijms20184473
Article Google Scholar
Deer EL, González-Hernández J, Coursen JD, Shea JE, Ngatia J, Scaife CL, Firpo MA, Mulvihill SJ (2010) Phenotype and genotype of pancreatic cancer cell lines. Pancreas 39(4):425–435. https://doi.org/10.1097/MPA.0b013e3181c15963
Article Google Scholar
Ying H, Dey P, Yao W, Kimmelman AC, Draetta GF, Maitra A, DePinho RA (2016) Genetics and biology of pancreatic ductal adenocarcinoma. Genes Dev. 30:355–385. https://doi.org/10.1101/gad.275776.115
Article Google Scholar
Krulikas LJ, McDonald IM, Lee B, Okumu DO, East MP, Gilbert TSK, Herring LE, Golitz BT, Wells CI, Axtman AD, Zuercher WJ, Willson TM, Kireev D, Yeh JJ, Johnson GL, Baines AT, Graves LM. Application of integrated drug screening/kinome analysis to identify inhibitors of gemcitabine-resistant pancreatic cancer cell growth. SLAS Discov. 2019;23(8):850–61. https://doi.org/10.1177/2472555218773045.
Article Google Scholar
Ductal P, Huang C, Lan W, Fraunho N, Iovanna J, Santofimia-castaño P (2019) Dissecting the anticancer mechanism of trifluoperazine on pancreatic ductal adenocarcinoma. Cancers (Basel). 11(1869):1–15. https://doi.org/10.3390/cancers11121869
Article Google Scholar
Chen LY, Cheng CS, Gao HF, Zhan L, Wang FJ, Qu C, Li Y, Chen HF, Wang P, Chen H, Meng ZQ, Liu LM, Chen Z (2018, 2018) Natural compound methyl protodioscin suppresses proliferation and inhibits glycolysis in pancreatic cancer. Evid Based Complement Altern Med. https://doi.org/10.1155/2018/7343090Research
Ahmed AA, Marchetti C, Ohnmacht SA, Neidle S (2020) A G quadruplex-binding compound shows potent activity in human gemcitabine-resistant pancreatic cancer cells. Sci Rep. 10:12192. https://doi.org/10.1038/s41598-020-68944-w
Article Google Scholar
Li J, Liang X, Yang X (2012) Ursolic acid inhibits growth and induces apoptosis in gemcitabine-resistant human pancreatic cancer via the JNK and PI3K/Akt/NF-κB pathways. Oncol Rep. 28:501–510. https://doi.org/10.3892/or.2012.1827
Article Google Scholar
Abdel-ilah L, Veljovic E, Gurbeta L, Badnjević A (2017) Applications of QSAR study in drug design. Int J Eng Res Technol. 6(06):582–587
Google Scholar
Metibemu DS, Oyeneyin OE, Omotoyinbo DE, Adeniran OY, Metibemu AO, Oyewale MB (2020) Molecular docking and quantitative structure activity relationship for the identification of novel phyto-inhibitors of matrix metalloproteinase-2. Sci Lett 8(2):61–68. https://doi.org/10.47262/sl/8.2.132020005
Article Google Scholar
Obadawo BS, Oyeneyin OE, Anifowose MM, Fagbohungbe KH, Amoko JS (2019) QSAR modeling of novel substituted 4- Phenylisoquinolinones as potent BET bromodomain ( BRD4-BD1 ) inhibitors. Biomed Lett. 4(1):69–78
Google Scholar
Laufkötter O, Sturm N, Bajorath J, Chen H, Engkvist O (2019) Combining structural and bioactivity-based fingerprints improves prediction performance and scaffold hopping capability. J Cheminform. 11(54):1–14. https://doi.org/10.1186/s13321-019-0376-1
Article Google Scholar
Bin Huang G, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1–3):489–501. https://doi.org/10.1016/j.neucom.2005.12.126
Article Google Scholar
Liu ZF, Li LL, Tseng ML, Lim MK (2020) Prediction short-term photovoltaic power using improved chicken swarm optimizer - extreme learning machine model. J Clean Prod. 248:119272. https://doi.org/10.1016/j.jclepro.2019.119272
Article Google Scholar
Olanrewaju AA, Ibeji CU, Oyeneyin OE (2020) Biological evaluation and molecular docking of some newly synthesized 3d-series metal(II) mixed-ligand complexes of fluoro-naphthyl diketone and dithiocarbamate. SN Appl Sci 2(4):678. https://doi.org/10.1007/s42452-020-2482-0
Article Google Scholar
Adejoro IA, Waheed SO, Adeboye OO, Akhigbe FU (2017) Molecular docking of the inhibitory activities of triterpenoids of <i>Lonchocarpus cyanescens</i> against ulcer. J Biophys Chem. 08(01):1–11. https://doi.org/10.4236/jbpc.2017.81001
Article Google Scholar
Ronayne CT, Solano LN, Nelson GL, Lueth EA, Hubbard SL, Schumacher TJ, Gardner ZS, Jonnalagadda SK, Gurrapu S, Holy JM, Mereddy VR (2017) Synthesis and biological evaluation of 2-alkoxycarbonylallyl esters as potential anticancer agents. Bioorganic Med Chem Lett. 27(4):776–780. https://doi.org/10.1016/j.bmcl.2017.01.037
Article Google Scholar
Mills N (2006) ChemDraw Ultra 10.0. CambridgeSoft. Comput Softw Rev 128:13649–13650. https://doi.org/10.1021/ja0697875
Article Google Scholar
Shao Y, Molnar LF, Jung Y, Kussmann J, Ochsenfeld C, Brown ST, Gilbert ATB, Slipchenko LV, Levchenko SV, O'Neill DP, DiStasio RA, Jr, Lochan RC, Wang T, Beran GJO, Besley NA, Herbert JM, Yeh Lin C, Van Voorhis T, Hung Chien S, Sodt A, Steele RP, Rassolov VA, Maslen PE, Korambath PP, Adamson RD, Austin B, Baker J, Byrd EFC, Dachsel H, Doerksen RJ, Dreuw A, Dunietz BD, Dutoi AD, Furlani TR, Gwaltney SR, Heyden A, Hirata S, Hsu C-P, Kedziora G, Khalliulin RZ, Klunzinger P, Lee AM, Lee MS, Liang W, Lotan I, Nair N, Peters B, Proynov EI, Pieniazek PA, Min Rhee Y, Ritchie J, Rosta E, David Sherrill C, Simmonett AC, Subotnik JE, Lee Woodcock H, III, Zhang W, Bell AT, Chakraborty AK, Chipman DM, Keil FJ, Warshel A, Hehre WJ, Schaefer HF, III, Kong J, Krylov AI, Gill PMW, Head-Gordon. Advances in methods and algorithms in a modern quantum chemistry program package. Phys Chem Chem Phys. 2006;8(27):3172-91. https://doi.org/10.1039/b517914a.
Becke AD (1993) Density - functional thermochemistry. III. The role of exact exchange. J Chem Phys 98:5648–5652
Article Google Scholar
Yap CW (2011) PaDEL-Descriptor: an open source software to calculate molecular descriptors and fingerprints. J Comput Chem 32:1466–1474. https://doi.org/10.1002/jcc.21707
Article Google Scholar
Owolabi TO Extreme learning machine and swarm- based support vector regression methods for predicting crystal lattice parameters of pseudo-cubic/cubic perovskites extreme learning machine and swarm-based support vector regression methods for predicting crystal lat. J Appl Phys 245107(2020). https://doi.org/10.1063/5.0008809
Shamsah SMI, Owolabi TO (2020) Modeling the maximum magnetic entropy change of doped manganite using a grid search-based extreme learning machine and hybrid gravitational search-based support vector regression. Crystals 10(4):310. https://doi.org/10.3390/cryst10040310
Article Google Scholar
Singh P (2013) Quantitative structure-activity relationship study Of substituted-[1,2,4] oxadiazoles as S1p1 agonists. I Curr Chem Pharm Sci. 3(1):64–79
Google Scholar
Ambure P, Aher RB, Gajewicz A, Puzyn T, Roy K (2015) “NanoBRIDGES” software: open access tools to perform QSAR and nano-QSAR modeling. Chemom Intell Lab Syst. 147:1–13. https://doi.org/10.1016/j.chemolab.2015.07.007
Article Google Scholar
Kennard RW, Stone LA (1969) Computer aided design of experiments. Technometrics 11(1):137–148
Article Google Scholar
Friedman JH (1991) Multivariate adaptive regression splines. Ann Stat. 19(1):1–141
Article MathSciNet Google Scholar
Khaled KF (2011) Modeling corrosion inhibition of iron in acid medium by genetic function approximation method: a QSAR model. Corros Sci. 53(11):3457–3465. https://doi.org/10.1016/j.corsci.2011.01.035
Article Google Scholar
Emmanuel Israel E, Adamu U, Stephen Eyije A (2016) Multi-target in-silco study of 5,6-dihydro-2-pyrones, indole β-diketo acid, diketo acid and carboxamide derivatives against various anti- HIV-1 strain at pm3 semi-empirical level. ewemen j pharm. 1(1):1–13
Google Scholar
Tropsha A, Gramatica P, Gombar VK (2003) The importance of being earnest: validation is the absolute essential for successful application and interpretation of QSPR models. QSAR Comb Sci 22(1):69–77. https://doi.org/10.1002/qsar.200390007
Article Google Scholar
Veerasamy R, Rajak H, Jain A, Sivadasan S, Varghese CP, Agrawal RK. (2011) Validation of QSAR models-strategies and importance. Int J Drug Des Discov 3:511–519.
“3POZ.” https://www.rcsb.org/structure/3POZ. Accessed 23 Sept 2020.
Aertgeerts K, Skene R, Yano J, Sang BC, Zou H, Snell G, Jennings A, Iwamoto K, Habuka N, Hirokawa A, Ishikawa T, Tanaka T, Miki H, Ohta Y, Sogabe S. (2011) Structural analysis of the mechanism of inhibition and allosteric activation of the kinase domain of HER2 protein. J Biol Chem. 286(21):18756–65. https://doi.org/10.1074/jbc.M110.206193.
Dash R, Hosen SMZ, Karim MR, Kabir MSH, Hossain MM, Junaid M, Islam A, Paul A, Khan MA. In silico analysis of indole-3-carbinol and its metabolite DIM as EGFR tyrosine kinase inhibitors in platinum resistant ovarian cancer vis a vis ADME/T property analysis. EkpJ Appl Pharm Sci. 2015;5(11):073–9.
Sastry GM, Adzhigirey M, Day T, Annabhimoju R, Sherman W (2013) Protein and ligand preparation: parameters, protocols, and influence on virtual screening enrichments. J Comp Aided Mol Des 27:221–234 [Online]. Available: http://www.schrodinger.com/citations/41/16/1/
Article Google Scholar
Shivakumar D, Williams J, Wu Y, Damm W, Shelley J, Sherman W. Prediction of absolute solvation free energies using molecular dynamics free energy perturbation and the OPLS force field. J Chem Theor Comput. 2010;6(5):1509–19.
Shelley JC, Cholleti A, Frye L, Greenwood JR, Timlin MR, Uchimaya M. Epik: a software program for pK. J Comp Aided Mol Des. 2007;21:681–91.
“qikprop.” https://www.schrodinger.com/qikprop. Accessed 18 July 2020.
Liao C, Sitzmann M, Pugliese A, Nicklaus MC (2011) Software and resources for computational medicinal chemistry. Future Med Chem. 3(8):1057–1085. https://doi.org/10.4155/fmc.11.63
Article Google Scholar
Lipinski CA, Lombardo F, Dominy BW, Feeney PJ. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev. 2001;46:3–26.
Lipinski CA. Rule of five in 2015 and beyond: Target and ligand structural limitations, ligand chemistry structure and drug discovery project decisions. Adv Drug Deliv Rev. 2016;101:34–41.

Download references

Acknowledgements

Not applicable

Funding

Not applicable

Author information

Authors and Affiliations

Theoretical and Computational Chemistry Unit, Adekunle Ajasin University, Akungba-Akoko, Ondo State, Nigeria
Oluwatoba Emmanuel Oyeneyin, Babatunde Samuel Obadawo & Nureni Ipinloju
Department of Chemical Sciences, Adekunle Ajasin University, Akungba-Akoko, Ondo State, Nigeria
Oluwatoba Emmanuel Oyeneyin, Babatunde Samuel Obadawo & Nureni Ipinloju
Department of Physics and Electronics, Adekunle Ajasin University, Akungba-Akoko, Ondo State, Nigeria
Taoreed Olakunle Owolabi
Chemistry and Industrial Chemistry Programmes, Bowen University, Iwo, Osun State, Nigeria
Adesoji Alani Olanrewaju
Department of Chemistry, University of Ibadan, Ibadan, Oyo State, Nigeria
Fahidat Adedamola Gbadamosi
Federal University of Agriculture, Makurdi, Benue State, Nigeria
Helen Omonipo Modamori

Authors

Oluwatoba Emmanuel Oyeneyin
View author publications
You can also search for this author in PubMed Google Scholar
Babatunde Samuel Obadawo
View author publications
You can also search for this author in PubMed Google Scholar
Adesoji Alani Olanrewaju
View author publications
You can also search for this author in PubMed Google Scholar
Taoreed Olakunle Owolabi
View author publications
You can also search for this author in PubMed Google Scholar
Fahidat Adedamola Gbadamosi
View author publications
You can also search for this author in PubMed Google Scholar
Nureni Ipinloju
View author publications
You can also search for this author in PubMed Google Scholar
Helen Omonipo Modamori
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

OOE: conceptualization, methodology, investigation, data curation. OBS: conceptualization, methodology, data curation, writing—original draft. OAA: investigation writing—review and editing. TOO: methodology, data interpretation, review, and editing. GFA: investigation writing and proofreading. NI: review and editing. MHO: investigation, data curation, proofreading. All authors also read and approved the final manuscript.

Corresponding author

Correspondence to Babatunde Samuel Obadawo.

Ethics declarations

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Competing interests

The authors declare that they had no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Oyeneyin, O.E., Obadawo, B.S., Olanrewaju, A.A. et al. Predicting the bioactivity of 2-alkoxycarbonylallyl esters as potential antiproliferative agents against pancreatic cancer (MiaPaCa-2) cell lines: GFA-based QSAR and ELM-based models with molecular docking. J Genet Eng Biotechnol 19, 38 (2021). https://doi.org/10.1186/s43141-021-00133-2

Download citation

Received: 12 November 2020
Accepted: 04 February 2021
Published: 10 March 2021
DOI: https://doi.org/10.1186/s43141-021-00133-2

Predicting the bioactivity of 2-alkoxycarbonylallyl esters as potential antiproliferative agents against pancreatic cancer (MiaPaCa-2) cell lines: GFA-based QSAR and ELM-based models with molecular docking

Abstract

Background

Results

Conclusion

Background

Methods

QSAR Studies

Data collection

Molecular descriptor calculation

Mathematical description of the proposed ELM algorithm

Computational details

QSAR methods

Normalization and data pre-treatment

Data division

Model development

Internal validation of model

Friedman’s lack of fit (LOF)

where n is the number compounds that made up the training set, Yexp is experimental activity and Ypred is the predicted activity in the training set The correlation coefficient (R 2)

Adjusted (R 2)

The cross validation coefficient \( \left({Q}_{cv}^2\right) \)

External validation of the model

Statistical analysis of the descriptor

Variance inflation factor

Mean effect

Evaluation of the applicability domain of the model

Quality assurance of the model

ELM-based models

Molecular docking and ADME/Tox screening

Protein and ligands preparation

Ligand preparation

ADME/Tox screening

Results

QSAR results of 2-alkoxycarbonylallyl esters

Comparison of the estimates of developed QSAR and ELM-based models

Molecular docking

Discussion

QSAR results of 2-alkoxycarbonylallyl esters

Comparison of the estimates of developed QSAR and ELM-based models

Molecular docking

Conclusion

Availability of data and materials

Abbreviations

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher’s Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

where n is the number compounds that made up the training set, Y_exp is experimental activity and Y_pred is the predicted activity in the training set The correlation coefficient (R ²)

Adjusted (R ²)