This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

RL-MD: A Novel Reinforcement Learning Approach for DNA Motif Discovery

Wen Wang1 , Jianzong Wang1(){}^{1(\textrm{{\char 0\relax}})}, Shijing Si1,2, Zhangcheng Huang1 and Jing Xiao1 Corresponding author: Jianzong Wang, [email protected] 1Ping An Technology (Shenzhen) Co., Ltd., Shenzhen, China 2School of Economics and Finance, Shanghai International Studies University, Shanghai, China Email: [email protected], [email protected], [email protected],
[email protected], [email protected]
Abstract

The extraction of sequence patterns from a collection of functionally linked unlabeled DNA sequences is known as DNA motif discovery, and it is a key task in computational biology. Several deep learning-based techniques have recently been introduced to address this issue. However, these algorithms can not be used in real-world situations because of the need for labeled data. Here, we presented RL-MD, a novel reinforcement learning based approach for DNA motif discovery task. RL-MD takes unlabelled data as input, employs a relative information-based method to evaluate each proposed motif, and utilizes these continuous evaluation results as the reward. The experiments show that RL-MD can identify high-quality motifs in real-world data.

Index Terms:
Biomedical AI, Reinforcement Learning, Computational Biology, Bioinformatics, DNA Motif Discovery

I Introduction

The DNA sequences that contain the same motif, a specific sequence pattern, are often bound by a particular transcriptional factor (TF) or TF combination. Biologists have shown that TF are crucial in biological processes such as alternative splicing[1], RNA degradation[2], and transcriptional regulation[3]. Different types of cells express unique combinations of TFs, which might be viewed as the basic mechanism for cell differentiation[4]. Identifying the motif from a collection of unlabeled DNA sequences that hold common regulatory or functional characteristics is a essential task for computational biology known as DNA motif discovery (DMD) (Fig. 1).

The input for the DMD is thousands of sequences, each containing hundreds of nucleotides. Unknown portions of these sequences contain the motif, while the remaining sequences do not. The percentages of sequences containing motifs in different datasets vary due to the poor reproducibility of biological experiments. Input sequences produced by enrichment-based methods like ChIP-seq (chromatin immunoprecipitation followed sequencing)[5] exhibit a dominant motif. Datasets generated by other biological approaches may contain multiple motifs. Here, we focus on the scenario where a dominant motif exists.

Several computational algorithms have been developed to perform DMD. They could be divided into two groups—word-based and profile-based—according to different representation methods for motifs. Word-based approaches utilize (degenerate) consensus IUPAC codes[6] to describe the motif and search the entire input sequence to find the theoretically best motif[7][8][9][10]. The computational complexity increased exponentially when the size of input sequences got a linear increment for these word-based methods. This drawback limits its application in the high-throughput sequencing era. Profile-based techniques represent motifs by a positional weighted matrix (PWM), which can define an infinite number of nucleotide combinations in each position as opposed to a limited number using IUPAC codes[6] (Fig. 1). In DMD, profile-based techniques were employed with variable heuristic search technologies[11][12][13][14]. However, the sub-optimal results generated by these algorithms force users to repeatedly try the DMD process in order to obtain a better motif, which slows down its processing performance.

Refer to caption
Figure 1: The DMD process. DMD receives a set of DNA sequences as input. Some of these sequences have motifs (red text) according to the biological hypothesis. The number of motifs in these sequences is higher than in other types of DNA fragments. DMD is looking for the nucleotide composition of this putative motif. Assuming that the precise location of the motif among the input sequences is known, the motif can be identified by PWM or degenerate consensus IUPAC codes after aligning, counting, and visualizing the motif in a form specifically designed for biologists. In practice, algorithms must perform these operations without position information of the motifs.

Several deep learning-based approaches, including DeepBind[15], basset[16], HOCNN[17], DeepCpG[18], LeNup[19] and DeFine[20], have been introduced to extract sequence features by using the weights in the first convolution layers as the probabilities in PWMs. The training of these discrimination models requires labelled inputs, which distinguish the sequence containing a motif from others. Unfortunately, the existing status of the motif in each sequence in the inputs of DMD is unknown. These limitations make it hard to apply these algorithms in real biological research. Together, a novel unsupervised learning strategy combined with the feature extraction ability of deep learning is needed to improve DMD. Deep reinforcement learning has been considered as a potential approach to address these issues[21].

Motivated by the aforementioned limitations, we propose RL-MD, a novel reinforcement learning approach to solve these questions. First, RL-MD is constructed based on DDPG, which does not require a labeled input. Second, RL-MD uses HKDIC (high order Kullback-Leibler divergence), an enhanced motif evaluation approach, to evaluate the motif. Last but not least, RL-MD can produce superior results with data from the real world.

Our contributions are presented as follows:

  • We propose RL-MD to apply reinforcement learning to the task of discovering DNA motifs (DMD). RL-MD find the right DNA motif by the iteration of proposing potential motifs and update its own parameters based on receiving feedback (evaluation score).

  • We design the new evaluation method HKDIC, which takes the adjacent nucleotides combination information into account. We use 2D convolution to conduct this operation. It could extend to a kk-adjacent nucleotides combination with a low increment in computing time.

  • We test our RL-MD and two other popular algorithms on real-world datasets. In Gcn4 ChIP-seq data, RL-MD was able to obtain a reliable motif, whereas MEME[22] and ChIPMunk[12] were unable to produce valid results.

II Related Works

II-A Biological Background

DMD is one of the most essential computational biology tasks. Korn[23] first introduced a program to analyse the common features among DNA sequences. Following that, researchers introduced algorithms to analyse the motifs in DNA sequences such as the upstream sequence of transcription start sites, transcription termination sites, and satellite DNA[24].

In the high-throughput sequencing era, ChIP-chip and ChIP-seq generate new types of data for DMD input. Both in terms of number and quality, these inputs are superior to the preceding ones. In quantity, millions of raw signals could be summarized into tens of thousands of peaks which could indicate TF binding sites. In quality, these TF binding sites are supported by physical interaction evidence. The input DNA fragments could be reduced to hundreds base pairs (bp). Previous methods take tremendous time for handle this new data. New algorithms are required for computation biology[25][26].

For ChIP-chip data, several algorithms have been developed[24][27]. Nowadays, the most common input for DMD is ChIP-seq or ChIP-seq-like data. MEME[22], the most widely used method, uses a mixture model to describe input sequences and fits this model via expectation maximization to perform DMD. They assume that the input sequences are composed of two sets of sequences. One contains the sequence derived from the motif with random sequences in the flanking regions. The others were formed with random sequences. These two datasets can be described by their model. MEME initiated two models with arbitrary parameters and refined them in the following. In E-steps, MEME calculated the expected value of each sequence derived from models and sent it to the corresponding higher value group. In M-steps, MEME maximizes the model parameters to refine the model. After repeatedly applying these two steps, MEME finally gets the model that can describe the most information about the input sequence. ChIPMunk[12][14] also uses the EM approach. In E-steps, ChIPMunk[12][14] uses the discrete information content (DIC) to measure the similarity between sequences and motif. Its basic version does not consider the background frequency of nucleotides. In an improved version, the discrete version of Kullback–Leibler divergence (Kullback DIC, KDIC) was used, which removed the information content explained by sequence background.

II-B Modern Computational Methods

Convolutional neural network (CNN)-based techniques have been developed recently to address DMD[15][16][17][20]. The inputs for these algorithms are labeled sequences. The labels provide information about whether the corresponding sequence has the target motif. Based on the CNN-based networks, binary classifiers were trained. Benefit from the acceleration of GPU-based tensor calculation, which increases the execution performance of certain methods to a usable level. The weights of the first convolutional layer, which are typically mm * 4 in size, were taken into consideration as the PWM of motifs after the training procedure.

Reinforcement learning (RL) has been employed in a variety of tasks for which the ground truths are unknown. The deterministic policy had been introduced for tasks with continuous action space in order to solve these issues utilizing RL[28]. To utilize the benefits of deep learning in feature extraction and deterministic policy, deep deterministic policy gradient (DDPG)[29] coupled the Actor-Critic technique and deep Q-network (DQN)[30]. Prioritized experience replay, which used the priority queue to learn more data on those high bias experiences, has been presented as a way to speed up the training process[31].

III Methodology

III-A Applied DDPG in DMD

Refer to caption
Figure 2: The overall structure of RL-MD. RL-MD uses DDPG as the general framework. The actor network μ\mu first generate some random actions and received feedback (reward) from the environment. Replay memory DD was used to store these states, actions, rewards, and new states. The actor network was updated according to the policy gradient. The critic network QQ was updated according to the difference between the evaluated value of an action given by the critic network and predicted value using the target network and reward.

The purpose of the RL agent is to maximize the total reward R=t=t0γtt0rtR=\displaystyle\sum_{t=t_{0}}^{\infty}\gamma^{t-t_{0}}r_{t}. Theoretically, there is an optimal policy π\pi^{*} that can guide our choice of action a𝒜a\in\mathscr{A} for each state s𝒮s\in\mathscr{S} in order to accomplish this goal. The evaluation function QQ that describes the reward RR in each state ss and action aa combination is need.

Q(s,a)RQ(s,a)\to R (1)

The optimal policy π\pi^{*} could be obtained by maximizing the rewards received according to QQ.

π(s)=argmaxaQ(s,a)\pi^{*}(s)=\operatorname*{argmax}_{a}Q(s,a) (2)

Given that the action space in RL-MD is continuous, deep learning networks have been employed to approximate the evaluation function QQ as a QπQ^{\pi}. With reference to the bellmen equation, QπQ^{\pi} is updated.

Qπ(s,a)=r+γQπ(s,π(s))Q^{\pi}(s,a)=r+\gamma Q^{\pi}(s^{\prime},\pi(s^{\prime})) (3)

Temporal difference error (TD error) δ\delta describes the difference between the two sides of the equality:

δ=Q(s,a)(r+γmaxaQ(s,a))\delta=Q(s,a)-(r+\gamma\max_{a}Q(s^{\prime},a)) (4)

QπQ^{\pi} will eventually approach QQ by updating the network using the loss at each step (Fig. 2). Taken together, we proposed the RL-MD in algorithm 1.

III-A1 Action

In RL-MD, the actor networks will generate potential motifs as the actions. The motif is a probability matrix with m * 4 sizes, denoted as MM here, and shown in Fig. 1. Mi,jM_{i,j} is the probability of the jj-th nucleotide at the ii-th position.

III-A2 State

The environment of RL-MD is the input sequences, a set of unlabeled DNA sequences. The state SS of the environment is the detailed nucleotides for each input sequence and stays constant throughout the whole process. The state SS are loaded into RL-MD at the beginning rather than being saved in the replay memory. Biological experiments produced DNA fragments of varying lengths for DMD input. We clipped the edges on both sides to reserve the 100 bp around the signal summits for convenience.

III-A3 Reward

Similar to ChIPMunk[12][14], RL-MD evaluated how a motif could explain the information in input sequences. These measurement are used by RL-MD as the reward RR in RL. For the sake of computational simplicity, previous algorithms[12][14] assumed that the distribution of nucleotides between adjacent DNA sites was independent. However, biologists have revealed that the relationship between adjacent nucleotides may be essential for some biological phenomena, such as the under-expectation frequency of the cytosine-phosphate-guanine dinucleotide (CpG)[32]. The bias of DNase-seq data was discovered by methods that took into account the relationship between high-order neighboring nucleotides[33]. As a substitute for KDIC, RL-MD uses HKDIC. The original KDIC could be regarded as zero-order HKDIC, which does not consider adjacent nucleotides. In the first-order HKDIC, we consider the first adjacent nucleotide, so the type of nucleotides in each position is 64 (42×1+14^{2\times 1+1}) rather than 4 (40×1+14^{0\times 1+1}). The order of HKDIC is a hyper-parameter in RL-MD and is set as 1 in the following. The background frequency for each nucleotide were denoted as:

Q=(q1,q2,,qk)Q=(q_{1},q_{2},\cdots,q_{k}) (5)

where kk is the number of different types of nulcleotides in the input sequences. For the first-order HKDIC, k=64k=64.

Suppose that there are NN DNA sequences contained in the input sequences, and that the composition of nucleotides at a specific position is :

C=(c1,c2,,ck)C=(c_{1},c_{2},\cdots,c_{k}) (6)

where i=1kci=N\displaystyle\sum_{i=1}^{k}c_{i}=N. The likelihood of this occurrence is:

P(q1,q2,,qk|Q)=N!c1×c2××ckq1c1×q2c2××qkck\begin{split}P(q_{1},q_{2},\cdots,q_{k}|Q)=\\ \frac{N!}{c_{1}\times c_{2}\times\cdots\times c_{k}}q_{1}^{c_{1}}\times q_{2}^{c_{2}}\times\cdots\times q_{k}^{c_{k}}\end{split} (7)

After the logarithm:

logP(q1,q2,,qk|Q)=logN!c1×c2××ck+i=1kciqi\begin{split}logP(q_{1},q_{2},\cdots,q_{k}|Q)=\\ log\frac{N!}{c_{1}\times c_{2}\times\cdots\times c_{k}}+\displaystyle\sum_{i=1}^{k}c_{i}q_{i}\end{split} (8)

The KDIC is defined as:

KDIC\displaystyle KDIC =1N(i=1klogxi!logN!)i=1kxiNlogqi\displaystyle=\frac{1}{N}(\sum_{i=1}^{k}logx_{i}!-logN!)-\sum_{i=1}^{k}\frac{x_{i}}{N}logq_{i} (9)

where xix_{i} is the proportion of ii-th nucleotide in this position. In HKDIC, the number of nucleotide types kk is different according to the order oo taken into consideration. Thus, the HKDIC for a single position is here:

HKDIC=1N(i=1k2×o+1logxi!logN!)i=1k2×o+1xiNlogqiHKDIC=\frac{1}{N}(\displaystyle\sum_{i=1}^{k^{2\times o+1}}logx_{i}!-logN!)-\displaystyle\sum_{i=1}^{k^{2\times o+1}}\frac{x_{i}}{N}logq_{i} (10)

To calculate the reward for a given motif aa to the input sequences 𝒮\mathscr{S}, an optimal alignment set PP will be generated first. The optimal alignment between each sequence si𝒮s_{i}\in\mathscr{S} and motif matrix MM will be calculated in both strands of the DNA sequence. The DNA fragment aligned to the motif matrix of the better result in these two alignment will be saved as pip_{i}. The pip_{i} is a k×mk\times m sized matrix, where kk is the number of nucleotides and mm is the length of motif. The value in pip_{i} is defined as:

pi,k,m={1,the m-th position is the k-th nucleotide0,otherwisep_{i,k,m}=\begin{cases}1,\quad\text{the $m$-th position is the $k$-th nucleotide}\\ 0,\quad\text{otherwise}\end{cases} (11)

The PP is the sum of each pip_{i}, P=i=1NpiP=\displaystyle\sum_{i=1}^{N}p_{i}. And the reward RR is sum of HKDIC in each position:

R=1N(logxi,m!logN!)xi,mNlogqiR=\frac{1}{N}(\displaystyle\sum logx_{i,m}!-logN!)-\displaystyle\sum\frac{x_{i,m}}{N}logq_{i} (12)

where xi,mx_{i,m} is the count of ii-th nucleotide in the mm-th position in PP, qiq_{i} is the frequency of the ii-th nucleotide in the input sequences.

III-B Network Structure

Refer to caption
Figure 3: Network structure in RL-MD. Actor networks take the state ss (input sequences of DMD) as input and output the proposed motif as action aa. Critic networks take the states ss and actions aa generated by actor networks as input, then output the reward RR.

The state 𝒮\mathscr{S} input to actor networks is a 2×N×100×42\times N\times 100\times 4 sized tensor. Actor networks have two blocks of convolution layers, batch normal layers, and activation layers. Actor networks contain two fully connected layers with softmax function in the following and output a m×4m\times 4 sized matrix action aa. A 2×N×(100+m)×42\times N\times(100+m)\times 4 sized tensor, which contains both the state 𝒮\mathscr{S} and action aa information, is the input of critic networks. Critic networks have two blocks of convolution layers, batch normal layers, and activation layers. Actor networks contain two fully connected layers in the following and output an evaluation score about the input action aa to state 𝒮\mathscr{S} (Fig. 3).

III-C Exploration Enhancement

RL-MD uses the soft-ϵ\epsilon policy to improve the exploration process in order to solve the local optimal problem. Typically, during the exploration process, motifs (action aa) were produced at random. For most positions in these generated motifs, the frequency of each nucleotide is similar. However, in most positions in real-world motifs, the nucleotide frequency is extremely unbalanced. This difference in nucleotide distribution between motifs that are generated at random and those in the real world causes a reduction in exploration efficiency. RL-MD addresses this issue by adopting rejection sampling. The inequality index ee of the whole motif is defined as:

e=i=1meie=\displaystyle\prod_{i=1}^{m}e_{i} (13)

where ii is the position index of the motif. The inequality index eie_{i} for each position in the motif is defined as:

ei=(ki=1kj=1k|mimj|2n2)4e_{i}=(\frac{k\displaystyle\sum_{i=1}^{k}\displaystyle\sum_{j=1}^{k}|m_{i}-m_{j}|}{2n^{2}})^{4} (14)

The greater the nucleotide frequency inequality inside the provided motif, the lower the inequality value will be and the greater the likelihood that this motif will pass the rejection sampling.

III-D Weighted Experience Replay

The main objective of the critic network is to assist the actor network in taking better action with higher reward. Therefore, the action that offers a greater reward is more important for the networks to learn. To focus on these high-value occurrences, RL-MD employs a weighted experience replay strategy. Weighted sampling is used to construct the learning batches when the target networks retrieve the memory, and the reward values are employed as the weights.

IV Experiments

Algorithm 1 RL-MD’s algorithm
1:Read the input sequences and calculate the count each type of nucleotides \triangleright The types of nucleotides are determined by the order of HKDIC oo
2:Randomly initialize the actor network and critic network in evaluation networks with parameters Q(s,a|ΘQ)Q(s,a|\Theta^{Q}) and μ(s|Θμ)\mu(s|\Theta^{\mu})
3:Initialize the actor network and critic network in target networks with parameters ΘQΘQ\Theta^{Q^{\prime}}\leftarrow\Theta^{Q} and ΘμΘμ\Theta^{\mu^{\prime}}\leftarrow\Theta^{\mu}
4:Initialize the replay memory \triangleright This is a heap with fix length DD
5:for episode=1,Eepisode=1,E do
6:     if E<=100E<=100 then
7:         for t=1,Tt=1,T do
8:              Generate a random action
9:              Get the reward from the environment
10:              Save this action-reword pair into replay memory
11:         end for
12:     else
13:         Initialize the adaptive Ornstein-Uhlenbeck process
14:         for t=1,Tt=1,T do
15:              aA𝒰(0,1)a\leftarrow A\sim\mathcal{U}(0,1)
16:              if a<=ϵa<=\epsilon then
17:                  Generate a random action
18:              else
19:                  Generate a action according to the actor network in evaluation networks
20:              end if
21:              Get the reward from the environment
22:              Save this action-reword pair into replay memory
23:              if episode×(T1)+t>Depisode\times(T-1)+t>D then
24:                  Weighted sampling a random minibatch from the replay memory
25:                  Set yi=ri+γQ(si+1,μ(Si+1|Θμ)|ΘQ)y_{i}=r_{i}+\gamma Q^{\prime}(s_{i+1},\mu^{\prime}(S_{i+1}|\Theta^{\mu^{\prime}})|\Theta^{Q^{\prime}})
26:                  Update the critic networks by minimizing the huber loss
27:                  Update the actor networks according to the policy gradient
28:                  Update the target networks:
29:      ΘQτΘQ+(1τ)ΘQ\Theta^{Q^{\prime}}\leftarrow\tau\Theta^{Q}+(1-\tau)\Theta^{Q^{\prime}}
30:      ΘμτΘμ+(1τ)Θμ\Theta^{\mu^{\prime}}\leftarrow\tau\Theta^{\mu}+(1-\tau)\Theta^{\mu^{\prime}}
31:              end if
32:         end for
33:     end if
34:end for

IV-A Experimental Setup

The evaluation was carried out using the CTCF and Gcn4 motifs. There are two different artificial dataset types and one real-world dataset used in the evaluation. The first type of artificial dataset is a collection of all positive (with motif) DNA sequences. The second is a mixture of DNA sequences that are two thirds negative (without a motif) and one third positive. Data from ChIP-seq is used to create the real-world dataset. Full-pos, unbalanced, and ChIP-seq labels were assigned to these datasets individually.

Refer to caption
Figure 4: Test for single-nucleotide recovery. In each individual test, a position was randomly selected and masked in the motif. The output shapes of the actor networks were set to 1×k1\times k to fill in this gap. The synthesised motifs were used in the following process. These tests could access the ability of RL-MD to provide the right nucleotide frequency at a single nucleotide position.
Refer to caption
Figure 5: Example of a training process. The two figures on the left side describe the training loss patterns for the actor network (upper panel) and critic network (lower panel). The probabilities of four nucleotides in the proposed motifs by the actor network are shown in the middle and right figures. These results suggest that throughout the training phase, both actor and critic networks can converge. The navy and orange lines show the raw and 50-length moving average signals, respectively.
TABLE I: E-value measurement of output motif compared with ground truth
Gcn4 CTCF
full-pos imbalanced ChIP-seq full-pos balanced ChIP-seq
MEME[22] 2.7e-07 3.2e-06 N.A. 9.1e-28 1.3e-40 4.6e-18
ChIPMunk[12] 8.0e-06 2.3e-03 N.A. 0 8.6e-36 1.2e-31
RL-MD 2.0e-04 9.4e-05 6.6e-02 6.0e-06 5.9e-10 5.1e-07

The motifs discovered with different algorithms were saved in PWM format. To assess the similarity between two motifs, several assessment criteria have been developed[34][35]. Here, E-value is used to perform this measurement[36]. The more similar the detected motif and ground truth are, the lower the E-value will be. TOMTOM[36] was used to calculate this similarity between motifs discovered by different algorithms and motifs from the JASPAR database (JASPAR 2022 core non-redundant)[37].

IV-A1 Single nucleotide rescue

A simplified method is utilized to assess the capacity of this algorithm before performing the DMD using RL-MD. A position among known motifs will be masked. The algorithm should recover this position using the full-pos datasets derived from this motif. A 1×k1\times k sized matrix that is appropriate for this position is what RL-MD seeks to deliver. The output of actor networks will be used to complete the original masked position in the motif (Fig. 4). This entire motif will be used for both the evaluation of the environment and input for critic networks. The 13-th position of the Gcn4 motif was covered. Within 10,000 steps, RL-MD can retrieve it (Fig. 5). All positions in the CTCF motif as well as other positions in Gcn4 motif were tested using this method. RL-MD was able to pass the test in each positions of these two motifs. These findings demonstrate that RL-MD was able to identify the proper nucleotide frequency composition in the motif.

IV-A2 Artificial Data

The artificial datasets were created in the following steps. First, the target motifs (CTCF and Gcn4) were obtained from the JASPAR database[37]. For the positive sequences (with motif), the central sequence was generated using the PWM of motif, and the flanking sequences were added using random nucleotides to create a 100 bp-length sequence. The negative sequences were made up entirely of random nucleotides. Three hundred positive sequences were used for the full-pos dataset. The unbalanced datasets have 200 negative and 100 positive sequences.

IV-A3 Real World Data

The Gcn4 and CTCF ChIP-seq data were downloaded from Gene Expression Omnibus (GEO)[38] under GEO accession numbers GSE85588[39] and GSE30263[40], respectively. The ChIP-seq peaks were sorted by the enrichment signal and preserve the top 300. The lengths of peaks were extended to 100 bp from the peak summits using BEDtools[41]. The input sequences were generated according to the normalized peaks and genome sequence.

IV-A4 Algorithm and hyper-parameters

We set the hyper-parameters in RL-MD as: the maximum episode E=10000E=10000, the steps in each episode T=10T=10, the capacity of the replay memory D=1000D=1000, the exploration control parameter ϵ=0.01\epsilon=0.01, the target networks update ratio τ=0.1\tau=0.1, the order of HKDIC o=1o=1, and the learning rate for actor and critic networks γ\gamma were selected from {1×102,1×103,1×104,1×105}\{1\times 10^{-2},1\times 10^{-3},1\times 10^{-4},1\times 10^{-5}\} to find the best results.

IV-B Baseline

Two commonly utilized algorithms, MEME[22] and ChIPMunk[12], were used using actual biological data. Each year, the MEME was cited more than 200 times, and in the last five years, ChIPMunk has been cited 50 times. TOMTOM[36] were used to assess the output motif by these two algorithms. In order to shorten the running time of MEME, the number of output motifs was changed from the default value of 20 to 5.

IV-C Results

IV-C1 CTCF

CTCF is a TF that is found throughout many species and is important for high-order chromatin structure[42]. A highly conserved motif with a spacing-CCCG pattern marks the binding sites of CTCF. In comparisons of DMD approaches, the DNA motif discovery process for CTCF was typically used. Here, CTCF was used to compare RL-MD, MEME, and ChIPMunk. A good DMD in biology is defined as an E-value between output motif and ground truth that is less than 1×1031\times 10^{-3}. In the final evaluation process, all three algorithms can provide good results on the two artificial datasets. Besides, RL-MD also produced CTCF motif results that were comparable to those of the MEME and ChIPMunk using real-world data (Table. I).

IV-C2 Gcn4

Gcn4 is a TF with a well-defined variable spacing motif[43]. All three approaches produced reliable motifs in simulated data, however the E-value is much greater than CTCF. These results indicate that the DMD on Gcn4 is a challenging task. The MEME and ChIPMunk both failed in the task using the real-world data. The N.A. indicates that the results generated by algorithms do not satisfy the need for TOMTOM to provide an E-value (Table. I).

V Conclusions

In this paper, we proposed a novel reinforcement learning-based computational framework, RL-MD, for DMD. RL-MD could take the input sequences and output the required motifs. In real-world data, RL-MD produces a better result for Gcn4. However, we are unable to achieve better results in the simulation dataset. These results may resulted from the lack of attention on the motifs that have already been recognized during the exploration process. Future research may focus on improving the noise generation process so that the agent can test additional potential motifs close to the identified motifs with high evaluation feedback. Besides, the more complicated network structures may be used in the future to improve the feature extraction ability and evaluate the final results.

VI Acknowledgement

We thank X. Qu and N. Cheng for advice and suggestions. This paper is supported by the Key Research and Development Program of Guangdong Province under grant No.2021B0101400003. Corresponding author is Jianzong Wang from Ping An Technology (Shenzhen) Co., Ltd ([email protected]).

References

  • [1] J. C. Castle, C. Zhang, J. K. Shah, A. V. Kulkarni, A. Kalsotra, T. A. Cooper, and J. M. Johnson, “Expression of 24,426 human alternative splicing events and predicted cis regulation in 48 tissues and cell lines,” Nature Genetics, vol. 40, no. 12, pp. 1416–1425, 2008.
  • [2] A. J. Giraldez, Y. Mishima, J. Rihel, R. J. Grocock, S. Van Dongen, K. Inoue, A. J. Enright, and A. F. Schier, “Zebrafish mir-430 promotes deadenylation and clearance of maternal mrnas,” Science, vol. 312, no. 5770, pp. 75–79, 2006.
  • [3] C. T. Harbison, D. B. Gordon, T. I. Lee, N. J. Rinaldi, K. D. Macisaac, T. W. Danford, N. M. Hannett, J.-B. Tagne, D. B. Reynolds, J. Yoo et al., “Transcriptional regulatory code of a eukaryotic genome,” Nature, vol. 431, no. 7004, pp. 99–104, 2004.
  • [4] S. A. Lambert, A. Jolma, L. F. Campitelli, P. K. Das, Y. Yin, M. Albu, X. Chen, J. Taipale, T. R. Hughes, and M. T. Weirauch, “The human transcription factors,” Cell, vol. 172, no. 4, pp. 650–665, 2018.
  • [5] D. S. Johnson, A. Mortazavi, R. M. Myers, and B. Wold, “Genome-wide mapping of in vivo protein-dna interactions,” Science, vol. 316, no. 5830, pp. 1497–1502, 2007.
  • [6] A. Cornish-Bowden, “Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984.” Nucleic acids research, vol. 13, no. 9, p. 3021, 1985.
  • [7] T. L. Bailey, “Dreme: motif discovery in transcription factor chip-seq data,” Bioinformatics, vol. 27, no. 12, pp. 1653–1659, 2011.
  • [8] C. Jia, M. B. Carson, Y. Wang, Y. Lin, and H. Lu, “A new exhaustive method and strategy for finding motifs in chip-enriched regions,” PLoS One, vol. 9, no. 1, p. e86044, 2014.
  • [9] M. Thomas-Chollier, C. Herrmann, M. Defrance, O. Sand, D. Thieffry, and J. van Helden, “Rsat peak-motifs: motif analysis in full-size chip-seq datasets,” Nucleic Acids Research, vol. 40, no. 4, pp. e31–e31, 2012.
  • [10] J. Ding, H. Hu, and X. Li, “Siomics: a novel approach for systematic identification of motifs in chip-seq data,” Nucleic Acids Research, vol. 42, no. 5, pp. e35–e35, 2014.
  • [11] S. Sinha and M. Tompa, “A statistical method for finding transcription factor binding sites,” in Proceedings. International Conference on Intelligent Systems for Molecular Biology, vol. 8, 2000, pp. 344–354.
  • [12] I. V. Kulakovskiy, V. Boeva, A. V. Favorov, and V. J. Makeev, “Deep and wide digging for binding motifs in chip-seq data,” Bioinformatics, vol. 26, no. 20, pp. 2622–2623, 2010.
  • [13] H. Ikebata and R. Yoshida, “Repulsive parallel mcmc algorithm for discovering diverse motifs from large sequence sets,” Bioinformatics, vol. 31, no. 10, pp. 1561–1568, 2015.
  • [14] I. Kulakovskiy, V. Levitsky, D. Oshchepkov, L. Bryzgalov, I. Vorontsov, and V. Makeev, “From binding motifs in chip-seq data to improved models of transcription factor binding sites,” Journal of bioinformatics and computational biology, vol. 11, no. 01, p. 1340004, 2013.
  • [15] B. Alipanahi, A. Delong, M. T. Weirauch, and B. J. Frey, “Predicting the sequence specificities of dna-and rna-binding proteins by deep learning,” Nature Biotechnology, vol. 33, no. 8, pp. 831–838, 2015.
  • [16] D. R. Kelley, J. Snoek, and J. L. Rinn, “Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks,” Genome Research, vol. 26, no. 7, pp. 990–999, 2016.
  • [17] Q. Zhang, L. Zhu, and D.-S. Huang, “High-order convolutional neural network architecture for predicting dna-protein binding sites,” IEEE/ACM transactions on computational biology and bioinformatics, vol. 16, no. 4, pp. 1184–1192, 2018.
  • [18] C. Angermueller, H. J. Lee, W. Reik, and O. Stegle, “Deepcpg: accurate prediction of single-cell dna methylation states using deep learning,” Genome biology, vol. 18, no. 1, pp. 1–13, 2017.
  • [19] J. Zhang, W. Peng, and L. Wang, “Lenup: learning nucleosome positioning from dna sequences with improved convolutional neural networks,” Bioinformatics, vol. 34, no. 10, pp. 1705–1712, 2018.
  • [20] M. Wang, C. Tai, W. E, and L. Wei, “Define: deep convolutional neural networks accurately quantify intensities of transcription factor-dna binding and facilitate evaluation of functional non-coding variants,” Nucleic Acids Research, vol. 46, no. 11, pp. e69–e69, 2018.
  • [21] Y. He, Z. Shen, Q. Zhang, S. Wang, and D.-S. Huang, “A survey on deep learning in dna/rna motif mining,” Briefings in Bioinformatics, vol. 22, no. 4, p. bbaa229, 2021.
  • [22] T. Bailey and C. Elkan, “Fitting a mixture model by expectation maximization to discover motifs in biopolymers,” in Proceedings. International Conference on Intelligent Systems for Molecular Biology, vol. 2, 1994, pp. 28–36.
  • [23] L. J. Korn, C. L. Queen, and M. N. Wegman, “Computer analysis of nucleic acid regulatory sequences,” Proceedings of the National Academy of Sciences, vol. 74, no. 10, pp. 4401–4405, 1977.
  • [24] X. Liu, D. Brutlag, and J. Liu, “Bioprospector: discovering conserved dna motifs in upstream regulatory regions of co-expressed genes,” in Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing, 2001, pp. 127–138.
  • [25] A. Lihu and Ş. Holban, “A review of ensemble methods for de novo motif discovery in chip-seq data,” Briefings in bioinformatics, vol. 16, no. 6, pp. 964–973, 2015.
  • [26] F. Zambelli, G. Pesole, and G. Pavesi, “Motif discovery and transcription factor binding sites before and after the next-generation sequencing era,” Briefings in bioinformatics, vol. 14, no. 2, pp. 225–237, 2013.
  • [27] X. S. Liu, D. L. Brutlag, and J. S. Liu, “An algorithm for finding protein–dna binding sites with applications to chromatin-immunoprecipitation microarray experiments,” Nature Biotechnology, vol. 20, no. 8, pp. 835–839, 2002.
  • [28] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, “Policy gradient methods for reinforcement learning with function approximation,” Advances in neural information processing systems, vol. 12, 1999.
  • [29] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, “Deterministic policy gradient algorithms,” in International conference on machine learning.   PMLR, 2014, pp. 387–395.
  • [30] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
  • [31] Y. Hou, L. Liu, Q. Wei, X. Xu, and C. Chen, “A novel ddpg method with prioritized experience replay,” in 2017 IEEE international conference on systems, man, and cybernetics (SMC).   IEEE, 2017, pp. 316–321.
  • [32] J. Sved and A. Bird, “The expected equilibrium of the cpg dinucleotide in vertebrate genomes under a mutation model.” Proceedings of the National Academy of Sciences, vol. 87, no. 12, pp. 4692–4696, 1990.
  • [33] H. H. He, C. A. Meyer, S. S. Hu, M.-W. Chen, C. Zang, Y. Liu, P. K. Rao, T. Fei, H. Xu, H. Long et al., “Refined dnase-seq protocol and data analysis reveals intrinsic bias in transcription factor footprint identification,” Nature Methods, vol. 11, no. 1, pp. 73–78, 2014.
  • [34] S. Pietrokovski, “Searching databases of conserved sequence regions by aligning protein multiple-alignments,” Nucleic Acids Research, vol. 24, no. 19, pp. 3836–3845, 1996.
  • [35] I.-G. Choi, J. Kwon, and S.-H. Kim, “Local feature frequency profile: a method to measure structural similarity in proteins,” Proceedings of the National Academy of Sciences, vol. 101, no. 11, pp. 3797–3802, 2004.
  • [36] S. Gupta, J. A. Stamatoyannopoulos, T. L. Bailey, and W. S. Noble, “Quantifying similarity between motifs,” Genome Biology, vol. 8, no. 2, pp. 1–9, 2007.
  • [37] J. A. Castro-Mondragon, R. Riudavets-Puig, I. Rauluseviciute, R. Berhanu Lemma, L. Turchi, R. Blanc-Mathieu, J. Lucas, P. Boddie, A. Khan, N. Manosalva Pérez et al., “Jaspar 2022: the 9th release of the open-access database of transcription factor binding profiles,” Nucleic Acids Research, vol. 50, no. D1, pp. D165–D173, 2022.
  • [38] T. Barrett, S. E. Wilhite, P. Ledoux, C. Evangelista, I. F. Kim, M. Tomashevsky, K. A. Marshall, K. H. Phillippy, P. M. Sherman, M. Holko et al., “Ncbi geo: archive for functional genomics data sets—update,” Nucleic acids research, vol. 41, no. D1, pp. D991–D995, 2012.
  • [39] N. Mittal, J. C. Guimaraes, T. Gross, A. Schmidt, A. Vina-Vilaseca, D. D. Nedialkova, F. Aeschimann, S. A. Leidel, A. Spang, and M. Zavolan, “The gcn4 transcription factor reduces protein synthesis capacity and extends yeast lifespan,” Nature communications, vol. 8, no. 1, pp. 1–12, 2017.
  • [40] H. Wang, M. T. Maurano, H. Qu, K. E. Varley, J. Gertz, F. Pauli, K. Lee, T. Canfield, M. Weaver, R. Sandstrom et al., “Widespread plasticity in ctcf occupancy linked to dna methylation,” Genome research, vol. 22, no. 9, pp. 1680–1688, 2012.
  • [41] A. R. Quinlan and I. M. Hall, “Bedtools: a flexible suite of utilities for comparing genomic features,” Bioinformatics, vol. 26, no. 6, pp. 841–842, 2010.
  • [42] Z. Tang, O. J. Luo, X. Li, M. Zheng, J. J. Zhu, P. Szalaj, P. Trzaskoma, A. Magalska, J. Wlodarczyk, B. Ruszczycki et al., “Ctcf-mediated human 3d genome architecture reveals chromatin topology for transcription,” Cell, vol. 163, no. 7, pp. 1611–1627, 2015.
  • [43] T. Siggers and R. Gordân, “Protein–dna binding: complexities and multi-protein codes,” Nucleic Acids Research, vol. 42, no. 4, pp. 2099–2111, 2014.