DNA Barcodes using a Cylindrical Nanopore
Abstract
We report an accurate method to determine DNA barcodes from the dwell-time measurements of protein tags (barcodes) along the DNA backbone. Using Brownian dynamics simulations of a model DNA together with a recursive theoretical scheme, we improve the measurements to almost 100% accuracy. The heavier protein tags along the DNA backbone introduce a large speed variation in the chain that can be understood using non-equilibrium tension propagation theory. Starting from an initial rough characterization of velocities into “fast” (nucleotides) and “slow” (protein tags) domains, we introduce a physically motivated interpolation scheme that enables us to determine the barcode velocities rather accurately. Our theoretical analysis of the motion of the DNA through a cylindrical nanopore opens up the possibility of its experimental realization and carries over to multi-nanopore devices used for barcoding.
A DNA barcode consists of a short strand of DNA sequence taken from a targeted gene such as COI or cox I (Cytochrome C Oxidase 1) [1], present in the mitochondrial genome of animals. The unique combination of nucleotide bases in a barcode allows us to distinguish one species from another. Rather than relying on traditional taxonomic identification methods, DNA barcoding provides an alternative and reliable framework to categorize a wide variety of specimens obtained from the natural environment. Although researchers have long relied on DNA sequencing techniques to identify unknown species, in 2003 Hebert et al. [2] proposed barcoding of the mitochondrial gene (COI) region to classify cryptic species [3] from the entire animal population. Since then, several studies have shown the potential applications of barcoding in conserving biodiversity [4], estimating phyletic diversity, identifying disease vectors [5], authenticating herbal products [6], unambiguously labeling food products [7, 8], and protecting endangered species [4]. Traditional sequencing methods based on chemical analysis are widely used in the biological community to determine the barcodes. Nanopore-based sequencing methods [9] are being explored in a dual nanopore system for cost-effective, high-throughput, chemical-free, and real-time barcode generation.

The possibility of determining DNA barcodes has been demonstrated in a dual nanopore device by scanning a captured dsDNA multiple times under a net periodic bias applied across the two pores [9, 10, 11, 12]. Theoretical and simulation studies have also been reported in the context of a double nanopore system [13, 14, 15]. In this article, we investigate a similar strategy in silico in a cylindrical nanopore and demonstrate that a cylindrical nanopore can have a competitive advantage over a dual nanopore system. By studying a model dsDNA with barcodes using Brownian dynamics, we establish an important result: because the dwell time and speed of the barcodes (“tags”) differ markedly from those of the nucleotide segments (“monomers”), the current blockade time information alone is not enough and inevitably leads to an underestimation of the distance between the barcodes. Furthermore, using the ideas of tension propagation theory [16, 17], we demonstrate that information about the fast-moving nucleotides in between the barcodes, which is not easily accessible experimentally, is a key element in resolving the underestimation. We suggest how to obtain this information experimentally and provide a physically motivated “two-step” interpolation scheme for an accurate determination of barcodes, even when the separation of (unknown) tags has a broad distribution.

Tag # | $T_1$ | $T_2$ | $T_3$ | $T_4$ | $T_5$ | $T_6$ | $T_7$ | $T_8$ |
---|---|---|---|---|---|---|---|---|
Position | 154 | 369 | 379 | 399 | 614 | 625 | 696 | 901 |
Separation | 154 | 215 | 10 | 20 | 215 | 11 | 71 | 205 |
The Model System: Our in silico coarse-grained (CG) model of a dsDNA consists of 1024 monomers interspersed with 8 barcodes at different locations, shown in Fig. 1 and Table-I. It is motivated by an experimental study by Zhang et al. on a 48500 bp long dsDNA with 75 bp long protein tags at random locations along the chain, using a dual nanopore device [10, 11, 12]. Here we explore whether a cylindrical nanopore with biases applied at each end can resolve the barcodes with similar or better accuracy. We purposely choose the positions of the 8 barcodes (Table-I) to study how disparate distances among the barcodes affect their measurement. The tags $T_2$, $T_3$, and $T_4$ are closely spaced and form a group. Likewise, another group consisting of $T_6$ and $T_7$ is placed in closer proximity to $T_5$. The tags $T_1$ and $T_8$ are further apart from the rest of the tags. The general scheme of the BD simulation strategy for a translocating homopolymer under alternating bias has been discussed in our recent publications [13, 14] and in Appendix A.
In this article, tags are introduced by choosing the mass and friction coefficient at the tag locations to be different from those of the rest of the monomers along the chain. This requires a modification of the BD algorithm, as discussed in Appendix A. The protein tags used in the experiments [10, 11, 12] translate to about three monomers in the simulation. The heavier and extended tags introduce a larger viscous drag. Instead of explicitly attaching side chains at the tag locations, we made the mass and the friction coefficient of the tags 3 times larger than those of an ordinary monomer. We find this sufficient to resolve the distance between the tags. Two forces applied at the two ends of the cylinder in opposite directions keep the DNA straight inside the channel and allow translocation in the direction of the net bias (please see Fig. 1 and Fig. 2).
Barcodes from repeated scanning: As could potentially be done in a nanopore experiment, we switch the differential bias once the first or the last tag ($T_1$ or $T_8$) translocates through the nanopore during upward ($D \to U$) or downward ($U \to D$) translocation, while the end segments are still inside the pore (please see Fig. 2). The DNA thus remains captured in the cylindrical pore and the barcodes are scanned multiple times.


The question we ask is: can we recover the actual barcode locations from these scanning measurements, so that the method can be applied to determine unknown barcodes? We monitor two important quantities, the dwell time of each monomer and the time delay between the arrivals of two successive monomers at the pore, as demonstrated in Fig. 3 and explained below. For each downward ($U \to D$) and upward ($D \to U$) scan we measure the dwell time of monomer $m$ as follows:

$$W_{U\to D}(m) = t_f^{U\to D}(m) - t_i^{U\to D}(m), \tag{1a}$$

$$W_{D\to U}(m) = t_f^{D\to U}(m) - t_i^{D\to U}(m). \tag{1b}$$

Here $t_i(m)$ and $t_f(m)$, with the corresponding direction label, are the arrival and exit times of the monomer with index $m$, as further demonstrated in Fig. 3(a). The corresponding dwell velocities $v_{U\to D}(m)$ and $v_{D\to U}(m)$ of bead $m$ (either a monomer or a tag) along the channel axis (please see Fig. 3(a)) are obtained as

$$v_{U\to D}(m) = \frac{t_{pore}}{W_{U\to D}(m)}, \tag{2a}$$

$$v_{D\to U}(m) = \frac{t_{pore}}{W_{D\to U}(m)}, \tag{2b}$$

where $t_{pore}$ is the length of the pore along its axis.
In an actual experiment only the dwell velocities of the tags are accessible, through the current blockade times.
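The dwell-time and dwell-velocity definitions can be illustrated with a minimal Python sketch. It assumes that the dwell time is the exit time minus the arrival time and that the dwell velocity is the pore length divided by the dwell time. The function names, the `pore_length` parameter, and the sample times are hypothetical and are not taken from the simulation.

```python
import numpy as np

def dwell_times(t_arrive, t_exit):
    """Dwell time of each bead for one scan in one direction: exit minus arrival."""
    t_arrive = np.asarray(t_arrive, dtype=float)
    t_exit = np.asarray(t_exit, dtype=float)
    if np.any(t_exit <= t_arrive):
        raise ValueError("each exit time must be later than its arrival time")
    return t_exit - t_arrive

def dwell_velocities(t_arrive, t_exit, pore_length):
    """Dwell velocity of each bead: pore length divided by dwell time."""
    return pore_length / dwell_times(t_arrive, t_exit)

# Hypothetical arrival/exit times for three beads; the middle bead plays the
# role of a slow tag that stays in the pore three times longer.
t_in = [0.0, 2.0, 5.0]
t_out = [1.0, 5.0, 6.0]
velocities = dwell_velocities(t_in, t_out, pore_length=6.0)
```

In this toy input the middle bead has the longest dwell time and therefore the lowest dwell velocity, which is the signature used to tell tags from ordinary monomers.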
Non-uniformity of the dwell velocity: The presence of tags with heavier mass and larger solvent friction introduces a large variation in the dwell time, and hence in the dwell velocities, of the DNA beads and tags (see Fig. 4). In general, there is no up-down symmetry for the dwell time/velocity, as the tags are not located symmetrically along the chain backbone. Thus the physical quantities are averaged over the upward and downward translocation data. The average dwell velocity clearly shows two different velocity envelopes, with the tags residing at the lower envelope. Fig. 4 shows that the dwell velocities of the tags (green circles) are significantly lower than the velocity of the nucleotides in between the tags, which leads to an underestimate of the barcode distances, as explained later. We further notice that increasing the pore width resolves the barcodes better.
Barcode estimation using a cylindrical nanopore setup: If the dsDNA with barcodes were a rigid rod, then one could obtain the barcode distances $d_{U\to D}(m,n)$ and $d_{D\to U}(m,n)$ between tags $T_m$ and $T_n$ from the following equations (shown for downward translocation only):

$$d_{U\to D}(m,n) = v_{U\to D}(m,n)\,\tau_{U\to D}(m,n), \tag{3a}$$

$$v_{U\to D}(m,n) = \tfrac{1}{2}\left[v_{U\to D}(m) + v_{U\to D}(n)\right], \tag{3b}$$

$$\tau_{U\to D}(m,n) = \left|t_i^{U\to D}(n) - t_i^{U\to D}(m)\right|. \tag{3c}$$

Here $\tau_{U\to D}(m,n)$ is the time delay between the arrivals of $T_m$ and $T_n$ at the pore for downward translocation (please see Fig. 3(b), which illustrates a special case). Similar equations for upward translocation are obtained by interchanging $U$ and $D$. In other words, Eqn. 3 gives the shortest distance and not necessarily the contour length (the actual distance) between the tags. However, this is the only data accessible through experiments, and it is likely to underestimate the barcodes. Fig. 5(a) shows the data for 300 scans. The averages with error bars are shown in the 3rd column of Table-II. Except for $T_6$, these measurements grossly underestimate the actual positions, with large error bars.
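A short Python sketch of the rigid-rod estimate may help fix ideas. It assumes the form suggested by the text: the distance between two tags is the mean of their two dwell velocities multiplied by the delay between their arrivals at the pore. The function names and all numerical inputs are hypothetical, and the averaging over scans is only one plausible way to obtain a mean and an error bar.

```python
import numpy as np

def rigid_rod_distance(v_m, v_n, t_arrive_m, t_arrive_n):
    """Rigid-rod barcode distance for one scan (assumed form of Eqn. 3).

    v_m, v_n      : dwell velocities of the two tags
    t_arrive_m/n  : arrival times of the two tags at the pore
    """
    v_mn = 0.5 * (v_m + v_n)                 # mean tag velocity
    tau_mn = abs(t_arrive_n - t_arrive_m)    # time delay of arrival
    return v_mn * tau_mn

def scan_statistics(distances):
    """Mean and sample standard deviation over repeated scans."""
    d = np.asarray(distances, dtype=float)
    return d.mean(), d.std(ddof=1)

# Hypothetical single-scan input: two slow tags arriving 100 time units apart.
d_single = rigid_rod_distance(0.5, 0.7, 10.0, 110.0)

# Hypothetical distances from three scans of the same tag pair.
d_mean, d_err = scan_statistics([58.0, 60.0, 62.0])
```

Because only the slow tag velocities enter, any faster motion of the monomers between the two tags is ignored, which is why this estimate tends to fall short of the contour length for widely separated tags.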

Tag label | Relative distance w.r.t. $T_5$ | Barcode (Eqn. 3) | Barcode (Method-I) | Barcode (Method-II) |
---|---|---|---|---|
$T_1$ | 460 | 373 ± 122 | 459 ± 59 | 460 ± 43 |
$T_2$ | 245 | 197 ± 67 | 250 ± 39 | 250 ± 32 |
$T_3$ | 235 | 183 ± 63 | 237 ± 38 | 237 ± 32 |
$T_4$ | 215 | 167 ± 54 | 211 ± 35 | 211 ± 30 |
$T_5$ | 0 | 0 | 0 | 0 |
$T_6$ | 11 | 11 ± 3 | 14 ± 4 | 11 ± 3 |
$T_7$ | 82 | 68 ± 23 | 86 ± 23 | 86 ± 21 |
$T_8$ | 287 | 230 ± 73 | 287 ± 65 | 287 ± 73 |
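The relative distances in the second column of the table above follow directly from the tag positions in Table-I, which the following Python sketch reproduces. The tag labels `T1` to `T8` are assumed names for the eight tags in positional order, with the fifth tag as the reference. The dictionary `eqn3_mean` copies the mean values of the third column, and the fractional shortfall is an illustrative quantity introduced here, not one defined in the article.

```python
# Tag positions along the 1024-monomer chain (Table-I); labels are assumed.
positions = {"T1": 154, "T2": 369, "T3": 379, "T4": 399,
             "T5": 614, "T6": 625, "T7": 696, "T8": 901}

def relative_distances(positions, reference):
    """Absolute distance of every tag from the reference tag, in monomer units."""
    ref = positions[reference]
    return {tag: abs(pos - ref) for tag, pos in positions.items()}

def separations(positions):
    """Distance of each tag from the previous one (the first from the chain start)."""
    ordered = sorted(positions.values())
    return [b - a for a, b in zip([0] + ordered[:-1], ordered)]

relative = relative_distances(positions, "T5")

# Mean rigid-rod estimates copied from the third column of Table-II.
eqn3_mean = {"T1": 373, "T2": 197, "T3": 183, "T4": 167,
             "T6": 11, "T7": 68, "T8": 230}

# Fraction by which the rigid-rod estimate falls short of the true distance.
shortfall = {tag: (relative[tag] - eqn3_mean[tag]) / relative[tag]
             for tag in eqn3_mean}
```

With these numbers the shortfall vanishes for the tag adjacent to the reference and is largest for the most distant tags, in line with the trend described in the text.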
Tension Propagation (TP) Theory explains the source of the discrepancy and provides a solution: Unlike a rigid rod, a semi-flexible chain moves under an external bias through tension propagation. In TP theory and its implementation in Brownian dynamics, the motion of the subchain on the cis side decouples into two domains [16, 17]. In the vicinity of the pore, the tension front affects the motion directly, while the second domain remains unperturbed, beyond the reach of the TP front. In our case, after a tag translocates through the pore, the preceding monomers are dragged into the pore quickly by the tension front, analogous to the uncoiling of a rope pulled from one end. This sudden faster motion continues to grow and reaches its maximum when the tension front hits the subsequent tag, which has larger inertia and viscous drag. At this time (called the tension propagation time [18]) the faster motion of the monomers begins to taper down to the velocity of that tag. This process continues from one segment to the next. Fig. 6 shows an example of how the segment connecting two neighboring tags has a non-monotonic velocity under the influence of the tension front.

The contour lengths of these faster moving segments in between two barcodes are not accounted for in Eqn. 3. The experimental protocols are limited to extracting barcode information through Eqn. 3 (by measuring the current blockade time) and are therefore likely to underestimate the barcodes, unless the data are corrected to account for the faster moving monomers in between two tags.
How to determine the barcodes correctly? Fig. 1(b) and the 3rd column of Table-II, when examined closely, provide clues to resolving the underestimated tag distances. We note that the locations of the isolated tags (such as $T_1$ and $T_8$) far from $T_5$ have larger error bars, while $T_6$, which is adjacent to $T_5$, has the correct distance from Eqn. 3. This is simply because in the latter case the contour length between $T_5$ and $T_6$ is almost equal to the shortest distance. Evidently, the error bars increase with increased separation.
To compare the barcodes obtained from Eqn. 3 with the actual contour length between tag pairs (see the 2nd column of Table-II), we invoke the Flory theory to determine the scaling exponent $\nu$ [19], which reveals the behavior of the segments under translocation. The heatmap in Fig. 7 confirms that when the separation between the tag pairs is small compared to the DNA length, the connecting segment behaves like a rigid rod ($\nu \approx 1$). For the isolated tags, in contrast, $\nu < 1$ indicates that the measured barcodes are shorter than their respective contour lengths. This clarifies why the barcodes are underestimated for tags that are spaced far apart, while accurate barcodes are obtained for tags located in groups.

Within the experimental set up we suggest the following two methods which will account for the larger velocities of the monomers.
Method 1 - Barcode from known end-to-end tag distance: To measure the barcode distances accurately one thus needs the velocity of the entire chain. If the distance $d_{18}$ between $T_1$ and $T_8$ is comparable to the contour length of the chain, then the velocity of the segment between them approximately accounts for the average velocity of the entire chain and corrects the problem, as demonstrated next. First we estimate the velocity of the chain

$$v^{chain}_{U\to D} \approx \frac{d_{18}}{\tau_{U\to D}(1,8)}, \tag{4}$$

assuming we know $d_{18}$, where $\tau_{U\to D}(1,8)$ is the time delay of arrival at the pore between $T_1$ and $T_8$ for downward ($U \to D$) translocation. We then estimate the barcode distance between tags $T_m$ and $T_n$ as

$$d_{U\to D}(m,n) = v^{chain}_{U\to D}\,\tau_{U\to D}(m,n). \tag{5}$$

In a similar fashion one can calculate $d_{D\to U}(m,n)$ using the corresponding chain velocity and time delay for upward translocation. How do we know $d_{18}$? One can use $d_{18} \approx \bar{v}_{scan}\,\tau(1,8)$, with $\bar{v}_{scan}$ from Eqn. 6, where $\bar{v}_{scan}$ is the average velocity of the scanned length from repeated scanning, as discussed in the next paragraph. This method is effective for estimating widely spaced barcodes, but it overestimates the barcode distance if multiple barcodes are close by, as is evident in Fig. 5(d) and the 4th column of Table-II. Thus, we know how to obtain barcode distances accurately when the tags are close by (from Eqn. 3) and when they are far apart (Eqn. 5). We now apply the physics behind these two schemes to derive an interpolation scheme that works for all separations among the barcodes.
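Method 1 can be sketched in a few lines of Python, under the assumption that the chain velocity is the known distance between the first and last tags divided by the delay between their arrivals, and that every other tag pair is then assigned this single velocity. The function names are hypothetical. The end-to-end distance of 747 monomer units is the difference of the first and last tag positions in Table-I, while the time delays are made-up values chosen only for illustration.

```python
def chain_velocity(d_end_to_end, tau_end_to_end):
    """Average chain velocity from the known first-to-last tag distance (assumed Eqn. 4)."""
    if tau_end_to_end <= 0:
        raise ValueError("time delay must be positive")
    return d_end_to_end / tau_end_to_end

def method1_distance(d_end_to_end, tau_end_to_end, tau_mn):
    """Barcode distance between two tags from the chain velocity (assumed Eqn. 5)."""
    return chain_velocity(d_end_to_end, tau_end_to_end) * tau_mn

# Distance between the first and last tags from Table-I: 901 - 154 = 747.
d_18 = 901 - 154

# Hypothetical time delays for one scan.
tau_18 = 1494.0   # first tag to last tag
tau_mn = 22.0     # some intermediate tag pair

v_chain = chain_velocity(d_18, tau_18)
d_mn = method1_distance(d_18, tau_18, tau_mn)
```

Since a single chain-averaged velocity is applied to every pair, closely spaced tags, which in the simulation move more slowly than the chain average, are assigned too large a distance, consistent with the overestimate noted in the text.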
Method 2 - Barcode using two-step method: The average scan time for the entire chain (which can be measured experimentally) gives a better estimate of the average velocity of the chain. $L_{scan}$ is the maximum length of the dsDNA that is scanned while the chain remains captured inside the nanopore, and $L$ denotes the theoretical maximum beyond which the dsDNA will escape from the nanopore; thus $L_{scan} < L$. In our simulation the scan is bounded by the first and last tags, at which the bias is reversed. We denote the average scan velocity as

$$\bar{v}_{scan} = \frac{1}{N_{scan}} \sum_{i=1}^{N_{scan}} \frac{L_{scan}}{\tau_{scan}(i)}, \tag{6}$$

where $\tau_{scan}(i)$ is the scan time for the $i$-th event and $N_{scan}$ is the total number of scans. To proceed further, we use our established result that the monomers of the dsDNA segments in between the tags move with velocity $\bar{v}_{scan}$, while the tags move with their respective dwell velocities $v_{U\to D}(m)$ and $v_{U\to D}(n)$ (Eqn. 2). We then calculate the segment velocity between two tags by taking the weighted average of the velocities of the tags and of the DNA segment in between, as follows.
First, we estimate the approximate number of monomers $N_{mn} = d_{U\to D}(m,n)/\langle b_l \rangle$ ($\langle b_l \rangle$ is the bond length) by considering the tag velocities only, using Eqn. 3. We then calculate the segment velocity accurately by incorporating weighted velocity contributions from both the tags and the monomers between the tags:

$$v^{2\text{-}step}_{U\to D}(m,n) = \frac{1}{N_{mn}}\left[v_{U\to D}(m) + \left(N_{mn}-2\right)\bar{v}_{scan} + v_{U\to D}(n)\right]. \tag{7}$$

The barcodes are finally estimated by multiplying the 2-step velocity of Eqn. 7 by the tag time delay,

$$d^{2\text{-}step}_{U\to D}(m,n) = v^{2\text{-}step}_{U\to D}(m,n)\,\tau_{U\to D}(m,n), \tag{8}$$

for downward ($U \to D$) translocation, and by repeating the procedure for upward ($D \to U$) translocation. This 2-step method accurately captures the distance between the barcodes both when the two tags are in proximity and when they are spaced apart from each other. Table-II and Fig. 5 summarize our main results and claims.
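The two-step scheme can be sketched as follows in Python. This is only an illustration under stated assumptions: the average scan velocity is taken as the mean of the scanned length divided by each scan time, the rough monomer count comes from the rigid-rod estimate divided by the bond length, and the weighted average gives each of the two tags a weight of one bead and the remaining beads the scan velocity. The exact weights, the function names, and all numbers are assumptions made here, not values from the article.

```python
import numpy as np

def average_scan_velocity(l_scan, scan_times):
    """Mean over scans of (scanned length / scan time), an assumed form of Eqn. 6."""
    scan_times = np.asarray(scan_times, dtype=float)
    return float(np.mean(l_scan / scan_times))

def two_step_distance(v_m, v_n, tau_mn, v_scan, bond_length):
    """Two-step barcode distance for one tag pair and one scan direction."""
    # Step 1: rough distance from the tag velocities only (rigid-rod estimate).
    d_rough = 0.5 * (v_m + v_n) * tau_mn
    # Approximate number of beads in the segment, never fewer than the two tags.
    n_mn = max(d_rough / bond_length, 2.0)
    # Step 2: weighted average of the two tag velocities and the scan velocity
    # assigned to the (n_mn - 2) monomers in between (assumed weighting).
    v_segment = (v_m + v_n + (n_mn - 2.0) * v_scan) / n_mn
    return v_segment * tau_mn

# Hypothetical numbers: slow tags (0.4), faster chain average (0.5), unit bonds.
v_scan = average_scan_velocity(100.0, [200.0, 250.0])
d_far = two_step_distance(0.4, 0.4, tau_mn=100.0, v_scan=0.5, bond_length=1.0)
d_near = two_step_distance(0.4, 0.4, tau_mn=5.0, v_scan=0.5, bond_length=1.0)
```

In this toy input the widely separated pair is corrected upward from the rigid-rod value of 40 toward the chain-averaged value of 50, while the adjacent pair keeps its rigid-rod value, which is the interpolating behavior the scheme is designed to provide.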
Summary & Future work: Motivated by recent experiments, we have designed an in silico barcode determination experiment in a cylindrical nanopore using a Brownian dynamics scheme on a model dsDNA with known barcode locations. We have carefully chosen the locations of the barcodes so that the separations among them span a broad distribution. We find that if only the dwell time data of the barcodes from multiple scans of the dsDNA are used to calculate the average velocities of the tags, then the method underestimates the barcode distances for tags that are further apart. Our simulation guides us to conclude that the source of this underestimation lies in neglecting the information contained in the faster moving DNA segments in between any two tags. We use non-equilibrium tension propagation theory to explain the non-monotonic velocity of the chain segments, where the barcodes lie at the lower bound of the velocity envelope, as shown in Fig. 4. The emerging picture readily shows how to rectify this error: an interpolation scheme that determines the barcodes accurately for all separations, which we validate using simulation data. We suggest how to implement the scheme in an experimental setup. It is important to note that the interpolation scheme based on the TP theory is quite general, and we have ample evidence that it will work in a double nanopore system as well.
Conflicts of interest: The authors declare no competing financial interest.
Acknowledgements: The research at UCF has been supported by grant number 1R21HG011236-01 from the National Human Genome Research Institute at the National Institutes of Health. All computations were carried out on UCF’s high performance computing platform STOKES.
Appendix A The Model and Brownian dynamics simulation
Our BD scheme is implemented on a bead-spring model of a polymer with the monomers interacting via an excluded volume (EV) potential, a Finite Extension Nonlinear Elastic (FENE) spring potential, and a bond-bending potential enabling variation of the chain persistence length $\ell_p$ (Fig. A1). The model, originally introduced for a fully flexible chain by Grest and Kremer [20], has been studied quite extensively by many groups using both Monte Carlo (MC) and various molecular dynamics (MD) methods [21]. Recently we have generalized the model for a semi-flexible chain, studied both its equilibrium and dynamic properties [18, 22, 23], and studied the compression dynamics of a model dsDNA inside a nanochannel [24, 25]. The mutual EV interaction between any two monomers is given by the truncated Lennard-Jones (LJ) potential with a cut-off radius $2^{1/6}\sigma$,

$$U_{LJ}(r) = \begin{cases} 4\epsilon\left[\left(\dfrac{\sigma}{r}\right)^{12} - \left(\dfrac{\sigma}{r}\right)^{6}\right] + \epsilon, & r \le 2^{1/6}\sigma, \\ 0, & r > 2^{1/6}\sigma, \end{cases} \tag{9}$$

where $\sigma$ is the effective diameter of a monomer and $\epsilon$ is the interaction strength. To mimic the connectivity between two adjacent monomers, the FENE potential

$$U_{FENE}(r_{ij}) = -\frac{1}{2} k_F R_0^2 \ln\left(1 - \frac{r_{ij}^2}{R_0^2}\right) \tag{10}$$

is used, with maximum bond-stretching length $R_0$ and spring constant $k_F$. Here $r_{ij} = |\vec{r}_i - \vec{r}_j|$ is the separation between two adjacent monomers $i$ and $j = i \pm 1$ located at $\vec{r}_i$ and $\vec{r}_j$, respectively. Along with these two potentials, we introduce a bending potential

$$U_{bend}(\theta_i) = \kappa\left(1 - \cos\theta_i\right) \tag{11}$$

with bending rigidity $\kappa$. In three dimensions, for $\kappa \neq 0$, the persistence length $\ell_p$ of the chain is related to $\kappa$ via [26]

$$\frac{\ell_p}{\sigma} = \frac{\kappa}{k_B T}, \tag{12}$$

where $k_B$ is the Boltzmann constant and $T$ is the temperature. Here $\theta_i$ is the bond angle between two subsequent bond vectors $\vec{b}_{i-1} = \vec{r}_i - \vec{r}_{i-1}$ and $\vec{b}_i = \vec{r}_{i+1} - \vec{r}_i$. A cylindrical nanopore of diameter $d_{pore}$ is drilled through a solid material of thickness $t_{pore}$ that consists of immobile, purely repulsive LJ particles. Our model DNA polymer consists of $N = 1024$ monomer beads along with 8 heavier tags ($T_1$ - $T_8$) located at positions 154, 369, 379, 399, 614, 625, 696, and 901, respectively (please refer to Fig. 2 and Table-I in the main article). A recent study by Zhang et al. on a 48512 bp long dsDNA uses 75 bp long protein tags as barcodes [10]. In the simulation, we purposely choose the mass of a tag to be three times that of a normal monomer ($m_{tag} = 3m$) to replicate the tags used in the experiments. We proportionally increase the solvent friction of the tags ($\Gamma_{tag} = 3\Gamma$). We use Brownian dynamics to solve the equation of motion of the $i$-th monomer, with mass $m_i$ and solvent friction $\Gamma_i$, as

$$m_i \ddot{\vec{r}}_i = -\nabla\left(U_{LJ} + U_{FENE} + U_{bend} + U_{wall}\right) - \Gamma_i \vec{v}_i + \vec{\eta}_i, \tag{13}$$

where $\Gamma_i$ is the friction coefficient arising from the solvent-monomer interaction. For a tag, $m_i = m_{tag}$ and $\Gamma_i = \Gamma_{tag}$. The Gaussian white noise $\vec{\eta}_i$ arising from thermal fluctuations is delta-correlated and expressed as $\langle \vec{\eta}_i(t) \cdot \vec{\eta}_j(t') \rangle = 2 d\, k_B T\, \Gamma_i\, \delta_{ij}\, \delta(t - t')$ with $d = 3$ in three dimensions. We express length and energy in units of $\sigma$ and $\epsilon$, respectively. The parameters of the FENE potential in Eq. (10) are the spring constant $k_F$ and the maximum bond extension $R_0$. The numerical integration of Eq. (13) is implemented using the algorithm introduced by van Gunsteren and Berendsen [27]. Our previous experience with BD simulations suggests that, for the time step $\Delta t$ used here, these parameter values produce stable trajectories over a very long period of time and do not lead to unphysical crossing of a bond by a monomer [22, 23]. The average bond length $\langle b_l \rangle$ stabilizes with negligible fluctuation regardless of the chain size and rigidity [22]. Hence we relate the polymer’s contour length $L$ and the number of monomers $N$ as $L = (N-1)\langle b_l \rangle$.
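The interaction potentials and the heavier tags of this model can be written down compactly in Python. The sketch below assumes the standard Kremer-Grest forms: a purely repulsive truncated and shifted LJ potential, a FENE bond, and a cosine bending potential. The numerical values `K_FENE = 30`, `R0 = 1.5`, and the friction `gamma = 0.7` are commonly used reduced-unit choices assumed here for illustration only, as is the convention that tag positions are 1-based bead indices. The function names are hypothetical.

```python
import numpy as np

# Reduced LJ units. All numerical values below are assumptions for illustration.
SIGMA, EPS = 1.0, 1.0
K_FENE, R0 = 30.0, 1.5
TAG_POSITIONS = [154, 369, 379, 399, 614, 625, 696, 901]  # from Table-I

def u_wca(r, sigma=SIGMA, eps=EPS):
    """Truncated and shifted LJ potential, purely repulsive, cut off at 2^(1/6) sigma."""
    r = np.asarray(r, dtype=float)
    r_cut = 2.0 ** (1.0 / 6.0) * sigma
    sr6 = (sigma / r) ** 6
    return np.where(r <= r_cut, 4.0 * eps * (sr6 ** 2 - sr6) + eps, 0.0)

def u_fene(r, k=K_FENE, r0=R0):
    """FENE bond potential; diverges as the bond length approaches r0."""
    r = np.asarray(r, dtype=float)
    if np.any(r >= r0):
        raise ValueError("bond length must stay below the maximum extension r0")
    return -0.5 * k * r0 ** 2 * np.log(1.0 - (r / r0) ** 2)

def u_bend(theta, kappa):
    """Bending potential for the angle theta between two successive bond vectors."""
    return kappa * (1.0 - np.cos(theta))

def bead_parameters(n_beads, tag_positions, m=1.0, gamma=0.7, factor=3.0):
    """Per-bead mass and friction arrays; tags are `factor` times heavier and more damped."""
    mass = np.full(n_beads, m)
    friction = np.full(n_beads, gamma)
    idx = np.asarray(tag_positions, dtype=int) - 1  # assumed 1-based positions
    mass[idx] *= factor
    friction[idx] *= factor
    return mass, friction

mass, friction = bead_parameters(1024, TAG_POSITIONS)
```

Keeping the mass and friction as per-bead arrays lets an integrator treat tags and ordinary monomers with the same update rule, which is one simple way to realize the modification of the BD algorithm mentioned in the main text.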
References
- [1] Hebert, P. D. N.; Ratnasingham, S.; de Waard, J. R. Barcoding Animal Life: Cytochrome c Oxidase Subunit 1 Divergences among Closely Related Species. Proc. R. Soc. Lond. B 2003, 270, 96.
- [2] Hebert, P. D. N.; Cywinska, A.; Ball, S. L.; deWaard, J. R. Biological Identifications through DNA Barcodes. Proc. R. Soc. Lond. B 2003, 270 (1512), 313-321.
- [3] Hebert, P. D. N.; Penton, E. H.; Burns, J. M.; Janzen, D. H.; Hallwachs, W. Ten Species in One: DNA Barcoding Reveals Cryptic Species in the Neotropical Skipper Butterfly Astraptes Fulgerator. Proceedings of the National Academy of Sciences 2004, 101 (41), 14812-14817.
- [4] Vernooy, R.; Haribabu, E.; Muller, M. R.; Vogel, J. H.; Hebert, P. D. N.; Schindel, D. E.; Shimura, J.; Singer, G. A. C. Barcoding Life to Conserve Biological Diversity: Beyond the Taxonomic Imperative. PLoS Biol 2010, 8 (7), e1000417.
- [5] Besansky, N. J.; Severson, D. W.; Ferdig, M. T. DNA Barcoding of Parasites and Invertebrate Disease Vectors: What You Don’t Know Can Hurt You. Trends in Parasitology 2003, 19 (12), 545-546.
- [6] Techen, N.; Parveen, I.; Pan, Z.; Khan, I. A. DNA Barcoding of Medicinal Plant Material for Identification. Current Opinion in Biotechnology 2014, 25, 103-110.
- [7] Xiong, X.; Yuan, F.; Huang, M.; Lu, L.; Xiong, X.; Wen, J. DNA Barcoding Revealed Mislabeling and Potential Health Concerns with Roasted Fish Products Sold across China. Journal of Food Protection 2019, 82 (7), 1200-1209.
- [8] Wong, E. H.-K.; Hanner, R. H. DNA Barcoding Detects Market Substitution in North American Seafood. Food Research International 2008, 41 (8), 828-837.
- [9] Pud, S.; Chao, S.-H.; Belkin, M.; Verschueren, D.; Huijben, T.; van Engelenburg, C.; Dekker, C.; Aksimentiev, A. Mechanical Trapping of DNA in a Double-Nanopore System. Nano Lett. 2016, 16 (12), 8021-8028.
- [10] Zhang, Y.; Liu, X.; Zhao, Y.; Yu, J.-K.; Reisner, W.; Dunbar, W. B. Single Molecule DNA Resensing Using a Two-Pore Device. Small 2018, 14 (47), 1801890.
- [11] Liu, X.; Zhang, Y.; Nagel, R.; Reisner, W.; Dunbar, W. B. Controlling DNA Tug-of-War in a Dual Nanopore Device. Small 2019, 15 (30), 1901704.
- [12] Liu, X.; Zimny, P.; Zhang, Y.; Rana, A.; Nagel, R.; Reisner, W.; Dunbar, W. B. Flossing DNA in a Dual Nanopore Device. Small 2020, 16 (3), 1905379.
- [13] Bhattacharya, A.; Seth, S. Tug of War in a Double-Nanopore System. Phys. Rev. E 2020, 101 (5).
- [14] Seth, S.; Bhattacharya, A. Polymer Escape through a Three Dimensional Double-Nanopore System. J. Chem. Phys. 2020, 153 (10), 104901.
- [15] Choudhary, A.; Joshi, H.; Chou, H.-Y.; Sarthak, K.; Wilson, J.; Maffeo, C.; Aksimentiev, A. High-Fidelity Capture, Threading, and Infinite-Depth Sequencing of Single DNA Molecules with a Double-Nanopore System. ACS Nano 2020, 14 (11), 15566-15576.
- [16] Sakaue, T. Nonequilibrium Dynamics of Polymer Translocation and Straightening. Phys. Rev. E 2007, 76 (2).
- [17] Ikonen, T.; Bhattacharya, A.; Ala-Nissila, T.; Sung, W. Influence of Non-Universal Effects on Dynamical Scaling in Driven Polymer Translocation. The Journal of Chemical Physics 2012, 137 (8), 085101.
- [18] Adhikari, R.; Bhattacharya, A. Driven Translocation of a Semi-Flexible Chain through a Nanopore: A Brownian Dynamics Simulation Study in Two Dimensions. The Journal of Chemical Physics 2013, 138 (20), 204909.
- [19] Rubinstein, M.; Colby, R. H. Polymer Physics; Oxford University Press: Oxford, 2003.
- [20] Grest, G. S.; Kremer, K. Molecular Dynamics Simulation for Polymers in the Presence of a Heat Bath. Phys. Rev. A 1986, 33 (5), 3628-3631.
- [21] Binder, K. Monte Carlo and Molecular Dynamics Simulations in Polymer Science; Oxford University Press, 1995, Chap. 2.
- [22] Huang, A.; Bhattacharya, A.; Binder, K. Conformations, Transverse Fluctuations, and Crossover Dynamics of a Semi-Flexible Chain in Two Dimensions. The Journal of Chemical Physics 2014, 140 (21), 214902.
- [23] Huang, A.; Adhikari, R.; Bhattacharya, A.; Binder, K. Universal Monomer Dynamics of a Two-Dimensional Semi-Flexible Chain. EPL 2014, 105 (1), 18002.
- [24] Huang, A.; Reisner, W.; Bhattacharya, A. Dynamics of DNA Squeezed Inside a Nanochannel via a Sliding Gasket. Polymers 2016, 8 (10), 352.
- [25] Bernier, S.; Huang, A.; Reisner, W.; Bhattacharya, A. Evolution of Nested Folding States in Compression of a Strongly Confined Semiflexible Chain. Macromolecules 2018, 51 (11), 4012-4022.
- [26] Landau, L. D.; Lifshitz, E. M. Statistical Physics; Pergamon Press, 1981.
- [27] van Gunsteren, W. F.; Berendsen, H. J. C. Algorithms for Brownian Dynamics. Molecular Physics 1982, 45 (3), 637-647.