Addressing Trust Challenges in Blockchain Oracles using Asymmetric Byzantine Quorums

Fahad Rahman1, Chafiq Titouna2 and Farid Naït-Abdesselam3
1Université Paris Cité, France
2LIGM, ESIEE Paris, University Gustave Eiffel, France
3University of Missouri Kansas City, USA

Abstract

Distributed computing in Blockchain Technology (BCT) hinges on a trust assumption among independent nodes. Without a third-party interface, known as a ‘Blockchain Oracle’, a Blockchain cannot interact with the external world. The Oracle plays a crucial role by feeding extrinsic data into the Blockchain, ensuring that Smart Contracts operate accurately in real time. The ‘Oracle problem’ arises from the inherent difficulty of verifying the truthfulness of the data sourced by these Oracles. The genuineness of a Blockchain Oracle is paramount, as it directly influences the Blockchain’s reliability, credibility, and scalability. To tackle these challenges, a strategy rooted in Byzantine fault-tolerance $\phi$ is introduced. Furthermore, an autonomous system for sustainability and auditability, built on heuristic detection, is put forth. Evaluated on two real-world datasets, the proposed strategy outperformed existing methods in effectiveness and precision, with the aim of meeting the authenticity standards required of Blockchain Oracles.

Index Terms:

Blockchain Oracles, Trust Assumption, Asymmetric Byzantine Quorums, Smart Contracts, Oracle Data Reliability, Blockchain Scalability Solutions, Decentralized Applications (DApps).

I Introduction

The BCT operates as a unified system of distributed ledgers underpinned by consensus mechanisms. Without an intermediary or interface, it remains disconnected from the external environment. Any data recorded on a single node is automatically mirrored across the entire Blockchain network. Digital agreements stored within Blockchain nodes are termed ‘Smart Contracts’ [1]. These are self-executing code segments that activate based on specific inputs/outputs, running autonomously within the Blockchain [2]. The channel that feeds external data into Smart Contracts is termed a ‘Blockchain Oracle’ [3] [4].

The name ‘Oracle’ isn’t tied to a particular device (like IoT) or software. Drawing from Greek mythology, an ‘Oracle’ was an individual or entity believed to have a direct line to the divine, offering predictions of the future. Historically, Oracles served as a source of knowledge beyond human understanding, guiding those who lacked the necessary information to decide [5] [6]. As depicted in Fig. 1, a Smart Contract relies on authentic extrinsic data to operate.

Figure 1: Data transition, real-world to Blockchain

The significance of studying Blockchain Oracles stems from their pivotal role in linking Blockchain mechanisms, especially Smart Contracts, with real-world data. Various Blockchain-driven systems depend on them, including prediction markets for forecasting, currency exchange platforms, sports betting, and weather reporting applications [7] [8] [9]. Some salient applications of Blockchain Oracles are listed in Table I.

At the heart of the Blockchain Oracle dilemma is the need to guarantee that data sourced from the external environment and fed into a Smart Contract through a Blockchain Oracle is both trustworthy and unquestionably reliable for all Smart Contract stakeholders [10]. If a Blockchain Oracle provides erroneous or misleading information, it risks undermining the entire Blockchain system’s trustworthiness. Within a distributed network, even if most nodes operate with integrity, a minority presenting compromised [11], inaccurate, or biased values can introduce inconsistencies in Smart Contract outcomes. Pinpointing a rogue node based on its output is also part of the Oracle challenge [12]. Additionally, scalability, measured by transactions per second (TPS), remains a prominent concern in both public [13] and private Blockchains [14]. The operational efficiency of a Blockchain Oracle significantly influences overall Blockchain scalability [15].

TABLE I: Use of decentralized oracle in real-world applications

Application Area	Description	Use of Oracles
Supply Chain Management[16] [17]	Sensors monitor conditions like temperature, humidity, and location during product shipment.	Ensures data integrity and transparency on product history, reducing disputes among parties
Smart Agriculture[18]	Sensors track soil moisture, weather, and crop health.	Automates farming activities and insurance payouts, based on reliable, real-time data
Energy Grid Management[19]	Sensors record energy production and consumption in decentralized grids.	Balances supply, demand, and pricing in real-time, enabling transparent energy trading
Environmental Monitoring[20]	Sensors worldwide track pollution levels, temperature, and deforestation.	Provides tamper-proof environmental data for reporting and carbon credit trading
Healthcare Monitoring[21]	Wearable devices monitor patient health metrics.	Updates Blockchain records securely, ensuring accurate, immutable medical data for remote healthcare
Automated Insurance[22]	Sensors detect conditions meeting insurance claim criteria, like car accidents or home damage.	Triggers automatic, tamper-proof claims processing and payouts
Smart Property Management [23]	Building sensors monitor occupancy, temperature, and security.	Automates building management, energy efficiency, and lease agreements
Quality Assurance in Manufacturing [24]	Production line sensors identify defects or equipment issues.	Triggers quality control and maintenance, ensuring product standards
Traffic and Urban Planning [25]	City-wide sensors gather traffic, parking, and public transport data.	Informs urban planning and automates traffic management or toll payments
Seismic Activity Reporting [26]	Seismic sensors detect geological events.	Enables rapid, reliable data recording for early warnings and emergency responses

A scalable and precise Blockchain Oracle is crucial to avert lags, inaccuracies, and possible security vulnerabilities when processing Smart Contracts dependent on extrinsic information. This study presents the Asymmetric Byzantine Quorums (ABQ) method, which fully supports Byzantine fault-tolerance, aiming to promptly ascertain trustworthy and accurate Blockchain Oracle values. Additionally, we’ve integrated a heuristic-driven detection system to pinpoint malicious entities and ensure traceability. Our proposed methodology is apt for both Public and Private Blockchains. The model offers extensive potential for data aggregation in Oracle, suitable for a multitude of real-world decentralized Blockchain Oracle applications.

The structure of this paper is outlined as follows: Section II delves into relevant literature and related work. The system model is concisely introduced in Section III, followed by a detailed discussion of the methodology in Section IV. Section V assesses simulations using real-world datasets, while Conclusion and Future Work are discussed in Section VI.

II Related Work

The challenge of the Blockchain Oracle has been tackled in various research papers. This section highlights some notable contributions. Ellis et al. [27] introduced a weightage-based Oracle approach in which each IoT/input stream was assigned a specific weightage for Oracle input. Problems arise if the IoT device with the highest reputation provides notably different data to the Oracle, jeopardizing the entire Blockchain system’s integrity. In another study, Adler et al. [28] approached the Oracle issue through ‘Game theory’. They presented a bi-layered voting scheme: the first layer comprises voters, while the second consists of certifiers. Voters received lesser rewards than certifiers. If a certifier detected discrepancies in the voters’ results, it would earn a substantial reward. While this system often produced accurate outputs, vulnerabilities arose if a certifier was compromised or if a certifier and a voter colluded to provide false data, compromising the system’s integrity.

Tian et al. [29] noted the challenge of anticipating attacker strategies due to the attackers’ varied nature. Attackers lack knowledge of the comprehensive reputation management system and the readings from competing IoT devices. The authors used both entity-centric and data-centric schemes to devise a foundational trust management computation model. In this framework, each vehicle and traffic event notification possessed distinct reputation values. Those with zero reputation were deemed unreliable and excluded from the system and broadcast lists. This reputation mechanism was anchored in Game theory principles, with nodes of higher reputation posing greater risks. Lastly, Heiss et al. [30] put forth a voting-centric system. Here, the value of every IoT device received votes, and the value with the highest vote was deemed accurate. A designated time window existed for vote submission, followed by automated vote tallying. Voters were rewarded for accurate values. The system also featured weighted voting, wherein rewards and penalties were determined based on Game theory principles. However, the system faced vulnerabilities if highly-voted nodes were hacked or malfunctioned.

TABLE II: Existing related work

Authors	Scalability	Oracle Type	Methodology	Strengths	Weaknesses
S. Ellis et al. [27]	Yes	Centralized	Weighted scoring approach	Fast processing	Low reliability
Adler et al. [28]	No	Centralized	Cross-layer approach	Game theory	Complex
Tian et al. [29]	No	Centralized	Reputation-based system	Fast processing	Low reliability
Heiss et al. [30]	No	Centralized	Voting-based system	Game theory	Low reliability
Berger et al. [31]	Yes	Centralized	Geographical scalability	Fast processing	Low reliability
Tseng et al. [32]	No	Centralized	Synchronous and Asynchronous model	Fast processing	Entity integrity
Wei et al. [33]	No	Centralized	Neighbour discovery algorithm	Low energy	Data overlapping

Berger et al., in their study [31], introduced the ’Adaptive Wide-Area REplication’ (AWARE) approach. This approach aims to enhance the geographical scalability of consensus among nodes dispersed across vast physical distances. The approach integrates a voting and weightage-based model to form distinct ’Asymmetric Quorums’. The system’s integrity and dependability are influenced by the weightage and quantity of nodes, guiding the creation of quorums. In another work by Tseng et al. [32], two communication models were put forth: Synchronous and Asynchronous. In the synchronous model, communication unfolds in cycles, with each node following a specific sequence. On the other hand, the asynchronous model lacks a predefined communication sequence, allowing nodes to operate randomly and messages to be delayed unpredictably. Nodes in this model communicate with their immediate neighbors via established, dedicated ports. A vulnerability within this system is that if a transmitting device is duplicated and then dispatches malicious data under the original identity, the entire system’s security is jeopardized.

Wei et al., in their paper [33], proposed a dual-direction neighbor discovery mechanism for IoT devices, although its foundation is based on single-direction discovery. The duration of active slots for devices has been minimized, resulting in reduced energy consumption when identifying new IoT neighbors. For neighbor discovery, they employed an asymmetric neighboring model using ‘Pure-Transmitting’ (PS) and ‘Pure-Listening’ (PL) intervals. The overlapping approach discussed in their paper offers a superior solution to the challenges they addressed.

A review of the aforementioned studies and Table II reveals a gap: there is no method that effectively filters out malicious or compromised data. Furthermore, a single erroneous data entry can have catastrophic repercussions for the entire system. Similarly, techniques based on voting and weighting can occasionally exacerbate issues.

III System Model

We introduce the ABQ approach to ensure an accurate, reliable, and scalable Oracle for the Smart Contract/Blockchain. Consider a large-scale farm equipped with various IoT devices to monitor soil moisture, temperature, and humidity levels for optimized irrigation. The objective is to automate irrigation based on real-time data from IoT devices using a Blockchain-powered Smart Contract. The Blockchain ensures data integrity, and the Smart Contract ensures that water is released only when required. In this scenario, the ‘Actors’ are the IoT devices (soil moisture sensors, temperature sensors, humidity sensors), the Blockchain Oracle, the Blockchain network with Smart Contracts, the farm owner or manager, and the watering system.

Initially, the farm owner deploys a Smart Contract on the Blockchain. This contract is designed to receive data from the Oracle and execute the irrigation process based on predefined conditions. IoT devices are set up across the farm and connected to an IoT platform that collects and sends data to the Oracle. Every hour, the IoT devices collect data about soil moisture, temperature, and humidity, and send it to the IoT platform. The Oracle retrieves data from the IoT platform. To ensure data reliability, it can pull data from multiple sources or verify it against multiple similar IoT devices. The Oracle then sends this verified data to the Blockchain. Once the Smart Contract on the Blockchain receives the data from the Oracle, it checks the conditions. For instance, if soil moisture is below 50% and the temperature is above 30°C, the watering system is activated; if the humidity is above 80%, watering is delayed regardless of other conditions. Based on the conditions set in the Smart Contract and the data received from the Oracle, the Smart Contract sends a command to the watering system to start or stop the irrigation process. After irrigation, the IoT devices continue to monitor the conditions. If the soil moisture reaches an acceptable level, the Smart Contract may send a command to stop the watering. This creates a feedback loop ensuring optimal watering based on real-time conditions.
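As a minimal sketch of the example conditions above, the decision rule can be written as a small Python function. The function name `irrigation_command` and the returned labels are illustrative assumptions, not part of the proposed system; an on-chain Smart Contract would express the same rule in its own contract language.

```python
def irrigation_command(soil_moisture_pct, temperature_c, humidity_pct):
    """Return an illustrative command label for the watering system.

    Thresholds (50 %, 30 C, 80 %) are the example values from the text;
    the labels "DELAY", "START" and "IDLE" are hypothetical.
    """
    # High humidity delays watering regardless of the other conditions.
    if humidity_pct > 80:
        return "DELAY"
    # Dry soil combined with high temperature activates the watering system.
    if soil_moisture_pct < 50 and temperature_c > 30:
        return "START"
    # Otherwise no action is taken.
    return "IDLE"


command = irrigation_command(soil_moisture_pct=40, temperature_c=35, humidity_pct=60)
```

The humidity check is placed first so that it overrides the activation rule, matching the "regardless of other conditions" wording.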

All transactions (data inputs and irrigation commands) are recorded on the Blockchain, ensuring transparency. The farm owner or manager can review the Blockchain records to verify that the irrigation was done based on actual field conditions. This helps in building trust in the automated system. By using Blockchain, the farm owner has a transparent and tamper-proof record of all actions taken by the system. Water resources are used more efficiently, leading to cost savings and sustainable farming. In this scenario, the main challenge is the reliability of the Blockchain Oracle, because the input IoT devices can malfunction or be compromised. A reliable and scalable Oracle is therefore crucial to ensure that accurate data reaches the Smart Contract. In the case of complex Oracle calculations, there might also be delays in data transmission and processing from the Oracle to the Blockchain. This is also accounted for in the proposed Blockchain Oracle model.

Our suggested Oracle system offers input from the external environment (outside Blockchain) by gathering temperature and other readings from IoT devices. These readings trigger the Smart Contract’s execution. Our algorithm filters out any erroneous or compromised readings from the IoT devices, gathering only authentic readings into a data array. Correct readings from various IoT devices, captured at a particular moment, form the ABQ. The quorum’s average becomes the definitive value for the Oracle [34], as depicted in Fig. 2.

The aim of this study is to address external influences, such as hacking attempts or errors in transmission/data processing, that might compromise Oracle data. With the described technique, data from IoT devices is processed across distinct units. Each of these units aggregates the readings within a specified time frame into a data quorum. Sudden deviations/irregularities in the readings are instantly detected and removed within the quorums. The proposed Oracle plays an intermediary role between sensors/IoT devices and Smart Contracts.

IV Methodology

In this study, we introduce a method that leverages a Byzantine Fault Tolerant (BFT) strategy using ‘Asymmetric trust’ [35] to filter out deceptive readings from IoT devices, ensuring a consistent and authentic Blockchain Oracle value. Data acquired from IoT devices is organized into an array. This data is evaluated against a pre-set fault-tolerance threshold $\phi$ to establish the proposed ABQ. The data collected at regular intervals from various IoT devices within a designated time frame constitutes the quorum(s). A quorum is derived from the IoT data that satisfies the $\phi$ condition after the automatic removal of malicious entries. For instance, if the $\phi$ threshold is set at $2$ and the absolute difference between the chosen median and a data value exceeds this threshold (difference $>2$), that particular value is automatically discarded. Any data value falling outside the $\phi$ range is likely to be erroneous or malicious. This study is divided into two primary parts:

A. Establishment of ABQs and Determining Oracle Values

B. Detection of faulty/compromised device(s)

IV-A Establishment of ABQs and Determining Oracle Values

The basic premise of BCT is predicated on the idea that at least two-thirds $(2/3)$ of its participants are acting truthfully. In this study, we define an ABQ as an asymmetric aggregation of data units whose size exceeds half the sum of the total number of processes ($N_{p}$) and the number of faulty processes ($f_{p}$), that is, a set containing more than $(N_{p}+f_{p})/2$ elements. The establishment of a Byzantine Quorum (BQ) adheres to three core proposed assumptions, with the first being fundamental and the subsequent two being consequential derivations [36].
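To make the size condition concrete, the following sketch computes the smallest integer quorum size that is strictly greater than $(N_{p}+f_{p})/2$. The helper name `min_quorum_size` and the input validation are assumptions added for illustration.

```python
def min_quorum_size(n_p, f_p):
    """Smallest integer strictly greater than (n_p + f_p) / 2.

    n_p: total number of processes; f_p: number of faulty processes.
    The validation rules below are illustrative assumptions.
    """
    if n_p <= 0 or f_p < 0 or f_p > n_p:
        raise ValueError("expected n_p > 0 and 0 <= f_p <= n_p")
    # Integer division plus one gives the first integer above the bound,
    # for both even and odd values of n_p + f_p.
    return (n_p + f_p) // 2 + 1


size_for_nine = min_quorum_size(n_p=9, f_p=2)  # bound is 5.5
size_for_ten = min_quorum_size(n_p=10, f_p=2)  # bound is 6.0
```

With nine processes and two faulty ones, a quorum therefore needs at least six members; with ten processes and two faulty ones, at least seven.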

IV-A1 BQ Primary ( $1^{st}$ ) Property

At least one BQ exists containing solely accurate readings, as illustrated in Fig. 3 a. A quorum can be characterized by its correct processes. Values within a particular quorum that follow a specific order are deemed correct based on the primary ($1^{st}$) property.

IV-A2 BQ’s Secondary ( $2^{nd}$ ) Property

Any pair of BQs should have an intersection that encompasses at least one accurate value. When two divergent clusters overlap, the common value they share, as depicted in Fig. 3 b, stands as the authentic and trustworthy value affirmed by both sets. It’s important to highlight that as the frequency of quorum intersections on a singular value increases (within the same time frame), the resultant values exhibit enhanced accuracy. For any two quorums $Q_{1}$ and $Q_{2}$ in the quorum system, there must be at least one node that belongs to both $Q_{1}$ and $Q_{2}$ . Equation 1, for all $Q_{1}$ and $Q_{2}$ in the quorum system:

$$Q_{1}\cap Q_{2}\neq\varnothing \tag{1}$$

This property is essential for reaching consensus and preventing conflicts in the system. The quorum system must ensure that any two quorums that intersect (satisfying the Intersection property) also have at least one node in common that is non-faulty. Equation 2, for all $Q_{1}$ and $Q_{2}$ in the quorum system where $Q_{1}\cap Q_{2}\neq\varnothing$:

$$|Q_{1}\cap Q_{2}|>F_{n} \tag{2}$$

In Equation 2, $F_{n}$ represents the number of Byzantine faulty nodes. This property ensures that whenever two quorums intersect, they still have at least one non-faulty node in common, preventing Byzantine nodes from causing conflicts.

IV-A3 BQ’s Tertiary ( $3^{rd}$ ) Property

Within a BQ, the predominant values must be accurate, as illustrated in Fig. 3 c. The distributed system inherently operates on the assumption that most of the nodes, data sources, sensors, or input streams function correctly and contribute to the stability of the system. If the majority were to act maliciously, the viability of the distributed system would be jeopardized. The quorum system must be able to tolerate a certain number of faulty nodes. Specifically, the number of nodes in the quorum system should be greater than three times the maximum number of Byzantine faulty nodes, $3F_{m}$, where $F_{m}$ is the maximum number of faults the system is designed to handle. Equation 3, for all $Q$ in the quorum system and for any $F_{m}\leq n_{n}/3$ (where $n_{n}$ is the total number of nodes in the system):

$$|Q|>3F_{m} \tag{3}$$

This property ensures that there are enough non-faulty nodes in each quorum to overcome Byzantine faults and maintain the integrity of the system.
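The three conditions in Equations 1 to 3 can be checked mechanically for a given set of quorums. The sketch below assumes quorums are represented as Python sets of node identifiers and that the set of faulty nodes is known, which in practice would only be an assumed upper bound; the function name `check_quorum_properties` is hypothetical.

```python
from itertools import combinations


def check_quorum_properties(quorums, faulty_nodes):
    """Evaluate Equations 1-3 for explicit quorums (sets of node ids).

    faulty_nodes is an assumed, known set of Byzantine nodes; only its
    size is used, standing in for F_n (Eq. 2) and F_m (Eq. 3).
    """
    f = len(faulty_nodes)
    pairs = list(combinations(quorums, 2))
    return {
        # Eq. 1: every pair of quorums shares at least one node.
        "eq1_intersection": all(len(q1 & q2) > 0 for q1, q2 in pairs),
        # Eq. 2: every pairwise overlap is larger than the fault count,
        # so at least one shared node must be non-faulty.
        "eq2_honest_overlap": all(len(q1 & q2) > f for q1, q2 in pairs),
        # Eq. 3: every quorum holds more than three times the fault count.
        "eq3_size_bound": all(len(q) > 3 * f for q in quorums),
    }


# Seven nodes (0..6), one assumed faulty node, two quorums of five nodes.
result = check_quorum_properties(
    quorums=[{0, 1, 2, 3, 4}, {2, 3, 4, 5, 6}],
    faulty_nodes={6},
)
```

In this example the two quorums overlap in three nodes, which exceeds the single assumed fault, and each quorum has five members, which exceeds $3F_{m}=3$.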

In addition to ABQ, our method incorporates the longest-chain rule [37], a prevalent approach in BCT, to derive precise Oracle values. This rule mandates that nodes recognize and accept the lengthiest chain among all candidates. If nodes opt for honest behavior, they should support chains that either match or exceed the length of previously broadcast honest chains [38]. In our context, we perceive a quorum as a data chain. If a quorum gets fragmented into multiple segments, the lengthiest data segment is identified as the ‘longest-chain’, which is deemed accurate, as depicted in Fig. 4. This approach yields a genuine and reliable data reading since it also aligns with the $(N_{p}+f_{p})/2$ BQ definition. For instance, if $Q_{1}$ is a quorum of $n_{n}$ readings, $Q_{2}$ is a quorum of $n_{n}+1$ readings, and $Q_{3}$ is a quorum of $n_{n}+2$ readings, then, given that $|Q_{3}|>|Q_{2}|>|Q_{1}|$, the $Q_{3}$ quorum is preferred over the other quorums.
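A minimal sketch of this selection step, assuming each candidate quorum (or quorum segment) is held as a list of readings, is to pick the candidate with the most elements. The function name `preferred_quorum` and the tie-breaking behaviour (the first of equally long candidates wins) are assumptions; the text does not specify how ties are resolved.

```python
def preferred_quorum(candidates):
    """Return the longest candidate, mirroring the longest-chain rule.

    Ties are broken by position (first longest wins), which is an
    illustrative assumption rather than part of the described method.
    """
    if not candidates:
        raise ValueError("at least one candidate quorum is required")
    return max(candidates, key=len)


q1 = [10, 10, 10]          # n readings
q2 = [10, 10, 10, 11]      # n + 1 readings
q3 = [10, 10, 10, 11, 11]  # n + 2 readings
chosen = preferred_quorum([q1, q2, q3])
```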

Algorithm 1 Forming ABQs and Determining Oracle Value

1: Input $\leftarrow$ IoT Device Stream $(IoT_{1},IoT_{2},\ldots IoT_{n})$
2: Input $\leftarrow$ Data Generated by Devices $(dt_{1}\rightarrow dt_{n})$
3: Input $\leftarrow$ Error Margin (Fault-tolerance): $\phi\in\mathbb{Z}$
4: Duration Period $Tm_{i}$
5: Save data to an array ($Ar_{i}$)
6: For each data fetching $Tm_{i}$
7:     Record data in an array ($Ar_{i}$)
8:     Filtered Readings data in an array ($Rd$)
9:     if ($Ar_{i}$) $\neq$ Sorted($Ar_{i}$) then
10:         Sort $(Ar_{i})$ in increasing sequence
11:     end if
12:     Med = Med($Ar_{i}$)
13:     Rd = [(Med - $d_{1}$), (Med - $d_{2}$), …(Med - $d_{n}$)]
14:     Abslt (Rd)
15:     if (|Rd| fetched readings) $\leq\phi$ then
16:         Approve the |Rd| readings [$(d_{1}),(d_{2})\ldots(d_{n})$]
17:         Quorum ($Q_{m}$) $\leftarrow$ Approve |Rd| readings [$(dt_{1}),(dt_{2})\ldots(dt_{n})$]
18:         Oracle = Mean-Readings ($Q_{m}$)
19:     end if
20: end

To implement this first part, forming the ABQs and determining the Oracle value requires two inputs: i) an array containing data from $n_{t}$ IoT devices, and ii) a predetermined threshold $\phi$. A lower threshold $\phi$ yields more accurate results. This threshold can be computed using the formula $\phi=m/n_{t}$, where $m$ represents the permissible number of subsystem failures and $n_{t}$ is the total count of subsystems/IoT devices. Algorithm 1 delineates the process for forming ABQs and determining the Oracle value. The foundational requirements of our proposed system are a total of $n_{t}$ IoT devices and a predetermined $\phi$. The system’s operation unfolds as follows: i) IoT device readings are captured and stored in an array accompanied by a timestamp. ii) If not already sorted, the data is arranged in ascending order. iii) The median of the data array is calculated. The difference between the median and each individual data value is then computed. Since this difference can be negative, its absolute value ($Abs$) is taken. The threshold $\phi$ is then applied to the $Abs$ outcomes: readings whose absolute difference is less than or equal to $\phi$ are deemed valid and are stored in a separate array. A quorum is then constituted from these accepted values. The final value of the Blockchain Oracle is determined by computing the mean of the quorum values. A state transition diagram illustrating this process is provided in Fig. 5.

Let $IoT_{1},IoT_{2},\ldots IoT_{n}$ denote the IoT devices, each with a default score $\Psi=0$, that are remotely connected to the system and transmit data $dt_{1},dt_{2},\ldots dt_{n}$. Establish a predefined threshold $\phi\in\mathbb{Z}$ and capture the data in an array $A=[dt_{1},dt_{2},\ldots dt_{n}]$. Next, sort array $A$ in ascending order and determine its median $Med$. Subsequently, compute $D$, the absolute difference between each transmitted data value and $Med$; values whose difference is less than or equal to $\phi$ are deemed valid and form the Quorum $Q_{m}$. The $Mean$ of $Q_{m}$ is taken as the Oracle value.
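The procedure described above can be sketched in a few lines of Python. This is an illustrative reading of Algorithm 1, not the authors' implementation; the function name `abq_oracle`, the returned tuple, and the error handling are assumptions. The example readings are the nine values of Table III with $\phi=2$.

```python
from statistics import mean, median


def abq_oracle(readings, phi):
    """Filter readings around the median and average the survivors.

    Returns (oracle_value, quorum, discarded_indices). The return shape
    and the exceptions raised are illustrative assumptions.
    """
    if not readings:
        raise ValueError("no readings supplied")
    # Sort in ascending order, then take the median of the array.
    med = median(sorted(readings))
    quorum, discarded = [], []
    for index, value in enumerate(readings):
        # Keep a reading when its absolute distance to the median is <= phi.
        if abs(med - value) <= phi:
            quorum.append(value)
        else:
            discarded.append(index)
    if not quorum:
        # Possible with an even count, where the median is an average.
        raise ValueError("no reading lies within phi of the median")
    # The Oracle value is the mean of the accepted readings.
    return mean(quorum), quorum, discarded


oracle, quorum, discarded = abq_oracle([10, 10, 7, 10, 10, 10, 11, 13, 19], phi=2)
```

For these readings the median is 10, the readings of devices $d_{3}$, $d_{8}$ and $d_{9}$ (indices 2, 7 and 8) are discarded, and the mean of the six accepted readings is 61/6, roughly 10.17, which is consistent with the value 10 shown in Table III if that entry is rounded.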

TABLE III: Oracle calculation steps in the proposed model

Step	$d_{1}$	$d_{2}$	$d_{3}$	$d_{4}$	$d_{5}$	$d_{6}$	$d_{7}$	$d_{8}$	$d_{9}$	$Ans$
i	10	10	7	10	10	10	11	13	19
ii	10	10	7	10	10	10	11	13	19
iii	0	0	3	0	0	0	-1	-3	-9
iv	0	0	3	0	0	0	1	3	9
v	0	0	3	0	0	0	1	3	9
vi	10	10		10	10	10	11
vii										10

As shown in Table III, a total of seven steps ($i$ to $vii$) are required to find the Oracle value using the proposed approach (here with $\phi=2$). In step $i$ the data is collected from the different IoT devices; in step $ii$ the median is found; in step $iii$ the difference between the median and each data value is computed; in step $iv$ the absolute value is applied; in step $v$ the $\phi$ threshold is applied and out-of-range values are discarded; in step $vi$ the actual data values are placed in the remaining slots; and in step $vii$ the mean of these values is taken and sent as the Oracle value.

Algorithm 2 Detection of Malicious Device(s)

1: Input $\leftarrow$ IoT devices-score $(D_{s})$ $\Psi\in\mathbb{Z}$, Max. score limit $\rightarrow\varphi$ (e.g. 5)
2: Initialise $\Psi\rightarrow 0$
3: $Dsct\leftarrow$ discarded-value(s)
4: $IoT\leftarrow Dsct$
5: for each $Dsct$
6:     $IoT$ $\Psi$++
7:     if $\Psi$ reaches $\varphi$ then
8:         Suspend $(dt_{1},dt_{2}\ldots dt_{n})\leftarrow(IoT_{1},IoT_{2}\ldots IoT_{n})$
9:         Trigger alert
10:     end if
11: end

IV-B Detection of faulty/ compromised Device(s)

In addition to pinpointing an accurate Oracle value, our proposed system simplifies the processes of accountability and auditability, making them inherently self-regulating. Within our framework, there is a built-in mechanism that keeps track of IoT devices with consistently errant data. These devices are flagged and their data is systematically excluded. Over a set time frame, devices that consistently submit incorrect readings are ranked. For each erroneous reading, a device-specific score (initially set to zero) increases. When a device’s score reaches a threshold of $\varphi$, it is deemed malicious or compromised. Consequently, the system autonomously recommends its removal. This allows for the seamless replacement of malfunctioning devices with functioning ones. To implement this identification and replacement process, we developed Algorithm 2. This algorithm is a continuation of Algorithm 1 and uses the ‘Maximum attribute value’ of IoT devices as an input criterion.

Before running this algorithm, every IoT device linked to the system must have an attribute value (score). The threshold, denoted $\varphi$, represents the maximum count or score a device may reach within a specified duration, after which the data from that specific device is discarded automatically. At the outset, the attribute value of every device is initialized to $0$. The proposed system operates as follows:

1. Any values that are discarded are placed in a distinct array, referred to as ‘Discarded values’ ($Dsct$).
2. Initially, every device’s attribute value starts at $0$.
3. Whenever a reading from an IoT device is discarded, the attribute value for that specific device increases by $1$.
4. If an IoT device’s attribute value hits the pre-set limit within the designated timeframe, the system suspends accepting readings from that device.
5. The system then triggers an alert and flags this device as potentially compromised, suggesting that it be replaced.
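The five steps above can be sketched as a small score board. This is a hedged illustration of Algorithm 2: the class name `DeviceScoreBoard`, its method names, and the device labels are hypothetical, and the reset of scores at the end of the designated timeframe is deliberately left out because the text does not specify it. The example rounds reproduce the discard pattern of Table IV with $\varphi=5$.

```python
from collections import defaultdict


class DeviceScoreBoard:
    """Track how often each device's reading is discarded (hypothetical API)."""

    def __init__(self, max_score=5):
        self.max_score = max_score      # threshold (varphi in the text)
        self.scores = defaultdict(int)  # per-device score, starts at 0
        self.suspended = set()          # devices no longer accepted

    def record_discards(self, discarded_devices):
        """Add +1 per discarded reading; return devices that trigger an alert."""
        alerts = []
        for device in discarded_devices:
            if device in self.suspended:
                continue  # readings from suspended devices are ignored
            self.scores[device] += 1
            if self.scores[device] >= self.max_score:
                self.suspended.add(device)
                alerts.append(device)
        return alerts


board = DeviceScoreBoard(max_score=5)
# Devices whose readings were discarded in five successive rounds.
rounds = [["d3", "d8", "d9"], ["d1", "d9"], ["d1", "d9"], ["d9"], ["d9"]]
alerts_per_round = [board.record_discards(r) for r in rounds]
```

After the fifth round, device `d9` reaches the limit of five and is suspended, while `d1`, `d3` and `d8` finish with scores of 2, 1 and 1, matching the final row of Table IV.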

TABLE IV: Devices Score Board, where $\phi=2$, $\Psi=0$ and $\varphi=5$

	$d_{1}$	$d_{2}$	$d_{3}$	$d_{4}$	$d_{5}$	$d_{6}$	$d_{7}$	$d_{8}$	$d_{9}$
i			3					3	9
ii	-	-	$+1$	-	-	-	-	$+1$	$+1$
iii	-	-	$\Uparrow$	-	-	-	-		$\Uparrow$
iv	3	-	-	-	-	-	-	-	9
v	$+1$	-	-	-	-	-	-	-	$+1$
vi	$\Uparrow$	-	-	-	-	-	-	-	$\Uparrow$
vii	3								11
viii	+1	-	-	-	-	-	-	-	$+1$
ix	$\Uparrow$	-	-	-	-	-	-	-	$\Uparrow$
x	-								19
xi	-	-	-	-	-	-	-	-	$+1$
xii	-	-	-	-	-	-	-	-	$\Uparrow$
xiii	-								15
xiv	-	-	-	-	-	-	-	-	$+1$
xv	-	-	-	-	-	-	-	-	$\Uparrow$
	$\varphi=2$	$\varphi=0$	$\varphi=1$	$\varphi=0$	$\varphi=0$	$\varphi=0$	$\varphi=0$	$\varphi=1$	$\varphi=5$

Continuing from the Oracle calculation steps, Table IV shows the further steps ($i$ to $xvi$) required to increase the attribute value of IoT device(s) and maintain their scores. In step $i$ the discarded, out-of-range values and the devices that produced them are identified; each such device is then assigned $+1$ in every iteration within a specific time interval. In the case of frequent wrong readings, a device will reach its score limit, e.g. $\varphi=5$, at which point the compromised device is replaced.

In Algorithm 2, consider $IoT_{1},IoT_{2},\ldots IoT_{n}$ as references to IoT devices connected to the system from remote locations. Each IoT device has an attribute value ($Att_{v}$), denoted by $\psi\in\mathbb{Z}$. Start by setting $\psi$ to $0$ and define the attribute threshold as $\varphi$. While examining the discarded values array ($Dsct$), for every discarded value originating from device $IoT_{i}$, that device’s attribute value is incremented. Once the attribute value reaches $\varphi$, readings from the corresponding device are halted. Both algorithms can be visualized in Fig. 6.

V Simulations and Results

V-A Data Set

We utilized two Kaggle-published IoT temperature reading datasets [39] [40]. These datasets aggregate temperature measurements from various IoT devices. The first dataset comprises approximately 77,000 temperature readings, gathered over four months from five distinct IoT devices. The second dataset encompasses around 43,000 daily temperature readings, recorded over a span of more than four years from 10 different IoT devices. For our analysis, we handpicked a subset of 5,999 readings. Both datasets are accessible from GitHub repositories: https://github.com/Fahadrahman2121/IoT-temp-reading- and https://github.com/Fahadrahman2121/temperature-city-data.

V-B Evaluation

For the evaluation and comparison of the ABQ model with existing approaches, the following metrics are used: i) Root Mean Square Error (RMSE), Equation 4; ii) Percent Error (PE), Equation 5; iii) Mean Absolute Error (MAE), Equation 6; iv) Mean Squared Error (MSE), Equation 7; v) Mean Absolute Percentage Error (MAPE), Equation 8; vi) R-squared (Coefficient of Determination), Equation 9; vii) Adjusted $R^{2}$, Equation 10; viii) Mean Bias Deviation (MBD), Equation 11; ix) Median Absolute Deviation (MAD), Equation 12.

Root Mean Square Error (RMSE), Equation 4:

$$RMSE=\sqrt{\frac{\sum(O_{v}-E_{v})^{2}}{n}} \tag{4}$$

Percent Error (PE), Equation 5:

$$PE=\frac{\sum(|O_{v}-E_{v}|)}{E_{v}}\times 100 \tag{5}$$

where $O_{v}$ = Observed-values, $E_{v}$ = Expected-values.

Mean Absolute Error (MAE), Equation 6:

$$MAE=\frac{1}{n}\sum_{i=1}^{n}|y_{i}-\hat{y}_{i}| \tag{6}$$

where $y_{i}$ are the actual values, $\hat{y}_{i}$ are the predicted values, and $n$ is the number of observations.

Mean Squared Error (MSE), Equation 7:

$$MSE=\frac{1}{n}\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^{2} \tag{7}$$

Mean Absolute Percentage Error $(MAPE)$, Equation 8:

$$MAPE=\frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_{i}-\hat{y}_{i}}{y_{i}}\right|\tag{8}$$

R-squared (Coefficient of Determination), Equation 9:

$$R^{2}=1-\frac{SS_{res}}{SS_{tot}}\tag{9}$$

where $SS_{res}$ is the residual sum of squares $\sum(y_{i}-\hat{y}_{i})^{2}$ and $SS_{tot}$ is the total sum of squares $\sum(y_{i}-\bar{y})^{2}$, with $\bar{y}$ as the mean of the observed data.

Adjusted R-squared, Equation 10:

$$Adjusted\ R^{2}=1-(1-R^{2})\times\frac{n-1}{n-p-1}\tag{10}$$

where $n$ represents the total number of data points in the dataset, and $p$ denotes the number of independent variables in the model.
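As a rough illustration of Equations 9 and 10, the sketch below computes $R^{2}$ and its adjusted form in plain Python. The function names and the sample readings are assumptions for the example and are not taken from the datasets.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1 - ss_res / ss_tot  # undefined if all actual values are equal


def adjusted_r_squared(r2, n, p):
    """Adjust R^2 for n data points and p independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)  # requires n > p + 1


# Hypothetical temperature readings, for illustration only.
actual = [20.0, 22.0, 24.0, 26.0]
predicted = [20.5, 21.5, 24.5, 25.5]

r2 = r_squared(actual, predicted)
adj_r2 = adjusted_r_squared(r2, n=len(actual), p=1)
print(r2, adj_r2)
```

The adjusted value is never larger than $R^{2}$ for $p\geq 1$, which is why it is the safer figure when comparing models with different numbers of inputs.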

Mean Bias Deviation $(MBD)$, Equation 11:

$$MBD=\frac{100}{n}\sum_{i=1}^{n}\frac{(y_{i}-\hat{y}_{i})}{y_{i}}\tag{11}$$

Median Absolute Deviation $(MAD)$, Equation 12:

$$MAD=median(|y_{1}-M|,|y_{2}-M|,\ldots,|y_{n}-M|)\tag{12}$$

where $M$ is the median of the dataset.
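The error metrics of Equations 4, 6, 7, 8, 11, and 12 can be sketched with the Python standard library as shown below. This is an illustrative sketch, not the evaluation code used in the study: the function names and sample values are assumptions, and the second term in each difference is taken to be the predicted (Oracle) value.

```python
import math
import statistics


def mse(actual, predicted):
    """Mean squared error."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error; actual values must be non-zero."""
    return 100 / len(actual) * sum(abs((y - y_hat) / y) for y, y_hat in zip(actual, predicted))


def mbd(actual, predicted):
    """Mean bias deviation in percent; the sign shows the direction of bias."""
    return 100 / len(actual) * sum((y - y_hat) / y for y, y_hat in zip(actual, predicted))


def mad(values):
    """Median absolute deviation around the median of the values."""
    m = statistics.median(values)
    return statistics.median([abs(v - m) for v in values])


# Hypothetical readings, for illustration only.
actual = [20.0, 25.0, 40.0, 50.0]
predicted = [22.0, 24.0, 38.0, 55.0]
for name, value in [("MSE", mse(actual, predicted)), ("RMSE", rmse(actual, predicted)),
                    ("MAE", mae(actual, predicted)), ("MAPE", mape(actual, predicted)),
                    ("MBD", mbd(actual, predicted)), ("MAD", mad(actual))]:
    print(name, round(value, 4))
```

MAPE and MBD divide by the actual value, so readings equal to zero would need to be filtered out or handled separately before these two metrics are applied.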

Our simulation results highlight the efficiency of the proposed algorithm in ensuring data authenticity and the reliability of the Oracle value. Accurate readings closely align with the original data and contribute to the formation of the ABQ. Furthermore, the accuracy of the ABQ solution improves as the number of IoT devices increases. It is also advisable to increase the $\phi$ value in proportion to the number of sensors. Spyder (Python 3.9) and Anaconda Navigator were used for data comparison, evaluation, and visualization.

V-C Analysis of Statistical Hypotheses

We conducted a t-test, which is based on the t-distribution, to determine whether there is a meaningful difference between the models. The resulting $p$-values indicate a statistically significant difference ($p$-value $<0.05$) between ABQ and each of the other methodologies, as detailed in Table V.

TABLE V: Comparing ABQ with Other Methods Using the t Test

Techniques	t value	$p$ -value
Mean approach	5.7	< 0.0000
Weighted approach	3.3	< 0.0000
Consensus approach	3.1	< 0.0000
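The text does not state which variant of the t-test was applied. As one possible reading, the sketch below computes a paired t statistic on per-reading absolute errors of two methods; the paired design, the function name, and the sample errors are all assumptions made for illustration.

```python
import math
import statistics


def paired_t_statistic(errors_a, errors_b):
    """Return (t, degrees of freedom) for paired samples of equal length."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t_value = mean_diff / (sd_diff / math.sqrt(n))
    return t_value, n - 1


# Hypothetical per-reading absolute errors, for illustration only.
errors_mean = [1.2, 0.9, 1.5, 1.1, 1.3]
errors_abq = [0.4, 0.5, 0.6, 0.3, 0.7]

t_value, dof = paired_t_statistic(errors_mean, errors_abq)
print(round(t_value, 3), dof)
```

The $p$-value then follows from the Student t-distribution with the returned degrees of freedom, which a statistics library such as SciPy can evaluate.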

V-D Reliability Analysis

Constructing and operating economically crucial technological systems demands a thorough reliability assessment. Various methodologies systematically evaluate a system’s reliability and risk. Fig. 7 shows a Fault Tree Diagram (FTD), a widely adopted tool for this purpose. Using the FTD derived from Equation 13, the reliability of the proposed Blockchain Oracle can be evaluated qualitatively through feedback on performance and dependability, and quantitatively by measuring specific metrics such as uptime, error rates, and response times.

$$F(S)=F(X_{1})\ \text{OR}\ F(X_{2})\ \ldots\ \text{OR}\ F(X_{n})\tag{13}$$

where $F(X_{n})$ denotes the failure of the $n$-th event.
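The OR gate in Equation 13 can be sketched in two forms: a Boolean form, and a probabilistic form that additionally assumes the events fail independently. That independence assumption, the function names, and the sample values are not from the source and are used only for illustration.

```python
def system_fails(event_failures):
    """Boolean OR gate: the top event occurs if any basic event fails."""
    return any(event_failures)


def system_failure_probability(event_probs):
    """Probability that at least one event fails.

    Assumes independent events (an assumption of this sketch):
    P(fail) = 1 - product of (1 - p_i).
    """
    survive = 1.0
    for p in event_probs:
        survive *= 1.0 - p
    return 1.0 - survive


# Hypothetical basic events, for illustration only.
print(system_fails([False, True, False]))
print(system_failure_probability([0.1, 0.2]))
```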

In this system, we observed that reliability $(r)$ is enhanced when sensors are connected in parallel ($par$) rather than in series. This means that the system’s reliability improves as the number of sensors increases. The reliability of the proposed system, denoted as $R_{par}$, is determined by Equation 14.

$$R_{par}=1-(1-r)^{m}\tag{14}$$

where $r$ represents the reliability of a single unit and $m$ denotes the number of active units. Given that an IoT device has a reliability of 0.966, a minimum of five IoT devices is required for the system reliability to approach 100%. The detailed reliability metrics for the system are presented in Table VI.

TABLE VI: System Reliability in the Proposed Model

Number of Devices	Reliability-Analysis
1	0.9660000
2	0.9984000
3	0.9999360
4	0.9999974
5	0.9999999
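Equation 14 can be sketched as a small helper, together with a search for the smallest number of parallel units that reaches a chosen target. The function names are hypothetical, and the target of 0.9999999 is an assumption taken from the last row of Table VI rather than a requirement stated in the text.

```python
def parallel_reliability(r, m):
    """Reliability of m identical units in parallel: 1 - (1 - r)^m."""
    return 1.0 - (1.0 - r) ** m


def min_units(r, target):
    """Smallest m whose parallel reliability reaches the target."""
    if not 0.0 < r < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("r and target must lie strictly between 0 and 1")
    m = 1
    while parallel_reliability(r, m) < target:
        m += 1
    return m


# Single-device reliability from the text; the target is an assumed cut-off.
units_needed = min_units(0.966, 0.9999999)
print(units_needed)
```

Because $(1-r)^{m}$ shrinks geometrically, each added device contributes less than the previous one, so the target chosen in practice determines how many devices are worth deploying.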

V-E Absolute Improvement

Absolute improvement is the difference between the accuracy of the new technique and the accuracy of the old technique, as shown in Equation 15. Table VII presents a comparative analysis with ABQ, revealing absolute improvements of 2.6, 1.5, and 1.4 over the Weighted, Consensus, and Mean methods, respectively.

$$Absolute\ Improvement=A_{new}-A_{old}\tag{15}$$

where $A_{new}$ is the accuracy of the ABQ technique and $A_{old}$ is the accuracy of the Weighted, Consensus, or Mean technique.

TABLE VII: Absolute Improvement

Technique	Absolute Improvement
ABQ Vs Weighted	2.6
ABQ Vs Consensus	1.5
ABQ Vs Mean	1.4

V-F $F_{1}$ Score

The $F_{1}$ score indicates the balance between a model’s precision and recall, providing a composite accuracy assessment in machine learning tasks. Its value ranges from $0$ (worst) to $1$ (best), with higher values indicating better accuracy and completeness in predictions. To calculate the $F_{1}$ score, we first need the precision and recall values.

V-F1 Precision and Recall Evaluation

Precision assesses the correctness of positive predictions, whereas Recall gauges the model’s capability to detect all pertinent cases. Both metrics range between 0 and 1, with higher values indicating superior performance. Equation 16 and Equation 17 are used to compute Precision and Recall, respectively, where $TP$, $FP$, and $FN$ denote True Positives, False Positives, and False Negatives. The results are shown in Table VIII.

$$Precision=\frac{TP}{TP+FP}\tag{16}$$

$$Recall=\frac{TP}{TP+FN}\tag{17}$$

TABLE VIII: Precision and Recall Values

Prediction Method	Precision	Recall
ABQ-Value	1.0000	0.9849
Mean-Value	1.0000	0.9843
Consensus-Value	0.9988	0.9896
Weighted-Value	0.9993	0.9705
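Precision and recall can be sketched from confusion counts using the standard definitions, in which recall is computed from false negatives ($FN$). The function names and the counts below are hypothetical and do not reproduce the values in Table VIII.

```python
def precision(tp, fp):
    """Fraction of positive predictions that are correct."""
    total = tp + fp
    return tp / total if total else 0.0  # no positive predictions: return 0


def recall(tp, fn):
    """Fraction of actual positives that are detected."""
    total = tp + fn
    return tp / total if total else 0.0  # no actual positives: return 0


# Hypothetical confusion counts, for illustration only.
tp, fp, fn = 90, 10, 30
print(precision(tp, fp), recall(tp, fn))
```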

V-F2 $F_{1}$ Score Evaluation

To evaluate the $F_{1}$ score, Equation 18 is applied to dataset 1 and dataset 2; the results are reported in Table IX.

$$F_{1}=2\times\frac{precision\times recall}{precision+recall}\tag{18}$$

TABLE IX: $F_{1}$ Score and Accuracy

Prediction Method	$F_{1}$ Score (Dataset 1)	Accuracy (Dataset 1)	$F_{1}$ Score (Dataset 2)	Accuracy (Dataset 2)
ABQ-Value	0.9964	0.9890	0.9994	0.9895
Mean-Value	0.9921	0.9844	0.9921	0.9844
Consensus-Value	0.9922	0.9855	0.9922	0.9885
Weighted-Value	0.9847	0.9700	0.9847	0.9700

The outcomes in Table IX are based on threshold values of $30$ and $60$ for dataset 1 and dataset 2, respectively:

• ABQ-Value has the highest $F_{1}$ score and accuracy at these threshold values.
• Mean-Value and Consensus-Value follow closely behind.
• Weighted-Value has a slightly lower $F_{1}$ score and accuracy in comparison.
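Equation 18 is the harmonic mean of precision and recall, which can be sketched as follows. The function name and the input values are hypothetical and are not the figures reported in the tables.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical precision and recall values, for illustration only.
score = f1_score(0.9, 0.75)
print(round(score, 4))
```

Because it is a harmonic mean, the score is pulled toward the lower of the two inputs, so a method cannot obtain a high $F_{1}$ score by excelling at only one of them.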

Both datasets were evaluated with the above equations, and the results are presented graphically in Fig. 8 to Fig. 16. The results show that the proposed $ABQ$ approach achieves higher accuracy and better performance than the existing approaches.

The statistical measures discussed above, i.e., $RMSE$, $PE$, $MSE$, $MAE$, $R^{2}$, $MBD$, $MAPE$, Adjusted $R^{2}$, and $MAD$, are used to analyze and interpret the data, providing insights and supporting decision-making based on empirical evidence. Across these measures, the results indicate that the $ABQ$ approach is more accurate than the compared related work. Table X compares the key factors of a Blockchain Oracle in the proposed model with related published work.

TABLE X: Proposed model in Comparison with related work

Authors	Security	Simplicity	Trustlessness	Transparency	Redundancy	Independent
S. Ellis et al. [27]	No	No	No	No	No	No
Adler et al. [28]	Yes	No	No	No	No	No
Tian et al. [29]	No	No	No	No	No	No
Heiss et al. [30]	Yes	No	No	No	No	No
Berger et al. [31]	Yes	No	No	No	No	No
Tseng et al. [32]	No	No	No	No	No	No
Wei et al. [33]	Yes	No	No	No	No	No
Proposed Solution	Yes	Yes	Yes	Yes	Yes	Yes

VI Conclusion and Future Work

In this study, a novel approach is introduced in which ABQs and heuristic-based detection are combined to improve accuracy and to pinpoint compromised devices, respectively. Our method relies on the foundational Blockchain assumption that a two-thirds majority of nodes is consistently accurate. With this methodology, we observed the generation of highly accurate and trustworthy values, along with the swift identification of malfunctioning data-transmission units. We found that ABQs offer a more streamlined and rapid solution for extracting near-authentic values from dubious data sources. Heuristic-based detection (HBD) is an efficient tool for spotting faulty nodes, and the system maintains operations even when over a quarter of its nodes fail or exhibit malicious behavior. Our empirical findings show that the ABQ method is more precise and more resilient than traditional Blockchain Oracle techniques. Malicious or compromised nodes can be promptly pinpointed through the audit and accountability mechanisms embedded in the heuristic-based detection approach.

In the future, Machine Learning (ML) can be employed to design faster and more efficient Oracles. Considering data trails, or backtracking data to its Oracle value, could improve the autonomy of Blockchain Oracles.

References

[1] Tharaka Hewa, Mika Ylianttila, and Madhusanka Liyanage. Survey on blockchain based smart contracts: Applications, opportunities and challenges. Journal of Network and Computer Applications, 177:102857, 2021.
[2] Anton Permenev, Dimitar Dimitrov, Petar Tsankov, Dana Drachsler-Cohen, and Martin Vechev. Verx: Safety verification of smart contracts. In 2020 IEEE symposium on security and privacy (SP), pages 1661–1677. IEEE, 2020.
[3] Shuai Wang, Liwei Ouyang, Yong Yuan, Xiaochun Ni, Xuan Han, and Fei-Yue Wang. Blockchain-enabled smart contracts: architecture, applications, and future trends. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 49(11):2266–2277, 2019.
[4] Ishu Gupta, Niharika Singh, and Ashutosh Kumar Singh. Layer-based privacy and security architecture for cloud data sharing. Journal of Communications Software and Systems, 15(2):173–185, 2019.
[5] Kamran Mammadzada, Mubashar Iqbal, Fredrik Milani, Luciano García-Bañuelos, and Raimundas Matulevičius. Blockchain oracles: A framework for blockchain-based applications. In Business Process Management: Blockchain and Robotic Process Automation Forum: BPM 2020 Blockchain and RPA Forum, Seville, Spain, September 13–18, 2020, Proceedings 18, pages 19–34. Springer, 2020.
[6] Nuno Leite, Alexandre Santos, and Nuno Lopes. Assuring m2m secure transactions via blockchain and smart contracts. Journal of Communications Software and Systems, 17(3):260–269, 2021.
[7] Thomas McGhin, Kim-Kwang Raymond Choo, Charles Zhechao Liu, and Debiao He. Blockchain in healthcare applications: Research challenges and opportunities. Journal of Network and Computer Applications, 135:62–75, 2019.
[8] Justin Sunny, Naveen Undralla, and V Madhusudanan Pillai. Supply chain transparency through blockchain-based traceability: An overview with demonstration. Computers & Industrial Engineering, 150:106895, 2020.
[9] Noor Abdalkarem Mohammedali, Triantafyllos Kanakis, Ali Al-Sherbaz, and Michael Opoku Agyeman. Management and evaluation of the performance of end-to-end 5g inter/intra slicing using machine learning in a sustainable environment. Journal of Communications Software and Systems, 19(1):91–102, 2023.
[10] Feng Yang, Liao Lei, and Lin Chen. Method of interaction between blockchain and the world outside the chain based on oracle machine. In 2022 IEEE 8th Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing,(HPSC) and IEEE Intl Conference on Intelligent Data and Security (IDS), pages 101–106. IEEE, 2022.
[11] Vishwanath Garagad and Nalini Iyer. Dynamic trust-based device legitimacy assessment towards secure iot interactions. Journal of Communications Software and Systems, 18(3):269–276, 2022.
[12] Boutaina Jebari, Khalil Ibrahimi, Mohammed Jouhari, and Mounir Ghogho. Analysis of blockchain selfish mining: a stochastic game approach. In ICC 2022-IEEE International Conference on Communications, pages 4217–4222. IEEE, 2022.
[13] Roy Lai and David LEE Kuo Chuen. Blockchain–from public to private. In Handbook of Blockchain, Digital Finance, and Inclusion, Volume 2, pages 145–177. Elsevier, 2018.
[14] Lan N Nguyen, Truc DT Nguyen, Thang N Dinh, and My T Thai. Optchain: optimal transactions placement for scalable blockchain sharding. In 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), pages 525–535. IEEE, 2019.
[15] Ishu Gupta, Ashutosh Kumar Singh, Chung-Nan Lee, and Rajkumar Buyya. Secure data storage and sharing techniques for data protection in cloud environments: A systematic review, analysis, and future directions. IEEE Access, 2022.
[16] Asrar Ahmed Baktayan, Ibrahim Ahmed Al-Baltah, and Abdul Azim Abd Ghani. Intelligent pricing model for task offloading in unmanned aerial vehicle mounted mobile edge computing for vehicular network. Journal of Communications Software and Systems, 18(2):111–123, 2022.
[17] Nargess Tahmasbi, Guohou Shan, and Aaron M French. Identifying washtrading cases in nft sales networks. IEEE Transactions on Computational Social Systems, 2023.
[18] Rahime Belen-Saglam, Enes Altuncu, Yang Lu, and Shujun Li. A systematic literature review of the tension between the gdpr and public blockchain systems. Blockchain: Research and Applications, page 100129, 2023.
[19] Sejin Han and Sooyong Park. A gap between blockchain and general data protection regulation: A systematic review. IEEE Access, 2022.
[20] Fathan Abdul Shodiq, Rizka Reza Pahlevi, and Parman Sukarno. Secure mqtt authentication and message exchange methods for iot constrained device. In 2021 International Conference on Intelligent Cybernetics Technology & Applications (ICICyTA), pages 70–74. IEEE, 2021.
[21] Mohsen Attaran. Blockchain technology in healthcare: Challenges and opportunities. International Journal of Healthcare Management, 15(1):70–83, 2022.
[22] Anokye Acheampong Amponsah, Adebayo Felix Adekoya, and Benjamin Asubam Weyori. Improving the financial security of national health insurance using cloud-based blockchain technology application. International Journal of Information Management Data Insights, 2(1):100081, 2022.
[23] Mansour Mededjel, Ghalem Belalem, Fatima Zohra Nesrine Benadda, and Samah Kadakelloucha. A blockchain application prototype for the internet of things. Journal of Communications Software and Systems, 18(2):124–136, 2022.
[24] Marco Alessi, Alessio Camillò, Enza Giangreco, Marco Matera, Stefano Pino, and Davide Storelli. A decentralized personal data store based on ethereum: Towards gdpr compliance. Journal of Communications Software and Systems, 15(2):79–88, 2019.
[25] Korhan Cengiz, Basak Ozyurt, Krishna Kant Singh, Rohit Sharma, Tuna Topac, and Jyotir Moy Chatterjee. The role of iot and narrow band (nb)-iot for several use cases. In Emergence of Cyber Physical System and IoT in Smart Automation and Robotics: Computer Engineering in Automation, pages 161–174. Springer, 2021.
[26] Yehia R Hamdy and Ahmed I Alghannam. Evaluation of zigbee topology effect on throughput and end to end delay due to different transmission bands for iot applications. Journal of communications software and systems, 16(3):254–259, 2020.
[27] Steve Ellis, Ari Juels, and Sergey Nazarov. Chainlink: A decentralized oracle network. Retrieved March, 11:2018, 2017.
[28] John Adler, Ryan Berryhill, Andreas Veneris, Zissis Poulos, Neil Veira, and Anastasia Kastania. Astraea: A decentralized blockchain oracle. In 2018 IEEE international conference on internet of things (IThings) and IEEE green computing and communications (GreenCom) and IEEE cyber, physical and social computing (CPSCom) and IEEE smart data (SmartData), pages 1145–1152. IEEE, 2018.
[29] Zhihong Tian, Xiangsong Gao, Shen Su, Jing Qiu, Xiaojiang Du, and Mohsen Guizani. Evaluating reputation management schemes of internet of vehicles based on evolutionary game theory. IEEE Transactions on Vehicular Technology, 68(6):5971–5980, 2019.
[30] Jonathan Heiss, Jacob Eberhardt, and Stefan Tai. From oracles to trustworthy data on-chaining systems. In 2019 IEEE International Conference on Blockchain (Blockchain), pages 496–503. IEEE, 2019.
[31] Christian Berger, Hans P Reiser, João Sousa, and Alysson Neves Bessani. Aware: Adaptive wide-area replication for fast and resilient byzantine consensus. IEEE Transactions on Dependable and Secure Computing, 2020.
[32] Lewis Tseng, Qinzi Zhang, Saptaparni Kumar, and Yifan Zhang. Exact consensus under global asymmetric byzantine links. In 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS), pages 721–731. IEEE, 2020.
[33] Liangxiong Wei, Yanru Chen, Yuanyuan Zhang, Lian Zhao, and Liangyin Chen. Pspl: A generalized model to convert existing neighbor discovery algorithms to highly efficient asymmetric ones for heterogeneous iot devices. IEEE Internet of Things Journal, 7(8):7207–7219, 2020.
[34] Fahad Rahman, Chafiq Titouna, and Farid Naït-Abdesselam. Asymmetric byzantine quorum approach to resolve trust issues in decentralized blockchain oracles. In 2023 International Conference on Software, Telecommunications and Computer Networks (SoftCOM), pages 1–6, 2023.
[35] Ik Soo Lim and Naoki Masuda. To trust or not to trust: evolutionary dynamics of an asymmetric n-player trust game. IEEE Transactions on Evolutionary Computation, 2023.
[36] Orestis Alpos, Christian Cachin, and Luca Zanolini. How to trust strangers: Composition of byzantine quorum systems. In 2021 40th International Symposium on Reliable Distributed Systems (SRDS), pages 120–131. IEEE, 2021.
[37] Elaine Shi. Analysis of deterministic longest-chain protocols. In 2019 IEEE 32nd Computer Security Foundations Symposium (CSF), pages 122–12213. IEEE, 2019.
[38] Erica Blum, Aggelos Kiayias, Cristopher Moore, Saad Quader, and Alexander Russell. The combinatorics of the longest-chain rule: Linear consistency for proof-of-stake blockchains. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1135–1154. SIAM, 2020.
[39] Anirudh Singh Chauhan. EDA on temperature readings IoT devices. Kaggle, https://www.kaggle.com/code/anirudhchauhan/eda-on-temperature-readings-iot-devices, 2021.
[40] Waleed Faheem. Temperature analysis and visualization. Kaggle, https://www.kaggle.com/code/waleedfaheem/temperature-analysis-and-visualization/data, 2022.