
Demand Responsive Dynamic Pricing Framework for Prosumer Dominated Microgrids using Multiagent Reinforcement Learning

Amin Shojaeighadikolaei, Arman Ghasemi, Kailani R. Jones,
Alexandru G. Bardas, Morteza Hashemi, Reza Ahmadi
Abstract

Demand Response (DR) has a widely recognized potential for improving grid stability and reliability while reducing customers' energy bills. However, conventional DR techniques come with several shortcomings, such as an inability to handle operational uncertainties and the customer disutility they incur, impeding their widespread adoption in real-world applications. This paper proposes a new multiagent Reinforcement Learning (RL) based decision-making environment for implementing a Real-Time Pricing (RTP) DR technique in a prosumer-dominated microgrid. The proposed technique addresses several shortcomings common to traditional DR methods and provides significant economic benefits to the grid operator and prosumers. To demonstrate its efficacy, the proposed DR method is compared to a baseline traditional operation scenario in a small-scale microgrid system. Finally, investigations on the use of prosumers' energy storage capacity in this microgrid highlight the advantages of the proposed method in establishing a balanced market setup.

Index Terms:
Microgrid, Demand Response, Prosumer, Real-Time Pricing, Reinforcement Learning

I Introduction

Demand-side management, also known as Demand Response (DR), is one of the most widely studied topics in the context of the smart grid [1]. In order to flatten the demand curve, conventional time-based DR methods such as the Real-Time Pricing (RTP) approach [2] rely on dynamically changing the electricity price to motivate the customers to alter their energy use profile [3]. This potentially improves the grid stability and reliability by shifting the peak demand and decreasing the need for peaking power plants while offering reduced energy bills to residential customers [4, 5].

Nevertheless, the conventional time-based DR methods typically assume that the pricing policies are deterministic and decided ahead of time [6, 7, 8], or consider pricing policies that follow a random process with known properties [9]. Thus, conventional DR methods either cannot guarantee convergence to the optimal solution in the presence of uncertainties in the environment or require cumbersome mathematical formulations to model those uncertainties [10], which makes them unsuitable for real-world implementations.

Moreover, the conventional time-based DR approaches almost exclusively focus on altering the preferred electricity consumption pattern of the customers, e.g., by changing the temperature set point on air conditioning systems or delaying the use of major appliances, in order to shift their load to off-peak periods. These approaches often lead to customer dissatisfaction (disutility) and, as a consequence, have not been widely adopted. Psychologically, consumers value their comfort much more highly than the economic savings offered by traditionally proposed DR approaches [11].

Integration of energy storage into residential photovoltaic (PV) systems, together with EV and PV forecasting, should provide energy consumers and producers, commonly known as prosumers, with more flexibility to participate in DR programs while minimizing their disutility [12, 13, 14]. In other words, prosumers should be able to shape their demand profile in real-time regardless of their consumption profile [15]. As a result, prosumers can potentially receive greater economic benefits by selling their excess energy to the grid at higher prices, while aiding grid support services [16]. Achieving the aforementioned benefits calls for a novel DR approach on the prosumers' side, one that considers factors such as the state-of-charge (SOC) of the storage device, the household consumption profile, the real-time electricity price, and the PV generation level. Furthermore, taking advantage of the flexibility provided by prosumers requires a modern grid management strategy that treats the storage capacity of households as a grid asset, dispatched by properly incentivizing the households for DR participation, and leverages this asset for grid cost and performance optimization.

This paper proposes a multiagent deep Reinforcement Learning (RL) framework for implementing a new RTP-based DR technique in a prosumer microgrid that provides both prosumers and the grid operator with the aforementioned flexibility and greater economic benefits. The main contributions of the proposed framework are summarized as follows: a) Real-time learning: Grid and prosumer agents learn the optimal price and DR participation policy by interacting with the environment in real-time, rather than using a complex grid dynamic model for optimization. Therefore, the proposed method is applicable to the high-dimensional and non-stationary environment of the grid with much less computational burden than traditional DR methods, allowing for real-world implementations; b) Altering the households' grid injection patterns: The goal of the proposed prosumer-side DR algorithm is not to alter the consumption pattern of households, which typically leads to customer dissatisfaction. Rather, the prosumer-side DR algorithm provides cost savings by altering the households' grid injection pattern, using the flexibility provided by the energy storage and PV generation; and c) Balanced market: The proposed framework makes better use of prosumers' energy storage capacity, allowing for significant prosumer electricity bill reduction, as well as considerable grid economic benefit improvement, with a reasonably sized battery pack. Our results also show that beyond a certain threshold, a larger battery size on the prosumer side does not necessarily yield a much higher profit. This facilitates a balanced market setup in which abruptly and unilaterally manipulating the pricing scheme is not in anyone's financial interest.

Related Work: In recent years there has been a growing interest in the application of RL to the problem of dynamic pricing and DR. A comprehensive survey of published works in this area is provided in [17]. Among the surveyed publications, the works in [18] and [19] are most closely aligned with ours. The authors in [18] propose an RL-based dynamic pricing and energy consumption scheduling framework that works without a priori information and reduces system costs. That framework targets regular customers without grid injection capability and assumes that customer behavior is myopic and deterministic, i.e., each customer tries to minimize its cost in every single time slot. In contrast, our work takes advantage of the PV generation and storage capacity of prosumers, leading to greater flexibility for DR participation, and enables the prosumers to make decisions that optimize their cumulative economic benefit over the long term, rather than minimizing their instantaneous cost. On the other hand, the authors in [19] present an RL-based dynamic pricing algorithm that can promote service provider profitability and reduce energy costs for the customers. However, similar to [18], that work only deals with regular electricity consumers, rather than prosumers with generation capability. A multiagent deep reinforcement learning approach for distributed energy resources in smart grids, which is closely related to our work, is also provided in [20].

The remainder of this paper is structured as follows: Section II details the proposed RL-based DR approach. Section III implements the proposed method for a small-scale microgrid as a case study and provides simulation results to verify the efficacy of the proposed method. Finally, our concluding thoughts are presented in Section IV.

II Proposed RL Based DR Method

Fig. 1 illustrates the envisioned microgrid system with conventional generation facilities, prosumers, and consumers. Prosumers are entities that can produce energy locally from renewable resources, store the excess energy in their battery, sell energy into the grid, or buy electricity from the grid for local consumption. The prosumers can make a profit by selling electricity to the grid at a dynamic \$/kWh price $\delta^{b}$, referred to as the buy price hereinafter. On the other hand, the prosumers incur a cost when buying electricity from the grid at a \$/kWh price referred to as the sell price $\delta^{s}$ hereinafter. The goal of each prosumer agent is to maximize its long-term profit by determining an optimal charge/discharge policy for its energy storage device. Similarly, the grid can buy energy from prosumers at the price $\delta^{b}$ and incur a cost, or sell electricity to prosumers at the price $\delta^{s}$ to make a profit. The goal of the grid agent is to maximize the long-term profit of the grid by determining an optimal buy price ($\delta^{b}$) policy. Therefore, we define the following optimization problem for the grid agent,

Figure 1: Small scale microgrid system that consists of generation facilities, traditional consumers, and prosumers. Grid and prosumers are equipped with reinforcement learning (RL) agents to dynamically adjust their policies in terms of buy/sell prices, and power injection.
\[
\begin{cases}
\mathrm{maximize} & \mathrm{M}(T) \\
\text{subject to:} & \sum\limits_{q=1}^{N_c+N_p} P_q^{d}(t) = \sum\limits_{i=1}^{N_g} P_i^{G}(t) + \sum\limits_{j=1}^{N_p} P_j^{inj}(t) \\
& P_i^{G,\min}(t) \leq P_i^{G}(t) \leq P_i^{G,\max}(t),
\end{cases}
\tag{1}
\]

where $\mathrm{M}(T)$ is the grid profit defined as follows:

\[
\mathrm{M}(T) = V\left(P^{dm}(T)\right) - \sum\limits_{i=1}^{N_g} F_i\left(P_i^{G}(T)\right) - \sum\limits_{j=1}^{N_p} C_j\left(P_j^{inj}(T)\right).
\]

In addition, we have:

\[
V\left(P^{dm}(T)\right) = \int_{0}^{T} P^{dm}(t)\,\delta^{s}(t)\,dt,
\tag{2}
\]
\[
C_j\left(P_j^{inj}(T)\right) =
\begin{cases}
\int_{0}^{T} P_j^{inj}(t)\,\delta^{b}(t)\,dt & \text{for } P_j^{inj}(t) > 0 \\
0 & \text{for } P_j^{inj}(t) \leq 0.
\end{cases}
\tag{3}
\]

Here, $V(P^{dm}(T))$ represents the cumulative grid revenue from selling electricity to the households over time horizon $T$, $C_j(P_j^{inj}(T))$ represents the cumulative grid cost of buying excess energy from the $j^{th}$ prosumer over time horizon $T$, $F_i(P_i^{G}(T))$ represents the cumulative cost of buying electricity from the $i^{th}$ generation facility over time horizon $T$, and $N_c$, $N_g$, and $N_p$ are the numbers of consumers, generation facilities, and prosumers, respectively. The first equality constraint in (1) represents the grid's power balance requirement, which must be maintained at all times; $P^{dm}$ is the total demand of the network, $P_q^{d}(t)$ is the demand of household $q$, and $P_j^{inj}$ is the power injection into the grid by the $j^{th}$ prosumer.
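For concreteness, the profit objective in (1)-(3) can be approximated in discrete time by summing over time slots. The sketch below is illustrative only: the slot length, array shapes, and the per-facility cost callables are assumptions rather than details specified in this paper.

```python
import numpy as np

def grid_profit(p_dm, delta_s, p_gen, gen_cost_fns, p_inj, delta_b, dt=1.0):
    """Discrete-time approximation of the grid profit M(T) in (1)-(3).

    p_dm:         total demand per time slot [kW], shape (K,)
    delta_s:      sell price per slot [$/kWh], shape (K,)
    p_gen:        dispatch of each generation facility [kW], shape (N_g, K)
    gen_cost_fns: list of N_g callables F_i(P) giving the cost rate [$/h]
    p_inj:        prosumer injections [kW], shape (N_p, K); negative = buying
    delta_b:      buy price per slot [$/kWh], shape (K,)
    dt:           slot length [h]
    """
    revenue = np.sum(p_dm * delta_s) * dt                        # V(P^dm), eq. (2)
    generation_cost = sum(np.sum(F(p)) * dt
                          for F, p in zip(gen_cost_fns, p_gen))  # sum_i F_i
    # The grid only pays for positive prosumer injections, eq. (3).
    prosumer_cost = np.sum(np.clip(p_inj, 0.0, None) * delta_b) * dt
    return revenue - generation_cost - prosumer_cost
```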

Similarly, we define the following optimization problem for the $j^{th}$ prosumer,

\[
\begin{cases}
\mathrm{maximize} & \mathrm{U}_j(T) \\
\text{subject to:} & P_j^{inj}(t) + P_j^{batt}(t) + P_j^{c}(t) = P_j^{pv}(t) \\
& \left| P_j^{inj}(t) \right| \leq P_j^{inj,\max} \\
& \left| P_j^{batt}(t) \right| \leq P_j^{batt,\max} \\
& 0 \leq P_j^{pv}(t) \leq P_j^{pv,\max} \\
& SoC_j^{\min} \leq SoC_j(t) \leq SoC_j^{\max},
\end{cases}
\tag{4}
\]

where $\mathrm{U}_j(T) = V_j^{p}(P_j^{inj}(T)) - C_j^{p}(P_j^{inj}(T))$ is the $j^{th}$ prosumer's cumulative profit over time horizon $T$, $V_j^{p}(P_j^{inj}(T))$ represents the $j^{th}$ prosumer's cumulative revenue from selling excess electricity to the grid over time horizon $T$, calculated using the same expression as (3), i.e., $V_j^{p}(P_j^{inj}(T)) = C_j(P_j^{inj}(T))$, and $C_j^{p}(P_j^{inj}(T))$ represents the $j^{th}$ prosumer's cumulative cost of buying electricity from the grid over time horizon $T$, calculated as,

\[
C_j^{p}\left(P_j^{inj}(T)\right) =
\begin{cases}
\int_{0}^{T} P_j^{inj}(t)\,\delta^{s}(t)\,dt & \text{for } P_j^{inj}(t) < 0, \\
0 & \text{for } P_j^{inj}(t) \geq 0.
\end{cases}
\tag{5}
\]

In the above, $P_j^{pv}(t)$ is the PV generation of the $j^{th}$ household with peak generation $P_j^{pv,\max}$, $P_j^{c}(t)$ is the consumption of the $j^{th}$ household, $P_j^{inj,\max}$ is the maximum allowable power injection of the $j^{th}$ household, $P_j^{batt}(t)$ is the energy storage charge/discharge power of the $j^{th}$ household with maximum allowable charge/discharge power $P_j^{batt,\max}$, and $SoC_j(t)$ is the state of charge of the $j^{th}$ prosumer, where $SoC_j^{\min}$ and $SoC_j^{\max}$ are the minimum and maximum allowable states of charge. The state of charge is calculated as,

\[
SoC_j(t) = SoC_j(0) + \frac{1}{\eta_j}\int_{0}^{t} P_j^{batt}(\tau)\,d\tau,
\tag{6}
\]

where $SoC_j(0)$ is the initial state of charge and $\eta_j$ is the energy storage capacity of the $j^{th}$ prosumer.
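As a simple illustration of how (6) and the SoC limits in (4) can be enforced per time slot, consider the following minimal sketch; it assumes a fixed slot length and treats $\eta_j$ as the storage capacity in kWh, consistent with the definition above.

```python
def soc_update(soc, p_batt, dt, eta, soc_min, soc_max):
    """One Euler step of eq. (6): the SoC integrates the battery power.

    soc:     current state of charge (fraction of capacity)
    p_batt:  battery power over this slot [kW], positive = charging
    dt:      slot length [h]
    eta:     energy storage capacity eta_j [kWh]
    """
    soc_next = soc + (p_batt * dt) / eta
    # Keep the SoC within the SoC_min <= SoC <= SoC_max constraint of eq. (4).
    return min(max(soc_next, soc_min), soc_max)
```

In a full simulation the charge/discharge action would typically be masked before being applied, rather than clipping the SoC after the fact.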

Each agent uses RL to solve its optimization problem in real-time. The agents interact with the environment in order to maximize a specified reward, receiving positive rewards for desirable actions and negative rewards for undesirable ones. This learning process is modeled as a Markov Decision Process (MDP) for a Multi-Agent Reinforcement Learning (MARL) environment defined by the tuple $\{\mathrm{N}, \mathrm{S}, \{\mathrm{A}_i\}, \mathrm{P}, \{\mathrm{R}_i\}, \gamma\}$, where $\mathrm{N}$ is the set of agents, $\mathrm{S}$ is the set of environment states, $\mathrm{A}_i$ is the set of actions of the $i^{th}$ agent, $\mathrm{P}$ is the set of transition functions, $\mathrm{R}_i$ is the immediate reward function of the $i^{th}$ agent, and $\gamma \in [0,1]$ is the discount factor. The agent-environment interaction for an MDP is shown in Fig. 2. At each discretized time index $k$, agent $\mathrm{N}_i$ selects an action $a_{i,k} \in \mathrm{A}_i$ based on observing the current state of the environment, denoted by $s_k$. Subsequently, the agent receives a numerical feedback signal known as the reward, $r_{i,k} \in \mathrm{R}_i$, and transitions to a new state $s_{k+1}$. In the proposed framework, each agent takes actions based only on its own local observations. In other words, each agent can only observe partial features of the entire environment. Thus, the grid agent observes the following environment states,

Figure 2: A typical reinforcement learning framework wherein agents learn their optimal actions through observing the environment states, taking actions, and receiving rewards.
\[
s_{1,k} = \left\{ \mathbf{F}_k,\ \boldsymbol{\Omega}_k,\ P_{1,k}^{dm} \right\} \in \mathrm{S},
\tag{7}
\]

where $\mathbf{F}_k = \left[ F_{1,k}\ \dots\ F_{i,k} \right]^{T}$ and $\boldsymbol{\Omega}_k = \left[ C_{1,k}\ \dots\ C_{j,k} \right]^{T}$ are the vectors of the grid costs for buying electricity from the generation facilities and the prosumers at time slot $k$, respectively, and $P_{1,k}^{dm}$ represents the total grid demand at time slot $k$.

Similarly, each prosumer agent can observe the following environment states,

\[
s_{n,k} = \left\{ SoC_{n,k},\ P_{n,k}^{pv},\ P_{n,k}^{c},\ \delta_k^{b} \right\} \in \mathrm{S}
\quad \text{for } n = 2, 3, \dots, N_p + 1,
\tag{8}
\]

where $SoC_{n,k}$ is the state of charge of the energy storage device of the $n^{th}$ prosumer at time slot $k$, $P_{n,k}^{pv}$ and $P_{n,k}^{c}$ are the PV generation and local consumption of the $n^{th}$ prosumer at time slot $k$, and $\delta_k^{b}$ is the buy price at time slot $k$.

In this work, the grid agent controls the buy price to incentivize DR participation by the prosumers and to maximize its own reward. Therefore, the buy price is the action of the grid agent, denoted by $a_{1,k} = \delta_k^{b} \in \mathrm{A}_1$. On the other hand, the prosumer agents control the households' energy storage charge/discharge state in response to the dynamic buy price changes. Therefore, the action of the $n^{th}$ prosumer's agent is defined as $a_{n,k} \in \mathrm{A}_n$, which denotes the charge/discharge command to the energy storage.
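The paper does not specify how the buy-price and charge/discharge action sets are discretized, so the following sketch uses illustrative values only; the price grid, the three-level battery command, and the SoC masking rule are assumptions.

```python
import numpy as np

# Hypothetical discretizations of the action sets A_1 and A_n.
GRID_ACTIONS = np.linspace(0.05, 0.25, 9)      # candidate buy prices delta^b [$/kWh]
PROSUMER_ACTIONS = np.array([-2.0, 0.0, 2.0])  # discharge / idle / charge [kW]

def prosumer_battery_power(action_idx, soc, soc_min, soc_max):
    """Map a prosumer agent's discrete action to a feasible battery power,
    ignoring commands that would push the SoC outside its limits."""
    p_batt = PROSUMER_ACTIONS[action_idx]
    if soc >= soc_max and p_batt > 0:
        return 0.0   # battery full: ignore the charge command
    if soc <= soc_min and p_batt < 0:
        return 0.0   # battery empty: ignore the discharge command
    return p_batt
```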

Finally, the immediate reward functions for the grid and prosumer agents are defined as

\[
r_{1,k} = P_{1,k}^{dm} \times \delta_k^{s} - \sum\limits_{i=1}^{N_g} F_{i,k}\left(P_{i,k}^{G}\right) - \sum\limits_{j=1}^{N_p} P_{j,k}^{inj} \times \delta_k^{b}
\quad \text{for } P_{j,k}^{inj} > 0,
\tag{9}
\]
\[
r_{n,k} = \rho\, P_{n,k}^{inj} \times \delta_k^{b} + \left(\rho - 1\right) P_{n,k}^{inj} \times \delta_k^{s},
\tag{10}
\]

where $\rho \in \{0, 1\}$, with $\rho = 0$ when $P_{n,k}^{inj} \leq 0$ and $\rho = 1$ when $P_{n,k}^{inj} > 0$.
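The per-slot rewards in (9) and (10) translate directly into code; the sketch below assumes scalar per-slot quantities and realized generation costs passed in as plain numbers.

```python
def grid_reward(p_dm, delta_s, gen_costs, p_inj_list, delta_b):
    """Immediate grid reward r_{1,k} of eq. (9) for one time slot.
    gen_costs:  realized generation costs F_{i,k}(P^G_{i,k}), one per facility
    p_inj_list: prosumer injections P^inj_{j,k}; only positive ones are paid."""
    paid_injection = sum(p for p in p_inj_list if p > 0)
    return p_dm * delta_s - sum(gen_costs) - paid_injection * delta_b

def prosumer_reward(p_inj, delta_b, delta_s):
    """Immediate prosumer reward r_{n,k} of eq. (10)."""
    rho = 1 if p_inj > 0 else 0   # indicator defined after eq. (10)
    return rho * p_inj * delta_b + (rho - 1) * p_inj * delta_s
```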

According to the above terminology, the MDP trajectories for the grid agent and the $n^{th}$ prosumer's agent begin with $s_{1,1}, a_{1,1}, r_{1,2}, s_{1,2}, a_{1,2}, r_{1,3}, \dots$ and $s_{n,1}, a_{n,1}, r_{n,2}, s_{n,2}, a_{n,2}, r_{n,3}, \dots$, respectively. The primary goal of each agent is to maximize the cumulative discounted reward, formulated as $G_1 = \sum_{t=0}^{\infty} (\gamma_1)^{t}\, r_{1,k+t+1}$ and $G_n = \sum_{t=0}^{\infty} (\gamma_n)^{t}\, r_{n,k+t+1}$, where $0 \leq \gamma_1 \leq 1$ and $0 \leq \gamma_n \leq 1$ are discount factors and $n = 2, 3, \dots, N_p + 1$. In this work, to find the optimal policy, a Deep Q-Network (DQN) [21] is deployed to approximate the optimal action-value function, which satisfies the Bellman equation,

\[
\hat{Q}_{1,k+1}(s_{1,k}, a_{1,k}) = (1 - \alpha_1)\,\hat{Q}_{1,k}(s_{1,k}, a_{1,k})
+ \alpha_1 \left\{ r_{1,k}(s_{1,k}, a_{1,k}) + \gamma_1 \max_{a_{1,k+1}} \hat{Q}_{1,k}(s_{1,k+1}, a_{1,k+1}) \right\},
\tag{11}
\]
\[
\hat{Q}_{n,k+1}(s_{n,k}, a_{n,k}) = (1 - \alpha_n)\,\hat{Q}_{n,k}(s_{n,k}, a_{n,k})
+ \alpha_n \left\{ r_{n,k}(s_{n,k}, a_{n,k}) + \gamma_n \max_{a_{n,k+1}} \hat{Q}_{n,k}(s_{n,k+1}, a_{n,k+1}) \right\},
\tag{12}
\]

where $\hat{Q}$ is the approximation of $Q$ estimated by a deep neural network. In this work, we use the updating mechanism provided in [22] for the Q-values.
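A per-agent DQN update consistent with the Bellman targets in (11)-(12) might look like the following sketch; the network width, optimizer, and minibatch handling are illustrative assumptions and not the exact configuration reported here.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small fully connected approximator for Q_hat(s, a)."""
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))

    def forward(self, s):
        return self.net(s)

def dqn_update(q_net, target_net, optimizer, batch, gamma):
    """One gradient step toward the target r + gamma * max_a' Q_target(s', a')."""
    s, a, r, s_next = batch                               # a: long tensor of action indices
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q_hat(s_k, a_k)
    with torch.no_grad():
        target = r + gamma * target_net(s_next).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```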

III Case Study and Numerical Results

The proposed DR scheme is implemented on a small-scale microgrid similar to Fig. 1. The case study system comprises $N_p = 3$ prosumers, each equipped with rooftop solar panels, an energy storage system, and a smart agent; one conventional, non-generating consumer ($N_c = 1$); and two generation facilities ($N_g = 2$), where one acts as the baseline generation facility and the other as reserve generation capacity. The prosumers' details are provided in Table I. The generation and consumption profiles employed for the prosumers are shown in Fig. 3 and are derived from California ISO data [23].

Two scenarios have been simulated to analyze the efficacy of the proposed DR scheme. In the first scenario, referred to as the conventional scenario hereinafter, no DR scheme is applied to the microgrid: the prosumers simply inject power into the grid when the household generates excess energy and the battery is full. In the second scenario, the proposed DR scheme is implemented on the microgrid system and the results are compared with the first scenario to evaluate the effectiveness of the proposed method.
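For reference, the conventional-scenario behavior described above reduces to a simple rule: store excess PV generation until the battery is full, then inject the remainder. The sketch below is one possible reading of that rule; the function signature and variable names are illustrative.

```python
def conventional_prosumer_step(p_pv, p_c, soc, soc_max, p_batt_max, eta, dt):
    """Rule-based baseline: charge the battery with any PV surplus and inject
    into the grid only what the battery cannot absorb.

    Returns (p_inj, p_batt): grid injection [kW] (negative = buying) and
    battery charging power [kW]."""
    surplus = p_pv - p_c
    if surplus <= 0:
        return surplus, 0.0                      # buy the shortfall, battery idle
    headroom_kw = (soc_max - soc) * eta / dt     # power the battery can still absorb
    p_batt = min(surplus, p_batt_max, headroom_kw)
    p_inj = surplus - p_batt                     # remainder flows to the grid
    return p_inj, p_batt
```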

Item        | Max PV Generation | ESS Capacity | Max Charge/Discharge | Agent
Prosumer 1  | 4 kW              | 6 kWh        | 2/-2 kW              | Agent 1
Prosumer 2  | 4 kW              | 12 kWh       | 2/-2 kW              | Agent 2
Prosumer 3  | 4 kW              | 8 kWh        | 2/-2 kW              | Agent 3
TABLE I: Microgrid Details
Figure 3: Generation and consumption waveform samples: (a) prosumers' generation and consumption waveforms, (b) consumer consumption waveform

Fig. 4 compares the time-domain profiles of the prosumers' SoC and the buy/sell prices over a 24-hour period between the conventional and agent-based scenarios after fully training the DQN agents. Comparing Fig. 4 (b)-(d) with Fig. 3 (a), it can be observed that in the conventional scenario the changes in the SoC of each prosumer's battery are very closely synced with the prosumer's PV generation profile. This is expected since, as mentioned above, in the conventional scenario prosumers inject power into the grid when there is excess energy generation. On the other hand, in the agent-based scenario, the battery SoC changes as a result of charge/discharge commands issued by the prosumer agents based on the identified optimal charge/discharge policy. To compare the effect of this significant change in battery usage between the two scenarios, the prosumers' average daily electricity bills, the grid's daily profit, and the reserve power consumption have been calculated for both scenarios and plotted in Fig. 5. According to this figure, the prosumers' average daily electricity bill is reduced significantly in the agent-based scenario. Similarly, the grid profit is considerably higher in the agent-based scenario, which can be attributed to the significant drop in reserve generation usage in this scenario.

According to the findings discussed above, it can be concluded that the grid and prosumer agents are leveraging the battery capacity of the prosumers to maximize their profits and reduce their costs.

In the next experiment, the effect of battery capacity on reducing the prosumers' daily electricity bills and raising the grid profit is investigated by running several agent-based simulations, increasing the battery capacity of the prosumers for each simulation. The results are shown in Fig. 6 and Fig. 7, respectively. As shown, the battery capacity is increased from 2 kWh to 25 kWh, and the prosumers' daily energy bill and the grid profit are measured at the end of each simulation and plotted against the battery capacity. Fig. 6 and Fig. 7 show a downward trend in the daily electricity cost of prosumers and an upward trend in the grid profit as a function of battery capacity. However, these trends slow down around a 15 kWh battery capacity, meaning that the improvements from batteries larger than 15 kWh appear negligible. Therefore, for a given PV generation capacity (i.e., $P^{pv,\max}$), it can be concluded that the proposed DR scheme provides its maximum benefits with a reasonable battery pack size of around 15 kWh per household.
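The capacity sweep described above amounts to the following loop; `run_agent_based_simulation` is a hypothetical stand-in for the full training and evaluation pipeline, not a function defined in this paper.

```python
# Sweep the prosumers' battery capacity and record the resulting metrics.
results = []
for capacity_kwh in range(2, 26):   # 2 kWh up to 25 kWh
    # Hypothetical helper: trains the agents and returns end-of-simulation metrics.
    daily_bill, grid_profit = run_agent_based_simulation(battery_capacity=capacity_kwh)
    results.append((capacity_kwh, daily_bill, grid_profit))
```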

Figure 4: Simulation results after learning algorithm convergence for one day (24 hours): (a) grid buying and selling prices, (b) prosumer 1 battery SoC, (c) prosumer 2 battery SoC, (d) prosumer 3 battery SoC
Figure 5: Daily bill comparison over episodes: (a) prosumers 1-3 daily bills, (b) grid profit, (c) grid reserve power consumption
Figure 6: Daily bill reduction for prosumers with different battery capacities
Figure 7: Grid profit change with different battery capacities

IV Conclusions

This paper proposes a new multiagent RL-based decision-making environment for implementing a DR scheme in a microgrid dominated by prosumers. The proposed technique implements a Real-Time Pricing scheme that can mitigate several shortcomings common to traditional DR methods while providing important economic benefits to the grid operator and prosumers. To showcase the efficacy of the RL-based method, this work includes a comparison to a baseline traditional operation scenario in a small-scale microgrid system. Results showed significant daily bill reductions (e.g., 38%, 46%, and 26%) for the prosumers in the proposed RL-based marketplace. Moreover, experiments on the use of prosumers' energy storage capacity in this microgrid setup highlight the advantages of the proposed method in establishing a fair and balanced market setup.

References

  • [1] P. Siano, “Demand response and smart grids—a survey,” Renewable and Sustainable Energy Reviews, vol. 30, pp. 461–478, 2014.
  • [2] H. T. Haider, O. H. See, and W. Elmenreich, “A review of residential demand response of smart grid,” Renewable and Sustainable Energy Reviews, vol. 59, pp. 166–178, 2016.
  • [3] J. S. Vardakas, N. Zorba, and C. V. Verikoukis, “A survey on demand response programs in smart grids: Pricing methods and optimization algorithms,” IEEE Communications Surveys Tutorials, vol. 17, no. 1, pp. 152–178, 2015.
  • [4] J. Morales González, A. Conejo, H. Madsen, P. Pinson, and M. Zugno, Integrating Renewables in Electricity Markets: Operational Problems, ser. International Series in Operations Research and Management Science.   Springer, 2014.
  • [5] J. Su, P. Dehghanian, M. Nazemi, and B. Wang, “Distributed wind power resources for enhanced power grid resilience,” in 2019 North American Power Symposium (NAPS), 2019, pp. 1–6.
  • [6] H. Roh and J. Lee, “Residential demand response scheduling with multiclass appliances in the smart grid,” IEEE Transactions on Smart Grid, vol. 7, no. 1, pp. 94–104, 2016.
  • [7] R. Kaviani, M. Rashidinejad, and A. Abdollahi, “A milp igdt-based self-scheduling model for participating in electricity markets,” in 2016 24th Iranian Conference on Electrical Engineering (ICEE), 2016, pp. 152–157.
  • [8] M. Ostadijafari, R. R. Jha, and A. Dubey, “Aggregation and bidding of residential demand response into wholesale market,” in 2020 IEEE Texas Power and Energy Conference (TPEC), 2020, pp. 1–6.
  • [9] M. H. Shoreh, P. Siano, M. Shafie-khah, V. Loia, and J. P. Catalão, “A survey of industrial applications of demand response,” Electric Power Systems Research, vol. 141, pp. 31 – 49, 2016. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0378779616302632
  • [10] X. Huang, S. H. Hong, and Y. Li, “Hour-ahead price based energy management scheme for industrial facilities,” IEEE Transactions on Industrial Informatics, vol. 13, no. 6, pp. 2886–2898, 2017.
  • [11] S. Althaher, P. Mancarella, and J. Mutale, “Automated demand response from home energy management system under dynamic pricing and power and comfort constraints,” IEEE Transactions on Smart Grid, vol. 6, no. 4, pp. 1874–1883, 2015.
  • [12] O. Ciftci, M. Mehrtash, F. Safdarian, and A. Kargarian, “Chance-constrained microgrid energy management with flexibility constraints provided by battery storage,” in 2019 IEEE Texas Power and Energy Conference (TPEC), 2019, pp. 1–6.
  • [13] A. Asrari, M. Ansari, J. Khazaei, and P. Fajri, “A market framework for decentralized congestion management in smart distribution grids considering collaboration among electric vehicle aggregators,” IEEE Transactions on Smart Grid, vol. 11, no. 2, pp. 1147–1158, 2020.
  • [14] H. Panamtash, Q. Zhou, T. Hong, Z. Qu, and K. O. Davis, “A copula-based bayesian method for probabilistic solar power forecasting,” Solar Energy, vol. 196, pp. 336 – 345, 2020. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0038092X1931179X
  • [15] H. Yang, T. Xiong, J. Qiu, D. Qiu, and Z. Y. Dong, “Optimal operation of des/cchp based regional multi-energy prosumer with demand response,” Applied Energy, vol. 167, pp. 353 – 365, 2016.
  • [16] N. Liu, M. Cheng, X. Yu, J. Zhong, and J. Lei, “Energy-sharing provider for pv prosumer clusters: A hybrid approach using stochastic programming and stackelberg game,” IEEE Transactions on Industrial Electronics, vol. 65, no. 8, pp. 6740–6750, 2018.
  • [17] J. R. Vázquez-Canteli and Z. Nagy, “Reinforcement learning for demand response: A review of algorithms and modeling techniques,” Applied Energy, vol. 235, pp. 1072 – 1089, 2019.
  • [18] B. Kim, Y. Zhang, M. van der Schaar, and J. Lee, “Dynamic pricing and energy consumption scheduling with reinforcement learning,” IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2187–2198, 2016.
  • [19] R. Lu, S. H. Hong, and X. Zhang, “A dynamic pricing demand response algorithm for smart grid: Reinforcement learning approach,” Applied Energy, vol. 220, pp. 220–230, 2018.
  • [20] A. Ghasemi, A. Shojaeighadikolaei, K. R. Jones, A. G. Bardas, M. Hashemi, and R. Ahmadi, “A multi-agent deep reinforcement learning approach for a distributed energy marketplace in smart grids,” in IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids, 2020.
  • [21] N. Naderializadeh and M. Hashemi, “Energy-aware multi-server mobile edge computing: A deep reinforcement learning approach,” in 53rd Asilomar Conference on Signals, Systems, and Computers.   IEEE, 2019, pp. 383–387.
  • [22] R. S. Sutton and A. G. Barto, “Reinforcement learning: An introduction,” 2017.
  • [23] California ISO. Current and forecasted demand. [Online]. Available: http://www.caiso.com/TodaysOutlook/Pages/default.aspx