The optimal dispatch of energy storage systems (ESSs) presents formidable challenges due to the uncertainty introduced by fluctuations in dynamic prices, demand consumption, and renewable-based generation.
By exploiting the generalization capabilities of deep neural networks (DNNs), deep reinforcement learning (DRL) algorithms can learn good-quality control models that adaptively respond to distribution networks' stochastic nature. However, current standard DRL algorithms are limited in constraint satisfaction and unable to guarantee feasible actions. To address this issue, we propose a framework that effectively handles continuous action spaces while strictly enforcing the environment's and action space's operational constraints during online operation.
Firstly, the proposed framework trains an action-value function modeled using DNNs.
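As a rough illustration of this first stage only, the sketch below fits a small fully connected action-value (Q) network on ESS dispatch transitions; the architecture, hyperparameters, update rule, and the random placeholder batch are assumptions for illustration, not the authors' implementation.

    # Minimal sketch (PyTorch): fitting an action-value function Q(s, a) with a DNN.
    # Architecture, update rule, and data are illustrative assumptions.
    import torch
    import torch.nn as nn

    class QNetwork(nn.Module):
        def __init__(self, state_dim, action_dim, hidden=64):
            super().__init__()
            # ReLU hidden layers keep the trained network representable as a MIP later on.
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, state, action):
            return self.net(torch.cat([state, action], dim=-1))

    def td_update(q_net, target_net, batch, optimizer, gamma=0.99):
        """One temporal-difference update on a batch of (s, a, r, s', a') transitions."""
        s, a, r, s_next, a_next = batch
        with torch.no_grad():
            target = r + gamma * target_net(s_next, a_next).squeeze(-1)
        loss = nn.functional.mse_loss(q_net(s, a).squeeze(-1), target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Illustrative usage with random tensors standing in for real ESS transitions.
    q, tgt = QNetwork(6, 1), QNetwork(6, 1)
    opt = torch.optim.Adam(q.parameters(), lr=1e-3)
    batch = (torch.randn(32, 6), torch.randn(32, 1), torch.randn(32),
             torch.randn(32, 6), torch.randn(32, 1))
    td_update(q, tgt, batch, opt)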
Subsequently, this action-value function is formulated as a mixed-integer programming (MIP) problem, enabling consideration of the environment's operational constraints.
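One way to realize this step, sketched below, is the standard big-M encoding of a trained ReLU network as mixed-integer linear constraints, so that the action maximizing Q(s, a) for the current state can be searched by a MIP solver subject to operational limits. The weight layout, the big-M value, and the use of PuLP/CBC as the solver interface are assumptions for illustration, not the paper's exact formulation.

    # Minimal sketch: big-M MIP encoding of a trained ReLU Q-network, so that the
    # dispatch action maximizing Q(s, a) can be found under operational constraints.
    # Weight shapes, bounds, and solver choice are illustrative placeholders.
    import pulp

    def argmax_action_mip(weights, biases, state, a_min, a_max, big_m=100.0):
        """weights/biases: per-layer (W, b) arrays of a ReLU net taking [state, action] as input."""
        prob = pulp.LpProblem("q_maximization", pulp.LpMaximize)
        action = pulp.LpVariable("action", lowBound=a_min, upBound=a_max)

        # Input layer: fixed state entries plus the decision variable for the action.
        layer = [pulp.LpAffineExpression(constant=float(v)) for v in state] + [action]

        params = list(zip(weights, biases))
        for k, (W, b) in enumerate(params):
            last = (k == len(params) - 1)
            new_layer = []
            for j in range(W.shape[0]):
                pre = pulp.lpSum(float(W[j, i]) * layer[i] for i in range(len(layer))) + float(b[j])
                if last:
                    new_layer.append(pre)                      # linear output layer: the Q-value
                else:
                    h = pulp.LpVariable(f"h_{k}_{j}", lowBound=0)
                    z = pulp.LpVariable(f"z_{k}_{j}", cat="Binary")
                    prob += h >= pre                           # h >= Wx + b
                    prob += h <= pre + big_m * (1 - z)         # h = Wx + b when z = 1
                    prob += h <= big_m * z                     # h = 0 when z = 0
                    new_layer.append(h)
            layer = new_layer

        prob += layer[0]   # objective: maximize Q(s, a) over the feasible actions
        # The environment's operational constraints (e.g. state-of-charge and power
        # limits linking `state` and `action`) would be added here as linear constraints.
        prob.solve(pulp.PULP_CBC_CMD(msg=False))
        return action.value()

Tighter, layer-specific bounds in place of the single big-M value would normally be preferred to speed up the branch-and-bound search.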
Comprehensive numerical simulations show the superior performance of the MIP-DRL framework, satisfying all constraints while delivering high-quality decisions when compared with a state-of-the-art solution obtained with a perfect forecast of the stochastic variables.
Intelligent Decision Technologies, Journal Year: 2024, Volume and Issue: 18(4), P. 3091-3104, Published: Nov. 1, 2024
This article aimed to use the proximal policy optimization (PPO) algorithm to address the limitations of power system start-up strategies and to enhance their adaptability, coping ability, and overall robustness to variable grid demand with integrated renewable energy; to this end, the constraints in the start-up strategy are optimized.
Firstly, this work constructed a dynamic model of the power system, including key components such as generators, transformers, and transmission lines; secondly, it applied the PPO algorithm and designed interfaces that allow the agent to interact with the model; afterward, the state variables were determined and a reward function was designed to evaluate the efficiency and stability of the system.
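A plausible shape for such an agent-model interface, under the common Gymnasium convention (an assumption here, not the article's stated toolchain), is a small environment class whose observation carries the state variables (e.g., frequency deviation, generator output, load) and whose reward penalizes deviation from nominal frequency; the dynamics, bounds, and reward weights below are simplified placeholders.

    # Minimal sketch of a Gymnasium-style interface between the PPO agent and a
    # simplified power-system start-up model. Dynamics and reward are placeholders.
    import numpy as np
    import gymnasium as gym
    from gymnasium import spaces

    class StartupEnv(gym.Env):
        def __init__(self):
            # Observation: [frequency deviation (Hz), generator output (p.u.), load (p.u.)]
            self.observation_space = spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32)
            # Action: change in generator setpoint (p.u.) per control step.
            self.action_space = spaces.Box(-0.1, 0.1, shape=(1,), dtype=np.float32)

        def reset(self, seed=None, options=None):
            super().reset(seed=seed)
            self.state = np.array([0.0, 0.2, 0.8], dtype=np.float32)
            self.t = 0
            return self.state, {}

        def step(self, action):
            freq_dev, gen, load = self.state
            gen = np.clip(gen + float(action[0]), 0.0, 1.2)
            # Toy swing-like dynamics: frequency deviation follows the power imbalance.
            freq_dev = 0.9 * freq_dev + 0.5 * (gen - load)
            self.state = np.array([freq_dev, gen, load], dtype=np.float32)
            self.t += 1
            # Reward: keep frequency near nominal while discouraging large control moves.
            reward = -abs(freq_dev) - 0.01 * abs(float(action[0]))
            terminated = abs(freq_dev) > 1.0
            truncated = self.t >= 200
            return self.state, reward, terminated, truncated, {}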
Next, the agent was adjusted, trained, and iterated multiple times in a simulation environment to guide it toward learning the optimal strategy. Finally, an effective evaluation can be conducted.
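Under the same assumptions, the training and evaluation loop could look like the sketch below, using Stable-Baselines3's PPO implementation as a stand-in for the article's training setup; the timestep budget and evaluation episode count are arbitrary.

    # Minimal sketch: training PPO on the illustrative StartupEnv and evaluating it.
    # Stable-Baselines3 is a stand-in; hyperparameters are arbitrary placeholders.
    from stable_baselines3 import PPO

    env = StartupEnv()
    model = PPO("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=100_000)          # iterate in the simulation environment

    # Simple evaluation: roll out the learned policy and report the mean episode return.
    returns = []
    for _ in range(10):
        obs, _ = env.reset()
        done, ep_ret = False, 0.0
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, _ = env.step(action)
            ep_ret += reward
            done = terminated or truncated
        returns.append(ep_ret)
    print(sum(returns) / len(returns))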
The research results showed that, after optimization by the algorithm, stabilizing the frequency only took about 23 seconds, and the recovery time was reduced by 33.3% under a sudden load increase. The PPO algorithm can thus be used to significantly optimize intelligent