Tutorial¶
IntelliHealer provides imitation learning algorithms with several variations for different feature inputs.
IntelliHealer can be used as a Gym environment for distribution system restoration, so that it can be connected to state-of-the-art reinforcement learning libraries such as Stable-Baselines3. Currently, it contains two test feeders: a 33-node system and a 119-node system.
IntelliHealer provides distribution system optimization models built on Pyomo, which can be used to develop other problem formulations.
Distribution System Restoration Optimization¶
The distribution system restoration is modeled in the OutageManage class.
To solve a restoration problem, we first build the problem object as follows:
from gym_power_res.envs.DS_pyomo import OutageManage
problem = OutageManage()
The OutageManage class reads the system case data specified in data_test_case.py. Test case data can be obtained as follows:
import gym_power_res.envs.data_test_case as case
ppc = case.case33_tieline()
Then we initialize the problem object with the test case data ppc and a line outage vector. Take the line outage vector ['line_3', 'line_5', 'line_9'] as an example:
problem.data_preparation(ppc, ['line_3', 'line_5', 'line_9'])
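An outage vector is a plain list of line-name strings. As a minimal sketch, such a list could be built from integer line indices with a small helper. The helper name make_outage_vector is hypothetical and not part of IntelliHealer, and the 'line_<k>' naming is inferred from the example above.

```python
def make_outage_vector(line_indices):
    """Return names such as 'line_3' for the given integer line indices.

    Hypothetical helper (not part of IntelliHealer); the 'line_<k>' naming
    is assumed from the example outage vector in this tutorial.
    """
    # Drop duplicates and sort so the same outage always yields the same list.
    unique_sorted = sorted(set(int(i) for i in line_indices))
    return [f"line_{i}" for i in unique_sorted]


outage = make_outage_vector([9, 3, 5, 3])
print(outage)  # ['line_3', 'line_5', 'line_9']
```

The resulting list can be passed as the second argument of problem.data_preparation.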
Here we assume that only one tie-line can be operated in each time step. We then specify the total number of time steps of the problem, where total_time_step is an integer chosen by the user:
problem.initialize_problem(total_time_step)
Then we can solve the problem using the pre-defined constraints in the function solve_network_restoration:
problem.solve_network_restoration()
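import pyomo.environ as pm  # assumed alias: pm is not defined elsewhere in this snippet
# The executable path below is specific to one macOS CPLEX installation; adjust it for your machine.
opt = pm.SolverFactory("cplex", executable = '/Applications/CPLEX_Studio128/cplex/bin/x86-64_osx/cplex')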
opt.options['mipgap'] = 0
results = opt.solve(problem.model, tee = True)
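The CPLEX path above only exists on one particular installation. As a hedged sketch, the solver executable could instead be located on the PATH before calling pm.SolverFactory. The function name find_solver and the candidate list are assumptions made for illustration; the tutorial itself only uses CPLEX, and it does not state how well the restoration model solves with other solvers.

```python
import shutil

# Maps a solver executable name to the Pyomo solver name that drives it.
# The choice of candidates is an assumption; the tutorial only uses CPLEX.
SOLVER_EXECUTABLES = {"cplex": "cplex", "cbc": "cbc", "glpsol": "glpk"}


def find_solver(executables=SOLVER_EXECUTABLES, which=shutil.which):
    """Return (pyomo_name, path) for the first executable found on PATH, else None."""
    for exe_name, pyomo_name in executables.items():
        path = which(exe_name)
        if path is not None:
            return pyomo_name, path
    return None


found = find_solver()
if found is None:
    print("No candidate solver found on PATH; pass an explicit executable path instead.")
else:
    print(f"Use pm.SolverFactory({found[0]!r}, executable={found[1]!r})")
```

Option names such as mipgap are solver-specific, so they may need to be adapted when a solver other than CPLEX is selected.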
Gym for Distribution System Restoration¶
The Gym environment for distribution system restoration is built on the Gym environment template and the optimization models described in the section above. There are three classes for different scenarios:
RestorationDisEnv: 33-node Gym environment for imitation learning
RestorationDisEnvRL: 33-node Gym environment for Stable-Baselines3
RestorationDisEnv119: 119-node Gym environment for imitation learning
Please follow the steps below to run the environment.
First, we import the environment and create it with the maximum and minimum numbers of line outages as follows:
import gym # needed for gym.make below
from gym_power_res.envs import RestorationDisEnv # import environment
ENV_NAME_1 = "RestorationDisEnv-v0" # define the name of the environment
env = gym.make(ENV_NAME_1, max_disturbance=2, min_disturbance=2) # define the environment object
Second, we reset the environment. The optional input is a line outage vector, such as ['line_3', 'line_5', 'line_9']. Without this input, the outages are randomly sampled from all lines.
env.reset(['line_6', 'line_11'])
During the reset process, the restoration optimization problem object named sim_case is created and initialized. Then we can simulate the evolution of the environment under sequential actions:
env.step(action_1)
env.step(action_2)
env.step(action_3)
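The sequence of step calls can also be written as a loop. The sketch below uses a hypothetical stand-in environment so that it runs without IntelliHealer, and it assumes the classic Gym step signature that returns (observation, reward, done, info). Whether the IntelliHealer environments return exactly this tuple is not stated here and should be checked against the class in use.

```python
class StubRestorationEnv:
    """Hypothetical stand-in for a restoration environment (not IntelliHealer code).

    Assumes the classic Gym step signature: (observation, reward, done, info).
    """

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        reward = 1.0 if action != 0 else 0.0  # toy reward: nonzero actions score 1
        done = self.t >= self.horizon         # episode ends after `horizon` steps
        return self.t, reward, done, {}


def rollout(env, actions):
    """Apply actions in order, stopping early once the environment reports done."""
    total_reward, steps = 0.0, 0
    for action in actions:
        _, reward, done, _ = env.step(action)
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps


env_stub = StubRestorationEnv(horizon=3)
env_stub.reset()
total, steps = rollout(env_stub, [2, 0, 5, 1])
print(total, steps)  # 2.0 3
```

With the stub, the fourth action is never applied because the episode ends after three steps.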
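Finally, we can retrieve the results as follows, replacing 'bus_variable_name' and 'line_variable_name' with the names of the model variables of interest: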
env.sim_case.get_solution_2d('bus_variable_name', env.sim_case.iter_bus, env.sim_case.iter_time)
env.sim_case.get_solution_2d('line_variable_name', env.sim_case.iter_line, env.sim_case.iter_time)
Imitation Learning¶
The imitation learning algorithm is implemented in the function main_behavior_cloning. It requires the env object and the agent object. The operation of the algorithm is described below with self-explanatory comments.
import time
import gym

# ENV_NAME_1, NUM_TOTAL_EPISODES, WARM_START_OFF and Agent are assumed to be defined elsewhere in the module

def main_behavior_cloning(output_path):
    """ BC algorithm
    """
    # ============= create GYM environment ===============
    env = gym.make(ENV_NAME_1, max_disturbance=1, min_disturbance=1)

    # ============== create agent ===================
    agent = Agent(env, output_path)

    # ============= Begin main training loop ===========
    flag_convergence = False   # set convergence flag to be false
    tic = time.perf_counter()  # start clock
    for it in range(NUM_TOTAL_EPISODES):
        if it % 1 == 0:  # print progress every episode
            toc = time.perf_counter()
            print("===================================================")
            print(f"Training time: {toc - tic:0.4f} seconds; Mission {it:d} of {NUM_TOTAL_EPISODES:d}")
            print("===================================================")
        agent.logger.info(f"=============== Mission {it:d} of {NUM_TOTAL_EPISODES:d} =================")

        # initialize environment
        s0, l0 = env.reset()

        # get expert policy
        # note that the expert only retrieves information from the environment and does not change it
        agent.get_expert_policy(env, s0)

        # determine whether to use expert advice or the learned policy
        # main difference between behavior cloning and DAGGER
        if it > WARM_START_OFF:
            agent.warmstart = False

        # execute the expert policy and perform imitation learning
        agent.run_train(env, s0, l0)

        # test the currently trained policy network on a new environment from certain iterations
        if it >= 0:
            # initialize environment
            s0, l0 = env.reset()
            # execute learned policy on the environment
            flag_convergence = agent.run_test(env, s0)
            if flag_convergence:
                break

        agent.total_episode = agent.total_episode + 1

    return agent
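The core idea of behavior cloning can be illustrated independently of IntelliHealer's Agent class: record the expert's action in each visited state, then fit a policy to those state-action pairs with supervised learning. The toy sketch below is not the library's implementation. All names are hypothetical, the expert is an arbitrary rule, and a lookup table stands in for the policy network.

```python
from collections import Counter, defaultdict


def expert_policy(state):
    """Toy expert (hypothetical): choose the action equal to state modulo 3."""
    return state % 3


def collect_demonstrations(states):
    """Record (state, expert action) pairs for every visited state."""
    return [(s, expert_policy(s)) for s in states]


def fit_tabular_policy(demonstrations):
    """Behavior cloning with a lookup table: keep the most frequent expert action per state."""
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


demos = collect_demonstrations(range(6))
policy = fit_tabular_policy(demos)

# Fraction of demonstrated states where the cloned policy matches the expert.
accuracy = sum(policy[s] == expert_policy(s) for s in range(6)) / 6
print(policy, accuracy)
```

In a realistic setting the lookup table would be replaced by a trained classifier, and states never shown by the expert would have no entry in the table, which is the limitation that motivates aggregating data under the learned policy.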