#collapse-output
import os.path
!pip install ray[rllib]==1.8.0
!pip install tensorflow==2.7.0
!pip install seaborn==0.11.2
!pip install gym==0.21.0
!pip install pyglet==1.5.21
This blog post is still a work in progress. Currently, there seems to be an issue with attention in RLlib. I have not had the time to look into this again but wanted to share the current state since it might still be useful.
In reinforcement learning (RL), the RL agent typically selects a suitable action based on the last observation. In many practical environments, the full state can only be observed partially, such that important information may be missing when just considering the last observation. This blog post covers options for dealing with missing and only partially observed state, e.g., considering a sequence of last observations and applying self-attention to this sequence.
This blog post is based on and closely related to Anyscale’s blog post on attention nets with RLlib. In comparison, I focus less on RLlib’s trajectory API and more on providing a practical, end-to-end tutorial.
Example: The CartPole Gym Environment
As an example, consider the popular OpenAI Gym CartPole environment. Here, the task is to move a cart left or right in order to balance a pole on the cart for as long as possible.
In the normal CartPole-v1 environment, the RL agent observes four scalar values (defined here):
- The cart position, i.e., where the cart currently is.
- The cart velocity, i.e., how fast the cart is currently moving and in which direction (can be positive or negative).
- The pole angle, i.e., how tilted the pole currently is and in which direction.
- The pole angular velocity, i.e., how fast the pole is currently moving and in which direction.
All four observations are important to decide whether the cart should move left or right.
Now, assume the RL agent only has access to an instant snapshot of the cart and the pole (e.g., through a photo/raw pixels) and can observe neither the cart velocity nor the pole angular velocity. In this case, the RL agent only has partial observations and does not know whether and how fast the pole is currently swinging. As a result, standard RL agents cannot solve the problem and do not learn to balance the pole. How can we deal with this problem of partial observations, i.e., missing state (here, cart and pole velocity)?
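To make the partial-observation setting concrete, the following minimal sketch drops the two velocity entries from a full CartPole observation. It assumes the observation order listed above (cart position, cart velocity, pole angle, pole angular velocity). The helper name `to_partial_observation` and the example values are hypothetical and only serve as illustration.

```python
from typing import List, Sequence

# Assumed index layout of a full CartPole observation (see the list above)
CART_POSITION, CART_VELOCITY, POLE_ANGLE, POLE_ANGULAR_VELOCITY = range(4)


def to_partial_observation(full_obs: Sequence[float]) -> List[float]:
    """Keep only what an instant snapshot reveals: cart position and pole angle."""
    return [full_obs[CART_POSITION], full_obs[POLE_ANGLE]]


# hypothetical full observation, values chosen only for illustration
full_obs = [0.03, -0.21, 0.04, 0.35]
partial_obs = to_partial_observation(full_obs)
print(partial_obs)  # [0.03, 0.04]
```

With only these two values, a single observation no longer tells the agent in which direction the cart or the pole is moving.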
Options for Dealing With Partial Observations
There are different options for dealing with partial observations/missing state, e.g., missing velocity in the CartPole example:
- Add the missing state explicitly, e.g., measure and observe velocity. Note that this may require installing extra sensors or may even be infeasible in some scenarios.
- Ignore the missing state, i.e., just rely on the available, partial observations. Depending on the missing state, this may be problematic and keep the agent from learning.
- Keep track of a sequence of the last observations. By observing the cart position and pole angle over time, the agent can implicitly derive their velocity. There are different ways to deal with this sequence:
  - Just use the sequence as is for a standard multi-layer perceptron (MLP)/dense feedforward neural network.
  - Feed the sequence into a recurrent neural network (RNN), e.g., with long short-term memory (LSTM).
  - Feed the sequence into a neural network with self-attention.
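The idea behind the third option can be sketched without any RL library. The snippet below keeps the last few partial observations in a fixed-length buffer and shows that a velocity estimate can be recovered from two consecutive positions via a finite difference. The class name `ObservationHistory`, the zero padding, and the time step of 0.02 s are assumptions for illustration, not part of RLlib or Gym.

```python
from collections import deque
from typing import Deque, List, Sequence


class ObservationHistory:
    """Fixed-length buffer holding the most recent partial observations."""

    def __init__(self, obs_size: int, seq_len: int) -> None:
        # pad with zeros so the sequence has full length from the first step (assumption)
        self.buffer: Deque[List[float]] = deque(
            ([0.0] * obs_size for _ in range(seq_len)), maxlen=seq_len
        )

    def add(self, obs: Sequence[float]) -> None:
        # appending to a full deque drops the oldest observation automatically
        self.buffer.append(list(obs))

    def flat(self) -> List[float]:
        """Concatenate the sequence, oldest first, e.g., as input for an MLP."""
        return [value for obs in self.buffer for value in obs]


def estimate_velocity(previous: float, current: float, dt: float = 0.02) -> float:
    """Finite-difference estimate of a velocity from two consecutive readings."""
    return (current - previous) / dt


history = ObservationHistory(obs_size=2, seq_len=3)
history.add([0.00, 0.10])  # hypothetical (cart position, pole angle) readings
history.add([0.01, 0.12])
print(history.flat())  # [0.0, 0.0, 0.0, 0.1, 0.01, 0.12]
print(estimate_velocity(0.00, 0.01))  # 0.5
```

An MLP would receive the flattened sequence as one input vector, whereas an RNN or an attention network would process the buffered observations step by step. In both cases, the network has to learn something like the finite difference on its own.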
In the following, I go through each option in more detail and illustrate them using simple example code.
Setup
For the examples, I use a PPO RL agent from Ray RLlib with the CartPole environment, described above.
To install these dependencies, run the installation commands shown at the top of this post (tested with Python 3.8 on Windows).
Start up ray, load the default PPO config, and determine the number of training iterations, which is the same for all options (for comparability).
import ray
from ray.rllib.agents import ppo
# adjust num_cpus and num_gpus to your system
# for some reason, num_cpus=2 gets stuck on my system (when trying to train)
ray.init(num_cpus=3, ignore_reinit_error=True)
# stop conditions based on training iterations (each with 4000 train steps)
train_iters = 10
stop = {"training_iteration": train_iters}
2021-12-01 22:52:23,565 INFO worker.py:832 -- Calling ray.init() again after it has already been called.
Option 1: Explicitly Add Missing State
Sometimes, it is possible to extend the observations and explicitly add important state that was previously unobserved. In the CartPole example, the cart and pole velocity can simply be “added” by using the default CartPole-v1 environment. Here, the cart velocity and pole velocity are already included in the observations.
Note that in many practical scenarios, such “missing” state cannot simply be added and observed. Instead, it may require installing additional sensors or may even be completely infeasible.
Let’s start with the best case, i.e., explicitly including the missing state.
import gym
# the default CartPole env has all 4 observations: position and velocity of both cart and pole
env = gym.make("CartPole-v1")
env.observation_space.shape
(4,)
#collapse-output
# run PPO on the default CartPole-v1 env
config1 = ppo.DEFAULT_CONFIG.copy()
config1["env"] = "CartPole-v1"
# training takes a while
results1 = ray.tune.run("PPO", config=config1, stop=stop)
print("Option 1: Training finished successfully")
Current time: 2021-12-01 22:52:39 (running for 00:00:00.16)
Memory usage on this node: 9.4/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\git-repos\private\blog\_notebooks\results\PPO
Number of trials: 1/1 (1 PENDING)
Trial name | status | loc |
---|---|---|
PPO_CartPole-v1_0091e_00000 | PENDING |
(pid=16556) 2021-12-01 22:52:50,305 INFO trainer.py:753 -- Tip: set framework=tfe or the --eager flag to enable TensorFlow eager execution
(pid=16556) 2021-12-01 22:52:50,310 INFO ppo.py:166 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting simple_optimizer=True if this doesn't work for you.
(pid=16556) 2021-12-01 22:52:50,310 INFO trainer.py:770 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=2436) 2021-12-01 22:53:02,005 WARNING deprecation.py:38 -- DeprecationWarning: `SampleBatch['is_training']` has been deprecated. Use `SampleBatch.is_training` instead. This will raise an error in the future!
(pid=16556) 2021-12-01 22:53:04,522 WARNING deprecation.py:38 -- DeprecationWarning: `SampleBatch['is_training']` has been deprecated. Use `SampleBatch.is_training` instead. This will raise an error in the future!
(pid=16556) 2021-12-01 22:53:05,755 WARNING trainer_template.py:185 -- `execution_plan` functions should accept `trainer`, `workers`, and `config` as args!
(pid=16556) 2021-12-01 22:53:05,755 INFO trainable.py:110 -- Trainable.setup took 15.450 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
(pid=16556) 2021-12-01 22:53:05,755 WARNING util.py:57 -- Install gputil for GPU system monitoring.
(pid=16556) 2021-12-01 22:53:12,536 WARNING deprecation.py:38 -- DeprecationWarning: `slice` has been deprecated. Use `SampleBatch[start:stop]` instead. This will raise an error in the future!
(pid=16556) Windows fatal exception: access violation
(pid=16556)
(pid=2436) [2021-12-01 22:54:37,017 E 2436 8960] raylet_client.cc:159: IOError: Unknown error [RayletClient] Failed to disconnect from raylet.
(pid=2436) Windows fatal exception: access violation
(pid=2436)
(pid=14216) [2021-12-01 22:54:37,018 E 14216 11712] raylet_client.cc:159: IOError: Unknown error [RayletClient] Failed to disconnect from raylet.
(pid=14216) Windows fatal exception: access violation
(pid=14216)
2021-12-01 22:54:37,138 INFO tune.py:630 -- Total run time: 118.07 seconds (117.55 seconds for the tuning loop).
Current time: 2021-12-01 22:53:05 (running for 00:00:26.46)
Memory usage on this node: 10.1/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\git-repos\private\blog\_notebooks\results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_CartPole-v1_0091e_00000 | RUNNING | 127.0.0.1:16556 |
Result for PPO_CartPole-v1_0091e_00000:
agent_timesteps_total: 4000
custom_metrics: {}
date: 2021-12-01_22-53-17
done: false
episode_len_mean: 20.331632653061224
episode_media: {}
episode_reward_max: 69.0
episode_reward_mean: 20.331632653061224
episode_reward_min: 8.0
episodes_this_iter: 196
episodes_total: 196
experiment_id: 9b99b97d259948058ce175fdb437bf92
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.6666923761367798
entropy_coeff: 0.0
kl: 0.02727562002837658
model: {}
policy_loss: -0.03548957407474518
total_loss: 163.0438232421875
vf_explained_var: 0.02411726862192154
vf_loss: 163.0738525390625
num_agent_steps_sampled: 4000
num_agent_steps_trained: 4000
num_steps_sampled: 4000
num_steps_trained: 4000
iterations_since_restore: 1
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 85.05625
ram_util_percent: 85.14375
pid: 16556
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.1063987523513995
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.12976663512813205
mean_inference_ms: 2.738906715446057
mean_raw_obs_processing_ms: 0.29470778093728145
time_since_restore: 11.613266468048096
time_this_iter_s: 11.613266468048096
time_total_s: 11.613266468048096
timers:
learn_throughput: 829.341
learn_time_ms: 4823.104
load_throughput: 0.0
load_time_ms: 0.0
sample_throughput: 590.463
sample_time_ms: 6774.35
update_time_ms: 2.998
timestamp: 1638395597
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 4000
training_iteration: 1
trial_id: 0091e_00000
Result for PPO_CartPole-v1_0091e_00000:
agent_timesteps_total: 8000
custom_metrics: {}
date: 2021-12-01_22-53-27
done: false
episode_len_mean: 43.5
episode_media: {}
episode_reward_max: 128.0
episode_reward_mean: 43.5
episode_reward_min: 9.0
episodes_this_iter: 85
episodes_total: 281
experiment_id: 9b99b97d259948058ce175fdb437bf92
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.30000001192092896
cur_lr: 4.999999873689376e-05
entropy: 0.6100984811782837
entropy_coeff: 0.0
kl: 0.018913770094513893
model: {}
policy_loss: -0.03986572101712227
total_loss: 392.36260986328125
vf_explained_var: 0.05626700446009636
vf_loss: 392.3967590332031
num_agent_steps_sampled: 8000
num_agent_steps_trained: 8000
num_steps_sampled: 8000
num_steps_trained: 8000
num_steps_trained_this_iter: 0
iterations_since_restore: 2
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 85.2
ram_util_percent: 85.08571428571429
pid: 16556
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.10915718929193942
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.12762452156562074
mean_inference_ms: 2.478078257762592
mean_raw_obs_processing_ms: 0.24854778323044394
time_since_restore: 21.935389518737793
time_this_iter_s: 10.322123050689697
time_total_s: 21.935389518737793
timers:
learn_throughput: 806.601
learn_time_ms: 4959.079
load_throughput: 8015870.043
load_time_ms: 0.499
sample_throughput: 474.211
sample_time_ms: 8435.067
update_time_ms: 3.001
timestamp: 1638395607
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 8000
training_iteration: 2
trial_id: 0091e_00000
Result for PPO_CartPole-v1_0091e_00000:
agent_timesteps_total: 12000
custom_metrics: {}
date: 2021-12-01_22-53-37
done: false
episode_len_mean: 70.15
episode_media: {}
episode_reward_max: 292.0
episode_reward_mean: 70.15
episode_reward_min: 11.0
episodes_this_iter: 36
episodes_total: 317
experiment_id: 9b99b97d259948058ce175fdb437bf92
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.30000001192092896
cur_lr: 4.999999873689376e-05
entropy: 0.5675911903381348
entropy_coeff: 0.0
kl: 0.009604203514754772
model: {}
policy_loss: -0.02363675646483898
total_loss: 785.93994140625
vf_explained_var: 0.09917476773262024
vf_loss: 785.960693359375
num_agent_steps_sampled: 12000
num_agent_steps_trained: 12000
num_steps_sampled: 12000
num_steps_trained: 12000
num_steps_trained_this_iter: 0
iterations_since_restore: 3
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 83.96428571428571
ram_util_percent: 85.07857142857142
pid: 16556
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.11086732721675017
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.129464220418653
mean_inference_ms: 2.403483071642798
mean_raw_obs_processing_ms: 0.23551804810537205
time_since_restore: 31.98438596725464
time_this_iter_s: 10.048996448516846
time_total_s: 31.98438596725464
timers:
learn_throughput: 827.692
learn_time_ms: 4832.716
load_throughput: 6088260.312
load_time_ms: 0.657
sample_throughput: 436.537
sample_time_ms: 9163.022
update_time_ms: 2.669
timestamp: 1638395617
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 12000
training_iteration: 3
trial_id: 0091e_00000
Result for PPO_CartPole-v1_0091e_00000:
agent_timesteps_total: 16000
custom_metrics: {}
date: 2021-12-01_22-53-46
done: false
episode_len_mean: 97.99
episode_media: {}
episode_reward_max: 371.0
episode_reward_mean: 97.99
episode_reward_min: 11.0
episodes_this_iter: 20
episodes_total: 337
experiment_id: 9b99b97d259948058ce175fdb437bf92
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.30000001192092896
cur_lr: 4.999999873689376e-05
entropy: 0.5582262873649597
entropy_coeff: 0.0
kl: 0.0037608244456350803
model: {}
policy_loss: -0.012490973807871342
total_loss: 696.2131958007812
vf_explained_var: 0.2233099341392517
vf_loss: 696.2244873046875
num_agent_steps_sampled: 16000
num_agent_steps_trained: 16000
num_steps_sampled: 16000
num_steps_trained: 16000
num_steps_trained_this_iter: 0
iterations_since_restore: 4
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 83.60769230769232
ram_util_percent: 85.76923076923075
pid: 16556
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.1126154093070354
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.1320370369396816
mean_inference_ms: 2.352884663632094
mean_raw_obs_processing_ms: 0.2294510967493389
time_since_restore: 40.960214138031006
time_this_iter_s: 8.975828170776367
time_total_s: 40.960214138031006
timers:
learn_throughput: 839.249
learn_time_ms: 4766.169
load_throughput: 8117680.416
load_time_ms: 0.493
sample_throughput: 438.213
sample_time_ms: 9127.987
update_time_ms: 2.002
timestamp: 1638395626
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 16000
training_iteration: 4
trial_id: 0091e_00000
Result for PPO_CartPole-v1_0091e_00000:
agent_timesteps_total: 20000
custom_metrics: {}
date: 2021-12-01_22-53-56
done: false
episode_len_mean: 132.51
episode_media: {}
episode_reward_max: 500.0
episode_reward_mean: 132.51
episode_reward_min: 12.0
episodes_this_iter: 15
episodes_total: 352
experiment_id: 9b99b97d259948058ce175fdb437bf92
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.15000000596046448
cur_lr: 4.999999873689376e-05
entropy: 0.5601204037666321
entropy_coeff: 0.0
kl: 0.0012711239978671074
model: {}
policy_loss: -0.007236114237457514
total_loss: 605.6217041015625
vf_explained_var: 0.29318979382514954
vf_loss: 605.6287841796875
num_agent_steps_sampled: 20000
num_agent_steps_trained: 20000
num_steps_sampled: 20000
num_steps_trained: 20000
num_steps_trained_this_iter: 0
iterations_since_restore: 5
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 83.5
ram_util_percent: 86.91538461538461
pid: 16556
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.11232943522683439
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.1322710531601683
mean_inference_ms: 2.318162079591458
mean_raw_obs_processing_ms: 0.22431872747963477
time_since_restore: 50.56009912490845
time_this_iter_s: 9.599884986877441
time_total_s: 50.56009912490845
timers:
learn_throughput: 851.742
learn_time_ms: 4696.257
load_throughput: 10147100.52
load_time_ms: 0.394
sample_throughput: 431.646
sample_time_ms: 9266.849
update_time_ms: 1.601
timestamp: 1638395636
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 20000
training_iteration: 5
trial_id: 0091e_00000
Result for PPO_CartPole-v1_0091e_00000:
agent_timesteps_total: 24000
custom_metrics: {}
date: 2021-12-01_22-54-04
done: false
episode_len_mean: 162.46
episode_media: {}
episode_reward_max: 500.0
episode_reward_mean: 162.46
episode_reward_min: 13.0
episodes_this_iter: 16
episodes_total: 368
experiment_id: 9b99b97d259948058ce175fdb437bf92
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.07500000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.5491490960121155
entropy_coeff: 0.0
kl: 0.012883742339909077
model: {}
policy_loss: -0.014221735298633575
total_loss: 350.70465087890625
vf_explained_var: 0.5025997757911682
vf_loss: 350.7178955078125
num_agent_steps_sampled: 24000
num_agent_steps_trained: 24000
num_steps_sampled: 24000
num_steps_trained: 24000
num_steps_trained_this_iter: 0
iterations_since_restore: 6
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 69.30000000000001
ram_util_percent: 87.18181818181819
pid: 16556
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.11068721601459906
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.1316580125011057
mean_inference_ms: 2.2647407230497265
mean_raw_obs_processing_ms: 0.21722746948586308
time_since_restore: 58.025999307632446
time_this_iter_s: 7.465900182723999
time_total_s: 58.025999307632446
timers:
learn_throughput: 884.584
learn_time_ms: 4521.899
load_throughput: 12176520.624
load_time_ms: 0.329
sample_throughput: 439.637
sample_time_ms: 9098.425
update_time_ms: 1.335
timestamp: 1638395644
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 24000
training_iteration: 6
trial_id: 0091e_00000
Result for PPO_CartPole-v1_0091e_00000:
agent_timesteps_total: 28000
custom_metrics: {}
date: 2021-12-01_22-54-12
done: false
episode_len_mean: 196.68
episode_media: {}
episode_reward_max: 500.0
episode_reward_mean: 196.68
episode_reward_min: 15.0
episodes_this_iter: 8
episodes_total: 376
experiment_id: 9b99b97d259948058ce175fdb437bf92
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.07500000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.5566104650497437
entropy_coeff: 0.0
kl: 0.0053793760016560555
model: {}
policy_loss: -0.009221607819199562
total_loss: 434.2621765136719
vf_explained_var: 0.1736932396888733
vf_loss: 434.2709655761719
num_agent_steps_sampled: 28000
num_agent_steps_trained: 28000
num_steps_sampled: 28000
num_steps_trained: 28000
num_steps_trained_this_iter: 0
iterations_since_restore: 7
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 71.21818181818182
ram_util_percent: 87.13636363636364
pid: 16556
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.11012958319487443
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.13087477368874187
mean_inference_ms: 2.2335852221532058
mean_raw_obs_processing_ms: 0.2131696843540125
time_since_restore: 66.21467590332031
time_this_iter_s: 8.188676595687866
time_total_s: 66.21467590332031
timers:
learn_throughput: 900.369
learn_time_ms: 4442.624
load_throughput: 14205940.728
load_time_ms: 0.282
sample_throughput: 447.857
sample_time_ms: 8931.421
update_time_ms: 1.144
timestamp: 1638395652
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 28000
training_iteration: 7
trial_id: 0091e_00000
Result for PPO_CartPole-v1_0091e_00000:
agent_timesteps_total: 32000
custom_metrics: {}
date: 2021-12-01_22-54-21
done: false
episode_len_mean: 229.19
episode_media: {}
episode_reward_max: 500.0
episode_reward_mean: 229.19
episode_reward_min: 15.0
episodes_this_iter: 9
episodes_total: 385
experiment_id: 9b99b97d259948058ce175fdb437bf92
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.07500000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.5450154542922974
entropy_coeff: 0.0
kl: 0.0061668953858315945
model: {}
policy_loss: -0.006067643407732248
total_loss: 457.9305114746094
vf_explained_var: 0.032116785645484924
vf_loss: 457.9361267089844
num_agent_steps_sampled: 32000
num_agent_steps_trained: 32000
num_steps_sampled: 32000
num_steps_trained: 32000
num_steps_trained_this_iter: 0
iterations_since_restore: 8
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 76.9
ram_util_percent: 87.25833333333333
pid: 16556
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.10949012056065484
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.12963875953335788
mean_inference_ms: 2.2011992223423453
mean_raw_obs_processing_ms: 0.20929612392916075
time_since_restore: 74.98133444786072
time_this_iter_s: 8.766658544540405
time_total_s: 74.98133444786072
timers:
learn_throughput: 920.017
learn_time_ms: 4347.744
load_throughput: 16235360.832
load_time_ms: 0.246
sample_throughput: 446.868
sample_time_ms: 8951.185
update_time_ms: 1.001
timestamp: 1638395661
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 32000
training_iteration: 8
trial_id: 0091e_00000
Result for PPO_CartPole-v1_0091e_00000:
agent_timesteps_total: 36000
custom_metrics: {}
date: 2021-12-01_22-54-28
done: false
episode_len_mean: 260.25
episode_media: {}
episode_reward_max: 500.0
episode_reward_mean: 260.25
episode_reward_min: 15.0
episodes_this_iter: 8
episodes_total: 393
experiment_id: 9b99b97d259948058ce175fdb437bf92
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.07500000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.5320675373077393
entropy_coeff: 0.0
kl: 0.0075341472402215
model: {}
policy_loss: -0.0072624157182872295
total_loss: 404.454345703125
vf_explained_var: 0.05579644814133644
vf_loss: 404.4610290527344
num_agent_steps_sampled: 36000
num_agent_steps_trained: 36000
num_steps_sampled: 36000
num_steps_trained: 36000
num_steps_trained_this_iter: 0
iterations_since_restore: 9
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 68.26999999999998
ram_util_percent: 87.2
pid: 16556
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.10868070777730038
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.1282069423856235
mean_inference_ms: 2.1709855087013663
mean_raw_obs_processing_ms: 0.2055371812270087
time_since_restore: 82.31100749969482
time_this_iter_s: 7.3296730518341064
time_total_s: 82.31100749969482
timers:
learn_throughput: 939.355
learn_time_ms: 4258.24
load_throughput: 18264780.936
load_time_ms: 0.219
sample_throughput: 455.121
sample_time_ms: 8788.864
update_time_ms: 0.89
timestamp: 1638395668
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 36000
training_iteration: 9
trial_id: 0091e_00000
Result for PPO_CartPole-v1_0091e_00000:
agent_timesteps_total: 40000
custom_metrics: {}
date: 2021-12-01_22-54-36
done: true
episode_len_mean: 292.74
episode_media: {}
episode_reward_max: 500.0
episode_reward_mean: 292.74
episode_reward_min: 15.0
episodes_this_iter: 9
episodes_total: 402
experiment_id: 9b99b97d259948058ce175fdb437bf92
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.07500000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.5234162211418152
entropy_coeff: 0.0
kl: 0.004971951246261597
model: {}
policy_loss: -0.0019533345475792885
total_loss: 415.5965576171875
vf_explained_var: 0.15562385320663452
vf_loss: 415.59814453125
num_agent_steps_sampled: 40000
num_agent_steps_trained: 40000
num_steps_sampled: 40000
num_steps_trained: 40000
num_steps_trained_this_iter: 0
iterations_since_restore: 10
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 79.35000000000001
ram_util_percent: 87.0
pid: 16556
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.10759807711359962
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.12680782099974997
mean_inference_ms: 2.1360139834425205
mean_raw_obs_processing_ms: 0.20154871655205042
time_since_restore: 90.66206645965576
time_this_iter_s: 8.351058959960938
time_total_s: 90.66206645965576
timers:
learn_throughput: 951.379
learn_time_ms: 4204.422
load_throughput: 6695620.386
load_time_ms: 0.597
sample_throughput: 458.038
sample_time_ms: 8732.908
update_time_ms: 1.201
timestamp: 1638395676
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 40000
training_iteration: 10
trial_id: 0091e_00000
Option 1: Training finished successfully
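The result blocks printed above are verbose. As a minimal sketch (not part of RLlib or Tune), the mean episode reward per iteration can be pulled out of such printed output with a regular expression. The function name `extract_metric` is hypothetical, and `sample_log` is a shortened excerpt of the first two result blocks above.

```python
import re
from typing import List


def extract_metric(log_text: str, metric: str = "episode_reward_mean") -> List[float]:
    """Return all numeric values of `metric` found in the printed result blocks, in order."""
    pattern = re.compile(
        rf"^\s*{re.escape(metric)}:\s*([-+0-9.eE]+)\s*$", re.MULTILINE
    )
    return [float(value) for value in pattern.findall(log_text)]


# shortened excerpt of the first two result blocks printed above
sample_log = """
Result for PPO_CartPole-v1_0091e_00000:
episode_reward_max: 69.0
episode_reward_mean: 20.331632653061224
training_iteration: 1
Result for PPO_CartPole-v1_0091e_00000:
episode_reward_max: 128.0
episode_reward_mean: 43.5
training_iteration: 2
"""

print(extract_metric(sample_log))  # [20.331632653061224, 43.5]
```

Applied to the full output, this yields one value per training iteration, which makes it easier to see that the mean reward increases from about 20 in the first iteration to about 293 in the tenth.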
Current time: 2021-12-01 22:53:22 (running for 00:00:43.31)
Memory usage on this node: 10.1/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\git-repos\private\blog\_notebooks\results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_CartPole-v1_0091e_00000 | RUNNING | 127.0.0.1:16556 | 1 | 11.6133 | 4000 | 20.3316 | 69 | 8 | 20.3316 |
Current time: 2021-12-01 22:53:32 (running for 00:00:53.64)
Memory usage on this node: 10.1/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\git-repos\private\blog\_notebooks\results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_CartPole-v1_0091e_00000 | RUNNING | 127.0.0.1:16556 | 2 | 21.9354 | 8000 | 43.5 | 128 | 9 | 43.5 |
Current time: 2021-12-01 22:53:38 (running for 00:00:59.59)
Memory usage on this node: 10.1/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\git-repos\private\blog\_notebooks\results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_CartPole-v1_0091e_00000 | RUNNING | 127.0.0.1:16556 | 3 | 31.9844 | 12000 | 70.15 | 292 | 11 | 70.15 |
Current time: 2021-12-01 22:53:49 (running for 00:01:10.64)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\git-repos\private\blog\_notebooks\results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_CartPole-v1_0091e_00000 | RUNNING | 127.0.0.1:16556 | 4 | 40.9602 | 16000 | 97.99 | 371 | 11 | 97.99 |
Current time: 2021-12-01 22:54:00 (running for 00:01:21.29)
Memory usage on this node: 10.4/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\git-repos\private\blog\_notebooks\results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_CartPole-v1_0091e_00000 | RUNNING | 127.0.0.1:16556 | 5 | 50.5601 | 20000 | 132.51 | 500 | 12 | 132.51 |
Current time: 2021-12-01 22:54:06 (running for 00:01:26.77)
Memory usage on this node: 10.4/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\git-repos\private\blog\_notebooks\results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_CartPole-v1_0091e_00000 | RUNNING | 127.0.0.1:16556 | 6 | 58.026 | 24000 | 162.46 | 500 | 13 | 162.46 |
Current time: 2021-12-01 22:54:16 (running for 00:01:37.04)
Memory usage on this node: 10.4/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\git-repos\private\blog\_notebooks\results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_CartPole-v1_0091e_00000 | RUNNING | 127.0.0.1:16556 | 7 | 66.2147 | 28000 | 196.68 | 500 | 15 | 196.68 |
Current time: 2021-12-01 22:54:23 (running for 00:01:43.79)
Memory usage on this node: 10.4/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\git-repos\private\blog\_notebooks\results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_CartPole-v1_0091e_00000 | RUNNING | 127.0.0.1:16556 | 8 | 74.9813 | 32000 | 229.19 | 500 | 15 | 229.19 |
Current time: 2021-12-01 22:54:34 (running for 00:01:55.21)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\git-repos\private\blog\_notebooks\results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_CartPole-v1_0091e_00000 | RUNNING | 127.0.0.1:16556 | 9 | 82.311 | 36000 | 260.25 | 500 | 15 | 260.25 |
Current time: 2021-12-01 22:54:36 (running for 00:01:57.59)
Memory usage on this node: 10.4/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\git-repos\private\blog\_notebooks\results\PPO
Number of trials: 1/1 (1 TERMINATED)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_CartPole-v1_0091e_00000 | TERMINATED | 127.0.0.1:16556 | 10 | 90.6621 | 40000 | 292.74 | 500 | 15 | 292.74 |
# check and print results
def print_reward(results):
    results.default_metric = "episode_reward_mean"
    results.default_mode = "max"
    # print mean number of time steps the pole was balanced (higher = better)
    reward = results.best_result["episode_reward_mean"]
    print(f"Reward after {train_iters} training iterations: {reward}")
print_reward(results1)
Reward after 10 training iterations: 292.74
# plot the last 100 episode rewards
import seaborn as sns
def plot_rewards(results):
    """Plot scatter plot of the last 100 training episodes"""
    eps_rewards = results.best_result["hist_stats"]["episode_reward"]
    eps = list(range(len(eps_rewards)))
    # pass x and y as keyword arguments; positional use is deprecated as of seaborn 0.12
    ax = sns.scatterplot(x=eps, y=eps_rewards)
    ax.set_title("Reward over the last 100 Episodes")
    ax.set_xlabel("Episodes")
    ax.set_ylabel("Episode Reward")
plot_rewards(results1)
import os
import pandas as pd
# plot complete learning curve based on logged progress
def plot_learning(results, label=None):
    """Plot lineplot of the mean episode reward over all training iterations"""
    progress_path = os.path.join(results.best_logdir, "progress.csv")
    df = pd.read_csv(progress_path)
    ax = sns.lineplot(x=df["training_iteration"], y=df["episode_reward_mean"], label=label)
    ax.set_title("Mean Episode Reward over Training Iterations")
plot_learning(results1, label="1: Full Observations")
Including the missing state helps the agent learn a good policy quickly, leading to high reward.
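To put a rough number on "quickly", the mean rewards reported in the status tables above can be compared directly. The following is only a small sketch: the values are copied by hand from the logged iterations 4 to 10 of this one run, and the variable names are made up for illustration.

```python
# Mean episode reward per training iteration, copied from the status tables above
# (iterations 4 to 10 of the PPO run on CartPole-v1 with full observations).
reward_by_iter = {4: 97.99, 5: 132.51, 6: 162.46, 7: 196.68, 8: 229.19, 9: 260.25, 10: 292.74}

iters = sorted(reward_by_iter)
# Reward gain between each pair of consecutive iterations
gains = [reward_by_iter[b] - reward_by_iter[a] for a, b in zip(iters, iters[1:])]
mean_gain = sum(gains) / len(gains)

# 500 is the highest episode_reward_max in the logs and the step limit of CartPole-v1
max_reward = 500
share_of_max = reward_by_iter[10] / max_reward

print(f"Mean gain per iteration: {mean_gain:.2f}")
print(f"Share of maximum reward after 10 iterations: {share_of_max:.0%}")
```

In this run, the mean reward rises by roughly 32 points per iteration and reaches a bit under 60% of the maximum after 10 iterations. Other runs will differ, since training is stochastic.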
Option 2: Ignore Missing State
In many practical scenarios, the missing state cannot simply be added to complete the partial observations, e.g., because measuring or capturing it incurs prohibitive costs or is not physically feasible.
In this case, the simplest alternative is to use the partial observations as they are. This works if the observations still contain enough information to learn a useful policy.
However, if too much important information is missing, learning a useful policy becomes slow or even impossible. In the CartPole example, partial observations that do not include the velocity of the cart and the pole keep the agent from learning a useful policy.
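To see why dropping the velocities hurts, consider two situations that look identical in a snapshot but call for opposite actions. The sketch below is purely illustrative: `drop_velocities` is a hypothetical helper (not RLlib's `StatelessCartPole` implementation), the index order follows the four CartPole observations listed earlier, and the two example states are made-up values.

```python
from typing import Sequence, Tuple

# Index of each value in the full CartPole observation
# (cart position, cart velocity, pole angle, pole angular velocity).
CART_POSITION, CART_VELOCITY, POLE_ANGLE, POLE_ANGULAR_VELOCITY = range(4)


def drop_velocities(obs: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: keep only cart position and pole angle."""
    return (obs[CART_POSITION], obs[POLE_ANGLE])


# Two made-up full states with the same positions but opposite motion:
# in the first the pole is falling further to the right,
# in the second it is already swinging back to the left.
falling_right = (0.0, 0.5, 0.05, 1.2)
swinging_left = (0.0, -0.5, 0.05, -1.2)

partial_a = drop_velocities(falling_right)
partial_b = drop_velocities(swinging_left)

# With velocities removed, the agent cannot tell the two states apart.
print(partial_a == partial_b)
```

Since both states map to the same partial observation, a policy that only sees the last observation has to choose the same action in both, even though they require different reactions.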
#collapse-output
from ray.rllib.examples.env.stateless_cartpole import StatelessCartPole
from ray.tune import registry
registry.register_env("StatelessCartPole", lambda _: StatelessCartPole())
config2 = ppo.DEFAULT_CONFIG.copy()
config2["env"] = "StatelessCartPole"
# train; this takes a while
results2 = ray.tune.run("PPO", config=config2, stop=stop)
print("Option 2: Training finished successfully")
Current time: 2021-12-01 22:57:23 (running for 00:00:00.16)
Memory usage on this node: 9.7/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 PENDING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_aa22d_00000 | PENDING |
(pid=None) c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\redis\connection.py:77: UserWarning: redis-py works best with hiredis. Please consider installing
(pid=None) warnings.warn(msg)
(pid=9044) 2021-12-01 22:57:37,705 INFO trainer.py:753 -- Tip: set framework=tfe or the --eager flag to enable TensorFlow eager execution
(pid=9044) 2021-12-01 22:57:37,705 INFO ppo.py:166 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting simple_optimizer=True if this doesn't work for you.
(pid=9044) 2021-12-01 22:57:37,705 INFO trainer.py:770 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=18476) 2021-12-01 22:57:53,455 WARNING deprecation.py:38 -- DeprecationWarning: `SampleBatch['is_training']` has been deprecated. Use `SampleBatch.is_training` instead. This will raise an error in the future!
(pid=9044) 2021-12-01 22:57:54,972 WARNING deprecation.py:38 -- DeprecationWarning: `SampleBatch['is_training']` has been deprecated. Use `SampleBatch.is_training` instead. This will raise an error in the future!
(pid=9044) 2021-12-01 22:57:56,141 WARNING trainer_template.py:185 -- `execution_plan` functions should accept `trainer`, `workers`, and `config` as args!
(pid=9044) 2021-12-01 22:57:56,141 INFO trainable.py:110 -- Trainable.setup took 18.440 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
(pid=9044) 2021-12-01 22:57:56,141 WARNING util.py:57 -- Install gputil for GPU system monitoring.
(pid=9044) 2021-12-01 22:58:00,922 WARNING deprecation.py:38 -- DeprecationWarning: `slice` has been deprecated. Use `SampleBatch[start:stop]` instead. This will raise an error in the future!
2021-12-01 22:59:20,276 INFO tune.py:630 -- Total run time: 116.62 seconds (116.23 seconds for the tuning loop).
(pid=9044) [2021-12-01 22:59:20,145 E 9044 18960] raylet_client.cc:159: IOError: Unknown error [RayletClient] Failed to disconnect from raylet.
(pid=9044) Windows fatal exception: access violation
(pid=9044)
(pid=18476) [2021-12-01 22:59:20,149 E 18476 12556] raylet_client.cc:159: IOError: Unknown error [RayletClient] Failed to disconnect from raylet.
(pid=18476) Windows fatal exception: access violation
(pid=18476)
(pid=16556) [2021-12-01 22:59:20,148 E 16556 1448] raylet_client.cc:159: IOError: Unknown error [RayletClient] Failed to disconnect from raylet.
(pid=16556) Windows fatal exception: access violation
(pid=16556)
Current time: 2021-12-01 22:57:56 (running for 00:00:32.50)
Memory usage on this node: 10.4/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_aa22d_00000 | RUNNING | 127.0.0.1:9044 |
Result for PPO_StatelessCartPole_aa22d_00000:
agent_timesteps_total: 4000
custom_metrics: {}
date: 2021-12-01_22-58-06
done: false
episode_len_mean: 22.44632768361582
episode_media: {}
episode_reward_max: 85.0
episode_reward_mean: 22.44632768361582
episode_reward_min: 8.0
episodes_this_iter: 177
episodes_total: 177
experiment_id: 99df0008334f43779394474d46d27ce1
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.6807681322097778
entropy_coeff: 0.0
kl: 0.012478094547986984
model: {}
policy_loss: -0.02269022725522518
total_loss: 180.2766876220703
vf_explained_var: 0.0005618375726044178
vf_loss: 180.296875
num_agent_steps_sampled: 4000
num_agent_steps_trained: 4000
num_steps_sampled: 4000
num_steps_trained: 4000
iterations_since_restore: 1
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 83.74285714285713
ram_util_percent: 88.00714285714285
pid: 9044
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.10327919820561605
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.12135616748037466
mean_inference_ms: 1.9038462404123306
mean_raw_obs_processing_ms: 0.16696460223270435
time_since_restore: 10.051005840301514
time_this_iter_s: 10.051005840301514
time_total_s: 10.051005840301514
timers:
learn_throughput: 757.885
learn_time_ms: 5277.849
load_throughput: 0.0
load_time_ms: 0.0
sample_throughput: 837.645
sample_time_ms: 4775.29
update_time_ms: 5.515
timestamp: 1638395886
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 4000
training_iteration: 1
trial_id: aa22d_00000
Result for PPO_StatelessCartPole_aa22d_00000:
agent_timesteps_total: 8000
custom_metrics: {}
date: 2021-12-01_22-58-13
done: false
episode_len_mean: 30.083333333333332
episode_media: {}
episode_reward_max: 106.0
episode_reward_mean: 30.083333333333332
episode_reward_min: 8.0
episodes_this_iter: 132
episodes_total: 309
experiment_id: 99df0008334f43779394474d46d27ce1
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.648204505443573
entropy_coeff: 0.0
kl: 0.00953536108136177
model: {}
policy_loss: -0.010645464062690735
total_loss: 191.36209106445312
vf_explained_var: 0.02945260889828205
vf_loss: 191.37083435058594
num_agent_steps_sampled: 8000
num_agent_steps_trained: 8000
num_steps_sampled: 8000
num_steps_trained: 8000
num_steps_trained_this_iter: 0
iterations_since_restore: 2
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 73.69090909090909
ram_util_percent: 88.29090909090907
pid: 9044
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.09029685607221599
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.1000974098193085
mean_inference_ms: 1.6932173327936377
mean_raw_obs_processing_ms: 0.18978260286703064
time_since_restore: 17.443942308425903
time_this_iter_s: 7.39293646812439
time_total_s: 17.443942308425903
timers:
learn_throughput: 899.218
learn_time_ms: 4448.31
load_throughput: 0.0
load_time_ms: 0.0
sample_throughput: 577.076
sample_time_ms: 6931.495
update_time_ms: 5.261
timestamp: 1638395893
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 8000
training_iteration: 2
trial_id: aa22d_00000
Result for PPO_StatelessCartPole_aa22d_00000:
agent_timesteps_total: 12000
custom_metrics: {}
date: 2021-12-01_22-58-24
done: false
episode_len_mean: 37.31481481481482
episode_media: {}
episode_reward_max: 143.0
episode_reward_mean: 37.31481481481482
episode_reward_min: 9.0
episodes_this_iter: 108
episodes_total: 417
experiment_id: 99df0008334f43779394474d46d27ce1
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.6112045049667358
entropy_coeff: 0.0
kl: 0.006910913623869419
model: {}
policy_loss: -0.015092005021870136
total_loss: 245.5015411376953
vf_explained_var: 0.021608643233776093
vf_loss: 245.5152587890625
num_agent_steps_sampled: 12000
num_agent_steps_trained: 12000
num_steps_sampled: 12000
num_steps_trained: 12000
num_steps_trained_this_iter: 0
iterations_since_restore: 3
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 92.86
ram_util_percent: 88.22666666666667
pid: 9044
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.10518439466006081
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.11606732866669264
mean_inference_ms: 1.8527932982897477
mean_raw_obs_processing_ms: 0.19247182033120946
time_since_restore: 28.011292934417725
time_this_iter_s: 10.567350625991821
time_total_s: 28.011292934417725
timers:
learn_throughput: 856.291
learn_time_ms: 4671.309
load_throughput: 11972323.501
load_time_ms: 0.334
sample_throughput: 522.229
sample_time_ms: 7659.481
update_time_ms: 4.504
timestamp: 1638395904
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 12000
training_iteration: 3
trial_id: aa22d_00000
Result for PPO_StatelessCartPole_aa22d_00000:
agent_timesteps_total: 16000
custom_metrics: {}
date: 2021-12-01_22-58-32
done: false
episode_len_mean: 42.79
episode_media: {}
episode_reward_max: 152.0
episode_reward_mean: 42.79
episode_reward_min: 10.0
episodes_this_iter: 94
episodes_total: 511
experiment_id: 99df0008334f43779394474d46d27ce1
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.5654925107955933
entropy_coeff: 0.0
kl: 0.004174998961389065
model: {}
policy_loss: -0.012154466472566128
total_loss: 252.0902862548828
vf_explained_var: 0.03405797854065895
vf_loss: 252.1016082763672
num_agent_steps_sampled: 16000
num_agent_steps_trained: 16000
num_steps_sampled: 16000
num_steps_trained: 16000
num_steps_trained_this_iter: 0
iterations_since_restore: 4
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 79.5
ram_util_percent: 86.57272727272728
pid: 9044
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.10566344965230401
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.12007891816714177
mean_inference_ms: 1.8777825593616126
mean_raw_obs_processing_ms: 0.19165146846813094
time_since_restore: 36.43282437324524
time_this_iter_s: 8.421531438827515
time_total_s: 36.43282437324524
timers:
learn_throughput: 915.336
learn_time_ms: 4369.982
load_throughput: 15963098.002
load_time_ms: 0.251
sample_throughput: 483.585
sample_time_ms: 8271.563
update_time_ms: 3.884
timestamp: 1638395912
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 16000
training_iteration: 4
trial_id: aa22d_00000
Result for PPO_StatelessCartPole_aa22d_00000:
agent_timesteps_total: 20000
custom_metrics: {}
date: 2021-12-01_22-58-39
done: false
episode_len_mean: 43.32
episode_media: {}
episode_reward_max: 133.0
episode_reward_mean: 43.32
episode_reward_min: 11.0
episodes_this_iter: 90
episodes_total: 601
experiment_id: 99df0008334f43779394474d46d27ce1
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.10000000149011612
cur_lr: 4.999999873689376e-05
entropy: 0.5446197986602783
entropy_coeff: 0.0
kl: 0.004509765654802322
model: {}
policy_loss: -0.005693237762898207
total_loss: 206.11839294433594
vf_explained_var: 0.1073232963681221
vf_loss: 206.1236572265625
num_agent_steps_sampled: 20000
num_agent_steps_trained: 20000
num_steps_sampled: 20000
num_steps_trained: 20000
num_steps_trained_this_iter: 0
iterations_since_restore: 5
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 69.06000000000002
ram_util_percent: 86.75
pid: 9044
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.10193395594952362
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.11727529060495807
mean_inference_ms: 1.7989170802559789
mean_raw_obs_processing_ms: 0.18388387052508867
time_since_restore: 43.44782853126526
time_this_iter_s: 7.0150041580200195
time_total_s: 43.44782853126526
timers:
learn_throughput: 957.628
learn_time_ms: 4176.985
load_throughput: 19953872.502
load_time_ms: 0.2
sample_throughput: 497.425
sample_time_ms: 8041.409
update_time_ms: 3.708
timestamp: 1638395919
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 20000
training_iteration: 5
trial_id: aa22d_00000
Result for PPO_StatelessCartPole_aa22d_00000:
agent_timesteps_total: 24000
custom_metrics: {}
date: 2021-12-01_22-58-48
done: false
episode_len_mean: 47.98
episode_media: {}
episode_reward_max: 159.0
episode_reward_mean: 47.98
episode_reward_min: 11.0
episodes_this_iter: 81
episodes_total: 682
experiment_id: 99df0008334f43779394474d46d27ce1
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.05000000074505806
cur_lr: 4.999999873689376e-05
entropy: 0.5157366991043091
entropy_coeff: 0.0
kl: 0.002469780156388879
model: {}
policy_loss: -0.00011549380724318326
total_loss: 223.69801330566406
vf_explained_var: 0.14510144293308258
vf_loss: 223.697998046875
num_agent_steps_sampled: 24000
num_agent_steps_trained: 24000
num_steps_sampled: 24000
num_steps_trained: 24000
num_steps_trained_this_iter: 0
iterations_since_restore: 6
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 77.66666666666667
ram_util_percent: 86.87499999999999
pid: 9044
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.10315175733432268
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.11747585144831862
mean_inference_ms: 1.796743386961408
mean_raw_obs_processing_ms: 0.18361450972900276
time_since_restore: 52.08714461326599
time_this_iter_s: 8.639316082000732
time_total_s: 52.08714461326599
timers:
learn_throughput: 963.146
learn_time_ms: 4153.057
load_throughput: 23944647.003
load_time_ms: 0.167
sample_throughput: 497.199
sample_time_ms: 8045.069
update_time_ms: 5.866
timestamp: 1638395928
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 24000
training_iteration: 6
trial_id: aa22d_00000
Result for PPO_StatelessCartPole_aa22d_00000:
agent_timesteps_total: 28000
custom_metrics: {}
date: 2021-12-01_22-58-56
done: false
episode_len_mean: 50.24
episode_media: {}
episode_reward_max: 159.0
episode_reward_mean: 50.24
episode_reward_min: 11.0
episodes_this_iter: 80
episodes_total: 762
experiment_id: 99df0008334f43779394474d46d27ce1
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.02500000037252903
cur_lr: 4.999999873689376e-05
entropy: 0.4738757014274597
entropy_coeff: 0.0
kl: 0.005073909182101488
model: {}
policy_loss: 0.00443687941879034
total_loss: 240.7548065185547
vf_explained_var: 0.14891892671585083
vf_loss: 240.75022888183594
num_agent_steps_sampled: 28000
num_agent_steps_trained: 28000
num_steps_sampled: 28000
num_steps_trained: 28000
num_steps_trained_this_iter: 0
iterations_since_restore: 7
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 72.13636363636364
ram_util_percent: 86.82727272727271
pid: 9044
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.10310251236597129
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.11948908895760547
mean_inference_ms: 1.781021410338414
mean_raw_obs_processing_ms: 0.18258329446260468
time_since_restore: 59.83448100090027
time_this_iter_s: 7.747336387634277
time_total_s: 59.83448100090027
timers:
learn_throughput: 983.057
learn_time_ms: 4068.942
load_throughput: 27935421.503
load_time_ms: 0.143
sample_throughput: 495.114
sample_time_ms: 8078.951
update_time_ms: 5.028
timestamp: 1638395936
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 28000
training_iteration: 7
trial_id: aa22d_00000
Result for PPO_StatelessCartPole_aa22d_00000:
agent_timesteps_total: 32000
custom_metrics: {}
date: 2021-12-01_22-59-04
done: false
episode_len_mean: 50.37
episode_media: {}
episode_reward_max: 155.0
episode_reward_mean: 50.37
episode_reward_min: 9.0
episodes_this_iter: 81
episodes_total: 843
experiment_id: 99df0008334f43779394474d46d27ce1
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.02500000037252903
cur_lr: 4.999999873689376e-05
entropy: 0.44857272505760193
entropy_coeff: 0.0
kl: 0.005331501364707947
model: {}
policy_loss: -0.00537552684545517
total_loss: 236.2506103515625
vf_explained_var: 0.16449585556983948
vf_loss: 236.25584411621094
num_agent_steps_sampled: 32000
num_agent_steps_trained: 32000
num_steps_sampled: 32000
num_steps_trained: 32000
num_steps_trained_this_iter: 0
iterations_since_restore: 8
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 73.15454545454546
ram_util_percent: 86.82727272727271
pid: 9044
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.10669802327244614
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.11682882882097279
mean_inference_ms: 1.7686794144275058
mean_raw_obs_processing_ms: 0.1813271875273223
time_since_restore: 67.72298955917358
time_this_iter_s: 7.888508558273315
time_total_s: 67.72298955917358
timers:
learn_throughput: 996.781
learn_time_ms: 4012.918
load_throughput: 31926196.004
load_time_ms: 0.125
sample_throughput: 496.729
sample_time_ms: 8052.687
update_time_ms: 4.527
timestamp: 1638395944
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 32000
training_iteration: 8
trial_id: aa22d_00000
Result for PPO_StatelessCartPole_aa22d_00000:
agent_timesteps_total: 36000
custom_metrics: {}
date: 2021-12-01_22-59-12
done: false
episode_len_mean: 49.87
episode_media: {}
episode_reward_max: 110.0
episode_reward_mean: 49.87
episode_reward_min: 11.0
episodes_this_iter: 81
episodes_total: 924
experiment_id: 99df0008334f43779394474d46d27ce1
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.02500000037252903
cur_lr: 4.999999873689376e-05
entropy: 0.42752259969711304
entropy_coeff: 0.0
kl: 0.005028429441154003
model: {}
policy_loss: -0.0017633900279179215
total_loss: 193.06703186035156
vf_explained_var: 0.2048284411430359
vf_loss: 193.06866455078125
num_agent_steps_sampled: 36000
num_agent_steps_trained: 36000
num_steps_sampled: 36000
num_steps_trained: 36000
num_steps_trained_this_iter: 0
iterations_since_restore: 9
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 75.8
ram_util_percent: 86.93333333333334
pid: 9044
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.10545672991997837
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.11722357639711418
mean_inference_ms: 1.7522968685832132
mean_raw_obs_processing_ms: 0.17896026200224338
time_since_restore: 75.84080076217651
time_this_iter_s: 8.11781120300293
time_total_s: 75.84080076217651
timers:
learn_throughput: 994.383
learn_time_ms: 4022.594
load_throughput: 35916970.504
load_time_ms: 0.111
sample_throughput: 499.317
sample_time_ms: 8010.943
update_time_ms: 4.024
timestamp: 1638395952
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 36000
training_iteration: 9
trial_id: aa22d_00000
Result for PPO_StatelessCartPole_aa22d_00000:
agent_timesteps_total: 40000
custom_metrics: {}
date: 2021-12-01_22-59-19
done: true
episode_len_mean: 46.75
episode_media: {}
episode_reward_max: 125.0
episode_reward_mean: 46.75
episode_reward_min: 13.0
episodes_this_iter: 84
episodes_total: 1008
experiment_id: 99df0008334f43779394474d46d27ce1
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.02500000037252903
cur_lr: 4.999999873689376e-05
entropy: 0.45696982741355896
entropy_coeff: 0.0
kl: 0.0024534217081964016
model: {}
policy_loss: 0.0026924554258584976
total_loss: 184.4345245361328
vf_explained_var: 0.2629404664039612
vf_loss: 184.43174743652344
num_agent_steps_sampled: 40000
num_agent_steps_trained: 40000
num_steps_sampled: 40000
num_steps_trained: 40000
num_steps_trained_this_iter: 0
iterations_since_restore: 10
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 70.31
ram_util_percent: 86.9
pid: 9044
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.10223809426816181
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.11759765514824477
mean_inference_ms: 1.7250900206764819
mean_raw_obs_processing_ms: 0.17782050917712555
time_since_restore: 83.26664853096008
time_this_iter_s: 7.425847768783569
time_total_s: 83.26664853096008
timers:
learn_throughput: 1002.298
learn_time_ms: 3990.829
load_throughput: 7987248.75
load_time_ms: 0.501
sample_throughput: 500.117
sample_time_ms: 7998.131
update_time_ms: 3.621
timestamp: 1638395959
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 40000
training_iteration: 10
trial_id: aa22d_00000
Option 2: Training finished successfully
Current time: 2021-12-01 22:58:08 (running for 00:00:44.63)
Memory usage on this node: 10.5/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_aa22d_00000 | RUNNING | 127.0.0.1:9044 | 1 | 10.051 | 4000 | 22.4463 | 85 | 8 | 22.4463 |
Current time: 2021-12-01 22:58:18 (running for 00:00:55.09)
Memory usage on this node: 10.5/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_aa22d_00000 | RUNNING | 127.0.0.1:9044 | 2 | 17.4439 | 8000 | 30.0833 | 106 | 8 | 30.0833 |
Current time: 2021-12-01 22:58:29 (running for 00:01:05.71)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_aa22d_00000 | RUNNING | 127.0.0.1:9044 | 3 | 28.0113 | 12000 | 37.3148 | 143 | 9 | 37.3148 |
Current time: 2021-12-01 22:58:34 (running for 00:01:11.09)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_aa22d_00000 | RUNNING | 127.0.0.1:9044 | 4 | 36.4328 | 16000 | 42.79 | 152 | 10 | 42.79 |
Current time: 2021-12-01 22:58:39 (running for 00:01:16.14)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_aa22d_00000 | RUNNING | 127.0.0.1:9044 | 5 | 43.4478 | 20000 | 43.32 | 133 | 11 | 43.32 |
Current time: 2021-12-01 22:58:44 (running for 00:01:21.20)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_aa22d_00000 | RUNNING | 127.0.0.1:9044 | 5 | 43.4478 | 20000 | 43.32 | 133 | 11 | 43.32 |
Current time: 2021-12-01 22:58:50 (running for 00:01:26.88)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_aa22d_00000 | RUNNING | 127.0.0.1:9044 | 6 | 52.0871 | 24000 | 47.98 | 159 | 11 | 47.98 |
Current time: 2021-12-01 22:58:55 (running for 00:01:31.95)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_aa22d_00000 | RUNNING | 127.0.0.1:9044 | 6 | 52.0871 | 24000 | 47.98 | 159 | 11 | 47.98 |
Current time: 2021-12-01 22:59:01 (running for 00:01:37.71)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_aa22d_00000 | RUNNING | 127.0.0.1:9044 | 7 | 59.8345 | 28000 | 50.24 | 159 | 11 | 50.24 |
Current time: 2021-12-01 22:59:07 (running for 00:01:43.60)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_aa22d_00000 | RUNNING | 127.0.0.1:9044 | 8 | 67.723 | 32000 | 50.37 | 155 | 9 | 50.37 |
Current time: 2021-12-01 22:59:12 (running for 00:01:48.70)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_aa22d_00000 | RUNNING | 127.0.0.1:9044 | 8 | 67.723 | 32000 | 50.37 | 155 | 9 | 50.37 |
Current time: 2021-12-01 22:59:17 (running for 00:01:53.79)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_aa22d_00000 | RUNNING | 127.0.0.1:9044 | 9 | 75.8408 | 36000 | 49.87 | 110 | 11 | 49.87 |
Current time: 2021-12-01 22:59:19 (running for 00:01:56.27)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 TERMINATED)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_aa22d_00000 | TERMINATED | 127.0.0.1:9044 | 10 | 83.2666 | 40000 | 46.75 | 125 | 13 | 46.75 |
print_reward(results2)
Reward after 10 training iterations: 46.75
plot_rewards(results2)
c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\seaborn\_decorators.py:36: FutureWarning: Pass the following variables as keyword args: x, y. From version 0.12, the only valid positional argument will be `data`, and passing other arguments without an explicit keyword will result in an error or misinterpretation.
warnings.warn(
# compare learning curves
plot_learning(results1, label="1: Full Observations")
plot_learning(results2, label="2: Partial Observations")
With only partial observations, i.e., without observing velocity, the RL agent does not learn a useful policy. The reward does not increase notably over time, and the resulting mean episode reward (46.75 after 10 training iterations) is much smaller than with full observations.
Option 3: Use Sequence of Last Observations
Even if the velocities of cart and pole are not explicitly available in this example, the RL agent can derive them by looking at a sequence of previous observations. If the cart stays at the same position, its velocity is likely close to zero. If its position changes greatly between consecutive observations, it likely has a high velocity.
Hence, one useful approach is to simply stack the last \(n\) observations and provide this sequence as input to the RL agent.
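To make this idea concrete, here is a minimal, dependency-free sketch of stacking the last \(n\) observations and estimating velocities from them via finite differences. The names `ObservationStack` and `estimate_velocities` are hypothetical helpers for illustration only and are not part of Gym or RLlib. The time step `dt` is an assumption that you would set to match your environment. Filling the buffer with copies of the first observation on reset is just one simple choice.

```python
from collections import deque


class ObservationStack:
    """Hypothetical helper: keeps the last n observations, oldest first."""

    def __init__(self, n):
        self.n = n
        self.frames = deque(maxlen=n)

    def reset(self, first_obs):
        # fill the whole buffer with copies of the first observation
        self.frames.clear()
        for _ in range(self.n):
            self.frames.append(tuple(first_obs))
        return list(self.frames)

    def step(self, obs):
        # appending to a full deque drops the oldest observation
        self.frames.append(tuple(obs))
        return list(self.frames)


def estimate_velocities(stacked, dt):
    """Finite differences between consecutive observations in the stack."""
    return [
        tuple((b - a) / dt for a, b in zip(prev, curr))
        for prev, curr in zip(stacked, stacked[1:])
    ]


# each observation: (cart position, pole angle); dt is an assumed time step
stack = ObservationStack(3)
stack.reset((0.0, 0.0))
stack.step((0.5, 0.25))
stacked = stack.step((1.5, 0.25))
velocities = estimate_velocities(stacked, dt=0.5)
print(velocities)
```

In the actual experiments, the RL agent is not given such hand-crafted velocity estimates. It only receives the stacked observations and has to learn by itself to extract this kind of information.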
Option 3a: Use Raw Sequence as Input
Here, I consider the same default feed-forward neural network with PPO, just providing the stacked, partial observations as input.
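Conceptually, a feed-forward network has no notion of time, so a stacked observation of shape \(n \times 2\) ends up as one flat input vector of length \(2n\). The following sketch only illustrates this flattening. The function `flatten_stack` and the example values are made up for illustration, and how RLlib handles this internally may differ in its details.

```python
def flatten_stack(stacked):
    """Concatenate n observations (oldest first) into a single flat list."""
    return [value for obs in stacked for value in obs]


# made-up example with n=4 partial observations: (cart position, pole angle)
stacked = [(0.0, 0.01), (0.02, 0.0), (0.05, -0.02), (0.09, -0.05)]
flat = flatten_stack(stacked)
print(len(flat))  # 8 input values for the feed-forward network
```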
Stacking Observations Using Gym’s FrameStack Wrapper
To stack the last \(n\) observations, I use Gym’s FrameStack wrapper. As an example, I choose \(n=4\).
from gym.wrappers import FrameStack
NUM_FRAMES = 4
# stateless CartPole --> only 2 observations: position of cart & angle of pole (not: velocity of cart or pole)
env = StatelessCartPole()
print(f"Shape of observation space (stateless CartPole): {env.observation_space.shape}")
# stack last n observations into sequence --> n x 2
env_stacked = FrameStack(env, NUM_FRAMES)
print(f"Shape of observation space (stacked stateless CartPole): {env_stacked.observation_space.shape}")
# register env for RLlib
registry.register_env("StackedStatelessCartPole", lambda _: FrameStack(StatelessCartPole(), NUM_FRAMES))
Shape of observation space (stateless CartPole): (2,)
Shape of observation space (stacked stateless CartPole): (4, 2)
#collapse-output
# use PPO with vanilla MLP
config3a = ppo.DEFAULT_CONFIG.copy()
config3a["env"] = "StackedStatelessCartPole"
# train; this takes a while
results3a = ray.tune.run("PPO", config=config3a, stop=stop)
print("Option 3a with FrameStack: Training finished successfully")
Current time: 2021-12-01 23:02:44 (running for 00:00:00.15)
Memory usage on this node: 9.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 PENDING)
Trial name | status | loc |
---|---|---|
PPO_StackedStatelessCartPole_69565_00000 | PENDING |
(pid=None) c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\redis\connection.py:77: UserWarning: redis-py works best with hiredis. Please consider installing
(pid=None) warnings.warn(msg)
(pid=13456) 2021-12-01 23:03:00,839 INFO trainer.py:753 -- Tip: set framework=tfe or the --eager flag to enable TensorFlow eager execution
(pid=13456) 2021-12-01 23:03:00,839 INFO ppo.py:166 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting simple_optimizer=True if this doesn't work for you.
(pid=13456) 2021-12-01 23:03:00,839 INFO trainer.py:770 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=None) c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\redis\connection.py:77: UserWarning: redis-py works best with hiredis. Please consider installing
(pid=None) warnings.warn(msg)
(pid=None) c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\redis\connection.py:77: UserWarning: redis-py works best with hiredis. Please consider installing
(pid=None) warnings.warn(msg)
(pid=19996) c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\gym\spaces\box.py:142: UserWarning: WARN: Casting input x to numpy array.
(pid=19996) logger.warn("Casting input x to numpy array.")
(pid=3484) c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\gym\spaces\box.py:142: UserWarning: WARN: Casting input x to numpy array.
(pid=3484) logger.warn("Casting input x to numpy array.")
(pid=3484) 2021-12-01 23:03:17,245 WARNING deprecation.py:38 -- DeprecationWarning: `SampleBatch['is_training']` has been deprecated. Use `SampleBatch.is_training` instead. This will raise an error in the future!
(pid=13456) 2021-12-01 23:03:19,489 WARNING deprecation.py:38 -- DeprecationWarning: `SampleBatch['is_training']` has been deprecated. Use `SampleBatch.is_training` instead. This will raise an error in the future!
(pid=13456) 2021-12-01 23:03:20,834 WARNING trainer_template.py:185 -- `execution_plan` functions should accept `trainer`, `workers`, and `config` as args!
(pid=13456) 2021-12-01 23:03:20,834 INFO trainable.py:110 -- Trainable.setup took 19.995 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
(pid=13456) 2021-12-01 23:03:20,839 WARNING util.py:57 -- Install gputil for GPU system monitoring.
(pid=13456) 2021-12-01 23:03:26,389 WARNING deprecation.py:38 -- DeprecationWarning: `slice` has been deprecated. Use `SampleBatch[start:stop]` instead. This will raise an error in the future!
2021-12-01 23:04:57,071 INFO tune.py:630 -- Total run time: 132.65 seconds (132.43 seconds for the tuning loop).
(pid=13456) Windows fatal exception: access violation
(pid=13456)
(pid=19996) [2021-12-01 23:04:56,970 E 19996 3460] raylet_client.cc:159: IOError: Unknown error [RayletClient] Failed to disconnect from raylet.
(pid=19996) Windows fatal exception: access violation
(pid=19996)
(pid=3484) [2021-12-01 23:04:56,971 E 3484 17628] raylet_client.cc:159: IOError: Unknown error [RayletClient] Failed to disconnect from raylet.
(pid=3484) Windows fatal exception: access violation
(pid=3484)
Current time: 2021-12-01 23:02:49 (running for 00:00:05.15)
Memory usage on this node: 9.4/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 PENDING)
Trial name | status | loc |
---|---|---|
PPO_StackedStatelessCartPole_69565_00000 | PENDING |
Current time: 2021-12-01 23:03:20 (running for 00:00:36.40)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StackedStatelessCartPole_69565_00000 | RUNNING | 127.0.0.1:13456 |
Current time: 2021-12-01 23:03:21 (running for 00:00:37.50)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StackedStatelessCartPole_69565_00000 | RUNNING | 127.0.0.1:13456 |
Current time: 2021-12-01 23:03:28 (running for 00:00:43.61)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StackedStatelessCartPole_69565_00000 | RUNNING | 127.0.0.1:13456 |
Result for PPO_StackedStatelessCartPole_69565_00000:
agent_timesteps_total: 4000
custom_metrics: {}
date: 2021-12-01_23-03-31
done: false
episode_len_mean: 20.91578947368421
episode_media: {}
episode_reward_max: 61.0
episode_reward_mean: 20.91578947368421
episode_reward_min: 8.0
episodes_this_iter: 190
episodes_total: 190
experiment_id: dad4489332ba46c8ab9c9ed834879afb
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.6832552552223206
entropy_coeff: 0.0
kl: 0.010132171213626862
model: {}
policy_loss: -0.017918335273861885
total_loss: 126.10237121582031
vf_explained_var: 0.01804439164698124
vf_loss: 126.1182632446289
num_agent_steps_sampled: 4000
num_agent_steps_trained: 4000
num_steps_sampled: 4000
num_steps_trained: 4000
iterations_since_restore: 1
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 80.80666666666666
ram_util_percent: 85.86666666666669
pid: 13456
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.11120850849435063
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.18972639138395245
mean_inference_ms: 2.0335766275739453
mean_raw_obs_processing_ms: 0.3751123260983276
time_since_restore: 10.363371133804321
time_this_iter_s: 10.363371133804321
time_total_s: 10.363371133804321
timers:
learn_throughput: 833.897
learn_time_ms: 4796.753
load_throughput: 0.0
load_time_ms: 0.0
sample_throughput: 719.056
sample_time_ms: 5562.852
update_time_ms: 5.016
timestamp: 1638396211
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 4000
training_iteration: 1
trial_id: '69565_00000'
Result for PPO_StackedStatelessCartPole_69565_00000:
agent_timesteps_total: 8000
custom_metrics: {}
date: 2021-12-01_23-03-40
done: false
episode_len_mean: 29.455882352941178
episode_media: {}
episode_reward_max: 136.0
episode_reward_mean: 29.455882352941178
episode_reward_min: 8.0
episodes_this_iter: 136
episodes_total: 326
experiment_id: dad4489332ba46c8ab9c9ed834879afb
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.6507855653762817
entropy_coeff: 0.0
kl: 0.010778849013149738
model: {}
policy_loss: -0.01948031783103943
total_loss: 162.9302215576172
vf_explained_var: 0.03349286690354347
vf_loss: 162.94754028320312
num_agent_steps_sampled: 8000
num_agent_steps_trained: 8000
num_steps_sampled: 8000
num_steps_trained: 8000
num_steps_trained_this_iter: 0
iterations_since_restore: 2
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 84.03076923076924
ram_util_percent: 85.8076923076923
pid: 13456
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.10394749777843926
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.15974853905888772
mean_inference_ms: 1.9997662326250156
mean_raw_obs_processing_ms: 0.3263675950258694
time_since_restore: 20.015674591064453
time_this_iter_s: 9.652303457260132
time_total_s: 20.015674591064453
timers:
learn_throughput: 849.072
learn_time_ms: 4711.025
load_throughput: 7966389.364
load_time_ms: 0.502
sample_throughput: 518.103
sample_time_ms: 7720.474
update_time_ms: 4.509
timestamp: 1638396220
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 8000
training_iteration: 2
trial_id: '69565_00000'
Result for PPO_StackedStatelessCartPole_69565_00000:
agent_timesteps_total: 12000
custom_metrics: {}
date: 2021-12-01_23-03-49
done: false
episode_len_mean: 45.47
episode_media: {}
episode_reward_max: 200.0
episode_reward_mean: 45.47
episode_reward_min: 9.0
episodes_this_iter: 83
episodes_total: 409
experiment_id: dad4489332ba46c8ab9c9ed834879afb
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.6116749048233032
entropy_coeff: 0.0
kl: 0.0092014130204916
model: {}
policy_loss: -0.017703521996736526
total_loss: 372.5731201171875
vf_explained_var: 0.04724571481347084
vf_loss: 372.5889892578125
num_agent_steps_sampled: 12000
num_agent_steps_trained: 12000
num_steps_sampled: 12000
num_steps_trained: 12000
num_steps_trained_this_iter: 0
iterations_since_restore: 3
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 79.78333333333333
ram_util_percent: 85.93333333333334
pid: 13456
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.09840720456414369
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.15510985166002864
mean_inference_ms: 1.9296228234529118
mean_raw_obs_processing_ms: 0.29812654395886573
time_since_restore: 28.900269746780396
time_this_iter_s: 8.884595155715942
time_total_s: 28.900269746780396
timers:
learn_throughput: 868.178
learn_time_ms: 4607.352
load_throughput: 5989010.947
load_time_ms: 0.668
sample_throughput: 488.004
sample_time_ms: 8196.646
update_time_ms: 4.004
timestamp: 1638396229
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 12000
training_iteration: 3
trial_id: '69565_00000'
Result for PPO_StackedStatelessCartPole_69565_00000:
agent_timesteps_total: 16000
custom_metrics: {}
date: 2021-12-01_23-04-00
done: false
episode_len_mean: 63.03
episode_media: {}
episode_reward_max: 272.0
episode_reward_mean: 63.03
episode_reward_min: 13.0
episodes_this_iter: 51
episodes_total: 460
experiment_id: dad4489332ba46c8ab9c9ed834879afb
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.5801236629486084
entropy_coeff: 0.0
kl: 0.006844064686447382
model: {}
policy_loss: -0.009995924308896065
total_loss: 404.9743957519531
vf_explained_var: 0.09591271728277206
vf_loss: 404.9830627441406
num_agent_steps_sampled: 16000
num_agent_steps_trained: 16000
num_steps_sampled: 16000
num_steps_trained: 16000
num_steps_trained_this_iter: 0
iterations_since_restore: 4
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 88.04
ram_util_percent: 85.95333333333335
pid: 13456
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.0999835854110982
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.15354213307036435
mean_inference_ms: 1.9525573766667057
mean_raw_obs_processing_ms: 0.29008862922916356
time_since_restore: 39.33686113357544
time_this_iter_s: 10.436591386795044
time_total_s: 39.33686113357544
timers:
learn_throughput: 860.917
learn_time_ms: 4646.208
load_throughput: 7985347.93
load_time_ms: 0.501
sample_throughput: 461.014
sample_time_ms: 8676.526
update_time_ms: 3.003
timestamp: 1638396240
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 16000
training_iteration: 4
trial_id: '69565_00000'
Result for PPO_StackedStatelessCartPole_69565_00000:
agent_timesteps_total: 20000
custom_metrics: {}
date: 2021-12-01_23-04-09
done: false
episode_len_mean: 83.03
episode_media: {}
episode_reward_max: 272.0
episode_reward_mean: 83.03
episode_reward_min: 10.0
episodes_this_iter: 41
episodes_total: 501
experiment_id: dad4489332ba46c8ab9c9ed834879afb
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.5570896863937378
entropy_coeff: 0.0
kl: 0.00568711943924427
model: {}
policy_loss: -0.01232148241251707
total_loss: 408.25262451171875
vf_explained_var: 0.11371473968029022
vf_loss: 408.26385498046875
num_agent_steps_sampled: 20000
num_agent_steps_trained: 20000
num_steps_sampled: 20000
num_steps_trained: 20000
num_steps_trained_this_iter: 0
iterations_since_restore: 5
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 79.44615384615385
ram_util_percent: 85.80769230769229
pid: 13456
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.10660624354014989
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.149048118877094
mean_inference_ms: 1.9695397741116307
mean_raw_obs_processing_ms: 0.28367179033707196
time_since_restore: 48.41154980659485
time_this_iter_s: 9.07468867301941
time_total_s: 48.41154980659485
timers:
learn_throughput: 873.938
learn_time_ms: 4576.986
load_throughput: 9981684.912
load_time_ms: 0.401
sample_throughput: 451.554
sample_time_ms: 8858.292
update_time_ms: 2.403
timestamp: 1638396249
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 20000
training_iteration: 5
trial_id: '69565_00000'
Result for PPO_StackedStatelessCartPole_69565_00000:
agent_timesteps_total: 24000
custom_metrics: {}
date: 2021-12-01_23-04-19
done: false
episode_len_mean: 102.34
episode_media: {}
episode_reward_max: 304.0
episode_reward_mean: 102.34
episode_reward_min: 10.0
episodes_this_iter: 24
episodes_total: 525
experiment_id: dad4489332ba46c8ab9c9ed834879afb
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.5555164217948914
entropy_coeff: 0.0
kl: 0.004422472789883614
model: {}
policy_loss: -0.008869567885994911
total_loss: 570.493896484375
vf_explained_var: 0.24427081644535065
vf_loss: 570.5018920898438
num_agent_steps_sampled: 24000
num_agent_steps_trained: 24000
num_steps_sampled: 24000
num_steps_trained: 24000
num_steps_trained_this_iter: 0
iterations_since_restore: 6
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 85.06923076923076
ram_util_percent: 85.82307692307693
pid: 13456
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.10851443555266879
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.14933858568901648
mean_inference_ms: 1.9681416365815767
mean_raw_obs_processing_ms: 0.28052371436443274
time_since_restore: 57.97208309173584
time_this_iter_s: 9.560533285140991
time_total_s: 57.97208309173584
timers:
learn_throughput: 878.415
learn_time_ms: 4553.655
load_throughput: 11978021.894
load_time_ms: 0.334
sample_throughput: 446.643
sample_time_ms: 8955.693
update_time_ms: 2.669
timestamp: 1638396259
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 24000
training_iteration: 6
trial_id: '69565_00000'
Result for PPO_StackedStatelessCartPole_69565_00000:
agent_timesteps_total: 28000
custom_metrics: {}
date: 2021-12-01_23-04-27
done: false
episode_len_mean: 127.8
episode_media: {}
episode_reward_max: 321.0
episode_reward_mean: 127.8
episode_reward_min: 10.0
episodes_this_iter: 23
episodes_total: 548
experiment_id: dad4489332ba46c8ab9c9ed834879afb
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.10000000149011612
cur_lr: 4.999999873689376e-05
entropy: 0.5434969067573547
entropy_coeff: 0.0
kl: 0.008256432600319386
model: {}
policy_loss: -0.0062043326906859875
total_loss: 453.47607421875
vf_explained_var: 0.3077850043773651
vf_loss: 453.4814758300781
num_agent_steps_sampled: 28000
num_agent_steps_trained: 28000
num_steps_sampled: 28000
num_steps_trained: 28000
num_steps_trained_this_iter: 0
iterations_since_restore: 7
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 79.55833333333334
ram_util_percent: 85.89166666666667
pid: 13456
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.110027565885569
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.14780355323603098
mean_inference_ms: 1.9526956031732303
mean_raw_obs_processing_ms: 0.27615972972533087
time_since_restore: 66.65037989616394
time_this_iter_s: 8.6782968044281
time_total_s: 66.65037989616394
timers:
learn_throughput: 887.707
learn_time_ms: 4505.99
load_throughput: 13974358.877
load_time_ms: 0.286
sample_throughput: 446.514
sample_time_ms: 8958.288
update_time_ms: 2.859
timestamp: 1638396267
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 28000
training_iteration: 7
trial_id: '69565_00000'
Result for PPO_StackedStatelessCartPole_69565_00000:
agent_timesteps_total: 32000
custom_metrics: {}
date: 2021-12-01_23-04-38
done: false
episode_len_mean: 145.51
episode_media: {}
episode_reward_max: 392.0
episode_reward_mean: 145.51
episode_reward_min: 10.0
episodes_this_iter: 26
episodes_total: 574
experiment_id: dad4489332ba46c8ab9c9ed834879afb
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.10000000149011612
cur_lr: 4.999999873689376e-05
entropy: 0.5463652014732361
entropy_coeff: 0.0
kl: 0.010875530540943146
model: {}
policy_loss: -0.007964679040014744
total_loss: 391.5842590332031
vf_explained_var: 0.35197633504867554
vf_loss: 391.5911865234375
num_agent_steps_sampled: 32000
num_agent_steps_trained: 32000
num_steps_sampled: 32000
num_steps_trained: 32000
num_steps_trained_this_iter: 0
iterations_since_restore: 8
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 87.83571428571429
ram_util_percent: 85.70000000000002
pid: 13456
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.11137697557468901
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.14710154952868693
mean_inference_ms: 1.9442323273031195
mean_raw_obs_processing_ms: 0.27244073066886854
time_since_restore: 76.93640422821045
time_this_iter_s: 10.286024332046509
time_total_s: 76.93640422821045
timers:
learn_throughput: 875.973
learn_time_ms: 4566.348
load_throughput: 15970695.859
load_time_ms: 0.25
sample_throughput: 442.806
sample_time_ms: 9033.302
update_time_ms: 2.879
timestamp: 1638396278
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 32000
training_iteration: 8
trial_id: '69565_00000'
Result for PPO_StackedStatelessCartPole_69565_00000:
agent_timesteps_total: 36000
custom_metrics: {}
date: 2021-12-01_23-04-48
done: false
episode_len_mean: 164.14
episode_media: {}
episode_reward_max: 392.0
episode_reward_mean: 164.14
episode_reward_min: 13.0
episodes_this_iter: 20
episodes_total: 594
experiment_id: dad4489332ba46c8ab9c9ed834879afb
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.10000000149011612
cur_lr: 4.999999873689376e-05
entropy: 0.5408887267112732
entropy_coeff: 0.0
kl: 0.011410081759095192
model: {}
policy_loss: -0.013954582624137402
total_loss: 419.84454345703125
vf_explained_var: 0.34064534306526184
vf_loss: 419.8573913574219
num_agent_steps_sampled: 36000
num_agent_steps_trained: 36000
num_steps_sampled: 36000
num_steps_trained: 36000
num_steps_trained_this_iter: 0
iterations_since_restore: 9
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 84.79285714285716
ram_util_percent: 85.70000000000003
pid: 13456
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.1111877406329047
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.14667458354156745
mean_inference_ms: 1.9476374535757386
mean_raw_obs_processing_ms: 0.2706214096819341
time_since_restore: 86.94219470024109
time_this_iter_s: 10.00579047203064
time_total_s: 86.94219470024109
timers:
learn_throughput: 881.266
learn_time_ms: 4538.926
load_throughput: 5140953.457
load_time_ms: 0.778
sample_throughput: 433.901
sample_time_ms: 9218.702
update_time_ms: 2.559
timestamp: 1638396288
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 36000
training_iteration: 9
trial_id: '69565_00000'
Result for PPO_StackedStatelessCartPole_69565_00000:
agent_timesteps_total: 40000
custom_metrics: {}
date: 2021-12-01_23-04-56
done: true
episode_len_mean: 176.48
episode_media: {}
episode_reward_max: 392.0
episode_reward_mean: 176.48
episode_reward_min: 13.0
episodes_this_iter: 19
episodes_total: 613
experiment_id: dad4489332ba46c8ab9c9ed834879afb
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.10000000149011612
cur_lr: 4.999999873689376e-05
entropy: 0.5288471579551697
entropy_coeff: 0.0
kl: 0.008251729421317577
model: {}
policy_loss: -0.007646023295819759
total_loss: 273.2549743652344
vf_explained_var: 0.5383354425430298
vf_loss: 273.2617492675781
num_agent_steps_sampled: 40000
num_agent_steps_trained: 40000
num_steps_sampled: 40000
num_steps_trained: 40000
num_steps_trained_this_iter: 0
iterations_since_restore: 10
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 80.93333333333332
ram_util_percent: 86.14999999999999
pid: 13456
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.11016966943400075
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.14657990293613932
mean_inference_ms: 1.948797938913841
mean_raw_obs_processing_ms: 0.2673978140474405
time_since_restore: 95.56249117851257
time_this_iter_s: 8.620296478271484
time_total_s: 95.56249117851257
timers:
learn_throughput: 893.185
learn_time_ms: 4478.354
load_throughput: 5712170.508
load_time_ms: 0.7
sample_throughput: 434.433
sample_time_ms: 9207.403
update_time_ms: 2.303
timestamp: 1638396296
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 40000
training_iteration: 10
trial_id: '69565_00000'
Option 3a with FrameStack: Training finished successfully
Current time: 2021-12-01 23:03:33 (running for 00:00:48.84)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_69565_00000 | RUNNING | 127.0.0.1:13456 | 1 | 10.3634 | 4000 | 20.9158 | 61 | 8 | 20.9158 |
Current time: 2021-12-01 23:03:38 (running for 00:00:53.93)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_69565_00000 | RUNNING | 127.0.0.1:13456 | 1 | 10.3634 | 4000 | 20.9158 | 61 | 8 | 20.9158 |
Current time: 2021-12-01 23:03:44 (running for 00:00:59.57)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_69565_00000 | RUNNING | 127.0.0.1:13456 | 2 | 20.0157 | 8000 | 29.4559 | 136 | 8 | 29.4559 |
Current time: 2021-12-01 23:03:49 (running for 00:01:04.66)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_69565_00000 | RUNNING | 127.0.0.1:13456 | 2 | 20.0157 | 8000 | 29.4559 | 136 | 8 | 29.4559 |
Current time: 2021-12-01 23:03:55 (running for 00:01:10.55)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_69565_00000 | RUNNING | 127.0.0.1:13456 | 3 | 28.9003 | 12000 | 45.47 | 200 | 9 | 45.47 |
Current time: 2021-12-01 23:04:00 (running for 00:01:15.68)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_69565_00000 | RUNNING | 127.0.0.1:13456 | 3 | 28.9003 | 12000 | 45.47 | 200 | 9 | 45.47 |
Current time: 2021-12-01 23:04:05 (running for 00:01:21.06)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_69565_00000 | RUNNING | 127.0.0.1:13456 | 4 | 39.3369 | 16000 | 63.03 | 272 | 13 | 63.03 |
Current time: 2021-12-01 23:04:11 (running for 00:01:27.06)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_69565_00000 | RUNNING | 127.0.0.1:13456 | 5 | 48.4115 | 20000 | 83.03 | 272 | 10 | 83.03 |
Current time: 2021-12-01 23:04:16 (running for 00:01:32.16)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_69565_00000 | RUNNING | 127.0.0.1:13456 | 5 | 48.4115 | 20000 | 83.03 | 272 | 10 | 83.03 |
Current time: 2021-12-01 23:04:22 (running for 00:01:37.68)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_69565_00000 | RUNNING | 127.0.0.1:13456 | 6 | 57.9721 | 24000 | 102.34 | 304 | 10 | 102.34 |
Current time: 2021-12-01 23:04:27 (running for 00:01:42.89)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_69565_00000 | RUNNING | 127.0.0.1:13456 | 6 | 57.9721 | 24000 | 102.34 | 304 | 10 | 102.34 |
Current time: 2021-12-01 23:04:32 (running for 00:01:48.46)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_69565_00000 | RUNNING | 127.0.0.1:13456 | 7 | 66.6504 | 28000 | 127.8 | 321 | 10 | 127.8 |
Current time: 2021-12-01 23:04:38 (running for 00:01:53.54)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_69565_00000 | RUNNING | 127.0.0.1:13456 | 7 | 66.6504 | 28000 | 127.8 | 321 | 10 | 127.8 |
Current time: 2021-12-01 23:04:43 (running for 00:01:58.76)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_69565_00000 | RUNNING | 127.0.0.1:13456 | 8 | 76.9364 | 32000 | 145.51 | 392 | 10 | 145.51 |
Current time: 2021-12-01 23:04:49 (running for 00:02:04.76)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_69565_00000 | RUNNING | 127.0.0.1:13456 | 9 | 86.9422 | 36000 | 164.14 | 392 | 13 | 164.14 |
Current time: 2021-12-01 23:04:56 (running for 00:02:11.83)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_69565_00000 | RUNNING | 127.0.0.1:13456 | 9 | 86.9422 | 36000 | 164.14 | 392 | 13 | 164.14 |
Current time: 2021-12-01 23:04:56 (running for 00:02:12.48)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 TERMINATED)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_69565_00000 | TERMINATED | 127.0.0.1:13456 | 10 | 95.5625 | 40000 | 176.48 | 392 | 13 | 176.48 |
print_reward(results3a)
Reward after 10 training iterations: 176.48
plot_rewards(results3a)
c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\seaborn\_decorators.py:36: FutureWarning: Pass the following variables as keyword args: x, y. From version 0.12, the only valid positional argument will be `data`, and passing other arguments without an explicit keyword will result in an error or misinterpretation.
warnings.warn(
plot_learning(results1, label="1: Full Observations")
plot_learning(results2, label="2: Partial Observations")
plot_learning(results3a, label="3a: Stacked, Partial Observations")
Simply by stacking the last \(n\) observations, the RL agent learns a useful policy again, even though each observation is still partial, i.e., missing the cart velocity and the pole angular velocity.
As the learning curves show, the agent learns somewhat more slowly than with full observations (reaching a mean episode reward of 176.48 after 10 training iterations) but still much faster than the agent with only a single partial observation, which barely learns at all.
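To see why stacking helps, here is a minimal, library-free sketch of the idea. It is not Gym's `FrameStack` implementation: the `ObservationStacker` class and `estimate_velocity` function are hypothetical helpers, the observation values are made up, and padding with zeros is a simplification I chose for this example. The point is only that two consecutive partial observations already contain enough information to approximate the missing velocities.

```python
from collections import deque


class ObservationStacker:
    """Keeps the last `num_frames` observations (hypothetical helper, not part of Gym or RLlib)."""

    def __init__(self, num_frames, obs_size):
        self.num_frames = num_frames
        # Assumption: pad with all-zero observations until enough real ones have arrived.
        self.frames = deque(
            [[0.0] * obs_size for _ in range(num_frames)], maxlen=num_frames
        )

    def add(self, obs):
        # Appending to a full deque drops the oldest observation automatically.
        self.frames.append(list(obs))
        return [list(frame) for frame in self.frames]


def estimate_velocity(stacked, index, dt):
    """Finite-difference estimate based on the two most recent observations."""
    return (stacked[-1][index] - stacked[-2][index]) / dt


stacker = ObservationStacker(num_frames=4, obs_size=2)
# Partial observations: (cart position, pole angle) only; the values are made up.
for obs in [(0.00, 0.010), (0.01, 0.014), (0.03, 0.020)]:
    stacked = stacker.add(obs)

print(stacked)
# Assumption: 0.02 is the time between two observations in seconds.
print(estimate_velocity(stacked, index=1, dt=0.02))
```

The agent's neural network is not told to compute such differences explicitly, but with the stacked input it can, in principle, learn a similar mapping on its own.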
Stacking Observations Using RLlib’s Trajectory API
Above, I used Gym’s FrameStack wrapper to stack the last \(n\) observations inside the environment. Alternatively, the stacking can be implemented on the model side, e.g., using RLlib’s trajectory API (called the trajectory view API in the RLlib documentation). Each observation is then stored only once instead of \(n\) times, which reduces the memory needed for the stacked observations, while the results should be similar.
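The memory argument can be illustrated with a small sketch. This is not RLlib code: `stacked_view` is a hypothetical helper, the episode data is made up, and zero padding before the episode start is an assumption. It only shows the general idea of storing each observation once and assembling the stacked window on demand.

```python
def stacked_view(observations, t, num_frames):
    """Builds the stack for time step t from observations that are stored only once."""
    obs_size = len(observations[0])
    window = []
    for shift in range(-num_frames + 1, 1):
        idx = t + shift
        # Assumption: time steps before the episode start are zero-padded.
        window.append(list(observations[idx]) if idx >= 0 else [0.0] * obs_size)
    return window


# Made-up episode with 200 time steps and 2 values per observation.
episode = [[float(t), 10.0 * t] for t in range(200)]
num_frames = 4

# Stacking inside the environment stores every stacked observation in full.
floats_env_side = len(episode) * num_frames * len(episode[0])
# Stacking on the model side stores each single observation exactly once.
floats_model_side = len(episode) * len(episode[0])

print(stacked_view(episode, t=1, num_frames=num_frames))
print(floats_env_side, floats_model_side)
```

In this toy example, the environment-side approach stores \(n\) times as many values as the model-side approach, at the cost of having to assemble the window whenever it is needed.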
#collapse-output
from ray.rllib.examples.models.trajectory_view_utilizing_models import FrameStackingCartPoleModel
from ray.rllib.models.catalog import ModelCatalog

ModelCatalog.register_custom_model("stacking_model", FrameStackingCartPoleModel)

config3a2 = ppo.DEFAULT_CONFIG.copy()
config3a2["env"] = "StatelessCartPole"
config3a2["model"] = {
    "custom_model": "stacking_model",
    "custom_model_config": {
        "num_frames": NUM_FRAMES,
    }
}
results3a2 = ray.tune.run("PPO", config=config3a2, stop=stop)
print("Option 3a2 with Trajectory API: Training finished successfully")
Current time: 2021-12-01 23:11:27 (running for 00:00:00.14)
Memory usage on this node: 9.4/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 PENDING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_a1402_00000 | PENDING |
(pid=None) c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\redis\connection.py:77: UserWarning: redis-py works best with hiredis. Please consider installing
(pid=None) warnings.warn(msg)
(pid=7032) 2021-12-01 23:11:41,672 INFO trainer.py:753 -- Tip: set framework=tfe or the --eager flag to enable TensorFlow eager execution
(pid=7032) 2021-12-01 23:11:41,672 INFO ppo.py:166 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting simple_optimizer=True if this doesn't work for you.
(pid=7032) 2021-12-01 23:11:41,672 INFO trainer.py:770 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=None) c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\redis\connection.py:77: UserWarning: redis-py works best with hiredis. Please consider installing
(pid=None) warnings.warn(msg)
(pid=None) c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\redis\connection.py:77: UserWarning: redis-py works best with hiredis. Please consider installing
(pid=None) warnings.warn(msg)
(pid=20056) 2021-12-01 23:11:58,655 WARNING deprecation.py:38 -- DeprecationWarning: `SampleBatch['is_training']` has been deprecated. Use `SampleBatch.is_training` instead. This will raise an error in the future!
(pid=7032) 2021-12-01 23:12:00,488 WARNING deprecation.py:38 -- DeprecationWarning: `SampleBatch['is_training']` has been deprecated. Use `SampleBatch.is_training` instead. This will raise an error in the future!
(pid=7032) 2021-12-01 23:12:01,689 WARNING trainer_template.py:185 -- `execution_plan` functions should accept `trainer`, `workers`, and `config` as args!
(pid=7032) 2021-12-01 23:12:01,689 INFO trainable.py:110 -- Trainable.setup took 20.017 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
(pid=7032) 2021-12-01 23:12:01,689 WARNING util.py:57 -- Install gputil for GPU system monitoring.
(pid=7032) 2021-12-01 23:12:08,389 WARNING deprecation.py:38 -- DeprecationWarning: `slice` has been deprecated. Use `SampleBatch[start:stop]` instead. This will raise an error in the future!
(pid=7032) Windows fatal exception: access violation
(pid=7032)
(pid=20056) [2021-12-01 23:13:35,836 C 20056 17856] core_worker.cc:796: Check failed: _s.ok() Bad status: IOError: Unknown error
(pid=20056) *** StackTrace Information ***
(pid=20056) PyInit__raylet
(pid=20056) PyInit__raylet
(pid=20056) PyInit__raylet
(pid=20056) PyInit__raylet
(pid=20056) PyInit__raylet
(pid=20056) PyInit__raylet
(pid=20056) PyInit__raylet
(pid=20056) PyInit__raylet
(pid=20056) PyInit__raylet
(pid=20056) PyInit__raylet
(pid=20056) PyInit__raylet
(pid=20056) PyInit__raylet
(pid=20056) PyInit__raylet
(pid=20056) PyInit__raylet
(pid=20056) PyInit__raylet
(pid=20056) PyInit__raylet
(pid=20056) PyNumber_InPlaceLshift
(pid=20056) Py_CheckFunctionResult
(pid=20056) PyEval_EvalFrameDefault
(pid=20056) Py_CheckFunctionResult
(pid=20056) PyEval_EvalFrameDefault
(pid=20056) PyEval_EvalCodeWithName
(pid=20056) PyEval_EvalCodeEx
(pid=20056) PyEval_EvalCode
(pid=20056) PyArena_New
(pid=20056) PyArena_New
(pid=20056) PyRun_FileExFlags
(pid=20056) PyRun_SimpleFileExFlags
(pid=20056) PyRun_AnyFileExFlags
(pid=20056) Py_FatalError
(pid=20056) Py_RunMain
(pid=20056) Py_RunMain
(pid=20056) Py_Main
(pid=20056) BaseThreadInitThunk
(pid=20056) RtlUserThreadStart
(pid=20056)
(pid=20056) Windows fatal exception: access violation
(pid=20056)
(pid=20056) Stack (most recent call first):
(pid=20056) File "c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\ray\worker.py", line 425 in main_loop
(pid=20056) File "c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\ray\workers/default_worker.py", line 218 in <module>
(pid=6936) [2021-12-01 23:13:35,836 C 6936 7600] core_worker.cc:796: Check failed: _s.ok() Bad status: IOError: Unknown error
(pid=6936) *** StackTrace Information ***
(pid=6936) PyInit__raylet
(pid=6936) PyInit__raylet
(pid=6936) PyInit__raylet
(pid=6936) PyInit__raylet
(pid=6936) PyInit__raylet
(pid=6936) PyInit__raylet
(pid=6936) PyInit__raylet
(pid=6936) PyInit__raylet
(pid=6936) PyInit__raylet
(pid=6936) PyInit__raylet
(pid=6936) PyInit__raylet
(pid=6936) PyInit__raylet
(pid=6936) PyInit__raylet
(pid=6936) PyInit__raylet
(pid=6936) PyInit__raylet
(pid=6936) PyInit__raylet
(pid=6936) PyNumber_InPlaceLshift
(pid=6936) Py_CheckFunctionResult
(pid=6936) PyEval_EvalFrameDefault
(pid=6936) Py_CheckFunctionResult
(pid=6936) PyEval_EvalFrameDefault
(pid=6936) PyEval_EvalCodeWithName
(pid=6936) PyEval_EvalCodeEx
(pid=6936) PyEval_EvalCode
(pid=6936) PyArena_New
(pid=6936) PyArena_New
(pid=6936) PyRun_FileExFlags
(pid=6936) PyRun_SimpleFileExFlags
(pid=6936) PyRun_AnyFileExFlags
(pid=6936) Py_FatalError
(pid=6936) Py_RunMain
(pid=6936) Py_RunMain
(pid=6936) Py_Main
(pid=6936) BaseThreadInitThunk
(pid=6936) RtlUserThreadStart
(pid=6936)
(pid=6936) Windows fatal exception: access violation
(pid=6936)
(pid=6936) Stack (most recent call first):
(pid=6936) File "c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\ray\worker.py", line 425 in main_loop
(pid=6936) File "c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\ray\workers/default_worker.py", line 218 in <module>
2021-12-01 23:13:35,937 INFO tune.py:630 -- Total run time: 128.19 seconds (127.68 seconds for the tuning loop).
Current time: 2021-12-01 23:11:32 (running for 00:00:05.14)
Memory usage on this node: 9.5/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 PENDING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_a1402_00000 | PENDING |
Current time: 2021-12-01 23:12:01 (running for 00:00:33.99)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_a1402_00000 | RUNNING | 127.0.0.1:7032 |
Current time: 2021-12-01 23:12:02 (running for 00:00:35.21)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_a1402_00000 | RUNNING | 127.0.0.1:7032 |
Current time: 2021-12-01 23:12:08 (running for 00:00:40.28)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_a1402_00000 | RUNNING | 127.0.0.1:7032 |
Result for PPO_StatelessCartPole_a1402_00000:
agent_timesteps_total: 4000
custom_metrics: {}
date: 2021-12-01_23-12-12
done: false
episode_len_mean: 22.420454545454547
episode_media: {}
episode_reward_max: 76.0
episode_reward_mean: 22.420454545454547
episode_reward_min: 9.0
episodes_this_iter: 176
episodes_total: 176
experiment_id: a99c739e101a4c88ba77c4f9b0d64803
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.6806640028953552
entropy_coeff: 0.0
kl: 0.013389287516474724
model: {}
policy_loss: -0.021481554955244064
total_loss: 188.75352478027344
vf_explained_var: -0.03809177502989769
vf_loss: 188.77232360839844
num_agent_steps_sampled: 4000
num_agent_steps_trained: 4000
num_steps_sampled: 4000
num_steps_trained: 4000
iterations_since_restore: 1
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 80.06666666666668
ram_util_percent: 86.22666666666666
pid: 7032
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.12501936271684463
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.18659985231728216
mean_inference_ms: 2.6111630884934134
mean_raw_obs_processing_ms: 0.3001049366084402
time_since_restore: 10.872305870056152
time_this_iter_s: 10.872305870056152
time_total_s: 10.872305870056152
timers:
learn_throughput: 952.767
learn_time_ms: 4198.298
load_throughput: 0.0
load_time_ms: 0.0
sample_throughput: 597.657
sample_time_ms: 6692.801
update_time_ms: 0.0
timestamp: 1638396732
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 4000
training_iteration: 1
trial_id: a1402_00000
Result for PPO_StatelessCartPole_a1402_00000:
agent_timesteps_total: 8000
custom_metrics: {}
date: 2021-12-01_23-12-21
done: false
episode_len_mean: 27.07482993197279
episode_media: {}
episode_reward_max: 92.0
episode_reward_mean: 27.07482993197279
episode_reward_min: 9.0
episodes_this_iter: 147
episodes_total: 323
experiment_id: a99c739e101a4c88ba77c4f9b0d64803
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.6690337657928467
entropy_coeff: 0.0
kl: 0.006590469740331173
model: {}
policy_loss: -0.005931881722062826
total_loss: 152.8258056640625
vf_explained_var: -0.11891558021306992
vf_loss: 152.83041381835938
num_agent_steps_sampled: 8000
num_agent_steps_trained: 8000
num_steps_sampled: 8000
num_steps_trained: 8000
num_steps_trained_this_iter: 0
iterations_since_restore: 2
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 76.04615384615386
ram_util_percent: 86.37692307692308
pid: 7032
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.11535685211184185
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.17986845152071707
mean_inference_ms: 2.358616568210475
mean_raw_obs_processing_ms: 0.24868940552548166
time_since_restore: 19.69421625137329
time_this_iter_s: 8.821910381317139
time_total_s: 19.69421625137329
timers:
learn_throughput: 1030.618
learn_time_ms: 3881.168
load_throughput: 0.0
load_time_ms: 0.0
sample_throughput: 493.345
sample_time_ms: 8107.914
update_time_ms: 0.0
timestamp: 1638396741
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 8000
training_iteration: 2
trial_id: a1402_00000
Result for PPO_StatelessCartPole_a1402_00000:
agent_timesteps_total: 12000
custom_metrics: {}
date: 2021-12-01_23-12-29
done: false
episode_len_mean: 31.515625
episode_media: {}
episode_reward_max: 107.0
episode_reward_mean: 31.515625
episode_reward_min: 9.0
episodes_this_iter: 128
episodes_total: 451
experiment_id: a99c739e101a4c88ba77c4f9b0d64803
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.6504582166671753
entropy_coeff: 0.0
kl: 0.01009401399642229
model: {}
policy_loss: -0.014275692403316498
total_loss: 183.70419311523438
vf_explained_var: -0.09810103476047516
vf_loss: 183.71646118164062
num_agent_steps_sampled: 12000
num_agent_steps_trained: 12000
num_steps_sampled: 12000
num_steps_trained: 12000
num_steps_trained_this_iter: 0
iterations_since_restore: 3
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 74.85000000000001
ram_util_percent: 86.45
pid: 7032
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.11715046978259044
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.16287547160468283
mean_inference_ms: 2.2526623070031446
mean_raw_obs_processing_ms: 0.22798632078370612
time_since_restore: 27.94515609741211
time_this_iter_s: 8.250939846038818
time_total_s: 27.94515609741211
timers:
learn_throughput: 1097.145
learn_time_ms: 3645.825
load_throughput: 0.0
load_time_ms: 0.0
sample_throughput: 482.347
sample_time_ms: 8292.781
update_time_ms: 1.334
timestamp: 1638396749
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 12000
training_iteration: 3
trial_id: a1402_00000
Result for PPO_StatelessCartPole_a1402_00000:
agent_timesteps_total: 16000
custom_metrics: {}
date: 2021-12-01_23-12-39
done: false
episode_len_mean: 41.78
episode_media: {}
episode_reward_max: 114.0
episode_reward_mean: 41.78
episode_reward_min: 10.0
episodes_this_iter: 94
episodes_total: 545
experiment_id: a99c739e101a4c88ba77c4f9b0d64803
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.6398107409477234
entropy_coeff: 0.0
kl: 0.006905578076839447
model: {}
policy_loss: -9.581594349583611e-05
total_loss: 233.55148315429688
vf_explained_var: -0.05927522853016853
vf_loss: 233.55018615722656
num_agent_steps_sampled: 16000
num_agent_steps_trained: 16000
num_steps_sampled: 16000
num_steps_trained: 16000
num_steps_trained_this_iter: 0
iterations_since_restore: 4
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 80.42307692307692
ram_util_percent: 86.43846153846154
pid: 7032
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.11285530769795944
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.15454845909642295
mean_inference_ms: 2.234221561962398
mean_raw_obs_processing_ms: 0.23170078440755937
time_since_restore: 37.353280544281006
time_this_iter_s: 9.408124446868896
time_total_s: 37.353280544281006
timers:
learn_throughput: 1071.336
learn_time_ms: 3733.658
load_throughput: 15993532.888
load_time_ms: 0.25
sample_throughput: 477.101
sample_time_ms: 8383.978
update_time_ms: 2.002
timestamp: 1638396759
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 16000
training_iteration: 4
trial_id: a1402_00000
Result for PPO_StatelessCartPole_a1402_00000:
agent_timesteps_total: 20000
custom_metrics: {}
date: 2021-12-01_23-12-48
done: false
episode_len_mean: 45.25
episode_media: {}
episode_reward_max: 112.0
episode_reward_mean: 45.25
episode_reward_min: 11.0
episodes_this_iter: 90
episodes_total: 635
experiment_id: a99c739e101a4c88ba77c4f9b0d64803
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.6046473383903503
entropy_coeff: 0.0
kl: 0.00994083285331726
model: {}
policy_loss: -0.013053015805780888
total_loss: 248.39173889160156
vf_explained_var: -0.07114432007074356
vf_loss: 248.4027862548828
num_agent_steps_sampled: 20000
num_agent_steps_trained: 20000
num_steps_sampled: 20000
num_steps_trained: 20000
num_steps_trained_this_iter: 0
iterations_since_restore: 5
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 82.4
ram_util_percent: 86.02307692307691
pid: 7032
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.1164652225945224
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.1496187880432378
mean_inference_ms: 2.2751065340130405
mean_raw_obs_processing_ms: 0.2258053944148459
time_since_restore: 46.962218284606934
time_this_iter_s: 9.608937740325928
time_total_s: 46.962218284606934
timers:
learn_throughput: 1078.068
learn_time_ms: 3710.34
load_throughput: 19991916.111
load_time_ms: 0.2
sample_throughput: 459.08
sample_time_ms: 8713.071
update_time_ms: 4.935
timestamp: 1638396768
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 20000
training_iteration: 5
trial_id: a1402_00000
Result for PPO_StatelessCartPole_a1402_00000:
agent_timesteps_total: 24000
custom_metrics: {}
date: 2021-12-01_23-12-58
done: false
episode_len_mean: 51.02
episode_media: {}
episode_reward_max: 167.0
episode_reward_mean: 51.02
episode_reward_min: 14.0
episodes_this_iter: 74
episodes_total: 709
experiment_id: a99c739e101a4c88ba77c4f9b0d64803
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.5902564525604248
entropy_coeff: 0.0
kl: 0.009673806838691235
model: {}
policy_loss: -0.0029914507176727057
total_loss: 325.0210266113281
vf_explained_var: -0.020246472209692
vf_loss: 325.0221252441406
num_agent_steps_sampled: 24000
num_agent_steps_trained: 24000
num_steps_sampled: 24000
num_steps_trained: 24000
num_steps_trained_this_iter: 0
iterations_since_restore: 6
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 80.94615384615385
ram_util_percent: 85.85384615384616
pid: 7032
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.12087068704524867
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.15143744867276646
mean_inference_ms: 2.305419644492814
mean_raw_obs_processing_ms: 0.22585836913700305
time_since_restore: 56.67875838279724
time_this_iter_s: 9.716540098190308
time_total_s: 56.67875838279724
timers:
learn_throughput: 1089.778
learn_time_ms: 3670.472
load_throughput: 23990299.333
load_time_ms: 0.167
sample_throughput: 448.858
sample_time_ms: 8911.498
update_time_ms: 4.112
timestamp: 1638396778
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 24000
training_iteration: 6
trial_id: a1402_00000
Result for PPO_StatelessCartPole_a1402_00000:
agent_timesteps_total: 28000
custom_metrics: {}
date: 2021-12-01_23-13-07
done: false
episode_len_mean: 58.96
episode_media: {}
episode_reward_max: 173.0
episode_reward_mean: 58.96
episode_reward_min: 12.0
episodes_this_iter: 64
episodes_total: 773
experiment_id: a99c739e101a4c88ba77c4f9b0d64803
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.5728155970573425
entropy_coeff: 0.0
kl: 0.008897491730749607
model: {}
policy_loss: -0.011549570597708225
total_loss: 301.4852294921875
vf_explained_var: -0.004416638985276222
vf_loss: 301.4949951171875
num_agent_steps_sampled: 28000
num_agent_steps_trained: 28000
num_steps_sampled: 28000
num_steps_trained: 28000
num_steps_trained_this_iter: 0
iterations_since_restore: 7
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 81.25384615384615
ram_util_percent: 86.0
pid: 7032
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.12220176410428243
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.14921622016148267
mean_inference_ms: 2.304350481426518
mean_raw_obs_processing_ms: 0.22517177176600833
time_since_restore: 65.64418339729309
time_this_iter_s: 8.96542501449585
time_total_s: 65.64418339729309
timers:
learn_throughput: 1099.932
learn_time_ms: 3636.59
load_throughput: 27988682.555
load_time_ms: 0.143
sample_throughput: 448.012
sample_time_ms: 8928.331
update_time_ms: 3.525
timestamp: 1638396787
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 28000
training_iteration: 7
trial_id: a1402_00000
Result for PPO_StatelessCartPole_a1402_00000:
agent_timesteps_total: 32000
custom_metrics: {}
date: 2021-12-01_23-13-16
done: false
episode_len_mean: 65.85
episode_media: {}
episode_reward_max: 173.0
episode_reward_mean: 65.85
episode_reward_min: 12.0
episodes_this_iter: 57
episodes_total: 830
experiment_id: a99c739e101a4c88ba77c4f9b0d64803
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.5591020584106445
entropy_coeff: 0.0
kl: 0.010199657641351223
model: {}
policy_loss: -0.0008332685683853924
total_loss: 395.8393859863281
vf_explained_var: 0.02246681973338127
vf_loss: 395.8381652832031
num_agent_steps_sampled: 32000
num_agent_steps_trained: 32000
num_steps_sampled: 32000
num_steps_trained: 32000
num_steps_trained_this_iter: 0
iterations_since_restore: 8
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 74.975
ram_util_percent: 86.25000000000001
pid: 7032
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.11971547914161862
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.1491953792440581
mean_inference_ms: 2.288866379945899
mean_raw_obs_processing_ms: 0.21941612443688768
time_since_restore: 74.37759304046631
time_this_iter_s: 8.733409643173218
time_total_s: 74.37759304046631
timers:
learn_throughput: 1104.214
learn_time_ms: 3622.488
load_throughput: 31987065.777
load_time_ms: 0.125
sample_throughput: 449.229
sample_time_ms: 8904.153
update_time_ms: 3.084
timestamp: 1638396796
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 32000
training_iteration: 8
trial_id: a1402_00000
Result for PPO_StatelessCartPole_a1402_00000:
agent_timesteps_total: 36000
custom_metrics: {}
date: 2021-12-01_23-13-26
done: false
episode_len_mean: 75.45
episode_media: {}
episode_reward_max: 294.0
episode_reward_mean: 75.45
episode_reward_min: 15.0
episodes_this_iter: 49
episodes_total: 879
experiment_id: a99c739e101a4c88ba77c4f9b0d64803
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.5561075806617737
entropy_coeff: 0.0
kl: 0.010793409310281277
model: {}
policy_loss: -0.006749303545802832
total_loss: 487.8432922363281
vf_explained_var: 0.037814658135175705
vf_loss: 487.8478698730469
num_agent_steps_sampled: 36000
num_agent_steps_trained: 36000
num_steps_sampled: 36000
num_steps_trained: 36000
num_steps_trained_this_iter: 0
iterations_since_restore: 9
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 87.64285714285714
ram_util_percent: 86.90714285714286
pid: 7032
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.12074714910315451
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.15026008834474422
mean_inference_ms: 2.3119485993450812
mean_raw_obs_processing_ms: 0.21632637276503874
time_since_restore: 84.76655960083008
time_this_iter_s: 10.38896656036377
time_total_s: 84.76655960083008
timers:
learn_throughput: 1111.045
learn_time_ms: 3600.213
load_throughput: 35985448.999
load_time_ms: 0.111
sample_throughput: 440.37
sample_time_ms: 9083.268
update_time_ms: 2.964
timestamp: 1638396806
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 36000
training_iteration: 9
trial_id: a1402_00000
Result for PPO_StatelessCartPole_a1402_00000:
agent_timesteps_total: 40000
custom_metrics: {}
date: 2021-12-01_23-13-35
done: true
episode_len_mean: 84.4
episode_media: {}
episode_reward_max: 294.0
episode_reward_mean: 84.4
episode_reward_min: 17.0
episodes_this_iter: 40
episodes_total: 919
experiment_id: a99c739e101a4c88ba77c4f9b0d64803
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.5557733774185181
entropy_coeff: 0.0
kl: 0.005785451736301184
model: {}
policy_loss: -0.0009503072360530496
total_loss: 503.70172119140625
vf_explained_var: 0.0725829154253006
vf_loss: 503.7015380859375
num_agent_steps_sampled: 40000
num_agent_steps_trained: 40000
num_steps_sampled: 40000
num_steps_trained: 40000
num_steps_trained_this_iter: 0
iterations_since_restore: 10
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 79.30833333333332
ram_util_percent: 87.125
pid: 7032
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.12150490867421336
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.14914803016875297
mean_inference_ms: 2.327153258190586
mean_raw_obs_processing_ms: 0.21595014888922642
time_since_restore: 93.20761466026306
time_this_iter_s: 8.441055059432983
time_total_s: 93.20761466026306
timers:
learn_throughput: 1120.545
learn_time_ms: 3569.692
load_throughput: 20046858.645
load_time_ms: 0.2
sample_throughput: 442.479
sample_time_ms: 9039.968
update_time_ms: 2.868
timestamp: 1638396815
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 40000
training_iteration: 10
trial_id: a1402_00000
Option 3a2 with Trajectory API: Training finished successfully
Current time: 2021-12-01 23:12:13 (running for 00:00:45.92)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_a1402_00000 | RUNNING | 127.0.0.1:7032 | 1 | 10.8723 | 4000 | 22.4205 | 76 | 9 | 22.4205 |
Current time: 2021-12-01 23:12:18 (running for 00:00:51.00)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_a1402_00000 | RUNNING | 127.0.0.1:7032 | 1 | 10.8723 | 4000 | 22.4205 | 76 | 9 | 22.4205 |
Current time: 2021-12-01 23:12:24 (running for 00:00:56.98)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_a1402_00000 | RUNNING | 127.0.0.1:7032 | 2 | 19.6942 | 8000 | 27.0748 | 92 | 9 | 27.0748 |
Current time: 2021-12-01 23:12:29 (running for 00:01:02.06)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_a1402_00000 | RUNNING | 127.0.0.1:7032 | 3 | 27.9452 | 12000 | 31.5156 | 107 | 9 | 31.5156 |
Current time: 2021-12-01 23:12:34 (running for 00:01:07.12)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_a1402_00000 | RUNNING | 127.0.0.1:7032 | 3 | 27.9452 | 12000 | 31.5156 | 107 | 9 | 31.5156 |
Current time: 2021-12-01 23:12:40 (running for 00:01:12.53)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_a1402_00000 | RUNNING | 127.0.0.1:7032 | 4 | 37.3533 | 16000 | 41.78 | 114 | 10 | 41.78 |
Current time: 2021-12-01 23:12:45 (running for 00:01:17.88)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_a1402_00000 | RUNNING | 127.0.0.1:7032 | 4 | 37.3533 | 16000 | 41.78 | 114 | 10 | 41.78 |
Current time: 2021-12-01 23:12:51 (running for 00:01:23.24)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_a1402_00000 | RUNNING | 127.0.0.1:7032 | 5 | 46.9622 | 20000 | 45.25 | 112 | 11 | 45.25 |
Current time: 2021-12-01 23:12:56 (running for 00:01:28.34)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_a1402_00000 | RUNNING | 127.0.0.1:7032 | 5 | 46.9622 | 20000 | 45.25 | 112 | 11 | 45.25 |
Current time: 2021-12-01 23:13:01 (running for 00:01:33.98)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_a1402_00000 | RUNNING | 127.0.0.1:7032 | 6 | 56.6788 | 24000 | 51.02 | 167 | 14 | 51.02 |
Current time: 2021-12-01 23:13:06 (running for 00:01:39.08)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_a1402_00000 | RUNNING | 127.0.0.1:7032 | 6 | 56.6788 | 24000 | 51.02 | 167 | 14 | 51.02 |
Current time: 2021-12-01 23:13:12 (running for 00:01:45.03)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_a1402_00000 | RUNNING | 127.0.0.1:7032 | 7 | 65.6442 | 28000 | 58.96 | 173 | 12 | 58.96 |
Current time: 2021-12-01 23:13:17 (running for 00:01:50.20)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_a1402_00000 | RUNNING | 127.0.0.1:7032 | 8 | 74.3776 | 32000 | 65.85 | 173 | 12 | 65.85 |
Current time: 2021-12-01 23:13:23 (running for 00:01:55.26)
Memory usage on this node: 10.4/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_a1402_00000 | RUNNING | 127.0.0.1:7032 | 8 | 74.3776 | 32000 | 65.85 | 173 | 12 | 65.85 |
Current time: 2021-12-01 23:13:29 (running for 00:02:01.24)
Memory usage on this node: 10.4/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_a1402_00000 | RUNNING | 127.0.0.1:7032 | 9 | 84.7666 | 36000 | 75.45 | 294 | 15 | 75.45 |
Current time: 2021-12-01 23:13:34 (running for 00:02:06.30)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_a1402_00000 | RUNNING | 127.0.0.1:7032 | 9 | 84.7666 | 36000 | 75.45 | 294 | 15 | 75.45 |
Current time: 2021-12-01 23:13:35 (running for 00:02:07.72)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 TERMINATED)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_a1402_00000 | TERMINATED | 127.0.0.1:7032 | 10 | 93.2076 | 40000 | 84.4 | 294 | 17 | 84.4 |
print_reward(results3a2)
Reward after 10 training iterations: 84.4
plot_rewards(results3a2)
c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\seaborn\_decorators.py:36: FutureWarning: Pass the following variables as keyword args: x, y. From version 0.12, the only valid positional argument will be `data`, and passing other arguments without an explicit keyword will result in an error or misinterpretation.
warnings.warn(
# stacking observations inside the model works worse?
plot_learning(results3a, label="3a: Stacked Obs in Env")
plot_learning(results3a2, label="3a2: Stacked Obs in Model")
Option 3b: Use an LSTM for Processing the Sequence
Instead of stacking the last \(n\) observations and feeding this sequence into a regular feed-forward neural network, a recurrent neural network (RNN) can be used. An RNN keeps track of a learned hidden state that is passed on from one observation to the next, so information from earlier observations can still influence the current action.
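To make the idea of a state that is passed from observation to observation more concrete, here is a minimal sketch of a simple (Elman-style) recurrent update in plain Python. It is not what RLlib runs internally and it is not an LSTM; the weights `W_IN`, `W_REC`, and `BIAS` are made-up, untrained values, and the two-value observations (cart position, pole angle) only mimic the stateless CartPole setting. The point is just that two histories ending in the same observation lead to different hidden states.

```python
import math


def rnn_step(obs, hidden, w_in, w_rec, bias):
    """One recurrent update: h_t = tanh(W_in * x_t + W_rec * h_{t-1} + b)."""
    new_hidden = []
    for i in range(len(hidden)):
        total = bias[i]
        total += sum(w * x for w, x in zip(w_in[i], obs))
        total += sum(w * h for w, h in zip(w_rec[i], hidden))
        new_hidden.append(math.tanh(total))
    return new_hidden


def run_sequence(observations, w_in, w_rec, bias):
    """Feed observations one by one, carrying the hidden state forward."""
    hidden = [0.0] * len(bias)
    for obs in observations:
        hidden = rnn_step(obs, hidden, w_in, w_rec, bias)
    return hidden


# Hypothetical, untrained weights: 2 inputs (cart position, pole angle), 3 hidden units.
W_IN = [[0.5, -1.0], [1.0, 2.0], [-0.5, 0.5]]
W_REC = [[0.1, 0.4, 0.0], [-0.3, 0.2, 0.5], [0.0, -0.6, 0.1]]
BIAS = [0.0, 0.1, -0.1]

# Two histories that end in the same observation, but the pole moves in opposite directions.
swinging_right = [[0.0, -0.02], [0.0, 0.00], [0.0, 0.02]]
swinging_left = [[0.0, 0.06], [0.0, 0.04], [0.0, 0.02]]

h_right = run_sequence(swinging_right, W_IN, W_REC, BIAS)
h_left = run_sequence(swinging_left, W_IN, W_REC, BIAS)

print("hidden state, pole swinging right:", h_right)
print("hidden state, pole swinging left: ", h_left)
```

A feed-forward network that only sees the last observation would produce the same output in both cases. The recurrent state differs, which is what allows a trained RNN to infer the missing velocities.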
Long short-term memory (LSTM) networks are a variant of RNNs designed to retain state over longer time spans. To use an LSTM with RLlib, simply set the corresponding flag in the model config:
#collapse-output
config3b = ppo.DEFAULT_CONFIG.copy()
config3b["env"] = "StatelessCartPole"
config3b["model"] = {
    "use_lstm": True,
    # "max_seq_len": 10,
}
results3b = ray.tune.run("PPO", config=config3b, stop=stop)
print("Option 3b: Training finished successfully")
Current time: 2021-12-01 23:14:43 (running for 00:00:00.14)
Memory usage on this node: 9.6/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 PENDING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_15f17_00000 | PENDING |
(pid=None) c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\redis\connection.py:77: UserWarning: redis-py works best with hiredis. Please consider installing
(pid=None) warnings.warn(msg)
(pid=14344) 2021-12-01 23:14:58,972 INFO trainer.py:753 -- Tip: set framework=tfe or the --eager flag to enable TensorFlow eager execution
(pid=14344) 2021-12-01 23:14:58,972 INFO ppo.py:166 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting simple_optimizer=True if this doesn't work for you.
(pid=14344) 2021-12-01 23:14:58,972 INFO trainer.py:770 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=None) c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\redis\connection.py:77: UserWarning: redis-py works best with hiredis. Please consider installing
(pid=None) warnings.warn(msg)
(pid=None) c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\redis\connection.py:77: UserWarning: redis-py works best with hiredis. Please consider installing
(pid=None) warnings.warn(msg)
(pid=8656) 2021-12-01 23:15:15,322 WARNING deprecation.py:38 -- DeprecationWarning: `SampleBatch['is_training']` has been deprecated. Use `SampleBatch.is_training` instead. This will raise an error in the future!
(pid=14344) 2021-12-01 23:15:20,461 WARNING deprecation.py:38 -- DeprecationWarning: `SampleBatch['is_training']` has been deprecated. Use `SampleBatch.is_training` instead. This will raise an error in the future!
(pid=14344) 2021-12-01 23:15:23,593 WARNING trainer_template.py:185 -- `execution_plan` functions should accept `trainer`, `workers`, and `config` as args!
(pid=14344) 2021-12-01 23:15:23,593 INFO trainable.py:110 -- Trainable.setup took 24.634 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
(pid=14344) 2021-12-01 23:15:23,593 WARNING util.py:57 -- Install gputil for GPU system monitoring.
(pid=14344) 2021-12-01 23:15:32,644 WARNING deprecation.py:38 -- DeprecationWarning: `slice` has been deprecated. Use `SampleBatch[start:stop]` instead. This will raise an error in the future!
(pid=14344) Windows fatal exception: access violation
(pid=14344)
2021-12-01 23:26:57,480 INFO tune.py:630 -- Total run time: 733.96 seconds (733.70 seconds for the tuning loop).
Current time: 2021-12-01 23:14:48 (running for 00:00:05.16)
Memory usage on this node: 9.6/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 PENDING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_15f17_00000 | PENDING |
Current time: 2021-12-01 23:15:23 (running for 00:00:40.06)
Memory usage on this node: 10.4/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 |
Current time: 2021-12-01 23:15:25 (running for 00:00:42.11)
Memory usage on this node: 10.4/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 |
Current time: 2021-12-01 23:15:30 (running for 00:00:47.44)
Memory usage on this node: 10.5/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 |
Current time: 2021-12-01 23:15:36 (running for 00:00:52.62)
Memory usage on this node: 10.5/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 |
Current time: 2021-12-01 23:15:41 (running for 00:00:57.90)
Memory usage on this node: 10.5/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 |
Current time: 2021-12-01 23:15:46 (running for 00:01:03.24)
Memory usage on this node: 10.5/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 |
Current time: 2021-12-01 23:15:52 (running for 00:01:08.55)
Memory usage on this node: 10.5/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 |
Current time: 2021-12-01 23:15:57 (running for 00:01:13.93)
Memory usage on this node: 10.5/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 |
Current time: 2021-12-01 23:16:02 (running for 00:01:19.04)
Memory usage on this node: 10.5/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 |
Current time: 2021-12-01 23:16:07 (running for 00:01:24.29)
Memory usage on this node: 10.5/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 |
Current time: 2021-12-01 23:16:12 (running for 00:01:29.41)
Memory usage on this node: 10.4/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 |
Current time: 2021-12-01 23:16:18 (running for 00:01:35.06)
Memory usage on this node: 10.4/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 |
Current time: 2021-12-01 23:16:23 (running for 00:01:40.21)
Memory usage on this node: 10.4/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 |
Current time: 2021-12-01 23:16:29 (running for 00:01:45.76)
Memory usage on this node: 10.5/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 |
Current time: 2021-12-01 23:16:34 (running for 00:01:50.86)
Memory usage on this node: 10.4/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 |
Current time: 2021-12-01 23:16:39 (running for 00:01:56.00)
Memory usage on this node: 10.4/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 |
Result for PPO_StatelessCartPole_15f17_00000:
agent_timesteps_total: 4000
custom_metrics: {}
date: 2021-12-01_23-16-44
done: false
episode_len_mean: 24.74534161490683
episode_media: {}
episode_reward_max: 73.0
episode_reward_mean: 24.74534161490683
episode_reward_min: 9.0
episodes_this_iter: 161
episodes_total: 161
experiment_id: e1f0e5c8896845ff90ba05d38befaef8
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.6702085137367249
entropy_coeff: 0.0
kl: 0.01620529219508171
model: {}
policy_loss: -0.021323969587683678
total_loss: 147.9649658203125
vf_explained_var: -0.08384507149457932
vf_loss: 147.98304748535156
num_agent_steps_sampled: 4000
num_agent_steps_trained: 4000
num_steps_sampled: 4000
num_steps_trained: 4000
iterations_since_restore: 1
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 98.09908256880733
ram_util_percent: 87.94036697247705
pid: 14344
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.14404039499775298
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.15224504509160317
mean_inference_ms: 3.842544658680462
mean_raw_obs_processing_ms: 0.25258780840844064
time_since_restore: 80.78264856338501
time_this_iter_s: 80.78264856338501
time_total_s: 80.78264856338501
timers:
learn_throughput: 55.809
learn_time_ms: 71672.947
load_throughput: 4001243.978
load_time_ms: 1.0
sample_throughput: 442.023
sample_time_ms: 9049.296
update_time_ms: 14.003
timestamp: 1638397004
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 4000
training_iteration: 1
trial_id: 15f17_00000
Result for PPO_StatelessCartPole_15f17_00000:
agent_timesteps_total: 8000
custom_metrics: {}
date: 2021-12-01_23-17-57
done: false
episode_len_mean: 29.848484848484848
episode_media: {}
episode_reward_max: 85.0
episode_reward_mean: 29.848484848484848
episode_reward_min: 9.0
episodes_this_iter: 132
episodes_total: 293
experiment_id: e1f0e5c8896845ff90ba05d38befaef8
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.6473504304885864
entropy_coeff: 0.0
kl: 0.010533971711993217
model: {}
policy_loss: -0.009972598403692245
total_loss: 134.57276916503906
vf_explained_var: 0.15424844622612
vf_loss: 134.58062744140625
num_agent_steps_sampled: 8000
num_agent_steps_trained: 8000
num_steps_sampled: 8000
num_steps_trained: 8000
num_steps_trained_this_iter: 0
iterations_since_restore: 2
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 98.81443298969072
ram_util_percent: 88.0278350515464
pid: 14344
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.1454985114758189
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.1627394677104148
mean_inference_ms: 3.657513714820443
mean_raw_obs_processing_ms: 0.2482827895962494
time_since_restore: 153.95946764945984
time_this_iter_s: 73.17681908607483
time_total_s: 153.95946764945984
timers:
learn_throughput: 58.598
learn_time_ms: 68261.841
load_throughput: 8002487.956
load_time_ms: 0.5
sample_throughput: 89.708
sample_time_ms: 44589.185
update_time_ms: 10.501
timestamp: 1638397077
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 8000
training_iteration: 2
trial_id: 15f17_00000
Result for PPO_StatelessCartPole_15f17_00000:
agent_timesteps_total: 12000
custom_metrics: {}
date: 2021-12-01_23-19-11
done: false
episode_len_mean: 32.095238095238095
episode_media: {}
episode_reward_max: 91.0
episode_reward_mean: 32.095238095238095
episode_reward_min: 9.0
episodes_this_iter: 126
episodes_total: 419
experiment_id: e1f0e5c8896845ff90ba05d38befaef8
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.6104030013084412
entropy_coeff: 0.0
kl: 0.01123636681586504
model: {}
policy_loss: -0.0049535431899130344
total_loss: 151.1492919921875
vf_explained_var: 0.1538025140762329
vf_loss: 151.15199279785156
num_agent_steps_sampled: 12000
num_agent_steps_trained: 12000
num_steps_sampled: 12000
num_steps_trained: 12000
num_steps_trained_this_iter: 0
iterations_since_restore: 3
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 97.37070707070708
ram_util_percent: 87.73737373737374
pid: 14344
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.1454022761596221
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.1557348165264045
mean_inference_ms: 3.294093173663449
mean_raw_obs_processing_ms: 0.25605837493978906
time_since_restore: 227.74660396575928
time_this_iter_s: 73.78713631629944
time_total_s: 227.74660396575928
timers:
learn_throughput: 58.855
learn_time_ms: 67963.962
load_throughput: 6009031.519
load_time_ms: 0.666
sample_throughput: 74.781
sample_time_ms: 53489.736
update_time_ms: 8.973
timestamp: 1638397151
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 12000
training_iteration: 3
trial_id: 15f17_00000
Result for PPO_StatelessCartPole_15f17_00000:
agent_timesteps_total: 16000
custom_metrics: {}
date: 2021-12-01_23-20-22
done: false
episode_len_mean: 42.16
episode_media: {}
episode_reward_max: 97.0
episode_reward_mean: 42.16
episode_reward_min: 12.0
episodes_this_iter: 93
episodes_total: 512
experiment_id: e1f0e5c8896845ff90ba05d38befaef8
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.6158422827720642
entropy_coeff: 0.0
kl: 0.010851425118744373
model: {}
policy_loss: -0.003959023393690586
total_loss: 152.54002380371094
vf_explained_var: 0.17243239283561707
vf_loss: 152.5417938232422
num_agent_steps_sampled: 16000
num_agent_steps_trained: 16000
num_steps_sampled: 16000
num_steps_trained: 16000
num_steps_trained_this_iter: 0
iterations_since_restore: 4
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 99.08333333333333
ram_util_percent: 85.98645833333335
pid: 14344
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.1462575659080806
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.15407968489488727
mean_inference_ms: 3.275941401825844
mean_raw_obs_processing_ms: 0.24603215260913444
time_since_restore: 298.2684841156006
time_this_iter_s: 70.52188014984131
time_total_s: 298.2684841156006
timers:
learn_throughput: 59.973
learn_time_ms: 66696.53
load_throughput: 8012042.025
load_time_ms: 0.499
sample_throughput: 67.917
sample_time_ms: 58895.107
update_time_ms: 7.627
timestamp: 1638397222
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 16000
training_iteration: 4
trial_id: 15f17_00000
Result for PPO_StatelessCartPole_15f17_00000:
agent_timesteps_total: 20000
custom_metrics: {}
date: 2021-12-01_23-21-45
done: false
episode_len_mean: 33.36974789915966
episode_media: {}
episode_reward_max: 103.0
episode_reward_mean: 33.36974789915966
episode_reward_min: 10.0
episodes_this_iter: 119
episodes_total: 631
experiment_id: e1f0e5c8896845ff90ba05d38befaef8
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.6034429669380188
entropy_coeff: 0.0
kl: 0.012084417045116425
model: {}
policy_loss: -0.012375776655972004
total_loss: 157.4758758544922
vf_explained_var: 0.14680173993110657
vf_loss: 157.48585510253906
num_agent_steps_sampled: 20000
num_agent_steps_trained: 20000
num_steps_sampled: 20000
num_steps_trained: 20000
num_steps_trained_this_iter: 0
iterations_since_restore: 5
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 98.08396226415095
ram_util_percent: 85.38396226415095
pid: 14344
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.14943092584832965
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.1572909528662859
mean_inference_ms: 3.248902887560971
mean_raw_obs_processing_ms: 0.24150111904383836
time_since_restore: 381.94007539749146
time_this_iter_s: 83.67159128189087
time_total_s: 381.94007539749146
timers:
learn_throughput: 58.34
learn_time_ms: 68563.84
load_throughput: 10015052.531
load_time_ms: 0.399
sample_throughput: 65.33
sample_time_ms: 61227.393
update_time_ms: 8.119
timestamp: 1638397305
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 20000
training_iteration: 5
trial_id: 15f17_00000
Result for PPO_StatelessCartPole_15f17_00000:
agent_timesteps_total: 24000
custom_metrics: {}
date: 2021-12-01_23-22-56
done: false
episode_len_mean: 39.95
episode_media: {}
episode_reward_max: 100.0
episode_reward_mean: 39.95
episode_reward_min: 9.0
episodes_this_iter: 100
episodes_total: 731
experiment_id: e1f0e5c8896845ff90ba05d38befaef8
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.6064294576644897
entropy_coeff: 0.0
kl: 0.006135161034762859
model: {}
policy_loss: -0.0012852392392233014
total_loss: 144.01748657226562
vf_explained_var: 0.2577447295188904
vf_loss: 144.01754760742188
num_agent_steps_sampled: 24000
num_agent_steps_trained: 24000
num_steps_sampled: 24000
num_steps_trained: 24000
num_steps_trained_this_iter: 0
iterations_since_restore: 6
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 97.5808510638298
ram_util_percent: 86.03617021276594
pid: 14344
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.14752250682829854
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.15924224077925142
mean_inference_ms: 3.1588659969113992
mean_raw_obs_processing_ms: 0.23714881828949555
time_since_restore: 452.454083442688
time_this_iter_s: 70.51400804519653
time_total_s: 452.454083442688
timers:
learn_throughput: 59.002
learn_time_ms: 67794.396
load_throughput: 12018063.037
load_time_ms: 0.333
sample_throughput: 61.729
sample_time_ms: 64799.66
update_time_ms: 7.201
timestamp: 1638397376
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 24000
training_iteration: 6
trial_id: 15f17_00000
Result for PPO_StatelessCartPole_15f17_00000:
agent_timesteps_total: 28000
custom_metrics: {}
date: 2021-12-01_23-23-57
done: false
episode_len_mean: 34.38461538461539
episode_media: {}
episode_reward_max: 93.0
episode_reward_mean: 34.38461538461539
episode_reward_min: 9.0
episodes_this_iter: 117
episodes_total: 848
experiment_id: e1f0e5c8896845ff90ba05d38befaef8
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.602990984916687
entropy_coeff: 0.0
kl: 0.005702142603695393
model: {}
policy_loss: -0.0026899229269474745
total_loss: 126.622802734375
vf_explained_var: 0.28406739234924316
vf_loss: 126.62434387207031
num_agent_steps_sampled: 28000
num_agent_steps_trained: 28000
num_steps_sampled: 28000
num_steps_trained: 28000
num_steps_trained_this_iter: 0
iterations_since_restore: 7
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 97.09285714285713
ram_util_percent: 85.65119047619051
pid: 14344
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.14141074845480212
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.1582564423010624
mean_inference_ms: 3.1048115340137876
mean_raw_obs_processing_ms: 0.23267272113813148
time_since_restore: 513.63032746315
time_this_iter_s: 61.176244020462036
time_total_s: 513.63032746315
timers:
learn_throughput: 60.697
learn_time_ms: 65900.842
load_throughput: 14021073.543
load_time_ms: 0.285
sample_throughput: 60.942
sample_time_ms: 65636.225
update_time_ms: 6.744
timestamp: 1638397437
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 28000
training_iteration: 7
trial_id: 15f17_00000
Result for PPO_StatelessCartPole_15f17_00000:
agent_timesteps_total: 32000
custom_metrics: {}
date: 2021-12-01_23-24-57
done: false
episode_len_mean: 36.44954128440367
episode_media: {}
episode_reward_max: 77.0
episode_reward_mean: 36.44954128440367
episode_reward_min: 11.0
episodes_this_iter: 109
episodes_total: 957
experiment_id: e1f0e5c8896845ff90ba05d38befaef8
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.5927915573120117
entropy_coeff: 0.0
kl: 0.012266391888260841
model: {}
policy_loss: -0.0021681918296962976
total_loss: 94.81949615478516
vf_explained_var: 0.31072309613227844
vf_loss: 94.81922149658203
num_agent_steps_sampled: 32000
num_agent_steps_trained: 32000
num_steps_sampled: 32000
num_steps_trained: 32000
num_steps_trained_this_iter: 0
iterations_since_restore: 8
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 95.18048780487804
ram_util_percent: 85.51341463414633
pid: 14344
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.13663044317764386
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.15200395813252837
mean_inference_ms: 3.023815960844668
mean_raw_obs_processing_ms: 0.22803359086359726
time_since_restore: 573.1020631790161
time_this_iter_s: 59.47173571586609
time_total_s: 573.1020631790161
timers:
learn_throughput: 62.141
learn_time_ms: 64369.633
load_throughput: 16024084.05
load_time_ms: 0.25
sample_throughput: 61.555
sample_time_ms: 64982.899
update_time_ms: 5.901
timestamp: 1638397497
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 32000
training_iteration: 8
trial_id: 15f17_00000
Result for PPO_StatelessCartPole_15f17_00000:
agent_timesteps_total: 36000
custom_metrics: {}
date: 2021-12-01_23-25-57
done: false
episode_len_mean: 34.80701754385965
episode_media: {}
episode_reward_max: 93.0
episode_reward_mean: 34.80701754385965
episode_reward_min: 10.0
episodes_this_iter: 114
episodes_total: 1071
experiment_id: e1f0e5c8896845ff90ba05d38befaef8
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.5793452262878418
entropy_coeff: 0.0
kl: 0.009524409659206867
model: {}
policy_loss: 0.00497779343277216
total_loss: 101.02494812011719
vf_explained_var: 0.31699204444885254
vf_loss: 101.01805877685547
num_agent_steps_sampled: 36000
num_agent_steps_trained: 36000
num_steps_sampled: 36000
num_steps_trained: 36000
num_steps_trained_this_iter: 0
iterations_since_restore: 9
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 95.77349397590359
ram_util_percent: 85.25903614457832
pid: 14344
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.13135990601240866
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.14946784585673123
mean_inference_ms: 2.9620819680365273
mean_raw_obs_processing_ms: 0.22295928348485725
time_since_restore: 633.5450580120087
time_this_iter_s: 60.442994832992554
time_total_s: 633.5450580120087
timers:
learn_throughput: 63.214
learn_time_ms: 63277.532
load_throughput: 18027094.556
load_time_ms: 0.222
sample_throughput: 62.131
sample_time_ms: 64380.478
update_time_ms: 6.023
timestamp: 1638397557
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 36000
training_iteration: 9
trial_id: 15f17_00000
Result for PPO_StatelessCartPole_15f17_00000:
agent_timesteps_total: 40000
custom_metrics: {}
date: 2021-12-01_23-26-57
done: true
episode_len_mean: 41.01
episode_media: {}
episode_reward_max: 91.0
episode_reward_mean: 41.01
episode_reward_min: 9.0
episodes_this_iter: 97
episodes_total: 1168
experiment_id: e1f0e5c8896845ff90ba05d38befaef8
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.5690697431564331
entropy_coeff: 0.0
kl: 0.005614957306534052
model: {}
policy_loss: 0.0013457380700856447
total_loss: 97.90939331054688
vf_explained_var: 0.3871138393878937
vf_loss: 97.90692138671875
num_agent_steps_sampled: 40000
num_agent_steps_trained: 40000
num_steps_sampled: 40000
num_steps_trained: 40000
num_steps_trained_this_iter: 0
iterations_since_restore: 10
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 95.57160493827159
ram_util_percent: 85.20246913580247
pid: 14344
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.1271907494503554
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.1465406708441855
mean_inference_ms: 2.916534058088682
mean_raw_obs_processing_ms: 0.21554435209484965
time_since_restore: 693.0942261219025
time_this_iter_s: 59.5491681098938
time_total_s: 693.0942261219025
timers:
learn_throughput: 64.181
learn_time_ms: 62324.001
load_throughput: 20030105.062
load_time_ms: 0.2
sample_throughput: 62.515
sample_time_ms: 63984.333
update_time_ms: 6.121
timestamp: 1638397617
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 40000
training_iteration: 10
trial_id: 15f17_00000
Option 3b: Training finished successfully
Current time: 2021-12-01 23:16:45 (running for 00:02:01.96)
Memory usage on this node: 10.5/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 | 1 | 80.7826 | 4000 | 24.7453 | 73 | 9 | 24.7453 |
Current time: 2021-12-01 23:17:58 (running for 00:03:15.16)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 | 2 | 153.959 | 8000 | 29.8485 | 85 | 9 | 29.8485 |
Current time: 2021-12-01 23:19:12 (running for 00:04:29.01)
Memory usage on this node: 10.6/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 | 3 | 227.747 | 12000 | 32.0952 | 91 | 9 | 32.0952 |
Current time: 2021-12-01 23:19:17 (running for 00:04:34.08)
Memory usage on this node: 10.5/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 | 3 | 227.747 | 12000 | 32.0952 | 91 | 9 | 32.0952 |
Current time: 2021-12-01 23:19:22 (running for 00:04:39.15)
Memory usage on this node: 10.4/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 | 3 | 227.747 | 12000 | 32.0952 | 91 | 9 | 32.0952 |
Current time: 2021-12-01 23:19:28 (running for 00:04:44.49)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 | 3 | 227.747 | 12000 | 32.0952 | 91 | 9 | 32.0952 |
Current time: 2021-12-01 23:19:33 (running for 00:04:49.61)
Memory usage on this node: 10.4/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 | 3 | 227.747 | 12000 | 32.0952 | 91 | 9 | 32.0952 |
Current time: 2021-12-01 23:19:38 (running for 00:04:54.90)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 | 3 | 227.747 | 12000 | 32.0952 | 91 | 9 | 32.0952 |
Current time: 2021-12-01 23:19:43 (running for 00:04:60.00)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 | 3 | 227.747 | 12000 | 32.0952 | 91 | 9 | 32.0952 |
Current time: 2021-12-01 23:20:25 (running for 00:05:41.62)
Memory usage on this node: 10.1/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 | 4 | 298.268 | 16000 | 42.16 | 97 | 12 | 42.16 |
Current time: 2021-12-01 23:21:46 (running for 00:07:03.31)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 | 5 | 381.94 | 20000 | 33.3697 | 103 | 10 | 33.3697 |
Current time: 2021-12-01 23:23:00 (running for 00:08:16.92)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 | 6 | 452.454 | 24000 | 39.95 | 100 | 9 | 39.95 |
Current time: 2021-12-01 23:24:02 (running for 00:09:19.13)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 | 7 | 513.63 | 28000 | 34.3846 | 93 | 9 | 34.3846 |
Current time: 2021-12-01 23:25:00 (running for 00:10:16.66)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 | 8 | 573.102 | 32000 | 36.4495 | 77 | 11 | 36.4495 |
Current time: 2021-12-01 23:26:02 (running for 00:11:19.16)
Memory usage on this node: 10.1/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_15f17_00000 | RUNNING | 127.0.0.1:14344 | 9 | 633.545 | 36000 | 34.807 | 93 | 10 | 34.807 |
Current time: 2021-12-01 23:26:57 (running for 00:12:13.80)
Memory usage on this node: 9.9/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 TERMINATED)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_15f17_00000 | TERMINATED | 127.0.0.1:14344 | 10 | 693.094 | 40000 | 41.01 | 91 | 9 | 41.01 |
print_reward(results3b)
Reward after 10 training iterations: 41.01
plot_rewards(results3b)
c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\seaborn\_decorators.py:36: FutureWarning: Pass the following variables as keyword args: x, y. From version 0.12, the only valid positional argument will be `data`, and passing other arguments without an explicit keyword will result in an error or misinterpretation.
warnings.warn(
plot_learning(results1, label="1: Full Observations")
plot_learning(results2, label="2: Partial Observations")
plot_learning(results3a, label="3a: Stacked, Partial Observations")
plot_learning(results3b, label="3b: LSTM")
LSTM with Stacked Observations
This option combines the two previous ideas: the LSTM model is trained on the StackedStatelessCartPole environment from above.
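As a reminder of what the stacked environment feeds into the LSTM, here is a minimal, dependency-free sketch of observation stacking. It is not the StackedStatelessCartPole implementation from above: the `ObservationStacker` class, its method names, and the zero-padding at the start of an episode are my own assumptions for illustration only.

```python
from collections import deque


class ObservationStacker:
    """Hypothetical helper (not part of RLlib or Gym): keeps the last
    `num_stack` partial observations and returns them as one window."""

    def __init__(self, num_stack, obs_size):
        self.num_stack = num_stack
        self.obs_size = obs_size
        # A deque with maxlen drops the oldest entry automatically.
        self.frames = deque(maxlen=num_stack)
        self.reset()

    def reset(self):
        """Start a new episode. Assumption: pad the window with zeros."""
        self.frames.clear()
        for _ in range(self.num_stack):
            self.frames.append([0.0] * self.obs_size)

    def push(self, obs):
        """Add the newest observation and return the window, oldest first."""
        if len(obs) != self.obs_size:
            raise ValueError(f"expected {self.obs_size} values, got {len(obs)}")
        self.frames.append(list(obs))
        return [list(frame) for frame in self.frames]


# Partial CartPole observation: (cart position, pole angle), no velocities.
stacker = ObservationStacker(num_stack=4, obs_size=2)
stacker.push([0.01, -0.02])
window = stacker.push([0.02, -0.03])
print(window)
```

With stacking, each model input already contains a short history, so the velocities can in principle be inferred from the differences between consecutive entries. The LSTM then adds its own internal memory on top of that window.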
#collapse-output
config3b2 = ppo.DEFAULT_CONFIG.copy()
config3b2["env"] = "StackedStatelessCartPole"
config3b2["model"] = {
    "use_lstm": True,
}
results3b2 = ray.tune.run("PPO", config=config3b2, stop=stop)
print("Option 3b2: Training finished successfully")
Current time: 2021-12-01 23:29:25 (running for 00:00:00.15)
Memory usage on this node: 9.5/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 PENDING)
Trial name | status | loc |
---|---|---|
PPO_StackedStatelessCartPole_23a62_00000 | PENDING |
(pid=None) c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\redis\connection.py:77: UserWarning: redis-py works best with hiredis. Please consider installing
(pid=None) warnings.warn(msg)
(pid=10736) 2021-12-01 23:29:41,957 INFO trainer.py:753 -- Tip: set framework=tfe or the --eager flag to enable TensorFlow eager execution
(pid=10736) 2021-12-01 23:29:41,957 INFO ppo.py:166 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting simple_optimizer=True if this doesn't work for you.
(pid=10736) 2021-12-01 23:29:41,958 INFO trainer.py:770 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=None) c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\redis\connection.py:77: UserWarning: redis-py works best with hiredis. Please consider installing
(pid=None) warnings.warn(msg)
(pid=None) c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\redis\connection.py:77: UserWarning: redis-py works best with hiredis. Please consider installing
(pid=None) warnings.warn(msg)
(pid=11688) c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\gym\spaces\box.py:142: UserWarning: WARN: Casting input x to numpy array.
(pid=11688) logger.warn("Casting input x to numpy array.")
(pid=19560) c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\gym\spaces\box.py:142: UserWarning: WARN: Casting input x to numpy array.
(pid=19560) logger.warn("Casting input x to numpy array.")
(pid=19560) 2021-12-01 23:29:54,372 WARNING deprecation.py:38 -- DeprecationWarning: `SampleBatch['is_training']` has been deprecated. Use `SampleBatch.is_training` instead. This will raise an error in the future!
(pid=10736) 2021-12-01 23:29:57,640 WARNING deprecation.py:38 -- DeprecationWarning: `SampleBatch['is_training']` has been deprecated. Use `SampleBatch.is_training` instead. This will raise an error in the future!
(pid=10736) 2021-12-01 23:29:59,755 WARNING trainer_template.py:185 -- `execution_plan` functions should accept `trainer`, `workers`, and `config` as args!
(pid=10736) 2021-12-01 23:29:59,755 INFO trainable.py:110 -- Trainable.setup took 17.805 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
(pid=10736) 2021-12-01 23:29:59,755 WARNING util.py:57 -- Install gputil for GPU system monitoring.
(pid=10736) 2021-12-01 23:30:05,789 WARNING deprecation.py:38 -- DeprecationWarning: `slice` has been deprecated. Use `SampleBatch[start:stop]` instead. This will raise an error in the future!
(pid=10736) [2021-12-01 23:39:23,601 E 10736 16912] raylet_client.cc:159: IOError: Unknown error [RayletClient] Failed to disconnect from raylet.
(pid=10736) Windows fatal exception: access violation
(pid=10736)
(pid=19560) [2021-12-01 23:39:23,630 C 19560 20340] core_worker.cc:796: Check failed: _s.ok() Bad status: IOError: Unknown error
(pid=19560) *** StackTrace Information ***
(pid=19560) PyInit__raylet
(pid=19560) PyInit__raylet
(pid=19560) PyInit__raylet
(pid=19560) PyInit__raylet
(pid=19560) PyInit__raylet
(pid=19560) PyInit__raylet
(pid=19560) PyInit__raylet
(pid=19560) PyInit__raylet
(pid=19560) PyInit__raylet
(pid=19560) PyInit__raylet
(pid=19560) PyInit__raylet
(pid=19560) PyInit__raylet
(pid=19560) PyInit__raylet
(pid=19560) PyInit__raylet
(pid=19560) PyInit__raylet
(pid=19560) PyInit__raylet
(pid=19560) PyNumber_InPlaceLshift
(pid=19560) Py_CheckFunctionResult
(pid=19560) PyEval_EvalFrameDefault
(pid=19560) Py_CheckFunctionResult
(pid=19560) PyEval_EvalFrameDefault
(pid=19560) PyEval_EvalCodeWithName
(pid=19560) PyEval_EvalCodeEx
(pid=19560) PyEval_EvalCode
(pid=19560) PyArena_New
(pid=19560) PyArena_New
(pid=19560) PyRun_FileExFlags
(pid=19560) PyRun_SimpleFileExFlags
(pid=19560) PyRun_AnyFileExFlags
(pid=19560) Py_FatalError
(pid=19560) Py_RunMain
(pid=19560) Py_RunMain
(pid=19560) Py_Main
(pid=19560) BaseThreadInitThunk
(pid=19560) RtlUserThreadStart
(pid=19560)
(pid=11688) [2021-12-01 23:39:23,637 C 11688 12384] core_worker.cc:796: Check failed: _s.ok() Bad status: IOError: Unknown error
(pid=11688) *** StackTrace Information ***
(pid=11688) PyInit__raylet
(pid=11688) PyInit__raylet
(pid=11688) PyInit__raylet
(pid=11688) PyInit__raylet
(pid=11688) PyInit__raylet
(pid=11688) PyInit__raylet
(pid=11688) PyInit__raylet
(pid=11688) PyInit__raylet
(pid=11688) PyInit__raylet
(pid=11688) PyInit__raylet
(pid=11688) PyInit__raylet
(pid=11688) PyInit__raylet
(pid=11688) PyInit__raylet
(pid=11688) PyInit__raylet
(pid=11688) PyInit__raylet
(pid=11688) PyInit__raylet
(pid=11688) PyNumber_InPlaceLshift
(pid=11688) Py_CheckFunctionResult
(pid=11688) PyEval_EvalFrameDefault
(pid=11688) Py_CheckFunctionResult
(pid=11688) PyEval_EvalFrameDefault
(pid=11688) PyEval_EvalCodeWithName
(pid=11688) PyEval_EvalCodeEx
(pid=11688) PyEval_EvalCode
(pid=11688) PyArena_New
(pid=11688) PyArena_New
(pid=11688) PyRun_FileExFlags
(pid=11688) PyRun_SimpleFileExFlags
(pid=11688) PyRun_AnyFileExFlags
(pid=11688) Py_FatalError
(pid=11688) Py_RunMain
(pid=11688) Py_RunMain
(pid=11688) Py_Main
(pid=11688) BaseThreadInitThunk
(pid=11688) RtlUserThreadStart
(pid=11688)
(pid=19560) Windows fatal exception: access violation
(pid=19560)
(pid=19560) Stack (most recent call first):
(pid=19560) File "c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\ray\worker.py", line 425 in main_loop
(pid=19560) File "c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\ray\workers/default_worker.py", line 218 in <module>
(pid=11688) Windows fatal exception: access violation
(pid=11688)
(pid=11688) Stack (most recent call first):
(pid=11688) File "c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\ray\worker.py", line 425 in main_loop
(pid=11688) File "c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\ray\workers/default_worker.py", line 218 in <module>
2021-12-01 23:39:23,731 INFO tune.py:630 -- Total run time: 598.23 seconds (597.55 seconds for the tuning loop).
Current time: 2021-12-01 23:29:59 (running for 00:00:34.23)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StackedStatelessCartPole_23a62_00000 | RUNNING | 127.0.0.1:10736 |
Result for PPO_StackedStatelessCartPole_23a62_00000:
agent_timesteps_total: 4000
custom_metrics: {}
date: 2021-12-01_23-30-56
done: false
episode_len_mean: 23.75
episode_media: {}
episode_reward_max: 76.0
episode_reward_mean: 23.75
episode_reward_min: 8.0
episodes_this_iter: 168
episodes_total: 168
experiment_id: a83f6e57239f4aa1a70a247399bd5e70
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.67047119140625
entropy_coeff: 0.0
kl: 0.01581866294145584
model: {}
policy_loss: -0.01895357482135296
total_loss: 154.9998321533203
vf_explained_var: -0.10604370385408401
vf_loss: 155.015625
num_agent_steps_sampled: 4000
num_agent_steps_trained: 4000
num_steps_sampled: 4000
num_steps_trained: 4000
iterations_since_restore: 1
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 92.95384615384614
ram_util_percent: 86.28846153846153
pid: 10736
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.12315661179645192
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.14159590390983534
mean_inference_ms: 2.4392604373977185
mean_raw_obs_processing_ms: 0.21398437689550173
time_since_restore: 56.266863107681274
time_this_iter_s: 56.266863107681274
time_total_s: 56.266863107681274
timers:
learn_throughput: 79.628
learn_time_ms: 50233.338
load_throughput: 0.0
load_time_ms: 0.0
sample_throughput: 662.919
sample_time_ms: 6033.918
update_time_ms: 0.0
timestamp: 1638397856
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 4000
training_iteration: 1
trial_id: 23a62_00000
Result for PPO_StackedStatelessCartPole_23a62_00000:
agent_timesteps_total: 8000
custom_metrics: {}
date: 2021-12-01_23-31-51
done: false
episode_len_mean: 26.83783783783784
episode_media: {}
episode_reward_max: 99.0
episode_reward_mean: 26.83783783783784
episode_reward_min: 9.0
episodes_this_iter: 148
episodes_total: 316
experiment_id: a83f6e57239f4aa1a70a247399bd5e70
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.6504567265510559
entropy_coeff: 0.0
kl: 0.008978299796581268
model: {}
policy_loss: -0.003679021494463086
total_loss: 127.99153137207031
vf_explained_var: 0.08064287155866623
vf_loss: 127.99342346191406
num_agent_steps_sampled: 8000
num_agent_steps_trained: 8000
num_steps_sampled: 8000
num_steps_trained: 8000
num_steps_trained_this_iter: 0
iterations_since_restore: 2
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 93.6051948051948
ram_util_percent: 85.9285714285714
pid: 10736
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.11010479833490981
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.1344388731949505
mean_inference_ms: 2.3811965957541155
mean_raw_obs_processing_ms: 0.23579910601161452
time_since_restore: 111.88376545906067
time_this_iter_s: 55.616902351379395
time_total_s: 111.88376545906067
timers:
learn_throughput: 79.946
learn_time_ms: 50033.46
load_throughput: 0.0
load_time_ms: 0.0
sample_throughput: 128.91
sample_time_ms: 31029.425
update_time_ms: 7.819
timestamp: 1638397911
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 8000
training_iteration: 2
trial_id: 23a62_00000
Result for PPO_StackedStatelessCartPole_23a62_00000:
agent_timesteps_total: 12000
custom_metrics: {}
date: 2021-12-01_23-32-46
done: false
episode_len_mean: 31.140625
episode_media: {}
episode_reward_max: 84.0
episode_reward_mean: 31.140625
episode_reward_min: 9.0
episodes_this_iter: 128
episodes_total: 444
experiment_id: a83f6e57239f4aa1a70a247399bd5e70
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.6188111901283264
entropy_coeff: 0.0
kl: 0.01211484707891941
model: {}
policy_loss: -0.0035639703273773193
total_loss: 120.47854614257812
vf_explained_var: 0.1551436185836792
vf_loss: 120.47969055175781
num_agent_steps_sampled: 12000
num_agent_steps_trained: 12000
num_steps_sampled: 12000
num_steps_trained: 12000
num_steps_trained_this_iter: 0
iterations_since_restore: 3
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 93.38815789473684
ram_util_percent: 85.94868421052632
pid: 10736
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.11705200286720863
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.1417983632483218
mean_inference_ms: 2.3423456092602697
mean_raw_obs_processing_ms: 0.24562951880700387
time_since_restore: 167.125750541687
time_this_iter_s: 55.24198508262634
time_total_s: 167.125750541687
timers:
learn_throughput: 80.241
learn_time_ms: 49850.087
load_throughput: 0.0
load_time_ms: 0.0
sample_throughput: 101.955
sample_time_ms: 39232.888
update_time_ms: 6.891
timestamp: 1638397966
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 12000
training_iteration: 3
trial_id: 23a62_00000
Result for PPO_StackedStatelessCartPole_23a62_00000:
agent_timesteps_total: 16000
custom_metrics: {}
date: 2021-12-01_23-33-42
done: false
episode_len_mean: 30.353383458646615
episode_media: {}
episode_reward_max: 90.0
episode_reward_mean: 30.353383458646615
episode_reward_min: 10.0
episodes_this_iter: 133
episodes_total: 577
experiment_id: a83f6e57239f4aa1a70a247399bd5e70
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.6017324328422546
entropy_coeff: 0.0
kl: 0.01713641546666622
model: {}
policy_loss: -0.01428857073187828
total_loss: 144.1490936279297
vf_explained_var: 0.12213249504566193
vf_loss: 144.15994262695312
num_agent_steps_sampled: 16000
num_agent_steps_trained: 16000
num_steps_sampled: 16000
num_steps_trained: 16000
num_steps_trained_this_iter: 0
iterations_since_restore: 4
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 94.54285714285712
ram_util_percent: 85.35714285714288
pid: 10736
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.11847087716647563
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.1291269029294712
mean_inference_ms: 2.3499539409316808
mean_raw_obs_processing_ms: 0.24600572504675797
time_since_restore: 222.71135187149048
time_this_iter_s: 55.58560132980347
time_total_s: 222.71135187149048
timers:
learn_throughput: 80.284
learn_time_ms: 49823.259
load_throughput: 0.0
load_time_ms: 0.0
sample_throughput: 92.442
sample_time_ms: 43270.364
update_time_ms: 6.418
timestamp: 1638398022
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 16000
training_iteration: 4
trial_id: 23a62_00000
Result for PPO_StackedStatelessCartPole_23a62_00000:
agent_timesteps_total: 20000
custom_metrics: {}
date: 2021-12-01_23-34-39
done: false
episode_len_mean: 32.04032258064516
episode_media: {}
episode_reward_max: 95.0
episode_reward_mean: 32.04032258064516
episode_reward_min: 9.0
episodes_this_iter: 124
episodes_total: 701
experiment_id: a83f6e57239f4aa1a70a247399bd5e70
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.5604901909828186
entropy_coeff: 0.0
kl: 0.009820478968322277
model: {}
policy_loss: -0.0032165604643523693
total_loss: 100.9627914428711
vf_explained_var: 0.21908660233020782
vf_loss: 100.96404266357422
num_agent_steps_sampled: 20000
num_agent_steps_trained: 20000
num_steps_sampled: 20000
num_steps_trained: 20000
num_steps_trained_this_iter: 0
iterations_since_restore: 5
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 95.19487179487179
ram_util_percent: 85.38333333333334
pid: 10736
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.11550202333653888
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.12803896103049967
mean_inference_ms: 2.3503844656975446
mean_raw_obs_processing_ms: 0.24389184124832392
time_since_restore: 279.80817222595215
time_this_iter_s: 57.09682035446167
time_total_s: 279.80817222595215
timers:
learn_throughput: 79.803
learn_time_ms: 50123.572
load_throughput: 0.0
load_time_ms: 0.0
sample_throughput: 87.481
sample_time_ms: 45724.201
update_time_ms: 6.735
timestamp: 1638398079
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 20000
training_iteration: 5
trial_id: 23a62_00000
Result for PPO_StackedStatelessCartPole_23a62_00000:
agent_timesteps_total: 24000
custom_metrics: {}
date: 2021-12-01_23-35-37
done: false
episode_len_mean: 34.64655172413793
episode_media: {}
episode_reward_max: 73.0
episode_reward_mean: 34.64655172413793
episode_reward_min: 9.0
episodes_this_iter: 116
episodes_total: 817
experiment_id: a83f6e57239f4aa1a70a247399bd5e70
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.5657119154930115
entropy_coeff: 0.0
kl: 0.013083796948194504
model: {}
policy_loss: -0.002956786658614874
total_loss: 113.45106506347656
vf_explained_var: 0.19632089138031006
vf_loss: 113.45140075683594
num_agent_steps_sampled: 24000
num_agent_steps_trained: 24000
num_steps_sampled: 24000
num_steps_trained: 24000
num_steps_trained_this_iter: 0
iterations_since_restore: 6
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 96.2126582278481
ram_util_percent: 85.9253164556962
pid: 10736
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.11478686430048778
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.1277210771833874
mean_inference_ms: 2.3750658085342033
mean_raw_obs_processing_ms: 0.2452229849547805
time_since_restore: 337.58967638015747
time_this_iter_s: 57.78150415420532
time_total_s: 337.58967638015747
timers:
learn_throughput: 79.396
learn_time_ms: 50380.468
load_throughput: 0.0
load_time_ms: 0.0
sample_throughput: 83.877
sample_time_ms: 47688.802
update_time_ms: 6.779
timestamp: 1638398137
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 24000
training_iteration: 6
trial_id: 23a62_00000
Result for PPO_StackedStatelessCartPole_23a62_00000:
agent_timesteps_total: 28000
custom_metrics: {}
date: 2021-12-01_23-36-34
done: false
episode_len_mean: 33.652542372881356
episode_media: {}
episode_reward_max: 80.0
episode_reward_mean: 33.652542372881356
episode_reward_min: 10.0
episodes_this_iter: 118
episodes_total: 935
experiment_id: a83f6e57239f4aa1a70a247399bd5e70
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.55117267370224
entropy_coeff: 0.0
kl: 0.007611136883497238
model: {}
policy_loss: 0.007003166247159243
total_loss: 101.61392211914062
vf_explained_var: 0.2326377034187317
vf_loss: 101.60539245605469
num_agent_steps_sampled: 28000
num_agent_steps_trained: 28000
num_steps_sampled: 28000
num_steps_trained: 28000
num_steps_trained_this_iter: 0
iterations_since_restore: 7
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 97.20759493670884
ram_util_percent: 83.11772151898732
pid: 10736
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.1171131524330598
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.13283579428360615
mean_inference_ms: 2.437068269845546
mean_raw_obs_processing_ms: 0.24942217294799132
time_since_restore: 394.7531487941742
time_this_iter_s: 57.163472414016724
time_total_s: 394.7531487941742
timers:
learn_throughput: 79.423
learn_time_ms: 50363.0
load_throughput: 0.0
load_time_ms: 0.0
sample_throughput: 81.207
sample_time_ms: 49256.55
update_time_ms: 5.81
timestamp: 1638398194
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 28000
training_iteration: 7
trial_id: 23a62_00000
Result for PPO_StackedStatelessCartPole_23a62_00000:
agent_timesteps_total: 32000
custom_metrics: {}
date: 2021-12-01_23-37-31
done: false
episode_len_mean: 34.30769230769231
episode_media: {}
episode_reward_max: 83.0
episode_reward_mean: 34.30769230769231
episode_reward_min: 9.0
episodes_this_iter: 117
episodes_total: 1052
experiment_id: a83f6e57239f4aa1a70a247399bd5e70
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.5451725721359253
entropy_coeff: 0.0
kl: 0.009232791140675545
model: {}
policy_loss: -0.004543236922472715
total_loss: 98.80670928955078
vf_explained_var: 0.2929261326789856
vf_loss: 98.80941772460938
num_agent_steps_sampled: 32000
num_agent_steps_trained: 32000
num_steps_sampled: 32000
num_steps_trained: 32000
num_steps_trained_this_iter: 0
iterations_since_restore: 8
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 95.07692307692308
ram_util_percent: 83.00512820512823
pid: 10736
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.11435559950687862
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.12772642169718973
mean_inference_ms: 2.4294528410907237
mean_raw_obs_processing_ms: 0.2475755621430507
time_since_restore: 450.92096877098083
time_this_iter_s: 56.16781997680664
time_total_s: 450.92096877098083
timers:
learn_throughput: 79.407
learn_time_ms: 50373.586
load_throughput: 0.0
load_time_ms: 0.0
sample_throughput: 79.838
sample_time_ms: 50101.286
update_time_ms: 5.584
timestamp: 1638398251
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 32000
training_iteration: 8
trial_id: 23a62_00000
Result for PPO_StackedStatelessCartPole_23a62_00000:
agent_timesteps_total: 36000
custom_metrics: {}
date: 2021-12-01_23-38-27
done: false
episode_len_mean: 39.45544554455446
episode_media: {}
episode_reward_max: 88.0
episode_reward_mean: 39.45544554455446
episode_reward_min: 10.0
episodes_this_iter: 101
episodes_total: 1153
experiment_id: a83f6e57239f4aa1a70a247399bd5e70
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.5534942746162415
entropy_coeff: 0.0
kl: 0.008644542656838894
model: {}
policy_loss: 0.000738372968044132
total_loss: 82.19181823730469
vf_explained_var: 0.3500000238418579
vf_loss: 82.18933868408203
num_agent_steps_sampled: 36000
num_agent_steps_trained: 36000
num_steps_sampled: 36000
num_steps_trained: 36000
num_steps_trained_this_iter: 0
iterations_since_restore: 9
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 95.58076923076923
ram_util_percent: 82.91666666666666
pid: 10736
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.1135820346886012
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.12853882753117699
mean_inference_ms: 2.4140773796778965
mean_raw_obs_processing_ms: 0.24722175705109586
time_since_restore: 507.0870122909546
time_this_iter_s: 56.166043519973755
time_total_s: 507.0870122909546
timers:
learn_throughput: 79.385
learn_time_ms: 50387.098
load_throughput: 0.0
load_time_ms: 0.0
sample_throughput: 78.782
sample_time_ms: 50772.981
update_time_ms: 6.076
timestamp: 1638398307
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 36000
training_iteration: 9
trial_id: 23a62_00000
Result for PPO_StackedStatelessCartPole_23a62_00000:
agent_timesteps_total: 40000
custom_metrics: {}
date: 2021-12-01_23-39-23
done: true
episode_len_mean: 36.67272727272727
episode_media: {}
episode_reward_max: 80.0
episode_reward_mean: 36.67272727272727
episode_reward_min: 11.0
episodes_this_iter: 110
episodes_total: 1263
experiment_id: a83f6e57239f4aa1a70a247399bd5e70
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.5353969931602478
entropy_coeff: 0.0
kl: 0.007495984435081482
model: {}
policy_loss: -0.003161693923175335
total_loss: 78.63844299316406
vf_explained_var: 0.3598953187465668
vf_loss: 78.64009857177734
num_agent_steps_sampled: 40000
num_agent_steps_trained: 40000
num_steps_sampled: 40000
num_steps_trained: 40000
num_steps_trained_this_iter: 0
iterations_since_restore: 10
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 94.4233766233766
ram_util_percent: 82.21298701298701
pid: 10736
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.11409118320273612
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.13058247747347188
mean_inference_ms: 2.398278065616303
mean_raw_obs_processing_ms: 0.2469977443652814
time_since_restore: 562.8484704494476
time_this_iter_s: 55.76145815849304
time_total_s: 562.8484704494476
timers:
learn_throughput: 79.43
learn_time_ms: 50358.561
load_throughput: 0.0
load_time_ms: 0.0
sample_throughput: 77.946
sample_time_ms: 51317.629
update_time_ms: 5.469
timestamp: 1638398363
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 40000
training_iteration: 10
trial_id: 23a62_00000
Option 3b2: Training finished successfully
Current time: 2021-12-01 23:30:59 (running for 00:01:33.56)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_23a62_00000 | RUNNING | 127.0.0.1:10736 | 1 | 56.2669 | 4000 | 23.75 | 76 | 8 | 23.75 |
Current time: 2021-12-01 23:31:55 (running for 00:02:30.26)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_23a62_00000 | RUNNING | 127.0.0.1:10736 | 2 | 111.884 | 8000 | 26.8378 | 99 | 9 | 26.8378 |
Current time: 2021-12-01 23:32:49 (running for 00:03:23.49)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_23a62_00000 | RUNNING | 127.0.0.1:10736 | 3 | 167.126 | 12000 | 31.1406 | 84 | 9 | 31.1406 |
Current time: 2021-12-01 23:33:45 (running for 00:04:20.12)
Memory usage on this node: 10.1/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_23a62_00000 | RUNNING | 127.0.0.1:10736 | 4 | 222.711 | 16000 | 30.3534 | 90 | 10 | 30.3534 |
Current time: 2021-12-01 23:34:11 (running for 00:04:45.73)
Memory usage on this node: 10.1/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_23a62_00000 | RUNNING | 127.0.0.1:10736 | 4 | 222.711 | 16000 | 30.3534 | 90 | 10 | 30.3534 |
Current time: 2021-12-01 23:34:16 (running for 00:04:50.88)
Memory usage on this node: 10.1/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_23a62_00000 | RUNNING | 127.0.0.1:10736 | 4 | 222.711 | 16000 | 30.3534 | 90 | 10 | 30.3534 |
Current time: 2021-12-01 23:34:21 (running for 00:04:56.00)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_23a62_00000 | RUNNING | 127.0.0.1:10736 | 4 | 222.711 | 16000 | 30.3534 | 90 | 10 | 30.3534 |
Current time: 2021-12-01 23:34:26 (running for 00:05:01.12)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_23a62_00000 | RUNNING | 127.0.0.1:10736 | 4 | 222.711 | 16000 | 30.3534 | 90 | 10 | 30.3534 |
Current time: 2021-12-01 23:34:31 (running for 00:05:06.30)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_23a62_00000 | RUNNING | 127.0.0.1:10736 | 4 | 222.711 | 16000 | 30.3534 | 90 | 10 | 30.3534 |
Current time: 2021-12-01 23:34:36 (running for 00:05:11.38)
Memory usage on this node: 10.3/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_23a62_00000 | RUNNING | 127.0.0.1:10736 | 4 | 222.711 | 16000 | 30.3534 | 90 | 10 | 30.3534 |
Current time: 2021-12-01 23:34:42 (running for 00:05:17.29)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_23a62_00000 | RUNNING | 127.0.0.1:10736 | 5 | 279.808 | 20000 | 32.0403 | 95 | 9 | 32.0403 |
Current time: 2021-12-01 23:35:40 (running for 00:06:15.17)
Memory usage on this node: 10.2/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_23a62_00000 | RUNNING | 127.0.0.1:10736 | 6 | 337.59 | 24000 | 34.6466 | 73 | 9 | 34.6466 |
Current time: 2021-12-01 23:36:38 (running for 00:07:13.36)
Memory usage on this node: 9.9/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_23a62_00000 | RUNNING | 127.0.0.1:10736 | 7 | 394.753 | 28000 | 33.6525 | 80 | 10 | 33.6525 |
Current time: 2021-12-01 23:37:36 (running for 00:08:10.57)
Memory usage on this node: 9.9/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_23a62_00000 | RUNNING | 127.0.0.1:10736 | 8 | 450.921 | 32000 | 34.3077 | 83 | 9 | 34.3077 |
Current time: 2021-12-01 23:38:28 (running for 00:09:02.79)
Memory usage on this node: 9.8/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_23a62_00000 | RUNNING | 127.0.0.1:10736 | 9 | 507.087 | 36000 | 39.4554 | 88 | 10 | 39.4554 |
Current time: 2021-12-01 23:38:53 (running for 00:09:28.29)
Memory usage on this node: 9.8/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_23a62_00000 | RUNNING | 127.0.0.1:10736 | 9 | 507.087 | 36000 | 39.4554 | 88 | 10 | 39.4554 |
Current time: 2021-12-01 23:38:58 (running for 00:09:33.45)
Memory usage on this node: 9.8/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_23a62_00000 | RUNNING | 127.0.0.1:10736 | 9 | 507.087 | 36000 | 39.4554 | 88 | 10 | 39.4554 |
Current time: 2021-12-01 23:39:04 (running for 00:09:38.60)
Memory usage on this node: 9.8/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_23a62_00000 | RUNNING | 127.0.0.1:10736 | 9 | 507.087 | 36000 | 39.4554 | 88 | 10 | 39.4554 |
Current time: 2021-12-01 23:39:09 (running for 00:09:43.70)
Memory usage on this node: 9.8/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_23a62_00000 | RUNNING | 127.0.0.1:10736 | 9 | 507.087 | 36000 | 39.4554 | 88 | 10 | 39.4554 |
Current time: 2021-12-01 23:39:14 (running for 00:09:48.72)
Memory usage on this node: 9.8/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_23a62_00000 | RUNNING | 127.0.0.1:10736 | 9 | 507.087 | 36000 | 39.4554 | 88 | 10 | 39.4554 |
Current time: 2021-12-01 23:39:19 (running for 00:09:53.85)
Memory usage on this node: 9.8/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_23a62_00000 | RUNNING | 127.0.0.1:10736 | 9 | 507.087 | 36000 | 39.4554 | 88 | 10 | 39.4554 |
Current time: 2021-12-01 23:39:23 (running for 00:09:57.60)
Memory usage on this node: 9.8/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 TERMINATED)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StackedStatelessCartPole_23a62_00000 | TERMINATED | 127.0.0.1:10736 | 10 | 562.848 | 40000 | 36.6727 | 80 | 11 | 36.6727 |
print_reward(results3b2)
Reward after 10 training iterations: 36.67272727272727
plot_rewards(results3b2)
c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\seaborn\_decorators.py:36: FutureWarning: Pass the following variables as keyword args: x, y. From version 0.12, the only valid positional argument will be `data`, and passing other arguments without an explicit keyword will result in an error or misinterpretation.
warnings.warn(
plot_learning(results3a, label="3a: Stacked, Partial Observations")
plot_learning(results3b, label="3b: LSTM")
plot_learning(results3b2, label="3b2: LSTM + Stacking")
Option 3c: Use Attention for Processing the Sequence
Self-attention is a recent and popular alternative to RNNs for processing sequence data. Currently, the transformer architecture using self-attention is state of the art for natural language processing (NLP) tasks.
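To make the idea concrete, here is a minimal sketch of single-head scaled dot-product self-attention applied to a short sequence of partial CartPole observations. It is an illustration only and not RLlib's attention net: the helper names (matmul, softmax, self_attention, random_matrix), the random projection matrices, and the chosen dimensions are all assumptions made for this example.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(scores):
    """Numerically stable softmax over one list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x has one row per time step. Returns (output, weights), where
    weights[i][j] is how strongly time step i attends to time step j.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    k_transposed = [list(col) for col in zip(*k)]
    scores = matmul(q, k_transposed)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols):
    """Hypothetical stand-in for learned weights: random Gaussian entries."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
seq_len, obs_dim, attn_dim = 4, 2, 8  # 4 time steps of (cart position, pole angle)
observations = random_matrix(seq_len, obs_dim)
w_q, w_k, w_v = (random_matrix(obs_dim, attn_dim) for _ in range(3))

output, weights = self_attention(observations, w_q, w_k, w_v)
print(len(output), len(output[0]))  # one attn_dim-sized vector per time step
print([round(sum(row), 6) for row in weights])  # each row of weights sums to 1
```

Each output row mixes information from all time steps in the sequence, which is what allows an agent to infer velocities that are missing from any single observation.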
A similar but slightly modified attention-based architecture is also useful for RL (see related paper). As with the LSTM, enabling attention in RLlib only requires setting the use_attention flag in the model config:
#collapse-output
config3c = ppo.DEFAULT_CONFIG.copy()
config3c["env"] = "StatelessCartPole"
config3c["model"] = {
    # Attention net wrapping (for tf) can already use the native keras
    # model versions. For torch, this will have no effect.
    "_use_default_native_models": True,
    "use_attention": True,
    # "max_seq_len": 10,
    # "attention_num_transformer_units": 1,
    # "attention_dim": 32,
    # "attention_memory_inference": 10,
    # "attention_memory_training": 10,
    # "attention_num_heads": 1,
    # "attention_head_dim": 32,
    # "attention_position_wise_mlp_dim": 32,
}
results3c = ray.tune.run("PPO", config=config3c, stop=stop)
print("Option 3c: Training finished successfully")
Current time: 2021-12-01 23:39:24 (running for 00:00:00.15)
Memory usage on this node: 8.9/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 PENDING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_88823_00000 | PENDING |
(pid=None) c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\redis\connection.py:77: UserWarning: redis-py works best with hiredis. Please consider installing
(pid=None) warnings.warn(msg)
(pid=12464) 2021-12-01 23:39:35,779 INFO trainer.py:753 -- Tip: set framework=tfe or the --eager flag to enable TensorFlow eager execution
(pid=12464) 2021-12-01 23:39:35,780 INFO ppo.py:166 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting simple_optimizer=True if this doesn't work for you.
(pid=12464) 2021-12-01 23:39:35,780 INFO trainer.py:770 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=None) c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\redis\connection.py:77: UserWarning: redis-py works best with hiredis. Please consider installing
(pid=None) warnings.warn(msg)
(pid=None) c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\redis\connection.py:77: UserWarning: redis-py works best with hiredis. Please consider installing
(pid=None) warnings.warn(msg)
(pid=12464) 2021-12-01 23:40:01,112 WARNING trainer_template.py:185 -- `execution_plan` functions should accept `trainer`, `workers`, and `config` as args!
(pid=12464) 2021-12-01 23:40:01,115 INFO trainable.py:110 -- Trainable.setup took 25.341 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
(pid=12464) 2021-12-01 23:40:01,117 WARNING util.py:57 -- Install gputil for GPU system monitoring.
(pid=12464) 2021-12-01 23:40:08,500 WARNING deprecation.py:38 -- DeprecationWarning: `slice` has been deprecated. Use `SampleBatch[start:stop]` instead. This will raise an error in the future!
(pid=12464) [2021-12-01 23:43:32,273 E 12464 11260] raylet_client.cc:159: IOError: Unknown error [RayletClient] Failed to disconnect from raylet.
(pid=12464) Windows fatal exception: access violation
(pid=12464)
(pid=15896) [2021-12-01 23:43:32,278 E 15896 15972] raylet_client.cc:159: IOError: Unknown error [RayletClient] Failed to disconnect from raylet.
(pid=15896) Windows fatal exception: access violation
(pid=15896)
(pid=17956) [2021-12-01 23:43:32,288 C 17956 2592] core_worker.cc:796: Check failed: _s.ok() Bad status: IOError: Unknown error
(pid=17956) *** StackTrace Information ***
(pid=17956) PyInit__raylet
(pid=17956) PyInit__raylet
(pid=17956) PyInit__raylet
(pid=17956) PyInit__raylet
(pid=17956) PyInit__raylet
(pid=17956) PyInit__raylet
(pid=17956) PyInit__raylet
(pid=17956) PyInit__raylet
(pid=17956) PyInit__raylet
(pid=17956) PyInit__raylet
(pid=17956) PyInit__raylet
(pid=17956) PyInit__raylet
(pid=17956) PyInit__raylet
(pid=17956) PyInit__raylet
(pid=17956) PyInit__raylet
(pid=17956) PyInit__raylet
(pid=17956) PyNumber_InPlaceLshift
(pid=17956) Py_CheckFunctionResult
(pid=17956) PyEval_EvalFrameDefault
(pid=17956) Py_CheckFunctionResult
(pid=17956) PyEval_EvalFrameDefault
(pid=17956) PyEval_EvalCodeWithName
(pid=17956) PyEval_EvalCodeEx
(pid=17956) PyEval_EvalCode
(pid=17956) PyArena_New
(pid=17956) PyArena_New
(pid=17956) PyRun_FileExFlags
(pid=17956) PyRun_SimpleFileExFlags
(pid=17956) PyRun_AnyFileExFlags
(pid=17956) Py_FatalError
(pid=17956) Py_RunMain
(pid=17956) Py_RunMain
(pid=17956) Py_Main
(pid=17956) BaseThreadInitThunk
(pid=17956) RtlUserThreadStart
(pid=17956)
(pid=17956) Windows fatal exception: access violation
(pid=17956)
(pid=17956) Stack (most recent call first):
(pid=17956) File "c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\ray\worker.py", line 425 in main_loop
(pid=17956) File "c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\ray\workers/default_worker.py", line 218 in <module>
2021-12-01 23:43:32,411 INFO tune.py:630 -- Total run time: 248.18 seconds (247.76 seconds for the tuning loop).
Current time: 2021-12-01 23:40:01 (running for 00:00:36.89)
Memory usage on this node: 9.9/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc |
---|---|---|
PPO_StatelessCartPole_88823_00000 | RUNNING | 127.0.0.1:12464 |
Result for PPO_StatelessCartPole_88823_00000:
agent_timesteps_total: 4000
custom_metrics: {}
date: 2021-12-01_23-40-21
done: false
episode_len_mean: 23.939393939393938
episode_media: {}
episode_reward_max: 76.0
episode_reward_mean: 23.939393939393938
episode_reward_min: 10.0
episodes_this_iter: 165
episodes_total: 165
experiment_id: 7e1494afa3414dd998e7cde489d370fd
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.6663342118263245
entropy_coeff: 0.0
kl: 0.01937255822122097
policy_loss: -0.013014732860028744
total_loss: 154.01339721679688
vf_explained_var: 0.006343733984977007
vf_loss: 154.0225372314453
num_agent_steps_sampled: 4000
num_agent_steps_trained: 4000
num_steps_sampled: 4000
num_steps_trained: 4000
iterations_since_restore: 1
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 70.58666666666666
ram_util_percent: 83.49666666666667
pid: 12464
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.10596248134570115
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.12798089296365409
mean_inference_ms: 3.0966986549503033
mean_raw_obs_processing_ms: 0.1998490762828277
time_since_restore: 20.76484441757202
time_this_iter_s: 20.76484441757202
time_total_s: 20.76484441757202
timers:
learn_throughput: 299.129
learn_time_ms: 13372.16
load_throughput: 0.0
load_time_ms: 0.0
sample_throughput: 541.375
sample_time_ms: 7388.589
update_time_ms: 4.002
timestamp: 1638398421
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 4000
training_iteration: 1
trial_id: '88823_00000'
Result for PPO_StatelessCartPole_88823_00000:
agent_timesteps_total: 8000
custom_metrics: {}
date: 2021-12-01_23-40-40
done: false
episode_len_mean: 27.791666666666668
episode_media: {}
episode_reward_max: 122.0
episode_reward_mean: 27.791666666666668
episode_reward_min: 9.0
episodes_this_iter: 144
episodes_total: 309
experiment_id: 7e1494afa3414dd998e7cde489d370fd
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.20000000298023224
cur_lr: 4.999999873689376e-05
entropy: 0.6823218464851379
entropy_coeff: 0.0
kl: 0.023256205022335052
policy_loss: 0.006493973080068827
total_loss: 161.61268615722656
vf_explained_var: 0.0007292712107300758
vf_loss: 161.60154724121094
num_agent_steps_sampled: 8000
num_agent_steps_trained: 8000
num_steps_sampled: 8000
num_steps_trained: 8000
num_steps_trained_this_iter: 0
iterations_since_restore: 2
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 73.93076923076923
ram_util_percent: 83.53846153846153
pid: 12464
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.11035390714468313
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.11674551834165348
mean_inference_ms: 3.0091454225371876
mean_raw_obs_processing_ms: 0.1931061975500896
time_since_restore: 39.707205057144165
time_this_iter_s: 18.942360639572144
time_total_s: 39.707205057144165
timers:
learn_throughput: 313.757
learn_time_ms: 12748.702
load_throughput: 0.0
load_time_ms: 0.0
sample_throughput: 289.977
sample_time_ms: 13794.207
update_time_ms: 4.002
timestamp: 1638398440
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 8000
training_iteration: 2
trial_id: '88823_00000'
Result for PPO_StatelessCartPole_88823_00000:
agent_timesteps_total: 12000
custom_metrics: {}
date: 2021-12-01_23-40-59
done: false
episode_len_mean: 23.017142857142858
episode_media: {}
episode_reward_max: 66.0
episode_reward_mean: 23.017142857142858
episode_reward_min: 8.0
episodes_this_iter: 175
episodes_total: 484
experiment_id: 7e1494afa3414dd998e7cde489d370fd
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.30000001192092896
cur_lr: 4.999999873689376e-05
entropy: 0.6649836897850037
entropy_coeff: 0.0
kl: 0.018709277734160423
policy_loss: -0.007455786690115929
total_loss: 69.26802062988281
vf_explained_var: -0.04406118765473366
vf_loss: 69.26985931396484
num_agent_steps_sampled: 12000
num_agent_steps_trained: 12000
num_steps_sampled: 12000
num_steps_trained: 12000
num_steps_trained_this_iter: 0
iterations_since_restore: 3
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 70.76666666666667
ram_util_percent: 83.59629629629629
pid: 12464
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.10152428396558573
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.11275608464037909
mean_inference_ms: 2.987558953459597
mean_raw_obs_processing_ms: 0.18824034213713292
time_since_restore: 58.64493227005005
time_this_iter_s: 18.937727212905884
time_total_s: 58.64493227005005
timers:
learn_throughput: 318.575
learn_time_ms: 12555.928
load_throughput: 0.0
load_time_ms: 0.0
sample_throughput: 257.773
sample_time_ms: 15517.557
update_time_ms: 2.668
timestamp: 1638398459
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 12000
training_iteration: 3
trial_id: '88823_00000'
Result for PPO_StatelessCartPole_88823_00000:
agent_timesteps_total: 16000
custom_metrics: {}
date: 2021-12-01_23-41-20
done: false
episode_len_mean: 27.27891156462585
episode_media: {}
episode_reward_max: 65.0
episode_reward_mean: 27.27891156462585
episode_reward_min: 10.0
episodes_this_iter: 147
episodes_total: 631
experiment_id: 7e1494afa3414dd998e7cde489d370fd
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.30000001192092896
cur_lr: 4.999999873689376e-05
entropy: 0.666556715965271
entropy_coeff: 0.0
kl: 0.021740607917308807
policy_loss: 0.0014071379555389285
total_loss: 86.13334655761719
vf_explained_var: -0.020127560943365097
vf_loss: 86.12541961669922
num_agent_steps_sampled: 16000
num_agent_steps_trained: 16000
num_steps_sampled: 16000
num_steps_trained: 16000
num_steps_trained_this_iter: 0
iterations_since_restore: 4
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 77.01379310344828
ram_util_percent: 83.93103448275863
pid: 12464
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.10143838749352374
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.10536623976424975
mean_inference_ms: 2.9572038509498433
mean_raw_obs_processing_ms: 0.2044611579055872
time_since_restore: 79.04436016082764
time_this_iter_s: 20.399427890777588
time_total_s: 79.04436016082764
timers:
learn_throughput: 311.522
learn_time_ms: 12840.182
load_throughput: 0.0
load_time_ms: 0.0
sample_throughput: 244.428
sample_time_ms: 16364.753
update_time_ms: 3.002
timestamp: 1638398480
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 16000
training_iteration: 4
trial_id: '88823_00000'
Result for PPO_StatelessCartPole_88823_00000:
agent_timesteps_total: 20000
custom_metrics: {}
date: 2021-12-01_23-41-45
done: false
episode_len_mean: 23.446428571428573
episode_media: {}
episode_reward_max: 102.0
episode_reward_mean: 23.446428571428573
episode_reward_min: 9.0
episodes_this_iter: 168
episodes_total: 799
experiment_id: 7e1494afa3414dd998e7cde489d370fd
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.44999998807907104
cur_lr: 4.999999873689376e-05
entropy: 0.6605092287063599
entropy_coeff: 0.0
kl: 0.019639955833554268
policy_loss: -0.012088990770280361
total_loss: 87.5208740234375
vf_explained_var: -0.08362725377082825
vf_loss: 87.52411651611328
num_agent_steps_sampled: 20000
num_agent_steps_trained: 20000
num_steps_sampled: 20000
num_steps_trained: 20000
num_steps_trained_this_iter: 0
iterations_since_restore: 5
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 89.04
ram_util_percent: 84.47428571428571
pid: 12464
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.1082024611015279
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.11219990467946991
mean_inference_ms: 3.1819671927268045
mean_raw_obs_processing_ms: 0.2181941025366806
time_since_restore: 104.5215425491333
time_this_iter_s: 25.477182388305664
time_total_s: 104.5215425491333
timers:
learn_throughput: 297.013
learn_time_ms: 13467.402
load_throughput: 19987152.728
load_time_ms: 0.2
sample_throughput: 225.522
sample_time_ms: 17736.596
update_time_ms: 2.401
timestamp: 1638398505
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 20000
training_iteration: 5
trial_id: '88823_00000'
Result for PPO_StatelessCartPole_88823_00000:
agent_timesteps_total: 24000
custom_metrics: {}
date: 2021-12-01_23-42-12
done: false
episode_len_mean: 28.964285714285715
episode_media: {}
episode_reward_max: 94.0
episode_reward_mean: 28.964285714285715
episode_reward_min: 9.0
episodes_this_iter: 140
episodes_total: 939
experiment_id: 7e1494afa3414dd998e7cde489d370fd
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.44999998807907104
cur_lr: 4.999999873689376e-05
entropy: 0.6493598818778992
entropy_coeff: 0.0
kl: 0.008863288909196854
policy_loss: -0.009960012510418892
total_loss: 130.17213439941406
vf_explained_var: -0.04542897269129753
vf_loss: 130.1781005859375
num_agent_steps_sampled: 24000
num_agent_steps_trained: 24000
num_steps_sampled: 24000
num_steps_trained: 24000
num_steps_trained_this_iter: 0
iterations_since_restore: 6
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 87.96944444444443
ram_util_percent: 84.64722222222221
pid: 12464
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.12465831386460766
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.12816464580125522
mean_inference_ms: 3.389497130627828
mean_raw_obs_processing_ms: 0.2190003983187485
time_since_restore: 130.94173955917358
time_this_iter_s: 26.420197010040283
time_total_s: 130.94173955917358
timers:
learn_throughput: 287.926
learn_time_ms: 13892.464
load_throughput: 11985152.518
load_time_ms: 0.334
sample_throughput: 208.532
sample_time_ms: 19181.747
update_time_ms: 4.777
timestamp: 1638398532
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 24000
training_iteration: 6
trial_id: '88823_00000'
Result for PPO_StatelessCartPole_88823_00000:
agent_timesteps_total: 28000
custom_metrics: {}
date: 2021-12-01_23-42-32
done: false
episode_len_mean: 35.74107142857143
episode_media: {}
episode_reward_max: 111.0
episode_reward_mean: 35.74107142857143
episode_reward_min: 10.0
episodes_this_iter: 112
episodes_total: 1051
experiment_id: 7e1494afa3414dd998e7cde489d370fd
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.44999998807907104
cur_lr: 4.999999873689376e-05
entropy: 0.6521294116973877
entropy_coeff: 0.0
kl: 0.008087929338216782
policy_loss: -0.001228039851412177
total_loss: 156.5521697998047
vf_explained_var: -0.018433474004268646
vf_loss: 156.5497589111328
num_agent_steps_sampled: 28000
num_agent_steps_trained: 28000
num_steps_sampled: 28000
num_steps_trained: 28000
num_steps_trained_this_iter: 0
iterations_since_restore: 7
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 79.4392857142857
ram_util_percent: 84.67857142857142
pid: 12464
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.1278596423707056
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.12899242071720216
mean_inference_ms: 3.353225638041823
mean_raw_obs_processing_ms: 0.2171522871719407
time_since_restore: 151.05801963806152
time_this_iter_s: 20.11628007888794
time_total_s: 151.05801963806152
timers:
learn_throughput: 291.389
learn_time_ms: 13727.331
load_throughput: 9323635.44
load_time_ms: 0.429
sample_throughput: 202.114
sample_time_ms: 19790.796
update_time_ms: 4.81
timestamp: 1638398552
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 28000
training_iteration: 7
trial_id: '88823_00000'
Result for PPO_StatelessCartPole_88823_00000:
agent_timesteps_total: 32000
custom_metrics: {}
date: 2021-12-01_23-42-52
done: false
episode_len_mean: 28.52857142857143
episode_media: {}
episode_reward_max: 117.0
episode_reward_mean: 28.52857142857143
episode_reward_min: 8.0
episodes_this_iter: 140
episodes_total: 1191
experiment_id: 7e1494afa3414dd998e7cde489d370fd
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.44999998807907104
cur_lr: 4.999999873689376e-05
entropy: 0.6189269423484802
entropy_coeff: 0.0
kl: 0.010187552310526371
policy_loss: -0.01204092800617218
total_loss: 135.2486572265625
vf_explained_var: -0.08293487131595612
vf_loss: 135.25611877441406
num_agent_steps_sampled: 32000
num_agent_steps_trained: 32000
num_steps_sampled: 32000
num_steps_trained: 32000
num_steps_trained_this_iter: 0
iterations_since_restore: 8
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 80.66551724137932
ram_util_percent: 84.77931034482759
pid: 12464
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.12512978011718637
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.12649312220671538
mean_inference_ms: 3.3173461928775416
mean_raw_obs_processing_ms: 0.21560295664752305
time_since_restore: 171.38141465187073
time_this_iter_s: 20.323395013809204
time_total_s: 171.38141465187073
timers:
learn_throughput: 292.724
learn_time_ms: 13664.765
load_throughput: 7989150.476
load_time_ms: 0.501
sample_throughput: 202.015
sample_time_ms: 19800.559
update_time_ms: 4.709
timestamp: 1638398572
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 32000
training_iteration: 8
trial_id: '88823_00000'
Result for PPO_StatelessCartPole_88823_00000:
agent_timesteps_total: 36000
custom_metrics: {}
date: 2021-12-01_23-43-12
done: false
episode_len_mean: 35.785714285714285
episode_media: {}
episode_reward_max: 91.0
episode_reward_mean: 35.785714285714285
episode_reward_min: 10.0
episodes_this_iter: 112
episodes_total: 1303
experiment_id: 7e1494afa3414dd998e7cde489d370fd
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.44999998807907104
cur_lr: 4.999999873689376e-05
entropy: 0.616661548614502
entropy_coeff: 0.0
kl: 0.00952129065990448
policy_loss: -0.00046885418123565614
total_loss: 135.04832458496094
vf_explained_var: -0.05452437326312065
vf_loss: 135.0445098876953
num_agent_steps_sampled: 36000
num_agent_steps_trained: 36000
num_steps_sampled: 36000
num_steps_trained: 36000
num_steps_trained_this_iter: 0
iterations_since_restore: 9
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 76.94814814814812
ram_util_percent: 84.89629629629631
pid: 12464
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.1238827989880333
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.12481765181822614
mean_inference_ms: 3.2899851537455973
mean_raw_obs_processing_ms: 0.21254970774456478
time_since_restore: 190.8527319431305
time_this_iter_s: 19.471317291259766
time_total_s: 190.8527319431305
timers:
learn_throughput: 295.82
learn_time_ms: 13521.73
load_throughput: 8987794.286
load_time_ms: 0.445
sample_throughput: 201.373
sample_time_ms: 19863.604
update_time_ms: 4.629
timestamp: 1638398592
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 36000
training_iteration: 9
trial_id: '88823_00000'
Result for PPO_StatelessCartPole_88823_00000:
agent_timesteps_total: 40000
custom_metrics: {}
date: 2021-12-01_23-43-31
done: true
episode_len_mean: 35.06140350877193
episode_media: {}
episode_reward_max: 147.0
episode_reward_mean: 35.06140350877193
episode_reward_min: 10.0
episodes_this_iter: 114
episodes_total: 1417
experiment_id: 7e1494afa3414dd998e7cde489d370fd
hostname: nb-stschn
info:
learner:
default_policy:
custom_metrics: {}
learner_stats:
cur_kl_coeff: 0.44999998807907104
cur_lr: 4.999999873689376e-05
entropy: 0.610243558883667
entropy_coeff: 0.0
kl: 0.0024006376042962074
policy_loss: -0.00499696284532547
total_loss: 148.4215850830078
vf_explained_var: -0.0558871254324913
vf_loss: 148.42550659179688
num_agent_steps_sampled: 40000
num_agent_steps_trained: 40000
num_steps_sampled: 40000
num_steps_trained: 40000
num_steps_trained_this_iter: 0
iterations_since_restore: 10
node_ip: 127.0.0.1
num_healthy_workers: 2
off_policy_estimator: {}
perf:
cpu_util_percent: 78.55714285714284
ram_util_percent: 84.91785714285713
pid: 12464
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 0.12214391459572835
mean_env_render_ms: 0.0
mean_env_wait_ms: 0.12507529134994974
mean_inference_ms: 3.262920007962691
mean_raw_obs_processing_ms: 0.21362642628938813
time_since_restore: 210.48571395874023
time_this_iter_s: 19.63298201560974
time_total_s: 210.48571395874023
timers:
learn_throughput: 298.029
learn_time_ms: 13421.519
load_throughput: 9986438.095
load_time_ms: 0.401
sample_throughput: 201.72
sample_time_ms: 19829.44
update_time_ms: 4.566
timestamp: 1638398611
timesteps_since_restore: 0
timesteps_this_iter: 0
timesteps_total: 40000
training_iteration: 10
trial_id: '88823_00000'
Option 3c: Training finished successfully
Current time: 2021-12-01 23:40:22 (running for 00:00:58.71)
Memory usage on this node: 9.9/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_88823_00000 | RUNNING | 127.0.0.1:12464 | 1 | 20.7648 | 4000 | 23.9394 | 76 | 10 | 23.9394 |
Current time: 2021-12-01 23:40:43 (running for 00:01:19.70)
Memory usage on this node: 9.9/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_88823_00000 | RUNNING | 127.0.0.1:12464 | 2 | 39.7072 | 8000 | 27.7917 | 122 | 9 | 27.7917 |
Current time: 2021-12-01 23:41:01 (running for 00:01:37.66)
Memory usage on this node: 9.9/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_88823_00000 | RUNNING | 127.0.0.1:12464 | 3 | 58.6449 | 12000 | 23.0171 | 66 | 8 | 23.0171 |
Current time: 2021-12-01 23:41:24 (running for 00:02:00.18)
Memory usage on this node: 10.0/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_88823_00000 | RUNNING | 127.0.0.1:12464 | 4 | 79.0444 | 16000 | 27.2789 | 65 | 10 | 27.2789 |
Current time: 2021-12-01 23:41:49 (running for 00:02:25.71)
Memory usage on this node: 10.1/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_88823_00000 | RUNNING | 127.0.0.1:12464 | 5 | 104.522 | 20000 | 23.4464 | 102 | 9 | 23.4464 |
Current time: 2021-12-01 23:42:16 (running for 00:02:52.11)
Memory usage on this node: 10.1/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_88823_00000 | RUNNING | 127.0.0.1:12464 | 6 | 130.942 | 24000 | 28.9643 | 94 | 9 | 28.9643 |
Current time: 2021-12-01 23:42:37 (running for 00:03:13.23)
Memory usage on this node: 10.1/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_88823_00000 | RUNNING | 127.0.0.1:12464 | 7 | 151.058 | 28000 | 35.7411 | 111 | 10 | 35.7411 |
Current time: 2021-12-01 23:42:57 (running for 00:03:33.61)
Memory usage on this node: 10.1/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_88823_00000 | RUNNING | 127.0.0.1:12464 | 8 | 171.381 | 32000 | 28.5286 | 117 | 8 | 28.5286 |
Current time: 2021-12-01 23:43:13 (running for 00:03:49.08)
Memory usage on this node: 10.1/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_88823_00000 | RUNNING | 127.0.0.1:12464 | 9 | 190.853 | 36000 | 35.7857 | 91 | 10 | 35.7857 |
Current time: 2021-12-01 23:43:32 (running for 00:04:07.82)
Memory usage on this node: 10.1/11.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/3 CPUs, 0/0 GPUs, 0.0/1.31 GiB heap, 0.0/0.65 GiB objects
Result logdir: C:\Users\Stefan\ray_results\PPO
Number of trials: 1/1 (1 TERMINATED)
Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
---|---|---|---|---|---|---|---|---|---|
PPO_StatelessCartPole_88823_00000 | TERMINATED | 127.0.0.1:12464 | 10 | 210.486 | 40000 | 35.0614 | 147 | 10 | 35.0614 |
print_reward(results3c)
Reward after 10 training iterations: 35.06140350877193
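To make the trend behind this final number easier to see, here is a minimal sketch that summarizes the mean episode reward per training iteration. The values are copied by hand from the "reward" column of the status table above; the names `rewards_3c` and `summarize` are made up for this illustration and are not part of RLlib or of the helper functions used in this post.

```python
# Mean episode reward after each of the 10 training iterations of option 3c,
# copied manually from the "reward" column of the Ray Tune status output.
rewards_3c = [23.9394, 27.7917, 23.0171, 27.2789, 23.4464,
              28.9643, 35.7411, 28.5286, 35.7857, 35.0614]


def summarize(rewards):
    """Return first, last, and best mean reward plus the 1-based best iteration."""
    best_index = max(range(len(rewards)), key=lambda i: rewards[i])
    return {
        "first": rewards[0],
        "last": rewards[-1],
        "best": rewards[best_index],
        "best_iter": best_index + 1,   # iterations are counted from 1 in the log
        "gain": rewards[-1] - rewards[0],
    }


summary = summarize(rewards_3c)
print(summary)
```

The mean reward fluctuates between iterations and ends only about 11 points above where it started, which fits the note that attention does not yet seem to work as expected here.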
plot_rewards(results3c)
c:\users\stefan\git-repos\private\blog\venv\lib\site-packages\seaborn\_decorators.py:36: FutureWarning: Pass the following variables as keyword args: x, y. From version 0.12, the only valid positional argument will be `data`, and passing other arguments without an explicit keyword will result in an error or misinterpretation.
warnings.warn(
plot_learning(results1, label="1: Full Observations")
plot_learning(results2, label="2: Partial Observations")
plot_learning(results3a, label="3a: Stacked, Partial Observations")
plot_learning(results3b, label="3b: LSTM")
plot_learning(results3c, label="3c: Attention")
Attention with Stacked Observations
This blog post is still work in progress. Currently, there seems to be an issue with attention in RLlib.