TensorFlow PoolAllocator Messages on the GPU


While training one of my deep models on a GPU, I ran into the following output:

...
2018-06-27 18:09:11.701458: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 63521 get requests, put_count=63521 evicted_count=1000 eviction_rate=0.0157428 and unsatisfied allocation rate=0.0173171
2018-06-27 18:09:11.701503: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
Global_step 2000 Train_loss: 0.0758
Global_step 3000 Train_loss: 0.0618
Global_step 4000 Train_loss: 0.0564
Global_step 5000 Train_loss: 0.0521
Global_step 6000 Train_loss: 0.0492
Global_step 7000 Train_loss: 0.0468
Global_step 8000 Train_loss: 0.0443
Global_step 9000 Train_loss: 0.0422
Global_step 10000 Train_loss: 0.0410
Global_step 11000 Train_loss: 0.0397
Global_step 12000 Train_loss: 0.0383
2018-06-27 18:13:59.743133: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 71532 get requests, put_count=71532 evicted_count=1000 eviction_rate=0.0139798 and unsatisfied allocation rate=0.0143013
2018-06-27 18:13:59.743167: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 256 to 281
...

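Besides the usual loss values, I noticed informational PoolAllocator messages (note the "I" log level) interleaved with them. The final results were perfectly fine, but out of curiosity I looked for the cause and found this explanation on Stack Overflow: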

TensorFlow has multiple memory allocators, for memory that will be used in different ways. Their behavior has some adaptive aspects.
In your particular case, since you’re using a GPU, there is a PoolAllocator for CPU memory that is pre-registered with the GPU for fast DMA. A tensor that is expected to be transferred from CPU to GPU, e.g., will be allocated from this pool.
The PoolAllocators attempt to amortize the cost of calling a more expensive underlying allocator by keeping around a pool of allocated then freed chunks that are eligible for immediate reuse. Their default behavior is to grow slowly until the eviction rate drops below some constant. (The eviction rate is the proportion of free calls where we return an unused chunk from the pool to the underlying pool in order not to exceed the size limit.) In the log messages above, you see "Raising pool_size_limit_" lines that show the pool size growing. Assuming that your program actually has a steady state behavior with a maximum size collection of chunks it needs, the pool will grow to accommodate it, and then grow no more. It behaves this way rather than simply retaining all chunks ever allocated so that sizes needed only rarely, or only during program startup, are less likely to be retained in the pool.
These messages should only be a cause for concern if you run out of memory. In such a case the log messages may help diagnose the problem. Note also that peak execution speed may only be attained after the memory pools have grown to the proper size.
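To make the mechanism in the answer concrete, here is a minimal toy sketch of a size-limited pool that grows while chunks keep getting evicted. It is not TensorFlow's implementation: the class name, the check interval, the tolerance threshold, and the rule for picking the new limit are all assumptions for illustration. The growth factor of 1.1 is only a guess that happens to be consistent with the "100 to 110" and "256 to 281" lines in the log.

```python
class ToyPoolAllocator:
    """Toy model of a size-limited pool of freed chunks (not TensorFlow's code)."""

    def __init__(self, pool_size_limit=2, check_interval=4,
                 tolerable_rate=0.002, growth_factor=1.1):
        self.pool_size_limit = pool_size_limit
        self.check_interval = check_interval  # evictions between checks (assumed)
        self.tolerable_rate = tolerable_rate  # threshold for both rates (assumed)
        self.growth_factor = growth_factor    # assumed; consistent with 100 -> 110
        self.pool = []                        # freed chunk sizes, oldest first
        self.history = []                     # every limit the pool was raised to
        self._reset_counters()

    def _reset_counters(self):
        self.get_count = 0        # allocation requests
        self.allocated_count = 0  # requests the pool could not satisfy
        self.put_count = 0        # free calls
        self.evicted_count = 0    # chunks returned to the underlying allocator

    def allocate(self, size):
        self.get_count += 1
        if size in self.pool:
            self.pool.remove(size)     # reuse a pooled chunk
        else:
            self.allocated_count += 1  # fall back to the expensive allocator
        return size

    def free(self, size):
        self.put_count += 1
        self.pool.append(size)
        if len(self.pool) > self.pool_size_limit:
            self.pool.pop(0)           # evict the oldest chunk to respect the limit
            self.evicted_count += 1
            if self.evicted_count % self.check_interval == 0:
                self._maybe_grow()

    def _maybe_grow(self):
        eviction_rate = self.evicted_count / self.put_count
        unsatisfied_rate = (self.allocated_count / self.get_count
                            if self.get_count else 0.0)
        if (eviction_rate > self.tolerable_rate
                and unsatisfied_rate > self.tolerable_rate):
            # Grow by the factor, but by at least one slot for tiny pools.
            new_limit = max(self.pool_size_limit + 1,
                            int(self.pool_size_limit * self.growth_factor))
            self.pool_size_limit = new_limit
            self.history.append(new_limit)
            self._reset_counters()


# Steady-state workload: every step needs the same 5 chunk sizes.
alloc = ToyPoolAllocator()
for step in range(30):
    chunks = [alloc.allocate(size) for size in (1, 2, 3, 4, 5)]
    for chunk in chunks:
        alloc.free(chunk)

print(alloc.history)          # limits the pool grew through
print(alloc.pool_size_limit)  # final limit
```

With this workload the limit is raised from 2 to 3, then 4, then 5, and then stays there: once the pool can hold everything the steady state needs, nothing is evicted any more, which is the "grow to accommodate it, and then grow no more" behavior the answer describes.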

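The quoted answer explains the mechanism, how TensorFlow handles it, and why. To sum up: PoolAllocator manages a pool of CPU memory used for transfers to the GPU (the two are not isolated from each other; data moves between them). If your program needs more chunks than the pool currently holds, the allocator raises its preset pool size limit; once the pool is big enough, there is no further impact. That said, if the limit keeps climbing or you actually run out of memory, buddy, you are probably loading too much data at once: try a smaller batch size so that less is loaded per step, or kill your neighbor's job on the same machine.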
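To check whether the pool has settled, you can pull the numbers out of the log. The sketch below parses the two kinds of PoolAllocator lines shown earlier; the function name and the regular expressions are my own and only assume the exact message format seen in this log. It also recomputes the eviction rate as evicted_count / put_count, which matches the logged values here.

```python
import re

# Log lines copied from the training output above.
LOG = """\
2018-06-27 18:09:11.701458: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 63521 get requests, put_count=63521 evicted_count=1000 eviction_rate=0.0157428 and unsatisfied allocation rate=0.0173171
2018-06-27 18:09:11.701503: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
Global_step 2000 Train_loss: 0.0758
2018-06-27 18:13:59.743133: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 71532 get requests, put_count=71532 evicted_count=1000 eviction_rate=0.0139798 and unsatisfied allocation rate=0.0143013
2018-06-27 18:13:59.743167: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 256 to 281
"""

STATS_RE = re.compile(
    r"PoolAllocator: After (\d+) get requests, put_count=(\d+) "
    r"evicted_count=(\d+) eviction_rate=([\d.eE+-]+) "
    r"and unsatisfied allocation rate=([\d.eE+-]+)"
)
RAISE_RE = re.compile(r"Raising pool_size_limit_ from (\d+) to (\d+)")


def parse_pool_log(text):
    """Return (stats, raises) extracted from PoolAllocator log lines."""
    stats, raises = [], []
    for line in text.splitlines():
        m = STATS_RE.search(line)
        if m:
            stats.append({
                "get_requests": int(m.group(1)),
                "put_count": int(m.group(2)),
                "evicted_count": int(m.group(3)),
                "eviction_rate": float(m.group(4)),
                "unsatisfied_rate": float(m.group(5)),
            })
            continue
        m = RAISE_RE.search(line)
        if m:
            raises.append((int(m.group(1)), int(m.group(2))))
    return stats, raises


stats, raises = parse_pool_log(LOG)
for s in stats:
    recomputed = s["evicted_count"] / s["put_count"]
    print(f"logged={s['eviction_rate']:.7f} recomputed={recomputed:.7f}")
for old, new in raises:
    print(f"pool_size_limit_: {old} -> {new}")
```

If the "Raising pool_size_limit_" lines stop appearing after a while, the pool has reached its steady-state size. If they keep coming until memory runs out, that is the point at which reducing the batch size is worth trying.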
