The paging file is too small for this operation to complete (Python)

But there are currently two problems with this:

Why doesn't this happen every time I do a rebuild? If I understand correctly, the compiler ran out of memory while compiling my project. So if I do a rebuild, which clears all the previous work, shouldn't it run out of memory again the next time, as long as I don't change anything?

4 answers

I know this is an old question, but I ended up here, so I might as well answer.

There is a great article about PCH problems here.

1) Why doesn't this happen every time I do a rebuild?
Of course, this is a hard question to answer. Since it doesn't happen every time, there could be several causes. Most likely it has to do with memory allocation. From the article:

Fragmentation of the virtual memory address ranges required by the PCH before CL.EXE is able to load it into memory.
Failure of the Windows OS, under heavy loads, to increase the page file size within a certain threshold time.

c1xx: error C3859: Failed to create virtual memory for PCH [...\Project.vcxproj]
c1xx: note: the system returned code 1455: The paging file is too small for this operation to complete

It may also help to set PreferredToolArchitecture to x64:

If you use MSBuild from the command line, you can pass /p:PreferredToolArchitecture=x64 to MSBuild. If you build with MSBuild from Visual Studio, you can edit the .vcxproj file to include a PropertyGroup containing this property.
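If you prefer to script the .vcxproj change rather than edit the file by hand, here is a minimal sketch using Python's standard xml.etree.ElementTree. The helper name add_preferred_tool_architecture is hypothetical. Placing the new PropertyGroup as the first child of <Project>, so it is defined before any imported .props files are evaluated, is an assumption about placement. ElementTree also drops XML comments and the <?xml?> declaration, so for a real project a manual edit (or a backup first) may be safer.

```python
import xml.etree.ElementTree as ET

# Default namespace used by .vcxproj files (MSBuild 2003 schema).
MSBUILD_NS = "http://schemas.microsoft.com/developer/msbuild/2003"
# Serialize MSBuild elements without an "ns0:" prefix.
ET.register_namespace("", MSBUILD_NS)


def add_preferred_tool_architecture(project_xml: str, arch: str = "x64") -> str:
    """Return project_xml with a new PropertyGroup setting PreferredToolArchitecture.

    Hypothetical helper. The group is inserted as the first child of <Project>
    (an assumption) so it precedes any <Import> of Microsoft.Cpp props files.
    """
    root = ET.fromstring(project_xml)
    group = ET.Element(f"{{{MSBUILD_NS}}}PropertyGroup")
    prop = ET.SubElement(group, f"{{{MSBUILD_NS}}}PreferredToolArchitecture")
    prop.text = arch
    root.insert(0, group)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        f'<Project xmlns="{MSBUILD_NS}">'
        '<Import Project="$(VCTargetsPath)\\Microsoft.Cpp.Default.props" />'
        "</Project>"
    )
    print(add_preferred_tool_architecture(sample))
```

For the command-line route no script is needed. Passing /p:PreferredToolArchitecture=x64 to msbuild sets the same property for that build only.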

It's easy to overlook, but such problems also occur when the precompiled header is too large. A little cleanup wouldn't hurt either.

Just chiming in with what turned out to be the solution for me. It looks like Visual Studio was trying to compile my program for several architectures. I thought I had removed those profiles, but the Configuration Manager still had a bunch of dummy entries for building in x86 mode. They were useless to me, since I only want to build x64. After I removed those entries, the program compiled again and the error went away. Hope this helps someone.

Another cause of this problem. I don't quite understand how the project got into this state, but it was trying to use PCH files. The "Precompiled Header File" option was set to pch.h, but the "Precompiled Header Output File" option just below it was empty.

Unsurprisingly, Visual Studio went haywire over this, in particular producing numerous C3859 errors during the build.

Setting this value in the project configuration to "Inherit" fixed it.

I ran into this while building a large codebase on a local virtual machine. I tried increasing the page file size and so on, but nothing worked. The only thing that worked in my case was disabling dynamic memory in the Hyper-V virtual machine settings and giving the VM more RAM (8 GB -> 16 GB).

Apparently VS allocates its memory up front, so it only sees the startup amount given to the VM and never triggers any dynamic memory changes.

After running the main_fine_tuning.py file, I got this traceback:

I tried setting BATCH_SIZE = 1, but the problem still occurs. Do you have any solution for this?


cobryan05 commented Nov 22, 2021

python fixNvPe.py --input C:\ProgramData\Anaconda3\lib\site-packages\torch\lib\*.dll

Hello, I don't understand it clearly. Do all those steps just mean putting fixNvPe.py in C:\ProgramData\Anaconda3\lib\site-packages\torch\lib\*.dll?

If I'm wrong, please explain the steps to me. Thank you so much.

You can place fixNvPe.py wherever you want. You then run the script with python and tell it which files to process by passing an --input parameter with the path of the files you want to modify.

For example, OP's error message was:

OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "C:\ProgramData\Anaconda3\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll" or one of its dependencies.

It is failing to load C:\ProgramData\Anaconda3\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll. You could pass exactly this path as the --input parameter to fixNvPe.py. Or you could replace caffe2_detectron_ops_gpu.dll with *.dll to process every DLL in that directory instead (the '*' is a wildcard, so *.dll means 'every file ending in .dll').
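To check what a wildcard such as *.dll actually matches before handing it to a script, you can expand it with Python's standard glob module. This only illustrates wildcard matching. It is not a claim about how fixNvPe.py expands its --input argument internally, and the helper name matching_files is hypothetical. On Windows, cmd.exe does not expand wildcards itself, so a script receives the pattern as a literal string and has to expand it on its own.

```python
import glob
import os
import tempfile
from typing import List


def matching_files(pattern: str) -> List[str]:
    """Return the sorted paths matched by a wildcard pattern, e.g. a folder path ending in *.dll."""
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Build a throwaway directory so the example runs anywhere.
    with tempfile.TemporaryDirectory() as folder:
        for name in ("a.dll", "b.dll", "readme.txt"):
            open(os.path.join(folder, name), "w").close()
        for path in matching_files(os.path.join(folder, "*.dll")):
            print(os.path.basename(path))  # a.dll, then b.dll; readme.txt is skipped
```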

For example, if you downloaded fixNvPe.py to C:\Downloads\fixNvPe.py then you could open a command prompt and type something like

python C:\Downloads\fixNvPe.py --input=C:\ProgramData\Anaconda3\lib\site-packages\torch\lib\*.dll

This will 'fix' all of the DLL files in the torch\lib directory. Your specific computer may use different paths, and you may have to run this tool on multiple folders, depending on your exact setup. Just look at the error message you are getting for a hint on what the correct paths are.

If you get an error message about failing to import pefile, first run python -m pip install pefile.

tufail117 commented Feb 20, 2021

Well, I managed to resolve this.
Open "Advanced system settings". Go to the Advanced tab, then click the Settings button under Performance.
Click the Advanced tab again --> Change --> unselect 'Automatically...'. For all the drives, set 'System managed size'. Restart your PC.
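After changing the paging file settings and restarting, one way to check that the change took effect is to read the commit limit (roughly physical RAM plus paging file) through the Win32 GlobalMemoryStatusEx function. This is a Windows-only sketch using ctypes. The helper name commit_limit_gib is hypothetical, and it returns None on other platforms.

```python
import ctypes
import sys
from typing import Optional


class MEMORYSTATUSEX(ctypes.Structure):
    # Field layout of the Win32 MEMORYSTATUSEX structure (DWORD = 32-bit c_ulong on Windows).
    _fields_ = [
        ("dwLength", ctypes.c_ulong),
        ("dwMemoryLoad", ctypes.c_ulong),
        ("ullTotalPhys", ctypes.c_ulonglong),
        ("ullAvailPhys", ctypes.c_ulonglong),
        ("ullTotalPageFile", ctypes.c_ulonglong),
        ("ullAvailPageFile", ctypes.c_ulonglong),
        ("ullTotalVirtual", ctypes.c_ulonglong),
        ("ullAvailVirtual", ctypes.c_ulonglong),
        ("ullAvailExtendedVirtual", ctypes.c_ulonglong),
    ]


def commit_limit_gib() -> Optional[float]:
    """Return the current commit limit in GiB on Windows, or None elsewhere."""
    if sys.platform != "win32":
        return None
    status = MEMORYSTATUSEX()
    status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
    if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)):
        raise ctypes.WinError()
    # ullTotalPageFile: the smaller of the system and per-process commit limits, in bytes.
    return status.ullTotalPageFile / 2**30


if __name__ == "__main__":
    print(commit_limit_gib())
```

If the number does not grow after you enlarge the paging file and reboot, the new setting probably has not been applied.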

MarcinMisiurewicz commented Jun 6, 2019

I've also encountered that problem, and it seems to be a multiprocessing problem. What worked for me was reducing the number of workers in the DataLoader (line 108 in your code). Your number is quite high: 25. Workers are subprocesses that load the data, so if you have 25 of them your CPU can rebel :)
Try reducing it to 1, and if that works you can try increasing it. If I'm reasoning correctly, it shouldn't exceed the number of logical processors in your CPU. If you are computing something else in parallel (like I am right now, with another DataLoader), you should decrease it even more.
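Following that advice, a small helper can clamp the requested worker count to the number of logical processors reported by os.cpu_count(). The helper name capped_num_workers and its reserved parameter (CPUs kept free for other parallel work) are illustrative assumptions. The result is meant for the num_workers argument of torch.utils.data.DataLoader, which appears only in a comment so the snippet runs without PyTorch installed.

```python
import os
from typing import Optional


def capped_num_workers(requested: int, reserved: int = 0,
                       cpu_count: Optional[int] = None) -> int:
    """Clamp a DataLoader worker count to the range [0, logical CPUs - reserved].

    `reserved` is a hypothetical knob for CPUs kept free for other work,
    e.g. a second DataLoader running in parallel.
    """
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(0, min(requested, cpus - reserved))


# Usage with PyTorch (not imported here):
#   loader = DataLoader(dataset, batch_size=32, num_workers=capped_num_workers(25))
# num_workers=0 loads data in the main process. Start low and raise the
# value only while the paging-file error does not appear.
```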

Hope that helps future generations.

Javierete commented Jan 27, 2021

Hi there,
I have the same problem with my setups (both on Windows).
Originally I had an X99 with an 8-core CPU, 64GB of RAM and 2x RTX2080ti. I was able to run up to 6x pytorch RL algorithms with up to 10 multiprocessing workers each (60 workers in total running in parallel; obviously they were taking turns). If I pushed past those numbers, I would get the errors described above.
Now I have changed my setup to a 32-core 3970X with 64GB of RAM and the same 2x GPUs. I can barely run 3x of the same algos with up to 8 workers each. Loading anything more than that generates the same error.
When running them, RAM usage never exceeded 40-50%. Any pointers in the right direction will be highly appreciated.
Thanks!

krisstern commented Sep 29, 2021

I was getting the same error with yolov5. It was fixed by manually changing the number of workers nw to 4 in the "datasets.py" file.

PonyPC commented Sep 29, 2021

cobryan05 commented Dec 3, 2021

This problem is related to the DataLoader.
You must reduce the value of num_workers.
In the folder python\Lib\site-packages\torch\utils\data, open dataloader.py and on line 189 write self.num_workers = 2.

The issue is with how multi-process Python works on Windows with the pytorch/cuda DLLs. The number of workers you set in the DataLoader directly relates to how many Python processes are created.

Each time a Python process imports pytorch it loads several DLLs. These DLLs have very large sections of data in them that aren't really used, but space is reserved for them in memory anyways. We're talking in the range of hundreds of megabytes to a couple gigabytes, per DLL.

When Windows is asked to reserve memory and reports that it has done so, it guarantees that the memory will be available to you, even if you never end up using it.

Linux allows overcommitting. By default on Linux, when you ask it to reserve memory, it says "Yeah sure, here you go" and tells you that it reserved the memory. But it hasn't actually done this. It will reserve it when you try to use it, and hopes that there is something available at that time.
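On Linux this policy is exposed as the vm.overcommit_memory sysctl. Its values are 0 (heuristic overcommit, the default), 1 (always overcommit) and 2 (strict accounting, roughly comparable to the Windows behavior described here). Here is a small sketch that reads it from /proc. The helper name overcommit_mode is hypothetical, and it returns None where the file does not exist, e.g. on Windows.

```python
from pathlib import Path
from typing import Optional

OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "strict accounting (no overcommit)",
}


def overcommit_mode(path: str = "/proc/sys/vm/overcommit_memory") -> Optional[int]:
    """Return the Linux vm.overcommit_memory value, or None if it is unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    mode = overcommit_mode()
    print(mode, OVERCOMMIT_MODES.get(mode, "not available on this system"))
```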

So, if you allocate memory on Windows, you can be sure you can use that memory. If you allocate memory on Linux, it is possible that when you actually try to use the memory that it will not be there, and your program will crash.

On Linux, when it spawns num_workers processes and each one reserves several gigabytes of data, Linux happily reports that it reserved this, even though it didn't. Since this "reserved memory" is never actually used, everything is fine, and you can create tons of worker processes. Even if pytorch allocated 50GB of memory, it won't be a problem as long as it never actually uses it. (Note: I haven't actually run pytorch on Linux. I am just describing how Linux would avoid this crash even if it attempted to allocate the same amount of memory. I do not know for a fact that pytorch/CUDA overallocate on Linux.)

On Windows, when you spawn num_workers processes and each one reserves several gigabytes of data, Windows insists that it can actually satisfy this request should the memory be used. So, if Python tries to allocate 50GB of memory, then your total RAM + page file size must have space for 50GB. In other words, on Windows the following must hold:

NumPythonProcesses * MemoryPerProcess < RAM + PageFileSize

Your suggestion of lowering num_workers decreases NumPythonProcesses. The suggestions to modify the page file size increase PageFileSize. My fixNvPe.py script decreases MemoryPerProcess.

The trick is to find a balance of all of these variables that keeps that equation true.
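To make the balance concrete, here is a rough calculator for the relation described above: the memory committed by all worker processes has to fit within RAM plus the page file. All numbers are hypothetical. The function names are illustrative, and other_committed_gb is an assumed stand-in for whatever else is running. Treat the result as an optimistic upper bound, not a guarantee.

```python
import math


def max_processes(ram_gb: float, page_file_gb: float, per_process_gb: float,
                  other_committed_gb: float = 0.0) -> int:
    """How many processes fit if each commits per_process_gb (rough model)."""
    if per_process_gb <= 0:
        raise ValueError("per_process_gb must be positive")
    budget = ram_gb + page_file_gb - other_committed_gb
    return max(0, math.floor(budget / per_process_gb))


def page_file_needed_gb(processes: int, per_process_gb: float, ram_gb: float,
                        other_committed_gb: float = 0.0) -> float:
    """Page file size needed so that `processes` workers fit (rough model)."""
    return max(0.0, processes * per_process_gb + other_committed_gb - ram_gb)


if __name__ == "__main__":
    # Hypothetical machine: 16 GB RAM, 8 GB page file, 4 GB committed by other apps.
    print(max_processes(16, 8, 2.0, other_committed_gb=4))  # 10 workers at 2 GB each
    print(max_processes(16, 8, 0.5, other_committed_gb=4))  # 40 if per-process commit drops
    print(page_file_needed_gb(50, 2.0, ram_gb=16))          # 84.0 GB of page file for 50
```

Each lever from the comment maps to one argument. Fewer workers lowers the process count, a bigger page file raises page_file_gb, and shrinking each process's commit lowers per_process_gb.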

I have a very small network that I want to test with different random seeds. The network barely uses 1% of my GPU's compute power, so in theory I could run 50 processes at once to try many different seeds simultaneously.

Problem

Unfortunately, I can't even import pytorch in multiple processes. When the number of processes exceeds 4, I get a traceback about the paging file being too small.

Minimal reproducible code§ - dispatcher.py

§I increased the number of seeds so that people with better machines can reproduce this too.

Minimal reproducible code - ml_model.py

Further investigation

I noticed that each process loads a lot of DLLs into RAM. And when I close all other programs that use a lot of RAM, I can get up to 10 processes instead of 4. So it looks like a resource limit.

Questions

Is there a workaround?

What is the recommended way to train many small networks with pytorch on a single GPU?

Should I write my own CUDA kernel instead, or use a different framework to achieve this?

My goal would be to run about 50 processes simultaneously (on a machine with 16 GB of RAM and 8 GB of GPU RAM).

Hi. Could you list the files in the folder "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\lib\"? If cudnn_cnn_infer64_8.dll is missing, you may have a problem with your GPU-enabled pytorch installation.

glenn-jocher commented Sep 29, 2021

@ardeal @krisstern @PonyPC you can set dataloader workers during training, i.e.:

It seems like a lot of Windows users are encountering this problem, but as @PonyPC mentioned, reducing workers will generally also result in slower training. Are you guys encountering this during DDP or single-GPU training?

EDIT: just realized this is YOLOv3 repo and not YOLOv5. I would strongly encourage all users to migrate to YOLOv5, which is much better maintained. It's possible this issue is already resolved there.

WhiTExB3AR commented Nov 20, 2021

python fixNvPe.py --input C:\ProgramData\Anaconda3\lib\site-packages\torch\lib\*.dll

This fixed it (although you mentioned "not completely"); it has been a better suggestion than anything found elsewhere.

Hello, I don't understand it clearly. Do all those steps just mean putting fixNvPe.py in C:\ProgramData\Anaconda3\lib\site-packages\torch\lib\*.dll?

If I'm wrong, please explain the steps to me. Thank you so much.

glenn-jocher commented Sep 29, 2021

@PonyPC please raise a bug report issue citing a reproducible example in the YOLOv5 repo in that case.

Javierete commented Jan 27, 2021

Not sure if it's the best way to solve the problem but it worked so far (fingers crossed)

4 answers

I looked into this a bit tonight. I don't have a solution (edit: I do have a solution, see the edit at the end), but I do have a bit more information.

It looks like the problem is caused by NVidia fatbins (.nv_fatb) being loaded into memory. Several DLLs, such as cusolver64_xx.dll, torch_cuda_cu.dll and a few others, contain .nv_fatb sections. These sections hold many different variants of CUDA code for different GPUs, so they end up anywhere from several hundred megabytes to a couple of gigabytes in size.

Every Python process that loads these DLLs commits several GB of memory to load them. So if one Python process wastes 2 GB of memory and you try to run 8 worker processes, you need 16 GB of free memory just to load the DLLs. It doesn't look like this memory is actually used, just committed.

Get the script (fixNvPe.py) and install its dependency, pefile (python -m pip install pefile).

Run it on your torch and CUDA libraries. In the OP's case, the command line might look like this:

(You can also run it wherever your cusolver64_*.dll and similar files are located. That may be the torch\lib folder or, for example, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vXX.X\bin. If it is under Program Files, you will need to run the script with administrator privileges.)

The script scans every DLL matched by the input glob. If it finds an .nv_fatb section, it backs up the DLL, disables ASLR for it, and marks the .nv_fatb section as read-only.
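If you want to see which DLLs would be affected before modifying anything, here is a read-only sketch using the pefile package. It reports whether a DLL has an .nv_fatb section, how large that section is, whether it is writable, and whether ASLR (the DYNAMIC_BASE flag) is enabled. It does not reproduce fixNvPe.py and writes nothing. The function names and the script name inspect_fatb.py are hypothetical; the flag values are the standard PE header constants.

```python
import glob
import sys
from typing import Dict, List

# Standard PE header flag values.
IMAGE_SCN_MEM_WRITE = 0x80000000                # section is writable
IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040  # image can be relocated (ASLR)


def section_is_writable(characteristics: int) -> bool:
    return bool(characteristics & IMAGE_SCN_MEM_WRITE)


def aslr_enabled(dll_characteristics: int) -> bool:
    return bool(dll_characteristics & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE)


def inspect_dll(path: str) -> Dict[str, object]:
    """Report the .nv_fatb and ASLR status of one DLL without modifying it."""
    import pefile  # third-party: python -m pip install pefile

    pe = pefile.PE(path, fast_load=True)
    try:
        fatbins = [s for s in pe.sections if s.Name.rstrip(b"\x00") == b".nv_fatb"]
        return {
            "path": path,
            "has_nv_fatb": bool(fatbins),
            "nv_fatb_mib": sum(s.Misc_VirtualSize for s in fatbins) / 2**20,
            "nv_fatb_writable": any(section_is_writable(s.Characteristics) for s in fatbins),
            "aslr": aslr_enabled(pe.OPTIONAL_HEADER.DllCharacteristics),
        }
    finally:
        pe.close()


def inspect_glob(pattern: str) -> List[Dict[str, object]]:
    return [inspect_dll(p) for p in sorted(glob.glob(pattern))]


if __name__ == "__main__" and len(sys.argv) > 1:
    for report in inspect_glob(sys.argv[1]):
        if report["has_nv_fatb"]:
            print(report)
```

Saved as inspect_fatb.py, it could be run as python inspect_fatb.py "C:\ProgramData\Anaconda3\lib\site-packages\torch\lib\*.dll". The glob is expanded inside the script.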

ASLR is "address space layout randomization". It is a security feature that randomizes where a DLL is loaded in memory. We disable it for this DLL so that all Python processes load the DLL at the same base virtual address. If all the Python processes using the DLL load it at the same base address, they can all share the DLL. Otherwise, each process needs its own copy.

Marking the section "read-only" tells Windows that its contents will not change in memory. If you map a file into memory as read/write, Windows has to commit enough memory, backed by the page file, in case you modify it. If the section is read-only, there is no need to back it with the page file. We know it will not be modified, so it can always be found in the DLL itself.

The theory behind the script is that changing these two flags means less memory is committed for .nv_fatb and more memory is shared between the Python processes. In practice, it works. It doesn't work as well as I had hoped (it still commits much more than it uses), so my understanding may be flawed, but it significantly reduces memory commit.

In my limited testing I haven't run into any issues, but I can't guarantee that no code path tries to write to the section we marked "read-only". If you do start running into problems, you can simply restore the backups.

PonyPC commented Jun 9, 2021

Reducing the number of workers will significantly reduce training speed.

VahidFe96 commented Dec 3, 2021

This problem is related to the DataLoader.
You must reduce the value of num_workers.
In the folder python\Lib\site-packages\torch\utils\data, open dataloader.py and on line 189 write self.num_workers = 2.

XuChang2020 commented Apr 30, 2021

1. Try reducing num_workers to 1 or 0.
2. Try setting batch-size to 2 or 1.
Hope this helps.

toiyeumayhoc commented Jan 28, 2019

@brianFruit I'm still stuck on this one.

tufail117 commented Feb 20, 2021

Any update on this? I am also facing the same issue. Have tried many things for the last 3 days, but no success.

abhishekstha98 commented Oct 18, 2021

python fixNvPe.py --input C:\ProgramData\Anaconda3\lib\site-packages\torch\lib\*.dll

This fixed it (although you mentioned "not completely"); it has been a better suggestion than anything found elsewhere.

cobryan05 commented Oct 10, 2021

I have managed to mitigate (although not completely solve) this issue. I posted a more detailed explanation on a related StackOverflow link but basically try this:

Install dependency:
python -m pip install pefile

Run (for OP's paths) (NOTE: THIS WILL MODIFY YOUR DLLS [although it will back them up]):
python fixNvPe.py --input C:\ProgramData\Anaconda3\lib\site-packages\torch\lib\*.dll

github-actions bot commented Feb 6, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

woodrow73 commented Feb 19, 2021

@Javierete This solution is working for me, thanks! I noticed the error comes back for me when free space dips below 7-8 GB for the application I'm running.

    from core.cv2ex import *
  File "H:\DeepFaceLab_NVIDIA_RTX2080Ti_and_earlier_internal\DeepFaceLab\core\cv2ex.py", line 5, in <module>
    from core import imagelib
  File "H:\DeepFaceLab_NVIDIA_RTX2080Ti_and_earlier_internal\DeepFaceLab\core\imagelib\__init__.py", line 9, in <module>
    from .morph import morph_by_points
  File "H:\DeepFaceLab_NVIDIA_RTX2080Ti_and_earlier_internal\DeepFaceLab\core\imagelib\morph.py", line 3, in <module>
    from scipy.spatial import Delaunay
  File "H:\DeepFaceLab_NVIDIA_RTX2080Ti_and_earlier_internal\python-3.6.8\lib\site-packages\scipy\__init__.py", line 156, in <module>
    from . import fft
  File "H:\DeepFaceLab_NVIDIA_RTX2080Ti_and_earlier_internal\python-3.6.8\lib\site-packages\scipy\fft\__init__.py", line 76, in <module>
    from ._basic import (
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 674, in exec_module
  File "<frozen importlib._bootstrap_external>", line 764, in get_code
  File "<frozen importlib._bootstrap_external>", line 833, in get_data
MemoryError

    from core.cv2ex import *
  File "H:\DeepFaceLab_NVIDIA_RTX2080Ti_and_earlier_internal\DeepFaceLab\core\cv2ex.py", line 5, in <module>
    from core import imagelib
  File "H:\DeepFaceLab_NVIDIA_RTX2080Ti_and_earlier_internal\DeepFaceLab\core\imagelib\__init__.py", line 9, in <module>
    from .morph import morph_by_points
  File "H:\DeepFaceLab_NVIDIA_RTX2080Ti_and_earlier_internal\DeepFaceLab\core\imagelib\morph.py", line 3, in <module>
    from scipy.spatial import Delaunay
  File "H:\DeepFaceLab_NVIDIA_RTX2080Ti_and_earlier_internal\python-3.6.8\lib\site-packages\scipy\__init__.py", line 156, in <module>
    from . import fft
  File "H:\DeepFaceLab_NVIDIA_RTX2080Ti_and_earlier_internal\python-3.6.8\lib\site-packages\scipy\fft\__init__.py", line 81, in <module>
    from ._helper import next_fast_len
  File "H:\DeepFaceLab_NVIDIA_RTX2080Ti_and_earlier_internal\python-3.6.8\lib\site-packages\scipy\fft\_helper.py", line 4, in <module>
    from . import pocketfft
  File "H:\DeepFaceLab_NVIDIA_RTX2080Ti_and_earlier_internal\python-3.6.8\lib\site-packages\scipy\fft\_pocketfft\__init__.py", line 3, in <module>
    from .basic import *
  File "H:\DeepFaceLab_NVIDIA_RTX2080Ti_and_earlier_internal\python-3.6.8\lib\site-packages\scipy\fft\_pocketfft\basic.py", line 8, in <module>
    from . import pypocketfft as pfft
ImportError: DLL load failed: The paging file is too small for this operation to complete.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "multiprocessing\spawn.py", line 105, in spawn_main
  File "multiprocessing\spawn.py", line 115, in main
  File "H:\DeepFaceLab_NVIDIA_RTX2080Ti_and_earlier_internal\DeepFaceLab\samplelib\__init__.py", line 1, in <module>
    from .Sample import Sample
  File "H:\DeepFaceLab_NVIDIA_RTX2080Ti_and_earlier_internal\DeepFaceLab\samplelib\Sample.py", line 7, in <module>
    from core.cv2ex import *
  File "H:\DeepFaceLab_NVIDIA_RTX2080Ti_and_earlier_internal\DeepFaceLab\core\cv2ex.py", line 5, in <module>
    from core import imagelib
  File "H:\DeepFaceLab_NVIDIA_RTX2080Ti_and_earlier_internal\DeepFaceLab\core\imagelib\__init__.py", line 9, in <module>
    from .morph import morph_by_points
  File "H:\DeepFaceLab_NVIDIA_RTX2080Ti_and_earlier_internal\DeepFaceLab\core\imagelib\morph.py", line 3, in <module>
    from scipy.spatial import Delaunay
  File "H:\DeepFaceLab_NVIDIA_RTX2080Ti_and_earlier_internal\python-3.6.8\lib\site-packages\scipy\spatial\__init__.py", line 107, in <module>
    from . import distance, transform
  File "H:\DeepFaceLab_NVIDIA_RTX2080Ti_and_earlier_internal\python-3.6.8\lib\site-packages\scipy\spatial\distance.py", line 125, in <module>
    from ..special import rel_entr
  File "H:\DeepFaceLab_NVIDIA_RTX2080Ti_and_earlier_internal\python-3.6.8\lib\site-packages\scipy\special\__init__.py", line 637, in <module>
    from . import _basic
  File "H:\DeepFaceLab_NVIDIA_RTX2080Ti_and_earlier_internal\python-3.6.8\lib\site-packages\scipy\special\_basic.py", line 20, in <module>
    from ._comb import _comb_int
ImportError: DLL load failed: The paging file is too small for this operation to complete.

Is the issue related to CUDA or to GPU memory size?

Thanks and Best Regards,
Ardeal


mondrasovic commented Mar 5, 2021

Well, I managed to resolve this.
Open "Advanced system settings". Go to the Advanced tab, then click the Settings button under Performance.
Click the Advanced tab again --> Change --> unselect 'Automatically...'. For all the drives, set 'System managed size'. Restart your PC.

This works, but only temporarily. Now I am hitting a crash after a few hours of training. It usually happens at the beginning of an epoch, while data is loading.

Windows 10
NVidia CUDA 11.1
Python 3.7.9 (tags/v3.7.9:13c94747c7, Aug 17 2020, 18:58:18) [MSC v.1900 64 bit (AMD64)] on win32
torch==1.8.0+cu111
torchvision==0.9.0+cu111
numpy==1.19.5

An interesting, and reproducible, crash happened when I launched the Microsoft Teams application. Even MS Teams reported an exception regarding virtual memory. No other app stopped working, so MS Teams and PyTorch training became "mutually exclusive". After I applied the trick mentioned above, the problem remains only on the PyTorch side, and only sometimes. A lot of ambiguous words, I know, but that's how it is.

PonyPC commented Sep 29, 2021

YOLOv5 has the same problem.
@glenn-jocher
