[yam] multi-core problem

Tom Eulenfeld tom.eulenfeld at uni-jena.de
Tue Mar 6 16:49:27 CET 2018


Hi Weijun,

I am also writing to the mailing list. Maybe others face similar 
problems in the future.

Yes, the output is not very helpful.

I've seen that you run Python 3.6.1 and I found this bug which might be 
related:
https://bugs.python.org/issue28699

Can you try to upgrade your Python installation? I suggest using 
Anaconda. This probably will not fix the failure, but it might resolve 
the deadlock and give a more meaningful error message.
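
In the meantime, a stall like the one in your traceback (the main process
blocked in `next` in multiprocessing/pool.py) can be turned into an explicit
error by requesting results from the pool iterator with a timeout. The
following is only a minimal sketch and not what yam does internally:
`results_with_timeout` and `slow_square` are hypothetical names, and the
60 s timeout is an arbitrary assumption.

```python
import multiprocessing
import time


def results_with_timeout(pool, func, tasks, timeout):
    """Yield results from pool.imap_unordered, failing loudly on a stall.

    Raises multiprocessing.TimeoutError if no result arrives within
    `timeout` seconds, instead of blocking forever.
    """
    iterator = pool.imap_unordered(func, tasks)
    while True:
        try:
            # The iterator returned by imap/imap_unordered accepts an
            # optional timeout in its next() method.
            yield iterator.next(timeout=timeout)
        except StopIteration:
            # All tasks are done; end the generator cleanly.
            return


def slow_square(x):
    # Hypothetical stand-in for one correlation task.
    time.sleep(0.01)
    return x * x


if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:
        for result in results_with_timeout(pool, slow_square, range(5), 60):
            print(result)
```

If a worker dies or hangs, the loop should then stop with
multiprocessing.TimeoutError after the chosen interval, which at least shows
at which task the run stalled.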

Cheers!
Tom



On 06.03.2018 15:07, Weijun Wang wrote:
> Hi, Tom,
> 
> I am not sure which lines I should send, so I am copying all of the output. Sorry, it still does not seem to contain any useful information.
> 
> Thanks,
> 
> Weijun.
> 
> __________________
> 
> (obspy) [wwj at t570 yam_test]$ yam-runtests -v

...

>> yam correlate 1 -vvv
> CLI tests passed:  35%|██████████████████████████████████████████████████████▊                                                                                                     | 26/74 [00:20<00:19,  2.43it/s]
> ***CTRL+C here***
> 
> CLI tests passed:  36%|████████████████████████████████████████████████████████▉                                                                                                   | 27/74 [03:48<17:41, 22.58s/it]Traceback (most recent call last):
>    File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/multiprocessing/pool.py", line 684, in next
>      item = self._items.popleft()
> IndexError: pop from an empty deque
> 
> During handling of the above exception, another exception occurred:
> 
> Traceback (most recent call last):
>    File "/home/wwj/anaconda3/envs/obspy/bin/yam-runtests", line 11, in <module>
>      load_entry_point('yam', 'console_scripts', 'yam-runtests')()
>    File "/home/wwj/old/gits/obspy/yam/yam/tests/__init__.py", line 27, in run
>      ret = not runner.run(suite).wasSuccessful()
>    File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/unittest/runner.py", line 176, in run
>      test(result)
>    File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/unittest/suite.py", line 84, in __call__
>      return self.run(*args, **kwds)
>    File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/unittest/suite.py", line 122, in run
>      test(result)
>    File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/unittest/suite.py", line 84, in __call__
>      return self.run(*args, **kwds)
>    File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/unittest/suite.py", line 122, in run
>      test(result)
>    File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/unittest/suite.py", line 84, in __call__
>      return self.run(*args, **kwds)
>    File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/unittest/suite.py", line 122, in run
>      test(result)
>    File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/unittest/case.py", line 649, in __call__
>      return self.run(*args, **kwds)
>    File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/unittest/case.py", line 601, in run
>      testMethod()
>    File "/home/wwj/old/gits/obspy/yam/yam/tests/test_main.py", line 168, in test_cli
>      self.out('correlate 1')  # takes long
>    File "/home/wwj/old/gits/obspy/yam/yam/tests/test_main.py", line 82, in out
>      self.script(cmd.split())
>    File "/home/wwj/old/gits/obspy/yam/yam/main.py", line 388, in run_cmdline
>      run(**args)
>    File "/home/wwj/old/gits/obspy/yam/yam/main.py", line 147, in run
>      run2(command, **args)
>    File "/home/wwj/old/gits/obspy/yam/yam/main.py", line 211, in run2
>      yam.commands.start_correlate(io, **args)
>    File "/home/wwj/old/gits/obspy/yam/yam/commands.py", line 167, in start_correlate
>      total=len(tasks)):
>    File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/site-packages/tqdm/_tqdm.py", line 959, in __iter__
>      for obj in iterable:
>    File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/multiprocessing/pool.py", line 688, in next
>      self._cond.wait(timeout)
>    File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/threading.py", line 295, in wait
>      waiter.acquire()
> KeyboardInterrupt
> 
> 
> 
>> -----Original Message-----
>> From: "Tom Eulenfeld" <tom.eulenfeld at uni-jena.de>
>> Sent: 2018-03-06 21:00:48 (Tuesday)
>> To: "Weijun Wang" <wjwang at cea-ies.ac.cn>
>> Cc:
>> Subject: Re: [yam] multi-core problem
>>
>> Hi Weijun,
>>
>> good to hear that it is at least working for a single core.
>>
>> Unfortunately, I cannot reproduce your error. I think the child process
>> is dying somehow. Can you please post the last few lines of
>> yam-runtests -v?
>>
>> I think I need to add more debug statements in the code to find the bug.
>>
>> Cheers!
>> Tom
>>
>>
>> On 06.03.2018 11:58, Weijun Wang wrote:
>>>
>>> Hi, Tom,
>>>
>>> Sorry I got your name wrong in my first email.
>>>
>>> The environment I am running is:
>>>
>>> OS: CentOS Linux release 7.4.1708 (Core)
>>> Python:  3.6.1
>>> obspy:                     1.1.0                    py36_1    conda-forge
>>> obspyh5:                   0.3.2                     <pip>
>>> yam:  0.3.1-dev
>>>
>>>
>>> Yes, the error messages I posted before came from running the demo notebook (yam_velocity_variations_patcx).
>>> yam-runtests gets stuck at some point, for example:
>>> -----------------------------------
>>> (obspy) [wwj at t570 yam_test]$ yam-runtests
>>> CLI tests passed:  32%|██████████████████████████████████████████████████▌                                                                                                         | 24/74 [00:17<00:38,  1.30it/s]
>>> -----------------------------------
>>> and never continues. When I press Ctrl+C, I get:
>>> -------------------------------------
>>> Traceback (most recent call last):
>>>     File "/home/wwj/anaconda3/envs/obspy/bin/yam-runtests", line 11, in <module>
>>>       load_entry_point('yam', 'console_scripts', 'yam-runtests')()
>>>     File "/home/wwj/old/gits/obspy/yam/yam/tests/__init__.py", line 27, in run
>>>       ret = not runner.run(suite).wasSuccessful()
>>>     File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/unittest/runner.py", line 176, in run
>>>       test(result)
>>>     File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/unittest/suite.py", line 84, in __call__
>>>       return self.run(*args, **kwds)
>>>     File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/unittest/suite.py", line 122, in run
>>>       test(result)
>>>     File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/unittest/suite.py", line 84, in __call__
>>>       return self.run(*args, **kwds)
>>>     File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/unittest/suite.py", line 122, in run
>>>       test(result)
>>>     File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/unittest/suite.py", line 84, in __call__
>>>       return self.run(*args, **kwds)
>>>     File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/unittest/suite.py", line 122, in run
>>>       test(result)
>>>     File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/unittest/case.py", line 649, in __call__
>>>       return self.run(*args, **kwds)
>>>     File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/unittest/case.py", line 601, in run
>>>       testMethod()
>>>     File "/home/wwj/old/gits/obspy/yam/yam/tests/test_main.py", line 168, in test_cli
>>>       self.out('correlate 1')  # takes long
>>>     File "/home/wwj/old/gits/obspy/yam/yam/tests/test_main.py", line 82, in out
>>>       self.script(cmd.split())
>>>     File "/home/wwj/old/gits/obspy/yam/yam/main.py", line 388, in run_cmdline
>>>       run(**args)
>>>     File "/home/wwj/old/gits/obspy/yam/yam/main.py", line 147, in run
>>>       run2(command, **args)
>>>     File "/home/wwj/old/gits/obspy/yam/yam/main.py", line 211, in run2
>>>       yam.commands.start_correlate(io, **args)
>>>     File "/home/wwj/old/gits/obspy/yam/yam/commands.py", line 167, in start_correlate
>>>       total=len(tasks)):
>>>     File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/site-packages/tqdm/_tqdm.py", line 959, in __iter__
>>>       for obj in iterable:
>>>     File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/multiprocessing/pool.py", line 688, in next
>>>       self._cond.wait(timeout)
>>>     File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/threading.py", line 295, in wait
>>>       waiter.acquire()
>>> KeyboardInterrupt
>>> ^CError in atexit._run_exitfuncs:
>>> Traceback (most recent call last):
>>>     File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/multiprocessing/util.py", line 254, in _run_finalizers
>>>       finalizer()
>>>     File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/multiprocessing/util.py", line 186, in __call__
>>>       res = self._callback(*self._args, **self._kwargs)
>>>     File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/multiprocessing/pool.py", line 535, in _terminate_pool
>>>       cls._help_stuff_finish(inqueue, task_handler, len(pool))
>>>     File "/home/wwj/anaconda3/envs/obspy/lib/python3.6/multiprocessing/pool.py", line 520, in _help_stuff_finish
>>>       inqueue._rlock.acquire()
>>> KeyboardInterrupt
>>> ----------------------------------
>>>
>>> thanks,
>>>
>>> Weijun.
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: "Tom Eulenfeld" <tom.eulenfeld at uni-jena.de>
>>>> Sent: 2018-03-06 18:14:51 (Tuesday)
>>>> To: seistools at listserv.uni-jena.de
>>>> Cc: wjwang at cea-ies.ac.cn
>>>> Subject: Re: [yam] multi-core problem
>>>>
>>>> Hello Weijun,
>>>>
>>>> sorry, your mail somehow got lost in the Mailman instance. I attach it
>>>> below.
>>>>
>>>> Regarding your problem:
>>>>
>>>> 1. Did you run yam-runtests? Does it show the same error? Which
>>>> operating system are you using?
>>>> 2. Is your installation up to date? Check yam --version. The latest
>>>> version is 0.3.0.
>>>> 3. If you are already on the latest version, can you try out the
>>>> development version of yam? You can install dev with
>>>>
>>>> pip install https://github.com/trichter/yam/archive/master.zip
>>>>
>>>> Recently, I reworked how things are written to the HDF5 file. In version
>>>> 0.3.0 and prior versions an extra process was spawned just for writing
>>>> into HDF5 files to circumvent the concurrent writing problem. In the dev
>>>> version writing is done from the main process, which is simpler and less
>>>> error prone.
>>>>
>>>> Best,
>>>> Tom
>>>>
>>>>
>>>>
>>>> -------- Forwarded Message --------
>>>>
>>>> Hello, Yawar,
>>>> When I run yam with multiple cores, errors frequently appear, as in the
>>>> example below. It is probably a problem with concurrent writing to the HDF5
>>>> file in commands.py. I am not familiar with HDF5, so I don't know whether
>>>> the website (http://docs.h5py.org/en/latest/swmr.html) and its
>>>> "Multiprocess concurrent write and read" section can help.
>>>> Thanks,
>>>>
>>>> -----------------------------------------
>>>>
>>>> $ yam correlate 1b
>>>>
>>>> --------------------error message--------------------------------
>>>>     20%|████████▌                                 | 75/366 [02:52<11:08,
>>>> 2.30s/it]Traceback (most recent call last):
>>>>      File
>>>> "/home/wwj/anaconda3/envs/obspy/lib/python3.6/site-packages/h5py/_hl/files.py",
>>>> line 111, in make_fid
>>>>        fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
>>>>      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
>>>>      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
>>>>      File "h5py/h5f.pyx", line 78, in h5py.h5f.open
>>>> OSError: Unable to open file (unable to lock file, errno = 11, error
>>>> message = 'Resource temporarily unavailable')
>>>>
>>>> During handling of the above exception, another exception occurred:
>>>>
>>>> Traceback (most recent call last):
>>>>      File "/home/wwj/anaconda3/envs/obspy/bin/yam", line 11, in <module>
>>>>        load_entry_point('yam', 'console_scripts', 'yam')()
>>>>      File "/home/wwj/old/gits/obspy/yam/yam/main.py", line 388, in run_cmdline
>>>>        run(**args)
>>>>      File "/home/wwj/old/gits/obspy/yam/yam/main.py", line 147, in run
>>>>        run2(command, **args)
>>>>      File "/home/wwj/old/gits/obspy/yam/yam/main.py", line 211, in run2
>>>>        yam.commands.start_correlate(io, **args)
>>>>      File "/home/wwj/old/gits/obspy/yam/yam/commands.py", line 168, in
>>>> start_correlate
>>>>        _write_stream(result)
>>>>      File "/home/wwj/old/gits/obspy/yam/yam/commands.py", line 156, in
>>>> _write_stream
>>>>        result[key].write(io[key], 'H5', mode='a')
>>>>      File
>>>> "/home/wwj/anaconda3/envs/obspy/lib/python3.6/site-packages/obspy/core/stream.py",
>>>> line 1443, in write
>>>>        write_format(self, filename, **kwargs)
>>>>      File
>>>> "/home/wwj/anaconda3/envs/obspy/lib/python3.6/site-packages/obspyh5.py",
>>>> line 186, in writeh5
>>>>        with h5py.File(fname, mode, libver='latest') as f:
>>>>      File
>>>> "/home/wwj/anaconda3/envs/obspy/lib/python3.6/site-packages/h5py/_hl/files.py",
>>>> line 269, in __init__
>>>>        fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
>>>>      File
>>>> "/home/wwj/anaconda3/envs/obspy/lib/python3.6/site-packages/h5py/_hl/files.py",
>>>> line 113, in make_fid
>>>>        fid = h5f.create(name, h5f.ACC_EXCL, fapl=fapl, fcpl=fcpl)
>>>>      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
>>>>      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
>>>>      File "h5py/h5f.pyx", line 98, in h5py.h5f.create
>>>> OSError: Unable to create file (unable to open file: name = 'corr.h5',
>>>> errno = 17, error message = 'File exists', flags = 15, o_flags = c2)
>>>>
>>>>
>>>> --
>>>> Weijun Wang
>>>>
>>>> Institute of Earthquake Forecasting, China Earthquake Administration
>>>> Beijing, China

-- 

Dr. Tom Eulenfeld
Institute for Geosciences
Friedrich-Schiller-University Jena

