[OpenSPIM] asynchronous writing of stacks bug?
Luke Stuyvenberg
stuyvenberg at wisc.edu
Wed Aug 21 10:36:15 CDT 2013
Hi Alexis,
> some more tests and hopefully some progress. I scanned an 82MB stack (1024*1024*41 @ 16 bits)
>
> - RAM usage by Fiji and stack writing to disk:
> ~20MB used at start (out of 1GB allocated)
> as the scan progresses it ramps up to ~110MB and then slowly decreases as the size of the file on disk increases; finally the garbage collector resets it to 20MB
> it takes up to 2 minutes to write the 84MB and at least 1 minute more to write the remaining 2MB (the metadata, I presume...)
Okay; although the writing times are not at all ideal (my 400MB stacks were written synchronously and in their entirety in a little over a minute and a half), the RAM usage is more or less as expected.
> If you open that stack & metadata, you will see that the planes are not written in the right order. Look at the sequence (toward the end of the file above):
>
> <Plane DeltaT="18.37296911600015" PositionX="3205.0" PositionY="2628.0" PositionZ="3085.0" TheT="0" TheZ="36">
> <Plane DeltaT="12.293132814000273" PositionX="3205.0" PositionY="2628.0" PositionZ="3025.0" TheT="0" TheZ="37">
> <Plane DeltaT="18.884906363000027" PositionX="3205.0" PositionY="2628.0" PositionZ="3090.0" TheT="0" TheZ="38">
> The plane with index TheZ=37 (physical Z position 3025 and timestamp 12.29) is intercalated between two obviously consecutive slices (TheZ=36 and TheZ=38). This is obviously wrong…
I have some mixed feelings on this: it's the definitive proof I was looking for, but I'm genuinely at a loss as to how it may have come about. My best guess is that the synchronization is too loose, and the acquisition ("main") thread is somehow writing slices itself (presumably because it decides the queue is full) ahead of the writing thread.
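To illustrate the failure mode I have in mind, here is a hypothetical sketch (names like writeSlice and queue are stand-ins, not the plugin's actual code): if the acquisition thread falls back to writing a slice itself whenever the queue looks full, nothing orders that write against the slices the writer thread is still holding.

    // Hypothetical sketch of the suspected race -- not the plugin's code.
    // On the acquisition ("main") thread (exception handling omitted):
    if (queue.remainingCapacity() == 0) {
        writeSlice(image);  // synchronous fallback; the writer thread may
                            // still hold earlier slices, so this one can
                            // reach the file first, out of order.
    } else {
        queue.put(image);   // normal asynchronous path
    }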
> I have the feeling that my problems are somehow linked to Bio-Formats not behaving properly.
I've tried the new Bio-Formats daily builds update site through the Fiji updater, and haven't noticed any problems from it; perhaps try enabling that update site and downloading the very latest versions?
Finally, I have most of the promised plugin updates. There are two main additions:
- First, it does some timing over the acquisition, outputting the results to the log; although it isn't very detailed, it gives a breakdown of how time is used while acquiring (for example, if the async isn't appropriately asynchronous, the acquisition time will be roughly the same as though it were off).
- Second, and more importantly for you, the async writer has been completely revamped. It now queues the begin/end stack events as well as new slices, so these operations shouldn't *ever* get out of order. It also puts up a monitor showing the state of its queue, which will let us know if it's filling up or not.
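For the curious, the revamp boils down to pushing every operation through a single FIFO drained by one thread. A minimal sketch (with hypothetical names; the real code is more involved):

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Minimal sketch of the revamped async writer: begin-stack, slice and
    // end-stack events all travel through one FIFO queue, and only this
    // thread touches the file, so operations can never be reordered.
    class AsyncWriterSketch implements Runnable {
        interface Event { void write() throws Exception; }

        private final BlockingQueue<Event> queue =
            new LinkedBlockingQueue<Event>(64); // capacity is illustrative

        void post(Event e) throws InterruptedException {
            queue.put(e); // blocks, rather than writing in-line, when full
        }

        public void run() {
            try {
                for (;;)
                    queue.take().write(); // strictly in posting order
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

Since the queue is the only path to disk, a begin-stack event posted ahead of its slices is also written ahead of them; the new monitor simply reports the queue's size.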
If you're willing to play guinea pig again, please try replacing mmplugins/SPIMAcquisition.jar with the version from: https://dl.dropboxusercontent.com/u/57890359/SPIMAcquisition.jar
Anyone is free to try this JAR out, but I haven't put it on our Fiji update site for a few reasons -- the new features are only conditionally useful, and can't (yet) be disabled. That said, if the new async fixes your output problems, I'll polish these changes and incorporate them.
Thanks,
Luke
On 08/21/13, Alexis Maizel wrote:
> Hi Luke,
>
> some more tests and hopefully some progress. I scanned an 82MB stack (1024*1024*41 @ 16 bits)
>
> - RAM usage by Fiji and stack writing to disk:
> ~20MB used at start (out of 1GB allocated)
> as the scan progresses it ramps up to ~110MB and then slowly decreases as the size of the file on disk increases; finally the garbage collector resets it to 20MB
> it takes up to 2 minutes to write the 84MB and at least 1 minute more to write the remaining 2MB (the metadata, I presume...)
> - Metadata:
> I have updated Fiji
> I have tried to update LOCI to the trunk/daily/stable versions but after downloading some files I get an error message ("there was an error updating LOCI")
> It takes a very long time to write the metadata, much longer than the images. You can retrieve one stack with metadata here: http://dl.dropbox.com/u/484859/spim_TL01_Angle0.ome.tiff
> If you open that stack & metadata, you will see that the planes are not written in the right order. Look at the sequence (toward the end of the file above):
>
> <Plane DeltaT="18.37296911600015" PositionX="3205.0" PositionY="2628.0" PositionZ="3085.0" TheT="0" TheZ="36">
> <Plane DeltaT="12.293132814000273" PositionX="3205.0" PositionY="2628.0" PositionZ="3025.0" TheT="0" TheZ="37">
> <Plane DeltaT="18.884906363000027" PositionX="3205.0" PositionY="2628.0" PositionZ="3090.0" TheT="0" TheZ="38">
> The plane with index TheZ=37 (physical Z position 3025 and timestamp 12.29) is intercalated between two obviously consecutive slices (TheZ=36 and TheZ=38). This is obviously wrong…
>
> I have the feeling that my problems are somehow linked to Bio-Formats not behaving properly.
>
> With my best regards,
>
> Alexis
>
>
> On 20 Aug 2013, at 20:24, Luke Stuyvenberg <stuyvenberg at wisc.edu> wrote:
>
> > Hi Alexis,
> >
> >> I scaled them down to 8bits and saved them on my Mac. You can get an original "shuffled" stack here: http://dl.dropbox.com/u/484859/spim_TL01_Angle0.ome.tiff
> >
> > Unfortunately this stack, too, seems to have no metadata attached to it. If the plugin is outputting this kind of data, I may have a serious problem. :-)
> >
> >
> > Are you by any chance clicking the Abort! button at the end of an acquisition? Although the data acquisition may have finished, the button will keep reading Abort! until the writing is finished; with large data sets, it may seem the process is done and the button has simply gotten stuck, when in fact the OME-TIFF writer is updating the metadata in each of its images (I explain this in more detail below). Clicking Abort! at such a time would stop the metadata from being written.
> >
> >
> >> Definitely the writing to disk finishes well after the stack has finished being acquired.
> > Upon review, this makes more sense than it did. The basic OME-TIFF specification requires that each file in the dataset have the complete metadata. Unfortunately, there's no way to know what the complete metadata is until all the files have been written. So as the output handler writes each file, it builds up the metadata, until the end of the acquisition. At that point, it reopens each file in the dataset and rewrites the comment to contain the correct metadata (this is the long delay I mentioned above -- the more files you write, the longer it takes to put the finished metadata in place). The question, then, is whether or not the last *slice* of each file is actually being held back -- I suspect not; the part of the code that appends a slice to the file is unambiguous in its function, while the metadata can easily reach such a size that writing it looks much like writing a new slice. In essence, there's not much we can presently do about this particular quirk, though now that Bio-Formats is outputting daily builds, there may be other options soon.
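> >
> > Roughly, in Bio-Formats terms, the flow looks like this (a sketch only, not the handler's real code; 'planes' stands in for the acquired slices):
> >
> >     import loci.common.services.ServiceFactory;
> >     import loci.formats.MetadataTools;
> >     import loci.formats.meta.IMetadata;
> >     import loci.formats.out.OMETiffWriter;
> >     import loci.formats.services.OMEXMLService;
> >
> >     // Sketch of the two-phase OME-TIFF write described above.
> >     public class TwoPhaseSketch {
> >         public static void write(byte[][] planes) throws Exception {
> >             IMetadata meta = new ServiceFactory()
> >                 .getInstance(OMEXMLService.class).createOMEXMLMetadata();
> >             // Declare dimensions up front (1024x1024xN @ 16 bits).
> >             MetadataTools.populateMetadata(meta, 0, null, false, "XYZCT",
> >                 "uint16", 1024, 1024, planes.length, 1, 1, 1);
> >
> >             OMETiffWriter writer = new OMETiffWriter();
> >             writer.setMetadataRetrieve(meta);
> >             writer.setId("spim_TL01_Angle0.ome.tiff");
> >             for (int z = 0; z < planes.length; z++)
> >                 writer.saveBytes(z, planes[z]); // appending slices is cheap...
> >             writer.close(); // ...but closing rewrites the OME-XML comment
> >                             // into each file of the dataset -- the long
> >                             // tail you see after the last slice.
> >         }
> >     }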
> >
> >
> >>> If so, the next time you're running an acquisition, could you try monitoring Fiji's memory usage (Plugins > Utilities > Memory Monitor) as the sequence progresses? I suspect that the bursts you are observing happen because the queue is filling up, forcing the output handler to synchronously write out one or more slices; you should see the memory use climb in a stair-step pattern until one of these bursts, when it should plummet back to nearly nothing.
> >
> > Actually, thinking on it, the plummeting most likely will not coincide with the writes -- more likely, the images will stick around in memory until the garbage collector decides to run. So the aforementioned pattern should appear in some form or other, though the timing may not tell us very much.
> >
> >
> > A few other thoughts:
> > - Do any other programs make regular use of your hard drive? Perhaps Windows 7's automatic (background) defragmentation, or a filesystem-filter antivirus (like avast!, which scans files as they are read and written)? The writing thread is given a low priority so the acquisition can run freely, but this might cause it to be blocked outright by low-level processes making regular use of the hard drive.
> > - Is your RAM mostly free when you start the acquisition? The maximum length of the image queue is decided based on the available memory at the beginning of an acquisition; I don't yet have a display for it, but it may be getting set too low...
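> >
> > In sketch form (an illustrative heuristic, not the plugin's actual numbers; 'writerTask' is hypothetical, and imports from java.util.concurrent are omitted):
> >
> >     // Sketch: size the queue from free memory at acquisition start,
> >     // and keep the writer thread polite.
> >     long free = Runtime.getRuntime().freeMemory();
> >     long bytesPerSlice = 1024L * 1024L * 2L; // one 16-bit 1024x1024 plane
> >     int capacity = (int) Math.max(2, free / (4 * bytesPerSlice));
> >     BlockingQueue<byte[]> queue = new LinkedBlockingQueue<byte[]>(capacity);
> >
> >     Thread writerThread = new Thread(writerTask, "SPIM async writer");
> >     writerThread.setPriority(Thread.MIN_PRIORITY); // acquisition runs freely,
> >     writerThread.start();                          // but disk-busy neighbors
> >                                                    // can starve the writer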
> >
> >
> > I'm working on a slight rework of the async handler which might help to fix these issues; I'll upload it to our update server once it's done. Of course, it's hard to gauge its efficacy until I can reproduce this behavior, so I'll see what I can do on that front as well.
> >
> >
> > Thanks again,
> > Luke
> >
> >
> > On 08/20/13, Alexis Maizel wrote:
> >> Hi Luke,
> >>
> >>> Did you re-save this data, or are these stacks fresh from the OpenSPIM plugin?
> >>
> >> I scaled them down to 8bits and saved them on my Mac. You can get an original "shuffled" stack here: http://dl.dropbox.com/u/484859/spim_TL01_Angle0.ome.tiff
> >>
> >>> I ask because the OME-TIFF metadata have been clobbered in both stacks (you can check using the Bio-Formats importer); if this is fresh output, the plugin may have a serious error in its metadata generation. Alternatively, Bio-Formats might be out-of-date or acting up. At any rate, the image slices obviously shouldn't be written out of order, but even if they were, with the OME metadata intact Bio-Formats should still be able to display the slices in the correct order (because it then has a table relating each slice's index in the file to its physical Z position).
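> >>>
> >>> In sketch form (assuming an open IFormatReader 'reader' and its parsed OME-XML metadata 'meta'; error handling omitted):
> >>>
> >>>     // Sketch: the metadata's TheZ table lets a reader restore the
> >>>     // physical order even when planes were written shuffled.
> >>>     int n = reader.getImageCount();
> >>>     byte[][] sorted = new byte[n][];
> >>>     for (int i = 0; i < n; i++) {
> >>>         int theZ = meta.getPlaneTheZ(0, i).getValue(); // file index -> Z slot
> >>>         sorted[theZ] = reader.openBytes(i);
> >>>     }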
> >>
> >> I forced Fiji to open the fresh-from-OpenSPIM stacks on my Mac using the Bio-Formats importer (updated) and the slices are still in the wrong order.
> >>
> >>> The data isn't truly meant to be written in bursts -- ideally it would just chug along in the background... I'll try tweaking the async a little more; probably, the code is doing something incorrectly leading to this. As for the timepoint images, this too is very unusual -- I specifically have the writer finish writing any pending images before finalizing a stack. I'll look into this as well.
> >>
> >> Definitely the writing to disk finishes well after the stack has finished being acquired.
> >>
> >>>> Also, if one aborts a time-lapse recording between two time points, when all the planes have not yet been written to disk, then a warning window is displayed (which is empty…); after some seconds, the window disappears and the recording has been aborted.
> >>> I'm aware of this; I've never been able to determine what that message box is trying to display. No warnings should be appearing during an abort, but this one mysteriously shows up. It's a low priority for me, however; it doesn't seem to affect the abort or change the data that was written (well, any more than aborting mid-acquisition already does).
> >>
> >> I concur that it is not critical, I just wanted to point it out. Glad to read that at least some of what I am experiencing can be reproduced :-)
> >>
> >>> Is everything completely up-to-date, including Bio-Formats? (Note that I haven't yet tried the Bio-Formats daily builds -- whether they will fix or break the plugin has yet to be seen.)
> >>
> >> I will update everything and perform more tests.
> >>
> >>> If so, the next time you're running an acquisition, could you try monitoring Fiji's memory usage (Plugins > Utilities > Memory Monitor) as the sequence progresses? I suspect that the bursts you are observing happen because the queue is filling up, forcing the output handler to synchronously write out one or more slices; you should see the memory use climb in a stair-step pattern until one of these bursts, when it should plummet back to nearly nothing.
> >>
> >> I'll do that and get back to you.
> >>
> >> Thanks for your help,
> >>
> >> Alexis
> >>
> >>
> >>
> >>>
> >>>
> >>> Thanks,
> >>> Luke
> >>>
> >>> On 08/14/13, Alexis Maizel wrote:
> >>>> Hi Pavel,
> >>>>
> >>>> I did some more tests this morning.
> >>>> The asynchronous writing is indeed the culprit. You can get two stacks here: http://dl.dropbox.com/u/484859/async_ON_vs_OFF.zip
> >>>>
> >>>> With async OFF: the images are written to the disk as soon as they are acquired; it is slow but the planes are in the right order.
> >>>>
> >>>> With async ON: the stack is acquired and written 'in bursts' to the disk. Sometimes there is a gap of several seconds between two bursts of disk writing, and I have the feeling this corresponds to points where the planes are written in the wrong order. I have also noticed that irrespective of the time delay between two time-lapse acquisitions, the last image(s) of time point N are written to disk an instant before time point N+1 starts to be acquired. Also, if one aborts a time-lapse recording between two time points, when all the planes have not yet been written to disk, then a warning window is displayed (which is empty…); after some seconds, the window disappears and the recording has been aborted.
> >>>> So I do not know what's wrong, but something does not seem right in this 'asynchronous writing' option, at least on our config:
> >>>> Windows 7 Pro
> >>>> 1024MB of RAM allocated to Fiji (we cannot allocate more, otherwise the Orca won't operate)
> >>>> JRE 1.6.0
> >>>> everything in 32-bit versions
> >>>>
> >>>> Would be great to get that fixed :-)
> >>>>
> >>>> With my best regards,
> >>>>
> >>>> Alexis
> >>>>
> >>>>
> >>>> On 14 Aug 2013, at 00:30, Pavel Tomancak <tomancak at mpi-cbg.de> wrote:
> >>>>
> >>>>> Hi Alexis,
> >>>>>
> >>>>> That is very strange. I have never seen that. We made some acquisitions today and nothing like that was going on. It's obviously a serious issue. Does it happen only when you have the asynchronous writing enabled? I assume you are running on Windows 7.
> >>>>>
> >>>>> All the best
> >>>>>
> >>>>> Pavel
> >>>>>
> >>>>> -----------------------------------------------------------------------------------
> >>>>> Pavel Tomancak, Ph.D.
> >>>>>
> >>>>> Group Leader
> >>>>> Max Planck Institute of Molecular Cell Biology and Genetics
> >>>>> Pfotenhauerstr. 108
> >>>>> D-01307 Dresden, Germany
> >>>>> Tel.: +49 351 210 2670
> >>>>> Fax: +49 351 210 2020
> >>>>> tomancak at mpi-cbg.de
> >>>>> http://www.mpi-cbg.de
> >>>>> -----------------------------------------------------------------------------------
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Aug 13, 2013, at 8:14 PM, Alexis Maizel <Alexis.Maizel at cos.uni-heidelberg.de> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I have noticed that when acquiring stacks during a time lapse and writing them to disk using the 'asynchronous writing' option, the order in which the individual images are laid into the stack is imprecise. What I mean is that an image obviously from the middle of the stack is shifted toward the end. I did not observe a fixed pattern, except that usually the first 15-20 planes are in the right order and the mess is at the end.
> >>>>>>
> >>>>>> I have observed carefully, and the problem does not come from the stage 'going back and forth' during acquisition. It is upon writing to disk that the problem seems to occur. Also, I have noticed that it takes quite a long time (up to 3 minutes) to write a ~400MB stack to disk.
> >>>>>>
> >>>>>> You can see more precisely what I am talking about by looking at two representative stacks: http://dl.dropbox.com/u/484859/Stacks.zip
> >>>>>>
> >>>>>> With my best regards,
> >>>>>>
> >>>>>> Alexis
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> OpenSPIM mailing list
> >>>>>> OpenSPIM at openspim.org
> >>>>>> http://openspim.org/mailman/listinfo/openspim
> >>>>>
More information about the OpenSPIM mailing list