Image dimensions seem to be assigned incorrectly by aicsimageio when running deep learning segmenter

I’ve been trying to figure out this issue for a while and I’ve gotten pretty stumped, so I figured I would ask here to see if anyone can help. I have a feeling the fix isn’t that hard; I’m just not very experienced with image metadata.

The background: I’m trying to train a deep learning model to segment cell membranes in my tissue of interest. I did a trial run of this some time ago using outputs from one of the classic segmentation workflows provided in the Allen Cell Segmenter. These initial segmentations that I used to train the model had some imperfections.

I am now using a different method that I created myself to produce the initial cell membrane segmentations, with the hope of using these to train a better deep learning model. The new method outputs single-channel .tiff files with ~100 z slices; their shapes are reported in the metadata as, e.g., [1, 126, 856, 856]. The version of aicsimageio I am using is 3.3.1, and I am using version 0.0.8.dev0 of the aics-ml-segmenter (with the fixes implemented here).

When I try to run curator_sorting (with the --d flag), the following shows up in my terminal:

[2020-12-15 12:00:45,478 - aicsimageio.readers.tiff_reader - 236][INFO] Unsure how to handle dimension: Q. Replaced with guess: C
[2020-12-15 12:00:45,478 - aicsimageio.readers.tiff_reader - 236][INFO] Unsure how to handle dimension: Q. Replaced with guess: Z

…but this doesn’t prevent me from creating excluding masks, and the side-by-side images that pop up in the curator screen look normal. (One thing I am confused about is where “dimension: Q” is coming from.) However, once the excluding masks are created, the following error occurs during training data creation:

IndexError: boolean index did not match indexed array along dimension 0; dimension is 126 but corresponding boolean dimension is 1

If I open up the excluding mask .tiffs in ImageJ and look at the metadata, I can see that the dimensions were guessed incorrectly, probably because the default order for the ome_tiff_writer is STZCXY while my arrays have Z and C in swapped positions.

It seems like if I could resave my original segmentation images so that C and Z are explicitly defined in the OME-TIFF metadata, I would be able to fix this problem. (In the future I will likely implement this in the original segmentation code, but for now I was hoping to just fix the images I already have.) However, no matter what I do, even changing the order of dimensions in the numpy array for the image, the values for C and Z are still switched when I save it.
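
For reference, this is roughly what I have been trying (a sketch with placeholder file names; the dimension_order argument reflects my reading of the aicsimageio 3.x writer, so I may well be misusing it):

import numpy as np
from aicsimageio import AICSImage
from aicsimageio.writers import OmeTiffWriter

# load the single-channel stack and collapse all size-1 axes down to ZYX
data = np.squeeze(AICSImage("seg.tiff").data)
# add an explicit channel axis so the array is CZYX
data = data[np.newaxis, ...]

with OmeTiffWriter("seg_fixed.ome.tif", overwrite_file=True) as writer:
    writer.save(data, dimension_order="CZYX")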

How can I go about explicitly defining the values for C and Z in my images so that they get saved correctly in the image metadata?

Hello @lynn, I would be happy to help. We are in the middle of transitioning to the new aicsimageio, so there could be a version mismatch. To help me understand the issue, can you post the versions of aicsmlsegment and aicsimageio in your environment? You can see this by running pip show aicsimageio in your command line.

Thanks,
Jianxu

Hi Jianxu, thanks for the response. I did as you suggested and here are the versions I am running:

aicsimageio Version: 3.3.1
aicsmlsegment Version: 0.0.8.dev0

I should add that I updated to the more recent version of aicsmlsegment after originally having a similar problem with the older version I was running. Creating a new environment and installing the new version seemed to resolve the problem temporarily, but then I started having other issues. However, I still have the old environment with the old installation if that helps with troubleshooting.

Hi @lynn,

I just did some stress tests on compatibility with the new aicsimageio and found that in certain situations the images may not be loaded into the correct shape. To solve this, I updated the image loading code to be more “defensive” and make sure the loading is done properly.
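
Conceptually, the defensive loading does something like this (a simplified sketch of the idea, not the exact code in the repo):

import numpy as np
from aicsimageio import AICSImage

def load_single_channel_zyx(path):
    # aicsimageio 3.x returns a 6D STCZYX array; squeezing away all
    # size-1 axes means a wrong C/Z guess cannot change the result
    data = np.squeeze(AICSImage(path).data)
    assert data.ndim == 3, f"expected a 3D ZYX stack, got shape {data.shape}"
    return data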

You can just pull the new development HEAD from the repo, which should now work properly with the new aicsimageio.

Please note that our repo has a new home: https://github.com/AllenCell/aics-ml-segmentation

Any new changes or releases will be available via the new repo.

In your specific case, you can simply switch the remote of your local repo. First, check your current remote:

git remote -v

You will see that your current remote is set to our old repo: git@github.com:AllenInstitute/aics-ml-segmentation.git.

Now, you can remove this remote:

git remote remove origin

Then, you can set your remote as the new repo:

git remote add origin git@github.com:AllenCell/aics-ml-segmentation.git

Now, if you check your remote with git remote -v, you will see the new repo. Then git pull will fetch the new fix to your local machine. If you installed the package following the instructions, namely with pip install -e ., you don’t need to update your conda environment; all the changes will take effect automatically.

Please let me know if you run into more issues. I would be more than happy to help.

Thanks,
Jianxu

Hi Jianxu, thanks for your help. I did not know about the new repository, so that is good to know. I updated my local installation using git pull as you advised and ran curator_sorting again from the terminal on the same files. I still see this output; I am not sure whether it is problematic:

[2020-12-16 14:37:23,426 - aicsimageio.readers.tiff_reader - 236][INFO] Unsure how to handle dimension: Q. Replaced with guess: C
[2020-12-16 14:37:23,426 - aicsimageio.readers.tiff_reader - 236][INFO] Unsure how to handle dimension: Q. Replaced with guess: Z

A little further down I also see this:

mid_frame = round(histogram_otsu(z_profile)*bw.shape[0]).astype(int)
AttributeError: 'int' object has no attribute 'astype'

which might be a bug, but I am not sure. Let me know if there is something I can change to fix this or if you need more information.

I also noticed that the installation instructions now indicate using python version 3.7, but my conda environment still uses python version 3.6.12. Is this something I should update as well, or is the package still compatible with python 3.6?

I appreciate your help with this. I probably could have avoided these problems if I saved my files in a different way to begin with, but live and learn I suppose!

Hi @lynn,

You may ignore the info message Unsure how to handle dimension: Q. Replaced with guess: Z. That usually does not indicate a bug. (The Q is the placeholder that tifffile assigns to axes it cannot identify; aicsimageio then guesses what they should be.)

Based on the other error message, AttributeError: 'int' object has no attribute 'astype', I think there may be something wrong with the image. (Python 3.6 should be fine.)

However, I need a little more info to understand what is going wrong, so I just added some extra debug messages in the code that print out the necessary info.

All you need to do is:

(1) go to your local repo and do git pull to get the new changes I just made for better debugging,
(2) run your command again,
(3) if you see the error again, copy and paste everything printed in your command line (not only the error line). Then I may be able to tell what the problem is.

Thanks,
Jianxu

OK, I used git pull again to update my installation and ran the command again.

I get a bit further this time: the window to select whether images are good or bad appears. If I mark an image as “bad,” I do not get an error and can move on to inspecting the next image. Likewise, if I mark an image as “good” but answer “no” to whether it needs a mask, it proceeds without issue. However, if I mark an image as “good” and answer “yes” to whether it needs a mask, the error occurs.

Here is the entire output to the terminal so you can see what is happening. (Note: --Normalization 19 refers to a normalization recipe that I added to utils.py.)

[2020-12-16 16:44:34,561 - root - 221][DEBUG] --------------------------------------------------------------------------------
[2020-12-16 16:44:34,561 - root - 256][DEBUG] Working Dir:
[2020-12-16 16:44:34,561 - root - 257][DEBUG] 	/home/maddy/projects_2/aics-ml-segmentation
[2020-12-16 16:44:34,561 - root - 258][DEBUG] Command Line:
[2020-12-16 16:44:34,561 - root - 259][DEBUG] 	/home/maddy/miniconda3/envs/mlsegmenter-ver2/bin/curator_sorting --d --raw_path /home/maddy/projects/claudin_gfp_5dpf_airy_live/stack_aligned --input_channel 0 --data_type .tiff --seg_path /home/maddy/projects/claudin_gfp_5dpf_airy_live/napari_seg_results/ --train_path /home/maddy/projects/claudin_gfp_5dpf_airy_live/training_data_2/ --csv_name /home/maddy/projects/claudin_gfp_5dpf_airy_live/curator_sorting_tracker_2.csv --mask_path /home/maddy/projects/claudin_gfp_5dpf_airy_live/curator_sorting_excluding_mask_3 --Normalization 19
[2020-12-16 16:44:34,561 - root - 260][DEBUG] Args:
[2020-12-16 16:44:34,561 - root - 262][DEBUG] 	debug: True
[2020-12-16 16:44:34,561 - root - 262][DEBUG] 	output_dir: ./
[2020-12-16 16:44:34,561 - root - 262][DEBUG] 	struct_ch: 0
[2020-12-16 16:44:34,561 - root - 262][DEBUG] 	xy: 0.108
[2020-12-16 16:44:34,561 - root - 262][DEBUG] 	raw_path: /home/maddy/projects/claudin_gfp_5dpf_airy_live/stack_aligned
[2020-12-16 16:44:34,561 - root - 262][DEBUG] 	data_type: .tiff
[2020-12-16 16:44:34,561 - root - 262][DEBUG] 	input_channel: 0
[2020-12-16 16:44:34,561 - root - 262][DEBUG] 	seg_path: /home/maddy/projects/claudin_gfp_5dpf_airy_live/napari_seg_results/
[2020-12-16 16:44:34,561 - root - 262][DEBUG] 	train_path: /home/maddy/projects/claudin_gfp_5dpf_airy_live/training_data_2/
[2020-12-16 16:44:34,561 - root - 262][DEBUG] 	mask_path: /home/maddy/projects/claudin_gfp_5dpf_airy_live/curator_sorting_excluding_mask_3
[2020-12-16 16:44:34,561 - root - 262][DEBUG] 	csv_name: /home/maddy/projects/claudin_gfp_5dpf_airy_live/curator_sorting_tracker_2.csv
[2020-12-16 16:44:34,561 - root - 262][DEBUG] 	Normalization: 19
[2020-12-16 16:44:34,561 - root - 223][DEBUG] --------------------------------------------------------------------------------
the csv file for saving sorting results exists, sorting will be resumed
[2020-12-16 16:44:35,436 - aicsimageio.readers.tiff_reader - 236][INFO] Unsure how to handle dimension: Q. Replaced with guess: C
[2020-12-16 16:44:35,436 - aicsimageio.readers.tiff_reader - 236][INFO] Unsure how to handle dimension: Q. Replaced with guess: Z
trying to find the best Z to display ...
the raw image has z profile [    0    11    62   514  1577  3185  5074  6455  8019 10831 13315 15802
 18703 21091 22552 24454 27155 28995 30495 31462 31166 30948 32660 34153
 34399 34197 34119 35568 37755 38384 38921 38442 37675 38391 39071 38115
 36318 35444 35253 36672 37959 38066 37227 36992 38059 39649 38865 37145
 36551 36612 37224 37554 38149 38064 38397 39503 39689 39222 39060 39191
 38140 37585 39439 40587 39917 38937 39292 40353 41309 42797 45276 47290
 48651 50638 51627 48823 45945 46950 48950 48655 46892 44504 42418 41818
 41822 42001 42393 44245 46528 46761 45067 43402 42604 42250 41819 41420
 41406 39917 40163 40822 38289 36418 35271 33579 33717 34895 35304 36357
 36677 33963 29930 26981 25044 24980 25323 24714 23394 21810 20641 18145
 14038  8019  2811   560   137    67]
find best Z = 63.504
You selected this image as BAD
/home/maddy/miniconda3/envs/mlsegmenter-ver2/lib/python3.6/site-packages/pandas/core/indexing.py:670: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  iloc._setitem_with_indexer(indexer, value)
[2020-12-16 16:44:39,751 - aicsimageio.readers.tiff_reader - 236][INFO] Unsure how to handle dimension: Q. Replaced with guess: C
[2020-12-16 16:44:39,751 - aicsimageio.readers.tiff_reader - 236][INFO] Unsure how to handle dimension: Q. Replaced with guess: Z
trying to find the best Z to display ...
the raw image has z profile [    0     0     0     0     0     2    30   259   942  1954  3238  5188
  6941  8669 10857 12429 13611 15373 16682 17180 18709 20205 19887 19968
 20780 21298 21918 22181 23529 25173 25672 26406 27819 28567 28224 27900
 28490 29388 29386 28985 28620 28655 29211 30341 31517 32289 31938 31280
 32746 34987 35407 35161 34304 33008 32799 32932 31507 29372 28856 29444
 29407 28793 28802 29651 31016 31950 32374 31908 31616 32602 33807 35369
 36521 36774 36753 36014 35033 35197 35138 34355 34466 36379 39278 39882
 39261 38404 36001 36031 37029 35830 33829 32631 32134 31409 31880 32874
 34565 34796 34350 36095 36935 36937 36940 36855 37135 35966 32050 27865
 24560 20608 15757  9878  3710   398    29     2     0]
find best Z = 63.543103448275865
You selected this image as GOOD
Do you need to add a mask for this image, enter y or n:  n
[2020-12-16 16:44:47,078 - aicsimageio.readers.tiff_reader - 236][INFO] Unsure how to handle dimension: Q. Replaced with guess: C
[2020-12-16 16:44:47,078 - aicsimageio.readers.tiff_reader - 236][INFO] Unsure how to handle dimension: Q. Replaced with guess: Z
trying to find the best Z to display ...
the raw image has z profile [    0     0     0     0    62   272   545  1206  2474  4790  8592 13518
 18155 20516 22032 23119 24237 26382 28789 30923 31444 30970 30479 30879
 31410 32652 34121 33145 30905 30402 32354 35599 36286 33335 32247 32679
 31750 30264 29386 29261 28839 28185 27811 27776 28323 28907 28867 29001
 30059 30963 31708 32411 33153 33736 32780 31182 30912 32525 34439 34342
 32957 32129 31716 31669 32379 32540 32466 32686 33523 33886 33439 34105
 34696 34169 35356 36407 36056 34314 32738 32134 31603 31856 32482 32242
 29851 28020 28625 29362 29133 28565 27984 27524 26603 24954 23412 21691
 20389 20215 19677 19136 18723 18776 18599 17389 16946 16589 15209 14764
 14311 13260 12555 12302 12214 12176 11096  8545  5613  3368  1906  1204
  1200  1201   944   794   733]
find best Z = 59.4758064516129
You selected this image as GOOD
Do you need to add a mask for this image, enter y or n:  y
[2020-12-16 16:44:54,578 - root - 388][ERROR] =============================================
[2020-12-16 16:44:54,578 - root - 390][ERROR] 

Traceback (most recent call last):
  File "/home/maddy/projects_2/aics-ml-segmentation/aicsmlsegment/bin/curator/curator_sorting.py", line 385, in main
    exe.execute(args)
  File "/home/maddy/projects_2/aics-ml-segmentation/aicsmlsegment/bin/curator/curator_sorting.py", line 314, in execute
    create_mask(raw_img, seg.astype(np.uint8))
  File "/home/maddy/projects_2/aics-ml-segmentation/aicsmlsegment/bin/curator/curator_sorting.py", line 163, in create_mask
    mid_frame = round(histogram_otsu(z_profile)*bw.shape[0]).astype(int)
AttributeError: 'int' object has no attribute 'astype'

[2020-12-16 16:44:54,578 - root - 391][ERROR] =============================================
[2020-12-16 16:44:54,578 - root - 392][ERROR] 

'int' object has no attribute 'astype'

[2020-12-16 16:44:54,578 - root - 393][ERROR] =============================================

Ahaa… now I think I know what is happening here. I just made one more change; if you do git pull one more time, the problem should go away.

For some reason, .astype() is not valid in certain environments; int() is a more stable way to do this. I will try to figure out how this happens, but for now the new changes should allow you to move forward.
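
For reference, the change on the line from your traceback is essentially this (a sketch; round() on a plain Python float returns an int, which has no .astype method, so int() is used instead):

# before: fails when round() returns a plain Python int
# mid_frame = round(histogram_otsu(z_profile) * bw.shape[0]).astype(int)

# after: int() works for both Python and numpy scalars
mid_frame = int(round(histogram_otsu(z_profile) * bw.shape[0]))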

Thanks,
Jianxu

I see, how interesting! I just tried it again and it seems to be working now, at least for drawing the masks. I’ll keep going through the process this week. Hopefully everything will go smoothly with training a new model, but I will post again if something happens.

Thanks for helping me so quickly with this issue!

Hi Jianxu,

I am running the model trainer today and I wanted to let you know of another potential issue. When I first ran the trainer, I got this error:

File "/home/maddy/projects_2/aics-ml-segmentation/aicsmlsegment/DataLoader3D/Universal_Loader.py", line 91, in __init__
label[ci,zz,:,:] = int(new_labi)
TypeError: only size-1 arrays can be converted to Python scalars

Taking a quick look at the code, I think this may be happening because new_labi is a numpy array rather than a scalar. It looks like you changed this line in Universal_Loader.py recently, so I tried changing it back to label[ci,zz,:,:] = new_labi.astype(int). The error no longer occurs and the model begins training as I am used to. But since I am not as familiar with the software as you (or very experienced with programming), I don’t know if this would break anything else, so I thought I would let you know.
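
For clarity, the exact change I made in Universal_Loader.py was:

# original line (fails because new_labi is a numpy array, not a scalar):
# label[ci, zz, :, :] = int(new_labi)

# my change (cast the whole array instead):
label[ci, zz, :, :] = new_labi.astype(int)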

Thanks @lynn, you are correct. That is an error I mistakenly introduced in my last commit, which tried to solve the data casting issue. I also got a bug report about the same issue from one of our engineers today. I will include this bug fix in the next version.

Please let us know if you have any more questions. I would also be interested to learn more about your training results and how well the model works. If the results are not what you expect, I would be happy to look into further improvements we can make.

Best,
Jianxu

Hi Jianxu,

To update you on the results: the newer model, trained on better-annotated data, seems significantly improved compared to the first iteration, at least qualitatively. When I use the new model’s boundary predictions to perform seeded watershed, I get better results than before and fewer manual annotations are required. This is true both for the original training data and for a separate (but similar) dataset from a different experiment.

I do notice that the model predictions are not very “confident” (sorry if this is the wrong word, I hope you understand what I mean). If I were going to set a threshold, I would have to set it pretty low to capture all of the cell membranes without leaving fairly large holes. To get around this, I usually just run the watershed on the raw model predictions without setting a cutoff value. I saw a couple of papers that did this and it seems to work for what I’m doing, but I wouldn’t be surprised if there is a better method.
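
Concretely, what I do is roughly this (a sketch with a made-up seed step; in practice my seeds come from my own pipeline):

import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed

boundary_pred = np.random.rand(64, 256, 256).astype(np.float32)  # placeholder for the model output

# crude seed generation just for this sketch: label clearly-interior voxels
seeds, _ = ndi.label(boundary_pred < 0.1)

# the prediction itself is the height map; no cutoff is applied
labels = watershed(boundary_pred, markers=seeds)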

I do wonder if there are other ways to improve the predictions. Maybe more training data could help? I’m using a pretty small dataset right now. Even though I’ve often heard deep learning requires a ton of data, I thought I would try it anyway and it still seems to work better than other methods I’ve tried for my problem. Maybe it could also help to tweak some other parameters, but I don’t think I have the expertise to do that efficiently.

I’d be interested to hear your thoughts if you have time. If you would like to know more details about the images I’m collecting and my plans for the segmented images, I would prefer to discuss those specifics via email. But I also don’t want to take up too much of your time, so I’m happy to talk about things more broadly here if that works better for you.

Hi @lynn, glad to know you are getting better results. There are certainly a few ways to improve the performance. In general, we designed the Segmenter to be both easy to use and flexible to customize. In your case, it seems to be working okay in general, but it may need further customization to achieve more accurate results.

To deal with certain areas not performing well, a common solution is called “hard example mining”. If you look at your training data (the files automatically generated after curation), there should be some images named like “_CM.ome.tif”, where “CM” stands for “cost map”. If you open one of them, you will see that almost all pixels have value 1, except the areas annotated by the exclusion mask, if any.

Now, you can adjust your training data a little to fine-tune your model (i.e., starting from your current model). Specifically: (1) apply your model to all training data, (2) apply a reasonable cutoff (not too low), (3) compute a logical_xor between the result and the ground truth (i.e., the images named “_GT.ome.tif”), (4) dilate the logical_xor result a little (e.g., using ball(3) as the structuring element), (5) in the current costmap (“_CM.ome.tif”), update pixels that fall inside the dilated regions to a larger value (e.g., 2), and (6) save the costmap. A rough sketch of these steps is shown below.
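
For example, in Python (placeholder file names; using the aicsimageio 3.x reader/writer as I understand them, and assuming single-channel ZYX stacks):

import numpy as np
from aicsimageio import AICSImage
from aicsimageio.writers import OmeTiffWriter
from skimage.morphology import ball, binary_dilation

# (1) model prediction on one training image, plus its ground truth and costmap
pred = np.squeeze(AICSImage("img1_prediction.tiff").data)
gt = np.squeeze(AICSImage("img1_GT.ome.tif").data) > 0
cm = np.squeeze(AICSImage("img1_CM.ome.tif").data).astype(np.float32)

seg = pred > 0.5                           # (2) a reasonable cutoff
errors = np.logical_xor(seg, gt)           # (3) disagreement with the ground truth
hard = binary_dilation(errors, ball(3))    # (4) dilate the error regions a little
cm[hard] = 2.0                             # (5) raise the cost on the hard areas

# (6) overwrite the costmap
with OmeTiffWriter("img1_CM.ome.tif", overwrite_file=True) as writer:
    writer.save(cm)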

This is just one way to make a special costmap, basically putting a higher cost on the areas where the current model is not working well. You could use other operations, e.g., taking advantage of your watershed results, to make an even better costmap. (BTW, your watershed trick is very good. We used similar ideas in our cell segmentation model; you may check out our Segmenter paper on bioRxiv, which was just updated a week ago to include the new cell and nuclear segmentation models.)

Note: currently, the input image, ground truth, and costmap have to be saved following a specific naming convention. To easily update your costmaps, I would suggest: (1) supposing your current training data are saved in a folder “train_v1”, make a copy of this folder and save it as “train_v2”; (2) load each “_CM.ome.tif” in “train_v2”, do the operation above, and simply overwrite the “_CM.ome.tif” files in “train_v2”.

Another common way to improve the performance is to use a better model. For this, I may need to know more about your data to make a better suggestion. If you need help, feel free to email me at

jianxuc@alleninstitute.org

Thanks,
Jianxu