Hi! I am exploring the data in Drug perturbation pilot study - ALLEN CELL EXPLORER for a drug function study. However, the metadata provided in Download cell data: images, genomics, features - ALLEN CELL EXPLORER does not fully match the raw data.
More specifically, there are only 395 non-duplicate rows in the metadata but 867 .czi files in the raw data.
Could you kindly provide/upload a full version of the metadata? Thank you!
Hello, sorry for the issue. The same metadata should be available via bff. Can you provide an example to help us understand the problem and better assist you? Thank you for your interest in this dataset.
Thanks for your reply!
I downloaded the metadata and the raw data from the following links:
Major inconsistencies between the metadata, raw data and information shown on the website include:
- File count mismatch: There are 867 .czi image files after unpacking all the tar.gz archives in the raw data. The original metadata file has 1519 rows but only 395 non-duplicated rows (many rows refer to the same file). A large number of the .czi files have no matching entries in the metadata.
- Structure list mismatch: The structure Cell-cell junctions is shown on the website but not in the metadata. The structure Lysosome is listed in the metadata but not on the website.
- Missing structure–drug pairs: Some structure–drug pairs shown on the website do not appear in the metadata. For example, Actomyosin bundles treated with Staurosporine and Golgi treated with Rapamycin are labeled as observable target / non-target on the website, but neither pair is present in the metadata.
In addition, I checked the data in bff. The data in bff matches the metadata but does not match the raw data or the information shown on the website.
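For anyone who wants to reproduce the file count comparison, here is a minimal sketch of the check. It assumes the unpacked archives sit under one directory and that the metadata CSV has a column holding the image file name or path. The column name `file_name` and the function name `unmatched_files` are placeholders for illustration, not names taken from the actual dataset.

```python
import csv
from pathlib import Path


def unmatched_files(raw_dir, metadata_csv, filename_column="file_name"):
    """Compare .czi files on disk against file names listed in the metadata.

    `filename_column` is a hypothetical column name; replace it with the
    column that holds the image file name or path in the real metadata.
    Returns two sorted lists: files on disk with no metadata row, and
    metadata entries with no file on disk.
    """
    # Collect the base names of every .czi file under raw_dir, recursively.
    on_disk = {p.name for p in Path(raw_dir).rglob("*.czi")}

    # Collect the unique base names referenced by the metadata rows.
    with open(metadata_csv, newline="") as fh:
        in_metadata = {
            Path(row[filename_column]).name for row in csv.DictReader(fh)
        }

    return sorted(on_disk - in_metadata), sorted(in_metadata - on_disk)


# Example call (paths are placeholders):
# missing_from_metadata, missing_from_disk = unmatched_files("raw/", "metadata.csv")
```

Matching is done on the base file name only, so it does not depend on how directories are laid out inside the tar.gz archives.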
Dear Zongkai Li,
Thank you for bringing these discrepancies to our attention. Depending on which list you are viewing, some files have been curated out and will not be made available. We will address this shortly and make the situation explicit on our website.
Regarding the file count mismatch, some of the .csv metadata files include one row per channel. Because each image file can contain multiple channels, this results in a higher number of metadata rows than image files.
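As a rough illustration of the one-row-per-channel layout (a sketch only, not part of any official tooling), the rows can be counted per image file. The column name `file_name` and the function name `rows_per_file` are assumptions made for this example; the actual metadata column may be named differently.

```python
import csv
from collections import Counter


def rows_per_file(metadata_csv, filename_column="file_name"):
    """Count how many metadata rows refer to each image file.

    `filename_column` is a hypothetical column name. With one row per
    channel, each file's count would equal its number of channels.
    """
    with open(metadata_csv, newline="") as fh:
        return Counter(row[filename_column] for row in csv.DictReader(fh))


# Example call (path is a placeholder):
# counts = rows_per_file("metadata.csv")
# total_rows = sum(counts.values())   # all rows in the CSV
# unique_files = len(counts)          # distinct image files referenced
```

With the numbers reported above, `total_rows` would be 1519 and `unique_files` would be 395, or about 3.8 rows per file on average.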
I hope this clarification is helpful. Please let us know if you have any further questions.
Nathalie
Nathalie Gaudreault, Ph.D.
Sr Advisor | data management, FAIR data practices, and community integration
Allen Institute for Cell Science
615 Westlake Ave N
Seattle, WA
98109-4307