
Answer to the referees’ comments

Journal: Composites Science and Technology

Manuscript Number: CSTE-D-21-00531

Title: “Descriptive Modeling of Textiles using FE Simulations and Deep Learning”

Authors: Arturo Mendoza, Roger Trullo, Yanneck Wielhorski

Our answers are printed in italics interwoven with the original reviewer’s comments. The major additions or corrections are highlighted in red in the revised version.

1 Reviewer # 1

The paper addresses the task of processing CT images of composite reinforcements for creating computational meshes for FE/CFD simulations. The topic is of significant interest for the composite manufacturing community and readers of the journal Composites Science and Technology. Here, the authors propose the use of an advanced deep learning technique for segmentation of the CT images. Typically, a semantic segmentation approach is applied to such a problem; however, the instance segmentation of yarns proposed here, employing Mask R-CNN, is novel and should be appreciated. This approach allowed the authors to obtain the yarn cross sections as a collection of control points that can be directly used for constructing the FE mesh. The authors have overcome the lack of annotated data for training the Mask R-CNN by using virtual data in the form of pseudo-CT images. The methodology has been applied here to a 3D angle interlock fabric and the resulting model has been validated by assessing the yarn path and local intra-yarn fiber volume fraction.

Overall, the paper is well written with a smooth flow of information. The methodology and preliminary results presented are promising and should be further pursued. Therefore, the paper should be accepted for publication in Composites Science and Technology. However, the authors should address the following concerns before acceptance.

Major:

  1. 1.

    The authors have taken a lengthy procedure to get the annotated data for training the Mask R-CNN. Do we really need deep learning in the first part? It is understandable that data handling and preparation is the most crucial part and there is a serious lack of annotated data. I am impressed by the second part, where Mask R-CNN is employed for instance segmentation, but I cannot understand the technical necessity and feasibility of the first part using a U-Net.

  2. Answer
    1. (a)

      Using deep learning is not a requirement for generating pseudo-CT images. However, we believe that the deep neural network approach achieves considerably better results than any other ad hoc approach would.

    2. (b)

      There is no technical necessity for the generation of pseudo-CT images. The only necessity is to have a properly labeled dataset for training the Mask R-CNN. Since we want to avoid the tedious task of procuring manually annotated data, we use deep learning to circumvent the problem.

  3. 2.

    Section 2.5 Extraction of 2D slices: It is fine that you are using a 2.5D approach, but will that not be confusing for the CNN? If the slices from both warp and weft are fed to the same CNN, it would be redundant and more confusing, as you have also mentioned in Section 3.1. So, how would this be more useful than the 2D approach? Are the datasets linked and the images paired? For example, does the image $x_{CT}[1]$ correspond to the image $x_{GT}[1]$? Is the value of k 1 to 12?

  4. Answer
    1. (a)

      The revised version further discusses these aspects of the 2.5D approach.

    2. (b)

      We do not think that two networks would be necessary. As can be seen from the results, the current approach of one network proves to be capable enough.

    3. (c)

      As detailed in Section 2.5, both datasets have the same number of slices ($i \in [1, 3512]$). The slices are indeed paired, since the original volumes are naturally “paired” as one is obtained from the other.
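
      As a minimal sketch of this pairing (a hypothetical PyTorch-style dataset; gt_slices and ct_slices are assumed to hold the 3512 extracted 2D slices of the labeled and CT volumes, respectively):

      ```python
      from torch.utils.data import Dataset

      class PairedSlices(Dataset):
          """Slice i of the labeled volume is paired with slice i of the CT volume."""
          def __init__(self, gt_slices, ct_slices):
              assert len(gt_slices) == len(ct_slices)
              self.gt_slices, self.ct_slices = gt_slices, ct_slices

          def __len__(self):
              return len(self.gt_slices)

          def __getitem__(self, i):
              return self.gt_slices[i], self.ct_slices[i]
      ```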

  5. 3.

    Section 2.6 Overview of the method: The use of the term “image translation” is completely misleading. The first step is a sort of “inverse segmentation” where you have a set of raw and annotated images and you are training the U-Net to convert an annotated image from the FE model into a gray-scale image.

  6. Answer

    The term “inverse segmentation” has been employed in the revised version.

  7. 4.

    Section 3.1 Pre-processing of labeled images: Again, I am not convinced by the way the weft/warp slices are treated. The best way here seems to be training separate CNNs for each direction and then coming up with a sort of decision tree or SVM at the end. The efforts dedicated here are commendable and the approach may be acceptable, as this is just for creating the pseudo-CT images. I am just wondering how the original labels are preserved and reverted back during the inference/application phase.

  8. Answer

    Indeed, the goal is not to create hyper-realistic CT images, but “good-enough” images that constitute a good training dataset for the Mask R-CNN. As such, the pseudo-CT generation and instance segmentation tasks are not necessarily symmetric. It would certainly be interesting to explore this on well-annotated CT images (which we do not have).

  9. 5.

    Section 3.2 Neural network architecture: What is the purpose of modifying the U-Net? Why is it necessary?

  10. Answer

    Although it is not absolutely “necessary”, adding skip-connections in image generation tasks has proven to be successful. For example, in reference [29] the authors obtained substantial improvements by using this kind of connection. We have also found it useful in internal unpublished studies. It is important to note that this modification has almost no impact on the computational time [29]. We have added this reference to the revised version.
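
    For illustration, a simplified sketch of such a skip connection (a generic PyTorch-style decoder block, not the exact architecture used in the paper): the encoder features are concatenated with the upsampled decoder features before the next convolution.

    ```python
    import torch
    import torch.nn as nn

    class DecoderBlock(nn.Module):
        """One decoder stage with a skip connection from the encoder."""
        def __init__(self, in_ch, skip_ch, out_ch):
            super().__init__()
            self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
            self.conv = nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1)

        def forward(self, x, skip):
            x = self.up(x)                    # upsample the decoder features
            x = torch.cat([x, skip], dim=1)   # skip connection: concatenate encoder features
            return torch.relu(self.conv(x))
    ```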

  11. 6.

    Section 3.3 Training of the network: The justification for not testing the trained network seems irrational. How would you determine if the model is over-fitted or under-fitted without testing it? How would the trained network be tested, hypothetically? Secondly, why did you need a custom loss function? I cannot understand the use of a pre-trained image classification network here. The images in the ImageNet dataset are completely different from the ones encountered here. Why not just go with a cross-entropy loss function?

  12. Answer
    1. (a)

      It should be noted that our so-called validation dataset is indeed a testing dataset, as we did not use its results for choosing a model (we kept the last one after 40 epochs).

    2. (b)

      The results in Appendix A clearly show that our network generalizes well enough to very distinct cases (i.e., no over-fitting). We think that this is a very good approach for testing performance. Other methods employ external metrics such as the Fréchet inception distance (FID) used in GANs. However, in this case, it would be largely equivalent to the perceptual loss we implemented. A final approach would be to show pairs of real and generated images to experts and ask which one they prefer. As a side note, we did show the generated images to some composite material experts and they said that they could have been fooled into believing that the generated images were real.

    3. (c)

      The cross-entropy is relevant for classification tasks. It is not relevant here since we do not have distinct discrete distributions and our problem is more akin to regression than classification. Usually, the $L_2$ norm is used for regression tasks; however, it has been shown to produce blurred results if used in isolation. Hence the use of the perceptual loss [31] to counteract this phenomenon.
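
      For illustration, a minimal sketch of such a combined loss (assuming a PyTorch/torchvision setup; the weighting factor and the VGG-16 cut-off layer are illustrative choices, not the exact values used in the paper):

      ```python
      import torch.nn.functional as F
      from torchvision.models import vgg16

      # Frozen VGG-16 feature extractor acting as the perceptual branch.
      vgg_features = vgg16(pretrained=True).features[:16].eval()
      for p in vgg_features.parameters():
          p.requires_grad = False

      def combined_loss(generated, target, weight=0.1):
          # Pixel-wise regression term (L2), prone to blurring when used alone.
          l2 = F.mse_loss(generated, target)
          # Perceptual term: compare intermediate VGG features of both images
          # (grayscale slices are repeated to 3 channels to fit the VGG input).
          perceptual = F.mse_loss(vgg_features(generated.repeat(1, 3, 1, 1)),
                                  vgg_features(target.repeat(1, 3, 1, 1)))
          return l2 + weight * perceptual
      ```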

    4. (d)

      The use of a network pre-trained on an unrelated dataset is now discussed in the revised version. It should be pointed out that classification is mostly concentrated in the last Dense3 block and that the intermediate features are high-level features useful for describing any type of image (from ImageNet or otherwise).

  13. 7.

    Section 3.5 Results: How can the pseudo-CT images be comparable to the real ones when the final images generated by the trained U-Net model do not have rings, noise and other artifacts?

  14. Answer

    The revised version discusses this aspect.

  15. 8.

    Section 4.1.1. Convolutional Features: How will you justify the use of a network pre-trained on ImageNet dataset?

  16. Answer

    The revised version discusses this aspect (same as for U-Net).

  17. 9.

    Section 5.1. Completion of missing sections: How can yarns with connected cross-sections be separated? Are they separated automatically by the instance segmentation model? How was the center key point recomputed? Would it be more reasonable to use the centroid of the 10 key points of the yarn contours?

  18. Answer

    Yes, they are “automatically” separated into different instances by the Mask-RCNN segmentation model. The important detail on the center keypoint being computed as the centroid has been added to the revised version.
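
    For illustration, with the 10 contour keypoints stored as a (10, 2) array, the center keypoint is simply their centroid (a minimal NumPy sketch; the variable name is hypothetical):

    ```python
    import numpy as np

    # contour_keypoints: array of shape (10, 2) with the (x, y) contour points of one cross-section
    center_keypoint = np.asarray(contour_keypoints, dtype=float).mean(axis=0)
    ```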

  19. 10.

    Section 5.3. Mesh of reinforcement: Are the key points put in a certain order to create quadrilateral elements? I am a bit confused about how the connectivity between the nodes (key points) is established for this purpose. How can we ensure good-quality mesh elements here?

  20. Answer

    The revised version dives further into these details. Moreover, the mesh quality for the final volume mesh is guaranteed by the employed mesh software.
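
    Purely as an illustration of one possible connectivity scheme (not necessarily the one implemented in the in-house meshing software), the ordered keypoints of consecutive cross-sections along a yarn can be joined into quadrilaterals as follows:

    ```python
    def quad_connectivity(n_sections, n_keypoints):
        """Connect the ordered keypoints of consecutive cross-sections into quads.

        Node (s, k) is numbered s * n_keypoints + k; the modulo closes the contour.
        """
        quads = []
        for s in range(n_sections - 1):
            for k in range(n_keypoints):
                a = s * n_keypoints + k
                b = s * n_keypoints + (k + 1) % n_keypoints
                c = (s + 1) * n_keypoints + (k + 1) % n_keypoints
                d = (s + 1) * n_keypoints + k
                quads.append((a, b, c, d))
        return quads
    ```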

  21. 11.

    Section 6.2. Section assessment: The model seems to under-predict the yarn cross-sections. I would expect the yarn fiber volume fraction of the cross-sections to be in the range of 50–70 %. Can the authors point to any literature where this value is in the range of 60–90 %?

  22. Answer

    The revised version provides more explanations.

  23. 12.

    Figure B.25: It is indeed very useful to look at the feature maps to understand the inner workings of the CNN. As can be seen in the figure, a number of the filters produce the same output feature map. I would suggest showing only a few representative feature maps from a given convolutional layer rather than showing all of them. For example, filters 2, 11, 30, 32, 36, 38, 51 and 52 in the Conv2 layer extract the same features.

  24. Answer

    As this is a matter of preference, the revised version keeps all the feature maps as it is interesting to observe that some feature maps are indeed very similar. This quirk would be missed if we cherry-picked the feature maps.

Minor:

  1. 1.

    Graphical Abstract: There is a lot of negative space in the top-right and bottom-left corners of the figure. I would suggest modifying the figure to reduce the empty space. Also, in the PDF file, the image quality is very poor. I think it would be better to label the inference phase as “well-trained U-Net” and “well-trained Mask R-CNN”, as in this step you are just using a trained network for the specific tasks. It is basically confusing how many times you are using the CNNs, 4 or 2 times? The same comment applies to Figure 1 as well.

  2. Answer

    Figure 1 has been updated in the revised version. Moreover, Figure 1 has been exported as a high-resolution image for the revised graphical abstract.

  3. 2.

    Abstract: The opening two sentences are trivial and complex.

  4. Answer

    The abstract has been updated in the revised version.

  5. 3.

    Abstract: Does the term “binary masks” refer to the “segmented volumes”? In that case, it is usually not “binary”; it has to be “trinary” at least, as is the case here as well (Figures 5b and 6b).

  6. Answer

    The revised version drops the too-limiting term “binary masks” and favors the term “voxel-wise masks”.

  7. 4.

    Introduction Para 2 Line 2: “These model the yarns …” these what? “These approaches model the yarns …”?

  8. Answer

    The sentence has been modified in the revised version.

  9. 5.

    Introduction Para 3: Only the “structure tensor” approach has been discussed. What about the approach based on the Grey Level Co-occurrence Matrix (GLCM)? For details, please refer to the articles numbered 18 and 38 in the manuscript by Naouar et al.

  10. Answer

    The revised version focuses on the structure tensor and the approaches that use machine learning techniques on derived features. We have opted not to discuss the textural approach so as not to break the flow of the text, since it uses neither the structure tensor nor machine learning techniques (which are relevant for the present paper). We would argue that the introduction in the revised version covers most of the relevant literature and that this single omission does not constitute an issue.

  11. 6.

    Introduction Para 4: Clustering techniques are applied on which variable? Are they applied on the raw gray-scale values? Or are they applied on some derived variable? Clustering techniques are used regardless of whether it is easy or difficult. How else are you going to separate matrix and yarns, and then weft and warp yarns?

  12. Answer

    The revised version addresses these aspects of most yarn segmentation methods.

  13. 7.

    Introduction Para 2–4: Well, the techniques for segmentation have been briefly discussed; however, the answer to the basic question is still not provided. Why is it a challenging task to separate the yarns into weft and warp groups? Why can simple clustering/thresholding techniques not be applied directly on the raw grey values, and why do we really need to compute the eigenvalues/eigenvectors of the structure tensor? For details, please refer to the review article Composites Science and Technology 184 (2019) 107828 by Ali et al. Also, how can these methods handle a fabric with a z-binder yarn?

  14. Answer

    These details are deepened in the revised version.

  15. 8.

    Introduction: The task you are addressing here is basically that of segmenting CT images, yet you are reluctant to use this terminology for the first half of your introduction.

  16. Answer

    The revised version better highlights the difference between “descriptive modeling” and “yarn segmentation”. As the latter is included in the former, we choose to use the more generic and complete term.

  17. 9.

    Section 2.1 Tomographic volume: I agree with the authors that 20 µm is a reasonable resolution for the meso-scale models. How are the reconstructed images normalized: divided by what value, 255?

  18. Answer

    The normalization depends on the image type the user may have. In our case, our CT images are encoded as unsigned 16-bit integers, hence we divide by 65535. The only important aspect of our normalization is that the images be in the [0, 1] range so that we can employ a simple sigmoid function. However, if we had wanted to keep the images in the original unsigned 16-bit range, we could have just multiplied the output of the sigmoid by the corresponding factor. While it is true that one could explore more “advanced” normalization procedures (e.g., histogram normalization, histogram equalization), these are largely unnecessary as the network is capable of performing equivalent operations in its first layers (if it needs to).
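
    As a minimal sketch of this normalization (assuming NumPy arrays; variable names are hypothetical):

    ```python
    import numpy as np

    # CT volume encoded as unsigned 16-bit integers: map to [0, 1] to match a sigmoid output.
    ct_normalized = ct_volume_uint16.astype(np.float32) / 65535.0

    # To recover the original 16-bit range from a network output in [0, 1]:
    pseudo_ct_uint16 = np.round(network_output * 65535.0).astype(np.uint16)
    ```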

  19. 10.

    Section 2.2 Simple Textile Model: What do you mean by “crossing points”? Are you referring to the centroid of the yarn cross section? How was the yarn fiber volume fraction adjusted to match the overall fiber volume fraction? Please describe briefly or refer to a source.

  20. Answer

    The revised version gives more details on both aspects.

  21. 11.

    Figures 5 and 6 should precede Figure 2 and be linked to Section 3.1.

  22. Answer

    While the proposed order would have the “introductory” images closer to the section in which the materials are described, the current order has them closer to the part in which our method starts (e.g., the trinary conversion). As this is a matter of preference, we prefer to keep them in the current order.

  23. 12.

    Section 3.3 Training of the network: It may not be technically correct to say that VGG-16 is a “backbone”. It is rather a “branch” that does not need training, as stated.

  24. Answer

    While the term “branch” does present a nice analogy, the term “backbone” is commonly used in the literature [26].

  25. 13.

    Section 4.1 Neural network architecture: First sentence is confusing.

  26. Answer

    This sentence has been modified in the revised version.

  27. 14.

    Section 4.1.2 Region Proposal Network: The use of k in Section 2 and $\kappa$ in Section 4.1.2 is a bit confusing.

  28. Answer

    The revised version uses $\alpha$ for denoting the number of anchors.

  29. 15.

    Section 4.1.4 Box Classification and Refinement: Is it a correct title?

  30. Answer

    The Section title has been updated in the revised version.

  31. 16.

    Figure 13: Are the boxes in (a) related to (b) and (c)? Please put arrows to link.

  32. Answer

    The figure and associated caption have been updated in the revised version to better display this relationship.

  33. 17.

    Figures: Some of the figures do not appear in the text in the same order as they are numbered. For example, check the order of figures 17, 18 and 19.

  34. Answer

    While the proposed order would have the figures in a “linear” sequence, we want to keep them in the current order as it allows us to group all meshing results in one last figure (Figure 19). As such, one can analyze only this figure to grasp the results of our method. Admittedly, this has the side effect that figures are referenced in the text in a non-linear order, as pointed out by the reviewer. Nonetheless, we would argue that the stated benefit outweighs this minor inconvenience.

  35. 18.

    Figure B.25: Please put an arrow showing the direction of flow of information.

  36. Answer

    The figure caption has been updated to better guide the reader in this sense.

  37. 19.

    References: Please check the details of reference 18. The name of the journal is missing.

  38. Answer

    The reference is correctly cited in the revised version.

2 Reviewer # 2

The authors have presented original research work in the field of composite materials in general and 3D interlock composites in particular. This work is a great advancement in the application of computer vision (deep learning) to materials science. The considered materials are complex due to their heterogeneous and multi-scale nature; in spite of that fact, the authors achieved, in my opinion, excellent results by hybridizing deep learning from medical science with the deterministic mechanics of composites as fibrous multi-scale materials. To achieve these results the authors used a complex and precisely verified methodology relying on an impressive number of in-house codes and open-source libraries.

I would suggest the following improvements for better readability of the manuscript. My remarks are as follows:

  1. 1.

    I suggest explaining, with figures as simple as possible, the four initial steps (2.2. Simple textile model, 2.3. Mechanical simulations, 2.4. Voxel conversion, 2.5. Extraction of 2D slices) as well as the links between them. For example, how exactly does the selection of yarns look; can you put in some schematic example? How did you use the generated geometry in the FE simulations? Can you provide more details about them? Similar questions and remarks arose while reading the Voxel conversion step.

  2. Answer

    We added more details in the concerned sections so as to better guide the reader. We did not add diagrams for these sections as they do not pertain to our method itself but rather to the manner in which the data is obtained.

  3. 2.

    Small questions:

    1. (a)

      Why only 10 points per cross-section?

    2. (b)

      How did you label the CT images?

    3. (c)

      Please check that all operations and symbols used in the equations are properly introduced in the text.

  4. Answer
    1. (a)

      See section 7: “10 keypoints were found to be sufficient for mechanic elastic homogenization”.

    2. (b)

      The extraction of the parametric model from the CT images is detailed in Section 2.2.

    3. (c)

      All operations and symbols are defined in the revised version.

  5. 3.

    Regarding the validation part (keypoint detection and FSF comparison), have you compared with real measurements of the FSF for each yarn? As a general remark for µ-CT images, it looks like the cross-section is systematically underestimated. Even though you have set the maximum limit at 91% (which is already extremely high) and showed it in the figure, would it be possible to see the full mapping?

  6. Answer
    1. (a)

      We do not have any real measurements of the FSF for each yarn. For this reason, we have chosen to plot the distribution of the yarn fiber volume fraction for the whole textile. This allows us to indirectly compare the predicted cross sections to the theoretically admissible values.

    2. (b)

      The maximum limit of 91% is the theoretical one for a compact hexagonal arrangement; we do not impose it. However, we cap the yarn fiber volume fraction at 100% whenever it exceeds 100%.
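
      For reference, the 91% value is the theoretical packing density of circular fiber cross-sections in a compact hexagonal arrangement:

      ```latex
      V_f^{\max} = \frac{\pi}{2\sqrt{3}} \approx 0.9069 \approx 91\%
      ```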

    3. (c)

      The spatial distribution of the FSF is shown in a new figure in the revised version.

  7. 4.

    Did you consider the gap between yarns for conformal mesh?

  8. Answer

    We do not need to take it into account during our processing as the in-house software used for generating the conformal mesh handles it automatically.

  9. 5.

    As a matter of reading improvement, could you please put titles on the axes of Figures 20 and 22?

  10. Answer

    The revised version adds the axis labels on the concerned figures.

  11. 6.

    I would recommend English proofread by a native speaker.

  12. Answer

    The revised version has been proofread.

As the authors pointed out, there are still many possible improvements, and this work opens the door to them. To sum up, I would highly recommend this work for publication.

3 Reviewer # 3

I find this article interesting because it highlights a method (deep neural network architectures) that is very much of the moment for the segmentation of tomography images of composite reinforcements.

  • In general, what I found very frustrating is that you do not detail the mechanical and material parts enough. You could perhaps start, for example, concerning Section 2.1, by proposing a geometric representation of a ply-to-ply angle interlock architecture of a 3D woven composite and by clarifying the number of fibers used …

  • Answer

    We have added further details on the mechanical and material parts in the revised version. Some information, such as the number of fibers, is still not provided for confidentiality reasons. However, these remaining details are not decisive for the proposed method.

  • Concerning the mechanical simulation part, you do not give the behavior law used.

  • Answer

    The revised version addresses the behavior law.

  • For Figures 20 and 22, can you explain parts (b) and (c)?

  • Answer

    The revised version better explains the concerned sub-figures.

4 Editor

Please modify the format of the keywords list according to the journal’s requirement.

  • Answer

    The revised version follows the journal’s guidelines on keywords.