
Chapter4 #10

Open
carsen-stringer wants to merge 8 commits into main from Chapter4

Conversation

@carsen-stringer
Collaborator

we did not add anything to the glossary, happy to help with this once all the chapters are in

@carsen-stringer
Collaborator Author

fyi @ScientistRachel added the outline of chapter 5 on this branch (I did not edit Chapter 5)

::: {.callout-warning}
Be careful to avoid hallucinations.
:::
To create segmentations for each cell, a threshold is applied to the cell probability, and any above-threshold pixels that are connected to each other are grouped into objects. This threshold is chosen using a validation set - images used neither for training nor testing - to help ensure that it generalizes to the held-out test images. The segmentations predicted with this loss function often contain merges: cells frequently touch each other, and connected-component labeling combines touching cells into single objects.
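The thresholding-plus-connected-components step can be sketched as follows (a minimal illustration assuming a 2D probability map; the helper name `segment_from_probability` and the toy array are made up for this sketch, not taken from the chapter):

```python
import numpy as np
from scipy import ndimage

def segment_from_probability(prob_map, threshold):
    """Binarize a cell-probability map and label connected components.

    Every connected group of above-threshold pixels becomes one object,
    so two touching cells merge into a single label - the failure mode
    described in the text.
    """
    mask = prob_map > threshold
    labels, n_objects = ndimage.label(mask)  # default 4-connectivity in 2D
    return labels, n_objects

# Two high-probability blobs separated by a low-probability column.
prob = np.array([
    [0.9, 0.9, 0.1, 0.8, 0.8],
    [0.9, 0.9, 0.1, 0.8, 0.8],
])
labels, n = segment_from_probability(prob, threshold=0.5)
# n == 2: the blobs do not touch, so they remain separate objects
```

If the middle column also rose above the threshold, the two blobs would be returned as a single object, which is exactly the merge problem the paragraph describes.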
Member


These concepts are first explained in Section 4.3. Perhaps consider moving that explanation up, or cross-reference that section here.

Collaborator Author


Sorry, I'm not sure if there is a repeat here, but after I push my changes, please make another suggestion.


Moving the parameters in the negative direction of the gradient reduces the loss for the images or data points over which the loss is computed. We could compute the loss and gradients over all images in the training set, but this would take too long, so in practice the loss is computed on batches of a few to a few hundred images - the number of images in a batch is called the *batch size*. The optimization algorithm that updates the weights in batches is called stochastic gradient descent (SGD). It is often faster than full-dataset gradient descent because it updates the parameters many times during a single pass through the training set (called an *epoch*). Also, the stochasticity induced by the random sampling step in SGD effectively adds some noise to the search for a good minimum of the loss function, which may help avoid poor local minima.
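The batching loop described above can be sketched on a toy problem (a hedged sketch using linear least squares rather than the chapter's network; the data, learning rate, and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: recover w_true from y = X @ w_true with squared-error loss.
n_samples, n_features = 200, 3
X = rng.normal(size=(n_samples, n_features))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = np.zeros(n_features)   # parameters to learn
lr = 0.1                   # learning rate (step size)
batch_size = 20            # number of data points per update

for epoch in range(50):                        # one epoch = one pass over the data
    order = rng.permutation(n_samples)         # random sampling: the "stochastic" part
    for start in range(0, n_samples, batch_size):
        idx = order[start:start + batch_size]
        err = X[idx] @ w - y[idx]
        grad = 2 * X[idx].T @ err / len(idx)   # gradient of the batch's mean squared error
        w -= lr * grad                         # step in the negative gradient direction
# w converges toward w_true
```

Each epoch performs ten parameter updates here (200 samples / batch size 20), whereas full-dataset gradient descent would perform only one, which is the speed advantage the text refers to.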

It can also be beneficial to include momentum, with some value $\beta$ between zero and one, which pushes weight updates along the same direction they have been updating in the past. The updated version of $\vec{v}$ in this case is
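One common convention for this momentum update, sketched in code (a hedged illustration: the chapter's exact equation may scale the terms differently, and `momentum_step` is a made-up helper name):

```python
def momentum_step(w, v, grad, lr=0.1, beta=0.9):
    """One SGD-with-momentum update (a standard convention).

    v keeps an exponentially decaying accumulation of past gradient
    steps, so the parameters keep moving along directions they have
    moved in before.
    """
    v = beta * v - lr * grad   # accumulate velocity from past gradients
    w = w + v                  # move parameters along the velocity
    return w, v

# Minimize f(w) = w**2 (gradient 2*w) starting from w = 1.0.
w, v = 1.0, 0.0
for _ in range(100):
    w, v = momentum_step(w, v, grad=2 * w)
# w spirals in toward the minimum at 0
```

With beta = 0 this reduces to plain SGD; larger beta smooths out batch-to-batch gradient noise at the cost of some overshoot.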
Member


Suggested change
It can also be beneficial to include momentum, with some value $\beta$ between zero and one, which pushes weight updates along the same direction they have been updating in the past. The updated version of $\vec{v}$ in this case is
It can also be beneficial to include momentum, with some value $\beta$ between zero and one, which pushes weight updates along the same direction they have been updating in the past, to avoid spurious changes due to noise. The updated version of $\vec{v}$ in this case is

Member


Or some other worded motivation for momentum


Member

@opp1231 opp1231 left a comment


Hi Carsen!

As we discussed previously, we have added notes for places where intermediate steps or further explanations would be helpful. Please take a look, and of course, feel free to reach out to us if anything is unclear.

One general comment is we would like to ensure that we have permission to re-use the images that are borrowed throughout. If you could confirm the access, that would be great.

Additionally, this is a list of terms that we will add to the glossary based on your chapter. You are welcome to provide a definition, or we will pull the definition from your chapter:

  • Neural Network
  • Architecture
  • Natural Image
  • Probability Vector
  • Skip Connection
  • Downsampling
  • Upsampling
  • Perceptron
  • Activation Function
  • Linear Layer
  • Bias
  • ReLU
  • Non-linearity
  • Convolutional filter/kernel
  • Receptive Field
  • Pooling Layer
  • Sliding Window
  • Stride
  • Padding
  • Encoder-Decoder Structure
  • Autoencoder
  • Self-attention
  • Patches
  • Embedding Space
  • Auxiliary Variables
  • Backpropagation
  • Optimizer
  • StarDist
  • Cellpose
  • Gradient Descent
  • Regularization

Glossary entries that are or will be in other chapters, but you are welcome to chime in on if you'd like:

  • Convolution
  • Foundation Model
  • Channels: Disambiguate fluorescence channels from convolutional outputs
  • Validation Images
  • Test Images
  • Overfitting

Thank you for your dedicated effort to this chapter. We really appreciate your insight and assistance.

Best,
Owen and Rachel

@carsen-stringer
Collaborator Author

Hi @opp1231 and @ScientistRachel I pushed my updates, addressing all of @opp1231's comments (except for the repetition question), and I created two new figures to try to clarify the linear layer section. Please let me know if anything is still unclear.

@carsen-stringer
Collaborator Author

Small updates for two of the comments I had missed are now in!

