
Chapter4 #10

Open
carsen-stringer wants to merge 8 commits into main from Chapter4

Conversation

@carsen-stringer
Collaborator

we did not add anything to the glossary, happy to help with this once all the chapters are in

@carsen-stringer
Collaborator Author

fyi @ScientistRachel added the outline of chapter 5 on this branch (I did not edit Chapter 5)

::: {.callout-warning}
Be careful to avoid hallucinations.
:::
To create segmentations for each cell, a threshold is applied to the cell probability, and any above-threshold pixels that are connected to each other are grouped into objects. This threshold is chosen using a validation set - images used neither for training nor testing - to help ensure that it generalizes to the held-out test images. The segmentations predicted with this loss function often contain merges: cells frequently touch each other, and connected-component labeling combines touching cells into single objects.
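The thresholding-plus-connected-components step can be sketched as follows (a minimal illustration assuming a 2D probability map; the helper name `segment_from_probability` and the toy array are made up for this sketch, not taken from the chapter):

```python
import numpy as np
from scipy import ndimage

def segment_from_probability(prob_map, threshold):
    """Binarize a cell-probability map and label connected components.

    Every connected group of above-threshold pixels becomes one object,
    so two touching cells merge into a single label - the failure mode
    described in the text.
    """
    mask = prob_map > threshold
    labels, n_objects = ndimage.label(mask)  # default 4-connectivity in 2D
    return labels, n_objects

# Two high-probability blobs separated by a low-probability column.
prob = np.array([
    [0.9, 0.9, 0.1, 0.8, 0.8],
    [0.9, 0.9, 0.1, 0.8, 0.8],
])
labels, n = segment_from_probability(prob, threshold=0.5)
# n == 2: the blobs do not touch, so they remain separate objects
```

If the middle column also rose above the threshold, the two blobs would be returned as a single object, which is exactly the merge problem the paragraph describes.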
Member


These concepts are first explained in Section 4.3. Perhaps consider moving that explanation up, or cross-reference that section here.

Collaborator Author


Sorry, I'm not sure if there is a repeat here, but after I push my changes, please make another suggestion.


Moving the parameters in the negative direction of the gradient reduces the loss for the images or data points over which the loss is computed. We could compute the loss and gradients over all images in the training set, but this would take too long, so in practice the loss is computed on batches of a few to a few hundred images - the number of images in a batch is called the *batch size*. The optimization algorithm that updates the weights in batches is called stochastic gradient descent (SGD). It is often faster than full-dataset gradient descent because it updates the parameters many times during a single pass through the training set (called an *epoch*). Also, the stochasticity induced by the random sampling step in SGD effectively adds some noise to the search for a good minimum of the loss function, which may help avoid poor local minima.
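The batching loop described above can be sketched on a toy problem (a hedged sketch using linear least squares rather than the chapter's network; the data, learning rate, and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: recover w_true from y = X @ w_true with squared-error loss.
n_samples, n_features = 200, 3
X = rng.normal(size=(n_samples, n_features))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = np.zeros(n_features)   # parameters to learn
lr = 0.1                   # learning rate (step size)
batch_size = 20            # number of data points per update

for epoch in range(50):                        # one epoch = one pass over the data
    order = rng.permutation(n_samples)         # random sampling: the "stochastic" part
    for start in range(0, n_samples, batch_size):
        idx = order[start:start + batch_size]
        err = X[idx] @ w - y[idx]
        grad = 2 * X[idx].T @ err / len(idx)   # gradient of the batch's mean squared error
        w -= lr * grad                         # step in the negative gradient direction
# w converges toward w_true
```

Each epoch performs ten parameter updates here (200 samples / batch size 20), whereas full-dataset gradient descent would perform only one, which is the speed advantage the text refers to.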

It can also be beneficial to include momentum, with some value $\beta$ between zero and one, which pushes weight updates along the same direction they have been updating in the past. The updated version of $\vec{v}$ in this case is
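One common convention for this momentum update, sketched in code (a hedged illustration: the chapter's exact equation may scale the terms differently, and `momentum_step` is a made-up helper name):

```python
def momentum_step(w, v, grad, lr=0.1, beta=0.9):
    """One SGD-with-momentum update (a standard convention).

    v keeps an exponentially decaying accumulation of past gradient
    steps, so the parameters keep moving along directions they have
    moved in before.
    """
    v = beta * v - lr * grad   # accumulate velocity from past gradients
    w = w + v                  # move parameters along the velocity
    return w, v

# Minimize f(w) = w**2 (gradient 2*w) starting from w = 1.0.
w, v = 1.0, 0.0
for _ in range(100):
    w, v = momentum_step(w, v, grad=2 * w)
# w spirals in toward the minimum at 0
```

With beta = 0 this reduces to plain SGD; larger beta smooths out batch-to-batch gradient noise at the cost of some overshoot.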
Member


Suggested change
It can also be beneficial to include momentum, with some value $\beta$ between zero and one, which pushes weight updates along the same direction they have been updating in the past. The updated version of $\vec{v}$ in this case is
It can also be beneficial to include momentum, with some value $\beta$ between zero and one, which pushes weight updates along the same direction they have been updating in the past, to avoid spurious changes due to noise. The updated version of $\vec{v}$ in this case is

Member


Or some other worded motivation for momentum


Member

@opp1231 opp1231 left a comment


Hi Carsen!

As we discussed previously, we have added notes for places where intermediate steps or further explanations would be helpful. Please take a look, and of course, feel free to reach out to us if anything is unclear.

One general comment is we would like to ensure that we have permission to re-use the images that are borrowed throughout. If you could confirm the access, that would be great.

Additionally, this is a list of terms that we will add to the glossary based on your chapter. You are welcome to provide a definition, or we will pull the definition from your chapter:

  • Neural Network
  • Architecture
  • Natural Image
  • Probability Vector
  • Skip Connection
  • Downsampling
  • Upsampling
  • Perceptron
  • Activation Function
  • Linear Layer
  • Bias
  • ReLU
  • Non-linearity
  • Convolutional filter/kernel
  • Receptive Field
  • Pooling Layer
  • Sliding Window
  • Stride
  • Padding
  • Encoder-Decoder Structure
  • Autoencoder
  • Self-attention
  • Patches
  • Embedding Space
  • Auxiliary Variables
  • Backpropagation
  • Optimizer
  • StarDist
  • Cellpose
  • Gradient Descent
  • Regularization

Glossary entries that are or will be in other chapters, but you are welcome to chime in on if you'd like:

  • Convolution
  • Foundation Model
  • Channels: Disambiguate fluorescence channels from convolutional outputs
  • Validation Images
  • Test Images
  • Overfitting

Thank you for your dedicated effort to this chapter. We really appreciate your insight and assistance.

Best,
Owen and Rachel

@carsen-stringer
Collaborator Author

Hi @opp1231 and @ScientistRachel I pushed my updates, addressing all of @opp1231's comments (except for the repetition question), and I created two new figures to try to clarify the linear layer section. Please let me know if anything is still unclear.

@carsen-stringer
Collaborator Author

Small updates for two of the comments I had missed are now in!

