Conversation
FYI @ScientistRachel added the outline of Chapter 5 on this branch (I did not edit Chapter 5).
::: {.callout-warning}
Be careful to avoid hallucinations.
:::
To create segmentations for each cell, a threshold is defined on the cell probability, and any above-threshold pixels that are connected to each other are grouped into objects. This threshold is defined using a validation set - images that are not used for training or testing - to help ensure the threshold generalizes to the held-out test images. The predicted segmentations with this loss function often contain merges, because cells frequently touch each other and connected-component labeling combines touching cells into single objects.
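As an aside for readers of this thread, a minimal sketch of the threshold-and-connected-components step described above might look like the following; the array names, the threshold value, and the use of `scipy.ndimage.label` are illustrative assumptions, not the chapter's actual implementation.

```python
import numpy as np
from scipy import ndimage

# Hypothetical cell-probability map predicted by the network (values in [0, 1]).
prob = np.random.rand(256, 256)

# Threshold chosen on a validation set (0.5 here is just a placeholder value).
threshold = 0.5
mask = prob > threshold

# Group above-threshold pixels that touch each other into labeled objects.
labels, n_objects = ndimage.label(mask)
print(f"found {n_objects} connected components")
```

Because `ndimage.label` only looks at pixel connectivity, two adjacent cells whose above-threshold pixels touch end up with the same label, which is exactly the merge problem the paragraph describes.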
These concepts are first explained in Section 4.3. Perhaps consider moving that explanation up or cross-referencing that section here.
Sorry, I'm not sure if there is a repeat here, but after I push my changes, please make another suggestion.
docs/4-architectures.qmd
Moving the parameters in the negative direction of the gradient reduces the loss for the given images or data points over which the loss is computed. We could compute the loss and gradients over all images in the training set, but this would take too long, so in practice the loss is computed over batches of a few to a few hundred images - the number of images in a batch is called the *batch size*. The optimization algorithm that updates the weights in batches is called stochastic gradient descent (SGD). This is often faster than full-dataset gradient descent because it updates the parameters many times in a single pass through the training set (called an *epoch*). Also, the stochasticity induced by the random sampling step in SGD effectively adds some noise to the search for a good minimum of the loss function, which may be useful for avoiding local minima.
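To make the batching terminology concrete, here is a small, self-contained sketch of minibatch SGD on a toy least-squares problem; the model, data, batch size, and learning rate are all made up for illustration and are not taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 1000 examples with 20 features each and a linear target.
X = rng.normal(size=(1000, 20))
true_w = rng.normal(size=20)
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(20)          # parameters to learn
lr = 0.05                 # learning rate
batch_size = 32           # number of examples per batch

for epoch in range(10):                   # one epoch = one pass through the data
    order = rng.permutation(len(X))       # the random sampling step of SGD
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        # Gradient of the mean squared error over this batch only.
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)
        # Move the parameters in the negative direction of the gradient.
        w -= lr * grad

print("final loss:", np.mean((X @ w - y) ** 2))
```

Each pass over `order` is one epoch, and the parameters are updated once per batch rather than once per epoch, which is the speed advantage described above.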
It can also be beneficial to include momentum, with some value $\beta$ between zero and one, which pushes weight updates along the same direction they have been updating in the past. The updated version of $\vec{v}$ in this case is
Suggested change: "It can also be beneficial to include momentum, with some value $\beta$ between zero and one, which pushes weight updates along the same direction they have been updating in the past, to avoid spurious changes due to noise. The updated version of $\vec{v}$ in this case is"
Or some other wording that motivates momentum.
FYI, this is the classical reason for momentum: https://en.wikipedia.org/wiki/Gradient_descent#Momentum_or_heavy_ball_method (a small illustrative sketch of the heavy-ball update follows below).
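For completeness, here is a minimal sketch of the classical heavy-ball momentum update linked above; the sign convention, variable names, and constants are assumptions for illustration and may differ from the notation the chapter uses for $\vec{v}$.

```python
import numpy as np

# Toy quadratic loss L(w) = 0.5 * ||w||^2, whose gradient is simply w.
w = np.array([5.0, -3.0])   # parameters
v = np.zeros_like(w)        # velocity: running direction of past updates
lr = 0.1                    # learning rate
beta = 0.9                  # momentum coefficient between zero and one

for step in range(50):
    grad = w                      # gradient of the toy loss at the current w
    v = beta * v - lr * grad      # keep pushing along the previous direction
    w = w + v                     # heavy-ball update of the parameters

print("final parameters:", w)
```

The $\beta \vec{v}$ term is what carries past update directions forward, smoothing out the batch-to-batch noise that the suggested wording above mentions.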
opp1231 left a comment:
Hi Carsen!
As we discussed previously, we have added notes for places where intermediate steps or further explanations would be helpful. Please take a look, and of course, feel free to reach out to us if anything is unclear.
One general comment: we would like to ensure that we have permission to re-use the images that are borrowed throughout. If you could confirm the permissions, that would be great.
Additionally, this is a list of terms that we will add to the glossary based on your chapter. You are welcome to provide a definition, or we will pull the definition from your chapter:
- Neural Network
- Architecture
- Natural Image
- Probability Vector
- Skip Connection
- Downsampling
- Upsampling
- Perceptron
- Activation Function
- Linear Layer
- Bias
- ReLU
- Non-linearity
- Convolutional filter/kernel
- Receptive Field
- Pooling Layer
- Sliding Window
- Stride
- Padding
- Encoder-Decoder Structure
- Autoencoder
- Self-attention
- Patches
- Embedding Space
- Auxiliary Variables
- Backpropagation
- Optimizer
- StarDist
- Cellpose
- Gradient Descent
- Regularization
Glossary entries that are/will be in other chapters, but you are welcome to chime in on these if you'd like:
- Convolution
- Foundation Model
- Channels: Disambiguate fluorescence channels from convolutional outputs
- Validation Images
- Test Images
- Overfitting
Thank you for your dedicated effort to this chapter. We really appreciate your insight and assistance.
Best,
Owen and Rachel
Hi @opp1231 and @ScientistRachel, I pushed my updates addressing all of @opp1231's comments (except for the repetition question), and I created two new figures to try to clarify the linear layer section. Please let me know if anything is still unclear.
Small updates for two of the comments I had missed are now in!
We did not add anything to the glossary; happy to help with this once all the chapters are in.