Skip to content

Some recommendations for the lfp_analysis.py code #1

@ChristopherMarais

Description

@ChristopherMarais

I have briefly looked at the code and will slowly more systematically look at it all, but just have a few small recommendations for now. These aren't really critical, they're just guides. I just wanted this to be saved in a place where we don't lose them.

  • I found the following hard coded value and do not know what it means (0.6745). It might be useful to have that as a parameter to function or as an explicitly named default value explaining what it is for with a comment. maybe this is also more a question for Leo than for you. The same goes for "frame_rate*3". Where is the 3 coming from? I know this seems pedantic, but it helps a lot for usability if all the default values are defined and initialized in some place. hard coded values are tricky to figure out. Ideally we should define these things at the top of the script.
  • I didn't see that the import of openpyxl is really used. Maybe just make sure all the imports are used in the code. This way it is easier for us to make sure our environments only have the minimum required packages and stay nice and simple.
  • I see when printing custom string the "text {variable}".format(variable) is used. This is not wrong but, it is much more readable to do it using f-strings as follows f"text{variable}".
  • In the following line the lambda function calculates the median of the column every time it iterates over a row. It might be better to just store the median as a variable and save some processing time at the cost of some RAM. lfp_traces_df[updated_column] = lfp_traces_df.apply(lambda x: 0.6745 * (x[col] - np.median(x[col])) / x[MAD_column], axis=1)
  • On the note of .apply(lambda x: x This is really useful for complex functions, but I spotted a large number of places where this is used. This defeats the vectorization of what makes pandas fast by manually iterating over the rows. The case wehre you use brain_region = col.split("_")[0] for isntance is much faster than the other case where .apply(lambda x: x.split("_")) is used. Make sure to utilize the vectorization of pandas as much as possible, it should speed up the code by quite a bit. In some cases where the lambda functions are more complex I recommend saving intermediate steps in RAM and using vectorization for the parts of the function where it is possible. Ideally when lambda functions become so complex it would be better to have specific defined functions for them to make things more readable. Having defined functions also gives you the ability to take your time and have intermediate variables.
  • Some of the functions can be quite big, I think a decent rule of thumb is that if a function should only have one purpose. As long as this is followed then it is probably fine if the functions are reasonably large.
  • I recommend also just making sure that no unnecessary data is sent to a function. from what I can tell there are cases where dataframes are passed on between functions whereafter only a few columns are used for processing in the functions. This is not really wrong, but it does make the code more readable when we know exactly which columns to send to each function. It also helps with debugging when you step away from code for a while to know exactly what data goes into a function.
  • Don't be afraid to comment the code mid function with some tips

Almost all these changes can be made by just popping the code into chatGPT and asking for these changes. just double check the changes adn comments so that they make sense. :)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions