[WIP] # Complete Agent Directive: Implement Excel Cleavage Mapping Pipeline ## Mission Statement You are tasked with implementing a complete Excel-based cleavage mapping pipeline for bioinformatics protein sequence analysis. This is a **formula-only automat...#2

Closed
Copilot wants to merge 1 commit into master from copilot/fix-85d969d4-734c-4423-b8d4-ad458147dcfa

Copilot AI commented Sep 17, 2025

Thanks for asking me to work on this. I will get started on it and keep this PR's description up to date as I form a plan and make progress.

Original description:

Complete Agent Directive: Implement Excel Cleavage Mapping Pipeline

Mission Statement

You are tasked with implementing a complete Excel-based cleavage mapping pipeline for bioinformatics protein sequence analysis. This is a formula-only automation solution (no VBA) that processes CSV data through structured Excel templates with dynamic output tables.

Project Context

  • Repository: mnechromancer/cleavage_mapper
  • Current State: Documentation and planning phase - Excel templates need to be built from specifications
  • Target Users: Bioinformatics researchers who need Excel-based protein sequence analysis
  • Core Philosophy: Pure Excel formulas, template-based workflow, no external scripts

Primary Deliverable

Create excel/CleavagePipeline_Template.xlsx with the following structure:

Required Worksheets

  1. Raw_100, Raw_200, Raw_500 - Data input sheets, one per glucose concentration (100, 200, and 500 mg/dL)
  2. Calc_100, Calc_200, Calc_500 - Formula processing sheets with automated calculations
  3. Dashboard (optional) - Summary view with key metrics

Raw Sheets Structure (Columns A-I)

Column A: # (Row numbers from source data)
Column B: Sequence (Protein sequences with cleavage indicators: "(E)AEDLQVGQVELGGGPGAGSLQ(P)")
Columns C-I: Function data (Han_StemCells_[concentration]mgdl_Glucose_AspN_Fxn2 through Fxn8)

Calc Sheets Structure (Columns A-S)

Core Processing (A-I):

  • A: Sequence: =Raw_100!B2
  • B: LeftResidue: =IF(ISERROR(FIND("(",A2)),"",MID(A2,FIND("(",A2)+1,FIND(")",A2)-FIND("(",A2)-1))
  • C: RightResidue: =IF(LEN(A2)-LEN(SUBSTITUTE(A2,"(",""))<2,"",MID(A2,FIND("(",SUBSTITUTE(A2,"(","♦",LEN(A2)-LEN(SUBSTITUTE(A2,"(",""))-1))+1,FIND(")",A2,FIND("(",SUBSTITUTE(A2,"(","♦",LEN(A2)-LEN(SUBSTITUTE(A2,"(",""))-1)))-FIND("(",SUBSTITUTE(A2,"(","♦",LEN(A2)-LEN(SUBSTITUTE(A2,"(",""))-1))-1))
  • D: CorePeptide: =IF(OR(ISERROR(FIND(")",A2)),LEN(A2)-LEN(SUBSTITUTE(A2,"(",""))<2),"",MID(A2,FIND(")",A2)+1,FIND("(",SUBSTITUTE(A2,"(","♦",LEN(A2)-LEN(SUBSTITUTE(A2,"(",""))-1))-FIND(")",A2)-1))
  • E: Left_Sum: =SUM(Raw_100!C2:I2)
  • F: Right_Sum: =0 (placeholder for future Right function columns)
  • G: Total_Sum: =E2+F2
  • H: Left_Percentage: =IF(G2=0,0,E2/G2*100)
  • I: Right_Percentage: =IF(G2=0,0,F2/G2*100)
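The column logic above can be cross-checked outside Excel. The following is a minimal Python sketch, not part of the deliverable (the template itself stays formula-only); the names `find` and `calc_row` are hypothetical. It assumes each sequence carries at most two parenthesised groups, and it returns an empty string where the LeftResidue formula would surface an Excel error for a missing closing parenthesis.

```python
def find(needle, text, start=1):
    """1-based position like Excel FIND; None where Excel would raise #VALUE!."""
    idx = text.find(needle, start - 1)
    return None if idx < 0 else idx + 1


def calc_row(sequence, fxn_values):
    """Mirror Calc columns B-I for one row (assumes at most two '(' groups)."""
    open1 = find("(", sequence)
    close1 = find(")", sequence)
    n_open = sequence.count("(")  # same count as LEN(A2)-LEN(SUBSTITUTE(A2,"(",""))

    # B: text between the first "(" and the first ")".
    # Simplification: "" instead of an Excel error when ")" is missing.
    left = sequence[open1:close1 - 1] if open1 and close1 else ""

    right, core = "", ""
    if n_open >= 2 and close1:
        open2 = find("(", sequence, open1 + 1)  # second "("
        close2 = find(")", sequence, open2)
        if close2:
            right = sequence[open2:close2 - 1]  # C: inside the second group
        core = sequence[close1:open2 - 1]       # D: between ")" and second "("

    left_sum = sum(fxn_values)                  # E: SUM over columns C-I
    right_sum = 0                               # F: placeholder, as in the spec
    total = left_sum + right_sum                # G
    left_pct = 0 if total == 0 else left_sum / total * 100    # H
    right_pct = 0 if total == 0 else right_sum / total * 100  # I
    return {
        "left": left, "right": right, "core": core,
        "left_sum": left_sum, "right_sum": right_sum, "total": total,
        "left_pct": left_pct, "right_pct": right_pct,
    }
```

Because Right_Sum is fixed at 0, Left_Percentage evaluates to 100 for every row with a nonzero Left_Sum until Right function columns are added.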

Dynamic Output Tables (K-S):

  • K-N: Left-Anchored Cleavage Table: =FILTER(B2:E100,(B2:B100<>"")*(E2:E100>0))
  • P-S: Right-Anchored Cleavage Table: =FILTER(C2:E100,(C2:C100<>"")*(E2:E100>0))
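The two FILTER conditions can be sketched in Python to make the row selection explicit. This is an illustration only; `left_anchored` and `right_anchored` are hypothetical names, and each row is assumed to be a dict holding the values of columns B-E.

```python
def left_anchored(rows):
    """Rows with a non-empty LeftResidue and Left_Sum > 0, columns B-E."""
    return [
        (r["left"], r["right"], r["core"], r["left_sum"])
        for r in rows
        if r["left"] != "" and r["left_sum"] > 0
    ]


def right_anchored(rows):
    """Rows with a non-empty RightResidue and Left_Sum > 0, columns C-E."""
    return [
        (r["right"], r["core"], r["left_sum"])
        for r in rows
        if r["right"] != "" and r["left_sum"] > 0
    ]
```

Note that the range C2:E100 spans three columns, so the right-anchored formula as written spills into three columns of the P-S block rather than four.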

Critical Implementation Requirements

Formula Standards

  1. Error Handling: All formulas must include IF(ISERROR()) checks
  2. Dynamic Arrays: Use FILTER, SORTBY, HSTACK for output tables (Office 365/Excel 2021+)
  3. Sheet References: Adjust Raw_100! references for Raw_200 and Raw_500 variants
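When drafting the Calc_200 and Calc_500 variants, the only change to the sheet-referencing formulas is the `Raw_100!` prefix. A small Python sketch of that substitution follows, useful for producing text to paste into the template; `BASE_FORMULAS` and `formulas_for` are hypothetical names, and only the two formulas from the spec that reference a Raw sheet are included.

```python
# Formulas from the Calc sheet spec that reference a Raw sheet.
BASE_FORMULAS = {
    "A": "=Raw_100!B2",
    "E": "=SUM(Raw_100!C2:I2)",
}


def formulas_for(concentration):
    """Return the formula text with Raw_100! swapped for the given sheet."""
    prefix = f"Raw_{concentration}!"
    return {
        column: formula.replace("Raw_100!", prefix)
        for column, formula in BASE_FORMULAS.items()
    }
```

The remaining Calc formulas reference only cells on their own sheet, so they carry over unchanged.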
  4. Copy-Pasteable: Formulas must work when copied down rows automatically

Data Format Specifications

Input Sequence Format: (LEFT_RESIDUE)CORE_PEPTIDE(RIGHT_RESIDUE)

  • Example: (E)AEDLQVGQVELGGGPGAGSLQ(P)
  • LeftResidue: E
  • CorePeptide: AEDLQVGQVELGGGPGAGSLQ
  • RightResidue: P

Testing Requirements

Create sample data that demonstrates:

  • Multiple protein sequences with different residue patterns
  • Function values that generate meaningful Left_Sum calculations
  • Sequences that test edge cases (missing parentheses, single residues)

Secondary Deliverables

Documentation Files

  1. docs/FORMULAS.md - Complete technical formula reference with explanations
  2. docs/WORKFLOW.md - Step-by-step user guide for non-technical researchers
  3. data/sample_[100|200|500]mgd.csv - Test data files for validation

Sample Data Requirements

Each CSV should contain:

  • 3-5 rows of realistic protein sequence data
  • Function values (Fxn2-Fxn8) that create testable sum calculations
  • Mix of different LeftResidue/RightResidue combinations
  • Headers matching Raw sheet column structure
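One way to produce files with the required header layout is a short helper script run once, outside the template. The sketch below uses Python's standard `csv` module; `header` and `build_csv` are hypothetical names, and any function values passed in are placeholders chosen by the author of the sample file, not measured data.

```python
import csv
import io


def header(concentration):
    """Column headers matching Raw sheet columns A-I."""
    fxn_cols = [
        f"Han_StemCells_{concentration}mgdl_Glucose_AspN_Fxn{n}"
        for n in range(2, 9)  # Fxn2 through Fxn8
    ]
    return ["#", "Sequence"] + fxn_cols


def build_csv(concentration, rows):
    """Build CSV text from (sequence, seven function values) pairs."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header(concentration))
    for number, (sequence, values) in enumerate(rows, start=1):
        if len(values) != 7:
            raise ValueError("expected seven function values (Fxn2-Fxn8)")
        writer.writerow([number, sequence, *values])
    return buffer.getvalue()
```

Writing the returned text to data/sample_100mgd.csv (and the 200 and 500 variants) gives files that can be pasted directly into the matching Raw sheet.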

Technical Constraints

  • Excel Version: Modern Excel with dynamic array support required
  • No VBA: Pure formula-based solution only
  • No External Dependencies: Standard Excel functions only
  • Performance: Formulas should recalculate efficiently with 100+ rows of data

Success Criteria

  1. Template Functionality: Users can paste CSV data into Raw sheets and see automatic calculations in Calc sheets
  2. Dynamic Tables: Left/Right-anchored views update automatically when new data is added
  3. Formula Accuracy: Residue extraction and sum calculations match expected bioinformatics analysis patterns
  4. Documentation Quality: Technical and user documentation enables independent usage
  5. Sample Data Validation: Provided test files demonstrate complete workflow

File Organization Priority

  1. excel/CleavagePipeline_Template.xlsx (primary deliverable)
  2. data/sample_*.csv files (testing)
  3. docs/FORMULAS.md (technical reference)
  4. docs/WORKFLOW.md (user guide)

Execute this implementation systematically, ensuring each component works independently and integrates seamlessly into the complete pipeline workflow.


@mnechromancer mnechromancer deleted the copilot/fix-85d969d4-734c-4423-b8d4-ad458147dcfa branch September 17, 2025 21:24
Copilot AI requested a review from mnechromancer September 17, 2025 21:25