Exploratory coding and other edits in branch issue-152
From Sam:
The Jupyter files currently tabulate some of the values listed in the XLSX file for the particular case being evaluated. This provides a quick indicator of whether values were read in correctly. More useful would be to generate a graph of the data, since one can then more readily determine whether the values make sense, change appropriately with the case, identify any outliers, etc. Interactive management of these inputs, such as whether or not to include outliers, would then be useful. [see note below]
This could also add a layer of information for decision makers by clearly showing the variability in inputs (technology designs and parameters) and how that gets translated into variability in outputs (metrics).
Investigation notes
- Once the Designs class is instantiated, it will have one or more built-in viz methods to sample from input data distributions and visualize the different technology design and parameter data.
- Potential difficulty: units. Because Tyche is units-agnostic, the input designs and parameters data can be in any units. There may then be issues with scale when visualizing several input data categories on one graph; it might also be difficult to group data categories for visualization on the same axes (?)
- On second thought, this is really only an issue for parameters - the designs data categories can safely be assumed to be in consistent units (no one is going to measure capital lifetime in two different time units, for instance)
- Input amounts and prices will have this issue
- Related, and possibly more complex: what is of interest to the analyst will vary from decision context to decision context; moreover, there's no one "correct" way for these visualizations to turn out. So while we can build a few visualizations that will work with the underlying data structure, those visualizations may not be informative for the analyst - it depends on the technology and on where the input distributions show up in the data.
- Continuing the thought, it would be a relatively light lift to create default visualizations for Input (amounts), Input efficiencies, Input prices, Capital lifetimes, and Scale. However, from our experience in developing decision contexts, the "interesting" information is more likely to be in the Parameters dataset, and having default visualizations for Parameters is (a) difficult and (b) not valuable, because Parameters can be any value in any units.
- Sampling from input distributions and showing/storing this data: Not sure if there's existing functionality for this. There's definitely sampling being done, but is the sampled data stored as-is or processed immediately into output information?
Designs.vectorize_designs and Designs.vectorize_parameters seem to do this - currently those methods are only called internally by evaluate, so we haven't examined the raw output to date.
- Outliers: The information in the input datasets will be the already-processed form of elicited data - raw expert responses won't appear. (Individual probability distributions may appear, but they will be combined into a single representative distribution rather than being sampled separately.)
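To make the units concern above concrete, here is a minimal, self-contained sketch of per-category input visualization. It does not use Tyche's actual Designs class; the parameter names, units, and distributions are illustrative stand-ins for sampled input data. Giving each category its own subplot sidesteps the mixed-units/scale problem entirely, since no two categories ever share an axis.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; in a notebook the figure displays inline
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=42)

# Hypothetical parameter categories with incompatible units,
# standing in for samples drawn from Tyche input distributions.
samples = {
    "Capital lifetime [yr]": rng.triangular(15, 20, 30, size=1000),
    "Input price [$/kg]":    rng.lognormal(mean=0.5, sigma=0.3, size=1000),
    "Efficiency [-]":        rng.beta(8, 2, size=1000),
}

# One axis per category: scales and units never need to be reconciled.
fig, axes = plt.subplots(1, len(samples), figsize=(4 * len(samples), 3))
for ax, (label, data) in zip(axes, samples.items()):
    ax.hist(data, bins=40)
    ax.set_title(label)
fig.tight_layout()
```

The same pattern would work on whatever raw sampled arrays vectorize_designs / vectorize_parameters return, once we've confirmed their output shape.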
Recommendation (As of June 26)
Let's not add any input data visualization functionality to the codebase itself. Rather, let's add several demonstrations of creating these visualizations to the analysis notebooks for the decision contexts. Then, similarly to how we've demonstrated using the Jupyter notebooks to access Tyche's ensemble simulation and optimization functionalities, the notebooks can also demonstrate visual input data checking and exploration.
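As a sketch of what one such notebook demonstration could look like: read the case's input table, group the long-form data by variable, and box-plot each group so outliers stand out at a glance. The column names and values below are illustrative stand-ins, not Tyche's actual XLSX schema; in a real notebook the DataFrame would come from pd.read_excel on the case workbook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; inline display in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for a long-form designs table (in practice: pd.read_excel(...)).
designs = pd.DataFrame({
    "Variable": ["Lifetime"] * 4 + ["Scale"] * 4,
    "Value":    [20.0, 21.0, 19.5, 35.0,   # 35.0 is a deliberate outlier
                 1.0, 1.1, 0.9, 1.05],
})

# One boxplot per variable keeps each category on its own unit-consistent axis.
groups = list(designs.groupby("Variable"))
fig, axes = plt.subplots(1, len(groups), figsize=(4 * len(groups), 3))
for ax, (name, grp) in zip(axes, groups):
    ax.boxplot(grp["Value"])
    ax.set_title(name)
fig.tight_layout()
```

Because this lives in the analysis notebook rather than the codebase, the analyst is free to adapt the grouping and plot type to whatever is informative for their particular decision context.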
Steps to implementation