Exploratory coding and other edits in branch issue-152
From Sam:
The Jupyter files currently tabulate some of the values listed in the XLSX file for the particular case being evaluated. This provides a quick indicator of whether values were read in correctly. More useful would be to generate a graph of the data, since one can then more readily determine whether the values make sense, change appropriately with the case, identify any outliers, etc. Interactive management of these inputs, such as whether or not to include outliers, would then be useful. [see note below]
This could also add a layer of information for decision makers by clearly showing the variability in inputs (technology designs and parameters) and how that gets translated into variability in outputs (metrics).
Investigation notes
- Once the Designs class is instantiated, it will have one or more built-in viz methods to sample from input data distributions and visualize the different technology design and parameter data.
- Potential difficulty: units. Because Tyche is units-agnostic, the input designs and parameters data can be in any units. There may then be issues with scale when visualizing several input data categories on one graph; it might also be difficult to group data categories for visualization on the same axes (?)
- On second thought, this is really only an issue for parameters - the designs data categories can safely be assumed to be in consistent units (no one is going to measure capital lifetime in two different time units, for instance)
- Input amounts and prices will have this issue
- Related, and possibly more complex: what is of interest to the analyst will vary from decision context to decision context; moreover, there's no one "correct" way for these visualizations to turn out. So while we can build a few visualizations that will work with the underlying data structure, those visualizations may not be informative for the analyst - it depends on the technology and on where the input distributions show up in the data.
- Continuing the thought, it would be a relatively light lift to create default visualizations for Input (amounts), Input efficiencies, Input prices, Capital lifetimes, and Scale. However, from our experience in developing decision contexts, the "interesting" information is more likely to be in the Parameters dataset, and having default visualizations for Parameters is (a) difficult and (b) not valuable, because Parameters can be any value in any units.
- Sampling from input distributions and showing/storing this data: Not sure if there's existing functionality for this. There's definitely sampling being done, but is the sampled data stored as-is or processed immediately into output information?
Designs.vectorize_designs and Designs.vectorize_parameters seem to do this - currently those methods are only called internally by evaluate, so we haven't examined the raw output to date.
- Outliers: The information in the input datasets will be the already-processed form of elicited data - raw expert responses won't appear. (Individual probability distributions may appear, but they will be combined into a single representative distribution rather than being sampled separately.)
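To make the units concern above concrete, here is a minimal, self-contained sketch of per-category input visualization. It does not use Tyche's actual Designs class; the parameter names, units, and distributions are illustrative stand-ins for sampled input data. Giving each category its own subplot sidesteps the mixed-units/scale problem entirely, since no two categories ever share an axis.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; in a notebook the figure displays inline
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=42)

# Hypothetical parameter categories with incompatible units,
# standing in for samples drawn from Tyche input distributions.
samples = {
    "Capital lifetime [yr]": rng.triangular(15, 20, 30, size=1000),
    "Input price [$/kg]":    rng.lognormal(mean=0.5, sigma=0.3, size=1000),
    "Efficiency [-]":        rng.beta(8, 2, size=1000),
}

# One axis per category: scales and units never need to be reconciled.
fig, axes = plt.subplots(1, len(samples), figsize=(4 * len(samples), 3))
for ax, (label, data) in zip(axes, samples.items()):
    ax.hist(data, bins=40)
    ax.set_title(label)
fig.tight_layout()
```

The same pattern would work on whatever raw sampled arrays vectorize_designs / vectorize_parameters return, once we've confirmed their output shape.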
Recommendation (As of June 26)
Let's not add any input data visualization functionality to the codebase itself. Rather, let's add several demonstrations of creating these visualizations to the analysis notebooks for the decision contexts. Then, similarly to how we've demonstrated using the Jupyter notebooks to access Tyche's ensemble simulation and optimization functionalities, the notebooks can also demonstrate visual input data checking and exploration.
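As a sketch of what one such notebook demonstration could look like: read the case's input table, group the long-form data by variable, and box-plot each group so outliers stand out at a glance. The column names and values below are illustrative stand-ins, not Tyche's actual XLSX schema; in a real notebook the DataFrame would come from pd.read_excel on the case workbook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; inline display in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for a long-form designs table (in practice: pd.read_excel(...)).
designs = pd.DataFrame({
    "Variable": ["Lifetime"] * 4 + ["Scale"] * 4,
    "Value":    [20.0, 21.0, 19.5, 35.0,   # 35.0 is a deliberate outlier
                 1.0, 1.1, 0.9, 1.05],
})

# One boxplot per variable keeps each category on its own unit-consistent axis.
groups = list(designs.groupby("Variable"))
fig, axes = plt.subplots(1, len(groups), figsize=(4 * len(groups), 3))
for ax, (name, grp) in zip(axes, groups):
    ax.boxplot(grp["Value"])
    ax.set_title(name)
fig.tight_layout()
```

Because this lives in the analysis notebook rather than the codebase, the analyst is free to adapt the grouping and plot type to whatever is informative for their particular decision context.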
Steps to implementation