Meta data
I use the file data/meta.xls to store metadata related to analysis projects.
Spreadsheets make it easy to enter and work with data.
I use the xls file extension rather than xlsx because the gdata import package does not throw an error when the xls file is open, which is a nice feature when you are reimporting metadata while still viewing it.
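A minimal sketch of such an import, assuming the gdata package is installed (its read.xls function also needs a working Perl installation). The helper name import_meta and the choice to return one list element per sheet are illustrative assumptions, not necessarily how the project itself loads the file.

```r
# Hypothetical helper: read every sheet of the metadata workbook into a
# named list, one data frame per sheet.
import_meta <- function(path = "data/meta.xls") {
  sheets <- gdata::sheetNames(path)
  meta <- lapply(sheets, function(s) {
    gdata::read.xls(path, sheet = s, stringsAsFactors = FALSE)
  })
  names(meta) <- sheets
  meta
}

# Rerun whenever the spreadsheet has been edited.
if (file.exists("data/meta.xls")) {
  meta <- import_meta()
}
```

Wrapping the import in a function makes it cheap to rerun after each edit to the spreadsheet.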
I often deal with psychological tests (e.g., personality, well-being, clinical measures). Such tests include multiple items, where each item is typically measured on a common scale (e.g., 1 to 5, strongly disagree to strongly agree). This metadata can then be combined with scoring functions such as scoreItems in the psych package. Variables in such sheets include:
- id: name of the variable in R (one per item; typically named foo1, foo2, ..., fooK, where foo is a short abbreviation of the test name and the number indicates the item number)
- itemnumber: a number, typically 1, 2, and so on. This is important for sorting.
- reverse: many tests have reversed items, which are scored as (max + min) - score. I indicate reversal status using 1 = not reversed and -1 = reversed.
- text: the item text
- scale: scoring of tests varies. In the simplest form, there is a one-to-one mapping between items and scales, and this can be recorded in a single variable called scale. However, other scenarios require different data structures: (a) a many-items-to-many-scales mapping requires one column per scale with indicators for item inclusion; (b) two levels of one-to-one item-to-scale mapping can be represented by two columns. The latter covers a simple set of scales plus a total score, as well as hierarchical tests (e.g., 30 facets and 5 factors of personality, where each factor has six facets).
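As a rough illustration of how these columns can drive scoring, here is a base R sketch for the simplest one-to-one case. The item names, responses, and 1 to 5 response range are made up for the example. The keys matrix at the end is in the items-by-scales form of 1, -1, and 0 that scoreItems accepts as its keys argument, so it could be passed there instead of scoring by hand.

```r
# Hypothetical item metadata, as it might be entered in a sheet of meta.xls
meta_items <- data.frame(
  id = c("foo1", "foo2", "foo3", "foo4"),
  itemnumber = 1:4,
  reverse = c(1, -1, 1, -1),
  scale = c("a", "a", "b", "b"),
  stringsAsFactors = FALSE
)

# Made-up responses on an assumed 1 to 5 scale
responses <- data.frame(
  foo1 = c(5, 2, 4),
  foo2 = c(1, 4, 2),
  foo3 = c(3, 3, 5),
  foo4 = c(2, 5, 1)
)
scale_min <- 1
scale_max <- 5

# Reverse the items flagged with -1 using (max + min) - score
scored <- responses
rev_ids <- meta_items$id[meta_items$reverse == -1]
scored[rev_ids] <- (scale_max + scale_min) - responses[rev_ids]

# One-to-one mapping: each scale score is the mean of its items
scale_scores <- sapply(
  split(meta_items$id, meta_items$scale),
  function(ids) rowMeans(scored[ids])
)

# Items-by-scales keys matrix: 1 = keyed, -1 = reverse keyed, 0 = not in scale
keys <- sapply(unique(meta_items$scale), function(s) {
  ifelse(meta_items$scale == s, meta_items$reverse, 0)
})
rownames(keys) <- meta_items$id
```

Keeping the reversal flags and scale membership in the spreadsheet means the scoring code does not need to change when items are added or rekeyed.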
Variable names often need to be replaced with labels in some form of tabular output.
Thus, meta.variablelabels is a place to store these replacement rules: variable corresponds to the variable name, and label corresponds to the label that will replace it in the tabular output.
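A small sketch of applying such rules, assuming the sheet has been imported as a data frame with variable and label columns. The helper apply_labels and the example rows are hypothetical. Names without a matching rule are left unchanged.

```r
# Hypothetical contents of the variablelabels sheet
meta.variablelabels <- data.frame(
  variable = c("foo_total", "age"),
  label = c("Foo Total Score", "Age (years)"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace variable names with labels where a rule
# exists, and keep the original name otherwise.
apply_labels <- function(x, rules = meta.variablelabels) {
  idx <- match(x, rules$variable)
  ifelse(is.na(idx), x, rules$label[idx])
}

# Made-up summary table whose first column holds variable names
tab <- data.frame(
  variable = c("age", "foo_total", "gender"),
  mean = c(34.2, 3.1, 0.5),
  stringsAsFactors = FALSE
)
tab$variable <- apply_labels(tab$variable)
```

Using match rather than a merge keeps the original row order of the table.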