4 changes: 2 additions & 2 deletions 1_Introduction_(draft).ipynb
@@ -34,7 +34,7 @@
" \n",
"## 1.1 Coding and classification in statistical organisations \n",
"--- \n",
- "Classifying a text description into pre-defined category is a very common task in statistical organisation. In Labour Force Survey (LFS) survey, for example, a respondent is asked to describe their occupation in a text which later classified as a statistical classification such as Standard Occupational Classification (SOC). This task, henceforth called coding & classification (C&C), is not limited to responses from survey questionnaire. Administrative register also requires classification of texts into codes. For example, a new company is asked to provide a description of their business activity for a business registration which is then classified into a statstistical classification such as Standard Industrial Classification (SIC). The categories that are classified at this stage of the statistical production process are used for all subsequent downstream tasks such as aggregation editing or imputation, therefore the quality of C&C is critical to ensure the quality of the final output.\n",
+ "Classifying a text description into pre-defined category is a very common task in statistical organization. In Labour Force Survey (LFS) survey, for example, a respondent is asked to describe their occupation in a text which later classified as a statistical classification such as Standard Occupational Classification (SOC). This task, henceforth called coding & classification (C&C), is not limited to responses from survey questionnaire. Administrative register also requires classification of texts into codes. For example, a new company is asked to provide a description of their business activity for a business registration which is then classified into a statstistical classification such as Standard Industrial Classification (SIC). The categories that are classified at this stage of the statistical production process are used for all subsequent downstream tasks such as aggregation editing or imputation, therefore the quality of C&C is critical to ensure the quality of the final output.\n",
"\n",
"Traditionally, C&C was often done by human coders, experts who are trained to read a text description and classify it into pre-defined category. This manual classification is highly resource-intensive in terms of both time and money. For example, US Bureau of Labour Statistics (BLS) collects approximately 300,000 for its Survey of Occupational Injuries and Illnesses (SOII) and this require estimated 25,000 hours of manual work per year [ref_BLS].\n",
"\n",
@@ -139,4 +139,4 @@
]
}
]
}
}