From 36f090ea8da0d26ef525bcebb9620b4194b36551 Mon Sep 17 00:00:00 2001
From: unece-stat <52666356+unece-stat@users.noreply.github.com>
Date: Tue, 5 May 2020 15:21:15 +0200
Subject: [PATCH] Update 1_Introduction_(draft).ipynb

---
 1_Introduction_(draft).ipynb | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/1_Introduction_(draft).ipynb b/1_Introduction_(draft).ipynb
index d163328..9ebae4d 100644
--- a/1_Introduction_(draft).ipynb
+++ b/1_Introduction_(draft).ipynb
@@ -34,7 +34,7 @@
  " \n",
  "## 1.1 Coding and classification in statistical organisations \n",
  "--- \n",
- "Classifying a text description into pre-defined category is a very common task in statistical organisation. In Labour Force Survey (LFS) survey, for example, a respondent is asked to describe their occupation in a text which later classified as a statistical classification such as Standard Occupational Classification (SOC). This task, henceforth called coding & classification (C&C), is not limited to responses from survey questionnaire. Administrative register also requires classification of texts into codes. For example, a new company is asked to provide a description of their business activity for a business registration which is then classified into a statstistical classification such as Standard Industrial Classification (SIC). The categories that are classified at this stage of the statistical production process are used for all subsequent downstream tasks such as aggregation editing or imputation, therefore the quality of C&C is critical to ensure the quality of the final output.\n",
+ "Classifying a text description into pre-defined category is a very common task in statistical organization. In Labour Force Survey (LFS) survey, for example, a respondent is asked to describe their occupation in a text which later classified as a statistical classification such as Standard Occupational Classification (SOC). This task, henceforth called coding & classification (C&C), is not limited to responses from survey questionnaire. Administrative register also requires classification of texts into codes. For example, a new company is asked to provide a description of their business activity for a business registration which is then classified into a statstistical classification such as Standard Industrial Classification (SIC). The categories that are classified at this stage of the statistical production process are used for all subsequent downstream tasks such as aggregation editing or imputation, therefore the quality of C&C is critical to ensure the quality of the final output.\n",
  "\n",
  "Traditionally, C&C was often done by human coders, experts who are trained to read a text description and classify it into pre-defined category. This manual classification is highly resource-intensive in terms of both time and money. For example, US Bureau of Labour Statistics (BLS) collects approximately 300,000 for its Survey of Occupational Injuries and Illnesses (SOII) and this require estimated 25,000 hours of manual work per year [ref_BLS].\n",
  "\n",
@@ -139,4 +139,4 @@
  ]
  }
  ]
-}
\ No newline at end of file
+}