• Data labeling is the annotation of raw data with ground-truth labels that are used to train AI models. In medical applications, the participation of healthcare professionals is essential, because diagnosing certain diseases requires specialized and complex clinical judgment.
• Diagnostic criteria can vary among medical staff because of social bias and differences in expertise and experience. To ensure dataset quality, it is therefore crucial to involve multiple specialists and run a consensus process among them. Workers must also be trained, and detailed work instructions must be prepared.
• The items to be labeled, the labeling scope, the specific procedures, and the labeling tool can all differ depending on the data type. The general labeling process is outlined below. Workers must be trained, and guidelines must be provided, for each step of the work procedure.
✓ Acquisition and cleansing of data: Acquire raw data and cleanse the data.
✓ Definition of the labeling target and scope: Define which items within the raw data are to be labeled and to what extent. In particular, specific standards must be prepared for each data type (e.g., partial labeling of data, de-identification of personal data, and the definition and management of classes).
✓ Establishment of labeling methods and procedures: Determine the work method (automated, semi-automated, or manual) according to the information to be labeled. Prepare detailed work standards, including work allocation and labeling standards for each data type.
✓ Labeling: Train workers on the detailed work standards, then perform data labeling. Following the predetermined work method, select an appropriate labeling tool and, for automated or semi-automated work, conduct the corresponding training.
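The consensus process among multiple specialists can be supported by measuring how far their labels agree. The sketch below illustrates one possible approach and is not a procedure prescribed by this guideline. It computes Cohen's kappa, a common agreement statistic for two annotators, and builds a majority-vote consensus label per item. Items without a strict majority are flagged for adjudication. The function names, the example labels, and the strict-majority rule are all illustrative assumptions.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical class for every item
    return (p_o - p_e) / (1 - p_e)


def consensus(labels_per_item):
    """Majority vote per item (hypothetical rule); ties and empty items go to review."""
    agreed, needs_review = {}, []
    for item_id, labels in labels_per_item.items():
        if not labels:
            needs_review.append(item_id)
            continue
        label, count = Counter(labels).most_common(1)[0]
        if count > len(labels) / 2:
            agreed[item_id] = label
        else:
            needs_review.append(item_id)
    return agreed, needs_review


# Illustrative labels from two readers on the same five images.
reader_1 = ["benign", "malignant", "benign", "benign", "malignant"]
reader_2 = ["benign", "malignant", "malignant", "benign", "malignant"]
kappa = cohens_kappa(reader_1, reader_2)
print(round(kappa, 3))  # 0.615

# Illustrative votes from three readers per image.
votes = {
    "img_001": ["benign", "benign", "malignant"],
    "img_002": ["malignant", "benign", "uncertain"],
}
agreed, review = consensus(votes)
print(agreed, review)  # {'img_001': 'benign'} ['img_002']
```

Cohen's kappa covers only two annotators. With more than two specialists per item, a multi-rater statistic such as Fleiss' kappa would be needed. The acceptable kappa threshold and the adjudication procedure would come from the project's own work instructions.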
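One lightweight way to apply the "definition and management of classes" standard is to keep the class list in a single versioned definition and to check every submitted label against it. This is a minimal sketch under stated assumptions. The schema (`SCHEMA_VERSION`, `LESION_CLASSES`), the `LabelRecord` fields, and the `validate` helper are all hypothetical. In practice, the actual classes and record fields would come from the project's labeling guidelines.

```python
from dataclasses import dataclass

# Hypothetical class definition for an illustrative lesion-labeling task.
SCHEMA_VERSION = "1.0"
LESION_CLASSES = frozenset({"benign", "malignant", "normal"})


@dataclass(frozen=True)
class LabelRecord:
    """One submitted label (hypothetical record format)."""
    item_id: str
    annotator_id: str
    label: str
    schema_version: str


def validate(records):
    """Return (item_id, reason) pairs for records that break the class definition."""
    errors = []
    for r in records:
        if r.schema_version != SCHEMA_VERSION:
            # Labels made under an older class definition must be re-checked.
            errors.append((r.item_id, f"schema version {r.schema_version!r} != {SCHEMA_VERSION!r}"))
        elif r.label not in LESION_CLASSES:
            # Exact string match, so case or spelling variants are rejected.
            errors.append((r.item_id, f"unknown class {r.label!r}"))
    return errors


records = [
    LabelRecord("img_001", "reader_1", "benign", "1.0"),
    LabelRecord("img_002", "reader_1", "Malignant", "1.0"),  # wrong case
    LabelRecord("img_003", "reader_2", "normal", "0.9"),     # outdated schema
]
for item_id, reason in validate(records):
    print(item_id, reason)
```

Versioning the class definition means that any change to the classes during a project flags the labels made under the old definition, so they can be reviewed instead of being silently mixed with new ones.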