  • 06-3 Have you checked and prevented potential biases in data labeling?
    Determine applicability: Consider this question if you are collecting and labeling datasets to develop AI algorithms or models in the healthcare sector, and determine whether the requirement has been satisfied.

    • Labeling data collected for the development of a healthcare AI model requires medical expertise and prior labeling experience. Accordingly, labelers in the healthcare sector can be divided into healthcare professionals with specialized domain knowledge, and crowdworkers who lack such domain knowledge but can complete labeling tasks quickly.

    • Labeling bias may arise when a labeler's specific intentions are reflected in the labels, when features are mistakenly omitted, or through unconscious judgment. The following are examples of bias by labeler type (healthcare professionals and crowdworkers).
    ✓ When labeling is performed by healthcare professionals, bias may occur due to a limited understanding of the labeling process and unfamiliarity with the labeling work and tools.
    ✓ When labeling is performed by crowdworkers, bias may occur due to a lack of medical expertise and inconsistent work and judgment standards.

    • Hence, potential causes of bias should be identified in advance and prevented by evaluating labeling results and providing training on work standards. It is also advisable to recruit diverse labelers to minimize each individual labeler's bias, or to assign a sufficient number of reviewers to catch bias before it enters the dataset.
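One practical way to evaluate labeling results across labeler types, as the point above recommends, is to measure inter-annotator agreement. The sketch below computes Cohen's kappa between two labelers; the labeler roles, label names, and data are illustrative assumptions, not part of the guideline. A low kappa between a clinician and a crowdworker on the same items can flag the kinds of bias described above for review.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two labelers (Cohen's kappa)."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each labeler's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    if expected == 1:  # degenerate case: both labelers used a single label
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two labelers on the same 10 medical images.
clinician   = ["tumor", "normal", "tumor", "normal", "normal",
               "tumor", "normal", "tumor", "normal", "normal"]
crowdworker = ["tumor", "normal", "normal", "normal", "normal",
               "tumor", "tumor", "tumor", "normal", "normal"]

kappa = cohen_kappa(clinician, crowdworker)  # ≈ 0.583: moderate agreement
```

Values near 1 indicate strong agreement; values well below (commonly under about 0.6, though thresholds are a judgment call) suggest the two labeler groups need aligned work standards or additional review.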