• In the public sector, designated data collectors generally gather the data used to develop and improve AI systems.
• If data collection does not cover varied combinations of scenarios, the data may be biased by the individual data collector's own standards. Examples of scenario factors, by service purpose, are as follows:
✓ Scenario factors of data for AI image or video recognition: Weather, time, background, size of the object, etc.;
✓ Scenario factors of Q&A data for chatbot AI: Regional dialects, textese, etc.; and
✓ Attribute factors of speakers’ audio data for speech recognition AI: Tone, accent, speech tempo, etc.
• To reduce human bias in data, prepare data collection guidelines, recruit a diverse group of data collectors so that the data is not skewed toward particular backgrounds and dispositions, and secure a sufficient number of reviewers for the collected data.
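One way a reviewer might check whether collected data covers the scenario combinations listed above is to count samples per combination and flag the gaps. The following is a minimal sketch only: the factor names and values, the sample records, and the `coverage_gaps` helper are all hypothetical and are not part of this guideline.

```python
from collections import Counter
from itertools import product

# Hypothetical scenario factors for an image-recognition dataset
# (illustrative values only; a real project would define its own).
SCENARIO_FACTORS = {
    "weather": ["clear", "rain", "snow"],
    "time": ["day", "night"],
}


def coverage_gaps(samples, factors, min_count=1):
    """Return the scenario combinations that have fewer than min_count samples."""
    names = list(factors)
    # Count how many samples fall into each combination of factor values.
    counts = Counter(tuple(sample[name] for name in names) for sample in samples)
    # Enumerate every possible combination and keep the under-covered ones.
    return [
        dict(zip(names, combo))
        for combo in product(*(factors[name] for name in names))
        if counts[combo] < min_count
    ]


# Hypothetical metadata attached to collected samples.
samples = [
    {"weather": "clear", "time": "day"},
    {"weather": "clear", "time": "night"},
    {"weather": "rain", "time": "day"},
    {"weather": "snow", "time": "day"},
]

gaps = coverage_gaps(samples, SCENARIO_FACTORS)
for gap in gaps:
    print("Under-covered scenario:", gap)
```

In this sketch the night-time rain and snow combinations are reported as gaps, which could then be assigned to collectors as additional collection targets. Raising `min_count` would apply a stricter threshold per combination.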