The Challenges
- Highly detailed annotations required to train ML models for image and video analysis
- Complex manual task of capturing intricate details such as objects, actions, and environmental settings
- Consistency across video transcriptions and image captions
- High-quality audio mapping and speaker identification for video transcription
- Datasets delivered in a standard format for quick integration into ML models
The Solution
- Created a customized roadmap to define attributes and metadata for each image and video
- Set image and video annotation guidelines to ensure uniformity and precision
- Manually reviewed and captioned each image (descriptions, actions, environment settings)
- Assigned a team for audio-to-text transcription, including speaker identification
- Implemented a QA process featuring multiple reviews
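
As an illustration only, the steps above could be captured in a record structure like the following Python sketch. The class names, field names, and the two-approval QA threshold are all assumptions made for this example; the source does not specify the actual schema, tooling, or number of reviews used.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class TranscriptSegment:
    """One span of transcribed audio, tagged with its speaker (hypothetical schema)."""
    start_sec: float
    end_sec: float
    speaker: str
    text: str


@dataclass
class Review:
    """A single QA pass by one reviewer (hypothetical schema)."""
    reviewer: str
    approved: bool


@dataclass
class AnnotationRecord:
    """Attributes and metadata for one image or video (hypothetical schema)."""
    media_id: str
    media_type: str  # "image" or "video"
    description: str
    actions: List[str] = field(default_factory=list)
    environment: str = ""
    transcript: List[TranscriptSegment] = field(default_factory=list)
    reviews: List[Review] = field(default_factory=list)

    def passes_qa(self, min_approvals: int = 2) -> bool:
        # Assumed rule: at least `min_approvals` approvals and no rejection.
        approvals = sum(1 for r in self.reviews if r.approved)
        rejected = any(not r.approved for r in self.reviews)
        return approvals >= min_approvals and not rejected

    def to_json(self) -> str:
        # Serialize to plain JSON so the record can be loaded by downstream tools.
        return json.dumps(asdict(self), indent=2)


record = AnnotationRecord(
    media_id="clip-001",
    media_type="video",
    description="Two people talking at a kitchen table",
    actions=["talking", "pouring coffee"],
    environment="indoor kitchen, daylight",
    transcript=[TranscriptSegment(0.0, 2.5, "Speaker 1", "Good morning.")],
)
record.reviews.append(Review("reviewer_a", True))
record.reviews.append(Review("reviewer_b", True))
print(record.passes_qa())
print(record.to_json())
```

Keeping captions, transcript segments, and review results in one record is one way to make consistency checks and export to a standard format straightforward; a real project would adapt the fields to its own annotation guidelines.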