At university, we were given a project: detect marathon runners at checkpoints with a camera, using only their bib numbers.

Online, almost nothing existed; the only dataset we found had ~100 images.

We used a stock YOLO model to detect people and EasyOCR to read the text on them, and with that pipeline we built a dataset from YouTube videos (~10,000 images after some cleaning).
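Roughly, that first bootstrapping pass looked like this (a sketch, not our exact code; the model name, confidence threshold, and digit-length filter are placeholder assumptions):

```python
import re

def pick_bib_candidates(ocr_results, min_conf=0.4):
    """Keep OCR hits that look like bib numbers: 1-5 digits, decent confidence.
    ocr_results: list of (bbox, text, confidence) tuples, as easyocr returns."""
    bibs = []
    for _bbox, text, conf in ocr_results:
        digits = re.sub(r"\D", "", text)      # strip everything but digits
        if 1 <= len(digits) <= 5 and conf >= min_conf:
            bibs.append(digits)
    return bibs

def detect_bibs(frame):
    """Detect people with a stock YOLO, then OCR each person crop."""
    from ultralytics import YOLO              # heavy deps, imported lazily
    import easyocr
    model = YOLO("yolov8n.pt")                # stock COCO model, class 0 = person
    reader = easyocr.Reader(["en"], gpu=False)
    bibs = []
    for box in model(frame)[0].boxes:
        if int(box.cls) != 0:                 # keep only "person" detections
            continue
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        crop = frame[y1:y2, x1:x2]
        bibs += pick_bib_candidates(reader.readtext(crop))
    return bibs
```

The digit filter matters: race footage is full of banner and signage text, so keeping only short all-digit strings throws most of the noise away before any cleaning.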

From this dataset we trained a new YOLO specialised in bib number detection on our university's AI server, combined with PaddleOCR to read poor-quality text.
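The fine-tuning step is short with the ultralytics API. This is a sketch under assumptions: the dataset layout, file names, and hyperparameters here are illustrative, not our actual values, and the PaddleOCR result format varies between versions:

```python
def bib_dataset_config(root="datasets/bibs"):
    """Minimal single-class YOLO dataset config (illustrative paths)."""
    return {
        "path": root,
        "train": "images/train",
        "val": "images/val",
        "names": {0: "bib"},
    }

def train_bib_detector():
    from ultralytics import YOLO
    import yaml
    with open("bibs.yaml", "w") as f:
        yaml.safe_dump(bib_dataset_config(), f)
    model = YOLO("yolov8s.pt")                 # start from a pretrained checkpoint
    model.train(data="bibs.yaml", epochs=100, imgsz=640)
    return model

def read_bib_text(crop):
    """Read a low-quality bib crop with PaddleOCR."""
    from paddleocr import PaddleOCR
    ocr = PaddleOCR(lang="en")
    return ocr.ocr(crop)                       # raw result; format is version-dependent
```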

We put everything in Docker: a MinIO S3 store to save the images from the checkpoints, a Postgres database for the records, a custom website for easier navigation, an SRS (OSSRS) server to receive the different camera streams, and finally our AI model with multithreading and batch processing to go as fast as possible.
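The stack fits in a compose file along these lines (a minimal sketch: service names, images, ports, and the placeholder password are illustrative, not our real config):

```yaml
services:
  minio:                        # S3-compatible store for checkpoint images
    image: minio/minio
    command: server /data --console-address ":9001"
    ports: ["9000:9000", "9001:9001"]
  db:                           # records database
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example   # placeholder only
  srs:                          # receives the camera streams (RTMP in)
    image: ossrs/srs:5
    ports: ["1935:1935", "8080:8080"]
  detector:                     # hypothetical name for the YOLO + OCR service
    build: ./detector
    depends_on: [minio, db, srs]
```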

I thought that something was going to break, but in the end, everything worked almost perfectly. The only issue was that if a runner passed at the edge of the camera frame, the number was cut off but still got read.
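A cheap guard for that edge case (a sketch of what we could do, not something we shipped; the margin value is an assumption) is to drop any detection whose box touches the frame border, since the number is probably truncated:

```python
def touches_border(box, frame_w, frame_h, margin=4):
    """True if a detection box (x1, y1, x2, y2) in pixels touches the frame
    edge, meaning the bib number is likely cut off and should be skipped."""
    x1, y1, x2, y2 = box
    return (x1 <= margin or y1 <= margin
            or x2 >= frame_w - margin or y2 >= frame_h - margin)
```

Skipping those reads is usually safe at a checkpoint, because the same runner is visible in several consecutive frames and a fully-visible read tends to follow.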

We expected way worse results, but it turned out far better than we thought.