Auto-Count Symbols in Portable Document Format (PDF)
Florian, Andrew
Estimating electrical costs often involves counting symbols in a PDF document. Existing
software has sped up this process compared to manual counting, but there is room for
further improvement. The proposed solution builds on open-source components to efficiently
search a PDF document for the outlines of all symbols, including the letters and numbers that
electrical engineers use to differentiate between otherwise similar symbols. It then sorts these
outlines into groups and counts the occurrences in each group. Symbol for symbol, it takes less than half
the time required by two leading competitors. With the current settings, however, it often
produces numerous sub-groups that must be combined to yield meaningful totals. K-means and
other improved clustering methods are being explored. The proposed concept could also be
helpful in other similar applications that identify symbols or text in images.
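
The group-and-count step described above can be illustrated with a minimal Python sketch. It is not the author's implementation: it assumes each outline has already been extracted from the PDF as a list of (x, y) points, and the names `signature`, `count_symbols`, and the `grid` parameter are hypothetical. Each outline is normalized for position and size, snapped to a coarse grid, and outlines with the same grid signature are counted as one symbol.

```python
from collections import Counter


def signature(outline, grid=8):
    """Return a translation- and scale-invariant key for an outline.

    `outline` is a list of (x, y) points. The grid resolution is an
    assumed tuning parameter, not a value taken from the source.
    """
    xs = [x for x, _ in outline]
    ys = [y for _, y in outline]
    min_x, min_y = min(xs), min(ys)
    # Scale by the larger bounding-box side so the aspect ratio is preserved.
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    cells = {
        (round((x - min_x) / span * grid), round((y - min_y) / span * grid))
        for x, y in outline
    }
    return frozenset(cells)


def count_symbols(outlines, grid=8):
    """Group outlines by signature and count the members of each group."""
    return Counter(signature(o, grid) for o in outlines)


# Two squares of different size and position, plus one triangle.
outlines = [
    [(0, 0), (2, 0), (2, 2), (0, 2)],
    [(10, 10), (14, 10), (14, 14), (10, 14)],
    [(0, 0), (4, 0), (2, 3)],
]
counts = count_symbols(outlines)
```

In this sketch the two squares fall into one group and the triangle into another. Exact-match grouping of this kind is sensitive to the grid setting: a finer grid tends to split one symbol into several sub-groups, which is the kind of fragmentation that a clustering method such as K-means might help to merge.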