Performance Report for July 2022

Name: Vedant Madane

Code: 1719

Month: July

Year: 2022

Designation: Project Trainee

Document OCR (Optical Character Recognition):

This month, work focused on preparing a PoC (proof of concept) for extracting the document type and text contents of a PDF file.

The following approaches were tried:

  1. Extracting the text from the PDF directly using an appropriate language model.
  2. Converting the PDF to an image and then extracting the text.

The challenge with the first approach is that our models require the document's language to be specified beforehand so that the appropriate language model can be loaded.

However, this information is often ambiguous, especially for caste certificates, which contain handwritten text in multiple languages.
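One possible way to handle an unknown language is sketched below. It assumes an OCR engine that can load several language models at once; Tesseract, for example, accepts language codes joined with "+" (such as eng+hin). The function name, the document-type keys, and the mapping of document types to candidate languages are all hypothetical and are not taken from the PoC.

```python
# Hypothetical mapping from document type to candidate languages.
# The codes follow Tesseract's naming (eng, hin, mar); the mapping itself
# is an illustrative assumption, not the project's actual configuration.
CANDIDATE_LANGUAGES = {
    "caste_certificate": ["eng", "hin", "mar"],
    "invoice": ["eng"],
}

# Fallback used when the document type is unknown (also an assumption).
DEFAULT_LANGUAGES = ["eng"]


def build_language_spec(document_type=None, known_language=None):
    """Return a '+'-joined language string for a Tesseract-style engine."""
    # If the language is already known, a single model is enough.
    if known_language:
        return known_language
    candidates = CANDIDATE_LANGUAGES.get(document_type, DEFAULT_LANGUAGES)
    # Remove duplicates while keeping the original order.
    unique = list(dict.fromkeys(candidates))
    return "+".join(unique)


print(build_language_spec("caste_certificate"))   # eng+hin+mar
print(build_language_spec(known_language="hin"))  # hin
```

Loading several models at once usually costs speed and can reduce accuracy, so this is a fallback for documents whose language cannot be determined in advance.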

Please see this document for implementation details: OCR Challenges

Entity Registration Testing:

The Entity Registration page was tested thoroughly, and the bugs found were documented systematically in this document: Entity registration bugs

214 hours were logged in July, of which one fifth was spent on software development. One third of the total logged time fell within work hours. Of the time logged during work hours, two fifths was spent on software development, with a productivity pulse of three quarters. The productivity pulse was one eighth better during office hours than across all logged time.

24 days had more than 2 hours of total productive time.

22 days had less than 2 hours spent on unproductive activities such as meetings.

11 days had more than 4 hours spent on all productive activities.

Less than 1 hour a day was spent on distracting activities this month.

Most of the time was spent in Visual Studio Code, followed closely by the email client.
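
As a rough cross-check of the time figures above, the fractions can be converted into hours. The sketch below uses only the numbers stated in this report (214 hours, one fifth, one third, two fifths); the variable names are illustrative, and hours are rounded to one decimal place.

```python
from fractions import Fraction

# Figures taken from the report.
total_hours = Fraction(214)
dev_share_all = Fraction(1, 5)    # share of all logged time spent on development
work_share = Fraction(1, 3)       # share of all logged time within work hours
dev_share_work = Fraction(2, 5)   # share of work-hours time spent on development

dev_hours_all = total_hours * dev_share_all
work_hours = total_hours * work_share
dev_hours_work = work_hours * dev_share_work

# Share of all development time that fell within work hours.
dev_in_work_share = dev_hours_work / dev_hours_all


def as_hours(value):
    """Round an exact fraction of hours to one decimal place."""
    return round(float(value), 1)


print(as_hours(dev_hours_all))    # 42.8
print(as_hours(work_hours))       # 71.3
print(as_hours(dev_hours_work))   # 28.5
print(dev_in_work_share)          # 2/3
```

On these figures, about two thirds of all software development time fell within work hours.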