Bug: duplicated linebbox
Solved
- ignore the last page if the doctype is 2 or 3
- if there are multiple identical textlines in 02_xml, all but one are deleted; a csv file listing all the deleted duplicates is created in data/logs
- when we call .xml on a Document object, the corrected xml is automatically recomputed if its version is older than CONSTANTS.XML_versions
- in data/test_set_file_ids there are now files containing the ids of the documents we are going to use for testing
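
For reviewers, the duplicate-textline cleanup can be pictured roughly as follows. This is only a minimal sketch, not the code in this branch: the function name `drop_duplicate_textlines`, the `(line_id, bbox, text)` tuple layout, and the CSV columns are assumptions made for illustration.

```python
import csv
from pathlib import Path


def drop_duplicate_textlines(textlines, log_path):
    """Keep the first occurrence of each textline and log the rest to a CSV.

    `textlines` is a list of (line_id, bbox, text) tuples. This layout is a
    hypothetical stand-in for whatever the 02_xml parser actually yields.
    """
    seen = set()
    kept, removed = [], []
    for line_id, bbox, text in textlines:
        # Two lines count as duplicates when bbox and text are both equal.
        key = (tuple(bbox), text)
        if key in seen:
            removed.append((line_id, bbox, text))
        else:
            seen.add(key)
            kept.append((line_id, bbox, text))

    # Write every deleted duplicate to the log file, e.g. somewhere in data/logs.
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["line_id", "bbox", "text"])
        for line_id, bbox, text in removed:
            writer.writerow([line_id, " ".join(map(str, bbox)), text])

    return kept, removed
```

Keeping the first occurrence and logging the others means nothing is lost silently: the CSV can be used to check which lines were dropped from which document.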
You can delete the branch afterwards