03-14-2021, 09:51 AM (This post was last modified: 03-14-2021, 09:52 AM by itsme.)
I don't have a CSV yet, but I can provide a good starting point.
I have a list of titles, composers and the page numbers as printed in the book: CubanFakeBook.xlsx
And I have the book as PDF with PDF bookmarks.
I exported the bookmarks to CubanFakebook_bookmarks.txt (using jPdfBookmarks), which contains the correct PDF pages, did some search & replace (using Notepad++) and converted the result to CubanFakebook_EditedBookmarksExport.xlsx
If I had the time to do it I would proceed as follows:
- copy the columns of CubanFakebook_EditedBookmarksExport.xlsx into CubanFakeBook.xlsx as additional columns
- proof-read and correct titles and composers
- use LibreOffice Calc to calculate Page Order; example formulas can be found in CubanFakeBook.xlsx
- optional: add keys
- make a backup copy of the completed XLSX file
- delete all columns that are not required for the CSV file
- create the CSV using "Save As" in LibreOffice Calc
I attached all the mentioned files so you can give it a try. It's worth getting familiar with a workflow like the one described above, because it's much faster than typing everything. Good luck, have fun.
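For anyone who prefers a script for the last step, here is a minimal sketch of writing the semicolon-separated file with Python's csv module. The three columns (title, pages, album) and the function name rows_to_csv are assumptions for illustration only; the real CSV needs whatever columns the import expects.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize (title, pages, album) tuples as semicolon-separated text.

    The column layout is an assumption made for this sketch.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=";", lineterminator="\n")
    for title, pages, album in rows:
        # Pages are written as plain strings, so a range like "10-11"
        # is stored exactly as typed.
        writer.writerow([title, str(pages), album])
    return buffer.getvalue()


rows = [("Alardoso", "10-11", "Cuban Fakebook 1")]
print(rows_to_csv(rows), end="")
```

Because every value is written as a string, nothing in this step can reinterpret a page range as a date.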
03-16-2021, 04:19 PM (This post was last modified: 03-16-2021, 04:22 PM by itsme.)
There are some small issues left; see the attached screenshot.
1.
e.g. Alardoso;10-nov;Cuban Fakebook 1
Excel / Calc misinterpreted the page range 10-11 as a date
possible solution: set the format of the "Pages" column to "Text"
2.
some typical issues of OCR'ed texts are still there (you probably used the exported bookmarks file)
e.g. "bodeguero, EI": "EI" with an uppercase I instead of "El" with a lowercase l
what might help:
- use a font where these characters look different (e.g. Tahoma instead of Arial; I never understood why Arial became the standard font in so many cases)
- compare the "title" columns of the provided files CubanFakeBook.xlsx and CubanFakebook_EditedBookmarksExport.xlsx
it might be helpful to copy the columns temporarily into one XLSX file and add a column with a comparison formula like =IF(B5<>D5;"x";"") (the function is called WENN in a German-language installation) that marks lines with differences
I corrected the mentioned lines, but there are probably more OCR issues.
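The same comparison can also be sketched outside the spreadsheet. The following Python snippet assumes the two title columns have already been read into two lists in the same row order; the function name find_title_mismatches and the sample data are made up for illustration.

```python
def find_title_mismatches(printed_titles, bookmark_titles):
    """Return (row, printed, bookmark) for every row where the titles differ.

    Rows are numbered from 1. The comparison is exact, including case,
    after stripping leading and trailing whitespace.
    """
    mismatches = []
    pairs = zip(printed_titles, bookmark_titles)
    for row, (printed, bookmark) in enumerate(pairs, start=1):
        if printed.strip() != bookmark.strip():
            mismatches.append((row, printed, bookmark))
    return mismatches


# Hypothetical sample: the second list contains the OCR error "EI".
printed = ["bodeguero, El", "Alardoso"]
bookmarks = ["bodeguero, EI", "Alardoso"]
for row, a, b in find_title_mismatches(printed, bookmarks):
    print(row, repr(a), repr(b))
```

Printing with repr() makes stray spaces visible, but an uppercase I and a lowercase l can still look alike in many fonts, so the row number is the more reliable hint.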
btw.: I use "Albums" for fakebook names and "Collections" for the bands and line-ups I play with, but that's a matter of personal preference and can be changed easily
(03-16-2021, 04:19 PM)itsme Wrote: There are some small issues left, see attached screenshot
Yes,
I don’t know how to solve the problem of 10-nov (instead of 10-11) etc. I set the format of the cells to "Text", but I always get a date instead of 10-11 when I open the file again.
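A CSV file stores no cell formats, so a "Text" format set in the spreadsheet is lost when the CSV is opened again and the values are parsed afresh. If the dates have already been written into the file, they can be turned back into page ranges. The following is a minimal sketch, assuming the damaged values look like day-month ("10-nov" for "10-11") with English-style month abbreviations; the table MONTHS and the function restore_page_range are hypothetical names, and other locales would need their own abbreviations.

```python
import re

# Assumption: English-style three-letter month abbreviations.
MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}


def restore_page_range(value):
    """Turn a value like '10-nov' back into '10-11'.

    Values that do not look like day-month are returned unchanged.
    """
    match = re.fullmatch(r"(\d{1,2})-([^\W\d_]+)\.?", value.strip())
    if match is None:
        return value
    month = MONTHS.get(match.group(2).lower())
    if month is None:
        return value
    # int() drops a leading zero the spreadsheet may have added ("01-feb").
    return f"{int(match.group(1))}-{month}"


for pages in ["10-nov", "01-feb", "10-11", "37"]:
    print(pages, "->", restore_page_range(pages))
```

Only ranges that happen to form a valid date are affected in the first place, so everything else passes through untouched.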