Script to convert pdf bookmarks to csv files
#1
Being a big believer in making the computer do the heavy lifting, I just wrote this simple script to extract the bookmarks from a pdf file and reformat them into a csv file for import into mobilesheets.

The script requires pdftk, paste and sed.  The latter two (paste and sed) are standard Unix utilities, so they will almost certainly be installed already on any Linux computer.  pdftk is packaged for most Linux distributions but is often not installed by default, so you might have to install it yourself.

(If you're not sure, the command "which pdftk" will tell you if it's already installed.)
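As a rough sketch of that check (this is not part of the attached script, and the function name require_tools is just something made up for illustration), a script could test for all three tools up front and stop early if one is missing:

```shell
#!/bin/sh
# Hypothetical helper, not taken from the attached script.
# Returns 0 if every named tool is found on PATH, 1 otherwise.
require_tools() {
    for tool in "$@"; do
        # command -v is the POSIX way to ask "is this command available?"
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing required tool: $tool" >&2
            return 1
        fi
    done
    return 0
}
```

A script would then start with something like: require_tools pdftk paste sed || exit 1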

To use the script, simply mark it as executable and specify the name of the pdf file on the command line.

Example:  pdfbookmarktocsv.scr myfile.pdf

That will process a pdf file named myfile.pdf and you will get a file named myfile.csv out of it.

Of course, the pdf file that you're processing must have bookmarks in it.  If there are no bookmarks, then the output file will have no content.
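For anyone who wants to see the general idea without unpacking the zip, here is a minimal sketch of how pdftk, sed and paste can be combined for this. It is not the attached script, only an illustration. It assumes the bookmark records that pdftk prints with dump_data (BookmarkTitle: and BookmarkPageNumber: lines, title first), and it assumes a semicolon-separated file with a "title;pages" header is what you want to import; check both against your own setup. The function name bookmarks_to_csv is made up.

```shell
#!/bin/sh
# Sketch only, not the attached script.
# Reads pdftk dump_data output on stdin and writes "title;page" lines.
bookmarks_to_csv() {
    # Header row. The column names are an assumption about the import format.
    echo 'title;pages'
    # Print only the title and page-number lines with their labels stripped,
    # then let paste join each consecutive title/page pair with a semicolon.
    sed -n -e 's/^BookmarkTitle: //p' -e 's/^BookmarkPageNumber: //p' |
        paste -d ';' - -
}
```

It would be used like this: pdftk myfile.pdf dump_data_utf8 | bookmarks_to_csv > myfile.csv

Note that this only records the start page of each bookmark and does nothing special for titles that themselves contain a semicolon, so treat it as a starting point.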

EDIT:  I should point out that the advantage of doing this rather than a straight pdf import (which is supported by mobilesheets) is that the csv file is more flexible and gives you a chance to add any fields or edits that you want before proceeding with the import.


Attached Files
.zip   pdfbookmarktocsv.zip (Size: 468 bytes / Downloads: 21)
If you're a zombie and you know it, bite your friend!
We got both kinds of music: Country AND Western
#2
Nice. Unfortunately pdftk is hard to get (it seems banned due to licensing issues).

A simple yet effective Perl program with a similar purpose can be found in my MSPro tools on github: https://github.com/sciurius/MSPro-Tools/...pdf2csv.pl .

Several other tools exist on the interwebs.
Johan
johanvromans.nl — hetgeluidvanseptember.nl — mojore.nl -- howsagoin.nl
Samsung Galaxy Note S7FE (T733) 12.4", Android 13.0, AirTurn Duo & Digit (Gigs).
Samsung Galaxy Note S4 (T830) 10.5", Android 10.0 (maintenance and backup).
Samsung A3 (A320FL), Android 8.0.0 (emergency).
#3
I used PDF2CSV several times successfully. It works nicely. Thanks for creating and providing that script.

I usually like to keep an additional column "PDFPage" containing only the start page of every song. It is useful, for example, to switch conveniently between sorting alphabetically and sorting by page.
Find an example here:
https://zubersoft.com/mobilesheets/forum...-9320.html
This CSV file works fine for MSP as unknown columns are simply ignored.

How about adding an option that PDF2CSV adds such a column into its output file? Necessary data are probably already available internally.
first language: German
Acer A1-830, Android 4.4.2 - HP x2 210 G2 Detachable, Win 10 22H2 - Huawei Media Pad T5, Android 8.0 - Boox Tab Ultra C, Android 11
www.moonlightcrisis.de - www.basdjo.de - www.frankenbaend.de


#4
I've checked in a new version of pdf2csv.pl in the repo. It has additional possibilities for the extra columns:

Code:
--extra -x KEY=VAL
            Add additional fields with value *VAL* to the CSV.

            *VAL* may contain substitution strings in the form "%{XXX}",
            where *XXX* can be one of "startpage", "endpage", "pages" (the
            page range) and "pagecount" (the number of pages in the PDF).

            This may be repeated.

HTH
Johan