MobileSheets Forums

Script to convert pdf bookmarks to csv files
Being a big believer in making the computer do the heavy lifting, I just wrote this simple script to extract the bookmarks from a pdf file and reformat them into a csv file for import into mobilesheets.

The script requires pdftk, paste and sed.  paste and sed are standard Unix utilities, so they will almost certainly be installed already on any Linux computer.  pdftk is packaged for most Linux distributions but is often not installed by default, so you may have to install it yourself.

(If you're not sure, the command "which pdftk" will tell you if it's already installed.)

To use the script, mark it as executable (chmod +x) and give the name of the pdf file on the command line.

Example:  pdfbookmarktocsv.scr myfile.pdf

That will process a pdf file named myfile.pdf and you will get a file named myfile.csv out of it.

Of course, the pdf file that you're processing must have bookmarks in it.  If there are no bookmarks, then the output file will have no content.
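
For readers who want to see the general idea, here is a minimal sketch of such a pipeline. It is not the script from this post. It assumes that pdftk's dump_data output lists each bookmark as a "BookmarkTitle: ..." line followed later by a "BookmarkPageNumber: ..." line, and that a semicolon-separated file with a "title;pages" header is acceptable for import (check the MobileSheets manual for the exact column names). The function name bookmarks_to_csv is made up for illustration.

```shell
#!/bin/sh
# Sketch only: reads the text produced by `pdftk FILE dump_data` on stdin
# and writes one "title;startpage" line per bookmark.
# Assumption: every bookmark record has a BookmarkTitle line followed by
# a BookmarkPageNumber line, in that order.
bookmarks_to_csv() {
    # Header row; the column names are an assumption, adjust as needed.
    echo "title;pages"
    # Print only the title and page-number values (labels stripped),
    # then join each consecutive pair of lines with a semicolon.
    sed -n -e 's/^BookmarkTitle: //p' -e 's/^BookmarkPageNumber: //p' |
        paste -d ';' - -
}

# Usage (requires pdftk):
#   pdftk myfile.pdf dump_data_utf8 | bookmarks_to_csv > myfile.csv
```

Note that this sketch only records the start page of each bookmark, always writes the header line even when there are no bookmarks, and would produce a broken row if a bookmark title itself contained a semicolon.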

EDIT:  I should point out that the advantage of doing this rather than a straight pdf import (which is supported by mobilesheets) is that the csv file is more flexible and gives you a chance to add any fields or edits that you want before proceeding with the import.
Nice. Unfortunately pdftk is hard to get (it seems banned due to licensing issues).

A simple yet effective Perl program with a similar purpose can be found in my MSPro tools on github: .

Several other tools exist on the interwebs.
I used PDF2CSV several times successfully. It works nicely. Thanks for creating and providing that script.

I usually like to keep an additional column "PDFPage" containing only the start page of every song. It is useful, for example, for switching conveniently between sorting alphabetically and sorting by page.
Find an example here:
This CSV file works fine for MSP as unknown columns are simply ignored.

How about adding an option so that PDF2CSV adds such a column to its output file? The necessary data is probably already available internally.
I've checked in a new version of in the repo. It has additional possibilities for the extra columns:

--extra -x KEY=VAL
            Add additional fields with value *VAL* to the CSV.

            *VAL* may contain substitution strings in the form "%{XXX}",
            where *XXX* can be one of "startpage", "endpage", "pages" (the
            page range) and "pagecount" (the number of pages in the PDF).

            This may be repeated.
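
To illustrate what the substitution strings mean, here is a small sketch that expands them for a single song. It is not the tool's actual code: expand_extra is a hypothetical helper, and rendering "pages" as "start-end" is an assumption based on the description above. With it, a value such as PDFPage=%{startpage} would yield the start-page column requested earlier in this thread.

```shell
#!/bin/sh
# Illustration only: expand_extra is a hypothetical helper, not part of the tool.
# Arguments: VAL template, start page, end page, number of pages in the PDF.
expand_extra() {
    printf '%s\n' "$1" |
        sed -e "s/%{startpage}/$2/g" \
            -e "s/%{endpage}/$3/g" \
            -e "s/%{pages}/$2-$3/g" \
            -e "s/%{pagecount}/$4/g"
}

# Example calls for a song on pages 12 to 14 of a 200-page PDF:
#   expand_extra '%{startpage}' 12 14 200
#   expand_extra 'p. %{pages} of %{pagecount}' 12 14 200
```

Since the option may be repeated, each KEY=VAL pair would be expanded like this once per song.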