MobileSheets Forums

Full Version: Extracting songs from pdf files... ideas?
Hi. I have been using the MobileSheets companion to extract song sheets from a large pdf song book. For example, song "A" is on sheets 1-5, and song "B" is on sheets 6-8 in the pdf. I would like to create independent pdf files for each song A and B. While MS companion does do this, it creates .png (image) files which are harder to work with than pdf files in MS.
I can do this manually by 'printing to a file' from the large pdf, one at a time, but this takes a lot of work/time to do all the songs in a pdf.

Are there any tools out there that will take a pdf file and automatically create all of the independent pdf files for each song?

thanks!
Rod
I believe that the only application that can do this is the (expensive) full version of Adobe Acrobat. However, there are some online sites that offer this functionality.

One such is: www.sodapdf.com/split-pdf/

It isn't totally anonymous, as you have to provide an email address to receive the files, and they probably bombard you with advertising afterwards. I have used similar sites for other purposes, and I have a second, anonymous email address that I use when I have to provide one to sites that I don't totally trust. mail.com offers free anonymous email addresses without asking for a mobile number or other personal data.

Good luck!
I do not split my fakebooks. I keep them as a complete book, prepare a CSV file for each book, and import single songs as soon as I need them using MSP's "CSV import" feature.

But if your personal workflow is different and you still want to split your PDFs:
PDFsam (split and merge) is a fine tool for that (there are others too); the free version is fully sufficient.
https://pdfsam.org/pdfsam-basic/#split-pdf
https://pdfsam.org/download-pdfsam-basic/
Splitting automatically will always require some sort of index that specifies the song titles and the pages.
If you have (or take the time to make) such an index, you can also use it to have MSPro treat your big PDF file as individual songs without splitting, as itsme wrote in the previous post.
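To make the index idea concrete, here is a minimal sketch of index-driven splitting in Python. The index layout (a semicolon-separated file with `title` and `pages` columns) is only an assumption for illustration and is not necessarily the format MSP expects; the function names are made up, and the actual splitting relies on the third-party pypdf package.

```python
import csv
import io
import re


def parse_index(csv_text):
    """Parse 'title;pages' rows (pages like '1-5' or '7') into (title, first, last)."""
    songs = []
    for row in csv.DictReader(io.StringIO(csv_text), delimiter=";"):
        first, _, last = row["pages"].strip().partition("-")
        songs.append((row["title"].strip(), int(first), int(last or first)))
    return songs


def safe_filename(title):
    """Replace characters that are awkward in file names and add '.pdf'."""
    return re.sub(r'[\\/:*?"<>|]+', "_", title).strip() + ".pdf"


def split_pdf(source_path, songs, out_dir="."):
    """Write one PDF per song. Requires pypdf (pip install pypdf)."""
    import os
    from pypdf import PdfReader, PdfWriter  # imported lazily on purpose

    reader = PdfReader(source_path)
    for title, first, last in songs:
        writer = PdfWriter()
        for page_no in range(first, last + 1):  # index pages are 1-based
            writer.add_page(reader.pages[page_no - 1])
        with open(os.path.join(out_dir, safe_filename(title)), "wb") as fh:
            writer.write(fh)
```

Something like `split_pdf("songbook.pdf", parse_index(open("index.csv").read()))` would then write one file per row; both file names are placeholders.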
As itsme and Sciurius already said, there will always be manual work required since you have to tell the program what you want to split and how to name the files.

If you want to use it for MSP only, a CSV and an import with MSP is the way to go. There is already a big thread about it in this forum with several CSVs to download for fake books.

If you do want/need separate PDFs, I wouldn't use Acrobat. I would split the PDF into single-page files with a tool like pdfsam or pdftk, then make an index, or rather a script/batch file, to rename the single PDFs (if they're one-pagers) and join those needed for multi-pagers (again with a batch file using pdftk). As I said, it's a bit of work, but you get quicker results than doing it with PDF editors.
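A rough sketch of what such a batch could look like, written in Python rather than as a shell script. Note that this is a variation on the approach described above: instead of splitting into single pages and re-joining them, it asks pdftk for each page range directly with its `cat` operation. The song list and the function names are hypothetical, and pdftk must be installed and on the PATH for `run_all` to work.

```python
import subprocess


def pdftk_commands(source, songs):
    """Build one 'pdftk <source> cat <pages> output <file>' command per song.

    songs is a list of (title, first_page, last_page) with 1-based pages.
    """
    commands = []
    for title, first, last in songs:
        pages = str(first) if first == last else f"{first}-{last}"
        commands.append(["pdftk", source, "cat", pages, "output", f"{title}.pdf"])
    return commands


def run_all(commands):
    """Run each command in turn; stops at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Printing the commands first and checking a few by hand before calling `run_all` is probably wise, since titles with slashes or other special characters would need cleaning up.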
Thanks for the ideas, everyone. As Sciurius says, all (present) methods require an index. I guess I was hoping that there was a "magical tool" that could do this without an index...

I'm no software guru, but I could see how a tool could input a large pdf songbook, reading it page by page. If a page had >40 point text (for example) at the top of the page (the song's title/name), this would be marked as a "first page" (and this large text would also be used as the split-out pdf name). The tool would continue reading the next pages, until it hit the next "first page", or the end of the file. Just dreaming ;-)
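For what it's worth, the heuristic described above could be sketched roughly like this in Python. Everything here is an assumption for illustration: the 40-point threshold, the "top quarter of the page" rule, and the function names. `extract_spans` relies on the third-party PyMuPDF package and only works on PDFs that contain real text; a scanned book has no text layer to inspect.

```python
def find_songs(pages, min_title_size=40.0, top_fraction=0.25):
    """Group pages into songs based on large text near the top of a page.

    pages: one list per page of (font_size, y_fraction, text) spans, where
    y_fraction is 0.0 at the top of the page and 1.0 at the bottom.
    Returns (title, first_page, last_page) tuples with 1-based page numbers.
    """
    songs = []
    for page_no, spans in enumerate(pages, start=1):
        titles = [text.strip() for size, y, text in spans
                  if size > min_title_size and y <= top_fraction and text.strip()]
        if titles:
            # Large text at the top: treat this page as a new "first page".
            songs.append([" ".join(titles), page_no, page_no])
        elif songs:
            # No title found: the page continues the current song.
            songs[-1][2] = page_no
        # Pages before the first detected title (front matter) are skipped.
    return [tuple(song) for song in songs]


def extract_spans(pdf_path):
    """Collect (font_size, y_fraction, text) spans per page with PyMuPDF."""
    import fitz  # PyMuPDF, third-party; imported lazily

    doc = fitz.open(pdf_path)
    pages = []
    for page in doc:
        height = page.rect.height
        spans = []
        for block in page.get_text("dict")["blocks"]:
            for line in block.get("lines", []):  # image blocks have no lines
                for span in line["spans"]:
                    spans.append((span["size"], span["bbox"][1] / height, span["text"]))
        pages.append(spans)
    doc.close()
    return pages
```

The result of `find_songs(extract_spans("songbook.pdf"))` would still need checking by hand, because the criteria that mark a title page can differ a lot from one book to the next.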
Theoretically, someone (e.g. me) could whip up such a tool. But I foresee that the criteria for a new "first page" will be very different from case to case.
Often there is a table of contents in the PDF that could be useful.

But creating an index is not such a hard job, needs to be done only once, and has multiple purposes.
In case the PDF has bookmarks, PDFsam can use the bookmarks for splitting, and MSP can use the bookmarks instead of a CSV for import (just song title and pages, no additional metadata).
If I need a CSV, I first try to find one. See the thread in this forum, or ask me or Robipad.
If the PDF has some kind of table of contents, copy/paste often works.
If it is a scanned PDF, the TOC can be created via OCR. That usually needs proofreading.
If that fails, I search the web. Many websites that sell fakebooks provide a list of the titles they contain.
There are also websites out there that offer fake book indexes.
Once you have a list of song titles, it needs proofreading and, at the least, adding or correcting the page numbers. I usually also add the keys of the songs.

If you have invested all that effort, please share your results.