11-09-2023, 03:21 AM
You're doing something similar to what I do, but unless I'm missing something, you're not getting the last page set automatically in your csv file.
My workflow is this:
Either:
Create a csv file of the table of contents, either by typing it manually or by OCR-ing it with tesseract, which works more or less well depending on the source material.
Edit the csv file with vim, since it's very easy to delete junk characters en masse using macros and regular expressions.
or:
Yank the existing bookmarks out of the pdf file using this script:
#!/bin/bash
echo Extract bookmarks from pdf and convert to csv file \(April 22, 2022\)
if [ $# -eq 0 ]; then
echo "pdf filename must be specified on commandline"
exit 1
fi
pdftk "$1" dump_data | grep -e BookmarkTitle -e BookmarkPageNumber > 1.tmp
sed -e "s/\™//g" -e "s/\"/\'/g" -e "s/\&/\&/g" -e "s/\'/\'/g" -e "s/\’/\'/g" -e "s/BookmarkTitle: //g" -e "s/BookmarkPageNumber: /;/g" 1.tmp > 2.tmp
paste -s -d' \n' 2.tmp > 1.tmp
file_name=${1%.pdf}.csv
sed "s/ ;/;/g" 1.tmp > "$file_name"
rm 1.tmp 2.tmp
Then I use LibreOffice Calc to recalculate the page numbers if the toc doesn't match the actual page numbers. You can do that in one shot by creating the formula and copying it down the entire column, then saving and reloading the csv file and deleting the original column. (If you don't save and reload before deleting the original column, the page numbers are lost altogether, since the formulas still refer to the column you just deleted.)
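If the toc is off by the same amount for the whole book, the recalculation can also be done without the spreadsheet. Here is a rough sketch with awk, assuming the csv is in the title;pagenumber format produced above. `shift_pages` and its two arguments are names made up for this example.

```shell
#!/bin/bash
# Sketch: shift every page number in a title;pagenumber csv by a fixed offset.
# shift_pages is a hypothetical helper, not part of the scripts in this post.
shift_pages() {
    local offset=$1 csv=$2
    awk -v off="$offset" '{
        # Everything after the LAST semicolon is treated as the page number.
        i = match($0, /;[^;]*$/)
        if (i == 0) { print; next }          # no page field: pass the line through
        title = substr($0, 1, i - 1)
        page  = substr($0, i + 1)
        print title ";" (page + off)
    }' "$csv"
}
```

For example, `shift_pages 8 book.csv > book.fixed.csv` would add 8 to every page number, and a negative offset subtracts.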
If I want to add bookmarks to my original pdf file (which I usually do if the original pdf file doesn't have any) I use this script:
#!/bin/bash
echo Convert csv file to bookmarks for use with pdftk update_info command
echo This expects the input file to be formatted as title\;pagenumber
if [ $# -eq 0 ]; then
echo "input csv filename must be specified on commandline"
exit 1
fi
sed "s/^/BookmarkBegin\nBookmarkTitle: /" "$1" > 1.tmp
sed "s/;/\nBookmarkLevel: 1\nBookmarkPageNumber: /" 1.tmp > "$1".formatted
rm 1.tmp
pdfname=${1%.csv}.pdf
pdftk "$pdfname" update_info "$1".formatted output "$pdfname".new
rm "$1".formatted
This works fine, but I don't get the last page of each song when I import the file into MobileSheets.
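One way to fill in the missing last page would be to derive it from the csv itself: a song ends on the page before the next song starts, and the final song ends on the last page of the pdf. Here is a rough sketch with awk, assuming the csv is sorted by page number. `add_last_page` and the title;first;last output layout are made up for illustration and are not necessarily what MobileSheets expects on import.

```shell
#!/bin/bash
# Sketch: turn title;firstpage lines into title;firstpage;lastpage lines.
# add_last_page is a hypothetical helper; the output layout is an assumption.
add_last_page() {
    local csv=$1 total=$2      # total = number of pages in the pdf
    awk -v total="$total" '
        # Print one entry; never let the last page fall before the first page.
        function emit(t, s, e) { if (e < s) e = s; print t ";" s ";" e }
        {
            i = match($0, /;[^;]*$/)
            if (i == 0) next                     # skip lines without a page field
            title = substr($0, 1, i - 1)
            start = substr($0, i + 1) + 0
            # The previous song ends one page before this one starts.
            if (have) emit(prev_title, prev_start, start - 1)
            prev_title = title; prev_start = start; have = 1
        }
        END { if (have) emit(prev_title, prev_start, total) }
    ' "$csv"
}
```

The total page count can be read from the `NumberOfPages:` line that `pdftk book.pdf dump_data` prints. If two songs start on the same page, the sketch gives the first one a single page rather than a negative range.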
One nifty side effect of this is that I end up with a csv file of every song in my MobileSheets setup, which allows me to do this:
#!/bin/bash
cd ~/misc/sheet\ music\ csv\ files/sheet\ music/
SEARCHTERM=$(zenity --entry --title="Sheet Music Lookup" --text "Enter Search Term")
grep -R "$SEARCHTERM" . | zenity --text-info --title="Sheet Music Lookup Results" --height=800 --width=1500
That just reads my csv files and lists every song that matches the SEARCHTERM.
Since I keep a backup copy of the MobileSheets database file on my computer, I could just read that directly and get the same information (and I've considered writing a little frontend program to do that), but so far I haven't put in the effort, since the grep command here works fine even though it's rather inflexible compared to a direct lookup.
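Short of reading the database, the grep itself can be loosened a little. Here is a rough sketch, assuming GNU or BSD grep. `search_songs` is a name made up for this example; it ignores case, treats the search term as literal text instead of a regular expression, and only looks at .csv files.

```shell
#!/bin/bash
# Sketch: case-insensitive, literal-text search over a directory of csv files.
# search_songs is a hypothetical helper, not part of the scripts in this post.
search_songs() {
    local term=$1 dir=${2:-.}
    # -R recurse, -i ignore case, -F take the term literally,
    # --include limits the search to csv files, -- protects terms starting with "-"
    grep -RiF --include='*.csv' -- "$term" "$dir" | sort
}
```

The output keeps grep's filename prefix, so each hit still shows which book the song is in, and it can be piped into the same zenity --text-info window as before.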
If you're a zombie and you know it, bite your friend!
We got both kinds of music: Country AND Western