MobileSheets Forums

Full Version: BIG MESS: Almost all songs seemed to be duplicated by Batch Import
Pages: 1 2
Big mess, for a while, a scary ghost experience. After a massive Batch Import (new fakebook), MSPC showed almost double the number of songs in the library, 8969 Songs. Yesterday it was less than half this. It not only added new songs I was importing, it re-added all my existing songs. Except, it was a ghost...

As I've done before, I selected the one folder on my PC where I store all songs I want on my tablet, then I had MSPC do a Batch Import. I always set it to Avoid Duplicate Songs, but to NOT Update songs if matching, not Automatically crop, etc. I figured this was a simple way to get all the new songs into MSP, rather than manually select each of them. This has worked before, no problems.

MSPC ran overnight, showed the normal songs counting along, and then finished. Looking in MSPC, Batch Import added all the new songs, just once. But any song that already was in the library got added AGAIN, creating duplicate records for more than 4 thousand songs. Note that Avoid Duplicate Songs is and was always selected.

Since the duplicates in MSPC show identical path\file name, they can't be duplicate files, they must be massive duplicate database records. Could this happen if Avoid Duplicate Songs failed to function? Or something else, apparently...

Looking for other oddities, I tried to do a single song import into MSPC. It would let me select the song PDF, but then nothing happened. No error, but the file did not get added to MSPC. 

The only change in my system was MSPC auto-updating itself, which I believe happened just before the latest incident. 

More odd: Disconnecting tablet from PC, checking tablet, NONE of the just-imported files are there. Note that the import process ran for maybe 12 hours, then "completed". But what was it doing? When I terminated the connection on the tablet it popped up a java error. 

So, it seems the PC-tablet connection failed. But MSPC did not detect this, and the tablet didn't show a connection failure either. MSPC spent many many hours importing files and presumably updating the tablet, then seemed to complete. My only clue it didn't really happen was seeing all the dupes in the MSPC songs list. 

Upon reconnection, I created a current backup on tablet. Then I reconnected tablet and PC. This reloaded the MSPC database from the PC, showing that there are no duplicates, and none of the new songs either. It's like it didn't really happen, though MSPC kept itself very busy for 12 hours or so, counting through the songs. Why did it not detect that something had failed? 

Now reconnected with a seemingly normal database, I created a fresh local backup in MSPC. This seemed to work, except it perhaps got hung. It shows this:
"Transferring data from tablet..." 
Progress shows 100% (all green)
Current Song shows a name. 
Current File is blank (how can there be a song name but no file? I know this song file, it's a 2-page PDF).
The only button option is Cancel. 
I don't want to cancel, I want it to finish. Been this way for almost 15 minutes, seems frozen. 
Windows Task Manager shows no MSPC activity. 
Tablet shows no activity since I made a local backup 15 minutes ago. 

I can Cancel and try to make a backup again. 

Or I can Cancel MSPC backup for now, and try Batch Import again. MSPC backup is not crucial since I have all the PDFs separately stored, the main benefit is if/when the tablet version of the library fails (as it did recently when the storage card went bad).

Or what? Am I overlooking something? Or was this just a random connection failure that wasn't detected, so it kept the Import chugging along? If so, why has MSPC backup apparently stopped before completing? What should I try next?

PS: I really wish MSP / MSPC had an auto-backup option. Perhaps on a schedule, or simply upon closing ask if a backup is desired at that point. Could save a lot of grief.
Update: MSPC never finished backup, though tablet message implied that it did. Restarted everything. Tried again, MSPC backup completed. 

Now, armed with tablet and PC library backups, trying Batch Import again, again being sure Avoid Duplicate Songs is selected. Seems to be counting through songs more quickly. 

But unexpected: When Batch Import encounters a file of the same name, it stops and asks me to select an Action (Use Original File, Rename, Skip, etc). Since I told it to Avoid Duplicate Songs, why does it not just skip them? What does "Avoid" mean?

So, I select Skip File and check Apply selection to all conflicts. But it is vague, I can only hope that "conflicts" means the current conflict, same File name. (Not sure what other conflicts would be possible...)

More oddly, it stopped on just one file about 3,000 files into the process. But many more duplicate files exist, a few thousand since I'm doing Batch Import on my PC's entire Tablet folder, hoping Avoid Duplicate Songs literally does that, so only truly new songs get imported. The first 3000 had many duplicates, probably 80% of them, so why did I not get a same file name action request many times?
More update and PROBLEM: MSPC declares Batch import successful, 684 songs were added. Scanning the list, it shows the new songs in the database. There are no duplicates (yay).

BUT tablet shows several songs added, then it stopped, saying file is locked. There is no difference between that file and any others. The file is not open in Windows, nothing special about it.

Further, tablet says "Skipping to next fi" (literally, "file" is cut off), but nothing is happening. The process is hung. MSPC doesn't know this. Tablet is simply stuck.

This is different from yesterday's behavior, but likely the same root problem. But what is the problem? And solution?

After many minutes of tablet hung, no choice other than Disconnect (on tablet). Of course, this likely means that the tablet database will not match the actual few files that got sent from PC to tablet. Or maybe MSPC database won't match. Or what?

Then reconnect. But now, the Import is lost, the database on MSPC re-pulled from tablet, has none of the new songs. Seems pointless to try it all again, though I will.

What can I do to get out of Groundhog Day?
Seems to have succeeded, but not 100%. Many of the new songs were Batch Imported and are now in the database and on the tablet.

BUT, some songs were mysteriously skipped, including songs that used to be on the tablet and now are gone, plus new songs that should have been imported.

Keep in mind, I'm selecting an entire folder, so all the files should be processed, and those that are not on the tablet should be imported. Not what happened.

And, yet again, the import process stops on one particular file -- stops EVERY TIME on SAME FILE -- asking to choose an Action. My Import always specifies that the Title is the File name, so how could this file be deemed different? I always say to apply the action (Skip) to all other conflicts, otherwise I might be here all day (well, that's already happened), so I don't know how many more might trigger Action. But again, it's around file 3000, many more that are just as much same file name as this one. Pretty weird, because I find nothing unusual about the PDF. It's just mid-sized 80K and displays fine. AND, Import is supposed to Avoid files of same name, so why does it keep stopping asking for Action on just this one file? I'm confused.

Re-running Batch Import, it still skips the same bunch of PDFs in the folder, they are not in the database but not importing either.

I'm restoring database on tablet to try again.
Well, MSPC fails to Batch Import what it should, based on everything I see and have tried.

I restored a tablet backup from last January, 4211 songs. I ran Batch Import using my TAB folder as source. It has 4736 song PDFs. MSPC added 648, resulting in 4854. Hmmm... more than the source folder. Well, that could be explained as songs already on the tablet that are not in my PC source folder, since MSPC does not truly synchronize (two-way). But even after it imports 648 files, a bunch of TAB folder files are not getting imported. Again and again, same omissions. These are NOT duplicates of what is on the tablet -- different file names, different song titles (only somewhat different in some cases, but definitely different).

Also, Import always stops on the same one PDF. Even if I say Overwrite, thinking I'll make the files identical, next time it stops on that file again.

On PC I can drag and drop missing files and they end up in MSP. So the files are not the problem. Batch Import is what is questionable. (Adding all the missing files manually is not practical, I'd have to work the entire library in MSPC and PC figuring out what is where.)

HOW can I get MSPC to Import ALL the files in my PC source folder???

Understand, since I have all my files on my PC, I could delete the tablet library and start over, but I would lose annotations and setlists and collections etc. Not the best outcome.
I'll try my best to respond to everything you've posted while keeping it cohesive:

First, I'll start by saying I don't typically try batch importing a folder of 3000 files very often using the companion application, so I'll have to run through some more tests with this. In theory, it shouldn't be a problem. 12 hours does seem like an extremely long time to run through 3000 files though, so something may be off there. It may have something to do with your network setup, I'm not sure, but I'll see how long it takes me to perform a similar import. As for the backup that hung, I don't see that problem with my network setup, but it usually means the companion app was waiting for some final data from the tablet, but it never got it. That could indicate data loss on the network, but I'm not sure. It's TCP, so in theory the data should be reliable and should be resent if needed. If you are seeing this problem often, perhaps I need to run repeated tests until I can reproduce the problem (if possible). How large is your backup file? I'm still not a fan of automatically backing anything up, but if you want me to create an option to prompt the user to create a backup every so often (something like, "It has been a week since your last backup, would you like to create one now?"), I can do that. Backups take a very long time to complete, so it's not something I want to be kicked off automatically without the user initiating it.

When it comes to determining whether a duplicate file is being imported (so that the "Avoid duplicate songs" setting takes effect), there are a number of things that have to be checked:

1) First, the output path for the file is constructed using the storage settings. This takes into account things like "Create Subdirectory per Song" and "Add Unique Id to Filenames". If you change these settings, you are going to change the output paths for the files, so no duplicate files will be detected and all files will be imported. There are also probably scenarios that could cause issues like creating a copy of a song using the same file as the original with a unique ID at the end of the filename. When MS Pro goes to check for a duplicate file, it will use the database id of the copy, not the original, so no duplicate file will be detected.
2) If a file is found at the constructed output path, a CRC check is done to see if the file matches exactly what is on the tablet. If it does not match what is on the tablet, then it's obviously not a duplicate file as something in the file has changed.
3) If the CRC doesn't match, you will be prompted to choose how to handle the conflict (which is apparently happening every time for you with that one file).
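
For readers following along, the three checks above can be sketched roughly as follows in Python. This is only an illustration of the described logic, not the actual MobileSheets code: the function names, the `subdir_per_song` and `unique_id` parameters, and the way the output path is built are all assumptions made for the example.

```python
import zlib
from pathlib import Path
from typing import Optional


def crc32_of(path: Path) -> int:
    """Compute the CRC-32 of a file, reading it in chunks."""
    crc = 0
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def build_output_path(storage_dir: Path, source: Path,
                      subdir_per_song: bool = False,
                      unique_id: Optional[int] = None) -> Path:
    """Step 1 (hypothetical layout): where the file would land on the tablet."""
    name = source.name
    if unique_id is not None:
        name = f"{source.stem}_{unique_id}{source.suffix}"
    if subdir_per_song:
        return storage_dir / source.stem / name
    return storage_dir / name


def classify(source: Path, storage_dir: Path, **settings) -> str:
    """Return 'import', 'skip' (exact duplicate) or 'conflict' (same path, different content)."""
    target = build_output_path(storage_dir, source, **settings)
    if not target.exists():
        return "import"    # nothing at the output path, so not a duplicate
    if crc32_of(source) == crc32_of(target):
        return "skip"      # step 2: same path and same CRC
    return "conflict"      # step 3: same path, CRC differs, ask the user
```

Note how changing a storage setting changes the constructed path, so a file that is already on the tablet under the old layout would come back as "import" rather than "skip".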

You said at one point that the tablet hung and/or files were locked. I can't explain that. I've never seen that happen after a library backup or a batch import. If you are seeing that on the Android side, I'll have to do more testing to see if I can reproduce it.

I also don't know why any files would be skipped unless either the companion app can't process them due to them being locked or something, or if they have been determined to already reside on the tablet. Please remember that if a file exists on the tablet in the output location (REGARDLESS of whether a song is tied to that file), then the file will be skipped. This means that if a batch import completed, copied files over, but for some reason the database failed to be updated, there may be files remaining on the tablet that are going to cause issues if you perform the same import again. The only way for me to avoid this scenario would be if I change the duplicate file detection so that an existing song has to be present in the library using the file before I will consider it something to be ignored. Otherwise the conflict dialog would need to be shown. Would you prefer this behavior?

One thing you may want to consider doing is clearing the application data for MS Pro (and probably verify that the storage location is truly empty), and then perform a library restore, and finally the new batch import. This would probably eliminate some of the problems you are seeing.

I've decided that checking to make sure a file is actually being used in the library before declaring it a duplicate is probably the right behavior. So I'm modifying the code at the moment to perform that check.
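
A rough sketch of what that revised rule might look like, again in Python and purely as an illustration (the function name and the two sets are assumptions, not the real implementation): a file is only ignored when it exists on the tablet, matches, and is actually used by a song in the library. A file left behind by a failed import falls through to the conflict dialog instead of being silently skipped.

```python
from typing import Set


def classify_file(target: str,
                  files_on_tablet: Set[str],
                  files_used_by_songs: Set[str],
                  crc_matches: bool) -> str:
    """Decide what to do with one file during a batch import (hypothetical rule)."""
    if target not in files_on_tablet:
        return "import"      # no file at the output path
    if target in files_used_by_songs and crc_matches:
        return "skip"        # a true duplicate: same file, and a song uses it
    return "conflict"        # leftover file or changed content: ask the user
```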
Mike, thanks for the reply and explanations. I don't have other indications of network problem, but I moved the tablet to within a few feet of my router just in case.

What is really helpful is knowing how a duplicate is detected, and whether comparison happens via database or file. You may recall some time ago I suggested a database rebuild option or tool, for the (rare) times when they get out of sync. Re-read the files and adjust the database accordingly (for all records that should have attached file). I'm thinking that actual files are the baseline, because a database record that must point to a file (vs. a placeholder, etc) is worthless without a file. A PDF (or image) file without a database record is also worthless. Scanning either way would work, to be sure there are no orphans (even better, re-connect when possible, then list whatever is still un-matched).
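
To make the idea concrete, here is a minimal Python sketch of such a two-way check. It assumes the list of files in the storage folder and the song-to-file mapping from the database are already available as plain Python values; both inputs and the function name are hypothetical, since I don't know how MSP stores this internally.

```python
from typing import Dict, Iterable, List, Tuple


def reconcile(files_on_disk: Iterable[str],
              song_files: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (orphan files with no song record, songs whose file is missing)."""
    disk = set(files_on_disk)
    referenced = set(song_files.values())
    orphan_files = sorted(disk - referenced)
    songs_missing_file = sorted(
        title for title, path in song_files.items() if path not in disk
    )
    return orphan_files, songs_missing_file
```

The first list is what a rebuild tool could offer to re-attach or re-create records for; the second is what remains un-matched from the database side.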

But here's an example that's hard to explain.

First: I use one flat folder for ALL my MSP files. No subdirectories. All my files are uniquely named by me, so no use of or need for Unique ID. Unless I slip up and do a typo, a file I add now is named using the same precise convention I've been using for years. Example: A Whiter Shade Of Pale [C] UFB.pdf. Format is Title [KEY] ID.pdf (ID is my source, such as initials of fakebook name, etc)

To experiment I chose two files that are not showing in MSP, but every time got skipped by Batch Import. I manually added two files from PC to tablet via MSPC. One of these files was new, the other had been in the PC library for a long time but not being imported. Both files manually added to the library with no problem, no indication that MSP already had the songs or files. Looking at the tablet, both files have their original correct name. Manual adding worked, yet MSPC Batch Import skipped both of them again and again. This is the mystery.

I wondered about Batch Import with the size of my library, approaching 5000 songs and continuing to grow. If perhaps the size is breaking the import process, I could do subsets (A-D, E-K, etc) if you think that would help, and if there's a straightforward way. The challenge is how to select hundreds of files as a batch rather than thousands.

What else can I explain or try to help you?
That's what the "Find Missing Files" feature is for under Settings->Other Settings. It can't find orphan files though without any song attached to them. I could certainly add something to check for that.

Try placing those two problematic files in a separate folder and batch import just that folder. Are they identified as duplicates? If so and you go on the tablet and search for those files in the storage location, can you really not find them?

I don't know if batch importing that many files is a problem or not, but it's not something I test with often. You certainly could try breaking up your files into smaller sections, but I don't think you should have to. I'll just have to run more tests with a similar setup.

If you add it manually (meaning through a song editor window), it will just create the song and copy the file over, overwriting the original. That's why that works. If the song was not a duplicate, then you would have been presented with a file conflict dialog in the song editor window (I just verified it all works this way). If you drag & drop the file into the companion app, it should detect the duplicate file for you though. If that didn't work, then that's bizarre.

I'm running tests with the batch import right now. If I batch import the same directory twice, it correctly ignores the files first imported. I'm now going to run through tests with ~3k files.

One thing I just ran into during tests was the fact that Windows is not case sensitive when it comes to file names, but Android is. So this created some scenarios where the logic failed due to Android having stored the file with lower case but Windows asked it to find the file using upper case. It looks like my logic needs to be updated to properly handle this.

This is going to get messy. Different parts of the Android file system can be case sensitive or case insensitive. I won't really have a way of knowing beforehand. This creates some complexity in my logic when it comes to determining if any songs use a given file path.
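
One possible way to cope with the case issue, sketched in Python as an illustration only (this is not what the app does, and `match_names` is a made-up name): prefer an exact match, and fall back to a case-insensitive comparison against the names actually stored. On a case-sensitive file system the fallback can return more than one candidate, which is part of what makes this messy.

```python
from typing import Iterable, List


def match_names(existing: Iterable[str], wanted: str) -> List[str]:
    """Find stored file names matching `wanted`, exact match first, then ignoring case."""
    names = list(existing)
    if wanted in names:
        return [wanted]
    key = wanted.casefold()
    # May return several names if the file system allows names differing only by case.
    return [n for n in names if n.casefold() == key]
```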
Drag-and-drop to Add a file works, and detects if the file already exists. But before it existed on the tablet, the very same file when included in Batch Import was being skipped.

Case sensitive doesn't seem to be a factor in my specific files, unless somehow the case got changed during import which then conflicted with a later attempt. All my files are proper cased since they are the actual song titles I want in the database. All naming and manipulation happens in Windows.

Should I try drag-and-drop of my entire set of files (close to 5000), maybe tried in batches? I don't know which files are missing, that alone would be a huge job, though I'd start with the "A..." songs since I know some of them are missing. I'll end up with many many messages that files already exist, but doing this might eventually get all my currently-lost files added. Of course that wouldn't seem to "fix" the issue with Batch Import that I'll need for the next fakebook, etc.

New clue, maybe:

I'm definitely missing songs in the top of the alphabet, down to "An..." But another "An..." imported, so likely the exact letters aren't the problem. Looking further, in Explorer that's my first 252 or so files. However, MSPC shows them in different order, dropping leading articles and certain punctuation, so I don't really know whether the skipped files are a significant number that would be a strong clue. But maybe Batch Import is skipping approximately the first 250 files, could be an additional 50 or more that have leading articles/punctuation.

Or is it by total list size? The entire folder I'm trying to Batch Import (skipping dupes) is 4658 + 78 in my one subfolder (XMAS songs), total 4736. Maybe the Import only handles 4500 files (not counting subfolder)? Or could it be affected by file/title lengths? (I could try Batch Import omitting the subfolder if you wish).

I see a file counter zip by when Batch Importing, could that have an effect or provide clues?

Also, can't prove this but I'm getting the sense that certain songs have been missing from MSP even before this. I keep everything in my Windows TAB folder, and I'm spotting random songs that are there but not on my tablet. Didn't notice so I don't know when or why or how it was overlooked. Could be my error, or could have been skipped by prior Batch Import.

By the way, the reason I'm doing what I'm doing is my system for managing sheet music PDFs. I have many thousands, 5 times more than I'm pulling into MSP. Each scanned fakebook/music book is in its own folder, named with the ID of the source (Ultimate FakeBook is UFB, etc). I find the best version(s) of each song that I care to play and copy it into my TAB (for Tablet) folder. Since all my files have the source ID at the end of the filename, I can manage this easily. When I find a better version, I put it in TAB and remove the old version.

With folder TAB on my PC as my master, I simply need to copy (Add or Import) the entire folder into MSPC and thereby into MSP on my tablet. Then periodically update, I thought via Batch Import with Avoid Duplicates, when I add more new songs to TAB, or update by adding/deleting. I really just need to have MSP match TAB. (This is a common professional method, nothing I invented, with bands/orchestras/shows etc where they maintain a master library.)

In my situation (and others who do the same method), I would love a way to batch import/refresh/sync the entire MSP with the Windows folder. Some control over this would be needed.

Option: If a PC file does not exist on tablet, import it absolutely (what I'm trying to do). Bypass any checking other than the OS limitation of non-duplicate path/file name. Something like "Copy file to tablet even if the song already exists with same or different title". Right now, only manually Adding seems to do this with the Batch-skipped songs. Respect that the file name can't already exist, but make that the only constraint. (And/or, give an option to auto-rename the old or new file, appending a counter for instance.)

But also, help deal with files on the tablet that are no longer on the PC (in the specified folder) because right now that's a manual process. (Currently, when I find a song on tablet that I don't want, I put the song in a special collection then later review that collection vs. what is on the PC's TAB folder.) I don't know that MSPC should delete files from the PC, but it would be really helpful to flag them, or present an Only-On-Tablet list. Maybe stick a value in a standard or Custom field so all such files can be found.
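
Outside of MSPC, this kind of list can be produced with a small script. The Python sketch below assumes the tablet's storage folder has been copied to (or is reachable from) the PC as an ordinary folder; the folder paths and function names are placeholders, and it compares by file name only, including files in subfolders.

```python
from pathlib import Path
from typing import List, Set, Tuple


def pdf_names(folder: Path) -> Set[str]:
    """Collect the names of all PDF files under a folder, including subfolders."""
    return {
        p.name for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    }


def compare_folders(pc_folder: Path, tablet_copy: Path) -> Tuple[List[str], List[str]]:
    """Return (names only on the PC, names only on the tablet), each sorted."""
    pc = pdf_names(pc_folder)
    tablet = pdf_names(tablet_copy)
    return sorted(pc - tablet), sorted(tablet - pc)
```

Calling `compare_folders(Path("TAB"), Path("tablet_copy"))` with the real folder locations would give the Only-On-PC and Only-On-Tablet lists, though it says nothing about whether each tablet file has a database record.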

PS: A handy MSPC feature would be a column that shows the full path of each file, to debug goofy and confusing situations.

Sorry to be dumping so much detail, just hoping something gives you an "a-ha".
There should be no limit to the number of files you can batch import. I've been working on fixes for this throughout the day, so I'll want you to try the batch import again after I release the next couple updates and we can see if it all works better for you. We can think about future changes after the current features work reliably for you.

Please let me know when there's something to test.
UPDATE: After applying new MSP and MSPC updates to tablet and Windows, I tried the Import again. Starting library had 4211 songs. Originally 648 would be imported, should have been more. Restoring tablet backup, then trying again just now, MSPC imported 764. When the process said a file already existed on tablet, I told it to Retain Original, and checked the box to always do this.

764 was good, but not great. Several songs still did not Import. Looking directly at tablet storage, I find at least some of the files already exist, they just don't have database records. How this happened is a mystery. Whether this explains all the failures is a mystery.

Given that import recognizes that a file already exists, could it also recognize that there's no database record for it? Then either (re)create the record, or at least identify the file name so I could further deal with it. Because it would be a huge job to determine which files are orphans.

Find Missing Files didn't help, of course. Though it found some MP3 files, don't know what it thinks is missing, but I deleted them.

Running Fix File Paths found nothing, not surprising because I have always stored all sheet music PDFs in a single folder.

What else can I do? What I'm trying right now is another Batch Import. I have to check Avoid Duplicate Songs or thousands would be re-imported, I presume. This time, when it found the first duplicate file and asked me what to do, instead of choosing Use Original File I chose Overwrite, hoping this would recreate the missing db record. (Of course, the action dialog shows the name of the file, but it is not one that seems to be missing, but perhaps other such messages would be...??)

Import just finished, and this time says it imported 766, for a total of 4213. (Prior attempt resulted in 4213). I don't know which 2 files it just added, but the missing records I'm monitoring are still unchanged, files are on tablet but records not in database.

Seems like there need to be more brute-force repair options, some way to assure that everything exists everywhere:
-- Add Every File on PC to Tablet, skip only if already exists (perhaps this action already happens) AND make sure a database record exists for every file being added.
And/or this repair action:
-- Check Every File for a Database Record and Create Any Missing Records. Likely slow but could be a powerful cure for a messed-up database.

Would an attempt at this be to run Batch Import with Avoid Duplicates not checked, will likely take hours, but would that be worthwhile? My big hope is that I don't wipe out my annotations, collections and setlists. Any re-loaded files would be identical in filename if that helps. What does the Import process do to annotations, lists, etc when the underlying file is replaced with same filename?