MobileSheets Forums

Full Version: Internal: file hash
What algorithm do you use to calculate the hash for each file, as stored in mobilesheets_hashcodes.txt?
It's a murmur3 hash using 0xC58F1A7B as the key.
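For reference, the x86 32-bit variant of MurmurHash3 can be sketched in Python as below. The thread doesn't say whether MobileSheets uses the 32-bit or 128-bit variant, nor exactly which bytes are hashed, so treat this as an illustration of the algorithm with the seed quoted above, not as the app's implementation.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit (sketch, per the public reference algorithm)."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    length = len(data)
    nblocks = length // 4

    # Body: process 4-byte little-endian blocks.
    for i in range(nblocks):
        k = int.from_bytes(data[i * 4:i * 4 + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF   # rotl 15
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xFFFFFFFF   # rotl 13
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF

    # Tail: up to 3 leftover bytes.
    tail = data[nblocks * 4:]
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k

    # Finalization (fmix32).
    h ^= length
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h

# Example with the seed mentioned above (file name is just a placeholder):
file_hash = murmur3_32(b"song.pdf", seed=0xC58F1A7B)
```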
Thanks.
What algorithm do you use to map the folder names to the short forms (eg. 1Nj1_QRPorIPIm...)?
Where are you encountering the short forms of the folder names? I rely on Microsoft's API for file access, so it must be handling all that under the covers. I don't do anything with short form/long form myself.

Mike
Interesting... AFAIK there is no Microsoft API involved when synching from Android to a Google Drive folder.

I see these short names in mobilesheets_hashes.txt and mobilesheets.db (Files table) as they occur in the Drive folder.

[attachment=981]
Oh sorry, I was confused as to what you were talking about. Those are the Google Drive File ids that Google Drive assigns when I upload a file. There is no algorithm to map the folder names to the ID. If I have a path with multiple levels, say root/folder1/folder2/file, then I have to first retrieve the root, then do a query for folder1 by name, then after I get that object with a proper ID, I can query for folder2 by name, then using the ID returned for that I can get the file. Google Drive doesn't support a concept of hierarchical paths like Dropbox (or any normal file system), so there is no way around that. To speed things up, I cache access to folders once I retrieve them.  
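The level-by-level resolution described above can be sketched like this. The `lookup` callable stands in for a Google Drive `files.list` query of the form `'<parent_id>' in parents and name = '<name>'`; the function, its caching choice, and all IDs here are hypothetical illustrations, not MobileSheets code.

```python
def resolve_path(path, lookup, root_id="root", cache=None):
    """Resolve a path like 'folder1/folder2/file' to a Drive file ID.

    Google Drive has no hierarchical path lookup, so each level needs
    its own query: lookup(parent_id, name) -> file_id. Intermediate
    folders are cached so later paths can skip repeated queries
    (mirroring the folder caching described above).
    """
    if cache is None:
        cache = {}
    current = root_id
    parts = path.strip("/").split("/")
    for i, name in enumerate(parts):
        key = (current, name)
        if key in cache:
            current = cache[key]
            continue
        current = lookup(current, name)     # one remote query per level
        if i < len(parts) - 1:              # cache folders, not leaf files
            cache[key] = current
    return current
```

With a shared cache, resolving a second file in the same folder costs one query instead of three.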

In the hash file you are seeing the ID of the folder that owns each file. I just save that path in the database stored on Google Drive so that it simplifies the processing. All files/folders are relative to the storage location (which is set to the selected sync folder containing the database file), so I can easily compare the files/folders on the device against what is on Google Drive so long as the files on the device are also relative to the storage location. When it comes to finding matching songs and files, the logic has gotten a little complex, as I'll compare file names along with song IDs, and if I get multiple matches then that dialog is shown to ask the user to pick the right one.
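The comparison of device files against cloud files, both keyed by path relative to the storage location, could be sketched as a simple three-way diff. This is a minimal illustration only; the actual matching also considers song IDs and file names, as described above.

```python
def diff_sync_state(local, remote):
    """Compare device files against cloud files.

    `local` and `remote` map relative path -> content hash (as in a
    hashcodes file). Returns paths that exist only locally, only
    remotely, or exist on both sides with differing hashes.
    """
    to_upload = sorted(set(local) - set(remote))
    to_download = sorted(set(remote) - set(local))
    changed = sorted(p for p in local.keys() & remote.keys()
                     if local[p] != remote[p])
    return to_upload, to_download, changed
```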

Mike
I see.
Does this imply that when I synch to a Google Drive folder, e.g. 'Current', and later rename this folder to e.g. 'Archive', a synch would no longer work?
I try to understand how the synch (to cloud folder) process works, but I cannot explain why it is so slow.
I do a clean upload to an empty cloud folder. 665 files for 1653 songs. This requires about 30 minutes.
I run the synch again, updating the server, and it takes 10 minutes.
I assume that the database and hashcodes are fetched from the server, and then the local files and song data are compared to the contents of the remote database and hash codes. I would expect this to take a couple of seconds.
There is one piece I didn't mention - in order to support a user copying in a new version of a file, if the last modified timestamp as reported by Google Drive doesn't match the last modified timestamp that I recorded in the hashcodes file, I have to recalculate the hash. If I only compared cached hashcodes, then there would be no way for me to detect when a user updates a file. Due to that fact, I always have to query every file to check its last modified timestamp to see if it's changed. That's probably what is taking 10 minutes. Let me know if you have any thoughts on this. One option would be for me to add a checkbox in the settings to "ignore last modified timestamps" so that I won't check for modified files and will only rely on cached hashcodes. That would speed up the processing significantly, and it would probably only take a couple of seconds like you expect.
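The decision rule described above could be sketched as a small predicate. The parameter names and the setting flag are illustrative stand-ins for the behavior discussed in this thread, not actual MobileSheets code.

```python
def needs_rehash(cached_mtime_ms, remote_mtime_ms, check_for_updates=True):
    """Decide whether a file's hash must be recomputed during sync.

    cached_mtime_ms:  last-modified time recorded in the hashcodes file
                      (milliseconds).
    remote_mtime_ms:  last-modified time reported by the cloud provider
                      (milliseconds).
    check_for_updates: stands in for the proposed setting; when False,
                      cached hashcodes are trusted as-is and no
                      per-file timestamp query is needed.
    """
    if not check_for_updates:
        return False
    return cached_mtime_ms != remote_mtime_ms
```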

Mike

EDIT:
A better label for the setting would probably be "Check for updated files in cloud folders".
Quote: (08-02-2018, 07:00 AM) Zuberman Wrote: in order to support a user copying in a new version of a file,

Great! I didn't dare to hope that you would support this.

Quote:Due to that fact, I always have to query every file to check its last modified timestamp to see if it's changed. That's probably what is taking 10 minutes.

Could you get all the metadata for all files in a folder in a single call and cache it locally?

The timestamp in the hashcodes file is in milliseconds. Does GDrive use milliseconds as well?
Do you issue a separate call to update the timestamp on the drive to the value in the hashcodes file, or vice versa?

Quote:A better label for the setting would probably be "Check for updated files in cloud folders".

Nice!
I can get all of the files under a single folder, but not all files under all folders. Because I separate songs into individual folders by title in order to reduce file name conflicts, there isn't a way for me to grab every file recursively.

Google Drive uses milliseconds for the last modified timestamp. If any timestamps need to be updated, I gather them all up and commit them all to the hashcodes file at the end with a single upload.

I've made the change to add the "Check for updated files in cloud folders" setting, and a second sync completed in 30 seconds with the option disabled. So you may choose to use this if you know you haven't updated any of the files in the cloud.

Mike
While checking the integrity of the cloud folder, I noticed I have a number of song files whose sizes differ between the database and the files on disk.
I assume this is caused by my workflow -- I drop replacement files directly in the MSPro storage, and these are picked up by MSPro. I do not use "swap files" since it seems to work anyway. A bad habit?
As long as you load the song after you replace the file so that MobileSheetsPro detects the change and updates the database, I think it's fine to do that. If you don't load the song though, then the database will be out of sync with the file on disk. If you are loading the song and MobileSheetsPro is not updating the file size in the database, then that's a bug I will need to fix.

Mike

EDIT:
Looking through the code, it does appear that I should be updating the file size in the database. So hopefully you are seeing that happen.
Thanks!
It works for the ChordPro files, but not for the audio files. I fixed these manually.