MobileSheets Forums

Full Version: CSV file format
Now that CSV imports are becoming popular, it is important to know what the exact format requirements are.

CSV is a rather informal file format that has evolved over a long time. There are a lot of conventions that seem to work.

In 2005 an attempt was made to formally define the format in RFC 4180.

What are the requirements for MSPro? The current documentation is usable but lacks information on:
  • File encoding. Given that MSPro is an Android/Java application I assume the default is UTF-8. Does MSPro take a BOM for UTF-16 and UTF-32 encodings?
  • Separator character. According to the docs MSPro detects the separator from the first line. Anything goes?
  • Field quoting. Does MSPro accept double quotes (&quot; in HTML) to surround fields? A doubled double-quote to embed a single double-quote in a quoted field?
  • What if a multi-valued field contains a '|'?
  • Maybe even full RFC 4180 compliance?

Just a couple of details that may be interesting to add to the documentation.
MobileSheetsPro will handle UTF-8, UTF-16BE and UTF-16LE. It strips the BOM at the beginning. I don't support UTF-32: I've never used a file with that encoding, I'm not sure the open source libraries I'm using properly support it, and the tool I use to edit files (Notepad++) doesn't have a way to convert files to that encoding.
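
For anyone preparing files outside the app, the BOM handling described above might look roughly like the sketch below. It is not MobileSheetsPro's code; the helper name decode_csv_bytes is made up for illustration, and the fallback to UTF-8 when no BOM is present is an assumption.

```python
import codecs

# BOMs for the three encodings listed above (UTF-32 is deliberately left out).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def decode_csv_bytes(data: bytes) -> str:
    """Hypothetical helper: decode raw CSV bytes and strip a leading BOM."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # Drop the BOM bytes, then decode the rest with the matching codec.
            return data[len(bom):].decode(encoding)
    # Assumption: a file without a BOM is treated as UTF-8.
    return data.decode("utf-8")
```

A UTF-16LE file saved with a BOM and a plain UTF-8 file without one would both come back as the same text.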

MobileSheetsPro will accept any character after "title" as the delimiter. Anything goes.
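
As an illustration of that rule, here is a minimal sketch that takes whatever character follows "title" in the header line as the delimiter. The function name detect_delimiter is hypothetical, and matching "title" case-insensitively is an assumption, not something confirmed for the app.

```python
def detect_delimiter(header_line: str) -> str:
    """Hypothetical helper: return the character right after 'title' in the header."""
    prefix = "title"
    # Assumption: the header starts with 'title' (compared case-insensitively here).
    if not header_line.lower().startswith(prefix):
        raise ValueError("header line must start with 'title'")
    if len(header_line) <= len(prefix):
        raise ValueError("no character follows 'title', so no delimiter can be detected")
    return header_line[len(prefix)]
```

With a header such as title;artist;genre this picks ';', and with a tab after title it picks the tab.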

MobileSheetsPro does accept double quotes to surround fields. I do not support a doubled double-quote though. That is something I would need to add support for.
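
For reference, this is what the doubled double-quote rule from RFC 4180 produces. The sketch uses Python's standard csv module purely to show the expected result; it says nothing about how MobileSheetsPro parses files, and the helper name parse_line and the sample song data are made up.

```python
import csv
import io


def parse_line(line: str, delimiter: str = ";") -> list[str]:
    """Hypothetical helper: parse one CSV record with RFC 4180 style quoting."""
    # csv.reader treats "" inside a quoted field as one literal " by default.
    reader = csv.reader(io.StringIO(line, newline=""), delimiter=delimiter)
    return next(reader)


# The second field is quoted and contains two doubled double-quotes.
row = parse_line('Song;"The ""Best"" Of";Jazz')
# row[1] is: The "Best" Of
```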

I split any field using '|', so if a multi-valued field contains a '|', whatever is listed is going to be split into multiple parts. That is how you can specify multiple genres, artists, albums, etc.
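
A rough sketch of that splitting, to show why a literal '|' cannot survive inside a value. The name split_multi_value is hypothetical, and trimming each part and dropping empty parts are assumptions.

```python
def split_multi_value(field: str) -> list[str]:
    """Hypothetical helper: split a multi-valued field on '|'."""
    # Assumption: each part is trimmed and empty parts are dropped.
    return [part.strip() for part in field.split("|") if part.strip()]
```

So "Jazz|Blues" gives two genres, and an artist written as "AC|DC" would come out as two separate artists, "AC" and "DC".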

I'm pretty close to being fully compliant with RFC 4180. I don't preserve white space at the start or end of values though (I trim them as I didn't see the value in leaving white space there). Perhaps this is wrong and I should always honor whatever white space is specified. 

Thanks,
Mike
Sounds great.

Quote: I do not support a doubled double-quote though.

I don't think this is urgent.

Quote: I don't preserve white space at the start or end of values though (I trim them as I didn't see the value in leaving white space there).

Personally I would say that it is important to retain spaces in quoted strings. So (using underscores to represent spaces) ;_foo_; may yield "foo" or "_foo_", but ;"_foo_"; should yield "_foo_". But also not very urgent.
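
To make the proposal concrete, here is a small sketch of the behaviour I mean: unquoted fields are trimmed, quoted fields keep their spaces. It is only an illustration, not a suggested implementation; split_fields is a made-up name, and it deliberately ignores doubled double-quotes and line breaks inside fields.

```python
def split_fields(line: str, delimiter: str = ";") -> list[str]:
    """Hypothetical helper: trim unquoted fields, keep quoted fields verbatim."""
    fields = []
    i, n = 0, len(line)
    while True:
        # Look past leading spaces to see whether the field opens with a quote.
        j = i
        while j < n and line[j] == " ":
            j += 1
        if j < n and line[j] == '"':
            end = line.find('"', j + 1)
            if end == -1:
                raise ValueError("unterminated quoted field")
            # Quoted: keep everything between the quotes, spaces included.
            fields.append(line[j + 1:end])
            next_delim = line.find(delimiter, end + 1)
        else:
            next_delim = line.find(delimiter, i)
            raw = line[i:] if next_delim == -1 else line[i:next_delim]
            # Unquoted: surrounding white space is trimmed.
            fields.append(raw.strip())
        if next_delim == -1:
            return fields
        i = next_delim + 1
```

With the line a; foo ;" foo ";b the second field comes out as "foo" and the third keeps its spaces.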

Just wait until users are filing issues ;)