1. Why have you split up the different parts into different files? To my mind, you'd have a cleaner structure if you had just one or two files, with different sections, rather than distinct files. This would actually be slightly easier to work with programmatically, too, as you'd only have to load a single file and then obtain references to the appropriate XML nodes (which you'd have to do with multiple files anyway).
I don't really agree that fewer files would be easier to handle, especially when I look at the XRNS format: keeping everything in one file is terrible, IMO.
Well, I work a lot with Jetty webapps that load up XML for various things. I find it a lot simpler to load a single XML file in the webapp, then find the correct nodes for each process (or whatever) and use them. When you're programming, you don't want to be dealing with files; you want to be dealing with abstracted XML. Reducing the number of file handles you have to juggle makes your code easier to manage.
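To illustrate the single-file approach described above, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The <module>, <instruments> and <patterns> layout is a hypothetical stand-in, not the actual format under discussion: one parse yields one tree, and each former "file" becomes just a node reference.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-file layout: each former file becomes a section
# under one root element. Element names here are illustrative only.
MODULE_XML = """\
<module>
  <instruments>
    <instrument name="Kick"/>
    <instrument name="Snare"/>
  </instruments>
  <patterns>
    <pattern/>
  </patterns>
</module>
"""

def load_sections(xml_text):
    """Parse one document and return references to its top-level sections."""
    root = ET.fromstring(xml_text)
    # One parse, one tree: each section is a node reference,
    # so no extra files have to be opened.
    return {child.tag: child for child in root}

sections = load_sections(MODULE_XML)
instrument_names = [i.get("name") for i in sections["instruments"].findall("instrument")]
print(instrument_names)  # ['Kick', 'Snare']
```

With multiple files, the same code would need one parse per file before reaching the same node references.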

2. In instruments/instrument/envelopes//envelope, I would suggest putting some sort of ID attribute on the nodes, and then referencing this ID from within loop and sustainloop -- an XML document isn't guaranteed to have a particular order, and you don't really want loop points tied to the tick number.
Good catch, I actually forgot that loop points have to be linked to a node.
But from what I know, the order of elements in an XML document is significant, so I don't see why I would need an ID here.
Have a look here:
http://www.ibm.com/developerworks/xml/library/x-eleord.html
An XML parser is free to return XML nodes in any order it sees fit, and I don't think you can assume that parser P will always return nodes in the same order. Therefore, I think you should put explicit ordering information on the nodes, so that you can programmatically ensure that they are ordered correctly.
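A minimal sketch of the suggestion in (2), assuming hypothetical <node> elements with id, tick and value attributes, and start/end attributes on <loop> and <sustainloop> that name node IDs. Loop points resolve by ID, and the nodes are sorted on an explicit key, so the result does not depend on how the elements happen to be arranged in the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical envelope layout: every <node> carries an "id", and
# <loop>/<sustainloop> point at node ids instead of positions or ticks.
ENVELOPE_XML = """\
<envelope>
  <node id="a" tick="0" value="64"/>
  <node id="c" tick="32" value="0"/>
  <node id="b" tick="8" value="48"/>
  <loop start="b" end="c"/>
  <sustainloop start="a" end="b"/>
</envelope>
"""

def parse_envelope(xml_text):
    root = ET.fromstring(xml_text)
    # Index nodes by id so references resolve regardless of element order.
    nodes = {n.get("id"): (int(n.get("tick")), int(n.get("value")))
             for n in root.findall("node")}
    # Sort explicitly on the tick value rather than relying on document order.
    ordered_ids = sorted(nodes, key=lambda node_id: nodes[node_id][0])

    def resolve(tag):
        # Return (start_node, end_node) for a loop element, or None if absent.
        el = root.find(tag)
        if el is None:
            return None
        return nodes[el.get("start")], nodes[el.get("end")]

    return ordered_ids, resolve("loop"), resolve("sustainloop")

ordered, loop, sustain = parse_envelope(ENVELOPE_XML)
print(ordered)  # ['a', 'b', 'c']
print(loop)     # ((8, 48), (32, 0))
```

Moving a node to a different tick changes nothing in the loop definitions, since they only reference IDs.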
3. Module/Volume -- move the attribute "mixmode" into a subnode.
4. Module/Tempo -- move the attribute "mode" into a subnode.
Why? I consider attributes to be "descriptions", and in this case, "mixmode" and "mode" describe how to interpret the child nodes. This is actually one of the few cases where I'd say attributes make sense.
Fair enough -- I don't think it'll really make much difference either way.

For some reason, I just prefer to have them out of attributes, but that's probably just me.
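For comparison, the two encodings debated in (3) and (4) look like this side by side. This is a hypothetical sketch (the <volume> and <level> names and the "linear" value are made up); it shows that a reader can accept either form with a couple of lines of ElementTree code, which is consistent with the view that the choice makes little practical difference.

```python
import xml.etree.ElementTree as ET

# The two encodings under discussion; element and value names are hypothetical.
AS_ATTRIBUTE = '<volume mixmode="linear"><level>100</level></volume>'
AS_SUBNODE = '<volume><mixmode>linear</mixmode><level>100</level></volume>'

def read_mixmode(xml_text):
    """Return mixmode whether it is stored as an attribute or a child element."""
    volume = ET.fromstring(xml_text)
    if "mixmode" in volume.attrib:
        return volume.get("mixmode")
    child = volume.find("mixmode")
    # None if the mode is missing entirely.
    return child.text if child is not None else None

print(read_mixmode(AS_ATTRIBUTE), read_mixmode(AS_SUBNODE))  # linear linear
```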

6. Patterns//Pattern -- you need some sort of unique identifier for each distinct pattern, too.
Why's that? Storing empty patterns (which are rare) as <Pattern /> would take less space than using IDs. 
See my answer to (2), above. And I think that the file will be able to handle an extra, what, 19-21 bytes of data.
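A minimal sketch of how per-pattern IDs could be used, assuming each <Pattern> gets a unique id attribute. The <Order>/<Entry> play list is invented purely for this illustration. An empty pattern still costs only a few bytes as <Pattern id="1"/>, and any reference to a pattern goes through its ID rather than its position.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: each <Pattern> has a unique "id", and a separate
# <Order> list (invented for this example) refers to patterns by id.
SONG_XML = """\
<Module>
  <Patterns>
    <Pattern id="0"><Row/></Pattern>
    <Pattern id="1"/>
    <Pattern id="2"><Row/><Row/></Pattern>
  </Patterns>
  <Order>
    <Entry pattern="2"/>
    <Entry pattern="0"/>
    <Entry pattern="2"/>
  </Order>
</Module>
"""

def play_order(xml_text):
    """Return (pattern id, row count) for each entry in the play order."""
    root = ET.fromstring(xml_text)
    by_id = {p.get("id"): p for p in root.iter("Pattern")}
    # Each entry resolves through the id, so a pattern can be reused
    # without being duplicated in the file.
    return [(e.get("pattern"), len(by_id[e.get("pattern")]))
            for e in root.find("Order")]

print(play_order(SONG_XML))  # [('2', 2), ('0', 1), ('2', 2)]
```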

7. How are you referencing the actual sample files? I can't see it...
Not at all at the moment, because I still don't know whether I should just use the sample number as the filename or something like "1 - Sample's filename.wav" (where the sample's filename would come from the <filename> tag).
I think just using a simple number is a good idea -- this way, you're guaranteed not to have filename conflicts, which are bad.
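A sketch of the number-only naming scheme, assuming samples are numbered from 1 in document order and that the human-readable name stays in the <filename> tag inside the XML. The <samples>/<sample> element names are hypothetical. Because the on-disk name is derived only from the number, two samples that share a <filename> can never collide.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample entries; <filename> keeps the original, human-readable
# name inside the XML, while the file on disk is named by number only.
SAMPLES_XML = """\
<samples>
  <sample><filename>kick.wav</filename></sample>
  <sample><filename>kick.wav</filename></sample>
  <sample><filename>snare.wav</filename></sample>
</samples>
"""

def archive_names(xml_text):
    """Map on-disk names ("1.wav", "2.wav", ...) to the stored <filename>."""
    root = ET.fromstring(xml_text)
    names = {}
    for number, sample in enumerate(root.findall("sample"), start=1):
        # The number alone is unique, so two samples that were both
        # called "kick.wav" still get distinct files.
        names[f"{number}.wav"] = sample.findtext("filename")
    return names

print(archive_names(SAMPLES_XML))
# {'1.wav': 'kick.wav', '2.wav': 'kick.wav', '3.wav': 'snare.wav'}
```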
9. Patterns need some method of ordering them -- a "seq" attribute, perhaps, going from 0 to n.
I can't really see what this would be useful for; care to explain?
See number (2), above.
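A minimal sketch of the proposed "seq" attribute in use. The id values are hypothetical, and the check that seq runs without gaps or duplicates is one possible validation rule, not part of the proposal. Patterns are put in order by their explicit seq value rather than by their position among sibling elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical pattern list carrying the proposed "seq" attribute.
PATTERNS_XML = """\
<Patterns>
  <Pattern seq="2" id="c"/>
  <Pattern seq="0" id="a"/>
  <Pattern seq="1" id="b"/>
</Patterns>
"""

def patterns_in_sequence(xml_text):
    """Return pattern ids ordered by their seq attribute."""
    root = ET.fromstring(xml_text)
    patterns = root.findall("Pattern")
    # Order comes from the explicit seq value, not from element position.
    patterns.sort(key=lambda p: int(p.get("seq")))
    seqs = [int(p.get("seq")) for p in patterns]
    # Sanity check: seq should count up from 0 with no gaps or duplicates.
    if seqs != list(range(len(seqs))):
        raise ValueError(f"seq values are not contiguous from 0: {seqs}")
    return [p.get("id") for p in patterns]

print(patterns_in_sequence(PATTERNS_XML))  # ['a', 'b', 'c']
```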