[GRADLE-2181] Copying of files with unicode in filename fails with "java.lang.UnsupportedOperationException: ENOENT" Created: 18/Mar/12 Updated: 18/Jan/13 Resolved: 23/Jul/12 |
|
Status: | Resolved |
Project: | Gradle |
Affects Version/s: | 1.0-milestone-9 |
Fix Version/s: | 1.1-rc-1 |
Type: | Bug | ||
Reporter: | René Gröschke (Inactive) | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 1 |
Attachments: | copy_bug.tar.gz |
Description |
After upgrading from milestone-8 to milestone-9 and running "gradle copyFiles -S" on the example attached to this bug, the copy operation fails with the following output (+ stack trace): FAILURE: Build failed with an exception.
|
Comments |
Comment by Luke Daley [ 12/Apr/12 ] |
The problem arises because we are using JNA. When we cross that boundary, the Java (encoding-agnostic) String is converted to bytes using the platform encoding. http://jna.java.net/javadoc/overview-summary.html#strings We cross the boundary when calling a libc function. It's not clear to me what encoding libc expects, and I guess it may be platform dependent. I can't see a clear way to resolve this issue. One option would be to have Gradle set 'jna.encoding' to UTF8 globally, but this might be too far-reaching. Note that any user can work around this problem by ensuring that the JVM file encoding for the build supports the characters in the filename. |
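To illustrate the conversion described above, here is a minimal sketch in plain Java (no JNA involved) of what happens when a String is turned into bytes with a charset that does or does not support its characters. The class name `FilenameBytes`, its helper methods, and the sample name "über.txt" are made up for illustration; they are not part of Gradle or JNA.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FilenameBytes {
    // Encodes a file name with the given charset, roughly what happens
    // when a String crosses a native boundary using that encoding.
    static byte[] encode(String fileName, Charset charset) {
        return fileName.getBytes(charset);
    }

    // Renders bytes as lowercase hex so the difference is visible.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "\u00fcber.txt"; // hypothetical file name "über.txt"
        // UTF-8 keeps the character as two bytes: c3bc6265722e747874
        System.out.println(hex(encode(name, StandardCharsets.UTF_8)));
        // US-ASCII cannot represent it and substitutes '?': 3f6265722e747874
        System.out.println(hex(encode(name, StandardCharsets.US_ASCII)));
    }
}
```

With the second encoding, the bytes handed to a native function would name a file that does not exist, which is consistent with the ENOENT in the report.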
Comment by Adam Murdoch [ 12/Apr/12 ] |
Generally, I think, libc functions such as stat(), chmod(), etc. don't specify how the file names are encoded - it's the caller's problem to use the same encoding as everyone else that wants to do something with the file name. Often, I think, file names are encoded using wcstombs(). It appears this is what the JVM is using (or OpenJDK 7 at least). This uses the platform encoding, specified by $LANG (or $LC_CTYPE). This isn't the same as the JVM's file.encoding. On the Mac, filenames need to be encoded with UTF-8. The Apple Java 6 has file.encoding set to MacRoman, whereas OpenJDK 7 has file.encoding set to UTF-8. $LANG is set to use UTF-8, so it looks like the Apple JVM is not doing the right thing here. On Linux, I'm not sure what the JVMs are doing with file.encoding. I don't have my Linux machine with me to try it out. I'm hoping that they honour $LANG. If they do, I think we can just special-case the Mac, where we use UTF-8 for filename encodings, and use file.encoding everywhere else. On Windows, we don't need to care. We don't use any of the libc functions, and use the Windows unicode-aware functions instead. |
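The special-casing proposed above could be sketched roughly as follows. This is only an illustration of the idea, not the fix that was actually committed; the class `FilenameEncoding` and the method `forOs` are hypothetical names, and the sketch assumes that an `os.name` value containing "mac" identifies the Mac.

```java
import java.nio.charset.Charset;
import java.util.Locale;

class FilenameEncoding {
    // Hypothetical helper: picks the charset used to encode file names
    // before they are handed to libc functions.
    static Charset forOs(String osName, String fileEncoding) {
        if (osName.toLowerCase(Locale.ROOT).contains("mac")) {
            // On the Mac, file names need to be UTF-8 regardless of file.encoding.
            return Charset.forName("UTF-8");
        }
        // Elsewhere, assume file.encoding follows the platform encoding ($LANG).
        return Charset.forName(fileEncoding);
    }

    public static void main(String[] args) {
        Charset charset = forOs(System.getProperty("os.name"),
                                System.getProperty("file.encoding"));
        System.out.println(charset.name());
    }
}
```

Passing the OS name and encoding as parameters, rather than reading system properties inside the helper, keeps the decision easy to check for each platform.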
Comment by Adam Murdoch [ 12/Apr/12 ] |
Another workaround is to use Java 7. |
Comment by Luke Daley [ 13/Apr/12 ] |
I'm not sure we can use libc and solve this effectively, since there seems to be no way to give the libc functions anything other than a String and let it handle the encoding. Having Gradle globally set jna.encoding seems dangerously invasive to me. I think the best we can do is provide a good error message that suggests people either use Java 7 or set jna.encoding or file.encoding to something that works. Would UTF-8 always work? |
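A check that could back such an error message might look like the sketch below. It only tests whether a charset can represent the characters of a file name; it says nothing about which encoding the file system or libc actually expects. The class `FilenameCheck`, its methods, and the message wording are assumptions for illustration, not Gradle code.

```java
import java.nio.charset.Charset;

class FilenameCheck {
    // True if every character of the file name can be represented in the charset.
    static boolean canEncode(String fileName, Charset charset) {
        return charset.newEncoder().canEncode(fileName);
    }

    // Returns a hint for the user, or null if the name encodes cleanly.
    static String describeProblem(String fileName, Charset charset) {
        if (canEncode(fileName, charset)) {
            return null;
        }
        return "Cannot encode file name '" + fileName + "' using " + charset.name()
                + ". Use Java 7, or set jna.encoding or file.encoding"
                + " to an encoding that supports these characters.";
    }

    public static void main(String[] args) {
        String name = "\u00fcber.txt"; // hypothetical file name "über.txt"
        System.out.println(describeProblem(name, Charset.forName("US-ASCII")));
        System.out.println(describeProblem(name, Charset.forName("UTF-8")));
    }
}
```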
Comment by Luke Daley [ 21/May/12 ] |
Rene, it's likely that this works on Ubuntu because the default platform encoding is a character set that supports the characters. Can you test by running with a value for -Dfile.encoding that does not support these characters, please? |
Comment by René Gröschke (Inactive) [ 22/May/12 ] |
I've tested the behaviour on Ubuntu: 1. gradle copyFiles -> works