[GRADLE-3374] Jar manifests content should be encoded using UTF-8 Created: 29/Dec/15  Updated: 23/May/16  Resolved: 18/May/16

Status: Resolved
Project: Gradle
Affects Version/s: None
Fix Version/s: 2.14-rc-2

Type: Bug
Reporter: Mark Vieira (Inactive) Assignee: Paul Merlin
Resolution: Fixed Votes: 1

Attachments: File mf-test.7z    


Currently, Jar manifests are written using the platform default encoding. Perhaps UTF-8 should be the default instead, or there should be an option to set the file encoding explicitly.


Comment by guai [ 31/Dec/15 ]

NetBeans developers also consider UTF-8 mandatory in manifests. here

Comment by guai [ 22/Apr/16 ]

There is more to this.
A manifest line cannot be longer than 72 bytes (bytes, not characters). Sometimes the 72-byte boundary falls exactly in the middle of a multibyte character.
Lines like that cannot be loaded back.
Such lines should be split between characters, without exceeding 72 bytes.
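
To illustrate the suggestion above, here is a minimal Java sketch of splitting a value at character boundaries while staying under a byte limit. The class and method names (ManifestLineSplitter, splitByUtf8Bytes) are hypothetical and this is not Gradle's or Ant's code. It ignores the header name and the leading space of continuation lines, so the caller has to pick a limit that leaves room for them (71 is used here as an assumption for continuation lines).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Gradle, Ant, or the JDK.
class ManifestLineSplitter {

    /**
     * Splits text into chunks whose UTF-8 encoding is at most maxBytes long,
     * never cutting inside a code point.
     */
    static List<String> splitByUtf8Bytes(String text, int maxBytes) {
        if (maxBytes < 4) {
            // A single code point may need up to 4 bytes in UTF-8.
            throw new IllegalArgumentException("maxBytes must be at least 4");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            int width = utf8Width(codePoint);
            if (currentBytes + width > maxBytes) {
                // The next code point would not fit: close the current chunk first.
                chunks.add(current.toString());
                current.setLength(0);
                currentBytes = 0;
            }
            current.appendCodePoint(codePoint);
            currentBytes += width;
            i += Character.charCount(codePoint);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    /** Number of bytes UTF-8 needs for one code point. */
    static int utf8Width(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 50; i++) {
            sb.append('\u0436'); // Cyrillic letter, 2 bytes in UTF-8
        }
        for (String chunk : splitByUtf8Bytes(sb.toString(), 71)) {
            int bytes = chunk.getBytes(StandardCharsets.UTF_8).length;
            System.out.println(chunk.length() + " chars, " + bytes + " bytes");
        }
    }
}
```

With two-byte characters and a limit of 71, each chunk ends one byte short of the limit rather than cutting a character in half.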

Comment by guai [ 22/Apr/16 ]

In fact, modern Java loads manifests with lines longer than 72 bytes without problems.

Comment by guai [ 22/Apr/16 ]

Sorry, I was wrong. In fact, java.util.jar.Manifest can split lines inside characters, but it can read them back as well. I hit an error in NetBeans; maybe they use a nonstandard reader, I don't know... Or maybe it's something else, CR-LF vs. LF perhaps.

Comment by Paul Merlin [ 23/Apr/16 ]

Under the hood, Gradle uses Ant's Manifest class, which does the right thing with respect to line length and encoding.
So this shouldn't be a problem when using Gradle. Do you actually see line length issues with Manifests generated using Gradle?

Comment by guai [ 23/Apr/16 ]

I saw manifests generated by Gradle with the 72-byte split inside a multibyte character, and I then hit some problems loading those modules with NetBeans.
But then I rechecked.
java.util.jar.Manifest does the same: it can split inside multibyte characters and can join the strings back correctly. So it's not a Gradle bug as long as Gradle does the same.
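
The round trip described above can be checked with a short Java sketch using only java.util.jar.Manifest. The class name ManifestRoundTrip and the attribute name X-Test are made up for illustration. Whether write() actually breaks a line inside a character seems to depend on the JDK version, so the sketch only checks that a long multibyte value is wrapped and comes back unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical demo class; only the java.util.jar calls are real API.
class ManifestRoundTrip {

    /** Serializes a manifest holding one custom main attribute. */
    static byte[] write(String name, String value) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version must be set, otherwise the main attributes are not written.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.putValue(name, value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            manifest.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Parses manifest bytes and returns the value of one main attribute. */
    static String read(byte[] manifestBytes, String name) {
        try {
            Manifest manifest = new Manifest(new ByteArrayInputStream(manifestBytes));
            return manifest.getMainAttributes().getValue(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** 60 two-byte characters: 120 bytes, long enough to force line wrapping. */
    static String sampleValue() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 60; i++) {
            sb.append('\u0436');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String value = sampleValue();
        byte[] bytes = write("X-Test", value);
        System.out.println("serialized size: " + bytes.length + " bytes");
        System.out.println("round trip intact: " + value.equals(read(bytes, "X-Test")));
    }
}
```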
I think it would be better not to split inside characters, or even not to split lines at all, but I can't insist that Gradle behave that way. Some text editors can't display those bytes, which is a minor inconvenience.
Maybe there is software that cannot load such manifests, etc.
I plan to find out the exact cause of my problems with NetBeans on Monday and will post the results here. Maybe it's not even connected to manifests; I'm not sure yet.

Comment by guai [ 25/Apr/16 ]

It's crazy, but the manifest.mf extracted from the target jar and generated-manifest.mf, which I think is what should have been packaged, differ.
generated-manifest.mf was generated by https://github.com/radimk/gradle-nbm-plugin and it can be read fine.
I think the next step after that is an Ant task call that builds the NetBeans module jar.
It seems that the problem is there.

Comment by guai [ 25/Apr/16 ]

Or maybe the problem is in Gradle's jar task,
because build\tmp\jar\MANIFEST.MF is already broken.

Comment by guai [ 25/Apr/16 ]

Here is a simple reproduction script. Run gradle netbeans and then mf-test.groovy.

Comment by guai [ 25/Apr/16 ]

It definitely is a Gradle bug. org.gradle.api.java.archives.internal.DefaultManifest#attributes already contains broken characters.
Should I create a new issue, since this is not connected to the default encoding?

Comment by guai [ 18/May/16 ]

Manifests are still not in UTF-8 unless GRADLE_OPTS=-Dfile.encoding=UTF-8 is set,
even when allprojects*.jar*.manifest*.contentCharset = 'UTF-8' is present.

Comment by guai [ 18/May/16 ]

Reproducible with the same attached test project. Both issues are still there: the manifest is not in UTF-8 unless -Dfile.encoding=UTF-8 is set, and characters are corrupted after the split and join at the 72-byte line length limit when the split falls inside a multibyte character.

Gradle 2.14-rc-1

Build time: 2016-05-18 09:38:24 UTC
Revision: f8f26c696dbcba218e74091a92f0517d6a6f75da

Groovy: 2.4.4
Ant: Apache Ant(TM) version 1.9.6 compiled on June 29 2015
JVM: 1.8.0_65 (Oracle Corporation 25.65-b01)
OS: Windows 8 6.2 amd64

Comment by Paul Merlin [ 18/May/16 ]

Thanks for your feedback!
I can reproduce your issue with the provided sample build.

First, try this sample build that demonstrates how charsets are chosen when dealing with manifests: https://github.com/eskatos/gradle-3374
There's no encoding issue and no multibyte char split.

You can play with it. For example, remove the contentCharset used to read the merged manifest so that the default platform charset will be used.

In your sample, I think the nbm plugin reads some generated manifest from disk using the default platform charset, and things start to break.

We changed Gradle's code so that, by default, it generates manifests using UTF-8 but reads merged manifests using the platform charset. This decision was taken with backward compatibility in mind, but it looks like it wasn't a good idea.
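
A minimal Java sketch of why writing with one charset and reading with another breaks values. This is not Gradle's code: the class name CharsetMismatch is hypothetical, and ISO-8859-1 merely stands in for a non-UTF-8 platform default, which is an assumption about the reporter's environment.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class showing a write/read charset mismatch.
class CharsetMismatch {

    /** Encodes text with one charset and decodes the bytes with another. */
    static String transcode(String text, Charset writeCharset, Charset readCharset) {
        return new String(text.getBytes(writeCharset), readCharset);
    }

    public static void main(String[] args) {
        String value = "M\u00fcller"; // "Müller", written with an escape to stay source-encoding neutral

        // Written as UTF-8, read back as UTF-8: unchanged.
        String same = transcode(value, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println("UTF-8 -> UTF-8 intact: " + value.equals(same));

        // Written as UTF-8, read back as ISO-8859-1: each UTF-8 byte becomes its own character.
        String broken = transcode(value, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 -> ISO-8859-1 intact: " + value.equals(broken));
        System.out.println("length before: " + value.length() + ", after: " + broken.length());
    }
}
```

The two bytes that UTF-8 uses for the umlaut come back as two separate characters, which is the kind of damage seen when a UTF-8 manifest is merged through a reader that uses a different charset.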

Will get back to you shortly. If you have any more feedback, keep it coming!

Comment by guai [ 19/May/16 ]

Paul, I could break your example.
I've put a line that is already split inside a multibyte character into MANIFEST_TEMPLATE.txt.
The file is in UTF-8, except that it contains those two character parts, which I think are not valid UTF-8 on their own.
-Dfile.encoding=UTF-8 is set and contentCharset = 'UTF-8' is there too.
I can read it fine with new Manifest(new File('src/config/MANIFEST_TEMPLATE.txt').newInputStream()).
But the resulting merged manifest is broken in that attribute's value.
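
The direct read mentioned above can be approximated without the template file. The following Java sketch builds manifest bytes by hand so that the two bytes of one character sit on either side of a continuation break, then parses them with java.util.jar.Manifest. The class name SplitCharManifestRead, the attribute X-Test, and the sample value are hypothetical, not the contents of the actual MANIFEST_TEMPLATE.txt.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Manifest;

// Hypothetical demo class; the manifest content is invented for illustration.
class SplitCharManifestRead {

    /** Builds a manifest whose X-Test value "caf\u00e9" is split inside the last character. */
    static byte[] manifestWithSplitChar() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] head = "Manifest-Version: 1.0\r\nX-Test: caf".getBytes(StandardCharsets.US_ASCII);
        out.write(head, 0, head.length);
        out.write(0xC3);  // first UTF-8 byte of U+00E9
        out.write('\r');
        out.write('\n');
        out.write(' ');   // leading space marks a continuation line
        out.write(0xA9);  // second UTF-8 byte of U+00E9
        out.write('\r');
        out.write('\n');
        out.write('\r');  // blank line ends the main section
        out.write('\n');
        return out.toByteArray();
    }

    /** Parses manifest bytes and returns the value of one main attribute. */
    static String readValue(byte[] manifestBytes, String name) {
        try {
            Manifest manifest = new Manifest(new ByteArrayInputStream(manifestBytes));
            return manifest.getMainAttributes().getValue(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String value = readValue(manifestWithSplitChar(), "X-Test");
        System.out.println("value intact: " + "caf\u00e9".equals(value));
    }
}
```

The JDK reader joins the bytes of continuation lines before decoding them, which is why the split character survives. A reader that decodes each physical line separately would presumably produce a broken value instead.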

Comment by Paul Merlin [ 19/May/16 ]

> Paul, I could break your example
I'll follow up today.

Comment by Paul Merlin [ 19/May/16 ]

Thanks to your feedback I dug a bit more and I'm close to a proper fix. To be continued

Comment by Paul Merlin [ 20/May/16 ]

I pushed a fix to the release branch.
Could you confirm it fixes your issues?

In case you need it, here is the incantation that installs Gradle from sources:
./gradlew install -Pgradle_installPath=/path/where/to/install/built/gradle

Comment by guai [ 23/May/16 ]

ok, I'll try

Comment by guai [ 23/May/16 ]

Everything is fine. I couldn't break anything.

Comment by Paul Merlin [ 23/May/16 ]

Great. Thanks!

Generated at Wed Jun 30 12:47:08 CDT 2021 using Jira 8.4.2#804003-sha1:d21414fc212e3af190e92c2d2ac41299b89402cf.