[GRADLE-2795] Gradle locks the global script cache during the entire build, causing subsequent builds to fail if scripts change Created: 17/Jun/13  Updated: 28/Sep/16  Resolved: 10/Mar/16

Status: Resolved
Project: Gradle
Affects Version/s: None
Fix Version/s: 2.13-rc-1

Type: Bug
Reporter: Gradle Forums Assignee: Cédric Champeau (Inactive)
Resolution: Fixed Votes: 43

Issue Links:
Related
Related to GRADLE-3106 Timeout waiting to lock artifact cache Resolved

 Description   

For scripts included with "apply from", Gradle uses the cache in the user's home directory to keep compiled classes.
Probably to ensure the same script content is applied throughout the build, a shared lock is kept on this cache for the time it runs.
If a build detects the script has changed, it acquires an exclusive lock, updates the cache and goes back to a shared lock.
However if there was another build running under the same user, it will fail after the default 1 minute timeout, due to the other build's shared lock.
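The contention can be reproduced outside Gradle with advisory file locks. The sketch below is only a model and assumes Linux with util-linux flock(1). Gradle manages its own lock files (such as the cache.properties.lock in the error below) from the JVM, so nothing here reflects Gradle's actual code. One "build" keeps a shared lock, a second shared request is still granted, but an exclusive request is refused until the first holder lets go.

```shell
#!/bin/sh
# Model of the lock pattern described above, using flock(1) advisory locks.
# Assumption: Gradle uses its own JVM-managed lock files; flock only mimics
# "many shared readers, one exclusive writer".
lockfile=$(mktemp)

# Build 1: opens the cache lock file and holds a shared lock for its whole run.
exec 9>"$lockfile"
flock -s 9

# Build 2, script unchanged: a second shared lock is granted immediately.
if flock -s -n "$lockfile" true; then shared=granted; else shared=refused; fi

# Build 2, script changed: it needs an exclusive lock to rewrite the cache,
# which cannot be granted while build 1 still holds its shared lock.
# (-n fails at once; Gradle instead waits out its timeout and then fails.)
if flock -x -n "$lockfile" true; then exclusive=granted; else exclusive=refused; fi
echo "shared: $shared, exclusive: $exclusive"

# Build 1 finishes: closing fd 9 releases its lock, so the writer can proceed.
exec 9>&-
if flock -x -n "$lockfile" true; then after=granted; else after=refused; fi
echo "after build 1 exits, exclusive: $after"
rm -f "$lockfile"
```

Run as a plain script, it prints "shared: granted, exclusive: refused" followed by "after build 1 exits, exclusive: granted".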

In practice this means a change to a shared script will cause all following builds to fail for a user, until all builds that were running during the change have finished.

I can think of a couple of solutions:
1. make the script's key in the cache a combination of path and file hash/date. A script will have multiple versions in the cache and each build will use the one it needs.
2. copy the cached class(es) to the .gradle folder of the current build. A shared lock on the cache in user's home will only be kept for the short time needed to copy the required cache artefacts.
3. ignore the cache if exclusive lock can not be acquired. Compile the script and use it only in the current build. This will slow down builds due to waiting and compiling but eventually the cache will be updated.
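Proposal 1 boils down to deriving the cache key from the script's content as well as its location, so an edited script lands in a new entry instead of needing an exclusive lock on the entry other builds are reading. Below is a minimal sketch, assuming a local script file and sha256sum from GNU coreutils. script_cache_key is a hypothetical helper and says nothing about how Gradle actually names its cache directories.

```shell
#!/bin/sh
# Hypothetical helper for proposal 1: key a compiled script by path AND content.
# Assumption: $1 is a local file; a remote script would be hashed after download.
script_cache_key() {
    path=$1
    # Turn the path into a filesystem-safe prefix (every other char becomes _).
    prefix=$(printf '%s' "$path" | tr -c 'A-Za-z0-9' '_')
    # Content hash: any edit to the script yields a different key.
    hash=$(sha256sum < "$path" | cut -c1-16)
    printf '%s_%s\n' "$prefix" "$hash"
}
```

Two builds applying different versions of the same script would then read from two different entries, at the cost of old entries piling up until something cleans them.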

When this scenario occurs, a GradleScriptException is thrown with this message:

A problem occurred evaluating script.
Could not open buildscript class cache for script 'http://.../?p=build-core;a=blob_plain;f=repository-utils.gradle;hb=HEAD' (/home/build/.gradle/caches/1.6/scripts/_p_build_core_a_blob_plain_f_r_4n9gdhqrjd4inp4c6jive7ql9c/DefaultScript/buildscript).
Timeout waiting to lock buildscript class cache for script 'http://.../?p=build-core;a=blob_plain;f=repository-utils.gradle;hb=HEAD' (/home/build/.gradle/caches/1.6/scripts/_p_build_core_a_blob_plain_f_r_4n9gdhqrjd4inp4c6jive7ql9c/DefaultScript/buildscript). It is currently in use by another Gradle instance.
Owner PID: unknown
Our PID: 15314
Owner Operation: unknown
Our operation:
Lock file: /home/build/.gradle/caches/1.6/scripts/_p_build_core_a_blob_plain_f_r_4n9gdhqrjd4inp4c6jive7ql9c/DefaultScript/buildscript/cache.properties.lock



 Comments   
Comment by Gradle Forums [ 17/Jun/13 ]

Are you saying that you are running the same build under the same user from the same workspace multiple times concurrently? If so, what's the use case?

Comment by Gradle Forums [ 17/Jun/13 ]

No, they're different builds in different workspaces. Each of our build machines can support 3-6 build agents; they run under the same user for a couple of reasons, but have different workspaces.

I created an example for the scenario: [1]https://gist.github.com/senoctar/5791149
The setup contains two builds and a shared script.
The first build applies the shared script and waits (simulating running tests). The second build makes a change to the shared script (simulating a developer updating the script) and then applies it.
These builds can be in completely separate folders and the second build will fail.

Because plain Gist repos can't have folders, I included a batch file that sets up folders like this:
build1.gradle -> build1/build.gradle
build2.gradle -> build2/build.gradle
applied.gradle -> shared/applied.gradle

The script then starts the two builds.
----------------------------------------------------------------------------------------
[1] https://gist.github.com/senoctar/5791149

Comment by Fabian Depry [ 25/Jun/13 ]

Is there a workaround for this? Is there a simple way to tell Gradle to ignore the cache and recompile the build script each time? Sure, it's slower, but still better than having the entire build fail because of a locked file...

Comment by Laurent Moss [ 02/Aug/13 ]

A workaround is for the two builds to use two different Gradle home directories (even under the same user) specified with the --gradle-user-home command line parameter. Of course, this also implies that artifacts will be downloaded twice.

Comment by Fabian Depry [ 06/Aug/13 ]

Thanks for the answer; we ended up doing something similar: setting up different cache directories using the --project-cache-dir command line parameter.

Comment by Flemming Frandsen [ 07/Aug/13 ]

I just started hitting this problem and --project-cache-dir doesn't seem to fix the problem for me.

I could probably use --gradle-user-home, but I have some global configuration in .gradle/gradle.properties that I would like to keep using.

Comment by Patras Vlad [ 07/Aug/13 ]

--project-cache-dir should not have helped, unless the same build is started more than once in the same folder, which is not what this issue is about.

As a workaround we copy the shared script to the project directory and apply it from there. The scripts will be compiled for each project, but that is reasonable.

Comment by Timothy Bassett [ 12/Nov/13 ]

We are experiencing this problem in a dramatic fashion in our Jenkins server. We have a few apply from: "http://....". Besides limiting the number of executors in Jenkins, what might be some of the ways to minimize the problem?

Comment by Flemming Frandsen [ 12/Nov/13 ]

We were hit hard by this bug, I ended up writing a wrapper for gradle which ensures that two concurrent gradle processes never get to share a gradle home directory.

So far this approach has worked really well, but I had to nuke the gradle home dir after a number of gradle invocations (I used a limit of 100) or the cache in each would grow without bound (yet another bug) and fill the disk.

Comment by Timothy Bassett [ 12/Nov/13 ]

@Flemming

Yeah, I've taken that under consideration, but with 40+ build jobs and a large gradle cache, creating a gradle home for each one will run us out of disk space before we even get started.

Comment by Chuck May [ 12/Nov/13 ]

I am really surprised this bug has been lingering so long. With Jenkins, the solution is to check this option for each job:

Force GRADLE_USER_HOME to use workspace

That way you don't need a wrapper, but it will create a ton of wasted disk space. We have also started to look into TeamCity, and our only workaround there was basically the same: each build agent needs a different GRADLE_USER_HOME.

Comment by Flemming Frandsen [ 12/Nov/13 ]

Tim: I know, the "Force GRADLE_USER_HOME to use workspace" option in Jenkins is completely worthless, I have over 200 gradle jobs in jenkins and replicating the huge gradle cache for each one is a complete non-starter.

What I did was to have the wrapper try a list of gradle-home dirs in turn until it finds an unoccupied one. It then grabs a lock on that directory and runs gradle; after the build is done, if the directory has been used for more than 100 builds, it gets nuked.

This way I get the benefit of caching, because each gradle-home dir is reused, and I only keep the minimal number of gradle-home directories around, which is the maximum number of concurrent gradle processes that have ever been running.
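The wrapper approach described above could look roughly like the following POSIX shell sketch. It is not Flemming's actual wrapper: the pool location (GRADLE_HOME_POOL), POOL_SIZE, the .build-count file and the run_gradle_with_free_home name are all hypothetical, and it assumes Linux with flock(1). Each slot's lock file sits next to its directory, so wiping the directory does not drop the lock.

```shell
#!/bin/sh
# Hypothetical wrapper: run Gradle with the first unoccupied user home in a pool.
# Assumptions: names/paths below are illustrative; flock(1) from util-linux.
GRADLE_HOME_POOL=${GRADLE_HOME_POOL:-$HOME/.gradle-pool}
GRADLE=${GRADLE:-gradle}
MAX_BUILDS=${MAX_BUILDS:-100}
POOL_SIZE=${POOL_SIZE:-8}

run_gradle_with_free_home() {
    i=0
    while [ "$i" -lt "$POOL_SIZE" ]; do
        home="$GRADLE_HOME_POOL/home-$i"
        mkdir -p "$home"
        exec 9>"$home.lock"              # lock file lives outside the slot dir
        if flock -n 9; then              # slot is free: keep it for this build
            "$GRADLE" --gradle-user-home "$home" "$@"
            status=$?
            # Count builds served by this slot; wipe it once the limit is hit
            # so its cache cannot grow without bound.
            count=$(( $(cat "$home/.build-count" 2>/dev/null || echo 0) + 1 ))
            if [ "$count" -ge "$MAX_BUILDS" ]; then
                rm -rf "$home"
                mkdir -p "$home"
                count=0
            fi
            echo "$count" > "$home/.build-count"
            exec 9>&-                    # release the slot
            return "$status"
        fi
        exec 9>&-                        # slot busy: try the next one
        i=$((i + 1))
    done
    echo "no free Gradle user home in $GRADLE_HOME_POOL" >&2
    return 1
}
```

A CI job would call run_gradle_with_free_home clean build instead of gradle clean build. A new slot directory is only created when all earlier slots are busy, so the number of directories tracks the peak number of concurrent builds (capped at POOL_SIZE).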

Comment by Timothy Bassett [ 12/Nov/13 ]

@Chuck

Thanks. Not to beat a dead horse, but that's not really so much an option for us. As noted, the disk space will be a big issue. Above and beyond that, we do have a gradle.properties in the GRADLE_USER_HOME, so that will be an issue as well, because the gradle.properties will not be present.

If anyone knows what I could do to help, please let me know.

Comment by Chuck May [ 12/Nov/13 ]

I have few enough jobs that I can take the hit and replicate the user home directory. I did have to add a script step to every job that copies the master gradle.properties to the workspace user home, so I didn't have to copy shared properties around manually. Kludgy, but it works.

I am hoping the Gradle devs will be able to look into this soon.

Comment by Hemant Gokhale [ 12/Nov/13 ]

We implemented a solution for our Jenkins installation that involves creating an executor-specific GRADLE_USER_HOME. It involves the following steps:

1. Inject this environment variable into the build process:
GRADLE_USER_HOME=$HOME/.gradle/private-cache-$EXECUTOR_NUMBER

2. Run the following shell script as a separate build step of type 'Execute shell' before the 'Invoke Gradle script' step:
mkdir -p "$GRADLE_USER_HOME"
ln -sf "$HOME/.gradle/gradle.properties" "$GRADLE_USER_HOME/gradle.properties"
ln -sfT "$HOME/.gradle/native" "$GRADLE_USER_HOME/native"
ln -sfT "$HOME/.gradle/daemon" "$GRADLE_USER_HOME/daemon"

3. Set up a separate Jenkins job to clean the caches every day. This job needs to be specific to each slave machine, and its job weight should be set to the total number of executors configured for that slave.
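Step 3 is not spelled out in detail. One possible shape for the daily job's shell step is sketched below. It assumes that deleting only the caches/ folder inside each private-cache-* directory is enough (keeping the symlinks from step 2) and that the job weight really keeps other builds off the machine while it runs; clean_private_gradle_caches is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical daily cleanup for the per-executor Gradle user homes.
# Assumption: no build is using these directories while this runs.
clean_private_gradle_caches() {
    base=${1:-$HOME/.gradle}
    for dir in "$base"/private-cache-*; do
        [ -d "$dir" ] || continue   # the glob matched nothing
        rm -rf "$dir/caches"        # keep gradle.properties/native/daemon links
    done
}
```

In the job itself, calling clean_private_gradle_caches with no argument cleans the directories under $HOME/.gradle.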

Comment by Timothy Bassett [ 12/Nov/13 ]

Hemant,

Hmmmm... That looks interesting, that might work for us.

Comment by Patras Vlad [ 12/Nov/13 ]

As mentioned before, one easy fix is to copy the script into the local build folder and apply it from there.
This way you can have a cache common to all builds and the script will also be cached. There will be one copy in the cache for each build folder, but that should be reasonable.
In other words, replace apply from: <URL> with a custom function call, e.g. applycommon <URL>.

One possible implementation would be:

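// Download the shared script into the build directory and apply the local copy,
// so each build folder compiles its own copy instead of sharing one cache entry.
void applycommon(String scriptUrl) {
    def localFile = new File(buildDir, scriptUrl.tokenize("/")[-1])
    localFile.parentFile.mkdirs()  // buildDir may not exist yet
    // Overwrite rather than append (<<), so repeated builds don't duplicate the script
    localFile.bytes = new URL(scriptUrl).bytes
    apply from: localFile
}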

I created a fork of the original gist by adding this fix and it works:
https://gist.github.com/senoctar/7432635
If the script you apply has additional apply statements, this will be more difficult to implement. You have to change all scripts, check whether the local file exists, etc., but it can be done.

Another workaround (which is the one we ended up using for most builds), is to check out both build files and common scripts. This is possible in TeamCity since you can add multiple VCS roots to a build configuration. This way all files are available locally and can be applied from there. As far as I know this is not possible in Jenkins, at least not without an additional plugin.

Comment by Patras Vlad [ 12/Nov/13 ]

Yet another workaround is to add a random parameter so gradle will see the script as a different file every time.
This has the downside of re-compiling a script every time it is applied and might be an issue if you have many scripts.
Ex: apply from: "http://git-server/?p=build-core;a=blob_plain;f=repository-utils.gradle;hb=HEAD&random=${Math.random()}"
Notice the &random=${Math.random()} at the end.

Comment by Timothy Bassett [ 12/Nov/13 ]

Patras

"http://git-server/?p=build-core;a=blob_plain;f=repository-utils.gradle;hb=HEAD&random=${Math.random()}"

Ugly and beautiful all at the same time. I love it! We'll take a look at this!

Comment by Joviano Dias [ 18/Jun/15 ]

I see the following option in jenkins for this under the gradle wrapper config:

Option: Force GRADLE_USER_HOME to use workspace
Help text:
Gradle will write to $HOME/.gradle by default for GRADLE_USER_HOME. For a multi-executor slave in Jenkins, setting the environment variable localized files to the workspace avoid collisions accessing gradle cache.

Will this work?

Comment by Torsten Krah [ 18/Jun/15 ]

As already written above, whether it works depends on your use case.
Not in every case, because it will miss the setup in your default home $HOME/.gradle, such as scripts in your init.d directory and your custom gradle.properties; at least it did not copy them the last time I tried.
Also, it will take a huge amount of disk space if you have many (n) workspaces, as all artifacts will be stored n times; if you have many build jobs (200+) that each use their own home, you will run out of disk space sooner or later.

Comment by Christopher Dancy [ 22/Jun/15 ]

This issue is reproducible damn near 10/10 times if you run multiple containers all sharing/mounting the same gradle dir "/user/.gradle" as a volume. Would love to have all of our containers use the same gradle cache on the host.

host-OS=coreos-681.0.0
container-gradle-version=2.4
container-image=ubuntu-14.04

Comment by KwonNam Son [ 22/Sep/15 ]

I used Math.random() for Jenkins builds. It worked quite well but created too many build script cache files.
So I switched to the JOB_NAME environment variable, which Jenkins injects into every build:

apply from: "http://urlforbuild.gradle?jobName=${java.net.URLEncoder.encode(System.getenv()['JOB_NAME'] ?: 'NOJOB', 'UTF-8')}"

Comment by Tony Bridges [ 13/Oct/15 ]

A bit of a variation on the underlying theme:
On my Jenkins system, I try to completely isolate the GRADLE_HOME for each build into the workspace. I also wipe the workspace before each build. The side effect of the lingering lock is that, if a given build host has not been idle long enough for the lock to expire, the subsequent build will fail on the "wipe workspace" step as it cannot delete a locked file.

Comment by Joachim Nilsson [ 12/Mar/16 ]

@Cédric: If this works, you will be my hero of the year.
Looking forward to trying out 2.13.

Comment by Cédric Champeau (Inactive) [ 12/Mar/16 ]

If you want to make sure it works for your use case, you can try a nightly: http://services.gradle.org/distributions-snapshots

I would actually appreciate feedback.

Comment by Joachim Nilsson [ 22/Mar/16 ]

@Cédric I did a larger (~150 parallel jobs) test of your nightly build in our CI environment yesterday evening and it went just fine.
Then I reverted to 2.11 and half of the jobs hung on locking:

> Timeout waiting to lock artifact cache (/local/.gradle/caches/modules-2). It is currently in use by another Gradle instance.
Lock file: /local/.gradle/caches/modules-2/modules-2.lock

We look forward to the final 2.13 release!

Comment by Joachim Nilsson [ 24/Mar/16 ]

@Cédric Not sure if this is related to the fix, but I have some indications that multiple builds using the same cache take longer now.

Comment by Joachim Nilsson [ 29/Mar/16 ]

@Cédric: sorry, we got it again when using the nightly build

[Gradle] - Launching build.
[NNNN] $ /local/tools/gradle-2.13-20160320000030+0000/bin/gradle -DSOURCE_BUILD_TAG=NNNN-smoketest-smoketest-build-1170 --gradle-user-home /local/.gradle/ smoketestPMD
...
Execution failed for task ':nn:yy'.
> Could not resolve all dependencies for configuration ':nn:runtime'.
> Timeout waiting to lock artifact cache (/local/.gradle/caches/modules-2). It is currently in use by another Gradle instance.
Owner PID: 15085
Our PID: 3808
Owner Operation: resolve configuration ':mm:zz'
Our operation: resolve files from configuration ':nn:runtime'
Lock file: /local/.gradle/caches/modules-2/modules-2.lock

Comment by Cédric Champeau (Inactive) [ 29/Mar/16 ]

I think this one is a different cache, used for dependency resolution. I will check with the team, thanks for the feedback!

Comment by Joachim Nilsson [ 29/Mar/16 ]

OK, I thought cache handling was somehow managed within the same 'module'. In that case I have no feedback and the improvement I saw was just a coincidence. Sorry for the confusion.

Comment by Antonio Terreno [ 19/Apr/16 ]

Hey, this does not seem to be fixed in Gradle 2.13-rc-1.

I am running into the same issue, although my situation is a bit different.

I am running a few Gradle Docker containers (about 10), using https://github.com/metahelicase/docker-gradle

These are spawned by docker-compose, which runs the containers in parallel.

The gradle image mounts a volume, and that volume is shared by all the containers.

So I run into the same issue: multiple Gradle processes accessing the same lock file...

Is there any way to disable it? Or to configure a random lock name at run time?

Comment by Trejkaz [ 28/Sep/16 ]

We see this on 2.14. Any build doing Gradle stuff seemingly prevents any other build from doing anything until it finishes. We only found out because one of them finally timed out.

Generated at Wed Jun 30 12:31:44 CDT 2021 using Jira 8.4.2#804003-sha1:d21414fc212e3af190e92c2d2ac41299b89402cf.