Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Exceeding CPU limit when pushing eosio.system contract #8289

Closed
krotkiewicz opened this issue Dec 7, 2019 · 21 comments
Closed

Exceeding CPU limit when pushing eosio.system contract #8289

krotkiewicz opened this issue Dec 7, 2019 · 21 comments

Comments

@krotkiewicz
Copy link

Hey,
when I run custom testnet nodeos with custom genesis file, I randomly get below error when pushing eosio.system contract.

$ cleos set contract eosio /tmp/eosio.contracts/build/contracts/eosio.system/
Reading WASM from /tmp/eosio.contracts/build/contracts/eosio.system/eosio.system.wasm...
Publishing contract...
error 2019-12-07T13:03:09.462 nodeos    http_plugin.cpp:652           handle_exception     ] FC Exception encountered while processing chain.push_transaction
Error 3080004: Transaction exceeded the current CPU usage limit imposed on the transaction

I use eos 1.8.5 from eostudio/eos:v1.8.5 docker image and eosio.contracts from release/1.8.x branch plus following genesis and config files:

{
  "initial_timestamp": "2018-03-02T12:00:00.000",
  "initial_key": "EOS55oVd4ZyC2QMiAMzs1WFYzEYU1GKAAALPCa45Dzs9wrUWo3hjD",
  "initial_configuration": {
    "max_block_net_usage": 1048576,
    "target_block_net_usage_pct": 1000,
    "max_transaction_net_usage": 524288,
    "base_per_transaction_net_usage": 12,
    "net_usage_leeway": 500,
    "context_free_discount_net_usage_num": 20,
    "context_free_discount_net_usage_den": 100,
    "max_block_cpu_usage": 100000,
    "target_block_cpu_usage_pct": 500,
    "max_transaction_cpu_usage": 50000,
    "min_transaction_cpu_usage": 100,
    "max_transaction_lifetime": 3600,
    "deferred_trx_expiration_window": 600,
    "max_transaction_delay": 3888000,
    "max_inline_action_size": 4096,
    "max_inline_action_depth": 4,
    "max_authority_depth": 6
  }
}
chain-state-db-size-mb = 65536
reversible-blocks-db-size-mb = 65536

producer-name = eosio
http-server-address = 0.0.0.0:8888

plugin = eosio::chain_api_plugin
plugin = eosio::history_plugin
plugin = eosio::history_api_plugin
plugin = eosio::http_plugin
plugin = eosio::producer_plugin
plugin = eosio::producer_api_plugin

access-control-allow-origin = *
http-validate-host = false
enable-stale-production = true
max-transaction-time = 3000000

filter-on = *

To easily reproduce it I prepared a repo with Dockerfile: https://github.com/ulamlabs/eos-regtest.
Just build the image and run it couple of times.

What I tried?

Increasing max CPU time in genesis file - then the pushing the system contract takes seconds and errors with Please increase the expiration time of your transaction!.

Thanks for helping.

@jgiszczak
Copy link
Contributor

The max-transaction-time of 3000000 ms you have specified exceeds the maximum block time of 500 ms. No transaction taking that long can be incorporated into a block. You could try increasing max_transaction_cpu_usage in genesis.json from 50000 microseconds to something less than 500000 microseconds.

@krotkiewicz
Copy link
Author

@jgiszczak I already tried that. When I increase all max cpus in the genesis file to some absurd values (like minutes), the transaction takes forever to execute and the node gives up with timeout error.

So there is something bad going on when executing that transaction (put system contract), looks like a deadlock to me.

@jgiszczak
Copy link
Contributor

As I said, increasing those limits to beyond the block time is useless. Any transaction taking longer than block time will fail with the timeout you are seeing, always. The transaction is expiring in the queue after repeated failures to be incorporated into a block.

Try building the 1.8.3 release of the system contract rather than the 1.8.x branch. What version of CDT did you use to build the system contract?

@jgiszczak
Copy link
Contributor

Testing demonstrates that changing max_transaction_cpu_usage in genesis.json still works as advertised. Note that EOS Mainnet uses a value of 150000 microseconds. It's safe to say 50000 is too low for current common contracts.

@ksiazkowicz
Copy link

I can confirm that changing those values works, but the issue that we're facing is that no matter if we set these values to something unreasonable like 20 seconds per block or we use mainnet values (0.2s for max block CPU usage and 0.15s for max transaction CPU usage), it doesn't guarantee us that this transaction won't exceed the deadline. We counteract this by repeatedly submitting it until it succeeds and ultimately it does, usually requiring only about ~0.01-0.03s of CPU time. There must be some reason why this transaction either exceeds 0.15s deadline or gets executed under just a fraction of that time.

We use eosio.cdt 1.6.3 as advised in release notes. Using v1.8.3 of system contract doesn't seem to resolve the issue.

@jgiszczak jgiszczak reopened this Jan 8, 2020
@jgiszczak
Copy link
Contributor

jgiszczak commented Jan 8, 2020

Reopening. I'm getting three independent reports of consistent transaction timing failures in the 1.8 branch, ranging all the way from 1.8.0 all the way up to 1.8.6.

@netuoso
Copy link

netuoso commented Jan 21, 2020

I am experiencing this same issue when deploying eosio.system contract.

EOS Version: 1.8.9
CDT Version: 1.6.3
Contracts Version: 1.83

Not sure exactly when it appeared but it is preventing local testnet deployment.

@netuoso
Copy link

netuoso commented Jan 22, 2020

I was able to solve this problem by adding http-max-response-time-ms = 500 to the nodeos config.ini. This option was added with EOS version 1.8.9+

The default value is 30ms which may not be enough on a local, single node testnet

@heifner
Copy link
Contributor

heifner commented Jan 22, 2020

Note tx_cpu_usage_exceeded "Transaction exceeded the current CPU usage limit imposed on the transaction" is different than timeout_exception "deadline ${d} exceeded by ${t}us". timeout_exception is tied to http-max-response-time-ms.

@netuoso
Copy link

netuoso commented Jan 22, 2020

Correct, the original poster said.

Increasing max CPU time in genesis file - then the pushing the system contract takes seconds and errors with Please increase the expiration time of your transaction!.

So when he fixes the tx_cpu_usage_exceeded bug, he gets a new bug. The new issue will be an expired transaction. I was experiencing the same thing and it was specifically related to deployed the eosio.system contract after a version update.

The newer versions of EOSIO defaulted the http-max-response-time to 30 to prevent RPC node abuse. As you have to use the RPC node to set the contract, the imposed limit caused issues on some hardware in a local testnet. Please at least attempt the above setting in a testnet before declaring it does not fix the described issue.

Hopefully this information helps someone as it seemed to help me running into the same issues. I have seen some form of these errors constantly when making changes that did not include increasing allowed http response time. I suppose it is possible the original commenter was using outdated eosio.contracts that didnt work with his EOS version, but I had this issue with multiple versions of eosio.contracts.

@heifner
Copy link
Contributor

heifner commented Jan 22, 2020

Thanks for the additional info.

@ksiazkowicz
Copy link

Thanks @netuoso. I tried your solution but it doesn't seem to make much impact for me. I'm using eosio 1.8.9, eosio.cdt 1.6.3 and eosio.contracts 1.8.3. I've set http-max-response-time-ms to 1000, max_transaction_cpu_usage to 150000 and max_block_cpu_usage to 200000. While some runs are successful, others get stuck trying to deploy eosio.system contract.

Here is the output from docker logs eosio 2>&1 | grep "eosio.system" -A 5:

+ retry cleos set contract eosio /tmp/eosio.contracts/build/contracts/eosio.system/
+ cleos set contract eosio /tmp/eosio.contracts/build/contracts/eosio.system/
Reading WASM from /tmp/eosio.contracts/build/contracts/eosio.system/eosio.system.wasm...
Publishing contract...
error 2020-01-24T09:08:54.344 nodeos    http_plugin.cpp:694           handle_exception     ] FC Exception encountered while processing chain.push_transaction
Error 3080006: Transaction took too long
Error Details:
deadline exceeded
--
+ cleos set contract eosio /tmp/eosio.contracts/build/contracts/eosio.system/
Reading WASM from /tmp/eosio.contracts/build/contracts/eosio.system/eosio.system.wasm...
Publishing contract...
info  2020-01-24T09:09:04.501 nodeos    producer_plugin.cpp:1920      produce_block        ] Produced block 0000002d151a542a... #45 @ 2020-01-24T09:09:04.500 signed by eosio [trxs: 0, lib: 44, confirmed: 0]
error 2020-01-24T09:09:04.609 nodeos    http_plugin.cpp:694           handle_exception     ] FC Exception encountered while processing chain.push_transaction
Error 3080006: Transaction took too long
Error Details:
--
+ cleos set contract eosio /tmp/eosio.contracts/build/contracts/eosio.system/
Reading WASM from /tmp/eosio.contracts/build/contracts/eosio.system/eosio.system.wasm...
Publishing contract...
error 2020-01-24T09:09:14.834 nodeos    http_plugin.cpp:694           handle_exception     ] FC Exception encountered while processing chain.push_transaction
Error 3080006: Transaction took too long
Error Details:
deadline exceeded
--
+ cleos set contract eosio /tmp/eosio.contracts/build/contracts/eosio.system/
Reading WASM from /tmp/eosio.contracts/build/contracts/eosio.system/eosio.system.wasm...
Publishing contract...
info  2020-01-24T09:09:25.001 nodeos    producer_plugin.cpp:1920      produce_block        ] Produced block 000000562aade7a7... #86 @ 2020-01-24T09:09:25.000 signed by eosio [trxs: 0, lib: 85, confirmed: 0]
error 2020-01-24T09:09:25.091 nodeos    http_plugin.cpp:694           handle_exception     ] FC Exception encountered while processing chain.push_transaction
Error 3080006: Transaction took too long
Error Details:
--
+ cleos set contract eosio /tmp/eosio.contracts/build/contracts/eosio.system/
Reading WASM from /tmp/eosio.contracts/build/contracts/eosio.system/eosio.system.wasm...
Publishing contract...
error 2020-01-24T09:09:35.350 nodeos    http_plugin.cpp:694           handle_exception     ] FC Exception encountered while processing chain.push_transaction
Error 3080006: Transaction took too long
Error Details:
deadline exceeded

@netuoso
Copy link

netuoso commented Jan 24, 2020

Apologies, it helped me for deploying a few times but now I am re-encountering the issue.

Update: I am using an underpowered laptop and running the testnet in a docker instance via docker-compose. After some digging, it seems the issue I was experiencing with deadline exceeded while deploying the eosio.system contract was related to hardware constraints, not a software config error.

I was able to make my machine immediately start working by increasing the available RAM to the docker process as well as setting a CPU niceness of -18. https://en.wikipedia.org/wiki/Nice_(Unix)

TL:DR; if you are experiencing this issue, try to upgrade your hardware or ensure your virtual process has sufficient resources to deploy the contract within the timespan of a single block.

@ndcgundlach
Copy link
Contributor

Appears to have been a system resource issue.

@ksiazkowicz
Copy link

We're having these issues when running this on CI and to be honest I wouldn't call it underpowered. We currently run builds on general1.large codebuild instance, which has 15GB of RAM and 8 vCPUs. The only higher tier has 144GB of RAM and 72 CPUs and I'm not sure if it's worth switching to that one just to check if this resolves our issues as it's 12x the price of our current one. My laptop, on which this also fails half the time with deadline exceeded exception has an i7-4770HQ CPU and 16 GB of RAM.

I tried changing the niceness, but it still consistently takes at least 2 attempts to submit the contract. My machine also doesn't seem to be under much load while running it.

@ksiazkowicz
Copy link

ksiazkowicz commented Mar 10, 2020

I prepared a Docker image and a simple script to reproduce this issue.
https://github.com/ksiazkowicz/eos-testnet

The idea behind the script is that we're trying to bootstrap EOS testnet until it succeeds. Sometimes 1st attempt to deploy eosio.system contract will fail, so I wait 10 seconds and try again. After exceeding 10 retries, I'm considering test run as failed, kill the container and try again. By default, the script will run 100 tests.

I ran these tests against an r5.large EC2 instance running Ubuntu 18.04 and on my Mid-2015 MacBook Pro (Core i7, 16 GB RAM).

My results are here if anyone wants to dig in:
https://1drv.ms/x/s!ApF6BBqdJeSZkaQ4u3tTOs4cEswMEQ?e=u4bTdc

My Docker Desktop config:
Screenshot 2020-03-10 at 13 44 27

@taokayan
Copy link
Contributor

This maybe the time-out issue on keosd. Can you try to following (sign the transaction manually without invoking keosd)?

./cleos set code eosio eosio.system.wasm -j -d -s -x 3600 > tmp
./cleos sign tmp -k THE_EOSIO_PRIVATE_KEY -p

@ksiazkowicz
Copy link

ksiazkowicz commented Apr 16, 2020

Interesting. So if I understand correctly, you suggest that the transaction itself might be already expired by the time it's signed and ready to be submitted? That would make sense, looking at it again the logs mention that the deadline, not CPU usage is exceeded. I adjusted the bootstrapping script of my Docker image accordingly and ran it again. I added test results here: https://1drv.ms/x/s!ApF6BBqdJeSZkaQ4u3tTOs4cEswMEQ?e=PntAbv

What's interesting about the results, that 98 out of 100 builds were successful at first try. The 2 ones that failed had the error mentioned in the 1st message in this ticket:

info  2020-04-16T07:50:23.712 cleos     main.cpp:3306                 operator()           ] grabbing chain_id from nodeos
info  2020-04-16T07:50:23.901 nodeos    producer_plugin.cpp:2046      produce_block        ] Produced block 000000c2db5907e2... #194 @ 2020-04-16T07:50:24.000 signed by eosio [trxs: 0, lib: 193, confirmed: 0]
error 2020-04-16T07:50:24.154 nodeos    http_plugin.cpp:696           handle_exception     ] FC Exception encountered while processing chain.push_transaction
Error 3080004: Transaction exceeded the current CPU usage limit imposed on the transaction
Error Details:
transaction was executing for too long
pending console output:
+ echo retry
+ sleep 10
retry

I attached the log files here: https://1drv.ms/u/s!ApF6BBqdJeSZka8r05JuZtH-4R2YiQ?e=ETt3ac

It's as if there were 2 cases:

  1. Transaction expires until it's signed and pushed to the RPC. These are the builds that succeeded in 1 or more attempts. Default deadline is 30s, that should be more than enough. I noticed that in your example, you increased that to 1 hour and we're no longer hitting that.
  2. Transaction exceeds CPU limits. These builds were discarded after 10 failed retries. It's rare but it still happens.

Our code is available here: ulamlabs/eos-testnet#2

@heifner
Copy link
Contributor

heifner commented Apr 16, 2020

Can you add verbose-http-errors=true which should provide you more info on the timeout.

@ksiazkowicz
Copy link

It appears to be enabled:

root@2ff33b4cd2bb:/# ps aux | grep nodeos
root        11  1.5  0.8 134729372 67704 ?     Sl   10:34   3:16 nodeos -d /data --config-dir /app --genesis-json=/app/genesis.json
root      2650  0.0  0.0  11464  1000 pts/0    S+   14:06   0:00 grep --color=auto nodeos
root@2ff33b4cd2bb:/# cat /app/config.ini | grep verbose-http-errors
verbose-http-errors = true

When I submit it manually, the response does include the traceback but it's not really helpful.

{
    "code": 500,
    "message": "Internal Service Error",
    "error": {
        "code": 3080004,
        "name": "tx_cpu_usage_exceeded",
        "what": "Transaction exceeded the current CPU usage limit imposed on the transaction",
        "details": [
            {
                "message": "transaction was executing for too long",
                "file": "transaction_context.cpp",
                "line_number": 493,
                "method": "checktime"
            },
            {
                "message": "pending console output: ",
                "file": "apply_context.cpp",
                "line_number": 113,
                "method": "exec_one"
            }
        ]
    }
}

The weird part is that all runs have the same config and it doesn't matter how many times you try to submit it, it fails with this exception. If I kill the container, remove all its data and spin it up again, it will probably be fine. In fact, I spinned up another container based on the same image and it bootstrapped just fine.

Not sure how useful this is, but here is also cleos get info output for both nodes:

broken node

oot@2ff33b4cd2bb:/# cleos get info
{
  "server_version": "eb4456fd",
  "chain_id": "7479dd536fa543a6e5faafe8f90132f8d1aab58c746d7d7a4e01c10ea091e25a",
  "head_block_num": 26827,
  "last_irreversible_block_num": 26826,
  "last_irreversible_block_id": "000068ca42e0a8dbdd5d5f108a0cda5526a252aea1c1aa4e39515611ad9631f0",
  "head_block_id": "000068cbf0fa8c02aadaa1320a53508d9c4a9f620c8900aa40e9a2693c454120",
  "head_block_time": "2020-04-16T14:19:05.000",
  "head_block_producer": "eosio",
  "virtual_block_cpu_limit": 200000000,
  "virtual_block_net_limit": 1048576000,
  "block_cpu_limit": 200000,
  "block_net_limit": 1048576,
  "server_version_string": "v1.8.13",
  "fork_db_head_block_num": 26827,
  "fork_db_head_block_id": "000068cbf0fa8c02aadaa1320a53508d9c4a9f620c8900aa40e9a2693c454120"
}

working node

root@30e879854ba4:/# cleos get info
{
  "server_version": "eb4456fd",
  "chain_id": "7479dd536fa543a6e5faafe8f90132f8d1aab58c746d7d7a4e01c10ea091e25a",
  "head_block_num": 55,
  "last_irreversible_block_num": 54,
  "last_irreversible_block_id": "000000368762efdad2bac8d4869a5117504bfd923af19057df2159b3b1c5942d",
  "head_block_id": "000000373495308ac2ed3f740c7d27a0b2351f700d7d415c2d1a068d5e86fe16",
  "head_block_time": "2020-04-16T14:18:52.500",
  "head_block_producer": "eosio",
  "virtual_block_cpu_limit": 211077,
  "virtual_block_net_limit": 1106757,
  "block_cpu_limit": 199900,
  "block_net_limit": 1048576,
  "server_version_string": "v1.8.13",
  "fork_db_head_block_num": 55,
  "fork_db_head_block_id": "000000373495308ac2ed3f740c7d27a0b2351f700d7d415c2d1a068d5e86fe16"
}

@aroyer
Copy link

aroyer commented Jun 5, 2020

to resolve increase the default 30 ms for configuration variable http-max-response-time-ms used by the http plugin in keosd. default is not enough to set the contract for eosio.system using cleos set contract on resource constrained environment.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants