
Key not found: 'info' - controller will not start #235

Closed
apcheamitru opened this issue Jun 27, 2018 · 12 comments

Comments

@apcheamitru

Having issues with a new deployment... the controller will not start.

[root@sgandhi-ow ~]# kubectl -n openwhisk get po controller-0
NAME           READY     STATUS             RESTARTS   AGE
controller-0   0/1       CrashLoopBackOff   41         1h

Here are the container logs:

[2018-06-27T19:55:36.566Z] [INFO] Initializing Kamon...
[INFO] [06/27/2018 19:55:37.130] [main] [StatsDExtension(akka://kamon)] Starting the Kamon(StatsD) extension
[2018-06-27T19:55:37.199Z] [INFO] Slf4jLogger started
[2018-06-27T19:55:37.231Z] [INFO] Starting remoting
[2018-06-27T19:55:37.412Z] [INFO] Remoting started; listening on addresses :[akka.tcp://controller-actor-system@10.244.0.148:2552]
[2018-06-27T19:55:37.415Z] [INFO] Remoting now listens on addresses: [akka.tcp://controller-actor-system@10.244.0.148:2552]
[2018-06-27T19:55:37.436Z] [INFO] Cluster Node [akka.tcp://controller-actor-system@10.244.0.148:2552] - Starting up...
[2018-06-27T19:55:37.488Z] [INFO] Cluster Node [akka.tcp://controller-actor-system@10.244.0.148:2552] - Registered cluster JMX MBean [akka:type=Cluster]
[2018-06-27T19:55:37.488Z] [INFO] Cluster Node [akka.tcp://controller-actor-system@10.244.0.148:2552] - Started up successfully
[2018-06-27T19:55:37.532Z] [INFO] Cluster Node [akka.tcp://controller-actor-system@10.244.0.148:2552] - No seed-nodes configured, manual cluster join required
[2018-06-27T19:55:37.752Z] [INFO] [#tid_sid_unknown] [Config] environment set value for limits.triggers.fires.perMinute
[2018-06-27T19:55:37.752Z] [INFO] [#tid_sid_unknown] [Config] environment set value for limits.actions.sequence.maxLength
[2018-06-27T19:55:37.753Z] [INFO] [#tid_sid_unknown] [Config] environment set value for limits.actions.invokes.concurrent
[2018-06-27T19:55:37.753Z] [INFO] [#tid_sid_unknown] [Config] environment set value for controller.instances
[2018-06-27T19:55:37.753Z] [INFO] [#tid_sid_unknown] [Config] environment set value for limits.actions.invokes.perMinute
[2018-06-27T19:55:37.754Z] [INFO] [#tid_sid_unknown] [Config] environment set value for limits.actions.invokes.concurrentInSystem
[2018-06-27T19:55:37.754Z] [INFO] [#tid_sid_unknown] [Config] environment set value for runtimes.manifest
[2018-06-27T19:55:37.754Z] [INFO] [#tid_sid_unknown] [Config] environment set value for kafka.hosts
[2018-06-27T19:55:37.755Z] [INFO] [#tid_sid_unknown] [Config] environment set value for port
[2018-06-27T19:55:38.275Z] [INFO] [#tid_sid_unknown] [KafkaMessagingProvider] topic completed0 already existed
[2018-06-27T19:55:38.519Z] [INFO] [#tid_sid_unknown] [KafkaMessagingProvider] topic health already existed
[2018-06-27T19:55:38.800Z] [INFO] [#tid_sid_unknown] [KafkaMessagingProvider] topic cacheInvalidation already existed
[2018-06-27T19:55:39.096Z] [INFO] [#tid_sid_unknown] [KafkaMessagingProvider] topic events already existed
[2018-06-27T19:55:39.467Z] [INFO] [#tid_sid_controller] [Controller] starting controller instance 0 [marker:controller_startup0_count:1714]
[2018-06-27T19:55:40.518Z] [INFO] [#tid_sid_dispatcher] [MessageFeed] handler capacity = 128, pipeline fill at = 128, pipeline depth = 256
[2018-06-27T19:55:40.675Z] [INFO] [#tid_sid_loadbalancer] [ShardingContainerPoolBalancerState] blackboxFraction = 0.1
[2018-06-27T19:55:40.855Z] [INFO] [#tid_sid_dispatcher] [MessageFeed] handler capacity = 128, pipeline fill at = 128, pipeline depth = 256
[2018-06-27T19:55:41.294Z] [INFO] [#tid_sid_loadbalancer] [WhiskAction] [GET] serving from datastore: CacheKey(whisk.system/invokerHealthTestAction0) [marker:database_cacheMiss_count:3544]
[2018-06-27T19:55:41.296Z] [INFO] [#tid_sid_loadbalancer] [CouchDbRestStore] [GET] 'test_whisks' finding document: 'id: whisk.system/invokerHealthTestAction0' [marker:database_getDocument_start:3546]
[2018-06-27T19:55:41.531Z] [INFO] [#tid_sid_loadbalancer] [CouchDbRestStore]  [marker:database_getDocument_finish:3781:234]
[2018-06-27T19:55:41.559Z] [INFO] [#tid_sid_loadbalancer] [WhiskAction] write initiated on existing cache entry, invalidating CacheKey(whisk.system/invokerHealthTestAction0), tid sid_loadbalancer, state WriteInProgress
[2018-06-27T19:55:41.563Z] [INFO] [#tid_sid_loadbalancer] [CouchDbRestStore] [PUT] 'test_whisks' saving document: 'id: whisk.system/invokerHealthTestAction0, rev: 49-675f68155d3f18cb5b808e0a905fa41d' [marker:database_saveDocument_start:3813]
[2018-06-27T19:55:41.628Z] [INFO] [#tid_sid_loadbalancer] [CouchDbRestStore]  [marker:database_saveDocument_finish:3877:64]
[2018-06-27T19:55:41.641Z] [INFO] [#tid_sid_loadbalancer] [WhiskAction] write all done, caching CacheKey(whisk.system/invokerHealthTestAction0) Cached
[2018-06-27T19:55:41.641Z] [INFO] [#tid_sid_loadbalancer] [InvokerPool] test action for invoker health now exists
[2018-06-27T19:55:41.755Z] [INFO] [#tid_sid_controller] [Controller] loadbalancer initialized: ShardingContainerPoolBalancer
[2018-06-27T19:55:41.756Z] [INFO] [#tid_sid_dispatcher] [MessageFeed] handler capacity = 128, pipeline fill at = 128, pipeline depth = 256
[2018-06-27T19:55:41.762Z] [INFO] [#tid_sid_controller] [ActivationThrottler] systemOverloadLimit = 5000
Exception in thread "main" pureconfig.error.ConfigReaderException: Cannot convert configuration to a whisk.core.controller.WhiskInformation. Failures are:
  at 'whisk':
    - Key not found: 'info'.

	at pureconfig.package$.getResultOrThrow(package.scala:138)
	at pureconfig.package$.loadConfigOrThrow(package.scala:160)
	at whisk.core.controller.RestAPIVersion.<init>(RestAPIs.scala:176)
	at whisk.core.controller.Controller.<init>(Controller.scala:133)
	at whisk.core.controller.Controller$.main(Controller.scala:234)
	at whisk.core.controller.Controller.main(Controller.scala)
[2018-06-27T19:55:53.882Z] [INFO] Cluster Node [akka.tcp://controller-actor-system@10.244.0.148:2552] - Exiting completed
[2018-06-27T19:55:53.892Z] [INFO] Cluster Node [akka.tcp://controller-actor-system@10.244.0.148:2552] - Shutting down...
[2018-06-27T19:55:53.912Z] [INFO] Cluster Node [akka.tcp://controller-actor-system@10.244.0.148:2552] - Successfully shut down
[2018-06-27T19:55:53.929Z] [INFO] [#tid_sid_unknown] [Controller] Shutting down Kamon with coordinated shutdown
[2018-06-27T19:55:53.954Z] [INFO] Shutting down remote daemon.
[2018-06-27T19:55:53.956Z] [INFO] Remote daemon shut down; proceeding with flushing remote transports.
[2018-06-27T19:55:53.992Z] [INFO] Remoting shut down
[2018-06-27T19:55:53.993Z] [INFO] Remoting shut down.
[ERROR] [06/27/2018 19:55:54.034] [controller-actor-system-dispatchers.kafka-dispatcher-19] [Controller$(akka://controller-actor-system)] [#tid_sid_unknown] [KafkaConsumerConnector] poll timeout occurred. Retrying 3 more times.
[ERROR] [06/27/2018 19:55:54.039] [controller-actor-system-dispatchers.kafka-dispatcher-36] [Controller$(akka://controller-actor-system)] [#tid_sid_unknown] [KafkaConsumerConnector] poll timeout occurred. Retrying 3 more times.
[ERROR] [06/27/2018 19:55:54.040] [controller-actor-system-dispatchers.kafka-dispatcher-20] [Controller$(akka://controller-actor-system)] [#tid_sid_unknown] [KafkaConsumerConnector] poll timeout occurred. Retrying 3 more times.

I'm new to OpenWhisk, so if there's any additional information I can provide, please let me know.
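For context on the error itself: the controller uses pureconfig to map the `whisk.info` section of its Typesafe config onto a `WhiskInformation` case class, and `loadConfigOrThrow` aborts startup if the key is absent. A sketch of the kind of HOCON block it expects follows; the exact field names are an assumption inferred from the class name and the Helm fix, so check `RestAPIs.scala` for the real ones:

```
# application.conf fragment (illustrative only -- field names are an
# assumption; the controller fails fast when the whisk.info key is missing)
whisk {
  info {
    build-no = "latest"      # hypothetical: build identifier surfaced by the controller
    date = "2018-06-27"      # hypothetical: build date
  }
}
```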

@dgrove-oss
Member

If you are using the Helm-based deployment, this should have been fixed by #234 (merged earlier this week). We didn't fix the older, non-Helm deployment because we're close to removing it.

@apcheamitru
Author

Ah, ok. I was using the non-Helm deployment but will try with Helm.

@apcheamitru
Author

Works with the Helm deployment.

@pallavidn

I have had an OpenWhisk install on Kubernetes for a month and everything was working fine. No issues.
I deleted the controller pod and restarted it, and now this problem is occurring for the controller pod.
Do you recommend redeploying the whole OpenWhisk system using Helm to fix this issue, or is there another way to fix it in my existing install?

@dgrove-oss
Member

My guess is you picked up a newer controller image from Docker Hub when you restarted the controller. Even if you got past this, your installation would be in trouble because the new controller isn't compatible with the old invoker pods. If you want to keep your current installation, you will need to use a tag to pull a controller image that is of the same "vintage" as the invoker image you are running.
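Pinning the image to an explicit tag in the pod spec is the usual way to keep the controller at the same vintage as the invoker. A hedged sketch, assuming a StatefulSet-style spec; the image name and tag below are illustrative, not taken from this thread:

```
# Hypothetical fragment of a controller StatefulSet spec. Pinning an
# explicit tag (rather than a moving tag like :latest) prevents a pod
# restart from silently pulling a newer, incompatible image.
spec:
  containers:
    - name: controller
      image: openwhisk/controller:0.9.0   # illustrative tag; match your invoker's tag
      imagePullPolicy: IfNotPresent       # avoid re-pulling a tag that may have moved
```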

@piyush-mishra

piyush-mishra commented Mar 31, 2024

@dgrove-oss - I am new to OpenWhisk. When I run the command $ ./gradlew core:standalone:bootRun on a Windows machine, the playground opens on port 3232, but the API is not being hosted on port 3233. The playground output console shows the error: Error contacting service, status = undefined

I am also getting this error:

Exception in thread "main" pureconfig.error.ConfigReaderException: Cannot convert configuration to a org.apache.openwhisk.core.etcd.EtcdConfig. Failures are:
  at 'whisk':
    - Key not found: 'etcd'.

	at pureconfig.package$.getResultOrThrow(package.scala:139)
	at pureconfig.package$.loadConfigOrThrow(package.scala:161)

@Prithvi-Velicheti

@dgrove-oss the same etcd error has been bothering me for a week

@dgrove-oss
Member

@piyush-mishra and @Prithvi-Velicheti -- I don't think it's the same problem (this is an issue from 5 years ago).

I do think the problem is probably that the standalone configuration doesn't support the "new" scheduler, which as of a month or so ago is the default scheduler for OpenWhisk. The documentation/README may be out of date. I see that in the nightly tests for standalone, we disable the new scheduler. Most likely this needs to be reflected in the documentation (for example the main README). @style95 -- does this make sense to you? You are the expert on configuring the new scheduler

@Prithvi-Velicheti

I am sorry.

$./gradlew core:standalone:bootRun

Exception in thread "main" pureconfig.error.ConfigReaderException: Cannot convert configuration to a org.apache.openwhisk.core.etcd.EtcdConfig. Failures are:
  at 'whisk':
    - Key not found: 'etcd'.

    at pureconfig.package$.getResultOrThrow(package.scala:139)
    at pureconfig.package$.loadConfigOrThrow(package.scala:161)
    at org.apache.openwhisk.core.invoker.FPCInvokerReactive.<init>(FPCInvokerReactive.scala:85)
    at org.apache.openwhisk.core.invoker.FPCInvokerReactive$.instance(FPCInvokerReactive.scala:66)
    at org.apache.openwhisk.core.loadBalancer.LeanBalancer.makeALocalThreadedInvoker(LeanBalancer.scala:72)
    at org.apache.openwhisk.core.loadBalancer.LeanBalancer.<init>(LeanBalancer.scala:75)
    at org.apache.openwhisk.core.loadBalancer.LeanBalancer$.instance(LeanBalancer.scala:93)
    at org.apache.openwhisk.core.controller.Controller.<init>(Controller.scala:119)
    at org.apache.openwhisk.core.controller.Controller$.start(Controller.scala:338)
    at org.apache.openwhisk.standalone.StandaloneOpenWhisk$.startController(StandaloneOpenWhisk.scala:337)
    at org.apache.openwhisk.standalone.StandaloneOpenWhisk$.startServer(StandaloneOpenWhisk.scala:282)
    at org.apache.openwhisk.standalone.StandaloneOpenWhisk$.main(StandaloneOpenWhisk.scala:254)
    at org.apache.openwhisk.standalone.StandaloneOpenWhisk.main(StandaloneOpenWhisk.scala)

<============-> 97% EXECUTING [3m 21s]

:core:standalone:bootRun
This is the error I am facing @dgrove-oss
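For what it's worth, the trace shows FPCInvokerReactive (the new-scheduler invoker path) trying to load a `whisk.etcd` block that standalone mode does not ship. A hedged sketch of what such a block looks like; the key name is inferred from the error message and the `org.apache.openwhisk.core.etcd.EtcdConfig` class name, and the endpoint value is hypothetical. Note that disabling the new scheduler, rather than adding this block, is the working fix for standalone mode:

```
# Illustrative application.conf fragment only -- field names are an
# assumption; the new-scheduler components require a reachable etcd.
whisk {
  etcd {
    hosts = "127.0.0.1:2379"   # hypothetical etcd endpoint
  }
}
```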

@style95
Member

style95 commented Apr 3, 2024

@dgrove-oss
Yes, standalone mode is not supported with the new scheduler.
We need to write this down in the documentation.
Since the new scheduler has become the default scheduler, I feel we may need some relevant changes in the openwhisk-deploy-kube repo as well.

@gnk6

gnk6 commented Apr 17, 2024

Hi guys, I have the same issue here. Do you have a hint on how to disable the new scheduler? I don't run it in a container but on my machine. I run the command below.

java -Dwhisk.standalone.host.name=192.168.122.69 -Dwhisk.standalone.host.internal=127.0.0.1 -Dwhisk.standalone.host.external=192.168.122.69 -jar ./bin/openwhisk-standalone.jar --couchdb --kafka --api-gw --kafka-ui

@gnk6

gnk6 commented Apr 17, 2024

A small update on the above. The simple bash script below did it!

#!/bin/bash
export GITHUB_WORKSPACE=/user/openwhisk/
/user/openwhisk/tools/github/disable-scheduler.sh \
  && /user/openwhisk/gradlew :core:standalone:build \
  && java -Dwhisk.standalone.host.name=192.168.122.69 \
       -Dwhisk.standalone.host.internal=127.0.0.1 \
       -Dwhisk.standalone.host.external=192.168.122.69 \
       -jar ./bin/openwhisk-standalone.jar --couchdb --kafka --api-gw --kafka-ui
