Enterprise Chef 11.2.1 is a critical bug-fix release for customers who installed Enterprise Chef 11.2.0. It corrects a single defect experienced by customers who upgraded from earlier releases.
Bug Fixes:
- Fixes an issue where private-chef was unexpectedly changed to private_chef in upstart/runit configuration files
Notes:
If you upgraded from an earlier release of EC, your servers may now have two runit processes configured in upstart:
/etc/init/private-chef-runsvdir.conf
/etc/init/private_chef-runsvdir.conf
The second one is incorrect, introduced by the aforementioned issue in EC 11.2.0. In this condition, you will see two runsvdir processes running with many errors:
ps:
root 924 1 0 05:20 ? 00:00:00 runsvdir -P /opt/opscode/service log: /lock: temporary failure runsv oc_id: fatal: unable to lock supervise/lock: temporary failure runsv couchdb: fatal: unable to lock supervise/lock: temporary failure runsv bookshelf: fatal: unable to lock supervise/lock: temporary failure runsv postgresql: fatal: unable to lock supervise/lock: temporary failure runsv opscode-certificate: fatal: unable to lock supervise/lock: temporary failure
root 926 1 0 05:20 ? 00:00:00 runsvdir -P /opt/opscode/service log: ry failure runsv opscode-expander: fatal: unable to lock supervise/lock: temporary failure runsv opscode-solr: fatal: unable to lock supervise/lock: temporary failure runsv rabbitmq: fatal: unable to lock supervise/lock: temporary failure runsv oc_bifrost: fatal: unable to lock supervise/lock: temporary failure runsv opscode-chef-mover: fatal: unable to lock supervise/lock: temporary failure
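A quick way to tell whether a server is affected is to look for the misspelled upstart job file. The following is a minimal sketch, not part of the official tooling: the function name `find_errant_upstart_conf` and its optional directory argument are hypothetical and exist only to make the check easy to try out. It defaults to the `/etc/init` directory named above.

```shell
# Print the path of the errant upstart job file if it exists.
# The optional directory argument is an assumption added for illustration;
# without it the function looks in /etc/init.
find_errant_upstart_conf() {
    init_dir="${1:-/etc/init}"
    errant="$init_dir/private_chef-runsvdir.conf"
    if [ -e "$errant" ]; then
        echo "$errant"
        return 0
    fi
    return 1
}
```

For example, `find_errant_upstart_conf && echo "errant config present"` prints the path and the message only on an affected server.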
pstree:
Correcting the error:
HA
- On both the active/bootstrap and standby backends: remove the errant runsvdir config file
root@backend1# rm -f /etc/init/private_chef-runsvdir.conf
root@backend2# rm -f /etc/init/private_chef-runsvdir.conf
- On the standby (non-bootstrap) backend: reboot your server to clear all remaining orphaned processes and to restart runsvdir to a working state
root@backend2# init 6
- On the standby backend: Verify that there is only a single runsvdir process and it is error-free (all dots)
root@backend2# ps -ef |grep 'runsvdir -P /opt/opscode/service' root 921 1 0 05:35 ? 00:00:00 runsvdir -P /opt/opscode/service log: ........................................................................................................................................................................................................................................................................................................................................................................................................... root@backend2# private-chef-ctl ha-status [OK] keepalived HA services enabled. [OK] DRBD disk replication enabled. [OK] DRBD partition /dev/opscode/drbd found. [OK] DRBD device /dev/drbd0 found. [OK] cluster status = backup [OK] did not find VIP IP address and I am not master [OK] found VRRP communications interface eth0 [OK] my DRBD status is Connected/Secondary/UpToDate and I am not master [OK] my DRBD partition is not mounted and I am not master [OK] DRBD primary IP address pings [OK] DRBD secondary IP address pings [OK] bookshelf is not running, and I am not master. [OK] couchdb is not running, and I am not master. [OK] keepalived is running. [OK] nginx is not running, and I am not master. [OK] oc\_bifrost is not running, and I am not master. [OK] oc\_id is not running, and I am not master. [OK] opscode-account is not running, and I am not master. [OK] opscode-certificate is not running, and I am not master. [OK] opscode-erchef is not running, and I am not master. [OK] opscode-expander is not running, and I am not master. [OK] opscode-expander-reindexer is not running, and I am not master. [OK] opscode-org-creator is not running, and I am not master. [OK] opscode-solr is not running, and I am not master. [OK] opscode-webui is not running, and I am not master. [OK] postgresql is not running, and I am not master. [OK] rabbitmq is not running, and I am not master. [OK] redis\_lb is not running, and I am not master. [OK] all checks passed.
- On the active/bootstrap backend: trigger a failover and then reboot
root@backend1# private-chef-ctl stop keepalived
ok: down: keepalived: 1s, normally up
root@backend1# sleep 30
root@backend1# init 6
- on the bootstrap (now standby backend): Verify that there is only a single runsvdir process and it is error-free (all dots)
root@backend1# ps -ef |grep 'runsvdir -P /opt/opscode/service' root 921 1 0 05:35 ? 00:00:00 runsvdir -P /opt/opscode/service log: ...........................................................................................................................................................................................................................................................................................................................................................................................................
- On the active (non-bootstrap) backend, trigger another failover back to the bootstrap backend
root@backend2# private-chef-ctl restart keepalived
- Test your now-active bootstrap backend to ensure full functionality (note: you may need to point your api_fqdn address at localhost using the server's /etc/hosts file)
root@backend1# private-chef-ctl ha-status
[OK] keepalived HA services enabled.
[OK] DRBD disk replication enabled.
[OK] DRBD partition /dev/opscode/drbd found.
[OK] DRBD device /dev/drbd0 found.
[OK] cluster status = master
[OK] found VIP IP address and I am master
[OK] found VRRP communications interface eth0
[OK] my DRBD status is Connected/Primary/UpToDate and I am master
[OK] my DRBD partition is mounted and I am master
[OK] DRBD primary IP address pings
[OK] DRBD secondary IP address pings
[OK] bookshelf is running correctly, and I am master.
[OK] couchdb is running correctly, and I am master.
[OK] keepalived is running.
[OK] nginx is running correctly, and I am master.
[OK] oc_bifrost is running correctly, and I am master.
[OK] oc_id is running correctly, and I am master.
[OK] opscode-account is running correctly, and I am master.
[OK] opscode-certificate is running correctly, and I am master.
[OK] opscode-chef-mover is running.
[OK] opscode-erchef is running correctly, and I am master.
[OK] opscode-expander is running correctly, and I am master.
[OK] opscode-expander-reindexer is running correctly, and I am master.
[OK] opscode-org-creator is running correctly, and I am master.
[OK] opscode-solr is running correctly, and I am master.
[OK] opscode-webui is running correctly, and I am master.
[OK] postgresql is running correctly, and I am master.
[OK] rabbitmq is running correctly, and I am master.
[OK] redis_lb is running correctly, and I am master.
[OK] all checks passed.
root@backend1# private-chef-ctl test
...
Finished in 1 minute 23.67 seconds
116 examples, 0 failures, 3 pending
- Note: pending tests are OK
- Note: This command may fail on the first attempt after a failover. Please contact support if it continues to fail.
- On your frontends, follow the Standalone procedure as detailed below
- Upgrade following the normal procedure to Enterprise Chef 11.2.1
Standalone
- stop the errant runsvdir process:
# initctl status private_chef-runsvdir
private_chef-runsvdir start/running, process 926
# initctl stop private_chef-runsvdir
private_chef-runsvdir stop/waiting
- remove the errant runsvdir config file
# rm -f /etc/init/private_chef-runsvdir.conf
- stop all private-chef services
# private-chef-ctl stop
- reboot your server to clear all remaining orphaned processes and to restart runsvdir to a working state.
- Verify that there is only a single runsvdir process and it is error-free (all dots)
# ps -ef |grep 'runsvdir -P /opt/opscode/service' root 921 1 0 05:35 ? 00:00:00 runsvdir -P /opt/opscode/service log: ...........................................................................................................................................................................................................................................................................................................................................................................................................
- Test your system to ensure full functionality (note: you may need to point your api_fqdn address at localhost using the server's /etc/hosts file)
# private-chef-ctl test
...
Finished in 1 minute 23.67 seconds
116 examples, 0 failures, 3 pending
Note: pending tests are OK
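When the test run is scripted, the summary line can be checked for zero failures while ignoring the pending count. This is a rough sketch under the assumption that the summary keeps the "N examples, N failures, N pending" shape shown above; `test_summary_ok` is a hypothetical helper name, not a private-chef-ctl feature.

```shell
# Reads `private-chef-ctl test` output on stdin. Succeeds only if a
# summary line is found and it reports 0 failures; the pending count
# is deliberately ignored.
test_summary_ok() {
    awk '
        /[0-9]+ examples?, [0-9]+ failures?/ {
            seen = 1
            for (i = 2; i <= NF; i++)
                if ($i ~ /^failure/) failures = $(i - 1)
        }
        END { exit !(seen && failures == 0) }
    '
}
```

For example, `private-chef-ctl test | test_summary_ok || echo "test run needs attention"` flags a run that failed or produced no summary.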