Monday, 2014-01-06

*** nowen has quit (Quit: Bye)00:29
*** rfxn (~teck7@bas1-montreal54-1167956021.dsl.bell.ca) has joined #wikid07:11
*** nowen (~nowen@50-194-249-125-static.hfc.comcastbusiness.net) has joined #wikid13:43
*** Troy_ (329b98a8@gateway/web/freenode/ip.50.155.152.168) has joined #wikid16:14
nowenHi Troy_16:14
Troy_Good morning16:14
*** Troy_ is now known as Guest6459016:14
Guest64590i apologize for the delayed responses yesterday.. it was my son's b-day and got busy16:15
nowenno problem.  just wanted to make sure that I hadn't dropped a ball16:15
nowenany ideas on the cause?16:16
Guest64590no.. not yet.. just seemed to get stuck early Sunday morning.. issued OTPs but wasn't able to do much of anything else16:18
Guest64590once I shutdown the primary server, the failover scripts took over and we were back up on the secondary16:18
Guest64590now I can't get the primary back up16:18
Guest64590did you see anything from the pgsql logs?16:19
nowennot really.16:22
nowencan you start postgres via the service command?16:22
Guest64590i have not attempted that yet.. i was waiting on Vince (db admin) before I did anything else.. but I haven't been able to reach him this morning yet16:23
nowenand there was nothing in the tomcat logs?16:26
Guest64590I just sent the last few entries to you16:30
Guest64590I don't see anything from the recent startup attempt16:30
nowenyeah, but that error could point to the failure.  the time seems about right, right?16:32
Guest64590yes.. actually I believe these latest entries would be about an hour or so after we saw failures16:35
Guest64590I'm going back a bit to see what is going on earlier16:35
Guest64590actually from the WiKID logs, the last verified creds happened at about 6:09AM.. but users got OTPs up until I stopped the service at about 7:40AM16:37
nowendid y'all ever upgrade your WiKID version?16:49
nowenTroy:  Can you reboot the server? I think we should archive/mv the postgres log and see if it is complaining on a fresh restart16:59
Guest64590no.. we are still at wikid-server-enterprise-3.4.87-b121617:00
Guest64590yes.. i will see about rebooting this server.. hopefully that will clear up anything that is hung up17:01
Guest64590the server is rebooting now17:19
nowenok17:19
nowenis wikid set to autostart?17:19
Guest64590no17:22
nowenok17:22
Guest64590one question.. on the setup.conf.. is the port different for standalone?17:22
nowenjust mv the pgstartup.log anywhere else17:22
nowenyes, but if you set the role to none and then run setup and set it to none again, it will change everything.17:23
Guest64590postgres_port=5434 or 5432 ?17:23
nowen543217:23
Guest64590ok17:23
nowenfor stand alone17:23
Guest64590ok.. i moved pgstartup.log to /tmp and ran wikidctl setup, set to none17:26
Guest64590attempting start now17:26
nowenok17:27
Guest64590ok.. that worked!17:27
nowenok17:28
Guest64590now I just need to fail back over17:28
nowenso, one thing I wanted to talk about was how we designed the failover.17:28
Guest64590ok17:28
nowenwe developed it for a customer that was neutral on which server was primary17:29
nowenso, the fastest recovery was to make the old primary the secondary, sync the updates and restart17:29
nowenif y'all want to keep the primary the primary, we can write a script that will facilitate the fail back17:31
nowenbut it will still take longer than switching17:31
nowendoes that make sense?17:31
Guest64590ok. makes sense.. we would need to adjust our failover scripts for this to happen the way you intended17:31
nowenalso, we're going to look into upgrading postgres to a more recent version which I think will have some good performance improvements for you guys17:32
Guest64590that would be awesome.  i need to schedule a time for upgrading to the latest WiKID version17:33
Guest64590sounds like we would need to re-create all the local and network client certificates.. so it may take some time17:34
nowenyeah - will you talk to Vince about it and see how he feels? definitely want him on board and if he has any preferences I'd like to know17:34
nowenI don't think so17:34
nowenonly if you change the IP addresses17:34
Guest64590you mentioned something a few months back about updating the certificates for one of the later builds17:35
nowenoh, yes17:35
nowenthat's right.  We have a new intermediate ca.17:35
Guest64590i don't recall which build updated the int CA.. but i remember you saying we would need to re-generate the cert chain17:37
nowen3.5.0 b1421 updated the cert17:37
nowenthere's also an update for the utilities rpm.  so you have to do both17:38
Guest64590ok17:40
*** Guest64590 has quit (Ping timeout: 272 seconds)20:06
*** Troy_ (329b98a8@gateway/web/freenode/ip.50.155.152.168) has joined #wikid20:54
*** Troy_ is now known as Guest3480120:54
nowenTroy:  any update?20:56
Guest34801Not really.  I'm getting ready to setup the replication back on the old primary again21:08
Guest34801Nick:  when i run the setup replication the other direction.. from the new primary back to the old primary (setup as slave). do I need to enter the root password to transfer the key?21:56
Guest34801I believe I already transferred the keys one time before in this direction21:56
nowenyes21:56
nowenreally?21:57
nowenoh, yeah, it is probably doing it again21:57
Guest34801yes.. when we had that last downtime21:57
nowenI'm not sure, I always just enter it21:57
Guest34801ok.. i don't have it.. so I'll have to have the DC guys help me out21:57
nowenhmm21:58
Guest34801Nick:  I just tried the wikidctl sync from the new primary to the old primary and got some timeout error22:36
nowenhmm22:36
nowencan you ssh to the box?22:36
Guest34801but I was able to setup the replication in the reverse fine22:36
nowentake a look at /opt/WiKID/private on both boxes - do you see the replication keys?22:37
Guest34801yes.. i was able to ssh as wikid user22:37
nowenhow about as root?22:37
Guest34801checking root22:41
nowenI'm thinking that it is something with keys.  it's just ssh, so it should be fast22:41
Guest34801yes.. i'm able to ssh as root both ways22:41
nowentry again, maybe it was a blip22:42
Guest34801and I see the replication keys in the /private folder22:42
Guest34801i did test twice.. let me check a few other things first22:42
Guest34801what I did is the following:  I stopped both servers, setup the slave (old primary) and the master (old slave).. then ran wikidctl sync selecting d for database only22:50
Guest34801didn't get any prompt to enter the root password of the slave22:51
Guest34801I see the replication.ssh in the private folder on each server22:51
nowenit asked you for the password when you did the setup?22:51
nowendo you see replication.ssh.pub too?22:52
Guest34801I got the While talking to slave server: Timeout connecting to xxx.xxx.xxx.xxx22:53
Guest34801everything looks fine otherwise22:53
Guest34801no.. it never asked me this time22:53
nowendo you see replication.ssh.pub too?  there should be two keys22:54
Guest34801yes.. i see that file on both servers22:54
nowendates and size are the same?22:54
Guest34801then i went ahead and started the slave and then the master22:54
Guest34801and replication seemed to start fine22:54
Guest34801i'm checking the db timestamps22:55
nowenI'm checking here and it's syncing fine on mine22:55
nowencheck /var/log/secure on the new slave22:55
Guest34801time stamps are good22:56
Guest34801Jan  6 16:34:09 hsvwikidp1 sshd[15371]: Accepted publickey for root from x.x.x.x port 40675 ssh222:58
nowenhmm, I know it's not your pipe.22:58
nowenyou think the timestamps mean that it worked, but failed at the close?22:59
Guest34801strange.. well.. we can leave it this way until tomorrow22:59
nowenis the new primary still up?22:59
Guest34801yes.. we are working fine for now.... we would like to get back to the other way since we have the automatic failover scripts in place for the old primary23:00
Guest34801and from the timestamps i just ran, it looks like replication is working23:00
nowenyeah, could be the ssh tunnel was slow to close or something23:01
nowenok - I'll be here tomorrow.23:25
*** nowen has quit (Quit: Leaving.)23:26
*** Guest34801 has quit (Quit: Page closed)23:42
