Friday, 2013-11-15

*** joevano has quit (*.net *.split)03:52
*** joevano (~joevano@bzflag/developer/JoeVano) has joined #wikid03:53
*** nowen (~nowen@50-194-249-125-static.hfc.comcastbusiness.net) has joined #wikid13:11
*** tessellare (~tessellar@38.88.11.237) has joined #wikid13:25
*** tessellare has parted #wikid (None)13:25
*** mark___ (8f74745b@gateway/web/freenode/ip.143.116.116.91) has joined #wikid13:51
mark___Nick are you here13:51
mark___we have an issue with our wikid13:52
nowenis it the iphone token?13:52
mark___no13:52
mark___our db crashed13:52
nowenwhat happened?13:53
nowencan you restart wikid?13:53
mark___our db filled up the drive and now over 4500 of the users are missing13:53
mark___no it will not let us restart13:53
mark___gives us a space error13:54
nowenok13:54
nowencan you run some commands now?13:54
mark___sure Troy is joining so hang on one moment13:55
*** Troy (329b9bb1@gateway/web/freenode/ip.50.155.155.177) has joined #wikid13:55
TroyHi13:56
nowenhi13:56
nowensorry to hear about the issue13:56
noweni have some postgres commands I want you to run13:56
nowenis the disk 100% full?13:56
nowenrun 'yum clean all' first to see if that helps13:56
mark___now we have 9% free13:58
nowenok13:58
nowennow:  su - postgres13:58
nowen$ psql -d wikid13:58
nowenwikid=# SELECT pg_size_pretty(pg_database_size('wikid')) As fulldbsize;13:59
mark___should we start the server again?13:59
nowenI pasted the prompt there too13:59
nowenno, not yet13:59
nowenthat command will show the db size13:59
nowenactually,  right, you might need to start postgres14:00
nowen'service postgresql start' i think14:00
nowenno need to start all of wikid yet14:00
nowenwe will vacuum the db and that should get us more space14:00
mark___what is the command for starting postgres14:01
nowenservice postgresql start14:01
Troysays service command not found14:02
mark___command service not found14:02
nowenis this ubuntu?14:03
nowenor centos?14:03
nowenare you root?14:04
Troyredhat14:04
Troypostgres -D /usr/local/pgsql/data14:04
Troyi'm su to postgres14:04
nowenoh - sorry - back out to root and start it14:04
nowenonce started, su to postgres again14:06
nowenare y'all in replication mode?14:06
mark___okay it started14:08
Troyyes14:08
Troyshould I config back to standalone?14:09
nowennot yet, I think14:09
Troyok14:10
nowenas postgres, if you run 'psql -d wikid -p 5434' do you get a psql prompt?14:11
Troyyes14:13
nowenok14:13
nowenrun 'SELECT pg_size_pretty(pg_database_size('wikid')) As fulldbsize;'14:14
nowenand tell me how big it is14:14
Troyit didn't come back with anything after that command14:15
Troydoes it dump to a file?14:15
nowenno it should come back14:16
nowenyou have a ; on the end?14:16
nowenand you dropped the quotes?14:16
nowenwikid=# SELECT pg_size_pretty(pg_database_size('wikid')) As fulldbsize;14:17
nowen fulldbsize14:17
nowen------------14:17
nowen 15 MB14:17
nowen(1 row)14:17
Troypsql -d wikid -p 5432 or 5434 ?14:17
Troyhow do I exit and try the other port?14:17
nowenctrl d14:18
nowenshould be 5434 in replication14:18
Troyyes.. doesn't connect on that other port14:19
Troyjust doesn't bring back anything for the db size14:19
Troyyes.. i dropped the quotes14:19
nowenko14:19
nowenok14:20
nowenso, back as postgres14:20
Troy-rw------- 1 root  root         0 Oct 27 04:02 spooler.2 -rw------- 1 root  root         0 Oct 20 04:02 spooler.3 -rw------- 1 root  root         0 Oct 13 04:02 spooler.4 drwxr-x--- 2 squid squid     4096 Feb 16  2010 squid -rw------- 1 root  root         0 Sep 13  2011 tallylog -rw-r--r-- 1 root  root     16952 Nov 15 07:07 up2date -rw-r--r-- 1 root  root     21646 Nov 10 02:49 up2date.1 -rw-r--r-- 1 root  root     21906 Nov  3 03:06 u14:20
Troyok14:20
Troywikid=# SELECT pg_size_pretty(pg_database_size('wikid')) As fulldbsize wikid-# SELECT pg_size_pretty(pg_database_size('wikid')) As fulldbsize;14:21
nowendid it work?14:22
nowenSELECT pg_size_pretty(pg_database_size('wikid')) As fulldbsize wikid-# SELECT pg_size_pretty(pg_database_size('wikid')) As fulldbsize;14:22
nowenSELECT pg_size_pretty(pg_database_size('wikid')) As fulldbsize;14:24
nowen?14:26
TroyERROR:  syntax error at or near "SELECT" at character 64 LINE 2: SELECT pg_size_pretty(pg_database_size('wikid')) As fulldbsi...14:27
nowenhmm14:27
nowenmaybe something in the cache.  try again14:27
nowenis there a leading space?14:27
nowennah, that doesn't matter14:27
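A likely explanation for the error above, offered as an assumption rather than anything confirmed in the channel: the first line Troy entered had no trailing semicolon, so psql showed its continuation prompt (wikid-#) and treated the second line as part of the same statement. The short Python sketch below only does the arithmetic. It joins the two logged lines with a newline, the way a multi-line query buffer would, and reports where the second SELECT begins.

```python
# The two lines Troy entered, copied from the log above.
first_line = "SELECT pg_size_pretty(pg_database_size('wikid')) As fulldbsize"
second_line = "SELECT pg_size_pretty(pg_database_size('wikid')) As fulldbsize;"

# Assumption: continuation lines are joined with a single newline before
# the statement is sent to the server.
query_buffer = first_line + "\n" + second_line

# Locate the second SELECT (skip index 0) and convert to a 1-based position.
second_select_pos = query_buffer.index("SELECT", 1) + 1
print(second_select_pos)  # 64, the position named in the error message
```

If that is what happened, typing \r at the wikid-# prompt resets the query buffer, after which the single statement with its semicolon can be entered again.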
nowenok - we can skip this if you like. just vacuum the db and see if it helps the disk14:29
Troyok14:31
Troythe problem is when it was up it was only showing like 500 users14:31
mark___will that remove any critical data14:31
nowenIt should not14:32
nowenare you sure it is the db taking the space?14:32
Troynot sure really..because the secondary server was not full14:33
Troyjust this one14:33
Troycould be logs?14:33
nowenwas the server running slow?14:33
Troya little.. but mostly just when I restarted.  i got the timecop errors14:33
Troythen the log in wikid said it was out of space14:34
nowendid you have the logs set for debug?14:34
Troyno.. it was normal14:35
nowenlook for any big logs in /var/log14:35
nowenalso 'locate *.rpm'14:35
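One way to look for big logs, as a minimal sketch rather than anything from WiKID itself: walk a directory tree and list the largest files first. The function name largest_files and the limit of 10 are illustrative choices, and /var/log is simply the path mentioned above.

```python
import os

def largest_files(root, limit=10):
    """Return up to `limit` (size_in_bytes, path) pairs under root, biggest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # Unreadable or vanished file (e.g. rotated mid-scan); skip it.
                continue
    sizes.sort(reverse=True)
    return sizes[:limit]

if __name__ == "__main__":
    # Run as root to see files that ordinary users cannot stat.
    for size, path in largest_files("/var/log"):
        print(f"{size / (1024 * 1024):8.1f} MB  {path}")
```

Pointing the same function at the PostgreSQL data directory would show whether the database or the logs are taking the space.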
Troyopt/wikid-utilities-3.0.9-1.i386.rpm /usr/share/doc/mozldap-6.0.5/README.rpm /usr/share/doc/vim-common-7.0.109/Changelog.rpm /var/cache/yum/rhel-i386-server-5/packages/ghostscript-8.70-6.el5.i386.rpm14:39
Troycan I check maybe if we have a backup of the db?14:40
nowenthat's ok.  just wanted to make sure that there weren't any big wikid rpms14:40
nowenyes - and we can create one14:40
Troythen run a vacuum?14:40
nowenwell - your secondary should be a back up14:41
Troyyes.. unfortunately the secondary is the same14:44
nowen'tar -czvf dbbackup.tar.gz /var/lib/pgsql/data/* ' will tar up the db.14:44
nowenbut I worry there isn't room for it on the server14:45
nowenhow much space is there?14:46
Troywe are restoring from a backup14:46
*** jrdx (~jrdx@38.88.11.237) has joined #wikid14:50
*** jrdx has parted #wikid (None)14:50
nowenhmm - I was going to suggest we try to start it and see14:53
Troyjust an update.. we are restoring from a full backup from 11/10, then we have daily incrementals15:01
nowenso, are your users all back?15:06
Troywe are still waiting on the restore15:07
Troy# psql -d wikid -p 5432 or 5434 wikid-# SELECT pg_size_pretty(pg_database_size('wikid')) As fulldbsize; wikid=# VACUUM FULL VERBOSE;15:07
Troyshould I go ahead and run a VACUUM when it's back online?15:08
TroyI will check first.. but we may run into the issue again if it's close to the max usage15:08
nowenif you have a back up, then no harm in running vacuum, certainly15:08
nowen'vacuumdb -avfv -p 5434'15:09
nowenand 'reindexdb -p 5434'15:10
mark___nick iphone15:13
mark___is the app missing from the store?15:13
mark___for iphones?15:13
nowenmark___: yes.  we had a show-stopper bug. no option to revert the binaries on the apple store, so we had to pull it.  we have submitted a critical fix and await apple15:14
mark___ok15:14
nowenvery sorry15:14
nowenmark___: how big are the disks on your wikid server?15:56
Troyi think there was only 3GB allocated for the /var volume16:11
nowenI would really like to see the output of the dbsize16:12
nowenwhat's the current status?16:21
Troywe are back up17:03
nowengood to hear17:03
nowenin /var/lib/pgsql, will you run 'du -h'17:04
nowenand let me know how big /data is?17:05
mark___hi Nick17:19
Troy672K17:19
mark___do you know how big the flat file is17:19
mark___for a failover17:19
nowenmark___: i don't follow what flat file?17:19
mark___does the failover use postgres17:19
mark___when wikid fails over17:20
nowenreplication uses postgres17:20
mark___how does that process occur17:20
nowenwhen the primary is down, you need to run wikidctl promote on the secondary17:21
mark___yes but what does that entail17:22
mark___when i run wikidctl17:22
mark___what all occurs17:22
mark___do the logs move over as well17:22
nowenno, everything should be there.  You give the secondary an ip that works and run it as the primary17:23
nowenhttp://www.wikidsystems.com/support/wikid-support-center/installation-how-tos/how-to-configure-wikid-for-replication has the commands17:23
nowenTroy: that doesn't seem very big.  how much free disk space is there now?17:24
nowenmark___: it can be scripted. is that what you want to do?17:30
mark___Nick here is our question17:52
mark___the primary server failed17:52
mark___due to lack of db space17:53
mark___when we failed over the secondary did not work17:53
mark___or did not come up correctly17:53
mark___any ideas as to why this did not work as expected17:54
Troydoes the secondary server flat files get updated on every db change on the primary?17:54
mark___and we have it scripted17:54
nowenah - I see - by flat-file, you mean not in the db.  No, those must be synced by 'wikidctl sync'. But, those should not change often.  E.g., the intermediate certs.17:56
nowenhow exactly did the secondary 'not work'.  not able to get OTPs? or not able to login?18:04
mark___it did not show but 786 users18:24
mark___when it should have shown 572618:24
nowenare they both up now?  We can check to see if they are in sync18:25
Troyyes.. it should've shown the full amount.. but somehow it pulled in something else18:25
Troywe are running just the primary in stand alone18:25
nowendo you think it pulled in the mistaken amount from the primary?18:26
Troythat's what I'm thinking18:27
nowenso, we're back to what happened on the primary18:28
nowenhow much disk space is there now?18:28
Troythey increased the volume.. i'll have to check the space now18:28
Troylooks like they increased /var from 3GB to 5GB18:31
nowenis it close to 3gb?18:37
nowen'df -h'18:37
nowenI never know what people know ;-)18:37
*** mark___ has quit (Ping timeout: 250 seconds)18:44
*** mark___ (8f74745b@gateway/web/freenode/ip.143.116.116.91) has joined #wikid19:18
mark___Nick19:18
mark___our DBA has a question19:18
mark___we started everything back up19:18
nowenok19:18
mark___and he can no longer access the DB19:18
mark___and he thinks it is down19:19
mark___however wikid is working19:19
nowenis he using the right port?19:19
mark___what port19:19
nowenno replication == 543219:19
nowenreplication == 543419:19
mark___Nick19:36
nowenyes19:37
mark___one sec19:37
nowenk19:37
TroyNick .. what is the path to the postgres flat file that gets sent over to the secondary?20:38
nowenumm.  there is no flat file for postgres.  there's a utility that copies all the transactions in real time to the secondary20:39
nowenif you would like to check that the timestamps are the same on the primary and secondary, you can run this: psql -h 127.0.0.1 -p 5434 -d wikid -U postgres -c "select20:42
nowenmax(timestamp) from logging_event"20:42
nowenon the master, you can check the secondary using port 5436 instead of 543420:43
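The timestamp check above could be scripted roughly as follows. This is a sketch under assumptions: it is meant to run on the master with replication enabled, psql must be on the PATH, and the function names are made up for illustration. The host, ports, database, user and query come from the messages above; the -At flags (unaligned, tuples-only output) are an addition so that only the bare value is printed.

```python
import subprocess

# Query quoted in the channel above.
QUERY = "select max(timestamp) from logging_event"

def build_psql_command(port):
    """Build the psql argument list for the given local port."""
    return [
        "psql", "-h", "127.0.0.1", "-p", str(port),
        "-d", "wikid", "-U", "postgres",
        "-At",  # addition: print just the value, no header or row count
        "-c", QUERY,
    ]

def latest_event(port):
    """Run the query and return the newest logging_event timestamp as text."""
    result = subprocess.run(
        build_psql_command(port), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()

def in_sync(primary_value, secondary_value):
    """Treat the pair as in sync only if both returned the same non-empty value."""
    return primary_value != "" and primary_value == secondary_value

# Usage on the master (5434 = primary, 5436 = secondary, per the log):
#   print(in_sync(latest_event(5434), latest_event(5436)))
```

Matching timestamps only show that the latest logged event reached both servers; comparing user counts in the same way would need a table name that is not given in this log.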
mark___is there a way to create a lag or holding folder before being applied in the event there was a corruption20:49
mark___just in case a corruption did occur it would not affect the secondary immediately?20:50
nowengood question, let me dig on that20:50
mark___ok20:50
nowendoesn't appear to be an option20:55
nowennot looking good on that21:14
nowenyou could run a cron job on the secondary that tars up the db21:15
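A rough sketch of what such a scheduled job might call, assuming Python is available on the secondary. The function names, the dbbackup- file prefix and the retention count of 7 are illustrative, not WiKID defaults, and the source and destination directories are left to the caller. Note that an archive of a running PostgreSQL data directory is not guaranteed to be consistent, so the database would need to be stopped first, or a pg_dump used instead.

```python
import os
import tarfile
from datetime import datetime

def archive_directory(src_dir, dest_dir, now=None):
    """Write a timestamped .tar.gz of src_dir into dest_dir and return its path."""
    now = now or datetime.now()
    os.makedirs(dest_dir, exist_ok=True)
    name = now.strftime("dbbackup-%Y%m%d-%H%M%S.tar.gz")
    archive_path = os.path.join(dest_dir, name)
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store paths relative to the directory name, not the absolute path.
        tar.add(src_dir, arcname=os.path.basename(os.path.normpath(src_dir)))
    return archive_path

def prune_old_archives(dest_dir, keep=7):
    """Delete all but the newest `keep` archives; return the names removed."""
    archives = sorted(
        f for f in os.listdir(dest_dir)
        if f.startswith("dbbackup-") and f.endswith(".tar.gz")
    )
    # Timestamped names sort chronologically, so the oldest come first.
    stale = archives[:-keep] if keep > 0 else archives
    for name in stale:
        os.remove(os.path.join(dest_dir, name))
    return stale
```

Keeping several dated archives gives the holding window mark___ asked about: if bad data replicates to the secondary, an archive from before the event is still on disk. The destination should sit on a different volume from the database so the archives cannot fill it.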
nowenany idea what caused the corruption? was it the disk space?21:22
Troyi guess nothing was corrupted.. just ran out of space and once the volume was increased, the db was brought back online21:44
nowenhmm21:45
nowenmy question:  how much space is there now? did something happen to chew it all up? or was it a slow roll?21:45
Troywe did add quite a few more users over the last few months.. so i think it was just a slow roll21:46
Troywe thought Nagios was monitoring the volumes.. but I guess that was never setup21:47
Troywe've been faithful to archive the logs every 2 weeks21:47
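Until proper monitoring is in place, a small free-space check can stand in. The following is a minimal sketch, not a Nagios plugin: the function names and the 20% default threshold are assumptions chosen for illustration.

```python
import shutil

def free_percent(total_bytes, free_bytes):
    """Return free space as a percentage of the volume size."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * free_bytes / total_bytes

def volume_is_low(path, min_free_percent=20.0):
    """True when the volume holding `path` has less free space than the threshold."""
    usage = shutil.disk_usage(path)
    return free_percent(usage.total, usage.free) < min_free_percent
```

Calling volume_is_low("/var") from a scheduled job and sending mail when it returns True would have flagged the 9% free reported earlier in this log well before the database stopped.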
nowenhow much space is there now?21:48
Troybrb.. i need to run pick up my kids from school..21:48
Troyi think we have about 2GB free on the db volume21:48
nowenare we cool for today?21:48
Troyyea.. i think so.. i want to follow up with you next week if you are available Monday or Tuesday21:48
nowensure21:48
Troyjust in case any questions come up.. thanks!21:49
nowenI'd like to keep a close eye on things for a bit21:49
nowenemail me if I'm not here, as always21:49
nowenmark___: It's possible that I am being stupid on that quote too.  long couple of days21:51
nowenmark___: are we ok for the night? my eldest is in a play - I saw it last night, but would like to catch it again22:27
Troyyes.. we are good for now22:28
nowenok22:28
nowensorry for the issue.22:29
*** nowen has quit (Quit: Leaving.)22:30