Friday, 2013-11-15

*** joevano has quit (.net .split)		03:52
*** joevano (~joevano@bzflag/developer/JoeVano) has joined #wikid		03:53
*** nowen (~nowen@50-194-249-125-static.hfc.comcastbusiness.net) has joined #wikid		13:11
*** tessellare (~tessellar@38.88.11.237) has joined #wikid		13:25
*** tessellare has parted #wikid (None)		13:25
*** mark___ (8f74745b@gateway/web/freenode/ip.143.116.116.91) has joined #wikid		13:51
mark___	Nick are you here	13:51
mark___	we have an issue with our wikid	13:52
nowen	is it the iphone token?	13:52
mark___	no	13:52
mark___	our db crashed	13:52
nowen	what happened?	13:53
nowen	can you restart wikid?	13:53
mark___	our db filled up the drive and now over 4500 of the users are missing	13:53
mark___	no it will not let us restart	13:53
mark___	gives us a space error	13:54
nowen	ok	13:54
nowen	can you run some commands now?	13:54
mark___	sure Troy is joining so hang on one moment	13:55
*** Troy (329b9bb1@gateway/web/freenode/ip.50.155.155.177) has joined #wikid		13:55
Troy	Hi	13:56
nowen	hi	13:56
nowen	sorry to hear about the issue	13:56
nowen	i have some postgres commands I want you to run	13:56
nowen	is the disk 100% full?	13:56
nowen	run 'yum clean all' first to see if that helps	13:56
mark___	now we have 9% free	13:58
nowen	ok	13:58
nowen	now: su - postgres	13:58
nowen	$ psql -d wikid	13:58
nowen	wikid=# SELECT pg_size_pretty(pg_database_size('wikid')) As fulldbsize;	13:59
mark___	should we start the server again?	13:59
nowen	I pasted the prompt there too	13:59
nowen	no, not yet	13:59
nowen	that command will show the db size	13:59
nowen	actually, right, you might need to start postgres	14:00
nowen	'service postgresql start' i think	14:00
nowen	no need to start all of wikid yet	14:00
nowen	we will vacuum the db and that should get us more space	14:00
mark___	what is the command for starting postgres	14:01
nowen	service postgresql start	14:01
Troy	says service command not found	14:02
mark___	command service not found	14:02
nowen	is this ubuntu?	14:03
nowen	or centos?	14:03
nowen	are you root?	14:04
Troy	redhat	14:04
Troy	postgres -D /usr/local/pgsql/data	14:04
Troy	i'm su to postgres	14:04
nowen	oh - sorry - back out to root and start it	14:04
nowen	once started, su to postgres again	14:06
nowen	are y'all in replication mode?	14:06
mark___	okay it started	14:08
Troy	yes	14:08
Troy	should I config back to standalone?	14:09
nowen	not yet, I think	14:09
Troy	ok	14:10
nowen	as postgres, if you run 'psql -d wikid -p 5434' do you get a psql prompt?	14:11
Troy	yes	14:13
nowen	ok	14:13
nowen	run 'SELECT pg_size_pretty(pg_database_size('wikid')) As fulldbsize;'	14:14
nowen	and tell me how big it is	14:14
Troy	it didn't come back with anything after that command	14:15
Troy	does it dump to a file?	14:15
nowen	no it should come back	14:16
nowen	you have a ; on the end?	14:16
nowen	and you dropped the quotes?	14:16
nowen	wikid=# SELECT pg_size_pretty(pg_database_size('wikid')) As fulldbsize;	14:17
nowen	fulldbsize	14:17
nowen	------------	14:17
nowen	15 MB	14:17
nowen	(1 row)	14:17
Troy	psql -d wikid -p 5432 or 5434 ?	14:17
Troy	how do I exit and try the other port?	14:17
nowen	ctrl d	14:18
nowen	should be 5434 in replication	14:18
Troy	yes.. doesn't connect on that other port	14:19
Troy	just doesn't bring back anything for the db size	14:19
Troy	yes.. i dropped the quotes	14:19
nowen	ko	14:19
nowen	ok	14:20
nowen	so, back as postgres	14:20
Troy	-rw------- 1 root root 0 Oct 27 04:02 spooler.2 -rw------- 1 root root 0 Oct 20 04:02 spooler.3 -rw------- 1 root root 0 Oct 13 04:02 spooler.4 drwxr-x--- 2 squid squid 4096 Feb 16 2010 squid -rw------- 1 root root 0 Sep 13 2011 tallylog -rw-r--r-- 1 root root 16952 Nov 15 07:07 up2date -rw-r--r-- 1 root root 21646 Nov 10 02:49 up2date.1 -rw-r--r-- 1 root root 21906 Nov 3 03:06 u	14:20
Troy	ok	14:20
Troy	wikid=# SELECT pg_size_pretty(pg_database_size('wikid')) As fulldbsize wikid-# SELECT pg_size_pretty(pg_database_size('wikid')) As fulldbsize;	14:21
nowen	did it work?	14:22
nowen	SELECT pg_size_pretty(pg_database_size('wikid')) As fulldbsize wikid-# SELECT pg_size_pretty(pg_database_size('wikid')) As fulldbsize;	14:22
nowen	SELECT pg_size_pretty(pg_database_size('wikid')) As fulldbsize;	14:24
nowen	?	14:26
Troy	ERROR: syntax error at or near "SELECT" at character 64 LINE 2: SELECT pg_size_pretty(pg_database_size('wikid')) As fulldbsi...	14:27
nowen	hmm	14:27
nowen	maybe something in the cache. try again	14:27
nowen	is there a leading space?	14:27
nowen	nah, that doesn't matter	14:27
nowen	ok - we can skip this if you like. just vacuum the db and see if it helps the disk	14:29
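
The `wikid-#` prompt in the paste above suggests psql was still waiting for the end of the first statement, which was entered without a closing `;`. The second SELECT was then appended to it, which is consistent with the syntax error reported on LINE 2. One way to sidestep the interactive prompt is to pass the query with `-c`. The following is a minimal sketch, not a WiKID-supplied tool: the helper names `size_query` and `db_size` are hypothetical, and the default port assumes replication mode as described in this log.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical. Port 5434 is the
# replication-mode port mentioned in this log (5432 when standalone).

# Build the size query for a given database name.
size_query() {
    printf "SELECT pg_size_pretty(pg_database_size('%s')) AS fulldbsize;" "$1"
}

# Run the query non-interactively. -A and -t give unaligned,
# tuples-only output, so only the size value itself is printed.
db_size() {
    db=$1
    port=${2:-5434}
    psql -d "$db" -p "$port" -At -c "$(size_query "$db")"
}
```

Run as the postgres user, `db_size wikid 5434` would print just the size value, assuming the database is reachable on that port.
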
Troy	ok	14:31
Troy	the problem is when it was up it was only showing like 500 users	14:31
mark___	will that remove any critical data	14:31
nowen	It should not	14:32
nowen	are you sure it is the db taking the space?	14:32
Troy	not sure really..because the secondary server was not full	14:33
Troy	just this one	14:33
Troy	could be logs?	14:33
nowen	was the server running slow?	14:33
Troy	a little.. but mostly just when I restarted. i got the timecop errors	14:33
Troy	then the log in wikid said it was out of space	14:34
nowen	did you have the logs set for debug?	14:34
Troy	no.. it was normal	14:35
nowen	look for any big logs in /var/log	14:35
nowen	also 'locate *.rpm'	14:35
Troy	opt/wikid-utilities-3.0.9-1.i386.rpm /usr/share/doc/mozldap-6.0.5/README.rpm /usr/share/doc/vim-common-7.0.109/Changelog.rpm /var/cache/yum/rhel-i386-server-5/packages/ghostscript-8.70-6.el5.i386.rpm	14:39
Troy	can I check maybe if we have a backup of the db?	14:40
nowen	that's ok. just wanted to make sure that there weren't any big wikid rpms	14:40
nowen	yes - and we can create one	14:40
Troy	then run a vacuum?	14:40
nowen	well - your secondary should be a back up	14:41
Troy	yes.. unfortunately the secondary is the same	14:44
nowen	'tar -czvf dbbackup.tar.gz /var/lib/pgsql/data/* ' will tar up the db.	14:44
nowen	but I worry there isn't room for it on the server	14:45
nowen	how much space is there?	14:46
Troy	we are restoring from a backup	14:46
*** jrdx (~jrdx@38.88.11.237) has joined #wikid		14:50
*** jrdx has parted #wikid (None)		14:50
nowen	hmm - I was going to suggest we try to start it and see	14:53
Troy	just an update.. we are restoring from a full backup from 11/10, then we have daily incrementals	15:01
nowen	so, are your users all back?	15:06
Troy	we are still waiting on the restore	15:07
Troy	# psql -d wikid -p 5432 or 5434 wikid-# SELECT pg_size_pretty(pg_database_size('wikid')) As fulldbsize; wikid=# VACUUM FULL VERBOSE;	15:07
Troy	should I go ahead and run a VACUUM when it's back online?	15:08
Troy	I will check first.. but we may run into the issue again if it's close to the max usage	15:08
nowen	if you have a back up, then no harm in running vacuum, certainly	15:08
nowen	'vacuumdb -avfv -p 5434'	15:09
nowen	and 'reindexdb -p 5434'	15:10
mark___	nick iphone	15:13
mark___	is the app missing from the store?	15:13
mark___	for iphones?	15:13
nowen	mark___: yes. we had a show-stopper bug. no option to revert the binaries on the apple store, so we had to pull it. we have submitted a critical fix and await apple	15:14
mark___	ok	15:14
nowen	very sorry	15:14
nowen	mark___: how big are the disks on your wikid server?	15:56
Troy	i think there was only 3GB allocated for the /var volume	16:11
nowen	I would really like to see the output of the dbsize	16:12
nowen	what's the current status?	16:21
Troy	we are back up	17:03
nowen	good to hear	17:03
nowen	in /var/lib/pgsql, will you run 'du -h'	17:04
nowen	and let me know how big /data is?	17:05
mark___	hi Nick	17:19
Troy	672K	17:19
mark___	do you know how big the flat file is	17:19
mark___	for a failover	17:19
nowen	mark___: i don't follow what flat file?	17:19
mark___	does the failover use postgres	17:19
mark___	when wikid fails over	17:20
nowen	replication uses postgres	17:20
mark___	how does that process occur	17:20
nowen	when the primary is down, you need to run wikidctl promote on the secondary	17:21
mark___	yes but what does that entail	17:22
mark___	when i run wikidctl	17:22
mark___	what all occurs	17:22
mark___	do the logs move over as well	17:22
nowen	no, everything should be there. You give the secondary an ip that works and run it as the primary	17:23
nowen	http://www.wikidsystems.com/support/wikid-support-center/installation-how-tos/how-to-configure-wikid-for-replication has the commands	17:23
nowen	Troy: that doesn't seem very big. how much free disk space is there now?	17:24
nowen	mark___: it can be scripted. is that what you want to do?	17:30
mark___	Nick here is our question	17:52
mark___	the primary server failed	17:52
mark___	due to lack of db space	17:53
mark___	when we failed over the secondary did not work	17:53
mark___	or did not come up correctly	17:53
mark___	any ideas as to why this did not work as expected	17:54
Troy	do the secondary server's flat files get updated on every db change on the primary?	17:54
mark___	and we have it scripted	17:54
nowen	ah - I see - by flat-file, you mean not in the db. No, those must be synced by 'wikidctl sync'. But, those should not change often. E.g., the intermediate certs.	17:56
nowen	how exactly did the secondary 'not work'. not able to get OTPs? or not able to login?	18:04
mark___	it did not show but 786 users	18:24
mark___	when it should have shown 5726	18:24
nowen	are they both up now? We can check to see if they are in sync	18:25
Troy	yes.. it should've shown the full amount.. but somehow it pulled in something else	18:25
Troy	we are running just the primary in stand alone	18:25
nowen	do you think it pulled in the mistaken amount from the primary?	18:26
Troy	that's what I'm thinking	18:27
nowen	so, we're back to what happened on the primary	18:28
nowen	how much disk space is there now?	18:28
Troy	they increased the volume.. i'll have to check the space now	18:28
Troy	looks like they increased /var from 3GB to 5GB	18:31
nowen	is it close to 3gb?	18:37
nowen	'df -h'	18:37
nowen	I never know what people know ;-)	18:37
*** mark___ has quit (Ping timeout: 250 seconds)		18:44
*** mark___ (8f74745b@gateway/web/freenode/ip.143.116.116.91) has joined #wikid		19:18
mark___	Nick	19:18
mark___	our DBA has a question	19:18
mark___	we started everything back up	19:18
nowen	ok	19:18
mark___	and he can no longer access the DB	19:18
mark___	and he thinks it is down	19:19
mark___	however wikid is working	19:19
nowen	is he using the right port?	19:19
mark___	what port	19:19
nowen	no replication == 5432	19:19
nowen	replication == 5434	19:19
mark___	Nick	19:36
nowen	yes	19:37
mark___	one sec	19:37
nowen	k	19:37
Troy	Nick .. what is the path to the postgres flat file that gets sent over to the secondary?	20:38
nowen	umm. there is no flat file for postgres. there's a utility that copies all the transactions in real time to the secondary	20:39
nowen	if you would like to check that the timestamps are the same on the primary and secondary, you can run this: psql -h 127.0.0.1 -p 5434 -d wikid -U postgres -c "select max(timestamp) from logging_event"	20:42
nowen	on the master, you can check the secondary using port 5436 instead of 5434	20:43
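
The timestamp check described above can be wrapped so both sides are queried and compared in one step. This is a rough sketch under the port layout given in this log (5434 for the local database in replication mode, 5436 to reach the secondary from the master). The helper names `max_ts` and `report_sync` are hypothetical and not part of WiKID.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical. Ports follow the log:
# 5434 is the local database in replication mode, 5436 reaches the
# secondary when run on the master.

# Fetch the newest logging_event timestamp through the given port.
max_ts() {
    psql -h 127.0.0.1 -p "$1" -d wikid -U postgres -At \
        -c "select max(timestamp) from logging_event"
}

# Compare two timestamps; print the result and return 1 on mismatch.
report_sync() {
    if [ "$1" = "$2" ]; then
        echo "in sync: $1"
    else
        echo "out of sync: primary=$1 secondary=$2"
        return 1
    fi
}
```

On the master, `report_sync "$(max_ts 5434)" "$(max_ts 5436)"` would run the comparison. On a busy server the two values may differ briefly while a transaction is in flight, so a single mismatch is probably worth re-checking before treating it as a replication problem.
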
mark___	is there a way to create a lag or holding folder before being applied in the event there was a corruption	20:49
mark___	just in case a corruption did occur it would not affect the secondary immediately?	20:50
nowen	good question, let me dig on that	20:50
mark___	ok	20:50
nowen	doesn't appear to be an option	20:55
nowen	not looking good on that	21:14
nowen	you could run a cron job on the secondary that tars up the db	21:15
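
A periodic tarball of the secondary's database, as suggested above, might look roughly like the sketch below. Everything here is an assumption for illustration: the function names, the `dbbackup-` prefix, the crontab path, and the retention count are hypothetical. Note also that a file-level tar of a running PostgreSQL data directory is not guaranteed to be consistent, so stopping postgres around the snapshot (or using a proper dump) may be needed.

```shell
#!/bin/sh
# Sketch only: function names, the archive prefix, and all paths are
# hypothetical. Adjust the data directory to match the install.

# Tar up SRC into DEST with a timestamped name and print the archive path.
snapshot_dir() {
    src=$1
    dest=$2
    out="$dest/dbbackup-$(date +%Y%m%d-%H%M%S).tar.gz"
    tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")" && echo "$out"
}

# Keep only the KEEP newest archives in DEST and delete the rest.
prune_old() {
    dest=$1
    keep=$2
    ls -1t "$dest"/dbbackup-*.tar.gz 2>/dev/null |
        tail -n +"$((keep + 1))" |
        while read -r old; do rm -f "$old"; done
}

# Example crontab entry (assumed script location), nightly at 02:00:
# 0 2 * * * /usr/local/bin/wikid-db-snapshot.sh
```

A script built on these helpers would end with something like `snapshot_dir /var/lib/pgsql/data /backup && prune_old /backup 7`, where `/backup` is a placeholder for a volume with enough free space. Keeping several dated archives gives the "lag" asked about earlier: a bad change on the primary does not immediately overwrite every older copy.
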
nowen	any idea what caused the corruption? was it the disk space?	21:22
Troy	i guess nothing was corrupted.. just ran out of space and once the volume was increased, the db was brought back online	21:44
nowen	hmm	21:45
nowen	my question: how much space is there now? did something happen to chew it all up? or was it a slow roll?	21:45
Troy	we did add quite a few more users over the last few months.. so i think it was just a slow roll	21:46
Troy	we thought Nagios was monitoring the volumes.. but I guess that was never set up	21:47
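
Until volume monitoring is in place, a small check run from cron could catch a slow fill like this one. The sketch below is a stand-in, not a Nagios plugin shipped with anything mentioned here. The function names and the 80% threshold are hypothetical.

```shell
#!/bin/sh
# Sketch only: function names and the threshold are hypothetical.

# Read `df -P` output on stdin and print the Capacity column of the
# first data row, without the trailing % sign.
parse_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Compare a usage percentage against a limit.
# Prints a status line and returns 1 when the limit is reached.
check_pct() {
    pct=$1
    limit=$2
    if [ "$pct" -ge "$limit" ]; then
        echo "WARNING: ${pct}% used (limit ${limit}%)"
        return 1
    fi
    echo "OK: ${pct}% used"
}
```

For the volume discussed here, `check_pct "$(df -P /var | parse_pct)" 80` would report on `/var`. The non-zero exit status on a warning can drive whatever alerting is available, and exit code 1 happens to match the WARNING convention used by Nagios plugins.
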
Troy	we've been faithful to archive the logs every 2 weeks	21:47
nowen	how much space is there now?	21:48
Troy	brb.. i need to run pick up my kids from school..	21:48
Troy	i think we have about 2GB free on the db volume	21:48
nowen	are we cool for today?	21:48
Troy	yea.. i think so.. i want to follow up with you next week if you are available Monday or Tuesday	21:48
nowen	sure	21:48
Troy	just in case any questions come up.. thanks!	21:49
nowen	I'd like to keep a close eye on things for a bit	21:49
nowen	email me if I'm not here, as always	21:49
nowen	mark___: It's possible that I am being stupid on that quote too. long couple of days	21:51
nowen	mark___: are we ok for the night? my eldest is in a play - I saw it last night, but would like to catch it again	22:27
Troy	yes.. we are good for now	22:28
nowen	ok	22:28
nowen	sorry for the issue.	22:29
*** nowen has quit (Quit: Leaving.)		22:30
