MySQL: Automate your Merge Engines

|

A while ago, I’d written a bash script to handle the automated control of a table using the MERGE engine.  The table originally held the log data from a mobile messaging service, and very quickly became far  too big to be easily manageable.  Since the application was provided by a third party, and I had no way of changing how it logged it’s data, I devised this bash script to automatically reorganise a merge table to ensure the data was in manageable chunks.

This predates partitioning within MySQL, which would have been an ideal solution at the time.  However, there are still practical uses for the merge engine, so I’ve dug out the script for your enjoyment.  I’m sure it worked, as is, back in 2005, but I can’t recall if this was the totally polished and final version, and I’ve also sanitised it to remove references to the company and the third party app. So, if it doesn’t work perfectly, don’t be surprised, but it is certainly a good starting point if you want to do something similar.

This script also handles the creation of the new tables, so you don’t need to predefine the tables X years in advance.

#/bin/bash

################################################################################
# Merge engine rotate and clean script.                                        #
# What do I do?                                                                #
# Take latest week live  entries entries (till time on day script was run)     #
# and copy them into a new table.                                              #
# Modifies MASTER table (using “merge” storage engine) so it                   #
# displays data for only last x weeks. All other weekly tables remain          #
# individually available.                                                      #
#                                                                              #
# Executes as user: [whatever]                                                 #
# Execution frequency: [stick it in your crontab. Weekly would be good].       #
#                                                                              #
# Changelog:                                                                   #
# Initial state                                                                #
# Version 1.0                                                                  #
################################################################################

################################################################################
#Okay, before I begin, I need to know some stuff….
# How do I get into the database, and where do I go when I get there?
user=”xxxx”
passwd=”xxxx”

echo “setup login”

database=”dbname”  #the database i’m working with
tableprefix=”tablename” #the table that the live data is in
mergetable=”toptable” #the “merge” engine table
keycolumn=”id”  #the unique identifier in the live log table
logrotateday=”Monday” #the day I should rotate the logs
logretain=2 #how many weeks do we want under the merge table?

#Who do I send the “oh dear I’m not working” email to?
alert_email=”email@null.com”
echo “setup vars”

# Where do you want me to put the file with the table listing?
listpath=”/path/to/writeable/tempfile/” #path to mergetable listing file
listfile=”merge_tables.txt” #filename of mergetable listing

listfqp=”${listpath}/${listfile}” #file storing the merge tables

mysql=”/path/to/mysql/bin/mysql -u$user -p$passwd $database” # execute MySQL

# Where do you want me tell you about all the bad shit?
qrylogdate=`date +%Y-%m-%d` #date for use in queries.
errlog=”/path/tp/error_log_”$qrylogdate”.err”
echo “setup paths and files”
# If bad stuff happens, I’m going to do this…
function report_error(){
echo “Log rotation script exited” >> $errlog
tail -n 10 $errlog | mail -s “log rotation script has failed” “$alert_email”

}
echo “running….”
################################################################################
# Now I start doing stuff…

daytoday=`date +%A` #what day of the week is it?
#before any new tables are added, get the latest one.
lasttable=`grep “^$tableprefix” $listfqp | tail -n 1 | sed ’s/’${tableprefix}’_//g’`
newtable=$lasttable
oldtable=$lasttable
echo “${lasttable} , ${newtable},${oldtable}”
if [[ $logrotateday == $daytoday ]]
then
echo “It’s ${daytoday}! Lets create a new table.”

#Create new table. with _<var> suffix.
newtable=`expr $lasttable + 1`
echo “$newtable”
$mysql -e”CREATE TABLE ${tableprefix}_${newtable} like ${tableprefix}_${oldtable}” 2>> $errlog
if [ "$?" != 0 ]
then
echo “failed at table creation” >> $errlog
report_error
exit
fi

# get a list of all the archive log tables, and insert into the
# merge_log_tables.txt file
echo “Writing merge table listing to file”
$mysql -e”show tables like ‘${tableprefix}_%’” > $listfqp 2>> $errlog
# parse file to take 6 most recent tables and create CSV list.
mergelist=`grep “^${tableprefix}_” $listfqp | tail -n 6 | awk ‘{ ORS=”,” ; print $0 }’|sed -e ’s#,$#\n#g’`
echo $mergelist
# modify the master log archive table to use only the latest 6 log tables.
echo “altering $mergetable with new union list”
$mysql -e”alter table $mergetable union($mergelist)” 2>> $errlog
if [ "$?" != 0 ]
then
echo “failed at master archive update” >> $errlog
report_error
exit
fi
fi
# use a “select into” to copy all records inserted between the last entry in the
# archive, and midnight last night.
echo “copying data into ${tableprefix}_${newtable}”
$mysql -e”insert into ${tableprefix}_${newtable} select * from $tableprefix where id < (select max(${keycolumn}) from ${tableprefix})”  2>> $errlog
if [ "$?" != 0 ]
then
echo “failed at ’select * into…..’” >> $errlog
report_error
exit
fi

# and delete this, and all preceding records from the live log.
echo “deleting copied data from $tableprefix”
$mysql -e”delete from $tableprefix where $keycolumn<=(select max($keycolumn) from ${tableprefix}_${newtable})” 2>> $errlog
if [ "$?" != 0 ]
then
echo “failed at ‘deletion from live logs…..’” >> $errlog
report_error
exit
fi
echo “All done”
# All Done…  Bye bye
################################################################################

  • Share/Bookmark

A cheeky MySQL replication log flusher

|

Rather than relying on the expire_log_days feature to ensure your replicating master doesn’t fill up with obsolete binary logfiles, you can deploy this little SQL script.  It’s a bit cheeky and dirty, but it works.  You could use this to just keep your logs in check, or also for a system where multiple slaves are connecting to a master for periodic updates, and you can’t be sure which databases are up to date or not.

Pre-requisites are that the databases, the slave at least, supports federated tables, and that both support scheduled events and stored procedures.  other requirements are that replication is statement based, and when the events are created, they have the status “slave side disabled” on the slave servers.

A summary of the functionality is as follows:
On running the SQL on the master, a “replication check” database, two tables, a stored procedure and 2 events are created.
The tables hold the “heartbeat” information comprising the slave hostname and the timestamp of when the table was last updated.
The hostcheck table is federated from the slave database onto the slavecheck table on the master, so the master’s slavecheck table provides a reliable view of the last time the slave was updated.
One ‘heartbeat’ event runs every 10 minutes, where any slave running the replicated query would update it’s local hostcheck table with it’s hostname and the current timestamp.  This, via the federated table, updates the master with the same data.
The “replpurge” procedure runs daily, and purges all master logs before the lowest timestamp value (i.e. oldest) entry in the slavecheck table.
It should be noted that the “slavecheck” table requires to be populated with the hostnames of all slave servers that are required to operate within this system, and a corresponding date value which is early enough to ensure the master logs do not get prematurely purged.

The following SQL can be executed on any Master database, and when all slaves are “synced” it will automatically start working.

– Start SQL

DROP DATABASE IF EXISTS repl_check;

CREATE DATABASE repl_check;

USE repl_check;

GRANT INSERT,UPDATE,DELETE on repl_check.* to feduser@’%’ identified by ‘fedpass’;

DROP TABLE IF EXISTS `repl_check`.`hostcheck`;

DROP TABLE IF EXISTS `repl_check`.`slavecheck`;

CREATE TABLE  `repl_check`.`hostcheck` (

`hostname` varchar(255) NOT NULL,

`time_stamp` datetime NOT NULL,

PRIMARY KEY (`hostname`)

) ENGINE=FEDERATED CONNECTION=’mysql://feduser:fedpass@master.server.host:3306/repl_check/slavecheck’;

CREATE TABLE  `repl_check`.`slavecheck` (

`hostname` varchar(255) NOT NULL,

`time_stamp` datetime NOT NULL,

PRIMARY KEY (`hostname`)

) ENGINE=MyISAM ;

SET GLOBAL EVENT_SCHEDULER=’on’;

DROP EVENT IF EXISTS repl_check.replupdate;

CREATE EVENT repl_check.replupdate ON SCHEDULE EVERY 10 MINUTE DO REPLACE INTO repl_check.hostcheck (SELECT variable_value,NOW() FROM information_schema.global_variables WHERE variable_name=’hostname’);

DROP PROCEDURE IF EXISTS repl_check.binlog_cleanup;

DELIMITER $$

CREATE PROCEDURE  repl_check.binlog_cleanup()

BEGIN

DECLARE tstamp timestamp;

SELECT min(time_stamp) into tstamp from repl_check.slavecheck;

PURGE MASTER LOGS BEFORE tstamp;

END $$

DELIMITER ;

DROP EVENT IF EXISTS repl_check.replpurge;

CREATE EVENT repl_check.replpurge ON SCHEDULE EVERY 1 DAY DO CALL repl_check.binlog_cleanup();

  • Share/Bookmark

MySQL Partitioning: Automated Management

|

MySQL introduced partitioning with version 5.1, with alpha releases of 5.1.3 offering it available from December 2005. Since then, it’s  been developed further and with the relatively recent 5.2.41 release, many of the bugs and issues surrounding partitioning have been ironed out.

One issue I’ve had some users come across relates to the partition management on a live database. How big do partitions need to be? how often do we need to repartition? What do we do with the “old” data? How can we make the management of partitions easier in general?

I’ve addressed these questions with a stored procedure and by taking advantage of MySQL’s recent Event Scheduler.

My example uses a MyISAM table, partitioned on a bigint column which contains a unix timestamp and an ‘archive’ table that contains the same structure but using a different storage engine, for long-term storage and archiving.  My Sample also automates the creation of the log table, and can be run as a “deployment script” to set up a logging database ready for use.

First: create the log table:
DROP TABLE IF EXISTS table_log;
set @qry := concat(‘CREATE TABLE table_log (
id int(10) unsigned NOT NULL AUTO_INCREMENT,
unix_time bigint(20) unsigned NOT NULL,
logging_stuff varchar(512) DEFAULT NULL,
PRIMARY KEY (id,unix_time)
) ENGINE=MyISAM
PARTITION BY RANGE ( unix_time) (
PARTITION  p0 VALUES LESS THAN (‘,unix_timestamp(now())*1000,’) ENGINE = MyISAM)’);
prepare stmnt from @qry;
execute stmnt;
Then, create the archive table
DROP TABLE IF EXISTS table_log_archive;
CREATE TABLE table_log_archive (
id int(10) unsigned NOT NULL AUTO_INCREMENT,
unix_time bigint(20) unsigned NOT NULL,
logging_stuff varchar(512) DEFAULT NULL,
PRIMARY KEY (id,unix_time)
) ENGINE=MyISAM
And now the stored Proc:
DROP PROCEDURE IF EXISTS partition_logs;
DELIMITER $$
CREATE  PROCEDURE  partition_logs(in tblname varchar(100),in partsize int,in lwater int,in hwater int)
BEGIN
DECLARE p_start_time bigint;
DECLARE p_end_time bigint;
DECLARE p_start_name bigint;
DECLARE p_end_name bigint;
DECLARE p_low_water bigint;
DECLARE p_high_water bigint;
DECLARE p_new_name bigint;
DECLARE p_new_time bigint;
DECLARE cnt int;
DECLARE deld int;
DECLARE partition_column varchar(100);
set partsize = partsize * 3600; — turn the passed partition size in hours, into the unixtimestamp.
SELECT cast((substr(partition_name from 2)) as SIGNED ) into p_start_name FROM information_schema.PARTITIONS where table_name = tblname order by partition_ordinal_position limit 1;
SELECT cast((substr(partition_name from 2)) as SIGNED ) into p_end_name  FROM information_schema.PARTITIONS where table_name = tblname order by partition_ordinal_position desc limit 1;
SELECT partition_description  into p_start_time FROM information_schema.PARTITIONS where table_name = tblname order by partition_ordinal_position limit 1;
SELECT partition_description  into p_end_time  FROM information_schema.PARTITIONS where table_name = tblname order by partition_ordinal_position desc limit 1;
SELECT partition_expression into partition_column from information_schema.PARTITIONS where table_name = tblname limit 1;
select unix_timestamp(date_sub(date(now()), interval lwater day)) into p_low_water;
select unix_timestamp(date_add(date(now()), interval hwater day)) into p_high_water;
set @plw := p_low_water;
set @tbl := tblname;
set @col := partition_column;
set @ins := concat(‘replace into ‘,@tbl,’_archive select * from ‘,@tbl,’ where ‘,@col,’ < ‘,@plw);
prepare instmnt from @ins;
execute instmnt;
select p_start_time,p_low_water;
set deld = 0;
while (p_start_time < p_low_water)
do
select p_start_time,p_low_water;
set @pstart := p_start_name;
set @droppart := concat(‘ALTER TABLE ‘,@tbl,’ drop partition p’,@pstart);
prepare dropstate from @droppart;
execute dropstate;
SELECT partition_description  into p_start_time FROM information_schema.PARTITIONS where table_name = tblname order by partition_ordinal_position limit 1;
SELECT cast((substr(partition_name from 2)) as SIGNED ) into p_start_name FROM information_schema.PARTITIONS where table_name = tblname order by partition_ordinal_position limit 1;
set deld = deld+1;
END WHILE;
set cnt=0;
while (p_end_time < p_high_water)
do
set p_end_name = p_end_name + 1;
IF (p_end_time < p_low_water)
THEN set p_end_time = p_low_water;
ELSE set p_end_time = p_end_time + partsize;
END IF;
set @pendname := p_end_name;
set @phighwater := p_end_time;
set @alter_log :=  concat(‘ALTER TABLE ‘,@tbl,’ ADD PARTITION (PARTITION p’,@pendname,’ VALUES LESS THAN(‘,@phighwater,’))’);
prepare stmnt from @alter_log;
execute stmnt;
set cnt = cnt+1;
END WHILE;
select partsize as ’seconds’,p_low_water as ’start at’,p_high_water as ’stop at’,cnt as ‘partitions created’,deld as ‘partitions dropped’;
END$$
This procedure takes advantage of the partition information being held in the information_schema, and uses it to calculate the current, obsolete and new partitions.  it takes any “obsolete” data dumps it into the _archive table before dropping the partitions. In addition, it’s flexible enough that it can be called against any table that’s partitioned in the same way thus:
call partition_logs(‘table_name’,[hours per partition],[number of days to retain],[number of days in advance])
e.g. call partition_logs(‘table_log’,2,7,7);
…will repartition the table_log table, dumping any data older than 7 days old into the archive table and removing those partitions, and will create enough partitions for the next 7 days where each partition is 2hrs in size.
When called, the proc will also output some basic information on the specified partition size as well as how many partitions were dropped and created.  The only caveat with this procedure as it is, is that the “top” partition doesn’t use “less than max value” and has a hard limit.  This means that, if the procedure isn’t called when it should be, you may hit a ‘hard’ limit on the partitioned table where it won’t allow inserts due to the time integer being higher than the maximum partition value.
With a scheduled event calling this procedure on a regular basis, you have a completely automatic partition management system.
  • Share/Bookmark

The End of “Free” Britain?

|

This week saw, what should be, a triumph of human rights over ill-applied and overzealous policing of stop-and-search under Section 44 of the anti terrorism act. The full ruling can be read at the European Court of Human Rights website.

The gist of it is that Kevin Gillan and Pennie Quinton were stopped and searched under Section 44 of the Anti Terrorism Act 2000 for doing nothing more than going about their daily business. This law gives police powers to stop any individual within “designated areas” and search them for material related to terrorist actions without having to show “reasonable cause” to justify their actions. They believed the stopping and searching of them for no good reason was a violation of their rights, and took the case to the European Court of Human Rights.  The court ruled that a stop and search without reasonable grounds, suspicion nor  evidence was unlawful and also commented on the fact that the law itself is very wooly in how it is enforced.

The court ruling should have been a triumph of human rights and common sense over an ill thought-out piece of legislation that has been abused time and again since it was brought in, but it wasn’t.  The Government’s response to the ruling is that nothing will change and they will continue to stop and search people in spite of it being illegal.  Their reasoning? even if a particular law is “unlawful” an official acting in compliance with that law would not themselves be acting unlawfully. In other words, they are only following orders, so can’t be held responsible.

“Only following orders”?  I wonder where similar defences were last used?

  • Share/Bookmark

Broadband Tax

|

The proposed Broadband Tax has had the following consultation document released, which offers some interesting clarification of various aspects.

http://www.hm-treasury.gov.uk/d/consult_landlineduty.pdf

I’m most intersted in sections 3.3 through to 3.6

3.3 The duty is payable on local loops. These are lines that connect an end-user’s premises to the wider communications network.

3.4 The duty will be payable on all local loops that are made available for use by an owner whether or not the lines are actually used. It will also be payable on all local loops regardless of whether the loop consists of a copper pair, a co-axial cable or a fibre connection.

3.5 The duty will apply regardless of the service that is being provided over the local loop. Services may include traditional voice services or other data services namely broadband – Government does not intend to capture television services.

3.6 Where more than one local loop is provided in order for an end-user to receive two distinct services, the duty is payable on both lines. When an end-user receives services from two different providers but over the same local loop, the duty is only payable once.

Read 3.4 and 3.6, and consider that we have a BT line “made available” to us, even tho we don’t use it, and that with a Cable connection, we then have two local loop lines would suggest that we are liable to pay twice.

I think I’ll be talking to my MP again.

  • Share/Bookmark
Get Adobe Flash playerPlugin by wpburn.com wordpress themes