Friday, March 18, 2011

Server Clusters

One of my projects requires a database to be shared over multiple servers with failover properties, so I began investigating clustering of SQL databases. Microsoft SQL does it of course, but for our little project it would’ve been a $6000 license cost. Sod that. So I thought I’d try out the ever reliable, open-source/free databases.

Disclaimer: these are my own discoveries. For all I know, I went about this the complete wrong way. I’m just dumping it all here for future reference or in case anyone else had similar issues. I found during the course of this experiment that a lot of this stuff wasn’t very well documented and in hard to find places. So this is just me putting it all together. My implementations are tailored for my setup only, and I know that it is outside of the usage specs of some of these cluster definitions.

MySQL:

MySQL has a clustering version (I used ver. 7). It uses a system where there are multiple data nodes, a single management server and single or multiple SQL servers. At a minimum, they recommend 4 servers: 1 management, 2 data, 1 server. My setup was only ever for 2 servers, so I split it evenly down the middle: Each server had a manager, a data node and a server so that they could run independently on failover.

I set it up using a hybrid of these two amazing tutorials:

http://downloads.mysql.com/tutorials/cluster/GetMySQLClusterRunning-Windows.pdf

http://planet.mysql.com/entry?id=20198

In getting things to work like the Linux example, I had to change one thing. Instead of having ndbcluster=true in the .conf file, I changed it to just ndbcluster (it being there meant it was true). My two config files were as follows:

Config.ini:

[ndbd default]
noofreplicas=2

[ndbd]
hostname=host1
id=1

[ndbd]
hostname=host2
id=2

[ndb_mgmd]
id = 101
hostname=host1

[ndb_mgmd]
id = 102
hostname=host2

[mysqld]
id=51
hostname=host1

[mysqld]
id=52
hostname=host2

my.cnf:

[mysqld]
ndb-nodeid=52
ndbcluster
datadir=c:\\mysql\my_cluster\mysqld_data
basedir=c:\\mysql\mysqlc
port=5001
server-id=52
log-bin=host2-bin
This worked amazingly when everything was running in console mode. Occasionally there was an error (Error 2003: Can’t connect to MySQL server on ‘) when accessing the SQL database from another computer, but the following fixed that by setting up permissions for the computer logging in. For production databases though, you should definitely use proper usernames/passwords/permissions.

1. Run mysql –u root –P
2.
Run GRANT ALL ON *.* To ‘’@’’;

Excellent.

To get it working properly though, I needed it to run on my two servers as Windows Services.

To do that you simply run each application (ndbd, ndb_mgm, mysqld) with the –install flag. Passing in the command lines is a bit harder, as it has to be moved to a file called my.ini located in the base directory of MySQL or in the Windows folder. Each application is labeled under its executable header (aka, ndbd would be [ndbd]) and the full variable names are listed underneath (-- not -). One server (host2) that I used is as follows:

My.ini:

[ndb_mgmd]
config-file=c:/MySQL/my_cluster/conf/config.ini
configdir=c:/MySQL/my_cluster/conf

[nbdb]
ndb-connectstring=host2:1186

[mysqld]
ndb-nodeid=52
ndbcluster
datadir=c:\\mysql\my_cluster\mysqld_data
basedir=c:\\mysql\mysqlc
port=5001
server-id=52
ndb-connectstring=localhost:1186
log-bin=host2-bin
ndb-extra-logging

The management and the data nodes ran perfectly as servers. However, I hit a snag with the database. The service would start, stop and display the following message:

Could not start the MySQL service on local computer error 1067: the process terminated unexpectedly.

Well, I tried everything. When I ran the service in console mode (from the command line with –console) it ran perfectly. I could not find a solution to save my skin. Much Googling has shown that this is a common problem with MySQL running on Windows. And there’s not really a fix for it. I tried it on XP SP2, XP SP3, Server 2008 and Server 2008 R2 with no joy. It’s really a bugger, as in console mode it worked perfectly for me. I’m sure in Linux/Unix/whatever it’ll be a great solution if you need a simple server.

Just a note, when creating a table in MySQL that is clustered, don’t forget to set engine=ndb or engine=ndbcluster in the CREATE statement.

So then I moved on to PostgreSQL.

PostgreSQL:

I installed the latest version on one PC and set up a dummy database. I then set up a version on the other PC.

To get replication I followed the steps from :

http://www.postgresql.org/docs/9.0/interactive/warm-standby.html#STREAMING-REPLICATION

but with a few differences I stole from:

http://brandonkonkle.com/blog/2010/oct/20/postgres-9-streaming-replication-and-django-balanc/

Instead of making a “backup” using pg_dump or PGAdminIII I copied across the entire Postgres SQL folder (in my case, the 9.0).

Side note: when using backup in PGAdminIII for a custom database, it can give you a “database not found” error. (pg_dump: [archiver (db)] connection to database ""ClusterDB"" failed: FATAL: database ""ClusterDB"" does not exist) To get around this, copy the command line text from PGAdminIII message output to your own command line, and edit it so the last command, the name of the database, is surrounded by quotations only, not /”. For some reason, it adds the string literal escape character for “.

I tried starting the service through the windows service panel at this point, but it kept locking on start-up. To kill it, look for the applications pg_ctl.exe and postgres.exe in your Windows Task Manager. Killing these will stop the service from starting, especially if it loops (which it does when incorrectly starting). If your server still won’t start, don’t forget to check both the Windows event logs (application) and the log file in the data/pg_log folder.

So after copying across the file structure (make sure you backup the old one first, you’ll need a few things from it) I had to set the folder security attributes to allow the same user (in my case it was ./postgres) to make changes to the folder (right click the folder>properties>security>edit). This allows the server to change any postmaster lock files. You may still get some errors about postmaster.pid being unable to be created, so it’s best just to delete the postmaster.pid file from the data folder.

I found I kept getting the error “FATAL: no pg_hba.conf entry for host "::1", user "postgres", database "postgres", SSL off”. To combat this, I just went back to my backup of the original installed database and copied across the three .conf files again (pg_hba.conf, pg_ident.conf, postgresql.conf).

At this point, my server was running. After a few hours and a loss of hair.

Getting it to run properly, well, it was a bit of a bitch. I kept getting:

LOG: streaming replication successfully connected to primary
FATAL: the database system is starting up

To combat this, I put both servers in Hot_standby (instead of archive) mode and redid the entire process again.

So it works. However, Postgres is not a proper cluster, it’s more of a replication. Any update to the master flowed through to the slave server flawlessly. It did use quite a bit of network space though with its log files. On the event of a failure you’d just automatically connect to the standby server, but coming back from it would require manual work and database copying of any new updates. A lot of effort, when all I want is a cluster!

Microsoft SQL Server it is then.

...or is it?

It turns out Microsoft SQL Server 2008 clusters requires the underlying servers to be part of a Microsoft Failover Cluster. Microsoft Failover Clusters can be configured in Windows Server 2008 Enterprise and Datacenter editions. Unfortunately, all of my servers are running Standard.

So the original idea is out of the window, unless I also pay to upgrade my Windows Servers. Back to square 1 with too much money. I had a sleep on it and came back the next day with my solution (as of now).

I liked the MySQL cluster database. It’s the one I wanted to use. It just wouldn’t run as a service! So, I made it into a service. Simple. I fired up C# and made a simple service that checks the process list for MySQL (it’s process name is “mysqld”). If it’s running, it goes to sleep. If it’s not running, it launches the program as a background process in Windows.

Here’s the core of the service, which I make run in its own thread:

while (_running) // Run until told to stop
{
// get the list of all currently running processes
Process[] proList = Process.GetProcesses();
// see if our application is in the currently running process list
Process query = (from clsProcess in proList
where clsProcess.ProcessName == "mysqld"
select clsProcess).FirstOrDefault();
if (query != null)
{
// it's running, tell the user
Console.WriteLine("MySQL is running.");
}
else
{
// it's not running
Process pro = new Process();
Console.WriteLine("MySQL not running. Attempting to start.");
// point the process starter to where the file lives
pro.StartInfo.FileName = "c:\\mysql\\mysqlc\\bin\\mysqld.exe";
// add the arguments
pro.StartInfo.Arguments = "--console";
// hide the console window of the application
pro.StartInfo.WindowStyle = ProcessWindowStyle.Hidden;
pro.StartInfo.UseShellExecute = false;
pro.StartInfo.RedirectStandardOutput = true;
pro.StartInfo.RedirectStandardError = true;
pro.OutputDataReceived += new DataReceivedEventHandler(pro_OutputDataReceived);
// start the process
try
{
pro.Start();
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}
// Take a nap.
Thread.Sleep(20000);
}

private void pro_OutputDataReceived(object sender, DataReceivedEventArgs e)
{
Console.WriteLine(e.Data);
}

It operates as such:

1. Queries the process list for MySQL (which is known in the process world as “mysqld”)

2. If not found in the list it launches the process

a. You need the location of the executable

b. I passed it the argument “–console” to force it to run. In trial and error I found that if you start mysqld with no arguments, it liked to time out rather quickly.

c. I then set the WindowStyle to hidden so a console window isn’t displayed on the server

d. It’s optional, but I use the OutputDataReceived even to catch all of the output in order to log it.

3. Sleeps.

Pretty simple application but it does the job. When running, the database appears in the task manager (“mysqld”) and it works perfectly. If I manually kill the database, it restarts within 20 seconds.

To make it act like a proper service though, I needed to kill the database if the underlying service was shutdown. So in the OnStop() function of the service I called the following function after killing the main loop thread:

public void Shutdown()
{
if (_running)
{
Process[] proList = Process.GetProcesses();
// it's one we've launched that is hidden
Process query = (from clsProcess in proList
where clsProcess.ProcessName == "mysqld"
select clsProcess).FirstOrDefault();
if (query != null)
{
// it's running, kill it
Console.WriteLine("mysqld is running in hidden mode. Killing.");
query.Kill();
}
}
}

It just loops through the running process list and if MySQL is found, it kills it.

With this service running MySQL in a sneaky way, everything works. Despite the service launching a non-service program, it still manages to work on start up and with no users logged in. I have found that the console output from MySQL won’t log properly when run as a service though. I’ll work through that later, but for now I’m happy that my clustered database is working!

No comments:

Post a Comment