Troubleshooting

What status codes does onstat return?

You can check the current mode of IDS by checking the exit status ($?) immediately after running onstat -.

The values returned by 'onstat -' are:

-1/255 Offline 
 0     Initialisation 
 1     Quiescent 
 2     Recovery 
 3     Backup 
 4     Shutdown 
 5     Online 
 6     Abort
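
For monitoring scripts, the mapping above can be wrapped in a small shell function. This is only a sketch: the function name ids_mode_name is made up for illustration, and the codes are simply the ones listed in the table above.

```shell
# Translate the exit status of 'onstat -' into a mode name.
# ids_mode_name is a hypothetical helper; the codes come from the table above.
ids_mode_name() {
    case "$1" in
        255|-1) echo "Offline" ;;
        0) echo "Initialisation" ;;
        1) echo "Quiescent" ;;
        2) echo "Recovery" ;;
        3) echo "Backup" ;;
        4) echo "Shutdown" ;;
        5) echo "Online" ;;
        6) echo "Abort" ;;
        *) echo "Unknown ($1)" ;;
    esac
}

# Only query the server when onstat is actually on the PATH.
if command -v onstat >/dev/null 2>&1; then
    status=0
    onstat - >/dev/null 2>&1 || status=$?
    ids_mode_name "$status"
fi
```

The shell reports -1 as 255, which is why both values map to Offline.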

How can we see what's happening during a checkpoint?

Versions 9.40 and later support the environment variable TRACECKPT.

Set it to any value other than 0 and restart the instance. For each checkpoint you will then see something like the following in the message log:

14:50:20 652 buffers dirty

14:50:20 oldest lsn loguniq 2304, logpos 0x38d018

14:50:20 653 dirty pages are to be flushed

14:50:20 dskflush() took 0 seconds

14:50:20 wait4critex() took 0 seconds

14:50:20 0 buffers dirty

14:50:20 oldest lsn loguniq 2304, logpos 0x38d018

14:50:20 safe_dskflush() took 0 seconds

14:50:20 Checkpoint Completed: duration was 0 seconds.

14:50:20 Checkpoint loguniq 2304, logpos 0x38e018, timestamp: 0x3fa8072c

14:50:20 Maximum server connections 2

14:50:20 Buffer manager: starting coarse downgrades.

14:50:20 Buffer manager: finished coarse downgrades.

Starting threshold adjustments.

14:50:20 Buffer manager: finished threshold adjustments.
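
If you only care about how long each checkpoint took, the "Checkpoint Completed" lines can be filtered out of the message log. The following is a minimal sketch that assumes the line format shown above; ckpt_durations is a hypothetical helper name, not an Informix utility.

```shell
# Print "<time> <seconds>" for every completed checkpoint read from stdin.
# Assumes lines of the form:
#   14:50:20 Checkpoint Completed: duration was 0 seconds.
ckpt_durations() {
    awk '/Checkpoint Completed: duration was/ { print $1, $(NF - 1) }'
}

# Possible usage (needs a running instance):
#   onstat -m | ckpt_durations
```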

How can I drop a database in a down dbspace?

From 9.21 on there is an environment variable called FORCE_DB_DROP which allows you to drop a database in a down or dropped dbspace. Unfortunately this is not implemented in 7.x.

ODBC Errors

Since upgrading I'm now seeing

[Informix][Informix ODBC Driver] Database locale information mismatch

Why?

Why does dbaccess auto-commit on interrupt?

When interrupt is hit during a dbaccess command line session, the SQL will be auto-committed.

Warning:Data Commit is a result of an unhandled exception in TXN PROC/FUNC/TRI

To turn this behaviour off, use:

export DBACCNOIGN=1

Locked system-catalogs - Error 211

When a stored procedure is executed, Informix checks that the tables, indexes etc. used by the procedure have not changed. If they have changed, the query plan for the procedure is regenerated. This in itself is not a problem. However, if the procedure is run within a transaction, an exclusive lock is held on the system catalogs until the transaction ends. While this lock is held, large areas of the database can be effectively 'locked' off from users.

AFDEBUG

If AFDEBUG is set when the engine is started, the engine will hang instead of crashing when there is an assertion failure.

Set the environment variable AFDEBUG to 1 and restart the engine. When an assertion failure is detected, OnLine then hangs in a debug state rather than crashing, and various diagnostics can be carried out.

Once the engine suspends, it is possible to run several onstat commands against the instance. These are all helpful in reviewing the shared memory at that instant in time.

In order to use a debugger against the process id, you can attach manually using the following command:

sdb $INFORMIXDIR/bin/oninit PID

The stack trace will tell which function the oninit is failing on.

To bring the engine down when AFDEBUG is set, use onmode -kuy. If that doesn't bring the engine down, issue a kill -9 on the master oninit process and run ipcs to check that the shared memory has been released.

Can't bind to the network

This deserves its own page, so look here.

Cannot attach to shared memory

If an instance crashes, the UNIX memory management subsystem might fail to release the memory and semaphores previously assigned to Informix. If this is the case then these resources must be released manually. Root access is required.

Commands: ipcs and ipcrm

IPC status from /dev/mem as of Sun 25 Jan 13:58:56 1970 
T ID KEY MODE OWNER GROUP 
Message Queues:
q     0  0x4107001c -Rrw-rw---- root printq
Shared Memory: 
m     0  0x0d050268 --rw------- root system
m     1  0x0d06e1bc --rw-rw-rw- root system
m     2  0x52604801 --rw-rw---- root informix
m     3  0x52604802 --rw-rw---- root informix
m     4  0x52604803 --rw-rw-rw- root informix
m 40965  0x527e4801 --rw-rw---- root informix
m 40966  0x527e4802 --rw-rw---- root informix
m     8  0x52604804 --rw-rw---- root informix
m  4105  0x52604805 --rw-rw---- root informix
m  4106  0x527e4803 --rw-rw-rw- root informix 
Semaphores:
s  4096  0x4d0a0057 --ra-ra---- root system
s     1  0x6206e124 --ra-r--r-- root system
s     2  0x0106e0a3 --ra------- root system
s     3    00000000 --ra------- root informix
s     4    00000000 --ra-ra-ra- root informix
s 36869    00000000 --ra------- root informix
s 36870    00000000 --ra-ra-ra- root informix
s 36871    00000000 --ra-ra-ra- root informix

If the machine is running multiple instances then the correct memory ids need to be identified. Note that ipcs also lists the shared memory and semaphores that are still in use by running instances; these are not the ones to remove. The ones to remove are those that cannot be traced back to a running instance.

ipcrm -m <id> removes the shared memory segment 
ipcrm -s <id> removes the semaphore set
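
To narrow down the candidates, the ipcs listing can be filtered for entries whose group is informix. This is a sketch only: informix_ipc_ids is a hypothetical helper, and it assumes the column layout shown above (T ID KEY MODE OWNER GROUP), which differs between UNIX flavours. It prints ids and removes nothing; each id still has to be checked against the running instances before it is passed to ipcrm.

```shell
# List the ids of IPC objects of a given type ("m" or "s") whose group is
# informix. Reads ipcs output from stdin; assumes the column order
# T ID KEY MODE OWNER GROUP as in the listing above.
informix_ipc_ids() {
    awk -v t="$1" '$1 == t && $6 == "informix" { print $2 }'
}

# Possible usage:
#   ipcs | informix_ipc_ids m    # shared memory segment ids
#   ipcs | informix_ipc_ids s    # semaphore set ids
```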

But the chunk permissions are correct

When adding a new chunk, the error "The chunk <filename> must have owner-ID and group-ID set to informix" is displayed, but a listing of the file shows the following:

(-rw-rw-rw)   informix informix <filename>

The most likely cause is the permissions on the oninit program. They should be:

6754(-rwsr-sr--) owner: root, group: informix

Locking Problems

Another section that needs its own area

PC Connectivity Issues

Another section that needs its own area

208 errors

The instance can experience memory problems from a number of sources. onstat -m should contain further information, namely an operating system error i.e. an ISAM error number between -1 and -100.

OS error -12 Not enough core.

Either there is no more physical memory available or the maximum shared memory limit has been reached. The maximum memory that an Informix instance can use is set by SHMTOTAL, but the kernel might be restricting the maximum as well.

OS error -24 Too many open files.

There is a maximum number of shared memory segments that UNIX allows a process to have open, and this limit has been reached. To overcome this, either increase the maximum number of segments in the kernel or increase the SHMADD parameter so that memory is allocated in larger blocks. Under AIX there is (was?) a hard limit of 10 segments per user process.

HP-UX can show performance problems if more than one segment is allocated.

Odd 208 errors

If you are seeing 208 errors on simple tasks, such as index creation, try turning off PDQPRIORITY.

Spurious Temporary Table errors

310 errors

After dropping a temporary table the table is recreated and the engine reports the table already exists. If the control code for the temporary table spans a transaction boundary it is possible that the temporary table is not dropped correctly. The current database connection needs to be closed to clear the problem.

Sysprocplan locking errors

Some front-end development tools such as SQLWindows automatically perform a BEGIN WORK whether it is required or not. This means that if a procedure re-optimises at execution time the sysprocplan table is updated. The client application isn't aware that any update has occurred, and if the query doesn't perform any updating the application probably won't perform a COMMIT or ROLLBACK. This leaves locks on the sysprocplan until the next time the application does a COMMIT or ROLLBACK (possibly never for a read only application).

The solution involves ensuring that the procedures do not re-optimise at execution time.

Ensure that all procedure statistics are updated immediately after the table statistics. This forces a re-optimisation.

Another cause is dynamically creating procedures where the creation is unintentionally within a transaction.

If procedures are being created from SQL files then the update statistics statement can be embedded in the file, e.g.:

create procedure sp_......
end procedure;
update statistics for procedure sp...
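
When there are many procedures, the statements can be generated rather than typed. The sketch below only prints SQL text in the same form as the example above; proc_stats_sql and the procedure names sp_one and sp_two are hypothetical.

```shell
# Emit one "update statistics for procedure" statement per procedure name.
# proc_stats_sql is a hypothetical helper; it only prints SQL text.
proc_stats_sql() {
    for proc in "$@"; do
        printf 'update statistics for procedure %s;\n' "$proc"
    done
}

# Example with made-up procedure names:
proc_stats_sql sp_one sp_two
```

The output can be saved to a file and run straight after the table statistics have been updated, outside any transaction.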

Log change fails action pending

If logging is to be switched on for a particular database a level-0 archive is required. However, trying to access the database after completing the archive fails as the engine thinks an action is still pending. This situation occurs when the archive tape device is not set to /dev/null [a link file to /dev/null is not good enough]. Changing the tape device to /dev/null and redoing the archive will clear the problem.

Incorrect(?) Error Messages

Error numbers below 100, typically, are system level error messages. Informix's interpretation of these numbers has to cater for all Unix implementations and as such the message Informix presents might or might not be relevant to your system.

To complicate matters further, if the client and server run on different platforms then the same error number can produce a meaningful message on one machine and a meaningless one on the other. For example, a SCO client tries to connect to an IBM RS6000. The RS6000 refuses the connection and sends back system error 79 (which means connection refused on the RS6000). The SCO machine reports No record Locks because that is what error 79 means on SCO.

The 'real' error messages for this class of errors can be seen in /usr/include/sys/errno.h on the machine that raised the error.
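
A quick way to see what a given number means on a particular machine is to search that header for it. This is a rough sketch: errno_lookup is a hypothetical helper, it assumes the definitions are written as "#define NAME number" lines, and on some systems /usr/include/sys/errno.h only includes another file that holds the real definitions.

```shell
# Print the #define line(s) for a system error number.
# Usage: errno_lookup NUMBER [HEADER]
# errno_lookup is a hypothetical helper; HEADER defaults to the path above.
errno_lookup() {
    num=$1
    header=${2:-/usr/include/sys/errno.h}
    awk -v n="$num" '$1 == "#define" && $3 == n { print }' "$header"
}

# Possible usage, run on the machine that returned the error:
#   errno_lookup 79
```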

Error -349

After installing a new instance everything appears to be OK, but when trying to use the instance there are -349 errors. The most likely cause is that the system databases (sysmaster, etc.) failed to build, either due to an error in the buildsmi script (unlikely) or because the instance was stopped before the buildsmi script completed (likely).

If it's the latter then the easiest solution is to just re-initialise; however, the buildsmi script can also be run by hand. If the user interrupted the database build, then the next time the server goes to On-line mode the build will be re-attempted, so you don't always have to re-initialise the server.

If the buildsmi script is failing during the initialisation process then it'll have to be debugged or else just call Informix Technical Support.

What does 'Changing data structure forced command termination' mean?

Informix, like many programs, makes extensive use of in-memory structures, and under normal circumstances these are probably changing. Because onstat, for performance reasons, doesn't bother to flag its interest in these structures, it can find itself looping or at a dead end.

In these circumstances it aborts with the message Changing data structure forced command termination. Running the same onstat command again nearly always completes without a problem.

You can also get this message in certain versions when your client hostname is longer than 8 characters. In that case re-running the onstat will not help.
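
In scripts, the "run it again" advice can be automated with a bounded retry. The following is a minimal sketch: onstat_retry is a hypothetical wrapper that detects the condition by looking for the message text in the command output, since the exit status in this case is not documented here. The retry limit matters because of the hostname case, where retrying never succeeds.

```shell
# Re-run a command while its output contains the "changing data structure"
# message, up to a maximum number of attempts.
# Usage: onstat_retry MAX_TRIES command [args...]
onstat_retry() {
    tries=$1
    shift
    msg='Changing data structure forced command termination'
    i=1
    out=''
    while [ "$i" -le "$tries" ]; do
        out=$("$@" 2>&1)
        case $out in
            *"$msg"*) i=$((i + 1)) ;;                  # transient: try again
            *) printf '%s\n' "$out"; return 0 ;;       # clean run: pass it on
        esac
    done
    printf '%s\n' "$out" >&2                           # gave up: show last output
    return 1
}

# Possible usage:
#   onstat_retry 3 onstat -g seg
```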

Ontape

bash-4.2$ onstat -

IBM Informix Dynamic Server Version 11.70.FC5GE -- On-Line -- Up 1 days 21:21:10 -- 4404820 Kbytes

gimli$ ontape -a
Shared memory not initialized for INFORMIXSERVER 'gimli_net'.

Program over.
gimli$ export INFORMIXSERVER=oninit_shm
gimli$ onstat -

IBM Informix Dynamic Server Version 11.70.FC5GE -- On-Line -- Up 1 days 21:21:37 -- 4404820 Kbytes

gimli$ ontape -a
Shared memory not initialized for INFORMIXSERVER 'oninit_shm'.

Program over.

But you can still connect and access the databases via dbaccess on both shm and tcp connections.

Check the kernel shared memory settings and make sure they are at least what the release notes say. Also check onstat -g seg; this server reported 40 segments.