Sanity check module

Sanity check is a new kiwitrees feature, available in version 3.1 and above. It responds to an issue that has been described many times in PhpGedView, webtrees, and more recently in kiwitrees. The title sums up the issue well, although I can’t claim credit for it: I ‘borrowed’ it from an excellent piece of online software called “Bonkers” that operates as a stand-alone sanity checker. If you want a more in-depth review of your data than kiwitrees offers, I highly recommend it. They describe the issue with an excellent quote:

Sooner or later, we all get to the point where we realize that there is a lot of data in our database that is just completely bonkers.

The intention with the kiwitrees sanity checker is to let you select one or more data issues you think might exist in your tree, quickly search for all examples, then click a link to each of the records concerned to check and adjust the data as necessary.

Sanity checker is quite different to, and does not replace, the existing tool “Check for GEDCOM errors” which is designed to check your data for strict adherence to the GEDCOM specification. Sanity checker goes to the next level, checking for data entry errors that may not technically be GEDCOM errors, but do seem to be “bonkers”.

It is important to be aware that the term “bonkers” is extremely unscientific 🙂 Bonkers does not always mean WRONG! But at least with this tool you have an opportunity to check. Some specific examples of acceptable “bonkers data” are noted in the descriptions below.

There is an important note on the sanity checker page, in red. It says “This process can be slow. If you have a large family tree or suspect large numbers of errors you should only select a few checks each time”. Please consider this before you tick all the boxes. How many checks you can do in one go will depend on the size of your tree, the number of errors, and the amount of memory available on your server. If an error occurs, a “fatal error” message will appear on the page. You can clear it by simply clicking your browser’s ‘back’ button, then try again with fewer tools ticked. If even one tool is too much for your system, either ask your web host for more memory, or use an external tool that does not depend on your server, such as “Bonkers”.

For its initial release, sanity checker has just a few tools covering some date issues, missing data, and duplicated data. More tools will be added over time, but feel free to comment here with any suggestions you have, ideally with some sense of a priority level for your request. You will also see that all these early tools relate only to individual data, not to family or any other record type. I do hope to add those later, especially date issues around family events such as marriage, but they are more complex and resource-intensive, so I’m starting with the easy stuff!

Date discrepancies

1 – Birth after baptism or christening
2 – Birth after death or burial
3 – Burial before death
These are self-explanatory. The first looks for baptism (BAPM) or christening (CHR) dates (whichever you use) and compares them with the birth (BIRT) date. It then lists any where it appears the person was baptised before they were born!

The second is similar, but looks for people who were not born until some date after their death (DEAT) or burial (BURI).

The third compares the burial (BURI) date against the death (DEAT) date and lists anyone who appears to have been buried before they died.
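The three comparisons above can be sketched in Python. This is a minimal illustration, not kiwitrees’ own code: it assumes each individual has already been reduced to a hypothetical dictionary holding at most one exact date per event tag (BIRT, BAPM, CHR, DEAT, BURI), and it ignores the date-range questions that real GEDCOM dates raise.

```python
from datetime import date
from typing import Optional


def first_date(person: dict, *tags: str) -> Optional[date]:
    """Return the date of the first tag present, e.g. BAPM, else CHR."""
    for tag in tags:
        if person.get(tag) is not None:
            return person[tag]
    return None


def date_discrepancies(person: dict) -> list:
    """List the date checks this (hypothetical) person record fails."""
    warnings = []
    birth = person.get("BIRT")
    baptism = first_date(person, "BAPM", "CHR")  # whichever you use
    death = person.get("DEAT")
    burial = person.get("BURI")
    if birth and baptism and birth > baptism:
        warnings.append("Birth after baptism or christening")
    if birth and any(d and birth > d for d in (death, burial)):
        warnings.append("Birth after death or burial")
    if death and burial and burial < death:
        warnings.append("Burial before death")
    return warnings


people = [
    {"id": "I1", "BIRT": date(1901, 3, 5), "CHR": date(1901, 2, 1)},
    {"id": "I2", "BIRT": date(1850, 1, 1), "DEAT": date(1920, 6, 1),
     "BURI": date(1920, 5, 30)},
]
for person in people:
    for warning in date_discrepancies(person):
        print(person["id"], warning)
# prints:
# I1 Birth after baptism or christening
# I2 Burial before death
```

Each check only fires when both dates exist, so a person with no baptism or burial recorded is simply skipped rather than flagged.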

Missing data

1 – No gender recorded
No gender simply looks for individuals with no gender (SEX) recorded, that is, with no SEX tag at all. It does not need to check for values other than the acceptable “M”, “F” or “U”, nor for entries of just “1 SEX”, because these are all converted to valid entries automatically on either import or edit.
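A rough sketch of that check against raw GEDCOM text, assuming a simple, well-formed export where each individual is a level-0 INDI record. The helper names are hypothetical, and this is not how kiwitrees itself queries its database.

```python
def individual_records(gedcom_text: str) -> dict:
    """Map each INDI xref (e.g. '@I1@') to the lines of its record."""
    records = {}
    current = None
    for raw in gedcom_text.splitlines():
        line = raw.strip()
        parts = line.split(" ", 2)
        if parts[0] == "0":
            # A new level-0 record starts; only INDI records are collected.
            current = parts[1] if len(parts) == 3 and parts[2] == "INDI" else None
            if current is not None:
                records[current] = []
        elif current is not None and line:
            records[current].append(line)
    return records


def missing_gender(gedcom_text: str) -> list:
    """Return xrefs of individuals with no level-1 SEX tag at all."""
    return [
        xref
        for xref, lines in individual_records(gedcom_text).items()
        if not any(line.split(" ")[:2] == ["1", "SEX"] for line in lines)
    ]


sample = """0 HEAD
0 @I1@ INDI
1 NAME John /Smith/
1 SEX M
0 @I2@ INDI
1 NAME Mary /Jones/
0 TRLR"""
print(missing_gender(sample))  # ['@I2@']
```

Like the kiwitrees check, it only asks whether a SEX tag exists; it does not inspect the value.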

Duplicated data

These tools look for two (or more) similar records within a single individual’s data record, such as two births (BIRT), two deaths (DEAT) or two genders (SEX). They do not consider the content of those records, just their existence. While such duplication might appear to be “bonkers”, you should not assume that is always the case. The GEDCOM specification allows multiple occurrences of the same event, for example where a researcher discovers conflicting evidence about a person’s birth. The specification indicates that in such cases you should record each event in order of preference. Kiwitrees acknowledges this and always uses the first event as the “preferred” one.

Even multiple genders could be a deliberate record of evidence found, or perhaps of a gender change during life. In this case the same “preference” rule applies: the first one found in the raw GEDCOM data is the one used to determine display elements such as the silhouette image, background colours, etc.

The check for a duplicated name is a very specific case. It only finds duplicates where the individual’s FULL name is IDENTICAL and entered twice.

1 – Birth 
2 – Death 
3 – Gender 
4 – Name 
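Working on the lines of one individual’s record, the four duplicate checks might look like the sketch below. It rests on assumptions: it only counts level-1 lines, treats BIRT, DEAT and SEX by existence alone, and for NAME compares the raw name value exactly, which is one possible reading of “IDENTICAL”. The function name is hypothetical.

```python
from collections import Counter

EVENT_TAGS = ("BIRT", "DEAT", "SEX")


def duplicated_facts(record_lines: list) -> list:
    """Return which of BIRT, DEAT, SEX and NAME are duplicated in one record.

    BIRT, DEAT and SEX count by existence only, ignoring their content.
    NAME is reported only when the same full name value appears twice.
    """
    level1 = [line.strip().split(" ", 2) for line in record_lines]
    level1 = [parts for parts in level1 if len(parts) >= 2 and parts[0] == "1"]
    tag_counts = Counter(parts[1] for parts in level1)
    name_counts = Counter(
        parts[2].strip() for parts in level1
        if parts[1] == "NAME" and len(parts) == 3
    )
    duplicates = [tag for tag in EVENT_TAGS if tag_counts[tag] > 1]
    if any(count > 1 for count in name_counts.values()):
        duplicates.append("NAME")
    return duplicates


record = [
    "1 NAME John /Smith/",
    "1 NAME John /Smith/",
    "1 BIRT",
    "2 DATE 1900",
    "1 BIRT",
    "2 DATE 1901",
    "1 SEX M",
]
print(duplicated_facts(record))  # ['BIRT', 'NAME']
```

Two births with different dates are still reported, which matches the point above: the tool flags existence, and you decide whether the duplication is a deliberate record of conflicting evidence.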


6 Comments

pab » 22 Jun 2015 »

There is already some degree of sanity checking, since a triangular icon with an exclamation mark appears when there is an impossible situation such as a birth after the mother’s death or a census entry before birth. These are usually due to the UK GRO quarterly registrations, where I quote a birth as (eg) June Quarter (APR/MAY/JUN) and the baby appears in the census (usually APR), or where a mother dies in childbirth and the baby is registered later than the mother’s death. A list of these would help my sanity!


    kiwi » 22 Jun 2015 »

    Yes, this tool is definitely in support of the existing check icon displays. The only issue with those is that you have to be viewing the individual to know they are there!

    So, if I understand you correctly, and broadly speaking, you are requesting tools to check for “Birth after death of mother” and “Birth after census record”? They are both certainly possible. Would the first have broader use, though, as “born after parent’s death” rather than just mother?

    It is also important with such checks to understand how kiwitrees recognises date ranges (rather than specific dates like 01 JAN 1901), especially as everyone can have different ways to enter them.

    For example, you say “I quote a birth as [eg] June Quarter”. I assume that is followed by a year. Do you literally enter it that way? I use kiwitrees’ built-in shorthand date conversion and type “q1 1956” for that sort of entry. Kiwitrees automatically rewrites that shorthand entry, saves it as “BET JAN 1956 AND MAR 1956”, and displays: between January 1956 and March 1956

    With any date range (BET/AND, AFT, BEF, FROM/TO, etc., or even just a year or a month and year entry), kiwitrees has to take a position on which exact date within that period to use in any computation: the start, middle or end point of the range. Do you calculate date differences against “1923” as being from Jan 01, Dec 31, or June 15?
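To make the question concrete, here is one way a date value could be turned into a span and then reduced to a single date under a chosen policy. It is a hypothetical sketch, not kiwitrees’ actual date handling: it only understands a year, a month and year, a full date, and BET/AND or FROM/TO ranges, and its “mid” policy simply takes the day halfway through the span.

```python
import calendar
from datetime import date, timedelta

MONTHS = {m: i for i, m in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}


def simple_span(text: str) -> tuple:
    """First and last day covered by '1923', 'JUN 1923' or '15 JUN 1923'."""
    parts = text.upper().split()
    year = int(parts[-1])
    if len(parts) == 1:
        return date(year, 1, 1), date(year, 12, 31)
    month = MONTHS[parts[-2]]
    if len(parts) == 2:
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    day = date(year, month, int(parts[0]))
    return day, day


def date_span(gedcom_date: str) -> tuple:
    """Span of a simple date, 'BET x AND y' or 'FROM x TO y'.

    Other forms (AFT, BEF, ABT, an open FROM, ...) are not handled here.
    """
    words = gedcom_date.upper().split()
    for opener, joiner in (("BET", "AND"), ("FROM", "TO")):
        if words and words[0] == opener and joiner in words:
            i = words.index(joiner)
            start, _ = simple_span(" ".join(words[1:i]))
            _, end = simple_span(" ".join(words[i + 1:]))
            return start, end
    return simple_span(gedcom_date)


def representative_date(gedcom_date: str, policy: str = "mid") -> date:
    """Pick the start, midpoint or end of the span for calculations."""
    start, end = date_span(gedcom_date)
    if policy == "start":
        return start
    if policy == "end":
        return end
    return start + timedelta(days=(end - start).days // 2)


for policy in ("start", "mid", "end"):
    print(policy, representative_date("1923", policy))
# prints:
# start 1923-01-01
# mid 1923-07-02
# end 1923-12-31
```

Whichever policy is chosen, it has to be applied consistently, because a check such as “birth after baptism” can pass or fail depending on which end of a range each date is pinned to.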


pab » 22 Jun 2015 »

Hmm … birth after death of mother (… technically this is possible nowadays!) … birth after death of father is possible, eg: the soldier who sired his child then went to war and was blown to bits before it was born …

As for the ‘q1 1956’ syntax … I have a 10-year-old GEDCOM that I am updating to kiwitrees standards and am seeking tools to help identify the ‘q1 1956’ issues … a kind of circular argument.

I am really positive about this tool especially if it can find all the ‘hidden’ icon warnings, and highlight possible yet not definite situations. What about typos such as 1956 instead of 1856, even 185 instead of 1856?


    kiwi » 22 Jun 2015 »

    Hmm … birth after death of mother (… technically this is possible nowadays!) … birth after death of father is possible, eg: the soldier who sired his child then went to war and was blown to bits before it was born …

    Yes, all such checks must always be regarded as warnings rather than errors.
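A “born after parent’s death” check that reports warnings rather than errors could allow for the posthumous-father case along these lines. The 280-day grace period and the function name are assumptions for illustration only; neither comes from kiwitrees. With quarter-style date ranges, the birth date passed in would also need a policy such as using the start of the range.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: allow roughly one full-term pregnancy (about 280 days)
# between a father's death and a child's birth before warning.
FATHER_GRACE = timedelta(days=280)


def parent_death_warnings(birth: date,
                          mother_death: Optional[date] = None,
                          father_death: Optional[date] = None) -> list:
    """Return warnings (not errors) for a birth after a parent's death."""
    warnings = []
    if mother_death and birth > mother_death:
        warnings.append("Born after mother's death")
    if father_death and birth > father_death + FATHER_GRACE:
        warnings.append(
            f"Born more than {FATHER_GRACE.days} days after father's death")
    return warnings


# A posthumous child born six months after the father's death: no warning.
print(parent_death_warnings(date(1916, 3, 1), father_death=date(1915, 9, 1)))  # []
```

A mother who dies in childbirth, with the birth registered (and entered) a little later, would still produce a warning here; that is the intended behaviour for a check you review by hand.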

    What about typos such as 1956 instead of 1856, even 185 instead of 1856?

    In general I would expect most of those to be highlighted by checks like the existing “birth after baptism” and “birth after death” checks. We need more, but with the right ones such date checks should be possible. There are also other ways. The “Bonkers” tool referred to in the FAQ here uses the term “flourishing age”, meaning “the span of time when we expect to find non-vital records associated with a living person, such as graduation, military, etc.”. Once defined, it can be used to identify events occurring outside that range.
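As a rough illustration of the “flourishing age” idea, the sketch below flags non-vital events whose date implies an improbable age. The age bounds are invented placeholders, the event dictionary is a hypothetical simplification, and this is not the Bonkers implementation. It would catch some year typos, such as 1961 for 1861 or 185 for 1856, when they push an event outside the span.

```python
from datetime import date

# Hypothetical bounds for illustration; not values used by kiwitrees or Bonkers.
FLOURISH_MIN_AGE = 12
FLOURISH_MAX_AGE = 100


def age_in_years(birth: date, on: date) -> int:
    """Completed years between birth and another date (negative if before)."""
    years = on.year - birth.year
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1
    return years


def outside_flourishing_age(birth: date, events: dict) -> list:
    """Return the tags of non-vital events dated outside the assumed span."""
    return [
        tag for tag, when in events.items()
        if not FLOURISH_MIN_AGE <= age_in_years(birth, when) <= FLOURISH_MAX_AGE
    ]


# An occupation typed as 1961 instead of 1861 lands at age 104.
events = {"GRAD": date(1874, 6, 1), "OCCU": date(1961, 1, 1)}
print(outside_flourishing_age(date(1856, 5, 1), events))  # ['OCCU']
```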

    It’s worth reading the documentation for that tool, as I hope to use many of its checking concepts, subject to the limitations many commercial web hosts enforce on cheaper hosting packages.

jacoline » 6 Jul 2015 »

I know I have a mix of CHR and BAPM, and I would prefer to change the CHR entries to BAPM. All I can think of is downloading the GEDCOM file, searching for CHR, noting the ID, finding that ID in my system and then editing it.
Could this tool include that fix?


kiwi » 7 Jul 2015 »

jacoline, no. Sanity check is just a check. It does not fix anything; it tells you where problems might be.

But fixing a mixture of BAPM / CHR is already very easy. Just use Administration > Tools > Batch Update > Search & Replace, and change “1 CHR” to “1 BAPM”.

(Usual warning: back up your GEDCOM file first, test a couple of single changes before you “Update ALL”, and make sure your user account is set to “automatically accept changes”.)
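If you would rather work on a downloaded GEDCOM file, the same substitution can be sketched in Python. This is an offline illustration, not the kiwitrees Batch Update tool: it assumes level-1 tags start at the beginning of a line, and it anchors the match because a plain text replacement of “1 CHR” would also catch the separate GEDCOM tag CHRA (adult christening).

```python
import re

# "1 CHR" followed by a space, a carriage return or the end of the line.
# The lookahead stops "1 CHRA" (adult christening) from becoming "1 BAPMA".
CHR_LINE = re.compile(r"^1 CHR(?=[ \r]|$)", re.MULTILINE)


def chr_to_bapm(gedcom_text: str) -> tuple:
    """Return (rewritten text, number of level-1 CHR tags changed)."""
    return CHR_LINE.subn("1 BAPM", gedcom_text)


text = "0 @I1@ INDI\n1 CHR\n2 DATE 1901\n1 CHRA\n2 DATE 1930\n"
new_text, changed = chr_to_bapm(text)
print(changed)  # 1
```

The returned count gives a quick sanity check of its own before you re-import anything.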
