Friday, 14 December 2012

TeamMentor global search, duplicate articles and a new ‘Any’ Library

In the 3.2 version of TeamMentor there is a Library per technology (.Net, Android, C++, iOS, Java, PHP) and per type (CWE, PCI DSS Compliance):

image

These are all separate Libraries, each with its own folder and fully independent from each other:

image

For example, you can access just the Java Library by opening the https://teammentor.net/Library/Java URL:

 image

Note: You can also install just this library on a new TM server by copying the Java folder into its TM_Library folder (or by dropping its zip into the Install/Upload libraries admin interface).

There are a number of practical advantages in having libraries physically separated by folders (you can think about it as separate databases).

For example, when we did the translation of the CWE Library into Japanese, all we had to do was send the CWE.zip file to the translators. When we received the translations, all we had to do to see the TM content in Japanese was to fire up a new instance of TM and install/copy/upload the CWE_Japanese.zip file.

And now that we have moved this ('one library per technology' structure) to our GitHub TMContent organisation (with a separate repository per technology), we are able to open issues/bugs per technology/library, which gives us a much greater level of quality control on the content published.

Before the 3.x release there was one big Library (with all articles inside it).

This was a result of the original import/move from the SQL Server database (which had all the articles in it). Not only did this cause a number of performance hits on first load (for example, all 4000+ articles would be shown in the main Table List, which would blow up in IE), it also tied all libraries tightly together, making it impossible to create individual libraries.

Now, one good thing about the ‘one library with everything’ mode is that the generic articles used/linked from different technologies (namely the ones tagged with ‘Any’) were all unique (i.e. there was only one copy of each in the entire TM website).

But when we moved into the (current) ‘one library per technology’ mode we had to copy those ‘Any’ articles in order to create ‘stand-alone’ libraries (i.e. libraries that are all self-contained). And since Article GUIDs must be unique per server (across all libraries), each copy of those ‘Any’ articles must have its own unique GUID (note: the unique GUID requirement is enforced by the TM engine on data load).
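To make the uniqueness rule concrete, here is a minimal Python sketch of that kind of check. It is not the TM engine's code: the function name, the input shape (library name mapped to a list of article GUIDs) and the sample GUIDs are all assumptions made up for illustration.

```python
from collections import defaultdict

def find_guid_collisions(libraries):
    """Return {guid: [library names]} for every GUID that appears more than once.

    `libraries` is a hypothetical structure: {library_name: [article GUIDs]}.
    """
    seen = defaultdict(list)
    for library_name, article_guids in libraries.items():
        for guid in article_guids:
            # Compare GUIDs case-insensitively
            seen[guid.lower()].append(library_name)
    return {guid: names for guid, names in seen.items() if len(names) > 1}

# Made-up sample data: the second GUID was copied into two libraries unchanged
libraries = {
    "Java": ["aaaaaaaa-0000-0000-0000-000000000001",
             "aaaaaaaa-0000-0000-0000-000000000002"],
    "PHP":  ["aaaaaaaa-0000-0000-0000-000000000002",
             "aaaaaaaa-0000-0000-0000-000000000003"],
}
collisions = find_guid_collisions(libraries)
```

With this sample data, `collisions` reports the one GUID that is shared by the Java and PHP libraries, which is exactly the situation that copying an ‘Any’ article without giving it a new GUID would create.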

Usually this is not a problem, since when the user browses/searches/filters the TM library using the GUI, he/she doesn’t see any duplicates.

The exception is the special (not really supported/documented) ‘all:{search term}’ mode, which searches across all libraries:
image

As you can see in the example shown above, there are 5 different links to articles called A Data Flow Analysis Is Performed, all with the same Technology, Phase, Type and Category metadata (which is not an ideal/good situation).
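One way to spot these duplicates outside the GUI is to group articles by title plus metadata. The sketch below is only an illustration: the `Article` class, its field names, the library names and the metadata values ("phase-x" and so on) are placeholders, not TM's real data model or the real values from the screenshot.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Article:
    # Hypothetical, simplified view of an article's metadata
    guid: str
    library: str
    title: str
    technology: str
    phase: str
    article_type: str
    category: str

def find_duplicate_articles(articles):
    """Group articles that share title and all four metadata fields."""
    groups = defaultdict(list)
    for article in articles:
        key = (article.title, article.technology, article.phase,
               article.article_type, article.category)
        groups[key].append(article)
    # Only groups with more than one member are duplicates
    return {key: group for key, group in groups.items() if len(group) > 1}

TITLE = "A Data Flow Analysis Is Performed"
# Five copies of the same 'Any' article, one per (placeholder) library
articles = [
    Article(f"guid-{name}", name, TITLE, "Any", "phase-x", "type-x", "category-x")
    for name in ("lib-1", "lib-2", "lib-3", "lib-4", "lib-5")
]
# Same title but different Technology: these are NOT reported as duplicates
articles += [
    Article("guid-net-db", "lib-1", "All Database Input Is Validated",
            "ASP.NET 3.5", "phase-x", "type-x", "category-x"),
    Article("guid-java-db", "lib-2", "All Database Input Is Validated",
            "Java", "phase-x", "type-x", "category-x"),
]
duplicates = find_duplicate_articles(articles)
```

Here only the five ‘Any’ copies end up in `duplicates`. The two technology-specific articles share a title but differ in Technology, so they are correctly left alone.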

But, when you move beyond the ‘Any’ articles:

image

You will see that you actually have 'per technology' articles, with the same title but with different content (which is what we want, since the more specific the content is to a user, the better).

For example:

On All Database Input Is Validated (ASP.NET 3.5 and Java) articles:

image image

On All Input Is Validated (Android, C++ and iOS) articles:

image image image

On How to Test for Client-side Validation Bypass Bugs in ASP.NET and Java articles (note how these articles are very different, with technology-specific code samples):

image image

So how can we fix the ‘duplicate article’ issue?

One way is to have one unique ‘Any’ article per server and then link/map to that article (via its GUID) from a particular Folder or View.

This can already be done (though not via the GUI), since the Folder and View mappings are just lists of the GUIDs that exist in the Library’s XML file:

image
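Because a View is just a list of GUIDs, two Views in different libraries can point at the same article. The sketch below models that idea with plain Python structures; it does not parse the real Library XML, and the GUID strings, view names and `resolve_view` helper are all hypothetical.

```python
def resolve_view(view_guids, articles_by_guid):
    """Return article titles for a view's GUID list, in view order.

    GUIDs that are not loaded on the server are skipped.
    """
    return [articles_by_guid[guid] for guid in view_guids if guid in articles_by_guid]

# Server-wide index: every GUID appears once, whichever library it lives in
articles_by_guid = {
    "guid-any-1": "A Data Flow Analysis Is Performed",
    "guid-java-1": "All Database Input Is Validated",
}

# Views from two different libraries mapping to the same 'Any' article
java_view = ["guid-any-1", "guid-java-1"]
php_view = ["guid-any-1", "guid-not-installed"]

java_titles = resolve_view(java_view, articles_by_guid)
php_titles = resolve_view(php_view, articles_by_guid)
```

Both views resolve the shared GUID to the single copy of the article, so a global search would find it only once.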

My preferred solution is to move all generic/‘Any’ articles into their own library, which actually makes sense (since they are by design generic and global across multiple libraries).

This would give us a separate library with all ‘global’ content and avoid duplication of articles :)
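As a rough sketch of what such a migration could look like (not an existing TM feature), the Python below keeps the first copy of each ‘Any’ article, moves it into a new library, and re-points every View at the GUID that was kept. The dictionary layout, the function name and the assumption that ‘Any’ articles with the same title are true copies are all mine.

```python
import copy

def move_any_articles(libraries, any_name="Any"):
    """Return a new library set where 'Any' articles live once, in their own library.

    Hypothetical input shape:
      {library: {"articles": {guid: {"title": ..., "technology": ...}},
                 "views": {view_name: [guids]}}}
    Assumes 'Any' articles with the same title are copies of each other.
    """
    result = copy.deepcopy(libraries)  # leave the caller's data untouched
    any_library = {"articles": {}, "views": {}}
    canonical_by_title = {}  # title -> GUID of the single copy we keep

    for library in result.values():
        remap = {}  # old GUID -> kept GUID, for this library's views
        for guid, article in list(library["articles"].items()):
            if article["technology"] != "Any":
                continue
            kept = canonical_by_title.setdefault(article["title"], guid)
            if kept == guid:
                any_library["articles"][guid] = article
            remap[guid] = kept
            del library["articles"][guid]
        for view_name, guids in list(library["views"].items()):
            library["views"][view_name] = [remap.get(g, g) for g in guids]

    result[any_name] = any_library
    return result

libraries = {
    "Java": {"articles": {"java-any": {"title": "A Data Flow Analysis Is Performed", "technology": "Any"},
                          "java-1": {"title": "All Database Input Is Validated", "technology": "Java"}},
             "views": {"Design": ["java-any", "java-1"]}},
    "PHP": {"articles": {"php-any": {"title": "A Data Flow Analysis Is Performed", "technology": "Any"}},
            "views": {"Design": ["php-any"]}},
}
migrated = move_any_articles(libraries)
```

After the move, the PHP view points at the same GUID as the Java view, and the technology-specific article stays in the Java library.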