Monday 29 October 2012

Handling code samples when translating technical content


Here is an interesting problem: how should code samples be handled when translating content?

We are in the process of translating TeamMentor into a couple of languages, and the question came up of how to identify the code samples for the translators. At the moment they should all be wrapped in <pre> tags, since that is how the JavaScript-based code formatting works.

Here is my first raw idea:
  1. create a fork (of the target library) for each translation
  2. run a script on all articles that removes ALL code samples and replaces each one with a unique ID (the extracted code snippets are stored separately)
  3. the translators work on the articles (which now contain no code samples)
  4. the translators commit their changes into their fork (we can help them if they want)
  5. we re-apply the code samples (replacing each unique ID with its original snippet)
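
The five steps above can be sketched in Python. This is a minimal sketch, assuming every code sample is a single, non-nested <pre>...</pre> element and that a token such as [[CODE-SAMPLE-<article>-<n>]] survives the translators' tools untouched. The placeholder format and the function names extract_code_samples / reapply_code_samples are hypothetical, not part of TeamMentor. A regex is used for brevity; a real HTML parser would be more robust on messy markup.

```python
import re

# Matches a whole <pre>...</pre> element, including attributes on the opening
# tag. Assumes <pre> elements are not nested inside each other.
PRE_BLOCK = re.compile(r"<pre\b[^>]*>.*?</pre>", re.IGNORECASE | re.DOTALL)

# Hypothetical placeholder format: any token the translators leave alone works.
PLACEHOLDER = "[[CODE-SAMPLE-{article}-{index}]]"


def extract_code_samples(article_id, html):
    """Step 2: replace every <pre> block with a unique placeholder ID.

    Returns the stripped HTML and a dict mapping placeholder -> original block.
    """
    samples = {}

    def _swap(match):
        key = PLACEHOLDER.format(article=article_id, index=len(samples))
        samples[key] = match.group(0)
        return key

    stripped = PRE_BLOCK.sub(_swap, html)
    return stripped, samples


def reapply_code_samples(translated_html, samples):
    """Step 5: put the stored <pre> blocks back into the translated article."""
    missing = [key for key in samples if key not in translated_html]
    if missing:
        # A placeholder was deleted or altered during translation; fail loudly
        # instead of silently losing a code sample.
        raise ValueError(f"placeholders missing after translation: {missing}")
    for key, block in samples.items():
        translated_html = translated_html.replace(key, block)
    return translated_html


if __name__ == "__main__":
    article = "<p>Use a parameterised query:</p><pre>cmd.Parameters.Add(p);</pre>"
    stripped, samples = extract_code_samples("a1", article)
    print(stripped)  # <p>Use a parameterised query:</p>[[CODE-SAMPLE-a1-0]]
    # Simulate the translator's work, which only ever sees the stripped text.
    translated = stripped.replace("Use a parameterised query:",
                                  "Use uma consulta parametrizada:")
    print(reapply_code_samples(translated, samples))
    # <p>Use uma consulta parametrizada:</p><pre>cmd.Parameters.Add(p);</pre>
```

The missing-placeholder check is there for step 5: if a translator (or a translation tool) mangled an ID, it is better to stop than to publish an article with a code sample silently dropped.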

What I like about this idea is:

1) we remove 'areas where translators can make mistakes' (i.e. there are no code samples in their content)
2) we can view all code samples in one place
3) we can decide case by case where it makes sense to translate parts of a code sample (e.g. when it contains a lot of important code comments)
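
For points 2 and 3, the extracted snippets could be dumped onto a single review page, with a rough flag on the samples that are mostly comments. This is a sketch under assumptions: the input is a dict mapping placeholder IDs to the extracted <pre> blocks, and the comment detection is a crude line-prefix heuristic. The helpers comment_ratio and build_review_page, and the 0.3 threshold, are hypothetical and only meant to shortlist samples for a human decision.

```python
import html
import re

# Crude heuristic: a line counts as a comment if it starts with a common
# comment prefix. '#' also matches C# preprocessor lines such as #region,
# so treat the result as a hint, not a verdict.
COMMENT_LINE = re.compile(r"^\s*(//|#|/\*|\*|--)")


def comment_ratio(block):
    """Fraction of non-empty lines in a <pre> block that look like comments."""
    text = re.sub(r"</?pre\b[^>]*>", "", block, flags=re.IGNORECASE)
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return 0.0
    comments = sum(1 for line in lines if COMMENT_LINE.match(line))
    return comments / len(lines)


def build_review_page(samples, threshold=0.3):
    """Render every extracted sample on one HTML page, flagging comment-heavy ones."""
    parts = ["<html><body><h1>Code samples</h1>"]
    for key, block in samples.items():  # keeps extraction order
        flag = ""
        if comment_ratio(block) >= threshold:
            flag = " (candidate for translating comments)"
        parts.append(f"<h2>{html.escape(key)}{flag}</h2>")
        parts.append(block)  # the block is already HTML, so keep it as-is
    parts.append("</body></html>")
    return "\n".join(parts)
```

Opening the generated page in a browser gives the "all code samples in one place" view, and the flagged headings point at the samples worth a closer look.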

Since the translation house(s) have their own tools (and parsers), the final solution will probably be a mix of this idea and what they already do with their technology (no need to solve problems that already have solutions :)  )

Are there any good solutions out there that work well for HTML content with code samples?