Informatica Cloud development best practices
Informatica Cloud is a powerful environment but a pretty unforgiving one. There are performance traps for the unwary and you'll need to manage development carefully if you don't want your environment to descend into chaos.
One difficulty is that it does not offer any tools for managing the software development lifecycle. There's no source code control, no object-level permissions and no means of organising objects beyond a flat list. This means it's up to you to impose the kind of discipline needed to keep your environments in check.
Here are some best practices that I have picked up from implementing the platform to manage external data integrations.
Use separate organisations for development, QA and production
Running different organisations is the best way to separate development code from whatever is running in production. You can apply a different security policy to objects in production to lock them down. More importantly, you can implement a formal test and release management process where code is promoted through separate environments using Informatica's object export functionality. This requires some discipline to keep in place but the level of confidence and control it gives you is worth it.
Assign least privilege to the secure agents
Don't default to assigning the secure agent full user or admin rights. For least privilege on Windows, the secure agent should run under the Network Service account, assuming it will need to consume any network resources. You will also need to grant "Modify" file permissions on the following locations:
- The secure agent installation folder, i.e. C:\Program Files\Informatica Cloud Secure Agent.
- Any folders that hold configuration files or data that will be used by tasks and mappings.
Adopt some naming conventions
This is a shared development environment, so you need to enforce some consistency over naming. This is particularly important given that views are the only means you have of organising objects. Whatever system you adopt, object names should clearly identify the project or application they are developed for, the type of connection and the purpose.
There is no shortage of naming conventions in use in the PowerCenter world; you can find them on the Velocity site, though they may not be appropriate for Informatica Cloud. You have limited screen real estate, so you can't make names too long and should avoid too much capitalisation. The way the application is presented and the visual cues provided in the mapper mean that object type prefixes also feel unnecessary.
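As a sketch of how a convention could be checked automatically, the Python below assumes a hypothetical pattern of project, connection type and purpose separated by underscores, in lower case and capped at 40 characters. The pattern, the length limit and the example names are illustrative assumptions, not Informatica rules.

```python
import re

# Hypothetical convention: <project>_<connection>_<purpose>, lower case,
# hyphens allowed inside the purpose, at most 40 characters.
MAX_LENGTH = 40
NAME_PATTERN = re.compile(r"^[a-z0-9]+_[a-z0-9]+_[a-z0-9]+(?:-[a-z0-9]+)*$")


def check_name(name: str) -> list:
    """Return a list of problems with an object name (empty if it conforms)."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if name != name.lower():
        problems.append("contains upper-case characters")
    # Match against the lower-cased name so casing is reported only once.
    if not NAME_PATTERN.match(name.lower()):
        problems.append("does not follow project_connection_purpose")
    return problems


if __name__ == "__main__":
    for candidate in ["crm_sfdc_load-accounts", "Task1"]:
        print(candidate, check_name(candidate))
```

A check like this could be run over a list of object names before a release, so that naming drift is caught early rather than discovered in a cluttered view.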
Describe all the things
Every object has an optional description field. You really should use it, as it is your best means of conveying what the object is for. There is nothing worse than having to hack through a list of cryptically named tasks with no descriptions. Attempting to configure task parameters with no descriptions is equally thankless.
Plan memory size
If you run large integration jobs on a default installation of Informatica Cloud then you may find they start failing with the following error:
[ERROR] java.lang.OutOfMemoryError: Java heap space
The secure agent runs in a Java virtual machine which means that you assign a chunk of memory to it. For some connectors this pretty much dictates the capacity of your integration jobs as they load the entire data set into the heap.
The heap size can be increased by tweaking the start-up parameters for the virtual machine. The default installed size is 256MB, which doesn't get you very far; some customers recommend expanding it to as much as 8GB. This implies that you can't just throw your secure agents anywhere: you need to plan the requirements and make sure the hosts have enough resources allocated to them.
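A rough way to sanity-check a host before placing agents on it is sketched below. It assumes you know the heap size planned for each agent and reserves a fixed amount for the operating system; the 2GB reserve is an arbitrary illustrative figure, and a JVM uses some memory beyond its heap, so treat the result as a lower bound. `-Xmx` is the standard JVM option for maximum heap size; where it is set for a given agent is not covered here.

```python
def heap_fits(host_ram_mb: int, agent_heaps_mb: list, os_reserve_mb: int = 2048) -> bool:
    """True if all planned JVM heaps fit in RAM alongside an OS reserve.

    os_reserve_mb is an assumed figure, not a vendor recommendation.
    """
    return sum(agent_heaps_mb) + os_reserve_mb <= host_ram_mb


def jvm_heap_flag(heap_mb: int) -> str:
    """Format a maximum heap size as the standard JVM -Xmx option."""
    return f"-Xmx{heap_mb}m"


if __name__ == "__main__":
    planned = [8192]  # one agent with an 8GB heap
    print(jvm_heap_flag(planned[0]), "fits on 16GB host:", heap_fits(16384, planned))
```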
Use source code control for the configuration files
Many connectors rely on configuration files, particularly those associated with web services and XML/JSON files. You can end up with a proliferation of configuration files held in a directory structure that has to be replicated across every secure agent host. You'll also need some means of backing them up.
All in all, some kind of source control and deployment mechanism (e.g. GitHub and Jenkins) can make configuration much easier to manage.
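One way to confirm that a secure agent host matches what is held in source control is to compare checksums. The following is a minimal sketch in Python using only the standard library; the function names and the idea of a manifest are illustrative assumptions, not part of Informatica Cloud.

```python
import hashlib
from pathlib import Path


def build_manifest(root: str) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    base = Path(root)
    manifest = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(base).as_posix()] = digest
    return manifest


def diff_manifests(expected: dict, actual: dict) -> dict:
    """Report files that are missing, unexpected or changed on a host."""
    shared = expected.keys() & actual.keys()
    return {
        "missing": sorted(set(expected) - set(actual)),
        "unexpected": sorted(set(actual) - set(expected)),
        "changed": sorted(k for k in shared if expected[k] != actual[k]),
    }
```

Building a manifest from the checked-out repository and another from the configuration folder on each host, then diffing the two, would show which hosts have drifted from the agreed version.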
Use a separate organisation as a backup for production
In the absence of any built-in source code control in Informatica Cloud, you can create a separate organisation that mirrors whatever runs in production. Exporting the latest version of an integration to your backup organisation can become part of your release process as you promote integrations from test to production.
Keep objects small and specialised
This is particularly the case for mappings, as the open-ended workflow designer can make it tempting to load up functionality. There are trade-offs with performance to consider of course, but you'll find tasks and mappings easier to maintain if they have a clear, concise function.
It's also much easier to diagnose a failed task flow if it is composed of smaller, specialised mappings. If your tasks are quite specialised then you'll know exactly what aspect of the flow failed.
Cache lookups, particularly for Salesforce
If you use a lookup in a task or mapping then the underlying data source for the lookup is read for every single record that is processed. This can be pretty bad from a performance perspective and ruinous if you are using an external metered API such as Salesforce. It's normally best to cache the lookup data in a local flat file and use that instead.
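The difference in call volume can be illustrated with a small Python sketch. It is a conceptual model only: `MeteredSource` is a hypothetical stand-in for a remote lookup source, and an in-memory dictionary plays the part of the local flat file. None of this is Informatica's API.

```python
class MeteredSource:
    """Stand-in for a remote lookup source that counts every call made to it."""

    def __init__(self, data):
        self._data = data
        self.calls = 0

    def fetch_one(self, key):
        self.calls += 1
        return self._data.get(key)

    def fetch_all(self):
        self.calls += 1
        return dict(self._data)


def enrich_uncached(records, source):
    """Look up each record against the source individually."""
    return [(r, source.fetch_one(r)) for r in records]


def enrich_cached(records, source):
    """Read the lookup data once, then resolve every record locally."""
    cache = source.fetch_all()
    return [(r, cache.get(r)) for r in records]


if __name__ == "__main__":
    records = ["a", "b", "a", "c"]
    direct = MeteredSource({"a": 1, "b": 2})
    cached = MeteredSource({"a": 1, "b": 2})
    enrich_uncached(records, direct)
    enrich_cached(records, cached)
    print("uncached calls:", direct.calls, "cached calls:", cached.calls)
```

Both functions return the same results, but the uncached version makes one call per record while the cached version makes a single call regardless of how many records are processed.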
Integration jobs are waterfall projects
Integration work tends not to benefit from agile or lean approaches, where iterative development cycles are used to “discover” the right solution. Informatica Cloud is not really designed to allow you to experiment freely and create mapping tasks on the fly.
A waterfall style of development is recommended, where every integration goes through a defined workflow with clear handover points between design, development, test and release.