HP Software provides a vast portfolio of solutions for professional IT, from capacity and project planning, through the definition and management of test cases and the actual roll-out, to ongoing operations and management of the IT environment. R&D comprises more than 3000 software developers in 20 sites all over the world, with Böblingen being the 5th largest [in 2012].
Business Service Management (BSM) combines the application-centric end-user view with the service-oriented ITIL processes of the user support helpdesk and the detailed, technical monitoring in the Operations Management of the back-end.
BSM is a product line including more than 20 products (mostly legacy HP OpenView as well as products from Mercury Interactive and other acquisitions) organized in 3 main centers: Business Availability Center, Operations Center and Network Management Center, with about US$ 350M total license revenue in 2008.
The proven and market-leading Operations Manager (OM) software (formerly OpenView Operations, VantagePoint Operations, IT/Operations, OperationsCenter) was being integrated into the HP Business Availability Center application, acquired with Mercury Interactive in 2006, in order to provide both a business (top-down) and an operational (bottom-up) view of the IT environment in one application.
The Operations Manager i (OMi) product offers a new Operations user interface that is an integral part of the BSM console. In its first release, the new product still depended on an Operations Manager installation for the deployment of the Agent infrastructure and management policies; alerts, metrics, and status messages were forwarded into a new, web-based Event and Health view. The major theme of the second release was added integration capabilities, so that 3rd-party domain managers and other legacy sources could feed management data into the Event subsystem, profiting from the advanced event correlation capabilities that promise increased efficiency of the IT staff. The third major release added deployment and policy management capabilities, making the product a replacement for its mature predecessor and finally achieving the long-desired technology refresh of the OM product. (A previous attempt, based on a home-grown Java + native component architecture, had made it to beta status and was canceled literally at the last minute, citing problems selling it on the main development platform, Linux. After a short one-release intermezzo, the same developers were recruited for the OMi development.)
HP Software BSM Software Development Engineer
Jul 2009 - Feb 2016
Built Integration Adapter from the ground up, maintained Content Manager, extended various OMi components.
Just as I was taking my parental leave, my team ramped up its engagement for the first release of Operations Manager i, working as technical consultants and facilitators between the local developers and the offshore content development teams. After a year, I returned just in time to see that effort being shut down (a team dedicated to catalyzing development and shielding the dev teams from each other was no longer deemed necessary), leaving our team in a quest for a new role. The two on-site development teams (about 30 developers) had struggled to keep the schedule (the original release had been delayed for many months, partially due to the tight coupling of three major development sites scattered around the globe, but also caused by a crooked combination of attempted Agile development with traditional, bureaucratic management practices, and the inevitable cultural differences), and three months into the new project they were again running late. Thus, after some state of limbo, the implementation of the Southbound Interface was moved to our team, handing us a crucial area of functionality that was new and encapsulated enough to allow us a quick start, without having to delve too deep into the existing source code or tread on the other teams' areas.
Our team started development with a re-evaluation of the use cases and existing prototype in August, and then came up with a design that fit the preexisting opinions fairly well. Due to the delay caused by the ownership change of that feature, it was initially planned that the Integration Adapter would release late the next summer, a couple of months after the BSM 9.0 MR in July; the reasoning being that customers would at first be occupied with the migration of the platform anyway, and would look into the new integration capabilities only after that. Except that… a few months into the implementation, someone in management recognized that the entire 9.0 release was marketed for its integration capabilities, and that meant that we had to accommodate the early MR date and switch gears in the middle of the project. The re-scheduling left us with only two more months until the functional complete milestone, and we had to drastically cut features. Fortunately, our own development proceeded smoothly, and we were able to make the date, whereas the rest of the BSM platform stubbornly ignored the ominous signs of high defect counts and somber feedback, and finally had to go through a debilitating round of last-minute feature cuts, leaving it running only on Windows, and without support for upgrades from previous versions. One could say that in a face-saving move, they got something out on the promised (and publicized) date, but far from what was initially envisioned and planned. Having learned how to read between the lines of management communications, phrases like "The theme for the Service Packs is 'Make the product ready not only for proof-of-concepts'" told a clear story there.
For the next minor version of the Integration Adapter, I extended both the Java backend and the Flex frontend to implement a policy editing lock, so that multiple users could work with our user interface in parallel. This work had to be integrated with the other development streams that went on in parallel in the back- and frontend. I also participated in the design of another crucial new feature, the backsync. But I was mostly occupied with the packaging and installer area of the project, where we had to incorporate the requirement that the underlying Agent infrastructure component be transparently included in our product installer.
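The server side of such a pessimistic edit lock can be sketched as a small registry; the following minimal Java illustration is an assumption of mine, with all names (PolicyLockRegistry, tryAcquire, release) hypothetical rather than the actual product API:

```java
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of a pessimistic per-policy edit lock, as a backend for a
// policy editor; all names here are illustrative, not the actual product API.
class PolicyLockRegistry {
    // policy ID -> user currently holding the edit lock
    private final ConcurrentHashMap<String, String> locks = new ConcurrentHashMap<>();

    /** Tries to lock a policy for a user; succeeds if the lock is free or
     *  already held by the same user (re-entrant for the owner). */
    boolean tryAcquire(String policyId, String user) {
        String holder = locks.putIfAbsent(policyId, user);
        return holder == null || holder.equals(user);
    }

    /** Releases the lock, but only if it is held by the given user. */
    boolean release(String policyId, String user) {
        return locks.remove(policyId, user);
    }

    /** Lets the UI show who is blocking an edit; null if unlocked. */
    String holderOf(String policyId) {
        return locks.get(policyId);
    }
}
```

A production-grade variant would additionally need a timeout or session-end cleanup, so that a crashed client cannot hold a policy hostage forever.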
This area is traditionally shunned by developers because of its complexity (abstractions around platform-specific install technologies to support installation on Windows, Linux, and Unixes from a single source definition) and the resulting huge stack of often old and home-grown technologies (Imake-based custom build scripts, an install technology developed in-house). But because of my long history with the company, this was a good fit for me, and a challenge that I took up with vigor. My company connections and experience allowed me to untangle the ownerships (most of the shared components had been moved to low-cost geographies, sometimes going through many hands) and to design a way to incorporate this difficult requirement into the existing architecture. During the course of this, I filed dozens of defects, surfacing problems and inconsistencies and also documenting the neglect that some internal shared components had suffered. Even though most of the component owners never reacted to the issues in a timely way, I managed to implement the requirement in a robust way, and then started tutoring our remote but brilliant QA team on the installer area, passing on a summary of the history and architecture, and putting forward my opinion on which problems stem from historic baggage and are unlikely to change, and where the testing effort should instead be directed, because those areas are under our control.
On my own initiative, and because of the good working relationship I had established with our build engineering counterpart, I started an unsolicited side project to refactor our central build scripts, because this had a high overlap with the packaging-related tasks I was working on. Whereas the build team's focus was on short-term corrections to keep the build running, I took the long-term view and cleaned up the many glaring duplications and inconsistencies that had crept into our build scripts. This resulted in a LOC reduction from 2000 to 950 in the central build scripts alone. Extracting build fragments into Imake rules that can be re-used across the software organization greatly reduced the complexity of the scattered build scripts, and did away with many inconsistencies that had grown from the prevalent copy-and-paste mentality. (But getting the reusable code accepted by a reluctant and shortsighted organization was a challenge far harder than the technical difficulties!)
I carried on with the build refactorings throughout the eventual (again delayed) release of the Integration Adapter 9.10 in April 2011, as they would be helpful for the planned move of our source code into a dedicated repository, until June, when I moved off the Integration Adapter (which was about to be merged and re-branded with another related BSM component developed in Israel). Whereas some of my teammates started building integrations into management solutions like Microsoft System Center Operations Manager (using the Integration Adapter we had created), my assignment was to help the colleagues who had come from OMi into my team (one manager had been promoted up and away, and my boss had the fewest people, so he benefited from the eventual re-balancing). For me, that presented an opening to work in an even larger team on one of the biggest and most complex products, and I would soon find not just various interesting challenges, but also a soul mate with whom I would work closely and very successfully for the next decade.
The Content Manager covers the definition, import, and export of management packs; foremost for OMi, but subsequently core BSM components also started writing their own providers and plugged into that extensible component, shedding their legacy configuration mechanisms. (Well, that was the plan; actually, we've been dragging along the legacy configuration for years, and in some areas we even partially import through both. It doesn't take much imagination to recognize how many issues and inconsistencies this causes, especially during upgrades.) It consists of a Groovy on Grails backend and a Flex 3 frontend running in Flash Player.
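The provider plug-in idea behind that extensible component can be illustrated in a few lines of Java; this is a sketch under my own assumptions, and the interface and class names below are hypothetical, not the actual Content Manager API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each component registers a provider for its artifact
// type, and the manager dispatches import requests to the matching provider.
interface ContentProvider {
    String artifactType();                      // e.g. "policy-template"
    void importArtifact(String xmlDefinition);  // parse and store the artifact
}

class SimpleContentManager {
    private final Map<String, ContentProvider> providers = new HashMap<>();

    void register(ContentProvider provider) {
        providers.put(provider.artifactType(), provider);
    }

    /** Returns false when no provider is registered for the type, so the
     *  caller can report the artifact instead of silently dropping it. */
    boolean dispatchImport(String type, String xmlDefinition) {
        ContentProvider provider = providers.get(type);
        if (provider == null) {
            return false;
        }
        provider.importArtifact(xmlDefinition);
        return true;
    }
}

/** Example provider that just records what it imported. */
class RecordingProvider implements ContentProvider {
    final List<String> imported = new ArrayList<>();
    public String artifactType() { return "policy-template"; }
    public void importArtifact(String xmlDefinition) { imported.add(xmlDefinition); }
}
```

The appeal of this design is that the framework never needs to know about individual artifact types; the flip side (as described above) is that any bug in a single provider surfaces as a "Content Manager" failure to the user.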
As one of the oldest components of OMi, it had seen its share of developers, and various approaches had been used that later fell out of style in newer parts of OMi. As a crucial component, it was never rewritten or adapted, and maintenance was applied only conservatively. Fortunately, its main developer was a diligent and experienced person, and he proved to be a treasure trove of knowledge (and was very happy to dispense it). As he is more of a backend guy, I focused on the UI part, basically fixing various issues. One enhancement (differentiating between factory-provided and customer-created content) led me through all administrative UIs of OMi, where I had to add a button and business logic to each UI. Through this, I learned a lot about the code (and the different teams, styles, and historical setup). Another bigger enhancement added support for attachments: until then, the configuration format had been an extensible XML structure, which is convenient for small business objects, but cumbersome for large (binary) blobs of data. Therefore, we introduced a new ZIP format in which each business object can reference other files. File upload through the Flash Player is difficult, and customers had started running into performance issues with large Content Packs. Error handling had been a permanent headache, too (Flash integrates differently in Internet Explorer vs. Firefox). Therefore, I rewrote the import as a pure HTML form. A trainee implemented a Content Pack preview (allowing users to view the ZIP contents and any dependencies in a tree, without actually importing anything), which I integrated into the product. Through enhancements like adding additional filters to the Artifact tree, I also gained advanced knowledge of the backend.
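The preview part of that work boils down to enumerating the ZIP entries without extracting or importing anything; here is a sketch using only the JDK, where the class name and the assumed pack layout are my own illustrations, not the actual format:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Sketch of enumerating a ZIP-based Content Pack (assumed layout: a
// descriptor XML plus attachment files referenced by it) for a preview,
// without actually importing anything.
class ContentPackReader {
    static List<String> listEntries(InputStream packStream) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(packStream)) {
            for (ZipEntry entry; (entry = zip.getNextEntry()) != null; ) {
                names.add(entry.getName());  // only metadata; entry data is skipped
            }
        }
        return names;
    }
}
```

Because a ZIP's central directory carries the names and sizes, such a preview stays cheap even for the large Content Packs that caused the upload performance issues.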
In the meantime, assignments had shifted and people had left, so only my colleague and I remained on Content Manager, and he had slowly been growing weary of the stuff, having worked on it for many years. Because content is central to the product's functioning, and the import runs during first startup, a constant flow of defects got reported against Content Manager. Most of them were not problems in Content Manager: some were bugs in individual Providers for certain content Artifacts (i.e. on the other side of the API), some were user confusion about the concepts (predefined vs. custom, how Artifacts should behave, and dependencies across Content Packs are hard to understand), but most were caused by deficiencies in the content itself: Teams mostly did not use the various tools, and instead directly edited the XML content definitions. Some Providers had modeled the Artifact dependencies in strange ways; some did not use important abstractions like predefined vs. custom at all. Some base content had been inherited from the BSM platform and dragged along without really being understood or maintained.
Eventually, our architects realized that as well and instigated a "trim down" of the out-of-the-box content: I got a list of Artifacts, hunted them down in the various source definitions, and removed them; then I wrote an upgrade that would do the same to existing customer databases. This broke several Content Packs that had dependencies that nobody knew of (or could easily explain)! And other Content Packs depended on these. So, some Artifacts had to be resuscitated, and it took several months until all that mess got sorted out. Based on these experiences, it's unlikely such a thing will be attempted again. For my colleague, most of the problems with content came down to the "garbage in, garbage out" principle.
At least the constant stream of bugs wrongly reported against Content Manager finally ebbed away; some of that was certainly because very few new Content Providers had been written, and the biggest problems in the existing ones had been addressed. Applying the Boy Scout Rule, whenever I investigated one of those submitted bugs, I added or improved the logging so that the next time, it would immediately be clear what had caused the problem. With that, submitters and first-level bug dispatchers frequently recognized the correct module to assign the defect to, hardly any bugs were submitted against Content Manager, and I had (again) managed to work myself out of a job through quality improvements and automation.
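The kind of logging improvement meant here is mostly about context: wrapping a failure with enough information that the defect can be routed to the right module from the log alone. A tiny, purely illustrative Java example (the exception class and its wording are my own sketch):

```java
// Purely illustrative: an exception that carries the artifact and the
// responsible provider, so a log reader can route the defect to the right
// module instead of filing it against the Content Manager framework.
class ContentImportException extends RuntimeException {
    ContentImportException(String artifactId, String providerName, Throwable cause) {
        super("Import of artifact '" + artifactId + "' failed in provider '"
                + providerName + "'; defects for this artifact type belong to that provider",
                cause);
    }
}
```

The root cause stays attached via the exception chain, so nothing is lost; only the routing information is added.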
By mid-2016, management decided to move responsibility for several components (including Content Manager) to an offshore team in Bangalore tasked with rewriting the old Flex-based UIs as modern, plugin-free HTML UIs. That overall translation effort then took much longer than initially planned, so there was no active transition; but my team and I have been working on different things since then, so Content Manager fell into a kind of long-term hibernation, which my previous cleanup and quality work made viable. I still would have preferred to do a more thorough refactoring, as my colleague and I would have been in the best position to do this. For example, though the current filtering in XML works well, it leaves two equally bad alternatives when the requirement to produce JSON for an HTML UI arises: either treat the filtering as a black box and opaquely convert XML to JSON (dragging along a lot of old and untested code), or rewrite the filtering to work on an object tree and only later serialize that to one or both formats (which requires intimate knowledge of the domain and implementation history that a new team doesn't have).
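The second alternative, filtering on a neutral object tree and serializing only at the end, can be sketched in a few lines of Java; the Artifact type and both output formats here are deliberately simplified placeholders of my own, not the real data model:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Sketch of "filter the object tree first, serialize last": the same
// filtered result can then be emitted as XML (for the legacy UI) or as
// JSON (for an HTML UI). Types and formats are simplified placeholders.
class Artifact {
    final String name;
    final boolean predefined;
    Artifact(String name, boolean predefined) {
        this.name = name;
        this.predefined = predefined;
    }
}

class ArtifactSerializer {
    /** Filtering happens on plain objects, independent of any output format. */
    static List<Artifact> filter(List<Artifact> artifacts, Predicate<Artifact> keep) {
        return artifacts.stream().filter(keep).collect(Collectors.toList());
    }

    static String toXml(List<Artifact> artifacts) {
        return artifacts.stream()
                .map(a -> "<artifact name=\"" + a.name + "\"/>")
                .collect(Collectors.joining("", "<artifacts>", "</artifacts>"));
    }

    static String toJson(List<Artifact> artifacts) {
        return artifacts.stream()
                .map(a -> "\"" + a.name + "\"")
                .collect(Collectors.joining(",", "[", "]"));
    }
}
```

The point of the design is that all business logic (here just the predicate) is testable without either serializer, which is exactly what the black-box XML-to-JSON conversion cannot offer.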
A reorganization placed me into the team working on the deployment and policy management area, the latter of which consumed the same (well, technically a slight variant of the) Integration Adapter (now called BSM Connector) policy editors. So I also worked on their Flex UIs, and learned (after Content Manager and core OMi) a third way of how things are done. The Health Check feature added server-based monitoring of connected Agents (something we had identified years ago in the Integration Adapter use cases, and which until then had to be hand-crafted by each integrator who took monitoring seriously). I contributed the UI enhancements, and drew cute heart-shaped status icons that were meant only as a temporary placeholder, but then even made it into the final product.
Later, I worked on additional command-line interfaces for the Monitoring Automation part of the product, as the lack of these still prevented some large, long-standing customers from making the switch from the predecessor OM products to OMi; for large environments, it is impractical to do things in the UI, and there is demand for automation interfaces that can be easily scripted. These interfaces are rather simple Java applications that understand the original or equivalent command-line options, invoke one or more backend REST APIs, and return the data in a suitable textual output format. (Unfortunately, some developers stick to the state of the art of twenty years ago and implement the whole logic in a single application class with several 200+ line procedures, making it impossible to test and therefore creating a breeding ground for bugs. This presents me with the dilemma of either going along with it (extra carefully) or creating a schizophrenic mixture of styles.)
Common requirements like command-line help, common authentication arguments, and the destination server address had been factored out into a generic base class that I had already used and shaped for the corresponding Content Manager command-line interfaces. Likewise, integration tests had been implemented in the same home-grown test framework (which, as a non-production internal component, suffered from the usual neglect). I used the occasion to clean up tests (often by extracting test fixtures for reuse and defining proper dependencies among tests instead of relying on the hard-coded fixed execution order) and to identify useful additional helper methods for the test framework itself. I also introduced another home-grown deployment utility (written by OMi developers, and so far ignored by Monitoring Automation devs) to this part of the product in order to simplify deployment of changes to a live test environment, making this a one-click effort.
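The shape of such a base class can be shown with a minimal sketch; everything here (class name, option spelling, fields) is a hypothetical illustration of the idea, not the actual code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal sketch of factoring shared CLI concerns (server address, user,
// help flag) into a base class; concrete tools consume the remaining,
// tool-specific arguments. All names here are hypothetical.
class BaseCliTool {
    String server = "localhost";
    String user = null;
    boolean helpRequested = false;

    /** Consumes the shared options and returns the tool-specific rest. */
    String[] parseCommonArgs(String[] args) {
        List<String> rest = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-server": server = args[++i]; break;
                case "-user":   user = args[++i]; break;
                case "-help":   helpRequested = true; break;
                default:        rest.add(args[i]);
            }
        }
        return rest.toArray(new String[0]);
    }
}
```

Each concrete tool then only implements its own options and the REST calls, and the shared parsing logic can be covered by tests once instead of per tool.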
Fortunately, the internal barriers between teams (OMi developers were initially recruited from the Unix variant of the old Operations Manager predecessor, whereas the Monitoring Automation team was made up of developers from the Operations Manager for Windows team) were being eroded by reassignments (like mine), but also by two exoduses, the first when a local team for an Analytics product was built up, the second when several people decided to take a voluntary severance package. In the end, the few remaining Monitoring Automation devs were working on the (far fewer) core features, and defects and smaller enhancements were handled by anyone (including me). Finally, management realized that we were lacking manpower to make an impact, yet the fragmentation of working on different components also made us inefficient (as with Content Manager, technical debt had been documented, but not tackled in a meaningful way), so responsibility for Monitoring Automation was moved to a bigger offshore team, and the last two core Monitoring Automation developers became Product Owner and lead developer there.
I'm glad to have had a positive effect on that code base, and was regularly commended for my diligent work, the fresh perspective I brought in as an outsider who quickly built great rapport with the team, and my enthusiasm in attacking code smells through meaningful yet controlled refactorings on the side.
Ingo Karkat; last update 09-Feb-2025