Tuesday 23 August 2011

Controlled Use of Excel for BI

1. Introduction

In most Financial Institutions the use of Excel is so embedded that any project to curb this is almost doomed to failure from the start.

This blog is an attempt to elicit discussions and to explore the appetite to address some of the inherent risks with Excel spreadsheets and to see how one can deliver supportable MI using Excel. It should be stressed that this proposal should not be seen as a replacement for strategic tools or processes.

  
2. Benefits and Risks of Using Excel

The use of Excel comes with a multitude of pros and cons. In days past the downsides were considered to be worth the trade-off against the benefits but his is starting to change with the introduction of more stringent governance via legislation such as Solvency II, Basel II and SOX.
Below is a list of benefits and risks of using Excel in your core data streams.


2.1. Benefits
• The skills base for Excel is large and practically every person within a company will be familiar with it.


• Excel is relatively cheap and most users in an organization will have it.

• Excel lends itself to very easily creating financial models in a relatively short time span which normally could not be done in a more structured environment. Less reliance on IT means more flexibility and less time.

• The user experience from creating datasets to graphs is very intuitive and rich.


2.2. Risks
• Any VBA code or macros are specific to the workbook and are not held in a central repository to be reused. This raises concerns about version and change control over the code.

• If the required data set is sourced from any application/database within the IT estate, no way exists to document data lineage or do impact assessments when designing changes to these sub systems.

• Although Excel skills are widespread, most users will develop the model based on easy to develop rather than efficiency of code. No standard way of developing models makes it difficult to support/hand over.

• Any links to external data are also held in the workbook or specific to the PC. Often they contain T-SQL specific to the use-case and this is also not subject to version or change control.

• Excel has restrictions on the amount of data that it can consume. (1 million rows for Excel 2010, 2007 and 65,000 rows for Excel 2003)

• Security to the underlying database is not controlled via a centrally maintained application account but rather based on individual users. This requires users to be added and maintained at Database level.

• Access to spreadsheet models in not restricted via permissions or passwords as they are mostly stored in shared drives accessible to large groups of people.

• Users often save versions of the spreadsheet at various points during a process cycle and between process cycles for archiving. This leads to an explosion of redundant data and cuts of the code being held leading to increased storage costs. This can also lead to confusion as to what is the most recent and correct version. These issues are compounded by users sharing files by e-mailing them to each other.

• The models can become complex in nature and often process large volumes of data. These models can take a long time to run on a PC and in some cases can crash.

• Distributed development and locations lead to key person dependencies.


3. Risk Mitigation

In order to mitigate the risks outlined above one could put the following processes in place. Refer to Annexure A for the risk mitigation matrix.

  
• Set up standards on structure and development of Excel models.

• Ensure all models are developed against these standards, peer reviewed, tested, documented and appropriately transitioned into production.

• Store the models centrally in a controlled and identifiable location.

• Convert macros and VBA into add-ins and store them centrally.

• Implement security around the models for both access and changes.

• Implement appropriate change control and monitoring of the models to ensure that they are not changed without authorization.

In addition to the new or changed processes, technology can be used to mitigate some of the risks. Based on the risk matrix in Annexure B, a combination of Excel Services 2010 with PowerPivot and Prodiance, in theory could mitigate all the identified risks.
Below is a discussion around these technologies and how they could mitigate the risks.


4. Excel Services

Excel Services is the Excel engine delivered through Sharepoint and has the following key advantages:
• As it is delivered via Sharepoint on a server you can leverage the availability of larger memory, many more processors and 64 bit technology. This should reduce the time it takes to run larger and complex data models.

• Security is controlled via standard Sharepoint functionality.

• Centrally used data connections can be used thus eliminating the need for users to be granted direct access to the data.

• When published to Sharepoint one can only make certain parts of the model visible to the users thus hiding any business logic and underlying assumptions, enhancing security even further.

• Automatic scheduling of data updates can be implemented thus making refreshed reports instantly available to the users.

• As workbooks are versioned you can always restore to a previous one should you it be required.
 
Excel services does have some disadvantages listed below:
• Not everything that is native to the desktop version will work in Excel services such as add-ins and certain controls. Also some functionality becomes limited such as Pivot Tables and screen split and zoom functions.

• In order to modify the models you still need to pull them into the desktop version first and then republish them.

• Sharepoint is not a particularly easy technology to deploy and maintain and in order for Excel Services to be installed you have to have the Enterprise version.

5. PowerPivot

PowerPivot is an Excel add-in which has been developed by the Microsoft SQL Server Analysis Services Team and uses in-memory column compression technology branded as Vertipaq. This technology is very similar to that used by such market leaders as Qlikview. PowerPivot allows you to bring in large data sets into Excel and even join these sets to each other and then use the resulting data to report off. It is very efficient at consuming large amounts of data and when used in x64 bit mode with large amounts of memory is quite astonishing in its response.
 
PowerPivot also extends to Sharepoint and when combined with Excel Services in Sharepoint the uses and power suddenly start to become apparent.

6. Prodiance

In one of my blogs a while back I mentioned 4 companies that provide Excel Spreadsheet Control Software. We ultimately decided to back Prodiance as our choice, a decision that has now been ratified by Microsoft.
 
Up until 7th June 2011, Prodiance was an independent company specializing in risk and control software for Excel. It has now been purchased by Microsoft and is a wholly owned subsidiary. With Microsoft now purchasing the company more integration with the Office suite and Sharepoint is anticipated.
Advantages of implementing Prodiance over your spreadsheets are as follows (as per Prodiance pdf):
 
• Electronic sign-off and optional eSignatures

• Email notification of significant or unauthorized changes (e.g. exceptions, policy violations)

• Extensive cell-by-cell, file level and workflow audit trails

• Side-by-side comparison of changes between versions

• Management reports, dashboards and drill-down into detailed reports

• Automated document versioning

• Check-in/check-out (optional)

• Web based access to all historical versions

• Association of parent/child versions and unified view across all audit trails and reports

• Robust document security model integrated with Active Directory/LDAP users and groups

• Permissions to grant appropriate folder and document access levels (e.g. view, add, update, delete, approve, etc.).

• Microsoft Information Rights Management (IRM) encryption for spreadsheets containing sensitive information

• Workbook, worksheet and cell level protection

• Optional lock-down for cell level input control with data validation

• Excel Services for displaying and publishing read-only versions of critical spreadsheets and BI reports

• Extensive cell-by-cell, file level, system level and process level audit trails

• Tracking of changes to key inputs, outputs, spreadsheet data, formulas, macros and queries

• Tracking of changes to queries and data connections to external data sources

• Auditing support for spreadsheets and Access databases

• Optional lock-down for cell level input control with data validation

• Interactive link/dependency diagrams

• Auditing of specific input ranges (including named ranges)

• Automated email alerts upon changes to input ranges

• Color scheme tool to highlight (used and unused) input cells

• Spreadsheet validation and testing via automated cell, formula and range diagnostics

• Proactive identification of spreadsheet development and structural problems

• Comprehensive document and records management

• Enterprise class workflow management

7. Conclusion

In order to achieve the controlled use of Excel within an organization you will need to go through a change in technology, process and culture. I hope this blog has provided some readers with a spark for debate.

Appendix A
 
 
 
Appendix B