Sunday 4 September 2011

Correlate

The Google guys have been cooking up some very interesting stuff in the labs again. I took a look at the new trending service called Google Correlate today. Not entirely sure of its commercial value as an online free tool but it appears quite nifty.

One has to have a Google login in order to access the following site: http://www.google.com/trends/correlate but once in it is quite easy to navigate. One is able to trend web search data that Google has been collecting for many years. You can draw a line graph and the software will then match that profile to the closest profile it can find. One can also type in a web search string and it then finds the closest trend to that. For example, being in the insurance industry, I typed in "Motor Vehicle Accidents" and the top 4 returned search trends were "vehicle accidents" (duh), "city planning", "questionnaires" and "battered women". The correlation between MVAs and battered women is strong (0.8744). I guess the big question is why? One is also able to upload a data file of time-stamped data in order to find the closest correlation.
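
As a rough illustration of what a figure like 0.8744 measures, here is a minimal sketch that computes a Pearson correlation coefficient between two time series. It assumes the reported number is a Pearson coefficient; the function name `pearson` and both data series are invented for illustration and are not real search data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances (unnormalised; the factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly search volumes, made up for this example.
series_a = [10, 12, 15, 14, 18, 21]
series_b = [30, 33, 41, 39, 50, 57]
r = pearson(series_a, series_b)
print(round(r, 4))
```

A value close to 1 only says the two lines move together over time. It says nothing about why, which is exactly the question raised above.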

I guess one just needs to be mindful of the fact that these trends are not actual events but trends of search patterns. I would be interested in hearing from anybody who has used this technology in order to bring real life business use cases to life.

- Paul Steynberg

Tuesday 23 August 2011

Controlled Use of Excel for BI

1. Introduction

In most Financial Institutions the use of Excel is so embedded that any project to curb this is almost doomed to failure from the start.

This blog is an attempt to elicit discussions and to explore the appetite to address some of the inherent risks with Excel spreadsheets and to see how one can deliver supportable MI using Excel. It should be stressed that this proposal should not be seen as a replacement for strategic tools or processes.

  
2. Benefits and Risks of Using Excel

The use of Excel comes with a multitude of pros and cons. In days past the downsides were considered to be worth the trade-off against the benefits, but this is starting to change with the introduction of more stringent governance via legislation such as Solvency II, Basel II and SOX.
Below is a list of benefits and risks of using Excel in your core data streams.


2.1. Benefits
• The skills base for Excel is large and practically every person within a company will be familiar with it.


• Excel is relatively cheap and most users in an organization will have it.

• Excel lends itself to very easily creating financial models in a relatively short time span which normally could not be done in a more structured environment. Less reliance on IT means more flexibility and less time.

• The user experience from creating datasets to graphs is very intuitive and rich.


2.2. Risks
• Any VBA code or macros are specific to the workbook and are not held in a central repository to be reused. This raises concerns about version and change control over the code.

• If the required data set is sourced from any application/database within the IT estate, no way exists to document data lineage or do impact assessments when designing changes to these sub systems.

• Although Excel skills are widespread, most users will build a model in whatever way is easiest to develop rather than in the most efficient way. The lack of a standard way of developing models makes them difficult to support or hand over.

• Any links to external data are also held in the workbook or specific to the PC. Often they contain T-SQL specific to the use-case and this is also not subject to version or change control.

• Excel has restrictions on the amount of data that it can consume (1,048,576 rows per worksheet for Excel 2007 and 2010, and 65,536 rows for Excel 2003).

• Security to the underlying database is not controlled via a centrally maintained application account but rather based on individual users. This requires users to be added and maintained at Database level.

• Access to spreadsheet models is not restricted via permissions or passwords as they are mostly stored in shared drives accessible to large groups of people.

• Users often save versions of the spreadsheet at various points during a process cycle and between process cycles for archiving. This leads to an explosion of redundant data and copies of the code being held, which increases storage costs. It can also lead to confusion as to which is the most recent and correct version. These issues are compounded by users sharing files by e-mailing them to each other.

• The models can become complex in nature and often process large volumes of data. These models can take a long time to run on a PC and in some cases can crash.

• Distributed development and locations lead to key person dependencies.
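
The row restriction in the list above is easy to test for before an extract is pushed into a workbook. The following is only a sketch: the limits are the documented worksheet sizes for each version, while the function name `fits_in_worksheet` and the `header_rows` parameter are invented for illustration.

```python
# Maximum rows per worksheet for each Excel version.
MAX_ROWS = {
    "2003": 65_536,      # .xls format
    "2007": 1_048_576,   # .xlsx format
    "2010": 1_048_576,
}


def fits_in_worksheet(row_count, excel_version, header_rows=1):
    """Return True if the data rows plus header rows fit on one worksheet."""
    return row_count + header_rows <= MAX_ROWS[excel_version]


# A 900,000-row extract fits in Excel 2010 but not in Excel 2003.
print(fits_in_worksheet(900_000, "2010"))
print(fits_in_worksheet(900_000, "2003"))
```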


3. Risk Mitigation

In order to mitigate the risks outlined above one could put the following processes in place. Refer to Appendix A for the risk mitigation matrix.

  
• Set up standards on structure and development of Excel models.

• Ensure all models are developed against these standards, peer reviewed, tested, documented and appropriately transitioned into production.

• Store the models centrally in a controlled and identifiable location.

• Convert macros and VBA into add-ins and store them centrally.

• Implement security around the models for both access and changes.

• Implement appropriate change control and monitoring of the models to ensure that they are not changed without authorization.
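
To make the monitoring point concrete, here is a minimal sketch of how unauthorized changes could be detected by comparing a fingerprint of each workbook against the one recorded at sign-off. This is not how any of the products discussed below work internally. The function names and the `approved` register (file name mapped to digest) are hypothetical.

```python
import hashlib
from pathlib import Path


def fingerprint(path):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large workbooks do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_models(folder, approved):
    """List workbooks whose content differs from the approved fingerprint.

    `approved` maps file name to the digest recorded at sign-off
    (a hypothetical register kept outside the shared drive).
    """
    changed = []
    for path in sorted(Path(folder).glob("*.xls*")):
        if approved.get(path.name) != fingerprint(path):
            changed.append(path.name)
    return changed
```

A workbook that is missing from the register is also reported, so new models cannot slip into the controlled folder unnoticed.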

In addition to the new or changed processes, technology can be used to mitigate some of the risks. Based on the risk matrix in Appendix B, a combination of Excel Services 2010 with PowerPivot and Prodiance could, in theory, mitigate all the identified risks.
Below is a discussion around these technologies and how they could mitigate the risks.


4. Excel Services

Excel Services is the Excel engine delivered through SharePoint and has the following key advantages:
• As it is delivered via SharePoint on a server you can leverage the availability of larger memory, many more processors and 64-bit technology. This should reduce the time it takes to run larger and more complex data models.

• Security is controlled via standard SharePoint functionality.

• Centrally managed data connections can be used, thus eliminating the need for users to be granted direct access to the data.

• When publishing to SharePoint one can make only certain parts of the model visible to the users, thus hiding any business logic and underlying assumptions and enhancing security even further.

• Automatic scheduling of data updates can be implemented thus making refreshed reports instantly available to the users.

• As workbooks are versioned you can always restore a previous version should it be required.
 
Excel Services does have some disadvantages, listed below:
• Not everything that is native to the desktop version will work in Excel Services, such as add-ins and certain controls. Some functionality also becomes limited, such as Pivot Tables and the screen split and zoom functions.

• In order to modify the models you still need to pull them into the desktop version first and then republish them.

• SharePoint is not a particularly easy technology to deploy and maintain, and in order for Excel Services to be installed you have to have the Enterprise version.

5. PowerPivot

PowerPivot is an Excel add-in which has been developed by the Microsoft SQL Server Analysis Services team and uses in-memory column compression technology branded as VertiPaq. This technology is very similar to that used by such market leaders as QlikView. PowerPivot allows you to bring large data sets into Excel, join these sets to each other and then report off the resulting data. It is very efficient at consuming large amounts of data and, when used in 64-bit mode with large amounts of memory, is quite astonishing in its response.
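
To give a feel for why column compression works so well on reporting data, here is a minimal sketch of dictionary encoding, one common column compression technique. It is a generic illustration and not a description of VertiPaq's internals. The function names and the sample `region` column are invented for this example.

```python
def dictionary_encode(column):
    """Replace each value with a small integer code plus a lookup table."""
    dictionary = []        # distinct values, in first-seen order
    codes_by_value = {}    # value -> position in the dictionary
    codes = []
    for value in column:
        if value not in codes_by_value:
            codes_by_value[value] = len(dictionary)
            dictionary.append(value)
        codes.append(codes_by_value[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the lookup table and the codes."""
    return [dictionary[code] for code in codes]


region = ["North", "South", "North", "North", "East", "South"]
dictionary, codes = dictionary_encode(region)
print(dictionary)  # ['North', 'South', 'East']
print(codes)       # [0, 1, 0, 0, 2, 1]
```

A column with millions of rows but only a handful of distinct values shrinks to a short lookup table plus a list of small integers, which is far cheaper to hold in memory and to scan.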
 
PowerPivot also extends to SharePoint, and when it is combined with Excel Services in SharePoint its uses and power start to become apparent.

6. Prodiance

In one of my blogs a while back I mentioned 4 companies that provide Excel Spreadsheet Control Software. We ultimately decided to back Prodiance as our choice, a decision that has now been ratified by Microsoft.
 
Up until 7th June 2011, Prodiance was an independent company specializing in risk and control software for Excel. It has now been purchased by Microsoft and is a wholly owned subsidiary. Now that Microsoft owns the company, more integration with the Office suite and SharePoint is anticipated.
Advantages of implementing Prodiance over your spreadsheets are as follows (as per Prodiance pdf):
 
• Electronic sign-off and optional eSignatures

• Email notification of significant or unauthorized changes (e.g. exceptions, policy violations)

• Extensive cell-by-cell, file level and workflow audit trails

• Side-by-side comparison of changes between versions

• Management reports, dashboards and drill-down into detailed reports

• Automated document versioning

• Check-in/check-out (optional)

• Web based access to all historical versions

• Association of parent/child versions and unified view across all audit trails and reports

• Robust document security model integrated with Active Directory/LDAP users and groups

• Permissions to grant appropriate folder and document access levels (e.g. view, add, update, delete, approve, etc.).

• Microsoft Information Rights Management (IRM) encryption for spreadsheets containing sensitive information

• Workbook, worksheet and cell level protection

• Optional lock-down for cell level input control with data validation

• Excel Services for displaying and publishing read-only versions of critical spreadsheets and BI reports

• Extensive cell-by-cell, file level, system level and process level audit trails

• Tracking of changes to key inputs, outputs, spreadsheet data, formulas, macros and queries

• Tracking of changes to queries and data connections to external data sources

• Auditing support for spreadsheets and Access databases

• Interactive link/dependency diagrams

• Auditing of specific input ranges (including named ranges)

• Automated email alerts upon changes to input ranges

• Color scheme tool to highlight (used and unused) input cells

• Spreadsheet validation and testing via automated cell, formula and range diagnostics

• Proactive identification of spreadsheet development and structural problems

• Comprehensive document and records management

• Enterprise class workflow management

7. Conclusion

In order to achieve the controlled use of Excel within an organization you will need to go through a change in technology, process and culture. I hope this blog has provided some readers with a spark for debate.

Appendix A: risk mitigation matrix (figure not reproduced)

Appendix B: risk matrix (figure not reproduced)

Wednesday 27 July 2011

New Features in SQL Server Code-name Denali - CTP3

Microsoft are shipping some new technology in the upcoming version of SQL Server. This new technology is code-named "Apollo" and introduces two new features.
  • Columnstore Indexes
  • Vector-based query execution
These 2 features claim to speed up Data Warehouse query processing time by a factor of between 10 and 100. Follow this link for a full description of these features. You can download the CTP3 of Denali here.
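
The general idea behind the two features can be sketched in a few lines. This is only an illustration of column-oriented storage and batch processing in plain Python and says nothing about how SQL Server implements them. The table, column names and function names are all invented for this example.

```python
# Row-oriented layout: every record carries all of its fields.
rows = [
    {"order_id": 1, "region": "North", "amount": 120.0},
    {"order_id": 2, "region": "South", "amount": 80.5},
    {"order_id": 3, "region": "North", "amount": 42.25},
    {"order_id": 4, "region": "East", "amount": 10.0},
]

# Column-oriented layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_row_at_a_time(rows):
    """Sum one column by visiting every complete record in turn."""
    total = 0.0
    for row in rows:
        total += row["amount"]
    return total


def total_in_batches(amounts, batch_size=2):
    """Sum a single column by processing it in batches of values."""
    total = 0.0
    for start in range(0, len(amounts), batch_size):
        total += sum(amounts[start:start + batch_size])
    return total


print(total_row_at_a_time(rows))
print(total_in_batches(columns["amount"]))
```

Both functions return the same answer. The difference is that the second one only ever touches the column the query needs and handles several values per step, which is where a real engine finds its speed.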

- Paul Steynberg

Monday 25 July 2011

SQL Server Project "Crescent" Demo

Follow this link to see a demonstration of SQL Server Project "Crescent". To quote: "a new immersive ad-hoc visualization tool that is part of SQL Server Code Name 'Denali' Reporting Services. Project 'Crescent' is designed with end users in mind to quickly, easily, and visually explore their data and answer ad-hoc questions in just a few clicks."

It sounds like exciting stuff and certainly plugs some gaps in the Microsoft stack.

- Paul Steynberg

Saturday 18 June 2011

Paperless Home

The concept of going paperless both at home and at the office has been around for many years. The execution of this idea has not always been easy or successful. I have tried this in the past with flatbed scanners and various methods of storing the output for easy retrieval and indexing. Until now my attempts have mostly been failures. A few weeks ago I found the perfect combination of scanner and software that makes the entire process a pleasure.

The solution I am using is for the Mac so if you are PC based you may need to do some more research. My requirements were quite simple. The process of scanning should be effortless and the software should be easy to use. I started looking at Evernote but did not like the idea of all my documents being stored in the cloud, especially bank statements, copies of passports, certificates, mortgage statements etc. Evernote is free as a download but in order to get the most out of it you are required to sign up with them on a monthly or annual basis. It is not expensive, $5 per month or $45 per annum and is very handy for ensuring your documents are held in offsite storage for backups. The application is great but the security issue tipped the scales for me.

After a lot of research and testing out various products I finally decided on DevonThink. It is similar in concept to Evernote, but you pay a once-off fee ($49.99) for the product and you store the documents locally in a database. This means that I have to make sure that I am backed up, but the Mac does this automatically through Time Machine.

DevonThink is easy to use and comes with some great features, such as a sidebar tray to which you can just drag documents from anywhere and an add-in to Safari that allows you to capture the open web page directly.

I then turned my attention to document scanners and opted for the Fujitsu ScanSnap S1300. This great little scanner (280mm long and 100mm wide) allows you to scan up to 10 pages at a time and can do either 1 side or both sides of the document. The other great feature I found is the ability to set up profiles for scanning which then call DevonThink and drop the document into the inbox for categorisation. It also does its own OCR and orientates the pages based on character recognition. Once the document is in DevonThink you can then search within the scanned documents for key words.

Now when the post arrives or we get back from shopping we just put the invoices/documents through the scanner with one touch of the button and then categorise them later in DevonThink when we have some quiet time. In the meantime all those documents can be destroyed, and there is no more filing.

- Paul Steynberg