Sunday, September 4, 2011

Correlate

The Google guys have been cooking up some very interesting stuff in the labs again. I took a look at the new trending service called Google Correlate today. Not entirely sure of its commercial value as an online free tool but it appears quite nifty.

One has to have a Google login in order to access the following site: http://www.google.com/trends/correlate but once in it is quite easy to navigate. One is able to trend web search data the Google has been collecting for many years. You can draw a line graph and the software will then match that profile to the closest profile it can find. One can also type in a web search string and it then finds the closest trend to that. For example, being in the Insurance industry, I typed in "Motor Vehicle Accidents" and the top 4 returned search trends are "vehicle accidents" (duh), "city planning", "questionnaires" and "battered women". The correlation between MVA's and battered women is strong (0.8744). I guess the big question is why? One is also able to upload a data file of time stamped data in order to find the closest correlation.

I guess one just needs to be mindful of the fact that these trends are not actual events but trends of search patterns. I would be interested in hearing from anybody who has used this technology in order to bring real life business use cases to life.

- Paul Steynberg

Tuesday, August 23, 2011

Controlled Use of Excel for BI

1. Introduction

In most Financial Institutions the use of Excel is so embedded that any project to curb this is almost doomed to failure from the start.

This blog is an attempt to elicit discussions and to explore the appetite to address some of the inherent risks with Excel spreadsheets and to see how one can deliver supportable MI using Excel. It should be stressed that this proposal should not be seen as a replacement for strategic tools or processes.

  
2. Benefits and Risks of Using Excel

The use of Excel comes with a multitude of pros and cons. In days past the downsides were considered to be worth the trade-off against the benefits but his is starting to change with the introduction of more stringent governance via legislation such as Solvency II, Basel II and SOX.
Below is a list of benefits and risks of using Excel in your core data streams.


2.1. Benefits
• The skills base for Excel is large and practically every person within a company will be familiar with it.


• Excel is relatively cheap and most users in an organization will have it.

• Excel lends itself to very easily creating financial models in a relatively short time span which normally could not be done in a more structured environment. Less reliance on IT means more flexibility and less time.

• The user experience from creating datasets to graphs is very intuitive and rich.


2.2. Risks
• Any VBA code or macros are specific to the workbook and are not held in a central repository to be reused. This raises concerns about version and change control over the code.

• If the required data set is sourced from any application/database within the IT estate, no way exists to document data lineage or do impact assessments when designing changes to these sub systems.

• Although Excel skills are widespread, most users will develop the model based on easy to develop rather than efficiency of code. No standard way of developing models makes it difficult to support/hand over.

• Any links to external data are also held in the workbook or specific to the PC. Often they contain T-SQL specific to the use-case and this is also not subject to version or change control.

• Excel has restrictions on the amount of data that it can consume. (1 million rows for Excel 2010, 2007 and 65,000 rows for Excel 2003)

• Security to the underlying database is not controlled via a centrally maintained application account but rather based on individual users. This requires users to be added and maintained at Database level.

• Access to spreadsheet models in not restricted via permissions or passwords as they are mostly stored in shared drives accessible to large groups of people.

• Users often save versions of the spreadsheet at various points during a process cycle and between process cycles for archiving. This leads to an explosion of redundant data and cuts of the code being held leading to increased storage costs. This can also lead to confusion as to what is the most recent and correct version. These issues are compounded by users sharing files by e-mailing them to each other.

• The models can become complex in nature and often process large volumes of data. These models can take a long time to run on a PC and in some cases can crash.

• Distributed development and locations lead to key person dependencies.


3. Risk Mitigation

In order to mitigate the risks outlined above one could put the following processes in place. Refer to Annexure A for the risk mitigation matrix.

  
• Set up standards on structure and development of Excel models.

• Ensure all models are developed against these standards, peer reviewed, tested, documented and appropriately transitioned into production.

• Store the models centrally in a controlled and identifiable location.

• Convert macros and VBA into add-ins and store them centrally.

• Implement security around the models for both access and changes.

• Implement appropriate change control and monitoring of the models to ensure that they are not changed without authorization.

In addition to the new or changed processes, technology can be used to mitigate some of the risks. Based on the risk matrix in Annexure B, a combination of Excel Services 2010 with PowerPivot and Prodiance, in theory could mitigate all the identified risks.
Below is a discussion around these technologies and how they could mitigate the risks.


4. Excel Services

Excel Services is the Excel engine delivered through Sharepoint and has the following key advantages:
• As it is delivered via Sharepoint on a server you can leverage the availability of larger memory, many more processors and 64 bit technology. This should reduce the time it takes to run larger and complex data models.

• Security is controlled via standard Sharepoint functionality.

• Centrally used data connections can be used thus eliminating the need for users to be granted direct access to the data.

• When published to Sharepoint one can only make certain parts of the model visible to the users thus hiding any business logic and underlying assumptions, enhancing security even further.

• Automatic scheduling of data updates can be implemented thus making refreshed reports instantly available to the users.

• As workbooks are versioned you can always restore to a previous one should you it be required.
 
Excel services does have some disadvantages listed below:
• Not everything that is native to the desktop version will work in Excel services such as add-ins and certain controls. Also some functionality becomes limited such as Pivot Tables and screen split and zoom functions.

• In order to modify the models you still need to pull them into the desktop version first and then republish them.

• Sharepoint is not a particularly easy technology to deploy and maintain and in order for Excel Services to be installed you have to have the Enterprise version.

5. PowerPivot

PowerPivot is an Excel add-in which has been developed by the Microsoft SQL Server Analysis Services Team and uses in-memory column compression technology branded as Vertipaq. This technology is very similar to that used by such market leaders as Qlikview. PowerPivot allows you to bring in large data sets into Excel and even join these sets to each other and then use the resulting data to report off. It is very efficient at consuming large amounts of data and when used in x64 bit mode with large amounts of memory is quite astonishing in its response.
 
PowerPivot also extends to Sharepoint and when combined with Excel Services in Sharepoint the uses and power suddenly start to become apparent.

6. Prodiance

In one of my blogs a while back I mentioned 4 companies that provide Excel Spreadsheet Control Software. We ultimately decided to back Prodiance as our choice, a decision that has now been ratified by Microsoft.
 
Up until 7th June 2011, Prodiance was an independent company specializing in risk and control software for Excel. It has now been purchased by Microsoft and is a wholly owned subsidiary. With Microsoft now purchasing the company more integration with the Office suite and Sharepoint is anticipated.
Advantages of implementing Prodiance over your spreadsheets are as follows (as per Prodiance pdf):
 
• Electronic sign-off and optional eSignatures

• Email notification of significant or unauthorized changes (e.g. exceptions, policy violations)

• Extensive cell-by-cell, file level and workflow audit trails

• Side-by-side comparison of changes between versions

• Management reports, dashboards and drill-down into detailed reports

• Automated document versioning

• Check-in/check-out (optional)

• Web based access to all historical versions

• Association of parent/child versions and unified view across all audit trails and reports

• Robust document security model integrated with Active Directory/LDAP users and groups

• Permissions to grant appropriate folder and document access levels (e.g. view, add, update, delete, approve, etc.).

• Microsoft Information Rights Management (IRM) encryption for spreadsheets containing sensitive information

• Workbook, worksheet and cell level protection

• Optional lock-down for cell level input control with data validation

• Excel Services for displaying and publishing read-only versions of critical spreadsheets and BI reports

• Extensive cell-by-cell, file level, system level and process level audit trails

• Tracking of changes to key inputs, outputs, spreadsheet data, formulas, macros and queries

• Tracking of changes to queries and data connections to external data sources

• Auditing support for spreadsheets and Access databases

• Optional lock-down for cell level input control with data validation

• Interactive link/dependency diagrams

• Auditing of specific input ranges (including named ranges)

• Automated email alerts upon changes to input ranges

• Color scheme tool to highlight (used and unused) input cells

• Spreadsheet validation and testing via automated cell, formula and range diagnostics

• Proactive identification of spreadsheet development and structural problems

• Comprehensive document and records management

• Enterprise class workflow management

7. Conclusion

In order to achieve the controlled use of Excel within an organization you will need to go through a change in technology, process and culture. I hope this blog has provided some readers with a spark for debate.

Appendix A
 
 
 
Appendix B
 
 
 

  

  

  

Wednesday, July 27, 2011

New Features in SQL Server Code-name Denali - CTP3

Microsoft are shipping some new technology in the upcoming version of SQL Server. This new technology is code-named "Apollo" and introduces two new features.
  • Columnstore Indexes
  • Vector-based query execution
These 2 features claim to speed up Data Warehouse query processing time by a factor of between 10 and 100. Follow this link for a full description of the these features. You can download the CTP3 of Denali here.

- Paul Steynberg

Monday, July 25, 2011

SQL Server Project "Crescent" Demo

Follow this link to see a demonstration of SQL Server Project "Crescent". To quote "a new immersive ad-hoc visualization tool that is part of SQL Server Code Name "Denali" Reporting Services.  Project "Crescent" is designed with end users in mind to quickly, easily, and visually explore their data and answer ad-hoc questions in just a few clicks."

It sounds like exciting stuff and certainly plugs some gaps in the Microsoft stack.

- Paul Steynberg

Saturday, June 18, 2011

Paperless Home

The concept of going paperless both at home and at the office has been around for many years. The execution of this idea has not always been easy or successful. I have tried this in the past with flatbed scanners and various methods of storing the output for easy retrieval and indexing. To date my attempts have been somewhat failures. This is until now. A few weeks ago I found the perfect combination of scanner and software that make the entire process a pleasure.

The solution I am using is for the Mac so if you are PC based you may need to do some more research. My requirements were quite simple. The process of scanning should be effortless and the software should be easy to use. I started looking at Evernote but did not like the idea of all my documents being stored in the cloud, especially bank statements, copies of passports, certificates, mortgage statements etc. Evernote is free as a download but in order to get the most out of it you are required to sign up with them on a monthly or annual basis. It is not expensive, $5 per month or $45 per annum and is very handy for ensuring your documents are held in offsite storage for backups. The application is great but the security issue tipped the scales for me.

After a lot of research and testing out various products I finally decided on DevonThink. Similar concepts to Evernote but a once off fee ($49.99) for the product and you store the documents locally in a database. This would mean that I would have to make sure that I am backed up but the Mac does this automatically through Time Machine.

DevonThink is easy to use comes with some great features such as side bar tray to which you can just drag documents from anywhere and an add-in to Safari that allows you to capture the open web page directly.

I then turned by attention to document scanners and opted for the Fujitsu ScanSnap S1300. This great little scanner (280mm long and 100mm wide) allows you to scan up to 10 pages at a time and can do either 1 side or both sides of the document. The other great feature I found is the ability to set up profiles for scanning which then call DevonThink and drop the document into the inbox for categorisation. It also does its own OCR and orientates the pages based on character recognition. Once the document is in DevonThink you can then search within the scanned documents for key words.

Now when the post arrives or we get back from shopping we just put the invoices/documents through the scanner with one touch of the button and then categorise them later in DevonThink when we have some quiet time. In the mean time all those documents can be destroyed and no more filing.

- Paul Steynberg

Thursday, February 25, 2010

iXBRL - LinkedIn Group

Hi Everyone

I recently attended a product sponsored session on iXBRL and HMRC/Companies House submissions for the UK. I will put together an updated paper on new developments this week but would like to just highlight that I have created a group on LinkedIn called "iXBRL in the UK". This group is specifically designed to address issues that companies are having with the submission of accounts in the format and to discuss not only the process being followed to comply but also a forum to discuss any software solutions that start to make themselves available. Just search for it on LinkedIn and I look forward to any contributions.

- Paul Steynberg

Tuesday, February 16, 2010

PowerPivot Storage

For you hard core techies out there, stop reading now. This post is for those people in the world who had no idea that the Office 2007 (and upwards) files are actually just a bunch of files zipped up. I was curious to see how PowerPivot Data was stored with the Excel spreadsheet and was amazed to find out that a number of my associates were unaware of the make up of the new Excel formats.

In order to illustrate the concept take an Excel (or Word) 2007/2010 file and just to be safe make a copy of it. Then change the file extension from XLSX to ZIP and then open in Winzip. Shock and horror we find a whole bunch of files (mostly XML) in various directory structures. This was done in order to comply with Office Open XML Format. Read all about it here.

Back to the original question about how the PowerPivot Data is stored. It is stored in a directory called xl\customData with the extension DATA within the Excel (zip) file. Here is a bit more on how the data interacts with your memory.

- Paul Steynberg

Monday, February 1, 2010

PowerPivot

Microsoft promised BI for the masses and PowerPivot is the first real taste of this promise. Originally code named Gemini this Excel Add-in was developed by the SQL Server Team. In a nutshell it is SQL Server Analysis Services on your PC. In order to get it up and running you have to have Office 2010 (Beta) installed and XP should be on SP3. Here is a full list of requirements.

I am not going into the full technical bits of the product as this has been done many times before and here is a full list of links to get you up to speed:


Take a look at the demo videos as they are quite quick and really show the power of the product. To illustrate I took a normalised data source and imported 3.5 million rows into PowerPivot in under 2 minutes. I then created a Pivot Table from the data with the new look and feel of having slicers across the top and down the left hand side. The results were blistering fast.

This is a first release and the product is full of bugs. As with most Microsoft products wait for V2 before releasing into your production environment. More to come later.

- Paul Steynberg

Monday, October 12, 2009

iXBRL - Don't Panic

Introduction

Companies in the UK will have to submit financials to Her Majesty’s Revenue and Customs (HMRC) via an online process within the next 2 years. I had a session with Ernst & Young about the language to be used called XBRL. I have also attended a workshop at HMRC titled “Company Tax Returns and Online Filing”. This is a summary of my findings and my personal views on the impact.

Overview of XBRL

XBRL stands for eXtensible Business Reporting Language. It is one of a family of "XML" languages which is becoming a standard means of communicating information between businesses and on the internet. (1). iXBRL is a slight derivative of this language which stands for InLine eXtensible Business Reporting Language and allows the data to be read by computers and viewed by humans all from the same file.

Impact

All companies filing tax after 31st March 2011 will have to comply with the requirements of submitting all the data online. This will affect 3 submissions as follows:

  • CT600 – Currently can be submitted via XML and will not change.
  • Accounts – Must be submitted online in iXBRL format.
  • Tax Computations – Must be submitted online in iXBRL format.

What will not change is WHAT you submit, WHO is responsible for submitting it and WHEN it must be submitted, only HOW you submit it will change.

4 Taxonomies will be supported by HMRC but for the first 2 years of implementation only a limited subset of each taxonomy will be required.

Taxonomy

Full Tag Set

Limited Tag Set

UK GAAP

4375

1182

UK IFRS

3400

<1600

UK Common Data

900

900

CT Computational

4548

1350

One is not restricted to the limited tag set but you are only required to at least submit these for the first 2 years. Thereafter the full tag set will be required.

The current versions of the taxonomies can be downloaded from this website.( http://www.xbrl.org/uk/Taxonomies/)

The Companies House and HMRC issued a joint statement on 1st September 2009 stating that they are working together and that both their filing services are being aligned so that a single point of filing can be used.

Software

As with any technology solution we will find many ways of solving this challenge ranging from ERP Supplier based solutions right through to stand-alone file converters. Unfortunately iXBRL is not yet widely used so I have not yet managed to find anything worth taking a look at yet. HMRC is busy reviewing a number of products and certifying them but when pushed they did not want to yield any names or costs just in case they were seen to favour one supplier over another. They did assure us that a number of suppliers would be ready for market in Q1 of 2010.

Here is a list of companies who I believe are working on a solution. Do not take this as gospel, just through my googling.

www.savanet.net

www.corefiling.com

www.semansys.com

www.rivetsoftware.com

www.ibs.nl

www.edgarfilings.com

www.allocationsolutions.com

www.justsystems.com

Approach

Awareness of XBRL and its impact is at an acceptable level within the UK market. Based on the Q&A session with HMRC the first 2 years are really going to be a phased/teething process and companies will not be penalised or prosecuted if they don’t get it quite right. We also do not have any products to work with as they are still being developed.

I would suggest a measured approach to the online filing which starts with Finance obtaining the taxonomies that they intend to use for filing and to start manually mapping the figures across from the financials. This in my opinion is going to be the largest portion of the work required. A deadline of the mapping tables by end Q2 2010 should be achievable.

Get your IT Department to keep tabs on the development of the various alternatives in the software space and then organize demonstrations and evaluations as and when possible. I estimate that we should be able to find something by Q2 2010. Based on all the input I would at this point suggest that we look to purchasing an Office Add-In that will allow you to continue with our current method of producing the financials and thus reduce impact on the business.

Works Cited

1. XBRL International. What is XBRL. [Online] [Cited: October 9, 2009.] http://www.xbrl.org/WhatIsXBRL/.

- Paul Steynberg

Thursday, October 8, 2009

Solvency II - Data Requirements

Overview of Solvency II

Solvency II is a piece of legislation (directive to be exact) adopted by the European Parliament on 22nd April 2009 and is a fundamental review of the capital adequacy regime for European insurers and reinsurers. Planned effective date is October 2012 and it aims to establish a revised set of EU-wide capital requirements, valuation techniques and risk management standards that will replace the current Solvency I requirements. (2).

The full text of this legislation can be found here. Powers have been granted to CEIOPS in order to produce consulting papers and to engage with the industry in order ensure uniformity and clarity.

IT Impact

A section of the proposed framework deals with Standards of Data Quality which is outlined in a consultation paper from CEIOPS, referred to as CP43. (3). Data, according to this paper, is used to refer to all the information which is directly or indirectly needed in order to carry out a valuation of technical provisions, in particular enabling the use of appropriate actuarial and statistical methodologies, in line with the underlying (re)insurance obligations, undertaking’s specificities and with the principle of proportionality. In the context of this Paper, data comprises numerical, census or classification information but not qualitative information. Assumptions are not regarded as data, but it is noted that the use of data is an important basis in the development of actuarial assumptions.

Whereas this Paper is focused on setting out advice in the context of a valuation of technical provisions, it is noted that the issue of data quality is also relevant in other areas of a solvency assessment, for example for the calculation of the Solvency Capital Requirement (SCR) using the standard formula or internal models.

From the materials I have been reading and through a breakfast (1) I attended here are some issues that I would like to flag up. The CP papers expand on the concept that data should be Accurate, Complete and Appropriate. In doing so they highlight the following potential gaps.

Data Governance
  • Most IT policies focus on security. Substantial changes to policies will be required in order to ensure that we focus on data quality.
  • As a group you will need to understand what data quality means within your context and how to measure it. This also requires documenting.
  • Clear responsibility versus ownership of the data is required which also requires documenting. In my view this would be split between the business units and IT.
  • You need to start looking at the technologies/tools required within your Group in order to ensure data quality.
Monitoring
  • The subtleties of the text indicate a focus on both transactional and non transactional data.
  • Both regular and adhoc types of data will require a degree of monitoring and appropriate controls in place to deal with both types.
  • A move to focusing away from the accuracy of the data but rather its relevance and content is required.
Documentation
  • Additional documentation required with regular updating.
  • Decisions to data quality deviations now require specific documenting and approvals.
In order to ensure accuracy, any data deficiencies should be rectified, with each adjustment justified and documented and should not overwrite the raw data. It was also hinted that a more detailed review of the data is required to ensure that it is valid and appropriate. In other words the fact that data made it from your store system through a multitude of layers to a warehouse does not absolve you from the responsibility of ensuring that the original data was correct. This, in reality, leads me to believe that data profiling and review of individual pieces of data with appropriate monitoring tools is required. This applies to both external and internal data.

Addressing the issue of data quality will go to the core of your IT department and if the hype is to be believed will require substantial investment in ensuring that your processes and tools are up to the task. This will have an impact on your IT strategy both from a technology and approach view.

Audit Impact

Article 46 of the Directive Consolidated Text referred to as Insurance and reinsurance (Solvency II)(recast) defines the responsibility of the Internal Auditors (IA) within the scope of Solvency II. (4). This article requires the usual stance of being independent and specifically requires that IA provide an effective internal audit function which includes an evaluation of the adequacy and effectiveness of the internal control system and other elements of the governance system. (This gets a bit circular as the resolution passed under (18a) says that the governance system includes the risk management function, the compliance function, the internal audit function and the actuarial function, which implies that they audit the same system that they are part off).

CP33 from CEIOPS deals with the advice on Governance and makes reference to the functions of Internal Audit. (5).

Although the requirements do not appear to be any different from any IA business as usual functions they are advised to submit a written report on its findings to the administrative or management body at least annually. It makes sense then to assume that IA have a full understanding of the Solvency II requirements.

Conclusions

Solvency II is going to have quite an impact on the way we do business as an Insurance Company and is something that should be driven from the highest possible position in the company as it will require substantial resourcing. This initiative should not be driven from within one department.

Works Cited

1. Addressing the data and technology challenges of Solvency II. PriceWaterhouseCoopers. London : s.n., 2009.
2. Financial Services Authority. Insurance Risk Management: The Path to Solvency II. s.l. : Financial Services Authority, 2008.
3. CEIOPS. Consultation Paper No.43 - Technical Provisions - Article 85 f Standards for Data Quality. Frankfurt : CEIOPS, 2009.
4. European Parliment. Insurance and reinsurance (Solvency II) (recast). Strasbourg : s.n., 2009.
5. CEIOPS. Consultation Paper No.33 - Draft CEIOPS Advice for Level 2 Implementing Measures on Solvency II: System of Governance. Frankfurt : s.n., 2009.