Friday, June 26, 2015
The ‘data’ Jump off: Drilling down in Tableau dashboards using Action filters
Tableau never ceases to impress me. As a newbie still learning the ropes, I am always discovering a new thing that blows my mind.
During our monthly data review meeting, I realized that users were always asking for further details concerning the performance of a certain indicator. When displaying data from regions and comparing the performance of regions, questions are always being asked- show us the sites contributing to that performance or how has the region been performing over the years or the previous months?
Presenting data in Tableau has made a big difference for me and. I have always wondered why PowerPoint does not provide links to data for drilling down to data easily without leaving PowerPoint or the presentation. Thinking of this; I remembered how I can implement this drill down functionality in SSRS with sub reports and sub report parameters. I didn’t know how to do it in Tableau. I turned to Mr. Google.
Googling led me to a number of links including the Tableau knowledge base article here. But the most helpful information was what I found
Thursday, March 26, 2015
SQL SUM values with NULLS, COALESCE and you will see different results
I thought I have been writing many SQL queries to fully understand SQL. I was wrong. While generating data from our SQL Server database today, I wrote a certain query to get me total of new positive HIV clients at PMTCT per county. This is the query that I wrote.
SELECT County,[Reporting Month], sum([HV0201]) as 'Women tested at ANC', sum([HV0204])as 'Total women tested', sum([HV0205]) as 'Known Positive's at Entry', sum([HV0206]) as 'HIV+ at ANC', sum([HV0207]) as 'HIV+ at L&D', sum([HV0208]) as 'HIV+ at Postnatal', (sum([HV0206]))+(sum([HV0207]))+(sum([HV0208])) as 'Total New Positives', sum([HV0209]) as 'Total Positive', sum([HV0210]) as 'Total with known status', sum([HV0213])as 'NVP only Prophylaxis', sum([HV0214])as 'AZT+SdNVP Prophylaxis', sum([HV0215])as 'Interrupted HAART', sum([HV0216]) as 'HAART', sum([HV0217]) as 'Total Maternal Prophylaxis', sum([HV0220]) as 'Assessed for Eligibility', sum([HV0224]) as 'Infants tested', sum([HV0229]) as 'Infants HIV positive', sum([HV0244]) as 'Total Infants issued with Infant Prophylaxis' from vw_PMTCTData_RetainedSites group by County,[Reporting Month]I was querying a View that I had created in SQL Server. I obtained the results and proceeded to paste them (with headers) to an Excel spread sheet to share with the relevant person. This is how the sheet looked like. My summation returns NULL when any one of the three Indicators - 'HIV+ at ANC','HIV+ at L&D' and 'HIV+ at Postnatal' is NULL. This is not what I wanted. You get to know the Total women who are positive by adding the Women who knew their status as HIV positive when they got pregnant with the new women who have discovered that they are HIV positive after being tested at either the ANC clinic(when still pregnant), during Labour and delivery or at testing positive at the Postnatal care clinic. These are called the 'New positives' So from my results, the results for Congo and Yukon counties are wrong since for Congo, the value for HIV + at LD was NULL while for Yukon, the value for HIV+ at Postnatal was NULL. To sort this out, you use the COALESCE function. The new T-SQL statement will look like this
SELECT County,[Reporting Month], sum([HV0201]) as 'Women tested at ANC', sum([HV0204])as 'Total women tested', sum([HV0205]) as 'Known Positive's at Entry', sum([HV0206]) as 'HIV+ at ANC', sum([HV0207]) as 'HIV+ at L&D', sum([HV0208]) as 'HIV+ at Postnatal', (sum([HV0206]))+(sum([HV0207]))+(sum([HV0208])) as 'Total New HIV+', COALESCE(sum([HV0206]),0)+COALESCE(sum([HV0207]),0)+COALESCE(sum([HV0208]),0) 'Total New HIV+_COALESCE', sum([HV0209]) as 'Total Positive', sum([HV0210]) as 'Total with known status', sum([HV0213])as 'NVP only Prophylaxis', sum([HV0214])as 'AZT+SdNVP Prophylaxis', sum([HV0215])as 'Interrupted HAART', sum([HV0216]) as 'HAART', sum([HV0217]) as 'Total Maternal Prophylaxis', sum([HV0220]) as 'Assessed for Eligibility', sum([HV0224]) as 'Infants tested', sum([HV0229]) as 'Infants HIV+', sum([HV0244]) as 'Total Infants issued with Infant Prophylaxis' from vw_PMTCTData_RetainedSites group by County,[Reporting Month]I pasted my new result set into Excel with the results of the two summations side by side. The summation using the ordinary SUM ( shown in Yellow)shows the Total New HIV positive for Congo and Yukon regions as NULL. The summation using the COALESCE function (shown in green color) shows the Total New HIV positive for the two regions as 15 and 27 respectively. This shows that using bare SUM in T-SQL can return you what you don't expect, least to say the wrong answer when one of the variables in the SUM has nulls in it. To avoid this, you should always use the COALSCE command in SQL Summations. This more so applies to databases where strict controls were not effected and NULL values end up getting stored. I have always said that there is a different between a NULL and a 0. That itself is a post for another day
Subscribe to:
Posts (Atom)