Thursday, March 31, 2016



Blog 4, Presentation and Visualization Methods

In a world with an abundance of data, there is an ever-increasing need to represent that data in an easy-to-understand, actionable format. Businesses can use many different presentation methods to help employees and management make informed decisions and, in turn, better target and serve customers. In this blog I will look at dashboards from three different vignettes: sales, accounting, and telecommunications.

Sales
The example dashboard for this section provides several useful visualizations, including planned vs actual sales, sales by region, and top products. Though it is already very useful, it could be made more useful by adding data such as:
  • Top customers by total revenue
  • Top customers by product
  • Regional planned vs actual sales
  • A top salesperson section if the company wants to make a competition
    • People are naturally competitive, so adding some sort of competition visualization may drive salespeople to sell harder
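
As a rough sketch of how two of the suggested additions (top customers by total revenue and regional planned vs actual sales) might be computed behind a dashboard, the Python below aggregates a handful of sales records. The record layout, customer names, and amounts are all made up for illustration and do not come from the example dashboard.

```python
from collections import defaultdict

# Hypothetical sales records: (customer, region, planned, actual).
# All names and amounts are invented for illustration.
SALES = [
    ("Acme", "East", 1000.0, 1200.0),
    ("Birch", "East", 800.0, 600.0),
    ("Cedar", "West", 500.0, 700.0),
    ("Acme", "West", 400.0, 300.0),
]


def top_customers_by_revenue(records, n=3):
    """Return the n customers with the highest total actual sales."""
    totals = defaultdict(float)
    for customer, _region, _planned, actual in records:
        totals[customer] += actual
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


def planned_vs_actual_by_region(records):
    """Sum planned and actual sales for each region."""
    summary = defaultdict(lambda: {"planned": 0.0, "actual": 0.0})
    for _customer, region, planned, actual in records:
        summary[region]["planned"] += planned
        summary[region]["actual"] += actual
    return dict(summary)
```

The same grouping idea would work for a top salesperson section by adding a salesperson field to each record and summing on it instead of on the customer.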

See below for an example dashboard from http://blog.jinfonet.com/wp-content/uploads/2014/02/Sales-Dashboard-Example-2.jpg. The visualization tips mentioned above apply to the dashboard below.


Accounting
A well-functioning accounting department is crucial to the financial wellbeing of an organization. By creating useful dashboards for accounting, businesses can make better use of their employees' skills by presenting them with meaningful data instead of making them mine it out of reports or a database. The dashboard below shows year-to-date revenue, account balances, items on order, and revenue vs. payroll expense, to name a few. In addition to the visualizations in the example dashboard, the following may be worth considering:
  • Current assets
  • Current liabilities
  • Operating expenses
    • Can be broken down by specific time periods (day, week, month, etc.)

  • Customers with the highest balances owed
  • Departments with the highest outstanding balances

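
To illustrate the idea of breaking operating expenses down by time period, here is a minimal Python sketch that totals expenses by month. The expense entries, categories, and amounts are hypothetical and are not taken from the example dashboard.

```python
from collections import defaultdict
from datetime import date

# Hypothetical expense entries: (date, category, amount).
EXPENSES = [
    (date(2016, 1, 15), "rent", 2000.0),
    (date(2016, 1, 28), "utilities", 150.0),
    (date(2016, 2, 15), "rent", 2000.0),
    (date(2016, 3, 1), "supplies", 75.5),
]


def expenses_by_month(entries):
    """Total the expense amounts for each calendar month (keyed as YYYY-MM)."""
    totals = defaultdict(float)
    for day, _category, amount in entries:
        totals[day.strftime("%Y-%m")] += amount
    return dict(totals)
```

Changing the grouping key (for example to the full date, or to an ISO week number) would give the daily or weekly breakdown instead.
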
The dashboard below is taken from https://i.ytimg.com/vi/6mwWwzyX_kg/maxresdefault.jpg.

Telecommunications
Last but not least, a telecommunications company would benefit greatly from a dashboard for its representatives. There are significant costs associated with running a telecommunications company, and by visualizing its greatest profit and cost areas, an organization can work to maximize profits and reduce costs. Starting from high-level cost and revenue data, the company can pass relevant information down to the sales department, but this dashboard takes more of an operational perspective. Areas of improvement could be:
  • Top infrastructure devices by cost
  • An electricity usage visualization to help them pinpoint the most inefficient devices in their network
  • Average customer support by inquiry type
  • Highest profit margins by customer type

The dashboard below was taken from http://dashflows.com/ww2/wp-content/uploads/2014/11/CTT-Wireless.png.


Conclusion
Visualizing the data for an organizational department helps the employees of that department make more informed decisions about activities in the business. In turn, that helps them serve customers better and give them the best experience possible. Good dashboards and visualizations allow companies to operate more smoothly, and with the amount of data available in today’s world, it only makes sense to make sense of that data.

***This blog was submitted for a grade and is not to be taken as a professional recommendation***

Thursday, March 3, 2016

Blog 3: Structured vs Unstructured Data

Data Overview
Structured data is data that is represented in a database, XML, CSV, etc. It is easy for machines to process, and computations can be run on it to enrich it or make it more meaningful. When data is both structured and consistently formatted, it can be easily loaded into databases or data warehouses for queries and processing.(1)
Unstructured data, on the other hand, is data like this blog. It is usually stored in a human-readable format that is easy for us to understand but very difficult, or at times impossible, for a computer to understand. It can be analyzed by computers via parsers and similar tools, but it is much easier for a human to put data into a structured format than it is for a computer to take unstructured data and turn it into something structured.

(2)
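
To make the contrast between structured and unstructured data a bit more concrete, here is a small Python sketch. It reads the same made-up fact once from a CSV string and once from a free-text sentence. The field names, the sentence, and the dollar-amount pattern are all assumptions chosen for illustration, and the pattern only handles this one simple case.

```python
import csv
import io
import re

# Structured: the facts as CSV. The field names are hypothetical.
STRUCTURED = "customer,amount,date\nJim,42.50,2016-03-03\n"

# Unstructured: roughly the same facts, written the way a person would say them.
UNSTRUCTURED = "Jim stopped by on March 3rd and paid us $42.50 for the order."


def read_structured(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_amount(text):
    """Pull the first dollar amount out of free text, or None if there is none."""
    match = re.search(r"\$(\d+(?:\.\d{2})?)", text)
    return float(match.group(1)) if match else None
```

The structured version needs no knowledge of what the text means, only of its layout. The unstructured version needs a hand-written pattern for every fact we want, and that pattern breaks as soon as someone writes "forty-two fifty" instead.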






Data Types
Next, I will discuss three different types of data that are frequently seen in business. Communication data is for the most part unstructured, but its metadata can be structured. Transactional data is mostly structured, and log data can be structured as well. The graphic below highlights that the three big sources of big data are transactions, emails (communications), and log data. These three data types are discussed in greater detail below.

(3)

Communication (email) data is largely unstructured. It can be emails, text messages, phone calls, or video calls. Even chatting with your bro Jim in the hall about the game last night is communication. The actual communication itself is unstructured and difficult to process, but records of communication can be aggregated in a structured way. How frequently and for how long people communicate in business can readily be loaded into a structured format. In many large organizations, employees sign waivers allowing the organization to track communication data. It can then be loaded into a data warehouse, where trends in communications can be analyzed for investigations or business insight.
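
As a minimal sketch of how communication records (not the conversations themselves) can be aggregated in a structured way, the Python below counts how often and for how long each pair of people talked. The call records and names are hypothetical.

```python
from collections import defaultdict

# Hypothetical call records: (caller, callee, duration in seconds).
CALLS = [
    ("alice", "bob", 120),
    ("bob", "alice", 300),
    ("alice", "carol", 60),
    ("alice", "bob", 180),
]


def summarize_pairs(records):
    """Count contacts and total duration for each unordered pair of people."""
    summary = defaultdict(lambda: {"count": 0, "total_seconds": 0})
    for caller, callee, seconds in records:
        # Sort the two names so that alice->bob and bob->alice share one key.
        pair = tuple(sorted((caller, callee)))
        summary[pair]["count"] += 1
        summary[pair]["total_seconds"] += seconds
    return dict(summary)
```

Nothing here looks at what was said; it only uses the metadata, which is why this kind of summary fits easily into a warehouse table.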


Transactional data is by nature the most structured of the three data types being discussed. Transactions are almost always tracked in a database and hold customer, supplier, product, and sale data. All of this data can be easily loaded from a transactional database into a data warehouse for processing and for analyzing metadata and trends. In a transactional database, data is more than likely in 3rd or 4th normal form. The goal in a data warehouse is fast processing of large data sets, and the joins that normalized tables require often slow this down, so it is usually better to flatten (denormalize) the data when loading it into a data warehouse.
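
The flattening step described above can be sketched with Python's built-in sqlite3 module. The table names, columns, and rows below are invented for illustration; a real warehouse load would use its own schema and tooling.

```python
import sqlite3


def build_flat_sales():
    """Join small normalized tables into one flat table and return its rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Normalized (transactional) tables. All names and values are hypothetical.
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL);
        CREATE TABLE sales (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            product_id INTEGER,
            quantity INTEGER
        );
        INSERT INTO customers VALUES (1, 'Acme'), (2, 'Birch');
        INSERT INTO products VALUES (1, 'Widget', 2.5), (2, 'Gadget', 10.0);
        INSERT INTO sales VALUES (1, 1, 1, 4), (2, 2, 2, 1), (3, 1, 2, 2);

        -- Flattened copy: the joins are paid once at load time, not on every query.
        CREATE TABLE sales_flat AS
            SELECT s.id AS sale_id,
                   c.name AS customer,
                   p.name AS product,
                   s.quantity AS quantity,
                   s.quantity * p.price AS revenue
            FROM sales s
            JOIN customers c ON c.id = s.customer_id
            JOIN products p ON p.id = s.product_id;
        """
    )
    rows = conn.execute(
        "SELECT sale_id, customer, product, quantity, revenue "
        "FROM sales_flat ORDER BY sale_id"
    ).fetchall()
    conn.close()
    return rows
```

The trade-off is that the flat table repeats customer and product names on every row, which costs storage but lets reporting queries skip the joins.
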

Log data takes on many forms, and big businesses generate tremendous amounts of logs every day. Logs can be collected from operating systems, applications, servers, databases, networking devices, and many other sources. Although this data often follows a uniform layout within a given operating system, it is often unstructured. Many vendors, such as Splunk, make log aggregators that parse logs into structured formats. From there, log data can be analyzed on many different levels.
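
To show what parsing a log into a structured format can look like at the smallest scale, here is a Python sketch that turns one log line into named fields. The log layout (date, time, level, host, message) is an assumed example format, and this is not meant to represent how Splunk or any other product works internally.

```python
import re

# Assumed log layout: "YYYY-MM-DD HH:MM:SS LEVEL host message..."
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<host>\S+) (?P<message>.*)$"
)


def parse_log_line(line):
    """Return the fields of a log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None
```

Lines that do not fit the pattern come back as None, which is a reminder that real log sources usually need several patterns and a place to put the leftovers.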


Data Warehouse Limitations

(4)

Next I will discuss the difficulties or limitations of data warehouses when dealing with different types of data. As you can see in the above image, data takes many different forms and is collected from many different locations in an organization. One limitation is really an issue with the data itself that makes the data warehousing process very difficult: non-uniform data. When data comes from multiple sources, it is unlikely that all of it is uniform, and this can cause performance issues in the ETL process. Another limitation is the sheer amount of data. Large organizations in particular can accumulate terabytes of data every day and need a way to archive data to make sure that their data warehouses perform adequately. Both the quality and the amount of data can limit the effectiveness of data warehouses and their ability to run analyses in a reasonable amount of time.
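
As a small illustration of the non-uniform data problem in ETL, the Python sketch below normalizes records that arrive from different sources with different date formats and field names. The list of date formats and the "amount" / "total" field names are assumptions made up for this example.

```python
from datetime import datetime

# Assumed date formats used by three hypothetical source systems.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def normalize_date(raw):
    """Convert a date string in any known format to YYYY-MM-DD, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # Try the next known format.
    return None


def normalize_record(record):
    """Map a source record onto one common layout with 'date' and 'amount'."""
    # One hypothetical source calls the field "amount", another calls it "total".
    amount = record.get("amount", record.get("total"))
    return {
        "date": normalize_date(record.get("date", "")),
        "amount": float(amount) if amount is not None else None,
    }
```

Every record has to pass through checks like these before it is loaded, which is part of why non-uniform sources make the ETL step slower and more fragile.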

Where Data Warehouses are Headed
In my opinion, data warehouses will be leveraged to make macro predictions based on much more micro data. As our ability to process data more quickly and efficiently with physically smaller devices advances, we will be able to aggregate much larger data sets and analyze relationships between data in ways that seemed unimaginable years ago. Rather than simulate the economic outcome of events, we will be able to store micro data on such a large scale that we can accurately model macro-level economic change. With this, we will be able to predict more accurately how changes in GDP and small production changes will have a local, state, and country impact. Being able to compute more efficiently and store data in smaller physical formats allows us to analyze data on a very large scale.


(4)    http://hadooptutorial.info/wp-content/uploads/2014/12/Data-ware-house-environment.png

***This blog was submitted for a grade and is not to be taken as a professional recommendation***