big Data-Enabled Analysis of data: Network DEA

Data envelopment analysis (DEA) should be viewed as a method (or tool) for data-oriented analytics. DEA is a data-driven tool for performance evaluation and benchmarking. While computational algorithms have been developed to deal with large volumes of data (decision making units, inputs, and outputs) under conventional DEA, DEA should also extract the valuable information hidden in big data that are represented by network structures. These network structures encompass a broader range of metrics that cannot be modelled by conventional DEA. It is shown that network DEA differs from standard DEA, although it bears the name of DEA and has some similarity to conventional DEA. Network DEA is big Data Enabled Analysis (big DEA) of data when multiple (performance) metrics or attributes are linked through network structures. These network structures are too large or complex to be dealt with by conventional DEA. Unlike conventional DEA models, which are solved via linear programming, general network DEA corresponds to nonconvex optimization problems. This presents opportunities for developing techniques for solving non-linear network DEA models.

Read the full paper: Zhu, Joe, DEA under big data: data enabled analytics and network data envelopment analysis, Annals of Operations Research, (in press)

More Information

What is Data Envelopment Analysis (DEA)? Data-oriented Analytics and more...

Data-oriented Analytics

Data Envelopment Analysis is a form of "balanced benchmarking"

Read our Sloan Management Review article (Vol. 54, No. 4, Summer 2013, 37-42)

Sherman & Zhu, Analyzing performance in service organizations

translated and published in Harvard Deusto Business Review, No. 228 (2013), 6-15.

According to Cook, Tone and Zhu (2014), Data envelopment analysis: Prior to choosing a model, OMEGA, Vol. 44, 1-4.


As its name suggests, Data Envelopment Analysis (DEA) is a methodology for analyzing data. Specifically, DEA is used to identify best practices when multiple performance metrics or measures are present for organizations. As Cook, Tone and Zhu (2014) note, although DEA has a strong link to production theory in economics, the tool is also used for benchmarking in operations management, where a set of measures is selected to benchmark the performance of manufacturing and service operations. In the context of benchmarking, the efficient decision making units (DMUs), as defined by DEA, may not necessarily form a "production frontier", but rather lead to a "best-practice frontier". For example, if one benchmarks the performance of computers, it is natural to consider different features (screen size and resolution, memory size, processor speed, hard disk size, and others). One would then have to classify these features into “inputs” and “outputs” in order to apply a proper DEA analysis. However, these features may not actually represent inputs and outputs at all, in the standard notion of production. In fact, if one examines the benchmarking literature, other terms, such as “indicators”, “outcomes”, and “metrics”, are used. The issue now becomes one of how to classify these performance measures into inputs and outputs for use in DEA.

In general, DEA minimizes “inputs” and maximizes “outputs”; in other words, smaller levels of the former and larger levels of the latter represent better performance or efficiency. This can then be a rule for classifying factors under these two headings. There are, however, exceptions to this; for example, pollutants from a production process are outputs, yet higher levels of these indicate worse performance. There are DEA models that deal with such undesirable outputs. In certain circumstances, a factor can play a dual role of input and output simultaneously. For example, when evaluating the efficiencies of a set of universities, if one considers the numbers of Ph.D. students trained as outcomes from the education process, then this factor can rightly be viewed as an output. At the same time, however, Ph.D. students assist in carrying out research, and can therefore be viewed as a resource, hence an input to the process. In such cases, the user must clearly define the purpose of benchmarking so that such performance measures can be classified as inputs or outputs. In some situations, the DMUs may have internal structures, e.g., a two-stage process. For example, banks generate deposits as an output in the first stage, and then the deposits are used as an input to generate profit in the second stage. In this case, “deposits” is treated as both an output (from the first stage) and an input (to the second stage).
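The two-stage bank example can be written out as a hedged sketch. The notation is an assumption introduced here for illustration and is not taken from the text: x_{io} are the (unspecified) first-stage inputs of bank o, z_{do} are the intermediate measures such as deposits, y_{ro} are the final outputs such as profit, and v_i, w_d, u_r are nonnegative weights. If, as one possible modelling choice, the same weights w_d are attached to the intermediate measures in both stages, the two stage ratios multiply to an overall ratio:

```latex
\theta_o^{(1)} = \frac{\sum_{d} w_d\, z_{do}}{\sum_{i} v_i\, x_{io}},
\qquad
\theta_o^{(2)} = \frac{\sum_{r} u_r\, y_{ro}}{\sum_{d} w_d\, z_{do}},
\qquad
\theta_o^{(1)} \cdot \theta_o^{(2)} = \frac{\sum_{r} u_r\, y_{ro}}{\sum_{i} v_i\, x_{io}} .
```

Under this assumption the intermediate measure cancels out of the overall ratio, which suggests why a single-stage model that uses only x and y cannot show how each stage performs on its own.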

DEA can be viewed as a multiple-criteria evaluation methodology where DMUs are alternatives, and DEA inputs and outputs are two sets of performance criteria where one set (inputs) is to be minimized and the other (outputs) is to be maximized.

Under general benchmarking, the DEA score may no longer be referred to as “production efficiency”. In this case, we may wish to refer to the DEA score as a form of “overall performance” of an organization. Such “overall performance” can appear in the form of a composite measure that aggregates individual indicators (inputs and outputs) via a DEA model. For example, composite measures (DEA scores) of quality indicators allow senior leaders to better benchmark their organization’s performance against other high-performing organizations.

DEA is not a form of regression model; rather, it is a frontier-based optimization technique built on linear programming. It is meaningless to apply a sample size requirement to DEA, which should be viewed as a benchmarking tool focusing on individual performance. If there are “too many” inputs and outputs relative to the number of DMUs, a significant portion of DMUs is likely to be deemed efficient. If the goal is to obtain fewer efficient DMUs, then one can use weight restrictions or other DEA approaches to reduce their number.
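To make the linear programming view concrete, the following is a minimal sketch of one standard conventional model, the input-oriented, constant-returns-to-scale envelopment form. For each DMU o it minimizes a contraction factor θ such that some nonnegative combination λ of all DMUs uses at most θ times the inputs of DMU o while producing at least its outputs. The function name `ccr_input_efficiency`, the array layout, and the toy data are assumptions made here for illustration; they do not come from the papers cited on this page.

```python
import numpy as np
from scipy.optimize import linprog


def ccr_input_efficiency(X, Y):
    """Input-oriented CRS DEA scores (hypothetical helper, for illustration).

    X: array of shape (n_dmus, n_inputs); Y: array of shape (n_dmus, n_outputs).
    Returns one score per DMU; a score of 1 means no radial input reduction
    is possible relative to the other DMUs.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, m = X.shape
    s = Y.shape[1]
    # Decision variables: [theta, lambda_1, ..., lambda_n]; minimize theta.
    c = np.concatenate(([1.0], np.zeros(n)))
    scores = np.empty(n)
    for o in range(n):
        # Inputs:  sum_j lambda_j * x_ij - theta * x_io <= 0
        A_in = np.hstack((-X[o][:, None], X.T))
        # Outputs: -sum_j lambda_j * y_rj <= -y_ro
        A_out = np.hstack((np.zeros((s, 1)), -Y.T))
        A_ub = np.vstack((A_in, A_out))
        b_ub = np.concatenate((np.zeros(m), -Y[o]))
        res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                      bounds=[(0, None)] * (n + 1), method="highs")
        if not res.success:
            raise RuntimeError(f"LP for DMU {o} failed: {res.message}")
        scores[o] = res.fun
    return scores


# Toy data (made up): three DMUs, one input and one output each.
X = [[2.0], [4.0], [4.0]]
Y = [[2.0], [2.0], [3.0]]
scores = ccr_input_efficiency(X, Y)  # approximately [1.0, 0.5, 0.75]
```

One small linear program is solved per DMU, which reflects the focus on individual performance. Adding further input or output columns to `X` and `Y` can only keep or raise each score in this model, which is consistent with the remark that many DMUs tend to be deemed efficient when there are many measures relative to the number of DMUs.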

According to W.W. Cooper, L.M. Seiford and J. Zhu (2011), "Data Envelopment Analysis: Models and Interpretations", Chapter 1, 1-39, in W.W. Cooper, L.M. Seiford and J. Zhu, eds, Handbook on Data Envelopment Analysis, 2nd edition, Springer, New York, 2011.

Data Envelopment Analysis (DEA) is a “data oriented” approach for evaluating the performance of a set of peer entities called Decision Making Units (DMUs) which convert multiple inputs into multiple outputs. The definition of a DMU is generic and flexible. Recent years have seen a great variety of applications of DEA for use in evaluating the performances of many different kinds of entities engaged in many different activities in many different contexts in many different countries. These DEA applications have used DMUs of various forms to evaluate the performance of entities, such as hospitals, US Air Force wings, universities, cities, courts, business firms, and others, including the performance of countries, regions, etc. Because it requires very few assumptions, DEA has also opened up possibilities for use in cases which have been resistant to other approaches because of the complex (often unknown) nature of the relations between the multiple inputs and multiple outputs involved in DMUs.

As pointed out in Cooper, Seiford and Tone (2007), DEA has also been used to supply new insights into activities (and entities) that have previously been evaluated by other methods. For instance, studies of benchmarking practices with DEA have identified numerous sources of inefficiency in some of the most profitable firms – firms that had served as benchmarks by reference to this (profitability) criterion – and DEA has provided a vehicle for identifying better benchmarks in many applied studies. Because of these possibilities, DEA studies of the efficiency of different legal organizational forms, such as "stock" vs. "mutual" insurance companies, have shown that previous studies have fallen short in their attempts to evaluate the potentials of these different forms of organizations. Similarly, a use of DEA has suggested reconsideration of previous studies of the efficiency with which pre- and post-merger activities have been conducted in banks that were studied by DEA.

Since DEA was first introduced in 1978 in its present form, researchers in a number of fields have quickly recognized that it is an excellent and easily used methodology for modeling operational processes for performance evaluations. This has been accompanied by other developments. For instance, Zhu (2003, 2009) provides a number of DEA spreadsheet models that can be used in performance evaluation and benchmarking. DEA’s empirical orientation and the absence of a need for the numerous a priori assumptions that accompany other approaches (such as standard forms of statistical regression analysis) have resulted in its use in a number of studies involving efficient frontier estimation in the governmental and nonprofit sector, in the regulated sector, and in the private sector. See, for instance, the use of DEA to guide removal of the Diet and other government agencies from Tokyo as described in Takamura and Tone (2003).

In their originating article, Charnes, Cooper, and Rhodes (1978) described DEA as a ‘mathematical programming model applied to observational data [that] provides a new way of obtaining empirical estimates of relations - such as the production functions and/or efficient production possibility surfaces – that are cornerstones of modern economics’.
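As a hedged illustration, the ratio form usually associated with that article can be written as follows. The symbols are assumptions introduced here and do not appear in the text above: DMU j (j = 1, ..., n) uses inputs x_{ij} (i = 1, ..., m) to produce outputs y_{rj} (r = 1, ..., s), DMU o is the unit under evaluation, and u_r and v_i are the output and input weights chosen by the model.

```latex
\max_{u,\,v}\; h_o = \frac{\sum_{r=1}^{s} u_r\, y_{ro}}{\sum_{i=1}^{m} v_i\, x_{io}}
\quad \text{subject to} \quad
\frac{\sum_{r=1}^{s} u_r\, y_{rj}}{\sum_{i=1}^{m} v_i\, x_{ij}} \le 1 \;\; (j = 1, \dots, n),
\qquad u_r \ge 0,\; v_i \ge 0 .
```

Normalizing the denominator to one turns this fractional program into an equivalent linear program:

```latex
\max_{\mu,\,\nu}\; \sum_{r=1}^{s} \mu_r\, y_{ro}
\quad \text{subject to} \quad
\sum_{i=1}^{m} \nu_i\, x_{io} = 1, \qquad
\sum_{r=1}^{s} \mu_r\, y_{rj} - \sum_{i=1}^{m} \nu_i\, x_{ij} \le 0 \;\; (j = 1, \dots, n),
\qquad \mu_r \ge 0,\; \nu_i \ge 0 .
```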

Formally, DEA is a methodology directed to frontiers rather than central tendencies. Instead of trying to fit a regression plane through the center of the data as in statistical regressions, for example, one ‘floats’ a piecewise linear surface to rest on top of the observations. Because of this perspective, DEA proves particularly adept at uncovering relationships that would remain hidden from other methodologies. For instance, consider what one wants to mean by “efficiency”, or more generally, what one wants to mean by saying that one DMU is more efficient than another DMU. DEA accomplishes this in a straightforward manner, without requiring the explicitly formulated assumptions and model variations that accompany, for example, linear and nonlinear regression models.

1. Charnes, A., W.W. Cooper, and E. Rhodes, 1978, Measuring the efficiency of decision making units, European Journal of Operational Research 2, 429-444.
2. Cooper, W.W., L.M. Seiford, and K. Tone, 2007, Data Envelopment Analysis: A Comprehensive Text with Models, Applications, References and DEA-Solver Software, 2nd Edition, Kluwer Academic Publishers, Boston.
3. Takamura, T. and K. Tone, 2003, A comparative site evaluation study for relocating Japanese government agencies out of Tokyo, Socio-Economic Planning Sciences 37, 85-102.
4. Zhu, J. 2003, Quantitative Models for Performance Evaluation and Benchmarking: Data Envelopment Analysis with Spreadsheets and DEA Excel Solver, Kluwer Academic Publishers, Boston.
5. Zhu, J. 2009, Quantitative Models for Performance Evaluation and Benchmarking: Data Envelopment Analysis with Spreadsheets, 2nd Edition, Springer, New York.