The 18th ACM Symposium on Document Engineering
August 28, 2018 to August 31, 2018
Halifax, Nova Scotia, Canada
Workshops and Tutorials: Aug. 28, 9:00am - 6:30pm
Reception: Aug. 28 (evening)
Main program start: Aug. 29, 8:30am
Keynotes: Aug. 29 and Aug. 30 at 9:00am
Banquet: Aug. 30 (evening)
Main program end: Aug. 31, 12:30pm
Schedule of Workshop/Tutorial Day (August 28)
Tutorial: T1: Automatic Text Summarization and Classification, Rafael Dueire Lins (Universidade Federal de Pernambuco & Universidade Federal Rural de Pernambuco, Brazil), Steven John Simske (Colorado State University, USA).
Workshop: W1: DChanges 2018 - Document Changes: Modeling, Detection, Storage and Visualization. Gioele Barabucci (University of Cologne, Germany), Uwe Borghoff (Bundeswehr University Munich, Germany), Angelo Di Iorio (University of Bologna, Italy), Ethan Munson (University of Wisconsin-Milwaukee, USA), Sonja Schimmler (Fraunhofer FOKUS Berlin, Germany).
Tutorial: T2: Visual Text Analytics: Techniques for Linguistic Information Visualization, Mennatallah (Menna) El-Assady (Universität Konstanz, Germany & University of Ontario Institute of Technology, Canada).
Can Deep Learning Compensate for a Shallow Evaluation?
Gerald Penn
Department of Computer Science, University of Toronto
The last ten years have witnessed an enormous increase in the application of "deep learning" methods to both spoken and textual natural language processing. Have they helped? With respect to some well-defined tasks such as language modelling and acoustic modelling, the answer is most certainly affirmative, but those are mere components of the real applications that are driving the increasing interest in our field. In many of these real applications, the answer is, surprisingly, that we cannot be certain, because of the shambolic evaluation standards that have been commonplace (since long before the deep learning renaissance) in the communities that specialized in advancing them. This talk will consider three examples in detail: sentiment analysis, text-to-speech synthesis, and summarization. We will discuss empirical grounding, the use of inferential statistics alongside the usual, more engineering-oriented pattern recognition techniques, and the use of machine learning in the process of conducting an evaluation itself.
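As a minimal illustration of the kind of inferential statistics the abstract advocates (this sketch is not from the talk; the function name and parameters are illustrative), a paired bootstrap test estimates how often one system's observed advantage over another survives resampling of the test set:

```python
# Illustrative sketch: a paired bootstrap significance test for comparing
# two systems scored on the same test items. Names are hypothetical.
import random

def paired_bootstrap(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Return the fraction of resampled test sets on which system A
    outperforms system B (per-item scores, higher is better)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    wins = 0
    for _ in range(n_resamples):
        # Resample item-level differences with replacement.
        sample = [rng.choice(diffs) for _ in diffs]
        if sum(sample) > 0:
            wins += 1
    return wins / n_resamples
```

A value near 1.0 suggests the observed advantage is unlikely to be an artifact of which test items happened to be sampled; a value near 0.5 suggests the two systems are effectively indistinguishable on this test set.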
Gerald Penn is a Professor of Computer Science at the University of Toronto. His research interests are spoken language processing and mathematical linguistics. He is a senior member of IEEE, a senior member of AAAI, a past president of the ACL Special Interest Group on the Mathematics of Language, and has led numerous research projects funded by, among others, Avaya, Bell Canada, CAE, the Connaught Fund, Microsoft, NSERC, the NSF, the German Ministry for Training and Research, SMART Technologies, SSHRC, the U.S. Army and the U.S. Office of the Director of National Intelligence.
The Quest for Total Recall
Gordon Cormack & Maura Grossman
University of Waterloo
The objective of high-recall information retrieval (HRIR) is to identify substantially all information relevant to an information need, where the consequences of missing or untimely results may have serious legal, policy, health, social, safety, defence, or financial implications. To find acceptance in practice, HRIR technologies must be more effective (and must be shown to be more effective) than current practice, according to the legal, statutory, regulatory, ethical, or professional standards governing the application domain. Such domains include, but are not limited to, electronic discovery in legal proceedings; distinguishing between public and non-public records in the curation of government archives; systematic review for meta-analysis in evidence-based medicine; separating irregularities and intentional misstatements from unintentional errors in accounting restatements; performing "due diligence" in connection with pending mergers, acquisitions, and financing transactions; and surveillance and compliance activities involving massive datasets. HRIR differs from ad hoc information retrieval, where the objective is to identify the best, rather than all, relevant information, and from classification or categorization, where the objective is to separate relevant from non-relevant information based on previously labeled training examples. HRIR is further differentiated from established information retrieval applications by the need to quantify "substantially all relevant information," an objective for which existing evaluation strategies and measures, such as precision and recall, are not particularly well suited.
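The precision and recall measures the abstract refers to can be made concrete with a small sketch (illustrative code, not part of the talk; the function name and data are hypothetical). It also shows why neither measure by itself certifies that "substantially all" relevant information has been found:

```python
# Illustrative sketch: set-based precision and recall for a retrieved set.
def precision_recall(retrieved, relevant):
    """Precision = fraction of retrieved items that are relevant.
    Recall = fraction of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)  # true positives
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

# A system can score perfect precision while missing half of what matters:
p, r = precision_recall(retrieved={1, 2, 3}, relevant={1, 2, 3, 4, 5, 6})
# p = 1.0 (everything retrieved is relevant), r = 0.5 (half is missed)
```

The deeper difficulty the abstract points at is that in an HRIR setting the full relevant set is unknown, so recall itself must be estimated by sampling, which is where conventional evaluation strategies fall short.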
Gordon V. Cormack is a Professor in the David R. Cheriton School of Computer Science at the University of Waterloo, in Ontario, Canada. Prof. Cormack’s research and consulting activities focus on high-stakes information retrieval, including technology-assisted review, records management, quality assurance, and evaluation methodology. Prof. Cormack has published more than 100 scientific articles, including Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, which was published in the Richmond Journal of Law and Technology in 2011, and has been widely cited in case law, both in the U.S. and abroad. Prof. Cormack is co-author of Information Retrieval: Implementing and Evaluating Search Engines (MIT Press, 2010, 2016), which received Honorable Mention, 2010 American Publishers Award for Professional and Scholarly Excellence (PROSE) in the Computing and Information Sciences category. Prof. Cormack is a program committee member of the Text REtrieval Conference (TREC) at the National Institute of Standards and Technology (NIST) and has served as coordinator of its Spam Track (2005-2007), Legal Track (2010-2011), and Total Recall Track (2015-). Prof. Cormack is the co-inventor of the Continuous Active Learning (CAL) protocol, which he has used to conduct technology-assisted review for dozens of Fortune 500 clients.
Maura R. Grossman, J.D., Ph.D., is a Research Professor in the School of Computer Science at the University of Waterloo, and an Adjunct Professor both at Osgoode Hall Law School of York University and the Georgetown University Law Center. She is also Principal at Maura Grossman Law, an eDiscovery law and consulting firm in New York. Previously, Maura was Of Counsel at Wachtell, Lipton, Rosen & Katz, where, for 17 years, she advised the firm’s lawyers and clients on legal, technical, and strategic issues involving eDiscovery and information governance, both domestically and abroad. Maura is a well-known and influential eDiscovery lawyer. Her scholarly work on TAR, most notably Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, published in the Richmond Journal of Law and Technology in 2011, has been widely cited in case law, both in the U.S. and elsewhere. Her longstanding contributions to eDiscovery technology and process were featured in the February 2016 issue of The American Lawyer and the September 2016 issue of the ABA Journal, where she was recognized as a “Legal Rebel.” In 2017, Maura was one of ten additions to the American Bar Association’s list of Women in Legal Tech; was named as one of the FastCase 50, which honors “the year’s smartest, most courageous innovators, techies, visionaries, and leaders in the law”; was honored by ACEDS and Women in eDiscovery as one of the “women who have served as pioneers and innovators in eDiscovery and legal technology”; and was appointed to the Balsillie School of International Affairs Artificial Intelligence and Human Rights Advisory Board. Maura has served as a court-appointed special master, neutral/mediator, and eDiscovery expert to the court in multiple high-profile federal cases, and has also taught courses in eDiscovery at Columbia, Pace, and Rutgers–Newark law schools. In addition to her J.D. from Georgetown, Maura also holds M.A. and Ph.D. degrees in Psychology from the Derner Institute at Adelphi University.