<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Mentat]]></title><description><![CDATA[Mentat Innovations Blog - AI & Blockchain]]></description><link>https://blog.ment.at</link><image><url>https://substackcdn.com/image/fetch/$s_!g5zW!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04dd1947-afea-4c18-ac51-b4ba12189525_660x660.png</url><title>Mentat</title><link>https://blog.ment.at</link></image><generator>Substack</generator><lastBuildDate>Wed, 20 May 2026 23:13:31 GMT</lastBuildDate><atom:link href="https://blog.ment.at/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Mentat Innovations]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[mentatinnovations@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[mentatinnovations@substack.com]]></itunes:email><itunes:name><![CDATA[George Cotsikis]]></itunes:name></itunes:owner><itunes:author><![CDATA[George Cotsikis]]></itunes:author><googleplay:owner><![CDATA[mentatinnovations@substack.com]]></googleplay:owner><googleplay:email><![CDATA[mentatinnovations@substack.com]]></googleplay:email><googleplay:author><![CDATA[George Cotsikis]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Coming soon]]></title><description><![CDATA[This is Mentat.]]></description><link>https://blog.ment.at/p/coming-soon</link><guid isPermaLink="false">https://blog.ment.at/p/coming-soon</guid><dc:creator><![CDATA[George Cotsikis]]></dc:creator><pubDate>Wed, 08 Mar 2023 17:50:16 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!g5zW!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04dd1947-afea-4c18-ac51-b4ba12189525_660x660.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is Mentat.</p>]]></content:encoded></item><item><title><![CDATA[Crossing the blockchain chasm]]></title><description><![CDATA[Moving from visionaries to early enterprise users]]></description><link>https://blog.ment.at/p/crossing-the-blockchain-chasm-ff00a4ddaf3c</link><guid isPermaLink="false">https://blog.ment.at/p/crossing-the-blockchain-chasm-ff00a4ddaf3c</guid><dc:creator><![CDATA[George Cotsikis]]></dc:creator><pubDate>Tue, 07 May 2019 12:27:28 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/3xGLc-zz9cA" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Insightful analytics is the secret sauce of today&#8217;s data-driven, hyperconnected economy: it is what lets any kind of trade run effectively, regardless of business size, vertical or geography. The surging demand for data, and the explosion in the volume of data within our reach, stem from the digitalisation of almost everything around us. Digitalisation is the essence of <a href="https://en.wikipedia.org/wiki/Digital_transformation">digital transformation</a>, the evolution that traditional brick-and-mortar enterprises are currently undergoing. The result is an ongoing demand for new and innovative digital solutions.</p><p>When it comes to blockchain technology, there is little we can add to the myriad of articles already on the internet. 
For completeness, a simple introduction is embedded below.</p><div class="captioned-image-container"><figure><div id="youtube2-3xGLc-zz9cA" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;3xGLc-zz9cA&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/3xGLc-zz9cA?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div></figure></div><p>We will not go into cryptocurrencies such as <a href="https://en.wikipedia.org/wiki/Bitcoin">bitcoin</a>, and we will certainly not get involved in the bitcoin-versus-blockchain debate. We believe there are specific digital transformation use cases where trustless distributed databases such as blockchain are an ideal solution, and these are the focus of this post.</p><p>In his classic innovation book <a href="https://en.m.wikipedia.org/wiki/Crossing_the_Chasm">Crossing the Chasm</a>, Geoffrey Moore starts from the diffusion of innovations theory and argues that there is a <em>chasm</em> between the early adopters of a product (the technology enthusiasts and visionaries) and the early majority (the pragmatists). In the case of blockchain, the early adopters are the users who got involved in the crypto world between the 2008 bitcoin <a href="https://bitcoin.org/bitcoin.pdf?">white paper</a> and late 2017, the peak of the blockchain hype bubble. 
We are now in the chasm phase in anticipation of the slower but deeper adoption of the technology by an early majority of enterprise users.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gFY8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67409ee3-960a-4a6e-abb6-b627a00eb4f2_800x366.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gFY8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67409ee3-960a-4a6e-abb6-b627a00eb4f2_800x366.png 424w, https://substackcdn.com/image/fetch/$s_!gFY8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67409ee3-960a-4a6e-abb6-b627a00eb4f2_800x366.png 848w, https://substackcdn.com/image/fetch/$s_!gFY8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67409ee3-960a-4a6e-abb6-b627a00eb4f2_800x366.png 1272w, https://substackcdn.com/image/fetch/$s_!gFY8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67409ee3-960a-4a6e-abb6-b627a00eb4f2_800x366.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gFY8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67409ee3-960a-4a6e-abb6-b627a00eb4f2_800x366.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/67409ee3-960a-4a6e-abb6-b627a00eb4f2_800x366.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gFY8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67409ee3-960a-4a6e-abb6-b627a00eb4f2_800x366.png 424w, https://substackcdn.com/image/fetch/$s_!gFY8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67409ee3-960a-4a6e-abb6-b627a00eb4f2_800x366.png 848w, https://substackcdn.com/image/fetch/$s_!gFY8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67409ee3-960a-4a6e-abb6-b627a00eb4f2_800x366.png 1272w, https://substackcdn.com/image/fetch/$s_!gFY8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67409ee3-960a-4a6e-abb6-b627a00eb4f2_800x366.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>The hype blockchain has generated in the last few years seems to have fired up the industry, spawning PoC projects across numerous sectors. In 2018 and 2019 we have already seen some of these experimental MVPs reach production, and blockchain become a vital part of the digital transformation strategy of several businesses. 
Like AI, blockchain is high on the innovation agenda of most major enterprises.</p><p>There are public, private, protected and hybrid blockchain networks. Whilst their technical details and differences are not the focus here, the distinction between them mirrors the one between the Internet and an intranet or a LAN. Each has a critical and justified role to play in the technology ecosystem. And while public blockchains receive most of the media attention these days, we feel hybrid networks are the future for most enterprise users.</p><blockquote><p>New technologies are new pathways that enable you to activate new revenue models you otherwise couldn&#8217;t even conceive of.</p></blockquote><p>The main areas where blockchain&#8217;s impact will be most strongly felt are:</p><ul><li><p>Fintech: Banking, Payments, Capital Markets, Credit Scoring, Remittances, New Currencies</p></li><li><p>Identity: Personal Data, Medical Records, Authentication</p></li><li><p>Charity: Donations, ESG Investing, Project Monitoring</p></li><li><p>Asset Tokenisation: Democratising ownership and transferability of real-world tangible and intangible assets</p></li><li><p>Logistics: Supply chain, physical goods trade</p></li></ul><p>Among the plethora of use cases, there is an urgent need for early adopters to experiment easily with blockchain solutions, so that they develop a clear understanding of the costs and benefits involved in the context of a wider enterprise digital transformation. That experimentation brings its own challenges in two main areas: infrastructure and analytics.</p><p>In terms of infrastructure, despite the rich open-source ecosystem of toolsets created by the vibrant crypto community, integrating them into enterprise workflows is not easy. Blockchain-based integrations require a whole new architectural paradigm. 
This brings a new challenge to the enterprise&#8217;s IT operations: gone is the simplicity of CI/CD, monitoring, logging, alerting, cloud-native integrations and serverless stacks. There is no support team to ask for an SLA, on-call hours or DevOps cookbooks. To introduce anything that uses blockchain as part of the solution, the enterprise has to be prepared to invest enough to build an experimentation infrastructure and, eventually, a production-ready one. The problem is: when you cannot fully understand the benefits, how do you decide how much to invest?</p><p>Another issue is the analytics required to understand the operational efficiency of blockchain experiments. Blockchain is all about trust. But how can you trust something if there is no easy way to explore it and understand what is happening in the network, and why? Blockchain networks hold raw data that answers questions such as who interacts with whom, how much value is exchanged and when, how each move in the network affects other parts of the network, and so on. 
Mining the rich graph data embedded in the blockchain is paramount to validating any PoC experiment, and to its eventual enterprise deployment and productisation.</p><blockquote><p>&#8220;I think it&#8217;s a technical tour de force&#8221;</p></blockquote><blockquote><p>- Bill Gates</p></blockquote><p><a href="http://www.ment.at">Mentat</a> guest blog post in collaboration with <a href="https://www.linkedin.com/in/evalon/">Evangelos Pappas</a>, CEO of <a href="https://ocyan.com/">Ocyan</a>, the enterprise-ready, cloud-native blockchain platform.</p>]]></content:encoded></item><item><title><![CDATA[Shipping to the future]]></title><description><![CDATA[In a sea of data]]></description><link>https://blog.ment.at/p/shipping-to-the-future-39823b3c273d</link><guid isPermaLink="false">https://blog.ment.at/p/shipping-to-the-future-39823b3c273d</guid><dc:creator><![CDATA[George Cotsikis]]></dc:creator><pubDate>Wed, 20 Mar 2019 08:55:37 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/63b4f3c9-67e2-47be-a6e8-87b7098a5016_800x450.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Df4E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a8dea58-e9d5-400b-bd03-feb607059bc7_800x450.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Df4E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a8dea58-e9d5-400b-bd03-feb607059bc7_800x450.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Df4E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a8dea58-e9d5-400b-bd03-feb607059bc7_800x450.jpeg 
848w, https://substackcdn.com/image/fetch/$s_!Df4E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a8dea58-e9d5-400b-bd03-feb607059bc7_800x450.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Df4E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a8dea58-e9d5-400b-bd03-feb607059bc7_800x450.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Df4E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a8dea58-e9d5-400b-bd03-feb607059bc7_800x450.jpeg" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6a8dea58-e9d5-400b-bd03-feb607059bc7_800x450.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Df4E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a8dea58-e9d5-400b-bd03-feb607059bc7_800x450.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Df4E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a8dea58-e9d5-400b-bd03-feb607059bc7_800x450.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!Df4E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a8dea58-e9d5-400b-bd03-feb607059bc7_800x450.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Df4E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a8dea58-e9d5-400b-bd03-feb607059bc7_800x450.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a><figcaption class="image-caption">Container Ship&#8202;&#8212;&#8202;<a href="https://www.cma-cgm.com/">CMA&nbsp;CGM</a></figcaption></figure></div><p>When I was working in the ship repair industry, we had a saying: if we could make a ship sail with half a propeller, or even with a single fin, ship-owners would adopt it immediately. What we meant is that shipping is an extremely cost-sensitive business, and ship-owners create operating margins through ruthless cost cutting. There are, of course, periods of high margins when the market goes up, and a lot of money is made during these times. 
But often these good times are followed by troughs in the business cycle, when ships are sold, decommissioned, scrapped, slow-steamed and possibly mishandled to make the business sustainable.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lDKz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa28741ca-093c-4af7-b49c-871560d6735b_785x566.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lDKz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa28741ca-093c-4af7-b49c-871560d6735b_785x566.png 424w, https://substackcdn.com/image/fetch/$s_!lDKz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa28741ca-093c-4af7-b49c-871560d6735b_785x566.png 848w, https://substackcdn.com/image/fetch/$s_!lDKz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa28741ca-093c-4af7-b49c-871560d6735b_785x566.png 1272w, https://substackcdn.com/image/fetch/$s_!lDKz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa28741ca-093c-4af7-b49c-871560d6735b_785x566.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lDKz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa28741ca-093c-4af7-b49c-871560d6735b_785x566.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a28741ca-093c-4af7-b49c-871560d6735b_785x566.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lDKz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa28741ca-093c-4af7-b49c-871560d6735b_785x566.png 424w, https://substackcdn.com/image/fetch/$s_!lDKz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa28741ca-093c-4af7-b49c-871560d6735b_785x566.png 848w, https://substackcdn.com/image/fetch/$s_!lDKz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa28741ca-093c-4af7-b49c-871560d6735b_785x566.png 1272w, https://substackcdn.com/image/fetch/$s_!lDKz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa28741ca-093c-4af7-b49c-871560d6735b_785x566.png 1456w" sizes="100vw"></picture><div></div></div></a><figcaption class="image-caption"><a href="https://en.wikipedia.org/wiki/Baltic_Dry_Index">Baltic Dry&nbsp;Index</a></figcaption></figure></div><p>Employing ships in the spot market, in particular, can bring enormous revenue volatility. Spot markets, where vessels are chartered at daily floating rates, are cases where free-market conditions prevail, so strategic advantages are short-lived. 
Ship-owners and operators are price takers, with little room for negotiation. Consequently, ship-owners need to cut costs to make margins.</p><blockquote><p>Everything from brilliant strategic thinking to reckless business manoeuvring is used to save costs in seaborne trade operations.</p></blockquote><p>Shipping, and related supply chains in general, have long operated in a regime of opaqueness and heavily constrained information flow. This has recently begun to change, not so much on the cost side as in terms of opacity: under pressure from regulators and charterers alike, operations are becoming more transparent. This puts a lot of pressure on the supply chain, and the shift is irreversible. Think of the oil majors: chartering a subpar crude carrier that then has an accident is bad for everyone, catastrophic for the environment and, ultimately, severely damaging to the brand in the long term. Vetting of ships is one aspect, but the broader operations of the principals are also of interest. Authorities, for their part, are tightening regulations on how ships can operate in different locations. The most vivid example is <a href="http://www.imo.org/en/mediacentre/hottopics/pages/sulphur-2020.aspx">IMO2020</a>, under which ships must ensure that their <a href="https://en.wikipedia.org/wiki/Sulfur_oxide">SOx</a> emissions are capped, especially in emission control areas. Thirdly, the entire supply chain wants to optimize its operations and avoid uncertainty and the associated bullwhip effects. Contracts are tightening, and penalties are becoming more granular depending on how a ship is operated.</p><p>Under this pressure for transparency, the industry is clearly in transformation. All stakeholders are seeking to optimize their operations beyond what the great business minds of the past could achieve. Optimization requires precision, and therefore automation and, most crucially, data. 
Data flows are increasing and will continue to do so, as the ship and the supply chain interlace with a growing grid of sensors, IoT-enabled devices, services and machinery. As ships sail in isolation, satellites take up the load of conveying the data. Modelling and simulation are tasked with predicting the near future and providing guidance for routing and operations. The ship is also a big factory, packed with machinery and instruments that break down several times a year. It also hosts a crew that needs supplies, gets sick and, let&#8217;s be frank here, is sometimes anything from unreliable to reckless due to poor training. All this happens in an adverse environment: nature is relentless, especially on the open sea. Add to that operating alongside other ships of all sizes in ports, terminals or straits, or in areas of the world affected by political turmoil, piracy or poor connectivity.</p><blockquote><p>It is not the ship so much as the skillful sailing that assures the prosperous voyage.</p></blockquote><blockquote><p>- George William Curtis</p></blockquote><p>The increase in data flows also introduces cybersecurity risks. Hacking a ship is becoming a real possibility: not only are the connections vulnerable, but so are the ship&#8217;s systems. Moreover, crews often have low IT literacy and can easily introduce malware. 
This creates a whole new area of potential innovation in a critical sector where cybersecurity awareness ranges from non-existent to, at best, low.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MuaB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651af2a8-0286-4219-84e7-a519ba124084_638x359.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MuaB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651af2a8-0286-4219-84e7-a519ba124084_638x359.jpeg 424w, https://substackcdn.com/image/fetch/$s_!MuaB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651af2a8-0286-4219-84e7-a519ba124084_638x359.jpeg 848w, https://substackcdn.com/image/fetch/$s_!MuaB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651af2a8-0286-4219-84e7-a519ba124084_638x359.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!MuaB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651af2a8-0286-4219-84e7-a519ba124084_638x359.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MuaB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651af2a8-0286-4219-84e7-a519ba124084_638x359.jpeg" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/651af2a8-0286-4219-84e7-a519ba124084_638x359.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MuaB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651af2a8-0286-4219-84e7-a519ba124084_638x359.jpeg 424w, https://substackcdn.com/image/fetch/$s_!MuaB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651af2a8-0286-4219-84e7-a519ba124084_638x359.jpeg 848w, https://substackcdn.com/image/fetch/$s_!MuaB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651af2a8-0286-4219-84e7-a519ba124084_638x359.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!MuaB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651af2a8-0286-4219-84e7-a519ba124084_638x359.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"><a href="https://www.slideshare.net/GeorgePouraimis/maritime-cyber-security">Maritime cyber&nbsp;security</a></figcaption></figure></div><p>Given the size of the addressable market, this vast, complex ecosystem offers complex but substantial business opportunities. Seaborne commerce accounts for around 90% of global trade and will continue to do so. 
Seaborne trade is a 500 billion USD yearly market, with over 50,000 merchant ships trading internationally and transporting every kind of cargo. The world fleet is registered in over 150 nations and manned by over one million seafarers of virtually every nationality. Cargo ships are technically sophisticated, high-value assets, some of which cost over 200 million USD to build.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XG8a!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3cb5c2e-7449-427b-8cfe-047129df9dcf_638x232.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XG8a!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3cb5c2e-7449-427b-8cfe-047129df9dcf_638x232.png 424w, https://substackcdn.com/image/fetch/$s_!XG8a!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3cb5c2e-7449-427b-8cfe-047129df9dcf_638x232.png 848w, https://substackcdn.com/image/fetch/$s_!XG8a!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3cb5c2e-7449-427b-8cfe-047129df9dcf_638x232.png 1272w, https://substackcdn.com/image/fetch/$s_!XG8a!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3cb5c2e-7449-427b-8cfe-047129df9dcf_638x232.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XG8a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3cb5c2e-7449-427b-8cfe-047129df9dcf_638x232.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d3cb5c2e-7449-427b-8cfe-047129df9dcf_638x232.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XG8a!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3cb5c2e-7449-427b-8cfe-047129df9dcf_638x232.png 424w, https://substackcdn.com/image/fetch/$s_!XG8a!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3cb5c2e-7449-427b-8cfe-047129df9dcf_638x232.png 848w, https://substackcdn.com/image/fetch/$s_!XG8a!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3cb5c2e-7449-427b-8cfe-047129df9dcf_638x232.png 1272w, https://substackcdn.com/image/fetch/$s_!XG8a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3cb5c2e-7449-427b-8cfe-047129df9dcf_638x232.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>There is a massive gap between the need for technology and the technology actually deployed, and that is where a substantial opportunity lies. The traditional model, in which operators control almost every aspect of the ship, is slowly but surely being challenged. 
The ship is a floating bundle of disparate systems that streams data to a remote HQ, where operations specialists may or may not review it before taking decisions. The volume of data is overwhelming, making constant connectivity expensive, not to mention difficult to achieve in remote areas. Furthermore, the model is limited by its lack of capacity to process data on-site and provide guidance. Let&#8217;s draw a parallel with the automotive industry. Imagine a car that had to transmit all its information to a central authority in order to determine its health and how to steer it. That would be unacceptable. Although the tolerable latency is higher for ships, this is still a major bottleneck. Moreover, the decisions to be taken are much more complex and involve balancing several classes of variables: operational, maintenance and contractual, to name just a few. The fleet could be seen as a distributed computing system in which many decisions are computed on board and different pieces of information are routed to the relevant stakeholders. In addition, operators and ship-owners care about the financial performance of fleets and sub-fleets, so further planning decisions may be taken on shore, using different KPIs from those on the ship and requiring coordination among ships. The ship is therefore a key component, but there are several management views: those of the operators, the ship-owners, the port and terminal authorities, and the agents. There is a clear and urgent need for true edge compute and analytics capabilities that do not rely on a constant satellite uplink to HQ servers. 
Moreover, adaptive and unsupervised anomaly detection models can automatically determine whether any data needs to be sent back to HQ outside infrequent, predetermined datacomms windows.</p><p>A number of existing and future technology solutions, adapted to operate in a true edge environment, can establish a smart-ship IoT platform that replaces the guesswork of the &#8220;noon-report Excel attachment&#8221; process. The current, outdated model of once-daily manual data entry and macro-driven upload of a small subset of metrics back to HQ will be transformed dramatically. The possibilities include automating data collection, computing optimal courses of action with appropriate real-time machine learning models, helping the crew optimize operations using AI, automatically parsing unstructured data in legacy systems using NLP, substantially reducing fuel usage and environmental footprint via adaptive optimization, and paving the path to eventual maritime asset autonomy&nbsp;&#8230; the list is endless.</p><blockquote><p>Shipping is a massive global industry that is moving to new models of operational efficiency. 
Data will indeed be the new oil in the maritime technology whitespace.</p></blockquote><p><a href="http://ment.at">Mentat</a> guest blog post in collaboration with:</p><p><a href="https://gr.linkedin.com/in/dimitrisservis">Dimitris Servis</a>, PhD Naval Engineering</p>]]></content:encoded></item><item><title><![CDATA[Datastream.io scikit-learn integration]]></title><description><![CDATA[A few days ago we open-sourced our platform for anomaly detection in Python &#8212; you can read more about that here.]]></description><link>https://blog.ment.at/p/datastream-io-scikit-learn-integration-a019aa4b60be</link><guid isPermaLink="false">https://blog.ment.at/p/datastream-io-scikit-learn-integration-a019aa4b60be</guid><dc:creator><![CDATA[George Cotsikis]]></dc:creator><pubDate>Mon, 12 Feb 2018 08:27:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!g5zW!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04dd1947-afea-4c18-ac51-b4ba12189525_660x660.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A few days ago we open-sourced our platform for anomaly detection in Python. You can read more about that <a href="https://medium.com/@ment_at/datastream-io-open-source-anomaly-detection-64db282735e0">here</a>.</p><p>This post focuses on one feature of our framework: integration with scikit-learn. Sklearn is the flagship ML toolbox for Python, and it is growing by the day. Ignoring its models and design patterns would mean reinventing the wheel.</p><p>So we have added a small example showing how you can bring the full strength of scikit-learn to bear on your detection problem while still using dsio. 
Consider the following file, which you can find in the examples folder:</p><p><code>datastream.io/examples/lof_anomaly_detector.py</code></p><p>&#8220;LOF&#8221; stands for &#8220;Local Outlier Factor&#8221;, a long-established, well-tested technique for detecting anomalies in Euclidean space (although it can be generalised to any space for which you feel comfortable defining a distance metric). The basic idea of LOF is to identify points whose nearest neighbours are not so near compared with those of other points in the dataset; more precisely, points whose local density is low relative to the density around their neighbours. This allows us to detect anomalies whose values might not look abnormal when compared with the maximum and minimum values found in the dataset, but which in truth occupy an empty region somewhere in the middle, where no other data lives. For one-dimensional data such anomalous gaps may be counterintuitive to imagine, but as the dimension of the data increases it becomes increasingly likely that your anomalies will not take extreme values in every dimension.</p><p>Scikit-learn contains an implementation of LOF, wonderfully explained <a href="http://scikit-learn.org/stable/auto_examples/neighbors/plot_lof.html">here</a>. However, the sklearn framework itself does not contain an interface for anomaly detection: it only supports classification/regression (supervised learning) and clustering (unsupervised learning where the main objective is to assign datapoints to clusters rather than to produce anomaly scores). This is by no means a crippling disadvantage: any clustering algorithm can easily be turned into an anomaly detector using ideas similar to LOF.</p><p>Nonetheless, our proposed interface is a step forward in recognising anomaly detection as a core data science problem category. 
We have followed sklearn design patterns in introducing it as a <code>Mixin</code> rather than a standalone object. This means that you can use pretty much any class you want, as long as you implement (or override, if they already exist) the following methods:</p><p><code>fit, update, score_anomaly, flag_anomaly</code></p><p>Our <code>fit</code> function will soon be revised to follow sklearn conventions fully, so that you don&#8217;t have to worry about it (it currently supports only unidimensional input, so it diverges from those conventions). The <code>score_anomaly</code> function is probably the most important, as it produces the final output of the detector, while the <code>flag_anomaly</code> function produces a binary output if one is needed.</p><h3>The update function</h3><p>The <code>update</code> method present in <code>AnomalyMixin</code> is another key innovation of <code>dsio</code>. We require all supported models to provide a way to update their state given new data, rather than having to refit from scratch. 
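</p><p>To make the interface more concrete, here is a minimal, dependency-free sketch of a detector that exposes the four methods listed above and uses LOF as its scoring function. It is an illustration under stated assumptions. It is not the code in <code>lof_anomaly_detector.py</code>, nor is it dsio&#8217;s <code>AnomalyMixin</code>. The class name <code>LOFDetector</code>, its <code>k</code> and <code>threshold</code> parameters and the list-of-tuples input format are all hypothetical. Unlike the current dsio <code>fit</code>, it also accepts multi-dimensional points. In recent scikit-learn versions one would more likely wrap <code>sklearn.neighbors.LocalOutlierFactor(novelty=True)</code>, whose <code>score_samples</code> method returns the negated LOF.</p><pre><code>import math


class LOFDetector:
    """Hypothetical detector with fit / update / score_anomaly / flag_anomaly.

    Scores are Local Outlier Factors: close to 1 for inliers, well above 1
    for points whose neighbourhood is much sparser than their neighbours' own.
    """

    def __init__(self, k=3, threshold=1.5):
        self.k = k                  # neighbourhood size (hypothetical default)
        self.threshold = threshold  # LOF above this value is flagged
        self.points = []            # reference set, stored as tuples

    def fit(self, X):
        # LOF is a lazy learner: "training" just stores the reference set.
        self.points = [tuple(x) for x in X]
        return self

    def update(self, X):
        # Incremental update: extend the reference set instead of refitting.
        self.points.extend(tuple(x) for x in X)
        return self

    def _neighbours(self, p, skip=None):
        # (distance, index) pairs of the k nearest reference points to p,
        # ignoring the reference point at index `skip` (p itself, if stored).
        cands = sorted(
            (math.dist(p, q), i) for i, q in enumerate(self.points) if i != skip
        )
        return cands[: self.k]

    def _k_distance(self, i):
        # Distance from reference point i to its k-th nearest neighbour.
        return self._neighbours(self.points[i], skip=i)[-1][0]

    def _lrd(self, p, skip=None):
        # Local reachability density: inverse of the mean reachability
        # distance, where reach-dist(p, o) = max(k-distance(o), d(p, o)).
        nbrs = self._neighbours(p, skip)
        reach = [max(self._k_distance(j), d) for d, j in nbrs]
        return 1.0 / (sum(reach) / len(reach) + 1e-10)

    def score_anomaly(self, X):
        # LOF(p) = mean over the neighbours o of p of lrd(o) / lrd(p).
        scores = []
        for x in X:
            p = tuple(x)
            lrd_p = self._lrd(p)
            ratios = [self._lrd(self.points[j], skip=j) / lrd_p
                      for _, j in self._neighbours(p)]
            scores.append(sum(ratios) / len(ratios))
        return scores

    def flag_anomaly(self, X):
        # Binary output: threshold the continuous LOF score.
        return [s > self.threshold for s in self.score_anomaly(X)]


detector = LOFDetector(k=3).fit([(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)])
print(detector.score_anomaly([(0.5, 0.4), (5, 5)]))
print(detector.flag_anomaly([(0.5, 0.4), (5, 5)]))  # [False, True]
</code></pre><p>Because LOF is a lazy learner that keeps its reference set in memory, <code>update</code> can simply append the new points rather than refit from scratch. The trade-off is that this naive version re-scans the whole reference set for every neighbour lookup. Scoring cost therefore grows with the size of that set, and a streaming deployment would probably need to cap, subsample or index it.</p><p>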
We will be writing another post soon to delve deep into the world of model updates!</p>]]></content:encoded></item><item><title><![CDATA[Datastream.io : Open Source Anomaly Detection]]></title><description><![CDATA[We are proud to launch the very first version of our open-source project for Anomaly Detection and Behavioural Profiling on data-streams&#8230;]]></description><link>https://blog.ment.at/p/datastream-io-open-source-anomaly-detection-64db282735e0</link><guid isPermaLink="false">https://blog.ment.at/p/datastream-io-open-source-anomaly-detection-64db282735e0</guid><dc:creator><![CDATA[George Cotsikis]]></dc:creator><pubDate>Tue, 30 Jan 2018 14:08:51 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/be69b03c-ee77-428a-9cf9-fb40acde06ac_600x440.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We are proud to launch the very first version of our open-source project for Anomaly Detection and Behavioural Profiling on data-streams, <strong>datastream.io (<a href="https://github.com/MentatInnovations/datastream.io">dsio on GitHub</a>)</strong>.</p><p>We have a long roadmap ahead of us but, as they say, release early and release often. 
So here it is, a minimum viable full-stack Python anomaly detector:</p><pre><code>pip install -e git+https://github.com/MentatInnovations/datastream.io#egg=dsio</code></pre><h3>Features</h3><p>The purpose of the project is to perform the following functions:</p><ul><li><p><strong>Consume</strong> data from a variety of file and stream formats.</p></li><li><p><strong>Transform</strong> data streams on the fly to derive statistics of interest (such as aggregations, counts, sessions or groupings) or to extract features.</p></li><li><p><strong>Model</strong> the resulting stream via unsupervised machine learning to capture normal baseline behaviour, either globally or at the level of a device/user.</p></li><li><p><strong>Score</strong> every new event by comparing it to the baseline model.</p></li><li><p><strong>Visualise</strong> anomalous events on a customisable dashboard with a lightweight back-end, involving minimal fuss for the user.</p></li></ul><p>In the spirit of a minimal first release, we start by supporting three things: consumption from CSV files (filtered by column), a couple of basic modelling and scoring options, and visualisation via Elasticsearch and Kibana, with a dashboard that is auto-generated from the column names.</p><h3>Bring-your-own detector</h3><p>Those of you who read our <a href="https://blog.ment.at/datastream-io-4863db7286b7">previous post</a> know that we are about to unleash some pretty powerful anomaly detection models in this project. But like any open-source project, our main ambition is to create a platform. So for the first release, we have offered two basic example detectors (see below) as a template for you to build your own! 
All you need to do is support some basic interfaces: a way to <strong>update</strong> your model, a way to <strong>train it from scratch</strong> (this addresses the cold-start problem), and a way to <strong>detect</strong> anomalies. Detection will often involve a threshold on a <strong>scoring</strong> function that numerically describes how likely each new event is in comparison to the model.</p><p>You can try one of our own detectors from the command line like this:</p><p><code>dsio --detector gaussian1d examples/data/cardata_sample.csv</code></p><p>to run it against a sample dataset comprising IoT measurements from a car. But if you&#8217;d like to write your own, just add your module and run instead:</p><p><code>dsio --modules examples/detector.py --detector Percentile1D examples/data/cardata_sample.csv</code></p><p>We are looking forward to your feedback and contributions! We will be adding exciting contributions from our friends and colleagues in UK academia and from our industrial partners.</p><p>Here is the result:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pQIL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e610305-9f6f-4dde-ae4f-2c41c5210dec_600x440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pQIL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e610305-9f6f-4dde-ae4f-2c41c5210dec_600x440.png 424w, https://substackcdn.com/image/fetch/$s_!pQIL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e610305-9f6f-4dde-ae4f-2c41c5210dec_600x440.png 848w, 
https://substackcdn.com/image/fetch/$s_!pQIL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e610305-9f6f-4dde-ae4f-2c41c5210dec_600x440.png 1272w, https://substackcdn.com/image/fetch/$s_!pQIL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e610305-9f6f-4dde-ae4f-2c41c5210dec_600x440.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pQIL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e610305-9f6f-4dde-ae4f-2c41c5210dec_600x440.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0e610305-9f6f-4dde-ae4f-2c41c5210dec_600x440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pQIL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e610305-9f6f-4dde-ae4f-2c41c5210dec_600x440.png 424w, https://substackcdn.com/image/fetch/$s_!pQIL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e610305-9f6f-4dde-ae4f-2c41c5210dec_600x440.png 848w, 
https://substackcdn.com/image/fetch/$s_!pQIL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e610305-9f6f-4dde-ae4f-2c41c5210dec_600x440.png 1272w, https://substackcdn.com/image/fetch/$s_!pQIL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e610305-9f6f-4dde-ae4f-2c41c5210dec_600x440.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Iv-V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c462c1-bf2c-436d-8043-d3d620c58dae_800x462.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Iv-V!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c462c1-bf2c-436d-8043-d3d620c58dae_800x462.png 424w, https://substackcdn.com/image/fetch/$s_!Iv-V!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c462c1-bf2c-436d-8043-d3d620c58dae_800x462.png 848w, https://substackcdn.com/image/fetch/$s_!Iv-V!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c462c1-bf2c-436d-8043-d3d620c58dae_800x462.png 1272w, https://substackcdn.com/image/fetch/$s_!Iv-V!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c462c1-bf2c-436d-8043-d3d620c58dae_800x462.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Iv-V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c462c1-bf2c-436d-8043-d3d620c58dae_800x462.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/64c462c1-bf2c-436d-8043-d3d620c58dae_800x462.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Iv-V!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c462c1-bf2c-436d-8043-d3d620c58dae_800x462.png 424w, https://substackcdn.com/image/fetch/$s_!Iv-V!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c462c1-bf2c-436d-8043-d3d620c58dae_800x462.png 848w, https://substackcdn.com/image/fetch/$s_!Iv-V!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c462c1-bf2c-436d-8043-d3d620c58dae_800x462.png 1272w, https://substackcdn.com/image/fetch/$s_!Iv-V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c462c1-bf2c-436d-8043-d3d620c58dae_800x462.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">datastream.io in&nbsp;action</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" 
target="_blank" href="https://substackcdn.com/image/fetch/$s_!dAlm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0aae4d-9322-417a-bf99-bda87f17ebb6_800x456.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dAlm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0aae4d-9322-417a-bf99-bda87f17ebb6_800x456.png 424w, https://substackcdn.com/image/fetch/$s_!dAlm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0aae4d-9322-417a-bf99-bda87f17ebb6_800x456.png 848w, https://substackcdn.com/image/fetch/$s_!dAlm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0aae4d-9322-417a-bf99-bda87f17ebb6_800x456.png 1272w, https://substackcdn.com/image/fetch/$s_!dAlm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0aae4d-9322-417a-bf99-bda87f17ebb6_800x456.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dAlm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0aae4d-9322-417a-bf99-bda87f17ebb6_800x456.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dd0aae4d-9322-417a-bf99-bda87f17ebb6_800x456.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dAlm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0aae4d-9322-417a-bf99-bda87f17ebb6_800x456.png 424w, https://substackcdn.com/image/fetch/$s_!dAlm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0aae4d-9322-417a-bf99-bda87f17ebb6_800x456.png 848w, https://substackcdn.com/image/fetch/$s_!dAlm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0aae4d-9322-417a-bf99-bda87f17ebb6_800x456.png 1272w, https://substackcdn.com/image/fetch/$s_!dAlm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0aae4d-9322-417a-bf99-bda87f17ebb6_800x456.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">datastream.io Kibana dashboard</figcaption></figure></div>]]></content:encoded></item><item><title><![CDATA[datastream.io]]></title><description><![CDATA[Robust Anomaly Detection at Scale]]></description><link>https://blog.ment.at/p/datastream-io-4863db7286b7</link><guid 
isPermaLink="false">https://blog.ment.at/p/datastream-io-4863db7286b7</guid><dc:creator><![CDATA[George Cotsikis]]></dc:creator><pubDate>Sat, 02 Dec 2017 14:09:44 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f2b6d15c-770d-4b42-9ab5-327dc681c769_224x80.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Robust Anomaly Detection at Scale</p><p>One of the core competencies of the Mentat team has been anomaly detection, in particular unsupervised anomaly detection on streaming data.</p><blockquote><p>Anomaly detection (also known as outlier detection) is the identification of items, events or observations which do not conform to an expected pattern or other items in a dataset. These events may indicate network intrusions, industrial component failures, financial fraud or health problems.</p></blockquote><p>Classifying anomalies correctly and efficiently determines the usability and effectiveness of many algorithms. Anomaly detection is a horizontal technology that is core to data-driven methodologies.</p><p>Googling for anomaly detection, the top result is Twitter&#8217;s R package, with almost 2,300 stars on GitHub. We were tempted to benchmark a subset of our own anomaly detection models against that approach on the same dataset. 
We have been working on a set of tools for scalable streaming anomaly detection under the project name <a href="http://www.datastream.io">datastream.io</a> or <strong>dsio</strong> for short.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5xIL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa10b83f-e2a1-4969-8c74-843dadb5df3e_224x80.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5xIL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa10b83f-e2a1-4969-8c74-843dadb5df3e_224x80.png 424w, https://substackcdn.com/image/fetch/$s_!5xIL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa10b83f-e2a1-4969-8c74-843dadb5df3e_224x80.png 848w, https://substackcdn.com/image/fetch/$s_!5xIL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa10b83f-e2a1-4969-8c74-843dadb5df3e_224x80.png 1272w, https://substackcdn.com/image/fetch/$s_!5xIL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa10b83f-e2a1-4969-8c74-843dadb5df3e_224x80.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5xIL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa10b83f-e2a1-4969-8c74-843dadb5df3e_224x80.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aa10b83f-e2a1-4969-8c74-843dadb5df3e_224x80.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5xIL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa10b83f-e2a1-4969-8c74-843dadb5df3e_224x80.png 424w, https://substackcdn.com/image/fetch/$s_!5xIL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa10b83f-e2a1-4969-8c74-843dadb5df3e_224x80.png 848w, https://substackcdn.com/image/fetch/$s_!5xIL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa10b83f-e2a1-4969-8c74-843dadb5df3e_224x80.png 1272w, https://substackcdn.com/image/fetch/$s_!5xIL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa10b83f-e2a1-4969-8c74-843dadb5df3e_224x80.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a><figcaption class="image-caption"><a href="http://www.datastream.io">datastream.io</a></figcaption></figure></div><p>Given the unsupervised nature of the problem in a streaming context we need to take into account the following complications:</p><ul><li><p>Without labels, we must rely on a model of normal behaviour.</p></li><li><p>All models of normal behaviour make assumptions about the 
data.</p></li><li><p>It is important to make these assumptions robust to common sources of variation.</p></li></ul><p><strong>Periods</strong></p><p>Periodicity is a common problem in anomaly detection: the natural fluctuation of the data can make local anomalies harder to detect (for example, peaks that fall within the normal range of the periodic pattern). Overcoming this problem was a flagship property of the <a href="https://github.com/twitter/AnomalyDetection">Twitter AnomalyDetection</a> package:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wP1B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57db4c8a-7cac-4f0f-a2e7-ff261caae61c_480x480.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wP1B!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57db4c8a-7cac-4f0f-a2e7-ff261caae61c_480x480.png 424w, https://substackcdn.com/image/fetch/$s_!wP1B!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57db4c8a-7cac-4f0f-a2e7-ff261caae61c_480x480.png 848w, https://substackcdn.com/image/fetch/$s_!wP1B!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57db4c8a-7cac-4f0f-a2e7-ff261caae61c_480x480.png 1272w, https://substackcdn.com/image/fetch/$s_!wP1B!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57db4c8a-7cac-4f0f-a2e7-ff261caae61c_480x480.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!wP1B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57db4c8a-7cac-4f0f-a2e7-ff261caae61c_480x480.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/57db4c8a-7cac-4f0f-a2e7-ff261caae61c_480x480.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wP1B!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57db4c8a-7cac-4f0f-a2e7-ff261caae61c_480x480.png 424w, https://substackcdn.com/image/fetch/$s_!wP1B!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57db4c8a-7cac-4f0f-a2e7-ff261caae61c_480x480.png 848w, https://substackcdn.com/image/fetch/$s_!wP1B!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57db4c8a-7cac-4f0f-a2e7-ff261caae61c_480x480.png 1272w, https://substackcdn.com/image/fetch/$s_!wP1B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57db4c8a-7cac-4f0f-a2e7-ff261caae61c_480x480.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Here we illustrate the anomalies detected by the Twitter package and our own detector, part of which will be open-sourced soon. 
Happily, they largely agree, and both are able to detect local outliers even when these lie within the normal range of the data.</p><p><strong>Period shifts</strong></p><p>However, anyone who has worked with real data will know that sometimes periods shift&nbsp;&#8230; This might not happen with calendar-driven events, where periods are forced by day/week/month/year patterns, but it is extremely common in, for example, industrial IoT devices: deactivating a component for a little while, or pausing a drill, will shift an otherwise periodic signal out of phase. Frequency-based methods such as Fourier transforms are easily confused by such behaviour. As it turns out, so is Twitter&#8217;s method. If we shift the signal mid-way by 1000 steps, it reports a much greater number of anomalies. Most of these are false alarms, caused by the bias that this single, very short &#8220;pause&#8221; in the periodicity introduces into the estimate of the period:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!L5Et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9e09067-9287-49fe-b331-feb8f9464858_480x480.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!L5Et!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9e09067-9287-49fe-b331-feb8f9464858_480x480.png 424w, https://substackcdn.com/image/fetch/$s_!L5Et!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9e09067-9287-49fe-b331-feb8f9464858_480x480.png 848w, 
https://substackcdn.com/image/fetch/$s_!L5Et!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9e09067-9287-49fe-b331-feb8f9464858_480x480.png 1272w, https://substackcdn.com/image/fetch/$s_!L5Et!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9e09067-9287-49fe-b331-feb8f9464858_480x480.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!L5Et!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9e09067-9287-49fe-b331-feb8f9464858_480x480.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e9e09067-9287-49fe-b331-feb8f9464858_480x480.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!L5Et!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9e09067-9287-49fe-b331-feb8f9464858_480x480.png 424w, https://substackcdn.com/image/fetch/$s_!L5Et!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9e09067-9287-49fe-b331-feb8f9464858_480x480.png 848w, 
https://substackcdn.com/image/fetch/$s_!L5Et!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9e09067-9287-49fe-b331-feb8f9464858_480x480.png 1272w, https://substackcdn.com/image/fetch/$s_!L5Et!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9e09067-9287-49fe-b331-feb8f9464858_480x480.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>In contrast, our detector is not confused in the least and continues to flag the same anomalies.</p><p>Another way in which data can fool you is through the introduction of a trend. Although there are several techniques for de-trending time series, our point here is that a generic, real-world anomaly detector should be robust to such disturbances without the user having to take special precautions for each use case. Twitter&#8217;s detector is confused by the introduction of a trend and ends up reporting no anomalies at all, whereas datastream.io remains stable:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!60W9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb780ae9a-7b8a-4821-b08e-3f71ea51565b_480x480.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!60W9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb780ae9a-7b8a-4821-b08e-3f71ea51565b_480x480.png 424w, https://substackcdn.com/image/fetch/$s_!60W9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb780ae9a-7b8a-4821-b08e-3f71ea51565b_480x480.png 848w, 
https://substackcdn.com/image/fetch/$s_!60W9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb780ae9a-7b8a-4821-b08e-3f71ea51565b_480x480.png 1272w, https://substackcdn.com/image/fetch/$s_!60W9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb780ae9a-7b8a-4821-b08e-3f71ea51565b_480x480.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!60W9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb780ae9a-7b8a-4821-b08e-3f71ea51565b_480x480.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b780ae9a-7b8a-4821-b08e-3f71ea51565b_480x480.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!60W9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb780ae9a-7b8a-4821-b08e-3f71ea51565b_480x480.png 424w, https://substackcdn.com/image/fetch/$s_!60W9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb780ae9a-7b8a-4821-b08e-3f71ea51565b_480x480.png 848w, 
https://substackcdn.com/image/fetch/$s_!60W9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb780ae9a-7b8a-4821-b08e-3f71ea51565b_480x480.png 1272w, https://substackcdn.com/image/fetch/$s_!60W9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb780ae9a-7b8a-4821-b08e-3f71ea51565b_480x480.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Yet another source of confusion is a signal that suddenly jumps to a new mean level&#8202;&#8212;&#8202;a very common occurrence in real-world data, as in the case of a web server that suddenly becomes more popular thanks to a successful marketing campaign. Twitter&#8217;s AnomalyDetector package handles this shift poorly: it removes all anomalies from its report, an exaggerated reaction to the increased range of the time series. Once again, datastream.io remains stable:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!e0T1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8740429-30f1-42e9-83e4-bcc9c928bd39_480x480.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!e0T1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8740429-30f1-42e9-83e4-bcc9c928bd39_480x480.png 424w, https://substackcdn.com/image/fetch/$s_!e0T1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8740429-30f1-42e9-83e4-bcc9c928bd39_480x480.png 848w, 
https://substackcdn.com/image/fetch/$s_!e0T1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8740429-30f1-42e9-83e4-bcc9c928bd39_480x480.png 1272w, https://substackcdn.com/image/fetch/$s_!e0T1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8740429-30f1-42e9-83e4-bcc9c928bd39_480x480.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!e0T1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8740429-30f1-42e9-83e4-bcc9c928bd39_480x480.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d8740429-30f1-42e9-83e4-bcc9c928bd39_480x480.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!e0T1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8740429-30f1-42e9-83e4-bcc9c928bd39_480x480.png 424w, https://substackcdn.com/image/fetch/$s_!e0T1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8740429-30f1-42e9-83e4-bcc9c928bd39_480x480.png 848w, 
https://substackcdn.com/image/fetch/$s_!e0T1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8740429-30f1-42e9-83e4-bcc9c928bd39_480x480.png 1272w, https://substackcdn.com/image/fetch/$s_!e0T1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8740429-30f1-42e9-83e4-bcc9c928bd39_480x480.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><strong>Conclusion</strong></p><p>Looking at this simple univariate problem, it&#8217;s easy to see that there is significant room for improvement in the anomaly detection space. We have argued elsewhere, and continue to argue here, that <strong>robustness </strong>is the secret sauce that makes machine learning methodology truly generic and hence usable in the real world. Put differently, it increases the ROI of any investment in machine learning projects, because it drastically reduces the biggest cost: the amount of data preprocessing, cleaning and custom modelling that the data scientist needs to do before they can deploy their favourite method. That&#8217;s what datastream.io is all about: robustness.</p><p>We will start open sourcing some components of the stack, and we are looking to create a community around robust anomaly detection, bringing together startups, researchers and practitioners. Feel free to reach out to us <a href="mailto:info@ment.at">here</a> if you would like to be involved.</p>]]></content:encoded></item><item><title><![CDATA[Machine Learning for Predictive Maintenance : From Physics to Data.]]></title><description><![CDATA[Assume you want to maintain constant pressure in a certain container. 
You have control of a valve that can pump more air into it, or&#8230;]]></description><link>https://blog.ment.at/p/machine-learning-for-predictive-maintenance-from-physics-to-data-ae1a094b3669</link><guid isPermaLink="false">https://blog.ment.at/p/machine-learning-for-predictive-maintenance-from-physics-to-data-ae1a094b3669</guid><dc:creator><![CDATA[George Cotsikis]]></dc:creator><pubDate>Tue, 25 Apr 2017 16:25:52 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/7b54603d-a4a6-4851-8b4c-615d7e882ea8_800x388.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Assume you want to maintain constant pressure in a certain container. You have control of a valve that can pump more air into it, or release some air. It sounds like a simple task: if the pressure is below the target, pump some more air in, and if it&#8217;s above the target, release some&#8202;&#8212;&#8202;that should do it. Months pass, and your valve accumulates wear and tear, but you notice nothing, because your controller is doing its job, keeping the pressure constant&#8202;&#8212;&#8202;even if that means pumping a little more air each week to offset a small leak in the connection between the valve and the container. But the damage accumulates, and one day, something goes &#8216;crack&#8217; inside the valve, imperceptibly. No warning. The container loses pressure, and the failure cascades downstream, with critical systems failing one after the other. Red alerts sound, and the floor manager orders &#8216;all systems down&#8217; until the fault is diagnosed. The culprit is identified: it was the valve. Thankfully, you have some spares stocked for emergencies. The engineers replace it as quickly as they can&#8202;&#8212;&#8202;they work overtime, the clock is ticking. One day later, you are back up.</p><blockquote><p>The new valve cost you less than 100 dollars, but the downtime set you back a million. 
Wouldn&#8217;t it be nice to have had some advance&nbsp;warning?</p></blockquote><p>This is the promise of Predictive Maintenance: early warnings of critical failures, so that downtime can be avoided. Despite what the media hype might like you to think, the problem is exceptionally hard. Our toy example of air in, air out oversimplifies real-world industrial control systems, which typically feature a large number of variables that interdepend in complex, non-linear ways. Engineers and physicists will list all these variables and their physical properties, and a few hundred pages of mathematics later, they come up with a set of equations that prescribe what actions need to be taken at any given second (indeed, millisecond) to ensure that the target variables take the values they need to take at all times, so as to maintain constant pressure in a container, move a robotic arm in a certain way, etc. But wear and tear is unpredictable: by definition, it pushes the system into a state other than the one originally assumed by the mathematicians and the physicists. Indeed, &#8216;worn&#8217; states are orders of magnitude harder to describe via physical modelling. Ask any physicist and they will confirm: describing the laws of motion of a bicycle is child&#8217;s play in comparison to describing the laws of motion of a bicycle with a nail stuck in its front tyre. In brief, the &#8220;physicist-driven&#8221; approach does not scale. Can we do better?</p><h3>Data</h3><p>Well, in words often attributed to Lord Kelvin, &#8220;if you can&#8217;t measure it, you can&#8217;t improve it&#8221;. And this is precisely why Industry 4.0 is so important: it marks the switch from legacy technologies, which involved largely hard-wired industrial systems with minimal data acquisition and data sharing abilities, to next-generation technologies featuring millisecond-granularity remote sensing, with gigabit Ethernet connections all the way from the IoT device to the cloud. 
And this is the foundation of the <a href="https://en.wikipedia.org/wiki/Industry_4.0">Industry 4.0</a> promise.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QnRu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32398804-d721-4ea8-a981-d20c428db1ab_800x388.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QnRu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32398804-d721-4ea8-a981-d20c428db1ab_800x388.png 424w, https://substackcdn.com/image/fetch/$s_!QnRu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32398804-d721-4ea8-a981-d20c428db1ab_800x388.png 848w, https://substackcdn.com/image/fetch/$s_!QnRu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32398804-d721-4ea8-a981-d20c428db1ab_800x388.png 1272w, https://substackcdn.com/image/fetch/$s_!QnRu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32398804-d721-4ea8-a981-d20c428db1ab_800x388.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QnRu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32398804-d721-4ea8-a981-d20c428db1ab_800x388.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/32398804-d721-4ea8-a981-d20c428db1ab_800x388.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QnRu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32398804-d721-4ea8-a981-d20c428db1ab_800x388.png 424w, https://substackcdn.com/image/fetch/$s_!QnRu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32398804-d721-4ea8-a981-d20c428db1ab_800x388.png 848w, https://substackcdn.com/image/fetch/$s_!QnRu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32398804-d721-4ea8-a981-d20c428db1ab_800x388.png 1272w, https://substackcdn.com/image/fetch/$s_!QnRu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32398804-d721-4ea8-a981-d20c428db1ab_800x388.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a><figcaption class="image-caption">Industrial Revolutions (credit to <a href="http://www.allaboutlean.com/">Christoph Roser</a>)</figcaption></figure></div><p>This paradigm shift opens the door to altogether new approaches. 
We know from other areas, such as machine translation, that with sufficient data, machine learning can compete with, or even outperform, painstaking case-by-case domain theory.</p><blockquote><p>In the same way that machine learning coupled with huge multilingual online corpora managed to outperform decades of linguistic analysis and translation theory, we can expect that machine learning coupled with terabytes of continuous data streams from IoT devices will quickly come to rival the physical modelling of each&nbsp;device.</p></blockquote><h3>The Stack</h3><p>Indeed, that was our expectation when we entered the <a href="https://www.techfounders.com/">TechFounders</a> program with the proposal to deploy our proprietary algorithms for anomaly detection on data streams against data from one of the largest German suppliers of pneumatic and electrical automation components, <a href="https://www.festo.com/group/en/cms/index.htm">Festo</a>. At Hannover Messe, Festo is launching its Festo Motion Terminal, an intelligent pneumatic automation platform that replaces the functions of around 50 individual components: a technical tour de force in &#8220;software-defined hardware&#8221;. 
In 2016, Mentat used the new intelligent valve to develop and test AI methods for condition monitoring.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tNBt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F101744e9-082a-41d7-bee5-6a6dc50c02ab_800x418.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tNBt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F101744e9-082a-41d7-bee5-6a6dc50c02ab_800x418.jpeg 424w, https://substackcdn.com/image/fetch/$s_!tNBt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F101744e9-082a-41d7-bee5-6a6dc50c02ab_800x418.jpeg 848w, https://substackcdn.com/image/fetch/$s_!tNBt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F101744e9-082a-41d7-bee5-6a6dc50c02ab_800x418.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!tNBt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F101744e9-082a-41d7-bee5-6a6dc50c02ab_800x418.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tNBt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F101744e9-082a-41d7-bee5-6a6dc50c02ab_800x418.jpeg" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/101744e9-082a-41d7-bee5-6a6dc50c02ab_800x418.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tNBt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F101744e9-082a-41d7-bee5-6a6dc50c02ab_800x418.jpeg 424w, https://substackcdn.com/image/fetch/$s_!tNBt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F101744e9-082a-41d7-bee5-6a6dc50c02ab_800x418.jpeg 848w, https://substackcdn.com/image/fetch/$s_!tNBt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F101744e9-082a-41d7-bee5-6a6dc50c02ab_800x418.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!tNBt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F101744e9-082a-41d7-bee5-6a6dc50c02ab_800x418.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Our engine has four components:</p><ul><li><p>A data collector deployed on the device (in our case, a Raspberry Pi connected via an ethernet cable to the actual component) would take care of the ETL process of transforming raw data generated from the device&#8217;s sensors into messages that we can consume downstream. 
This component is our <em><strong>data ingestion engine.</strong></em></p></li><li><p>An anomaly detector sitting on the edge (in our case, on the Pi) converts the raw data signals into what we refer to as a &#8220;feature vector&#8221;: this is simply a list of metrics that describe various properties of the raw signal. Just as the signal varies over time, so do its properties, so features must be recomputed every few seconds, using either a sliding window implementation or some form of exponential decay. This component is called the <em><strong>feature extraction engine.</strong></em></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ixvi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F331fe083-d658-41a2-a4bd-bf7b634a1c23_549x417.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ixvi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F331fe083-d658-41a2-a4bd-bf7b634a1c23_549x417.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ixvi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F331fe083-d658-41a2-a4bd-bf7b634a1c23_549x417.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ixvi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F331fe083-d658-41a2-a4bd-bf7b634a1c23_549x417.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ixvi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F331fe083-d658-41a2-a4bd-bf7b634a1c23_549x417.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ixvi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F331fe083-d658-41a2-a4bd-bf7b634a1c23_549x417.jpeg" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/331fe083-d658-41a2-a4bd-bf7b634a1c23_549x417.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ixvi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F331fe083-d658-41a2-a4bd-bf7b634a1c23_549x417.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ixvi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F331fe083-d658-41a2-a4bd-bf7b634a1c23_549x417.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ixvi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F331fe083-d658-41a2-a4bd-bf7b634a1c23_549x417.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ixvi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F331fe083-d658-41a2-a4bd-bf7b634a1c23_549x417.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Figure 1: an example of the feature extraction engine. 
A periodic step signal can be summarised using three metrics: the period, the amplitude or range, and the level of noise between jumps. In a real-life process these may vary over time, so they need to be computed continuously.</figcaption></figure></div><ul><li><p>The features are then scored against a probabilistic profile of what normal behaviour should look like. At the first instance of &#8220;abnormal&#8221; behaviour, we raise a flag. We refer to this edge component as the <em><strong>decision engine</strong></em>, and it also forms part of our on-the-edge stack.</p></li><li><p>The data (appropriately compressed and summarised) are also uploaded in regular batches to a cloud instance of our <em><strong>learning engine.</strong></em> This engine is responsible for profiling each device and maintaining these profiles as more data accumulate. With every new data batch, the engine refreshes the device profiles, and if any of them changes significantly, the updated profile is deployed to the IoT device in the form of a &#8220;profile patch&#8221;.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IoKi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0742e73b-596b-4eb7-bf44-5592d7f9e246_602x460.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IoKi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0742e73b-596b-4eb7-bf44-5592d7f9e246_602x460.jpeg 424w, https://substackcdn.com/image/fetch/$s_!IoKi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0742e73b-596b-4eb7-bf44-5592d7f9e246_602x460.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!IoKi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0742e73b-596b-4eb7-bf44-5592d7f9e246_602x460.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!IoKi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0742e73b-596b-4eb7-bf44-5592d7f9e246_602x460.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IoKi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0742e73b-596b-4eb7-bf44-5592d7f9e246_602x460.jpeg" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0742e73b-596b-4eb7-bf44-5592d7f9e246_602x460.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IoKi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0742e73b-596b-4eb7-bf44-5592d7f9e246_602x460.jpeg 424w, https://substackcdn.com/image/fetch/$s_!IoKi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0742e73b-596b-4eb7-bf44-5592d7f9e246_602x460.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!IoKi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0742e73b-596b-4eb7-bf44-5592d7f9e246_602x460.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!IoKi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0742e73b-596b-4eb7-bf44-5592d7f9e246_602x460.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Figure 2: A visual summary of our architecture.</figcaption></figure></div><p>The end product was designed to closely match Industry 4.0 specs:</p><ul><li><p><strong>Real-time alerts</strong>, without dependency on always-on cloud connections (if that sounds too stringent, consider a gas pipeline in the desert).</p></li><li><p><strong>High-Bandwidth, Low-CPU </strong><em><strong>edge</strong></em><strong> components</strong>: the Raspberry Pi was connected via an Ethernet cable, but is nowhere near as powerful as our Amazon cloud servers. Thankfully, it doesn&#8217;t need to be: once the device has been profiled, comparing its latest measurements against the profile is computationally very cheap (a little like going through a decision tree).</p></li><li><p><strong>Low-Bandwidth, High-CPU </strong><em><strong>cloud</strong></em><strong> components</strong>: we strove to compress the raw data signals as much as possible before uploading them. This is critical because, as the IoT proliferates, data production will once again outstrip our ability to store data, or at any rate to process it from cold storage. As an indication, a single device with 8 components, each carrying 8 sensors that produce one measurement per millisecond, generated 250KB of data per second. This amounts to 21GB per day&#8202;&#8212;&#8202;and that&#8217;s just for one device. We expect to support thousands. 
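</p><p>As a quick sanity check, the data-rate figures above can be reproduced with a few lines of Python. This is a minimal sketch under one assumption the post does not state: that each measurement is stored as a 4-byte float32. The constant and function names are hypothetical, chosen only for illustration.</p>

```python
# Back-of-the-envelope raw data rate for one device, using the figures quoted above.
# ASSUMPTION (not stated in the post): each measurement is a 4-byte float32.
COMPONENTS = 8
SENSORS_PER_COMPONENT = 8
SAMPLES_PER_SECOND = 1_000      # one measurement per millisecond per sensor
BYTES_PER_SAMPLE = 4            # assumed float32 encoding
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_second():
    """Raw bytes produced per second by all sensors on a single device."""
    return COMPONENTS * SENSORS_PER_COMPONENT * SAMPLES_PER_SECOND * BYTES_PER_SAMPLE


def bytes_per_day():
    """Raw bytes produced per day by a single device."""
    return bytes_per_second() * SECONDS_PER_DAY


if __name__ == "__main__":
    rate = bytes_per_second()
    daily = bytes_per_day()
    # Report in binary units (KiB, GiB), which is how the quoted figures line up.
    print(f"{rate:,} bytes/s = {rate / 1024:.0f} KiB/s")
    print(f"{daily:,} bytes/day = {daily / 1024**3:.1f} GiB/day")
```

<p>Running it prints <code>256,000 bytes/s = 250 KiB/s</code> and <code>22,118,400,000 bytes/day = 20.6 GiB/day</code>, which matches the 250KB per second and roughly 21GB per day quoted above if the units are read as binary. Across a thousand such devices, that is over 20 TB of raw data per day, hence the emphasis on compressing before upload.</p><p>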
We used two main techniques to achieve meaningful compression: feature extraction itself compresses by orders of magnitude, and can be followed by <em>active learning</em> techniques, whereby only the parts of the raw signal that fail to conform to our device profile are communicated upstream, on the grounds that they are more likely to convey new information.</p></li></ul><h3>Context, context,&nbsp;context</h3><p>Anomaly detection solutions are often plagued by the &#8220;cry wolf&#8221; problem: they tend to raise too many alerts, so that eventually people stop paying attention to them, which renders them useless. The sophistication of the algorithms and the robustness of the implementation are critical in overcoming this problem, but a factor that is often overlooked is <em><strong>context</strong></em>: the human expert may be aware of information that the algorithm is not. In our use case, the &#8220;context&#8221; of a device&#8217;s operation comprises the environmental conditions in which it operates, as well as its operational mode: what sort of task has it been assigned to perform? 
Needless to say, a machine learning algorithm tasked with profiling a robotic arm that has never before moved upwards will raise a big red flag the first time it sees the arm move upwards, much to the confusion of the human operator, who is of course aware that such a move is both possible and normal.</p><blockquote><p>We overcome this problem by allowing our profiles to be <em><strong>context-specific</strong></em>: along with the signal features, the learning and decision engines also receive data about the state and configuration of the device, and condition the behavioural profile on this information.</p></blockquote><h3>Labels</h3><p>Our toolbox is designed to operate without the need for any labels: there is no requirement for an expert to manually tag certain data segments as corresponding to &#8220;abnormal&#8221; behaviour and everything else as &#8220;normal&#8221;. This is known as an <em><strong>unsupervised </strong></em>approach, in contrast to <em><strong>supervised </strong></em>approaches, where the algorithm learns to discriminate between examples labelled &#8220;normal&#8221; and examples labelled &#8220;abnormal&#8221;.</p><p>The benefit of an unsupervised approach is that it does not rely on expensive manual labelling, and is not overly focused on past examples of abnormal behaviour. This helps it generalise better to faults it has not seen before. On the other hand, unsupervised techniques tend to be somewhat less accurate than supervised techniques on faults of a type that has already been seen in the historical dataset.</p><p>In the case of predictive maintenance, it is possible to take a number of common faults and try to reproduce them in a lab, so as to produce a labelled dataset. This is still an expensive process, but one that happens anyway as part of testing components for wear and tear.
So, in a sense, a certain amount of labelled data is available anyway.</p><p>This gave us the chance to switch to what is known as <em><strong>semi-supervised </strong></em>mode, where our profile is instantiated via a classifier that assigns each signal to one of three classes: coming from a healthy device; from a device experiencing a failure of a known type (i.e., one that has been seen before in the labelled dataset); or from a device experiencing a failure of an unknown type (detected in unsupervised fashion). This covered all of our industrial partners&#8217; requirements, so the pilot was on.</p><h3>Results</h3><p>When we received the first data batch, I was actually under the weather. My team pinged me on Slack, and I crawled out of bed to my computer to check our first set of results. I had to get out of bed because &#8220;results&#8221; at Mentat are never a single number: our company has inherited from its founders an almost masochistic discipline in applying best practices to performance assessment. Our CEO developed this habit while devising statistically driven trading strategies, where overestimating predictive accuracy during backtesting could cost you dearly. I, on the other hand, spent much of my time in academia researching <a href="http://www.hmeasure.net">this very topic</a>. In any case, our results summaries usually need a big screen to review.</p><p>But this time, they didn&#8217;t. At a glance, we knew this was a huge success.
A total of 14 different metrics all agreed that we had reached <strong>99% accuracy</strong> (no matter how you measured it) with just 5% of the data.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hnhQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98b3b0b4-99b6-4b75-b673-c92abf453e01_334x391.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hnhQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98b3b0b4-99b6-4b75-b673-c92abf453e01_334x391.png 424w, https://substackcdn.com/image/fetch/$s_!hnhQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98b3b0b4-99b6-4b75-b673-c92abf453e01_334x391.png 848w, https://substackcdn.com/image/fetch/$s_!hnhQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98b3b0b4-99b6-4b75-b673-c92abf453e01_334x391.png 1272w, https://substackcdn.com/image/fetch/$s_!hnhQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98b3b0b4-99b6-4b75-b673-c92abf453e01_334x391.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hnhQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98b3b0b4-99b6-4b75-b673-c92abf453e01_334x391.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/98b3b0b4-99b6-4b75-b673-c92abf453e01_334x391.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hnhQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98b3b0b4-99b6-4b75-b673-c92abf453e01_334x391.png 424w, https://substackcdn.com/image/fetch/$s_!hnhQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98b3b0b4-99b6-4b75-b673-c92abf453e01_334x391.png 848w, https://substackcdn.com/image/fetch/$s_!hnhQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98b3b0b4-99b6-4b75-b673-c92abf453e01_334x391.png 1272w, https://substackcdn.com/image/fetch/$s_!hnhQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98b3b0b4-99b6-4b75-b673-c92abf453e01_334x391.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Figure 3: when 14 different metrics think you can&#8217;t do much&nbsp;better.</figcaption></figure></div><p>Across several different modes of operation, and facing several different types of faults, the machine learning algorithms were (almost) always able to detect when something went wrong. 
Perhaps it was a little excess noise in a certain part of the signal, or a slightly earlier onset; or perhaps something more complex, such as a spike in one of the 64 sensor feeds not being immediately followed by a spike in another one, as it usually is. Our feature engine could detect all of these signatures, and many more: tens of thousands of them, in fact.</p><p>In truth, we were relieved to see our metrics reach 99% without any fine-tuning. A research paper might impress with accuracy in the low 90s, but we knew full well that probability is unforgiving at scale. If you are monitoring a single device over a period of a few days (as was the case in the historical dataset we were given), then 99% accuracy is more than sufficient. If, however, you monitor thousands of devices over months or years, even a 1% error rate adds up to a steady stream of false alarms, so you had better start off with at least 99% accuracy, otherwise <em>alert fatigue </em>will soon kick in.</p><blockquote><p>Our results confirmed not so much that the problem was &#8220;solved&#8221; as that it is &#8220;solvable&#8221; with our choice of technology: a combination of sophisticated machine learning algorithms with built-in failsafes to handle temporal variation, and a lean edge-cloud architecture.</p></blockquote><h3>The Future of Industrial IoT</h3><p>The technical success of this initial proof of concept can only hint at its powerful implications for the future of the manufacturing industry.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gmoD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4703499e-fb8a-46c9-8130-82371054773c_800x373.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp"
srcset="https://substackcdn.com/image/fetch/$s_!gmoD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4703499e-fb8a-46c9-8130-82371054773c_800x373.png 424w, https://substackcdn.com/image/fetch/$s_!gmoD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4703499e-fb8a-46c9-8130-82371054773c_800x373.png 848w, https://substackcdn.com/image/fetch/$s_!gmoD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4703499e-fb8a-46c9-8130-82371054773c_800x373.png 1272w, https://substackcdn.com/image/fetch/$s_!gmoD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4703499e-fb8a-46c9-8130-82371054773c_800x373.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gmoD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4703499e-fb8a-46c9-8130-82371054773c_800x373.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4703499e-fb8a-46c9-8130-82371054773c_800x373.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!gmoD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4703499e-fb8a-46c9-8130-82371054773c_800x373.png 424w, https://substackcdn.com/image/fetch/$s_!gmoD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4703499e-fb8a-46c9-8130-82371054773c_800x373.png 848w, https://substackcdn.com/image/fetch/$s_!gmoD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4703499e-fb8a-46c9-8130-82371054773c_800x373.png 1272w, https://substackcdn.com/image/fetch/$s_!gmoD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4703499e-fb8a-46c9-8130-82371054773c_800x373.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Change brings both opportunity and danger. The opportunities for manufacturers are obvious: better customer support, differentiation from the competition, software revenues via SaaS applications attached to their devices, better understanding of usage patterns, and more uptime. The dangers are just as obvious: brands that have built a reputation for solid quality over generations will be competing with more agile manufacturers, who may be able to price more aggressively on a cost-plus basis by generating additional revenue from intelligent software services.</p><blockquote><p>We feel a seismic shift will take place in the manufacturing sector over the next few years. The cards will be dealt again, and the game will have new&nbsp;rules.</p></blockquote>]]></content:encoded></item><item><title><![CDATA[Drones : The Flying IoT]]></title><description><![CDATA[The fusion of the IoT with Artificial Intelligence is driving a new Industrial Revolution.
A key ingredient in this transformation is&#8230;]]></description><link>https://blog.ment.at/p/drones-the-flying-iot-47c741c51feb</link><guid isPermaLink="false">https://blog.ment.at/p/drones-the-flying-iot-47c741c51feb</guid><dc:creator><![CDATA[George Cotsikis]]></dc:creator><pubDate>Fri, 26 Feb 2016 11:57:43 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/7a390542-4a21-4826-bd44-0a8f07c917d4_480x480.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The fusion of the IoT with Artificial Intelligence is driving a new Industrial Revolution. A key ingredient in this transformation is autonomy: as learning machines gain sophistication and experience, they reach a point where they can be trusted to take decisions on their own, without direct or continuous human control. A great example of this can be found in <a href="https://en.wikipedia.org/wiki/Unmanned_aerial_vehicle">drones</a>.</p><blockquote><p>At <a href="http://www.ment.at">Mentat</a> we view drones as an agile flying platform for advanced sensors, hence the &#8220;Flying&nbsp;IoT&#8221;.</p></blockquote><p>Consider the example of a remote wind farm that requires regular visual inspection of its wind turbines to detect and monitor cracks on the blades. This used to require an engineer to visit the site regularly for in-person inspections. A safer alternative would involve camera drones, remotely controlled from the ground by a qualified engineer.
In this setup, however, the human remains the bottleneck: the physical presence of a suitably qualified engineer/pilot is still required, which is costly.</p><p>An altogether superior solution would involve an autonomous drone, scheduled to perform a visual inspection of the entire farm at regular intervals, paying special attention to existing cracks, and optimising its flight path to take into account wind conditions, which can be particularly challenging in a wind farm due to local vortices produced by the turning blades.</p><p>Many other use cases will be revolutionised by the advent of autonomous drones: inspecting forest areas for early detection of wildfires; surveillance in security deployments; collecting traffic statistics and managing first responders in urban environments; delivering products in industrial or retail scenarios, or critical supplies in emergency management scenarios. The list is endless, and every day seems to bring another great use case for drones.</p><blockquote><p>However, autonomous does not and should not mean entirely unsupervised. Miscalculations or faults in a drone&#8217;s on-board logic might cause it to fail its objective, or even to become a liability. The question then becomes: as autonomous drone technology scales, what sort of monitoring technology can scale with it and ensure safety and quality control without introducing bottlenecks? Can we learn automatically from the trajectories and provide an intelligence layer for drones?</p></blockquote><p>The aim must be to minimise the human-to-drone ratio in any use case. One can envisage an alerting system that monitors all drones, prioritises them by degree of concern, and asks for human input only in the most urgent cases.</p><p>Analogous alerting systems are already in place in other areas where software agents enjoy some degree of autonomy, such as robotic installations in manufacturing plants, and IT/cybersecurity.
However, drones are a special case, as they generate masses of live, fast-moving geo-temporal data in the form of their 3D GPS tracks. Despite the abundance of GIS systems for storing such data, powerful solutions for analysing it are not readily available. This is a common theme in Big Data: easy to store, much harder to analyse! Furthermore, this analysis must happen at the drone level (at the edge of the network) rather than over a data downlink (i.e., by sending data to an on-premise or cloud-based server).</p><p>The current state of the art involves a technique known as <a href="https://en.wikipedia.org/wiki/Geo-fence">geo-fencing</a>, whereby a specific, hard-coded area of space is manually specified in the monitoring platform, and an alert is generated if any drone leaves that area. This technique cannot scale. If one tries to fit the geo-fenced area tightly around the drone&#8217;s route, the manual configuration step in effect fixes the drone&#8217;s trajectory, which stands in the way of drone autonomy (a good analogy to keep in mind is the difference between a self-driving car and a tram running on predefined rails).</p><blockquote><p>The missing ingredient is a system designed to monitor intelligent agents. That system must itself be intelligent! Our solution learns the preferred trajectories of each drone from the GPS tracks it generates, without any need for manual configuration.</p></blockquote><p>When a drone violates its learned trajectory profile, in geospatial or temporal terms, an alert is generated. Such a system offers great flexibility. First, it decouples monitoring from drone configuration, so that if a drone is suddenly reassigned to a different task, the system will initially raise an alert but then quickly adapt to the new route without the need for human reconfiguration. Second, it can profile drones controlled by entirely separate systems or entities, as long as it can catch a glimpse of their GPS tracks.
The learning algorithms can detect abnormal behaviour both at the macro level (&#8220;this drone is heading into territory it has never accessed before&#8221;) and at the micro level (&#8220;this drone is performing odd manoeuvres that might indicate loss of stability or malfunction&#8221;). Our platform naturally overcomes some of the common shortcomings of geo-fencing: for example, we can detect abnormal direction and/or speed, not just abnormal location (e.g., when a drone is within the bounds of its normal trajectory but is moving in the opposite direction to usual). We can also learn the manoeuvres that surveillance drones typically employ to cover an area (such as the lawnmower or spiral patterns; see use case 2, related to agriculture).</p><p>We pay due respect to the great forerunner of our system, target tracking software, but have extended its methodology considerably. Broadly speaking, the mathematics underlying classical target tracking solutions is exceptionally accurate at forecasting the trajectories of ballistic objects (where an initial or constant force defines the trajectory of an object such as a missile), and at short-term forecasting for autonomous objects with known constraints on their manoeuvrability (consider the difference between a fighter jet&#8217;s evasive manoeuvre and the agility of a bumble bee). To track drones, one must instead use a more flexible methodology, inspired by advances in machine learning.</p><p>Below we include two video demonstrations. We use a 3D Unity front-end coupled with our streaming machine learning engine.
This allows us to perform live demos, but it also reflects our view that a VR front-end is the right choice here: human supervisors need to understand drone trajectories in the context of the physical terrain they are navigating, but a constant live video feed is unrealistic due to battery limitations, bandwidth constraints and cybersecurity concerns.</p><p>In the first video, we demonstrate the system&#8217;s ability to learn a 3D trajectory from scratch. The cones in the video indicate the track the user is expected to follow with the controller (to help the &#8220;pilot&#8221; visualise the track), but the system has no prior knowledge of that track, which explains why an alert is generated every time the drone turns during its first lap. However, as the drone repeats the track, the frequency of alerts (shown in the bottom left corner as a percentage of time, and visually in the bottom right corner) decreases dramatically. At the end of the video, an excursion of the drone outside its typical trajectory is immediately flagged as anomalous, even though the drone remains inside the track, where a classical geo-fencing solution would have missed this departure.</p><p>In the second video, we show the ability of the algorithms to understand patterns, i.e., recurring sets of manoeuvres commonly employed by drones trying to cover an area. Above is a real GPS track from a drone in an agricultural use case, where a so-called &#8220;lawnmower&#8221; pattern is employed to cover the area. Here we focus on a collision alert use case, where one drone (depicted in red in the video) is attempting to cross a region currently being surveyed by another drone (depicted in green in the video). The trajectory of each drone is determined autonomously, depending on its objective, the weather conditions and its battery power, as well as potentially other, more complex criteria.
The objective of the monitoring agent is to forecast potential collisions between the two drones.</p><p>Clearly, the challenge here is to understand the recurring lawnmower pattern, typical of precision agriculture. Although the pattern is quite clear to a human, it poses a great challenge to automated forecasters: it switches from linear motion (during take-off) to a recurring pattern, it involves a 3D diagonal movement which renders it asymmetric, and it is awash with small departures and delays caused by wind. Moreover, the use case requires us to pick up the pattern very quickly, after just one or two repetitions. These features and requirements defeat virtually any classical &#8220;periodicity detector&#8221; found in off-the-shelf time series or target tracking packages.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gyss!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F575afd5d-a794-44ad-8550-a54487a1f575_480x480.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gyss!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F575afd5d-a794-44ad-8550-a54487a1f575_480x480.png 424w, https://substackcdn.com/image/fetch/$s_!gyss!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F575afd5d-a794-44ad-8550-a54487a1f575_480x480.png 848w, https://substackcdn.com/image/fetch/$s_!gyss!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F575afd5d-a794-44ad-8550-a54487a1f575_480x480.png 1272w,
https://substackcdn.com/image/fetch/$s_!gyss!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F575afd5d-a794-44ad-8550-a54487a1f575_480x480.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gyss!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F575afd5d-a794-44ad-8550-a54487a1f575_480x480.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/575afd5d-a794-44ad-8550-a54487a1f575_480x480.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gyss!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F575afd5d-a794-44ad-8550-a54487a1f575_480x480.png 424w, https://substackcdn.com/image/fetch/$s_!gyss!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F575afd5d-a794-44ad-8550-a54487a1f575_480x480.png 848w, https://substackcdn.com/image/fetch/$s_!gyss!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F575afd5d-a794-44ad-8550-a54487a1f575_480x480.png 1272w, 
https://substackcdn.com/image/fetch/$s_!gyss!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F575afd5d-a794-44ad-8550-a54487a1f575_480x480.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Real Time Geospatial Trajectory Forecaster</figcaption></figure></div><p>Our detector, by contrast, does a great job of picking up the repeating pattern very early. The smoothness and accuracy of the forecast improve over time as the drone settles into its pattern, and the forecast can extend over very long horizons without losing accuracy. This is in sharp contrast to most commercial target tracking software, which can only forecast over long horizons when the target exhibits what is technically known as &#8220;second-order stationarity&#8221;. In practical terms, this means the &#8220;steering wheel and gas pedal&#8221; are held fixed (constant angular and linear acceleration).</p><p>It&#8217;s important to note that we don&#8217;t raise an alert as soon as the two forecasts start intersecting. Our prediction is in fact 4D, since it takes time into account (i.e., the speed of the drones). Forecasts are therefore allowed to overlap, as long as the drones never occupy <em>the same space at the same time</em>. That is why only one of several intersections between the light blue and light red forecast trajectories triggers an alert, and indeed it is the only one that would have led to a collision, as becomes evident near the end of the video.</p><p>The action that follows a collision alert depends on the use case. It could trigger a human override in a centralised control scenario, or interact with the software agents on the drones in a distributed control scenario to avoid potential collisions. In this video simulation, the red drone avoids the collision by temporarily moving higher.
This capability becomes particularly useful in scenarios involving multiple drones in challenging environmental conditions.</p><p>We are particularly grateful to have had the support of <a href="https://www.ordnancesurvey.co.uk/">Ordnance Survey</a> (<a href="https://www.geovation.org.uk/">Geovation</a>) on the geospatial modelling side and of <a href="https://www.gov.uk/government/organisations/innovate-uk">InnovateUK</a> on the augmented &amp; virtual reality side in bringing this higher-risk feasibility study to fruition. We are working on a number of projects related to drone data (raw and sensor/imaging), which we will share soon.</p>]]></content:encoded></item><item><title><![CDATA[Streaming Random Forest]]></title><description><![CDATA[As promised in our previous blog post (Flavours of Streaming Processing), in this post we report the performance of our weapon of choice&#8230;]]></description><link>https://blog.ment.at/p/streaming-random-forest-90d39277d71f</link><guid isPermaLink="false">https://blog.ment.at/p/streaming-random-forest-90d39277d71f</guid><dc:creator><![CDATA[George Cotsikis]]></dc:creator><pubDate>Mon, 01 Feb 2016 13:07:22 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/8072a00d-98e9-4594-bfc3-b10f38ce4ecc_638x638.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As promised in our previous blog post (Flavours of Streaming Processing), in this post we report the performance of our weapon of choice when it comes to classification on data streams: the <em>Streaming Random Forest</em> (SRF).</p><p><a href="https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm">Random Forests</a> were introduced by Leo Breiman in 2001.
They combine two of the most powerful ideas in classification, decision trees and bagging, to represent a decision rule as a kind of majority vote over a large number of different decision trees that are generated probabilistically.</p><p>However, Random Forests scale poorly with the size of the dataset. This makes them impractical in streaming contexts: like nearest neighbours, Random Forests need multiple passes over the full data history in order to update their predictions whenever a new data point arrives. Eventually, no matter how fast your hardware, Random Forests will become slower than your data arrival rate; and no matter how big your memory, they will outgrow it, at which point you will have to scale out. This is now possible using tools like <a href="https://spark.apache.org/docs/1.2.0/mllib-ensembles.html">Apache Spark MLlib</a>, but the fundamental problem, that the algorithm becomes slower as the data grows, remains no matter how advanced the infrastructure.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2G8x!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96846f23-96b2-4c5b-8b97-10f96c319939_638x638.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2G8x!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96846f23-96b2-4c5b-8b97-10f96c319939_638x638.png 424w, https://substackcdn.com/image/fetch/$s_!2G8x!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96846f23-96b2-4c5b-8b97-10f96c319939_638x638.png 848w,
https://substackcdn.com/image/fetch/$s_!2G8x!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96846f23-96b2-4c5b-8b97-10f96c319939_638x638.png 1272w, https://substackcdn.com/image/fetch/$s_!2G8x!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96846f23-96b2-4c5b-8b97-10f96c319939_638x638.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2G8x!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96846f23-96b2-4c5b-8b97-10f96c319939_638x638.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/96846f23-96b2-4c5b-8b97-10f96c319939_638x638.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2G8x!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96846f23-96b2-4c5b-8b97-10f96c319939_638x638.png 424w, https://substackcdn.com/image/fetch/$s_!2G8x!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96846f23-96b2-4c5b-8b97-10f96c319939_638x638.png 848w, 
https://substackcdn.com/image/fetch/$s_!2G8x!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96846f23-96b2-4c5b-8b97-10f96c319939_638x638.png 1272w, https://substackcdn.com/image/fetch/$s_!2G8x!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96846f23-96b2-4c5b-8b97-10f96c319939_638x638.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p><strong>Everything Changes</strong></p><p>So one reason to switch to online Random Forests is speed and scalability. The other is that decision rules that were valid last week might be obsolete today. An immediate &#8220;hack&#8221; is to retrain your classifier at regular intervals, using a fixed number of the most recent observations (a sliding window). This is a great quick fix, but it suffers from two drawbacks:</p><ul><li><p>Your classifier is out-of-date most of the time (except just after retrains!).</p></li><li><p>The window size is a critical parameter for performance.</p></li></ul><p>To demonstrate this latter issue, we run an experiment on a dataset known as <a href="http://moa.cms.waikato.ac.nz/datasets/">ELEC2</a>, which has become a benchmark in the streaming data literature. Each record contains a timestamp, as well as four covariates capturing aspects of electricity demand and supply in the Australian New South Wales (NSW) Electricity Market from May 1996 to December 1998. Labels indicate the price change relative to a recent moving average.</p><p>Below we report the error rate (lower is better) achieved by a sliding-window implementation of random forests using the <a href="https://cran.r-project.org/web/packages/randomForest/index.html">randomForest</a> package in R, for 8 different window sizes (left group of bars in the Figure below). 
When the dataset is ordered by timestamp, the best-performing window size is 100, at the lower end of the scale. This is a classic case of &#8220;more data does not equal more information&#8221;: using 100 times more data (w=10,000 vs w=100) raises the error rate to 175% of its value at w=100, almost doubling it!</p><p>To drive the point home, we took the same data but presented it to the classifier in random order, so that it was no longer possible to take advantage of temporal effects. In this case, without any temporal effects, accuracy did indeed improve with larger window sizes.</p><p>The advantage that a well-calibrated streaming method can have over its offline counterpart is quite dramatic: in this case, the best-performing streaming classifier has an error rate of 12%, whereas a random forest trained on the entire dataset (minus 10% withheld for testing) has an error rate of 24%. <strong>A fraction of the data, half the errors!</strong></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AePz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aa95a39-0182-4387-bdc2-82bc1601be6b_680x680.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AePz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aa95a39-0182-4387-bdc2-82bc1601be6b_680x680.png 424w, https://substackcdn.com/image/fetch/$s_!AePz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aa95a39-0182-4387-bdc2-82bc1601be6b_680x680.png 848w, 
https://substackcdn.com/image/fetch/$s_!AePz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aa95a39-0182-4387-bdc2-82bc1601be6b_680x680.png 1272w, https://substackcdn.com/image/fetch/$s_!AePz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aa95a39-0182-4387-bdc2-82bc1601be6b_680x680.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AePz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aa95a39-0182-4387-bdc2-82bc1601be6b_680x680.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3aa95a39-0182-4387-bdc2-82bc1601be6b_680x680.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AePz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aa95a39-0182-4387-bdc2-82bc1601be6b_680x680.png 424w, https://substackcdn.com/image/fetch/$s_!AePz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aa95a39-0182-4387-bdc2-82bc1601be6b_680x680.png 848w, 
https://substackcdn.com/image/fetch/$s_!AePz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aa95a39-0182-4387-bdc2-82bc1601be6b_680x680.png 1272w, https://substackcdn.com/image/fetch/$s_!AePz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aa95a39-0182-4387-bdc2-82bc1601be6b_680x680.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><p>So it seems that you can have your cake and eat it. Or can you? Well, you can only achieve this fantastic result if the window size is well-tuned. And that&#8217;s a tricky business&#8230; Set it too small, and you will have to learn everything from scratch every few observations. Set it too large, and obsolete patterns will start misleading you. To make things worse, a technique like random forest is a composition of thousands of decision rules, some of which may become obsolete faster than others.</p><p><em>Put simply, any model that runs on a live stream must be updated regularly, but there are fundamentally better ways to update a model than to rebuild it from scratch. A bit like a human, successful online learning algorithms recognise that they have a finite memory, and that some patterns &#8220;age&#8221; faster than others. 
The brute force of a sliding window cannot take you very far in real applications.</em></p><p><strong>The Mentat SRF</strong></p><p>Mentat&#8217;s Streaming Random Forest has three unique features:</p><ol><li><p>It makes use of cutting-edge online learning techniques to stay up to date by letting patterns &#8220;age&#8221; at different rates.</p></li><li><p>It has a fixed memory footprint: a model update takes the same amount of time whether the model has seen 10 datapoints or 10 million.</p></li><li><p>It uses an &#8220;ensemble-of-ensembles&#8221; technique, combining a streaming, self-tuning implementation of random forests with a number of other, simpler but highly agile classifiers. These classifiers can pick up short-term linear trends much faster than a complex technique like a random forest, while allowing the latter to extract deeper patterns from the residual signal.</p></li></ol><p>Indeed, in the example above, SRF matches the best-performing window without any manual tuning whatsoever. This performance matches the state of the art in the literature (see a related publication by our Chief Data Scientist <a href="https://www.researchgate.net/profile/Nicos_Pavlidis/publication/262396201_Online_linear_and_quadratic_discriminant_analysis_with_adaptive_forgetting_for_streaming_classification/links/545ca0a20cf27487b44b98df.pdf">here</a>).</p><p><strong>Empirical Study: Detecting Malware</strong></p><p>The KDD Cup competition of 1999 introduced a benchmark dataset for classification on large datasets. 
It considered five types of traffic:</p><ul><li><p>Normal</p></li><li><p>Attack Type 1: Probe</p></li><li><p>Attack Type 2: Denial-Of-Service (DOS)</p></li><li><p>Attack Type 3: user-to-root</p></li><li><p>Attack Type 4: remote-to-local</p></li></ul><p>This is an example of a <em>multi-class classification</em> problem: we don&#8217;t simply need to detect malicious traffic, but want to know what type of attack it is in real time, so that we can take automatic action.</p><p>The competition revealed that Probe and DOS traffic can be distinguished from normal traffic by very simple methods, such as a nearest neighbours classifier: the <a href="http://cseweb.ucsd.edu/~elkan/clresults.html">winning method</a> only managed to improve accuracy in these cases by a few percentage points. User-to-root and remote-to-local attacks were much more challenging, with the winning method achieving error rates of 92% and 87% respectively; more than 8 out of 10 examples were misclassified in both cases.</p><p>Below we compare SRF to the winning method in terms of error rate (lower is better):</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vPJp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6929476-0d82-4eca-a997-ce23902de1e1_710x710.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vPJp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6929476-0d82-4eca-a997-ce23902de1e1_710x710.png 424w, https://substackcdn.com/image/fetch/$s_!vPJp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6929476-0d82-4eca-a997-ce23902de1e1_710x710.png 848w, 
https://substackcdn.com/image/fetch/$s_!vPJp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6929476-0d82-4eca-a997-ce23902de1e1_710x710.png 1272w, https://substackcdn.com/image/fetch/$s_!vPJp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6929476-0d82-4eca-a997-ce23902de1e1_710x710.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vPJp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6929476-0d82-4eca-a997-ce23902de1e1_710x710.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a6929476-0d82-4eca-a997-ce23902de1e1_710x710.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vPJp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6929476-0d82-4eca-a997-ce23902de1e1_710x710.png 424w, https://substackcdn.com/image/fetch/$s_!vPJp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6929476-0d82-4eca-a997-ce23902de1e1_710x710.png 848w, 
https://substackcdn.com/image/fetch/$s_!vPJp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6929476-0d82-4eca-a997-ce23902de1e1_710x710.png 1272w, https://substackcdn.com/image/fetch/$s_!vPJp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6929476-0d82-4eca-a997-ce23902de1e1_710x710.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>When it comes to DOS and Probe, there wasn&#8217;t much to improve upon: the baseline method already achieved error rates below 5%. However, for remote-to-local and user-to-root, the &#8217;99 winner improved upon the baseline significantly (red is lower than blue in the last two columns). SRF blows it out of the water, with less than half the errors for remote-to-local, and a 40% decrease for user-to-root. These improvements look even more impressive when one takes two things into account:</p><ul><li><p>SRF was applied out-of-the-box to this dataset, whereas the &#8217;99 winner was bespoke.</p></li><li><p>SRF maintains a fixed memory footprint.</p></li></ul><p><strong>Description of the API</strong></p><p>The Mentat SRF has a very simple API, with three components:</p><ul><li><p>A config file which describes:</p><ul><li><p>the number of input features, and their type: categorical (e.g., a colour, or profession) or numeric (e.g., height).</p></li><li><p>the number of labels (e.g., &#8220;yes/no&#8221; versus &#8220;malicious/DOS/probe/&#8230;&#8221;).</p></li></ul><p>Don&#8217;t worry: if you don&#8217;t provide a config file, we will guess.</p></li><li><p>A predict call, which submits a new unlabelled datapoint and returns its predicted label. 
Being mathematicians at heart, we never report the label alone, but also provide estimated probabilities, so you know how confident we are in each prediction.</p></li><li><p>An update call, which submits to our API a new datapoint, its timestamp and its label.</p></li></ul><p>To make things easier, we can also accept multiple datapoints in one request. Our web service for both predict and update calls is extremely fast and scalable, and our premium service can support even the most demanding real-time applications via Amazon Kinesis.</p><p>To sum up:</p><ul><li><p>no need for classifier selection and tuning (window size or otherwise)</p></li><li><p>no need for you to host the decision engine for your web app</p></li><li><p>no need for you to monitor the performance of your model and decide when and how to retrain it: our self-tuning engine monitors performance and ensures it remains optimal at all times</p></li><li><p>no need for you to scale out as you get more data</p></li></ul><p>For the enthusiasts out there, we can expose more control parameters, as well as optionally return the updated decision rule as an XML file after an update call. We can also expose monitoring parameters such as the recent error rate or AUC (although we prefer to measure classification performance by the <a href="http://www.hmeasure.net">H-measure</a>), as well as a metric of how fast your data is aging. If you want to be part of the private beta, please <a href="mailto:info@ment.at">get in touch</a>.</p><p><strong>Feature extraction and unstructured data</strong></p><p>Datasets often require a feature extraction step before they can be turned into the right format for classification. This step can be another crucial factor in performance. Our experienced data science team can support this on a consulting basis.</p><p>If your dataset or tasks involve unstructured data, such as free text, images, or networks, <em>do not despair!</em> 
We can support feature extraction steps on all such data types, either using our own bespoke software or via integration with <a href="http://www.alchemyapi.com">Alchemy API</a>, as we are an <a href="http://www.ibm.com/smarterplanet/us/en/ibmwatson/">IBM Watson</a> Ecosystem Partner.</p><p><strong>What else is coming?</strong></p><p>Classification is a supervised technique: a bit like a child, the classifier needs to be shown labelled examples of the ground truth in order to extrapolate to the future. But children also learn by themselves, without explicit guidance, by gradually developing an understanding of the &#8220;normal behaviour&#8221; of their environment, and responding appropriately to any &#8220;anomalous events&#8221;.</p><p>This is also true of machine learning algorithms. For example, in cybersecurity, the most dangerous attacks come in the form of previously unseen types of threats (zero-day attacks). A supervised classifier is by construction hopeless against such novel threats. Similarly, new types of fraud appear every day; new types of bugs are introduced in software; and new customer behaviours can develop on a weekly basis.</p><p>When everything changes, profiling and anomaly detection are a necessary complement to agile, adaptive classification technology. This is why our core product, <em>datastream.io</em>, is designed to perform unsupervised profiling and anomaly detection at scale, as a service. 
Please stay tuned for our next blog post on this topic.</p><p><em>Originally published at <a href="http://www.ment.at">www.ment.at</a> on 10-Dec-2015</em></p>]]></content:encoded></item><item><title><![CDATA[Flavours of Streaming Processing]]></title><description><![CDATA[In his excellent recent blog post &#8220;Streaming 101&#8221;, Tyler Akidau made a great contribution to the streaming community by teasing apart&#8230;]]></description><link>https://blog.ment.at/p/flavours-of-streaming-processing-c472d2454e76</link><guid isPermaLink="false">https://blog.ment.at/p/flavours-of-streaming-processing-c472d2454e76</guid><dc:creator><![CDATA[George Cotsikis]]></dc:creator><pubDate>Mon, 01 Feb 2016 13:03:21 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/aa574e7d-8248-43f8-b961-84517285e0d9_375x75.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In his excellent recent blog post &#8220;<a href="http://radar.oreilly.com/2015/08/the-world-beyond-batch-streaming-101.html">Streaming 101</a>&#8221;, Tyler Akidau made a great contribution to the streaming community by teasing apart certain notions that have been confounded by standard, albeit increasingly obsolete, practices. 
In that same spirit of paving the way for the streaming revolution, this blog post emphasises the observation that <strong>streaming does not necessarily mean approximate</strong>, with a particular focus on machine learning.</p><p>Let&#8217;s start with the simplest possible example, that of a linear sum: it can easily be computed incrementally, in a single pass, by adding each new value to the running total:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!q7o8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78c27541-3f7c-4b11-9bf7-bcf7e7566ccd_375x75.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!q7o8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78c27541-3f7c-4b11-9bf7-bcf7e7566ccd_375x75.png 424w, https://substackcdn.com/image/fetch/$s_!q7o8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78c27541-3f7c-4b11-9bf7-bcf7e7566ccd_375x75.png 848w, https://substackcdn.com/image/fetch/$s_!q7o8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78c27541-3f7c-4b11-9bf7-bcf7e7566ccd_375x75.png 1272w, https://substackcdn.com/image/fetch/$s_!q7o8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78c27541-3f7c-4b11-9bf7-bcf7e7566ccd_375x75.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!q7o8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78c27541-3f7c-4b11-9bf7-bcf7e7566ccd_375x75.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/78c27541-3f7c-4b11-9bf7-bcf7e7566ccd_375x75.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!q7o8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78c27541-3f7c-4b11-9bf7-bcf7e7566ccd_375x75.png 424w, https://substackcdn.com/image/fetch/$s_!q7o8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78c27541-3f7c-4b11-9bf7-bcf7e7566ccd_375x75.png 848w, https://substackcdn.com/image/fetch/$s_!q7o8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78c27541-3f7c-4b11-9bf7-bcf7e7566ccd_375x75.png 1272w, https://substackcdn.com/image/fetch/$s_!q7o8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78c27541-3f7c-4b11-9bf7-bcf7e7566ccd_375x75.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>The answer in this case is exact: it agrees precisely with a batch sum. Such examples are referred to as <strong>exact incremental</strong> or <strong>exact online</strong> processing.</p><p>However, not all queries can be computed exactly in an online fashion. Classical examples are the median and top-N queries (such as the N most frequent items). 
To understand why, it is best to define our terms: what do <strong>incremental</strong> and <strong>online</strong> really mean? Put simply, <strong>they mean that the memory and per-update time requirements of the process do not grow over time</strong>, even though the number of data entries processed does (since data streams are by definition semi-infinite or, in Tyler&#8217;s terminology, unbounded).</p><p>Indeed, to maintain a sum, one only needs to store one floating-point number in memory (the current value of the sum). To maintain an average, one needs to store two: the sum, and a count of how many data entries have been seen so far. However, to maintain an exact median, one needs to keep in memory an array whose size grows with the total number of data entries seen. The same holds for top-N. Consequently, the best we can do in a streaming context is to offer approximate answers to queries such as a median, top-N, or a percentile calculation.</p><p>&#8220;Approximate answers&#8221; has been a longstanding criticism of streaming processing. MapReduce enthusiasts have always argued that, no matter how big your dataset becomes over time, unless you desperately need real-time answers, you should always scale out and run batch to ensure correctness.</p><p>Needless to say, as Tyler also mentions, approximate queries can be implemented in batch mode too, in time-critical contexts. In fact, the initial motivation behind online processing was precisely that: to generate fast approximations by sequentially processing massive datasets.</p><p>However, as soon as we move beyond simple analytics such as sums and top-Ns and into the realm of machine learning, the situation becomes a lot more interesting. Very few batch machine learning algorithms have exact online counterparts (one example is linear regression, which can be computed exactly online via the Recursive Least Squares algorithm). 
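</p><p>To make the contrast between exact and approximate online computation concrete, here is a minimal Python sketch (our own illustration, not production code) of the constant-memory updates described above for a sum and an average, extended to the variance via Welford&#8217;s algorithm. The class name <code>RunningStats</code> is hypothetical. However long the stream, it keeps just four numbers in memory, yet its answers agree with a batch computation up to floating-point rounding; an exact median, by contrast, would require storing every value seen.</p><pre><code>
```python
import statistics


class RunningStats:
    """Exact online sum, mean and sample variance (Welford's algorithm).

    Memory use is constant: four numbers, however long the stream.
    """

    def __init__(self):
        self.n = 0        # number of values seen so far
        self.total = 0.0  # running sum, updated one value at a time
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        """Absorb one new value in O(1) time and memory."""
        self.n += 1
        self.total += x
        delta = x - self.mean
        self.mean += delta / self.n
        # Multiplying by the deviation from the *updated* mean keeps the
        # recursion exact (up to floating-point rounding).
        self.m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance (n - 1 denominator), like statistics.variance."""
        if self.n in (0, 1):
            raise ValueError("variance requires at least two values")
        return self.m2 / (self.n - 1)


# Feed values one at a time, as they would arrive from a stream.
stream = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = RunningStats()
for x in stream:
    stats.update(x)

# Compare the online answers with batch computations over the full history.
print(stats.total, sum(stream))
print(stats.mean, statistics.mean(stream))
print(stats.variance(), statistics.variance(stream))
```
</code></pre><p>Linear regression via Recursive Least Squares follows the same pattern of updating a fixed-size summary, but it is the exception rather than the rule among machine learning algorithms.</p><p>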
Therefore, your favorite batch random forest will only approximately match its online version.</p><p>Does that automatically make the online algorithm inferior, or sub-optimal, compared to the batch version? The answer is a definite (and perhaps surprising) <strong>no!</strong></p><p>A major distinction between analytical queries and machine learning algorithms is that the former are fixed functions of the data, such as a sum or a rank, whereas the latter are processes, rather than functions, whose objective is to discover hidden patterns or rules that generalise well to future datapoints. There is therefore no theoretical reason why a batch random forest must necessarily outperform a well-designed online random forest.</p><p>In the seminal book <a href="http://dl.acm.org/citation.cfm?id=500820">Principles of Data Mining</a>, the chairman of our advisory board, Professor David Hand, and his coauthors make this point really nicely. They break down a data mining algorithm into the following constituents:</p><ul><li><p>The task to be solved (e.g., classification or clustering)</p></li><li><p>The structure of the model/pattern being fit to the data (e.g., decision tree)</p></li><li><p>The scoring function which measures how good the fit is (e.g., squared loss)</p></li><li><p>The search/optimisation algorithm which seeks high-scoring patterns</p></li></ul><p>Possibly the easiest way to understand this is to consider a neural network with a fixed architecture (i.e., number of hidden layers, etc.), which has to be trained on a dataset of labelled examples. Because neural networks are infamous for producing optimisation surfaces with a huge number of local optima, no gradient-based method is guaranteed to identify the global optimum, and, indeed, the conjugate gradient method might produce a different answer than plain gradient descent (GD). In this sense, an online algorithm using Stochastic Gradient Descent (SGD) is just another candidate, which might or might not outperform the batch candidates. 
In any case, it makes no sense to consider the SGD answer as an approximation to the GD answer. In truth, they are both approximations of the unknown &#8220;true&#8221; decision boundary, and, if they are well-designed, they should both approach the &#8220;true&#8221; answer as the size of the dataset increases. Which one gets there faster depends largely on the problem and the class of methods. Put differently, <strong>in machine learning, both batch and online algorithms are approximations that use data and computational resources differently</strong>.</p><p>A clarification is in order at this point: sliding windows are an extremely inefficient way to perform streaming learning, precisely because they are designed as approximations to batch learning, starved of information because they use only a fraction of the available data. Overarching frameworks for online learning, such as stochastic gradient descent and stochastic approximation, instead ensure that information is extracted and stored as efficiently as possible and updated sequentially.</p><p>But surely, you might argue, the batch paradigm has more computational resources and can therefore support superior algorithms? As a general statement this argument makes sense, but in practice several statistical phenomena conspire to make online algorithms extremely competitive. We mention only a few here, and promise to analyse them further in a future post:</p><ul><li><p><strong>Overfitting</strong>. Online learning improves its answers by continuously comparing its predictions with the next datapoint, which it has not yet learned from. This means that, by design, online algorithms are less prone to overfitting than offline methods, where a great deal of discipline is needed to avoid that pitfall.</p></li><li><p><strong>Simulated annealing. 
</strong>When faced with complex optimisation surfaces riddled with multiple optima, the noise introduced by sequential data processing is a very successful way to escape local optima. In fact, online learning was first introduced in the machine learning literature as a way to train neural networks on static (batch) datasets, as it was observed that feeding the data to the learning algorithm sequentially can result in better answers.</p></li><li><p><strong>Temporal drift.</strong> Most real datasets feature a certain amount of temporal variation which might be too subtle or too complex to model explicitly (as opposed, for example, to clear periodicities or jumps). The data decay built into most online learning algorithms offers robustness against this ubiquitous phenomenon, whereas batch algorithms need substantial refactoring to accommodate drift. This is in fact a critical and under-explored point, to which we may return in a subsequent post.</p></li></ul><p>Finally, are streaming algorithms really &#8220;more complicated&#8221;? We think this question is moot. Some models admit simple algorithms whereas others don&#8217;t, and the same holds for streaming versions. 
Moreover, online learning is no longer as fragmented as it was ten years ago, since overarching frameworks such as Stochastic Approximation and Stochastic Gradient Descent have matured to the extent that it is now possible to reuse a lot of components in a streaming machine learning library.</p><p>Let&#8217;s recap our key observations:</p><ul><li><p>In discussing the comparative accuracy of streaming vs batch processing, one must crucially distinguish between analytics and machine learning.</p></li><li><p>For analytical queries, some cases, such as a sum or a standard deviation, can be computed exactly online (i.e., using constant resources), whereas others (e.g., top-N) can only be approximated, albeit very well.</p></li><li><p>In machine learning, both batch and online algorithms offer approximations to an &#8220;unknown truth&#8221; (e.g., a decision boundary or a cluster structure), and should be assessed according to their generalisation ability rather than their in-sample performance, so as to avoid overfitting. Therefore it makes no sense to consider online algorithms as approximations to batch algorithms.</p></li><li><p>Despite their lower computational cost, online learning algorithms have built-in features that can often make them outperform their batch counterparts.</p></li></ul><p><em>Originally published at <a href="http://www.ment.at">www.ment.at</a> on 28-Sep-2015</em></p>]]></content:encoded></item><item><title><![CDATA[Aeolian Machine Intelligence]]></title><description><![CDATA[A typical wind turbine is equipped with around 100 sensors, each producing 100 datapoints per second. 
And yet communication can be patchy&#8230;]]></description><link>https://blog.ment.at/p/aeolian-machine-intelligence-cfe3ad00c41</link><guid isPermaLink="false">https://blog.ment.at/p/aeolian-machine-intelligence-cfe3ad00c41</guid><dc:creator><![CDATA[George Cotsikis]]></dc:creator><pubDate>Mon, 01 Feb 2016 13:00:09 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/1e71fee8-a440-4297-a21f-b4b5ae56522e_800x599.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A typical wind turbine is equipped with around 100 sensors, each producing 100 datapoints per second. And yet communication can be patchy, and 3G rare. CCTV networks can only afford to send a small fraction of the captured video streams back to the data center. Robots in a manufacturing plant can sense at millisecond granularity, but the plant&#8217;s SCADA infrastructure can handle perhaps 1 observation per minute (with 1 observation per 15 minutes being a typical sampling rate).</p><p>This is a very &#8216;2015&#8217; problem. As the IoT becomes more commoditized and high-frequency, one Moore&#8217;s law (cheaper sensors) works against another (cheaper storage), and the outcome is not predetermined: in a large variety of applications it is now necessary to perform data reduction at the edge. Put simply, devices report less frequently than they capture, by performing some form of aggregation at the <strong>edge</strong>.</p><p>At <a href="http://www.ciscolive.com/global/">Cisco Live</a> in San Diego, where we presented our solution for on-the-fly, on-the-edge Machine Learning, an <a href="https://developer.cisco.com/site/iox/technical-overview/">IOx</a> product manager put it nicely: &#8220;right now we have devices flooding the network with a constant stream of &#8216;I am doing OK. I am doing OK. I am doing OK. I am doing OK&nbsp;&#8230;&#8217;. I&#8217;d much rather just hear from them when they are in trouble&#8221;. 
Mentat&#8217;s anomaly detection technology can enable fruitful interactions between humans and the tsunami of data from the IoT.<br>&nbsp;<br>Senior voices joined in during the Executive Symposium panel discussion, where <a href="http://blogs.cisco.com/author/malaanand">Mala Anand</a>, our Chief Data Scientist and other prominent figures in Data Science addressed the future of data and machine learning. In her concluding remarks, Mala emphasised the need for data reduction on the edge as data sources such as video and high-frequency sensing from the IoT proliferate.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LJw-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F512e9fc1-63bc-4818-a84d-2c00a39b9b90_800x599.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LJw-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F512e9fc1-63bc-4818-a84d-2c00a39b9b90_800x599.png 424w, https://substackcdn.com/image/fetch/$s_!LJw-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F512e9fc1-63bc-4818-a84d-2c00a39b9b90_800x599.png 848w, https://substackcdn.com/image/fetch/$s_!LJw-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F512e9fc1-63bc-4818-a84d-2c00a39b9b90_800x599.png 1272w, https://substackcdn.com/image/fetch/$s_!LJw-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F512e9fc1-63bc-4818-a84d-2c00a39b9b90_800x599.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!LJw-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F512e9fc1-63bc-4818-a84d-2c00a39b9b90_800x599.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/512e9fc1-63bc-4818-a84d-2c00a39b9b90_800x599.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LJw-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F512e9fc1-63bc-4818-a84d-2c00a39b9b90_800x599.png 424w, https://substackcdn.com/image/fetch/$s_!LJw-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F512e9fc1-63bc-4818-a84d-2c00a39b9b90_800x599.png 848w, https://substackcdn.com/image/fetch/$s_!LJw-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F512e9fc1-63bc-4818-a84d-2c00a39b9b90_800x599.png 1272w, https://substackcdn.com/image/fetch/$s_!LJw-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F512e9fc1-63bc-4818-a84d-2c00a39b9b90_800x599.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>What is it that makes Streaming Edge analytics different from Traditional analytics, or Big Data analytics? 
There are three main differences:<br>&nbsp;<br>1. Scale with Velocity (not just Volume). A Hadoop query that runs a little slowly is an inconvenience; a streaming query that runs too slowly is a disaster, resulting in data loss, a system crash, or both, as new observations pile up waiting to be processed. Moreover, most edge use cases require real-time actionable insights with a short lifespan, where &#8220;delayed&#8221; is just as bad as &#8220;missed&#8221;.&nbsp;<br>&nbsp;<br>2. You can&#8217;t store everything. By the nature of the use case, some data will be lost. But the information contained in that data need not be lost if your streaming analytics are done properly. The simplest example is a sum: it is trivial to compute in a streaming fashion via a running total, which yields an exact answer without storing any of the observations. Surprisingly many statistical insights can be extracted in this way.&nbsp;<br>&nbsp;<br>3. You can&#8217;t scale out easily (on either CPU or RAM). Edge computing generally relies on a few small computers (think year-2000 hardware). Forget about machine learning techniques that need GPU farms to get started.&nbsp;<br>&nbsp;<br>&nbsp;These are the constraints we listed on our whiteboard in rainy South West London when we set out to design our demo for Cisco Live in sunny San Diego a few months back. 
And this is what we came up with.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JR7T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80717af0-6999-44bf-bd04-a6536db9d846_796x800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JR7T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80717af0-6999-44bf-bd04-a6536db9d846_796x800.png 424w, https://substackcdn.com/image/fetch/$s_!JR7T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80717af0-6999-44bf-bd04-a6536db9d846_796x800.png 848w, https://substackcdn.com/image/fetch/$s_!JR7T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80717af0-6999-44bf-bd04-a6536db9d846_796x800.png 1272w, https://substackcdn.com/image/fetch/$s_!JR7T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80717af0-6999-44bf-bd04-a6536db9d846_796x800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JR7T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80717af0-6999-44bf-bd04-a6536db9d846_796x800.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/80717af0-6999-44bf-bd04-a6536db9d846_796x800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JR7T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80717af0-6999-44bf-bd04-a6536db9d846_796x800.png 424w, https://substackcdn.com/image/fetch/$s_!JR7T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80717af0-6999-44bf-bd04-a6536db9d846_796x800.png 848w, https://substackcdn.com/image/fetch/$s_!JR7T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80717af0-6999-44bf-bd04-a6536db9d846_796x800.png 1272w, https://substackcdn.com/image/fetch/$s_!JR7T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80717af0-6999-44bf-bd04-a6536db9d846_796x800.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><p>The core idea came easily: we needed to deploy on a <a href="https://www.raspberrypi.org/">Raspberry Pi</a>. These tiny, very low-cost devices come fairly close to the capabilities of a typical fog node. 
The Pi would receive a high-frequency stream of data on one port, process it in real time, and output the resulting analytical insights on another port at a lower frequency.<br>&nbsp;<br>&nbsp;The challenge for the data replay was to stream the data at the same pace at which the original dataset had been recorded. To do so, we created a small tool written in <a href="http://elixir-lang.org/">Elixir</a> that would read a dataset (in CSV format) and re-stream it over the network in real time, re-timing each observation with the current timestamp. The tool measured the original time delta between consecutive observations and made sure that no observation was emitted too early or too late.&nbsp;<br>&nbsp;<br>&nbsp;Since the Linux process scheduler gives no guarantee about when a process will be put to sleep or for how long, we made sure to detect and discard any record that had become &#8220;too old&#8221;.&nbsp;<br>&nbsp;<br>&nbsp;I/O is the usual bottleneck, and in our case reading the CSV dataset from disk was, of course, slow. Furthermore, as the scheduler put our process to sleep from time to time, we had to keep a buffer of CSV records.&nbsp;<br>&nbsp;<br>&nbsp;We therefore had several Elixir (Erlang) processes in charge of the entire pipeline: reading the CSV file, transforming each line into a structured record, re-timing the data and coercing some of the values, queuing the records, consuming the queue and streaming the records.<br>&nbsp;<br>&nbsp;To make the demo clear, visual and realistic, we decided to use a second Raspberry Pi to run the data-replay tool. 
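The re-timing logic described above can be sketched in a few lines. This is a single-threaded Python approximation, not the original Elixir tool. It reads all CSV rows into an in-memory buffer up front, whereas the original used a pipeline of concurrent processes. The function names, the 50 ms max_lag tolerance and the assumed CSV layout (timestamp in seconds first, then the fields) are all hypothetical. The clock and sleep functions can be swapped out, so a simulated clock can mimic the scheduler pausing the process.

```python
import csv
import io
import time
from collections import deque


def read_rows(lines):
    """Parse CSV lines of the assumed form 'timestamp_seconds,field,...'."""
    for rec in csv.reader(lines):
        yield float(rec[0]), rec[1:]


def replay(rows, send, max_lag=0.05, clock=time.time, sleep=time.sleep):
    """Re-stream time-ordered (orig_ts, payload) rows at their original pace.

    Each record is due at start + (orig_ts - first_orig_ts) and is re-stamped
    with the current clock. Records whose slot has already passed by more than
    max_lag seconds (e.g. because the process was descheduled) are dropped.
    Returns the number of dropped records.
    """
    buffer = deque(rows)  # reads everything up front; the original used a concurrent queue
    dropped = 0
    start_wall = start_orig = None
    while buffer:
        orig_ts, payload = buffer.popleft()
        if start_wall is None:
            start_wall, start_orig = clock(), orig_ts
        due = start_wall + (orig_ts - start_orig)  # preserve original time deltas
        now = clock()
        if now - due > max_lag:
            dropped += 1  # too old: discard instead of emitting a late burst
            continue
        if due > now:
            sleep(due - now)  # never emit early
        send((clock(), payload))  # re-time with the current timestamp
    return dropped


# Usage with a simulated clock, so the example runs instantly and deterministically.
class FakeClock:
    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, dt):
        self.t += dt


fake = FakeClock()
sent = []


def send(record):
    sent.append(record)
    if record[1] == ["b"]:
        fake.t += 0.3  # simulate the scheduler pausing us for 300 ms


data = io.StringIO("0.0,a\n0.1,b\n0.2,c\n0.3,d\n1.0,e\n")
dropped = replay(read_rows(data), send, clock=fake.now, sleep=fake.sleep)
print([p[0] for _, p in sent], dropped)  # ['a', 'b', 'e'] 2
```

In the simulated run, the 300 ms pause after record b makes c and d too old, so they are dropped rather than emitted in a late burst, and e still goes out on schedule. In the demo itself, this replay role was played by a separate Raspberry Pi.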
This way we could simply unplug and replug the Ethernet cable linking the two Pis to stop and resume the consumption and analysis of the data.<br>&nbsp;<br>&nbsp;Since only part of the team could go to San Diego, we had to make sure that we could still deploy fixes and updates remotely.&nbsp;<br>&nbsp;<br>&nbsp;We deployed an OpenVPN server and a private Docker registry on AWS so that the technical team could deploy new containers with the latest version of the code, and we provisioned the Raspberry Pis with Chef to ease the setup. With just a couple of commands, the latest versions were deployed remotely across the world.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!W01f!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c368b92-16ee-4f06-87fa-6ea70d49a998_800x586.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!W01f!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c368b92-16ee-4f06-87fa-6ea70d49a998_800x586.png 424w, https://substackcdn.com/image/fetch/$s_!W01f!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c368b92-16ee-4f06-87fa-6ea70d49a998_800x586.png 848w, https://substackcdn.com/image/fetch/$s_!W01f!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c368b92-16ee-4f06-87fa-6ea70d49a998_800x586.png 1272w, https://substackcdn.com/image/fetch/$s_!W01f!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c368b92-16ee-4f06-87fa-6ea70d49a998_800x586.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!W01f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c368b92-16ee-4f06-87fa-6ea70d49a998_800x586.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3c368b92-16ee-4f06-87fa-6ea70d49a998_800x586.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!W01f!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c368b92-16ee-4f06-87fa-6ea70d49a998_800x586.png 424w, https://substackcdn.com/image/fetch/$s_!W01f!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c368b92-16ee-4f06-87fa-6ea70d49a998_800x586.png 848w, https://substackcdn.com/image/fetch/$s_!W01f!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c368b92-16ee-4f06-87fa-6ea70d49a998_800x586.png 1272w, https://substackcdn.com/image/fetch/$s_!W01f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c368b92-16ee-4f06-87fa-6ea70d49a998_800x586.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><p>In summary, one Pi streamed 90 wind turbine sensor readings at 100 Hz each (9,000 datapoints per second in total) into our datastream.io core engine, running on a 
single CPU core of the other Pi, which then displayed a custom dashboard built for the demo. Anomalies were detected on all sensor streams in real time, and our alert correlation technology gave a holistic view of the wind turbine&#8217;s health.</p><p><em>Originally published at <a href="http://www.ment.at">www.ment.at</a> on 13-Aug-2015</em></p>]]></content:encoded></item><item><title><![CDATA[Internet of Everything and Machine Intelligence]]></title><description><![CDATA[Let&#8217;s start on a light note. For a brief period of time, the Internet of Things became associated with the fridge that orders milk by&#8230;]]></description><link>https://blog.ment.at/p/internet-of-everything-and-machine-intelligence-9172a5d18859</link><guid isPermaLink="false">https://blog.ment.at/p/internet-of-everything-and-machine-intelligence-9172a5d18859</guid><dc:creator><![CDATA[George Cotsikis]]></dc:creator><pubDate>Mon, 01 Feb 2016 12:55:09 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!g5zW!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04dd1947-afea-4c18-ac51-b4ba12189525_660x660.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Let&#8217;s start on a light note. For a brief period of time, the Internet of Things became associated with the fridge that orders milk by itself. This retro-futurist icon is a great example of a common tendency for extremely disruptive technological waves to first enter the public realm in the form of low-impact, nice-to-have use cases (personal computers and robotics suffered the same fate at first). Besides being amusing, these are also instructive. 
The small-mindedness of a fridge that has a direct line to the supermarket is a great way to make a really important point: the value of the <a href="http://www.cisco.com/c/r/en/us/internet-of-everything-ioe/index.html">Internet of Everything</a> (IoE), ultimately, lies in the network, not in the individual connections.</p><p>The IoE will treat the home as a living system of devices; the manufacturing sector as a society of robots and sensors; the retail sector as a never-ending multi-channel interaction; and the city as a hive of homes, retail outlets, factories, utilities and infrastructure. Accordingly, its disruptive nature will lie not primarily in the time-saving effect of automating simple tasks, but rather in feeding from, and feeding back into, intricate economies of network and scale. The IoE will manifest itself by enabling previously unthinkable efficiencies and capabilities.</p><p>We are nearly there. Ubiquitous sensing, connectivity and processing are enabling continual data capture and exchange in a rapidly increasing variety of physical settings. These mini-nervous systems are still islands&#8202;&#8212;&#8202;the home, the hospital, the school, the manufacturing plant&#8202;&#8212;&#8202;but soon they will be subnets of an emerging interconnected network of the physical and industrial world. This is how it starts. But the end goal, invariably, is not just to sense, but to make sense of, to understand, to predict, to control. This is a superhuman task in the face of a sea of machine chatter. Machine nervous systems require machine minds.</p><p>Luckily, we can now build those. The last few years have seen a cascade of Machine Intelligence victories, reaching milestone after milestone: vision, translation, medical diagnosis. The list of tasks where machines now routinely outperform humans is growing at an unprecedented rate. 
All things considered, we don&#8217;t think it&#8217;s an exaggeration to say that the first half of the 21st century will be remembered for the convergence of two of the most disruptive technological waves in human history: the IoE and Machine Intelligence. As machines become able to comprehend sensory input of various types and impossibly high dimensions, plugging them into a vast nervous system will mark a turning point in technological history. Machine Intelligence is the value driver that can turn continual streams of data into actionable insights.</p><p>A new paradigm is needed to address the challenges and opportunities of this new era of increasing data velocity and variety. We need machine intelligence designed to deal with data in motion. Data in motion will dominate the sea of data generated by the IoE. The ability to process, analyze and produce actionable insights from streaming data efficiently, in ways that humans can readily consume, will be key to bringing the vision to fruition. Real-life processes change over time, so designing systems with adaptability and scalability as core principles is essential. Business intelligence can yield many insights, but we truly need to be able to deploy the full arsenal of machine learning at scale in order to leverage the wealth of structured and unstructured data generated by the IoE.</p><p><a href="http://www.ment.at">Mentat Innovations</a> feels privileged to be able to collaborate with Cisco to deliver the vision of a connected, intelligent IoE to the world. We can now finally imagine a shift from diagnostics to prognostics in manufacturing plants, smart electricity grids and intelligent retail solutions. We can imagine a connected healthcare system offering early warnings for life-threatening conditions without placing an additional burden on overworked doctors. 
Use case by use case, across all verticals, a bigger picture is emerging: a new economy in which ambient machine intelligence empowers humanity to build a better future. This starts now.</p><p>Tags: <a href="http://blogs.cisco.com/tag/christoforos-anagnostopoulos">Christoforos Anagnostopoulos</a>, <a href="http://blogs.cisco.com/tag/cisco">Cisco</a>, <a href="http://blogs.cisco.com/tag/internet-of-everything">Internet of Everything</a>, <a href="http://blogs.cisco.com/tag/internet-of-things-iot">Internet of Things (IoT)</a>, <a href="http://blogs.cisco.com/tag/ioe3">IoE</a>, <a href="http://blogs.cisco.com/tag/iot">IoT</a>, <a href="http://blogs.cisco.com/tag/mentat-innovations">Mentat Innovations</a></p><p><em>Originally published at <a href="http://blogs.cisco.com/ioe/internet-of-everything-and-machine-intelligence">blogs.cisco.com</a> on 30-Mar-2015</em></p>]]></content:encoded></item><item><title><![CDATA[The Rise of Overconfident Machines]]></title><description><![CDATA[At the time of writing, a number of eminent scientists have raised concerns about the potential adverse consequences of the belated rise of&#8230;]]></description><link>https://blog.ment.at/p/the-rise-of-overconfident-machines-bbd5023394ff</link><guid isPermaLink="false">https://blog.ment.at/p/the-rise-of-overconfident-machines-bbd5023394ff</guid><dc:creator><![CDATA[George Cotsikis]]></dc:creator><pubDate>Mon, 01 Feb 2016 12:52:16 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/fcd5afd0-071c-460a-9fb5-cef7f9503e39_740x300.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>At the time of writing, a number of eminent scientists have raised concerns about the potential adverse consequences of the belated rise of Artificial Intelligence. Belated, because it has been anticipated since the 1960s but, until a few years ago, had not materialised except in toy (albeit challenging) worlds such as chess. 
The last few years have, however, been game-changing. Amazon&#8217;s robotic warehouses, Google&#8217;s visual search and Skype&#8217;s automatic voice translation all hint towards what is known in academic circles as &#8216;strong AI&#8217;, i.e., the ability of a machine to respond to real-world stimuli in ways indistinguishable from, or superior to, human behaviour. Put simply, machines can increasingly beat us. And this is significant.</p><p>It is important for two reasons. First, because whenever a piece of technology beats humans at something vital, this usually causes a revolution. When fire beat human saliva in making food safe and digestible, the modern human evolved. When stone tools beat human hands in warfare and hunting, the Stone Age began. Then iron beat stone. Steam. Electricity. The Digital Age. And now Artificial Intelligence.</p><p>It is hard for us to envisage the &#8216;next day&#8217;, when thinking machines routinely outperform humans. Massive unemployment, the potential for asymmetric warfare and a deepening rich-poor divide have all featured as potential elements of a dystopian AI future. Equally, the eradication of disease, poverty and social injustice has been put forward as a utopian alternative.</p><p>Hard to tell. 
Even harder to tell is whether machines will eventually dominate the human race, which some thinkers see as theoretically inevitable&#8230; Geeky end-of-the-world enthusiasm aside, this is not as extreme as it <a href="http://www.techworld.com/news/startups/stephen-hawking-elon-musk-warn-about-rise-of-ai-3593491/">sounds</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Prmf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2adfcbb4-dee0-4cc8-9145-a7aeb6be2080_740x300.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Prmf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2adfcbb4-dee0-4cc8-9145-a7aeb6be2080_740x300.png 424w, https://substackcdn.com/image/fetch/$s_!Prmf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2adfcbb4-dee0-4cc8-9145-a7aeb6be2080_740x300.png 848w, https://substackcdn.com/image/fetch/$s_!Prmf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2adfcbb4-dee0-4cc8-9145-a7aeb6be2080_740x300.png 1272w, https://substackcdn.com/image/fetch/$s_!Prmf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2adfcbb4-dee0-4cc8-9145-a7aeb6be2080_740x300.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Prmf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2adfcbb4-dee0-4cc8-9145-a7aeb6be2080_740x300.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2adfcbb4-dee0-4cc8-9145-a7aeb6be2080_740x300.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Prmf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2adfcbb4-dee0-4cc8-9145-a7aeb6be2080_740x300.png 424w, https://substackcdn.com/image/fetch/$s_!Prmf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2adfcbb4-dee0-4cc8-9145-a7aeb6be2080_740x300.png 848w, https://substackcdn.com/image/fetch/$s_!Prmf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2adfcbb4-dee0-4cc8-9145-a7aeb6be2080_740x300.png 1272w, https://substackcdn.com/image/fetch/$s_!Prmf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2adfcbb4-dee0-4cc8-9145-a7aeb6be2080_740x300.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>But there is a more practical, immediate concern, with real implications for the methodological choices that AI companies like us are making right now. Our worry is that our immediate future will be plagued by overconfident machines. 
Put differently, if true wisdom is to know what you don&#8217;t know, machines are still pretty stupid.</p><p>Machine translation will always translate the input text, even if it has to clutch at straws and output junk. A bit like an overeager translation intern, it lacks the wisdom to say &#8220;I am not quite sure what this means, sorry&#8221;.</p><p>This carries risks that are all too real. Our foremost experience of AI will not be in the form of a charming golden robot with a British accent and a lovable sidekick, but rather in the form of hidden automation. And just like a human, an unsupervised machine that thinks too highly of itself can prove catastrophic.</p><p>So is it within our ability to code up &#8220;modesty&#8221; and &#8220;self-criticism&#8221;? Yes, and no.</p><p>Yes, because, to the relief of statistically minded machine learning researchers, most ML is now probabilistic, which simply means that answers are given in the form of a probability rather than a straight yes/no. A picture might be a picture of a cat with 90% probability, or with 50% probability. It is then a matter of deciding how best to use this estimated uncertainty on a case-by-case basis, and, when it matters, one can always choose not to report the 50/50s.</p><blockquote><p>However, uncertainty is subtler than that. Most Machine Learning is model-based, which means that its output (e.g., the probabilities above) is itself computed on the basis of certain assumptions about the data-generating process. These assumptions will, almost surely, be violated at some point. A thinking machine should then recognise that fact and report higher uncertainty.</p></blockquote><p>Unfortunately, honest reporting of uncertainty remains challenging. 
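The simpler point made earlier, that one can always choose not to report the 50/50s, corresponds to a classifier with a reject option. The Python sketch below illustrates it. The function name, the dictionary-of-probabilities input and the 0.8 threshold are assumptions for illustration, not any particular library's API.

```python
def classify_with_reject(probs, threshold=0.8):
    """Return the most probable label, or None to abstain ("not sure").

    probs: mapping from label to predicted probability.
    threshold: minimum probability required before answering; 0.8 is an
    arbitrary illustrative value, to be tuned per use case.
    """
    label, p = max(probs.items(), key=lambda item: item[1])
    return label if p >= threshold else None


print(classify_with_reject({"cat": 0.9, "dog": 0.1}))  # cat
print(classify_with_reject({"cat": 0.5, "dog": 0.5}))  # None
```

Such a threshold is only as honest as the probabilities it is applied to. If the model's assumptions are violated, a reported 0.9 may itself be overconfident, and producing trustworthy probabilities is the hard part.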
Let&#8217;s briefly examine why that is the case in the two flavours of AI that are &#8216;hottest&#8217; right now: <a href="https://en.wikipedia.org/wiki/Bayesian_network">Bayesian AI</a> and <a href="https://en.wikipedia.org/wiki/Artificial_neural_network">Neural Networks</a> (of which <a href="https://en.wikipedia.org/wiki/Deep_learning">Deep Learning</a> is the latest branch).</p><p>Bayesians pride themselves on relying on a coherent, all-encompassing mathematical framework for expressing uncertainty, and they follow best practice in always reporting probability distributions rather than point estimates. A Bayesian who is worried about a certain assumption will invariably express that assumption in a &#8220;fuzzier&#8221; manner, by introducing uncertainty about it in the form of yet another probability distribution&#8202;&#8212;&#8202;assumptions about assumptions. Unfortunately, even the most carefully stacked Bayesian model might still be betrayed by the real world, especially since the computational burden of the Bayesian calculus often forces oversimplified assumptions for the sake of faster computation. Now, assumptions are violated all the time&#8202;&#8212;&#8202;this is not a problem in itself, and over-engineering the assumption set is not always the right decision. What is problematic, though, is that if the model assumptions are wrong, it is really hard to know whether the answers are still somewhat valid. Ironically, the problem gets worse for more complex models, because complex models fail in complicated ways. This results in the following well-kept secret: 40-year-old tools such as linear and logistic regression are still the workhorses of business intelligence and advanced analytics. Users will call them &#8220;reliable&#8221;. The correct technical term is &#8220;robust&#8221;, which, in statistical parlance, describes, roughly speaking, a model that tends to remain fairly accurate even when its assumptions partly break down. 
Not all is lost, of course&#8202;&#8212;&#8202;Bayesians are aware of this problem and are working on it. But to this day, off-the-shelf AI does not generally come with strong robustness guarantees; these have to be engineered into the system separately.</p><p>What about deep learning? Such methods are often referred to as &#8216;black boxes&#8217;, and typically involve a highly parameterised input-output relationship fitted to the data via optimisation on a training dataset. These algorithms are fascinatingly complex and correspondingly powerful (indeed, at Mentat we are building our own deep learning library for IoT cybersecurity right now). However, their complexity, and the fact that the parameters of deep learning models are not really meaningful in themselves (technically speaking, they are not generally viewed as random variables or population parameters), make it incredibly difficult for them to report their own uncertainty. Deep learning is in many ways similar to deep thinkers, or our lizard brains: it knows the answer instinctively, but cannot explain why.</p><p>That this should be a challenge is not surprising. Self-criticism, or, more broadly, self-reflection, is one of the most elusive, and important, aspects of intelligence. Indeed, a certain school of thought in the philosophy of mind lists the ability to self-reflect as the defining characteristic of consciousness. Needless to say, humans do not always exhibit such intelligence themselves: cognitive biases, racism and fanaticism are all, in many ways, consequences of the inability to question one&#8217;s assumptions. However, the human race as a whole has historically demonstrated exquisite abilities to self-reflect and to reason about assumptions about assumptions, a little like the nested Bayesian model above.</p><blockquote><p>So how can we build &#8220;self-critical&#8221; thinking machines? 
Our answer here at Mentat is that robustness and diagnostics are the unsung heroes of real-world AI deployments.</p></blockquote><p>Robustness consists in introducing failsafes into models that allow them to lower their confidence when they are overstretched (shrinkage in classical machine learning is a wonderful example of a simple solution to a difficult problem, and analogous techniques apply to almost all AI/ML algorithms, albeit at times with much greater care and technical difficulty). Failsafes against the possibility that the data-generating process changes over time are in fact a core innovation in our Machine Learning In Motion toolbox, and an unrecognised performance bottleneck in many competing offerings. Model diagnostics complement robustness. They stem from an approach that was all the rage in the golden era of classical statistics: tracking performance indicators of the model, and taking suitable action when they take unreasonable values. This may sound mundane (who doesn&#8217;t keep track of their model&#8217;s performance?), but the art of diagnostics lies in constructing the right statistic and understanding the range of values it may take without raising an alarm, which is a challenging mathematical problem when done correctly. Thankfully, recent methodological developments allow us to do both in considerable generality.</p><p>There is, of course, a lot more to &#8220;self-reflection&#8221; than what was explored here, but self-diagnosing, robust model-based ML is an excellent starting point.</p><p>Some time ago, a member of our university study group, after a long period of intense, silent thinking aimed at solving an infamous maths past-paper question, announced: &#8220;Am I talking cr@p? 
That is the question.&#8221; Simply put, we should not fully trust Thinking Machines until we witness them spontaneously make a similar claim.</p><p><em>Originally published at <a href="http://www.ment.at">www.ment.at</a> on 17-Mar-2015</em></p>]]></content:encoded></item><item><title><![CDATA[Ten things we believe about machine intelligence]]></title><description><![CDATA[Everything changes and that changes everything. Adaptivity is essential.]]></description><link>https://blog.ment.at/p/ten-things-we-believe-about-machine-intelligence-ffc360e69b9a</link><guid isPermaLink="false">https://blog.ment.at/p/ten-things-we-believe-about-machine-intelligence-ffc360e69b9a</guid><dc:creator><![CDATA[George Cotsikis]]></dc:creator><pubDate>Mon, 01 Feb 2016 12:46:32 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!g5zW!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04dd1947-afea-4c18-ac51-b4ba12189525_660x660.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<ol><li><p>Everything changes and that changes everything. 
Adaptivity is essential.</p></li><li><p>When insights are extracted from data in real time, there is little additional value in storing data.</p></li><li><p>Machine intelligence is a robust system, not just a library of algorithms.</p></li><li><p>Deep understanding of statistical uncertainty matters.</p></li><li><p>Privacy is important.</p></li><li><p>Machine intelligence should serve and empower human decision-makers.</p></li><li><p>The edges of a network have as much value as its nodes.</p></li><li><p>Reduce information overload; do not contribute to it.</p></li><li><p>No single family of algorithms is fit for all purposes.</p></li><li><p>Scalable infrastructure without scalable real-time intelligence is information-poor.</p></li></ol><p><em>Originally published at <a href="http://www.ment.at">www.ment.at</a> on 19-Feb-2015</em></p>]]></content:encoded></item><item><title><![CDATA[Big Data Privacy]]></title><description><![CDATA[The Mentat team spent six weeks at Level 39 in Canary Wharf, as a finalist at the EY Challenge on privacy. This program inspired us to&#8230;]]></description><link>https://blog.ment.at/p/big-data-privacy-ea2cbaf37057</link><guid isPermaLink="false">https://blog.ment.at/p/big-data-privacy-ea2cbaf37057</guid><dc:creator><![CDATA[George Cotsikis]]></dc:creator><pubDate>Mon, 01 Feb 2016 12:15:59 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/c15d9b9a-0846-4114-be79-c638ab20662f_800x355.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The Mentat team spent six weeks at <a href="http://www.level39.co/">Level 39</a> in Canary Wharf, as a finalist at the <a href="http://www.level39.co/news/eystartup-challenge-announces-seven-finalists/">EY Challenge</a> on privacy. 
This program inspired us to rethink much of what is currently standard practice in analytics with respect to privacy issues.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MYLJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a7b6b9a-1b2d-424f-91db-c136eba70d7f_800x355.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MYLJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a7b6b9a-1b2d-424f-91db-c136eba70d7f_800x355.png 424w, https://substackcdn.com/image/fetch/$s_!MYLJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a7b6b9a-1b2d-424f-91db-c136eba70d7f_800x355.png 848w, https://substackcdn.com/image/fetch/$s_!MYLJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a7b6b9a-1b2d-424f-91db-c136eba70d7f_800x355.png 1272w, https://substackcdn.com/image/fetch/$s_!MYLJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a7b6b9a-1b2d-424f-91db-c136eba70d7f_800x355.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MYLJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a7b6b9a-1b2d-424f-91db-c136eba70d7f_800x355.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5a7b6b9a-1b2d-424f-91db-c136eba70d7f_800x355.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MYLJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a7b6b9a-1b2d-424f-91db-c136eba70d7f_800x355.png 424w, https://substackcdn.com/image/fetch/$s_!MYLJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a7b6b9a-1b2d-424f-91db-c136eba70d7f_800x355.png 848w, https://substackcdn.com/image/fetch/$s_!MYLJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a7b6b9a-1b2d-424f-91db-c136eba70d7f_800x355.png 1272w, https://substackcdn.com/image/fetch/$s_!MYLJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a7b6b9a-1b2d-424f-91db-c136eba70d7f_800x355.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>Data collection and storage technology is transforming our world. Bicycles are connected to the internet, smartphones will become our personal physicians and cities have already become cashless. A connected world, full of opportunity&#8202;&#8212;&#8202;a Big Data dreamland. And yet the old paradigm of storing all this data is challenged. 
The sheer volume sometimes makes it impossible. But a more alarming downside is the potential for privacy infringement. Embarrassing pictures on Facebook are only the beginning. Consider arbitrary third parties knowing your physical location, or your habits inside the privacy of your own home. A Big Data nightmare. So is the Big Data dream in danger?</p><blockquote><p><em><strong>Should we have to choose between Big Data and&nbsp;Privacy?</strong></em></p></blockquote><p>Let&#8217;s get more specific, to remind ourselves of exactly what is at stake. The three big telcos know exactly where every single Londoner is at any given time. A high-street retail chain might want to see the demographics of the people walking past its store on a typical Sunday, or right now, or, even better, tomorrow afternoon, so that it can promote items that appeal to that particular customer segment. A golden third-party data-sharing opportunity.&nbsp;<br>&nbsp;<br>Smart metering. National Grid needs to anticipate demand surges so as to balance the grid as a whole, yet it only has information accurate down to the substation level. In an ideal world, National Grid would have information down to the postcode level, ensuring that non-storable energy (e.g., renewables) is used optimally and that demand peaks are handled efficiently. In an ideal world, the utilities would also be able to detect daily and hourly patterns of electricity demand at the household level, and offer personalised tariffs that incentivise off-peak consumption. 
The end product: a balanced grid, cheaper energy for all, and reduced energy waste.&nbsp;<br>&nbsp;<br>Neither the telcos nor the utilities are able to deliver this without a huge privacy risk: it only takes one incident to bring down the full force of the law and irreparably damage the company&#8217;s reputation.&nbsp;<br>&nbsp;<br>The EU has been pushing for tighter privacy legislation for a while, largely driven by the potential for large-scale privacy infringement by the data behemoths: Facebook, Google and similar organisations, as well as the world&#8217;s governments. The reality, though, is that even an SME or an innocent-looking app can cause irreparable damage to an individual by misusing private information. What is being challenged here is a &#8220;tacit agreement&#8221; between consumers and digital service providers that has so far served the latter much better than the former: &#8220;click here to accept the terms and conditions about data usage&#8221;. One click can no longer serve as a get-out-of-jail-free card for the data holder. Consumers who initially consent to share data now retain the right to change their minds. The EU aptly calls it &#8220;the right to be forgotten&#8221;, and, let&#8217;s be honest, it seems fair. 
But is it enforceable?</p><p>Mentat thinks John Oliver probably has it right: no way!</p><div class="captioned-image-container"><figure><div id="youtube2-r-ERajkMXw0" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;r-ERajkMXw0&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/r-ERajkMXw0?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div></figure></div><p>Private data, once stored outside the confines of your home and personal devices, are at risk. A healthy new ecosystem will no doubt grow around &#8216;safer data control&#8217; and &#8216;data provenance&#8217;, but the reality is that every lock can be picked, and as the amount of private information being stored and shared grows exponentially, so does the risk of abuse. So it does seem that Big Data and privacy might be incompatible after all.<br>&nbsp;<br>But what if we could think outside that box? What if we could have it all?<br>&nbsp;<br>We feel the real culprit is a decades-old Business Intelligence practice that we refer to as the &#8220;store everything, analyse later&#8221; approach. 
Indeed, the standard workflow of a Business Intelligence unit involves storing everything about everyone in massive data warehouses and pulling reports at arbitrary points in time later to satisfy the appetite of the next internal Powerpoint presentation, or to help determine the next-best-action during a customer service call.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2az-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8df95f81-0c12-44d9-b617-b9a44bbf2435_553x455.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2az-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8df95f81-0c12-44d9-b617-b9a44bbf2435_553x455.png 424w, https://substackcdn.com/image/fetch/$s_!2az-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8df95f81-0c12-44d9-b617-b9a44bbf2435_553x455.png 848w, https://substackcdn.com/image/fetch/$s_!2az-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8df95f81-0c12-44d9-b617-b9a44bbf2435_553x455.png 1272w, https://substackcdn.com/image/fetch/$s_!2az-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8df95f81-0c12-44d9-b617-b9a44bbf2435_553x455.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2az-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8df95f81-0c12-44d9-b617-b9a44bbf2435_553x455.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8df95f81-0c12-44d9-b617-b9a44bbf2435_553x455.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2az-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8df95f81-0c12-44d9-b617-b9a44bbf2435_553x455.png 424w, https://substackcdn.com/image/fetch/$s_!2az-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8df95f81-0c12-44d9-b617-b9a44bbf2435_553x455.png 848w, https://substackcdn.com/image/fetch/$s_!2az-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8df95f81-0c12-44d9-b617-b9a44bbf2435_553x455.png 1272w, https://substackcdn.com/image/fetch/$s_!2az-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8df95f81-0c12-44d9-b617-b9a44bbf2435_553x455.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><p>This &#8220;keep it, just in case&#8221; rationale is outdated, and is, increasingly, seen as offensive, as it rests on an assumption that private data is a free, up-for-grabs resource. This is precisely the viewpoint that is becoming untenable.&nbsp;<br>&nbsp;<br>&nbsp;At Mentat, we have a different viewpoint. 
Our cornerstone belief is that &#8220;value lies with information, not with data&#8221;. Value is derived from actionable insights, which in turn rely on processed information, not on raw data. Of course, in some settings you do need raw data (e.g., for billing or auditing), but in the vast majority of Business Intelligence tasks it is aggregate insights you want, rather than the private data itself. Well, you are in luck: with streaming analytics technology you can keep the insights but throw away the data.&nbsp;<br>&nbsp;<br>So how does it work? A suitable analogy is that of an experienced store manager: they have learnt from past experience how to read a customer and adapt their sales pitch accordingly, but they don&#8217;t remember the national insurance numbers and postal addresses of every single customer they have interacted with. This analogy cuts deep: a learning agent glances at new data and uses it to update its view of the world, without storing every single bit and byte. Technically, many useful summaries, such as counts, averages and variances, can be updated incrementally as each new record arrives, after which the record itself can be discarded.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!H6Ka!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90f3dc2c-4ddc-4d2e-af8a-eb4bc8a18a36_664x492.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!H6Ka!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90f3dc2c-4ddc-4d2e-af8a-eb4bc8a18a36_664x492.png 424w, https://substackcdn.com/image/fetch/$s_!H6Ka!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90f3dc2c-4ddc-4d2e-af8a-eb4bc8a18a36_664x492.png 848w, 
https://substackcdn.com/image/fetch/$s_!H6Ka!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90f3dc2c-4ddc-4d2e-af8a-eb4bc8a18a36_664x492.png 1272w, https://substackcdn.com/image/fetch/$s_!H6Ka!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90f3dc2c-4ddc-4d2e-af8a-eb4bc8a18a36_664x492.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!H6Ka!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90f3dc2c-4ddc-4d2e-af8a-eb4bc8a18a36_664x492.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/90f3dc2c-4ddc-4d2e-af8a-eb4bc8a18a36_664x492.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!H6Ka!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90f3dc2c-4ddc-4d2e-af8a-eb4bc8a18a36_664x492.png 424w, https://substackcdn.com/image/fetch/$s_!H6Ka!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90f3dc2c-4ddc-4d2e-af8a-eb4bc8a18a36_664x492.png 848w, 
https://substackcdn.com/image/fetch/$s_!H6Ka!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90f3dc2c-4ddc-4d2e-af8a-eb4bc8a18a36_664x492.png 1272w, https://substackcdn.com/image/fetch/$s_!H6Ka!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90f3dc2c-4ddc-4d2e-af8a-eb4bc8a18a36_664x492.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Going back to the use case of the mobility map, we can learn from the data how different customer segments move around the city in real time, and anticipate flows as they propagate: has a tourist bus just deposited hundreds of potential customers at Marble Arch? How likely are they to come near your store? We call this an <em>information asset</em>, in contrast to a <em>data asset</em>. The information asset only ever holds aggregated information that cannot be traced back to any individual. So it&#8217;s not all or nothing: most of the insights you need don&#8217;t actually require the private data to be stored. You shouldn&#8217;t have to face dilemmas like this one.&nbsp;<br>&nbsp;<br>Smart metering. Here we define our information asset as a geographical predictive map of energy demand patterns in real time, down to the finest granularity that still preserves anonymity, e.g., the first half of the postcode. The techniques we use are compliant: they rely on the same principles used by, say, the Office for National Statistics. But our reports are real-time, so that National Grid has time to act and match supply to demand.<br>&nbsp;<br>The same principle applies to any personal device in the Internet of Things. The privacy risks associated with wearable health sensors connected to smartphones will dwarf those of smart metering data, so it is critical to rely on information assets that respect privacy. 
Imagine a real-time map of asthma attacks in the city of London, split by age group, offering warnings to individuals at risk.&nbsp;<br>&nbsp;<br>Make no mistake: the digital economy will look different in one or two years&#8217; time, and service providers will need to introduce an array of privacy-enhancing technologies to keep their data-driven business models intact while respecting consumer privacy. Better authentication systems and smarter data-management solutions, such as those of our fellow EY challengers <a href="http://www.sedicii.com/">Sedicii</a> and <a href="http://www.exonar.com/">Exonar</a>, will no doubt be part of the solution. But when it comes to business analytics, forward-thinking companies that embrace on-the-fly aggregation using in-motion analytics will earn a reputation for respecting not only the letter but also the spirit of the law: private information will be glanced at just long enough for consumer and service provider alike to benefit from the Big Data value chain, and then forgotten, literally milliseconds later. 
That&#8217;s what drives us at Mentat: true innovation that helps us break free of false dilemmas.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bfJ6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cb47a02-a80f-4b58-a1c3-8df282742db6_800x545.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bfJ6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cb47a02-a80f-4b58-a1c3-8df282742db6_800x545.png 424w, https://substackcdn.com/image/fetch/$s_!bfJ6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cb47a02-a80f-4b58-a1c3-8df282742db6_800x545.png 848w, https://substackcdn.com/image/fetch/$s_!bfJ6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cb47a02-a80f-4b58-a1c3-8df282742db6_800x545.png 1272w, https://substackcdn.com/image/fetch/$s_!bfJ6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cb47a02-a80f-4b58-a1c3-8df282742db6_800x545.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bfJ6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cb47a02-a80f-4b58-a1c3-8df282742db6_800x545.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8cb47a02-a80f-4b58-a1c3-8df282742db6_800x545.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bfJ6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cb47a02-a80f-4b58-a1c3-8df282742db6_800x545.png 424w, https://substackcdn.com/image/fetch/$s_!bfJ6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cb47a02-a80f-4b58-a1c3-8df282742db6_800x545.png 848w, https://substackcdn.com/image/fetch/$s_!bfJ6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cb47a02-a80f-4b58-a1c3-8df282742db6_800x545.png 1272w, https://substackcdn.com/image/fetch/$s_!bfJ6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cb47a02-a80f-4b58-a1c3-8df282742db6_800x545.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><em>Originally published at <a href="http://www.ment.at">www.ment.at</a> on 8-Jan-2015</em></p>]]></content:encoded></item></channel></rss>