Lumos: Improving Smart Home IoT Visibility and Interoperability Through Analyzing Mobile Apps

Jeongmin Kim, Steven Y. Ko†, Sooel Son and Dongsu Han
KAIST, †University at Buffalo, The State University of New York, USA

Abstract—The era of Smart Homes and the Internet of Things (IoT) calls for integrating diverse “smart” devices, including sensors, actuators, and home appliances. However, enabling interoperation across heterogeneous IoT devices is a challenging task because vendors use their own control and communication protocols. Prior approaches have attempted to solve this problem by asking for vendor support, or even fundamentally re-designing the architecture of IoT devices. These approaches face limitations as they require disruptive changes.

This paper explores a new approach to improving IoT interoperability without requiring architectural changes or vendor participation. Focusing on smart-home environments, we propose Lumos, which improves interoperability by leveraging Android apps that control IoT devices. Lumos uses the information learned from these IoT apps to enable “best-effort” interoperation across heterogeneous devices. Our evaluation with 15 commercial IoT devices from three major IoT platforms and in-depth user studies conducted with 24 participants demonstrate the promising efficacy of Lumos for implementing diverse interoperation scenarios.

I. INTRODUCTION

The smart home market was valued at $76.62 billion in 2018 and is expected to reach $151.38 billion by 2024 [52]. Several major players (Apple, Amazon, Google, and Samsung) are consistently introducing new smart appliances and promoting their own platforms to increase their market share. Along with their popularity, the demand for their interoperability has increased [46, 65]. However, it is not trivial to implement a well-integrated system [3, 18, 21, 37, 44, 60].

The difficulties stem from three limitations: 1) Smart home devices often have their own means of control, such as mobile apps, that are proprietary [21, 37].
2) No single “open” IoT platform is able to cover all IoT devices: SmartThings and Wink, the two largest “open” platforms, support 219 and 115 devices respectively (May 2018), while few devices are supported by both [58, 67]. 3) Interoperation across different IoT platforms is not a design priority.

Existing approaches to IoT interoperation in the smart-home context are based on either voluntary vendor participation [1, 24, 31, 40, 43] or architecture generalization [2, 3, 6, 18, 32, 60, 69]. However, these approaches have their own limitations. The participation-based approaches provide a standardized common interface (e.g., an open API) to which vendors are required to conform. However, due to the extra engineering effort required to support these APIs, these approaches have not been well received; hence only a limited number of devices are supported. On the other hand, the generalized architectures propose common data structures and interfaces for heterogeneous IoT devices. Unfortunately, they have not been widely deployed because they demand significant changes to the current IoT architecture and require the agreement of all stakeholders.

978-1-7281-6992-7/20/$31.00 ©2020 IEEE

We posit that interoperability will remain a long-lasting problem in the foreseeable future and seek an alternative solution that can be driven by users. Thus, we explore a different approach, in which we leverage only the information already available, enabling interoperation of IoT devices without explicit vendor support or architectural changes. From this perspective, the core challenges are to: 1) obtain visibility and controllability for IoT devices only from the available information; and 2) improve interoperability among existing IoT architectures. To this end, we design a new system called Lumos that empowers users with an automated framework that supports interoperability. Lumos leverages the key insight that many IoT devices for smart homes are controlled by Android apps [26, 27, 53].
The apps are readily available and already know how to control IoT devices and query device information.

Leveraging information obtained through app analysis, Lumos improves interoperability with minimal user configuration. To infer the semantics and control/status messages generated by an app, Lumos combines UI, program, and traffic analyses performed on the app binary. It then integrates the analysis results with finer-grained semantic information from the user, who best knows the context of the actions she triggers when using the app. Based on this information, Lumos creates an interoperation gateway that understands the semantics of messages between the app and its IoT device and actively sends control messages and status queries to the IoT device. Finally, Lumos helps users easily create interoperation scenarios of their own by automatically generating scripts. Our contribution shows that a pure user-driven approach is viable in bridging the gap until the market fully resolves the problem.

We evaluate Lumos using 15 commercial off-the-shelf smart-home devices. We show that Lumos can learn the semantics of IoT operation from network traffic for all device features (29 out of 29), and is able to generate status and control messages for most features (26 out of 29, §V-B). Finally, our user study with 24 participants shows that Lumos offers a practical programming framework that enables the interoperation of IoT devices, offers interoperation features that are not available on commodity IoT platforms, and requires reasonable configuration effort compared with three

popular IoT platform-native apps.

[Fig. 1: Fragmented smart home ecosystems. Devices supported per platform: SmartThings 219, Wink 114, Insteon 52.]

[Fig. 2, partial: example messages. Wemo control message (turn on): UPnP XML request body setting BinaryState to 1. SmartThings status message (get status of motion sensor): JSON response body containing "currentState": {"name": "motion", "value": "inactive"}.]

In summary, this paper makes two key contributions:
• Novel approach to interoperability: We present an automated framework that combines static program analysis and dynamic learning to understand and re-construct control messages with user-given semantics.
• System prototype and evaluation: Our in-depth evaluation shows how Lumos enables interoperation between IoT devices in a unilateral fashion, further enabling new value-added home IoT services. To the best of our knowledge, our evaluation covers the largest set of commercial IoT devices among existing work. Our user study with 24 participants shows the promising efficacy of Lumos and quantifies the user effort.

II. MOTIVATION

A. The Status Quo of IoT Interoperation

Market reports [29] indicate the existence of 450 IoT platforms world-wide as of 2017, marking a 25% increase since 2016 and showing how the IoT ecosystem is becoming increasingly fragmented. Interoperability is the ability to create a coherent service by interacting with multiple IoT devices. Many studies confirm that such fragmentation presents a significant barrier that impedes interoperability and wider adoption of home IoT services [19, 28, 30, 44, 68]. We often come across users experiencing frustrations and ordeals as they try to make different IoT devices interoperable [51, 54].

Fig. 1 shows three major IoT platforms and the number of devices they support.
We make the following observations:
• Only devices on the same platform are interoperable. Cross-platform interoperation is generally not supported. Out of the three, only SmartThings exposes external APIs for controlling and monitoring devices [59].
• If devices belong to a platform, they are locked in to that specific platform. They neither support multiple platforms nor change their platform. We suspect that this limitation originates from implementation costs and business partnerships [57, 61].
• Many IoT devices are still stand-alone and do not interoperate with other devices. For example, Chromecast [23] does not belong to any of the three major platforms and cannot interact with any devices on these platforms.

[Fig. 2: Connection types of devices and network traffic. Netflix example (streaming to Chromecast): request to https://customerevents.Netflix.com/users/. with a JSON body containing EventName: “MDX Target Manager Action” and eventType: “target playback”.]

Industry efforts: Notable approaches to this problem from the industry include OpenT2T [43] and IFTTT [24]. OpenT2T defines common IoT schemas, which consist of properties for similar devices. For example, a common schema for IoT light bulbs can define an “on-off” property with two possible values, “on” and “off”. These schemas are then translated into vendor-specific implementations. A common schema then provides a consistent user experience when operating similar devices, even when they are from different manufacturers or support different protocols. While the idea of abstraction is noble, it requires vendor participation to support common schemas. The lack of a tangible incentive for this participation has contributed to rendering OpenT2T inactive for more than three years at the time of writing [42].

IFTTT [24] allows users to write Applets that connect popular web services, apps, and IoT devices. A user can set up triggers that specify when an Applet should run, filters that express a desired condition, and actions that are executed when filter conditions are met.
Combined with IoT devices, it can be used to customize IoT services. However, it leverages existing open APIs to access device state and issue commands, which requires vendor support. Even when vendors expose open APIs, we find that they expose only a small set of features.

B. Challenges and Key Insight

Approaches that require vendor participation or architectural change are still far from wide deployment, despite the growing need for interoperability among users. Improving interoperability without vendor participation still requires a full understanding of IoT devices (e.g., their protocols). However, such an understanding is difficult to obtain because almost no vendor discloses this information (§II-A). Furthermore, vendors do not wish to incur the high implementation costs that design changes entail [50]. The main challenges are to 1) capture the device state for context monitoring (visibility) and issue a desired command (controllability) without vendor support and 2) improve interoperability without requiring architectural modifications. We take the following use case as a concrete running example to illustrate the problem.

Alice has an IoT light bulb and a streaming dongle (e.g., Chromecast [23]). Each device has a companion mobile app that allows Alice to control the device. Now, Alice wants to

configure them together so that the light bulb automatically turns itself off when she streams a movie to her TV. A smart-home system should be aware of whether the streaming device is currently playing a video and be able to issue the control command to turn off the light bulb. Once the two tasks are addressed, implementing an IoT service with multiple devices amounts to writing a composition rule.

[Fig. 3: Usage model. (a) Software components: Lumos-app, installed on the smartphone, and Lumos-gateway, installed on a PC or router. (b) Deployment environment: devices set Lumos-gateway as the default gateway. (c) Configuring an interoperation rule: Lumos-app displays an overlay UI for recording on top of an IoT app. Example condition: streaming a Netflix movie. Example control: turning off a light bulb.]

Key insight: Our goal is to enable the two tasks in a user-driven, best-effort manner with automation support. Our key insight is that mobile apps play a key role in communicating with IoT devices and contain valuable information for interoperability. 1) They already have the ability to control and monitor IoT devices. 2) Often vendors themselves provide the apps and keep them up-to-date. 3) The graphical user interface (GUI) of the apps provides semantic information (e.g., this button turns off the light).

Fig. 2 presents an example in which Alice uses a Wemo Insight Plug to control room lighting. When Alice watches a Netflix movie using Chromecast, the Netflix app sends an HTTP request message to the server. This request contains a message (“eventType”: “target playback”) denoting that the movie is to be played on the TV connected with Chromecast. When the request is detected, Lumos triggers a request to the Wemo Insight Plug to power off. This is feasible because Lumos learns from IoT apps to recognize the condition and generate the control message.

III. LUMOS USAGE MODEL

In order to use Lumos, a user installs two software components (Fig. 3 (a)). One component is Lumos-app, a mobile app that allows users to configure interoperation. The other component is Lumos-gateway, a middlebox that can be installed on either a desktop or a router that can run a custom OS (e.g., a NETGEAR router). Lumos-gateway is a trusted party that monitors all traffic that IoT apps generate (Fig. 3 (b)). To monitor the traffic, users need to configure their devices to use Lumos-gateway as the default gateway. Note that an IoT app may directly connect to an IoT device (e.g., Wemo Insight Plug) or go through an IoT hub that talks to IoT devices (Fig. 2). Lumos-gateway supports both cases.

Configuring interoperation: Our interoperation rule consists of a condition and a control action. For example, in our running example with Alice, the condition is streaming a Netflix movie and the control action is turning off the light. When Lumos detects the condition, it performs the control action. To configure a rule, users “teach” Lumos by performing UI actions that correspond to the condition and the control. For example, Alice teaches Lumos her condition (streaming a Netflix movie) by opening her Netflix app and playing a movie. She also teaches her control (turning off a light bulb) by opening her Wemo app and turning the light bulb off. We automate this process by capturing the user interaction with Lumos-app, which runs in the background and displays an additional UI overlaid on top of an IoT app UI. The additional UI displayed by Lumos-app guides the user through the process of configuring a condition action and a control action. While a user configures a condition action and a control action, Lumos-app monitors the UI actions performed by the user and communicates with Lumos-gateway to capture the requests and responses caused by these UI actions at the network level, as illustrated in Fig. 3 (c).
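The rule structure described above, a condition paired with a control action, can be sketched as follows. This is a minimal illustration and not the Lumos implementation: the `Rule` class, its `on_request` hook, and the `control` callback are hypothetical names, and the condition is reduced to a key/value check on a JSON request body modeled on the Netflix “target playback” example.

```python
import json
from dataclasses import dataclass
from typing import Callable


def find_key(obj, key):
    """Depth-first search for `key` in nested dicts/lists; first value or None."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


@dataclass
class Rule:
    """Hypothetical interoperation rule: a learned condition plus a control action."""
    host_suffix: str              # host that the condition request is sent to
    key: str                      # JSON field that identifies the condition
    value: str                    # value of that field when the condition holds
    control: Callable[[], None]   # action to run when the condition is observed

    def on_request(self, host: str, body: str) -> bool:
        """Check one observed request; run the control action on a match."""
        if not host.lower().endswith(self.host_suffix):
            return False
        try:
            payload = json.loads(body)
        except ValueError:        # body is not JSON, so it cannot match
            return False
        if find_key(payload, self.key) != self.value:
            return False
        self.control()
        return True


# Example: streaming a Netflix movie (condition) turns off the plug (control).
sent = []
rule = Rule("netflix.com", "eventType", "target playback",
            control=lambda: sent.append("wemo: power off"))
body = ('{"EventName": "MDX Target Manager Action", '
        '"data": {"eventType": "target playback"}}')
rule.on_request("customerevents.netflix.com", body)        # fires the control
rule.on_request("customerevents.netflix.com", '{"data": {"eventType": "other"}}')
```

In this sketch the control action is just a callback; in a gateway it would stand for replaying a learned control message to the device.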
Then, Lumos-gateway analyzes the requests and responses to detect the condition and trigger control actions. Our implementation of Lumos-app extends an existing UI record-and-replay tool called SUGILITE [36], which relies on the Android Accessibility API to monitor, intercept, and inject UI actions.

IV. DESIGN

Achieving our goal requires satisfying three requirements: 1) Lumos-app must provide a way for users to leverage IoT app UIs to “teach” our system their intended conditions and control actions. 2) Lumos-gateway must be able to recognize pre-configured conditions from the network messages that IoT apps generate and issue control messages to trigger desired actions. 3) It must offer programmability using the “learned” information to configure interoperation rules. Fig. 4 presents a system overview that delivers the goal. We detail each component of our design below.

A. Learning from UIs and Users

Lumos starts by learning the semantics of IoT operation through the UI and user actions on an app. The objective of this process is to identify the actions of interest that generate control/status messages and label them with a specific semantic tag. This tag denotes an IoT device operation controlled via an IoT app UI component, such as turning on/off a bulb. Lumos takes a user-assisted semantic labeling approach to reduce human effort. In the “teaching” phase, when a user clicks a UI component, Lumos assigns the resource ID of the UI component as the semantic tag because a resource ID is a human-readable string that usually has semantic information (e.g., brightness slider). Lumos-app does this by monitoring user interactions through the Android Accessibility API. However, the tag might be insufficient. Some resource IDs do not contain any semantic information (e.g., button1), or a single button may trigger different actions depending on the context (e.g., a single switch button for power on and off).
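The default-tag step and its limitation can be sketched as follows. This is an illustrative sketch only: the function names, the pattern used to flag uninformative resource IDs, and the separator-to-space normalization are assumptions, not details given in the text.

```python
import re
from typing import Optional

# Assumption: IDs made of a generic widget word plus optional digits
# (e.g., "button1") carry no usable semantic information.
_GENERIC_ID = re.compile(
    r"^(button|btn|view|image|text|switch|toggle)[_\-]?\d*$", re.IGNORECASE
)


def default_tag(resource_id: str) -> str:
    """Derive a human-readable default tag from an Android resource ID."""
    # Drop an optional "package:id/" prefix, then turn separators into spaces.
    name = resource_id.rsplit("/", 1)[-1]
    return re.sub(r"[_\-]+", " ", name).strip().lower()


def needs_user_input(resource_id: str) -> bool:
    """True when the default tag is unlikely to be meaningful on its own."""
    name = resource_id.rsplit("/", 1)[-1]
    return bool(_GENERIC_ID.match(name))


def semantic_tag(resource_id: str, user_edit: Optional[str] = None) -> str:
    """A non-empty user-edited tag wins; otherwise use the default tag."""
    if user_edit and user_edit.strip():
        return user_edit.strip()
    return default_tag(resource_id)


tag = semantic_tag("com.example:id/brightness_slider")
```

A tag flagged by `needs_user_input` is a natural candidate for prompting the user during the teaching phase, since the resource ID alone does not describe the operation.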

[Fig. 4: Lumos system overview. Parts: §IV-A Learning from UIs and Users (UI interaction and semantics, via Lumos-app); §IV-B Learning from IoT Apps (signature-UI pairs stored in a pair DB); §IV-C Network Traffic Learning (packet matcher and packet learner, HTTP request/response body comparison); §IV-D Interoperation Support (packet replayer, building interoperation rules).]

To manage this, Lumos allows a user to edit semantic tags displayed during the “teaching” phase to make the meaning more specific and personalized to the user. For example, when Alice wants to teach the system how to turn on a WINIX air cleaner using its app, she demonstrates it by clicking the power-on button while Lumos-app is running. Lumos-app then records all of the interactions. When the power-on button is clicked, Lumos-app shows a dialog with the default semantic tag (“power on”), which the user can then edit for customization.

Example: Fig. 5 (left) illustrates the teaching phase in which a user teaches Lumos-app how to turn on a WINIX air cleaner with a specific wind force level. She first sets the wind force option and then turns on the cleaner. Lumos-app automatically assigns the resource ID (“power on”) to the power-on button as the semantic tag. Lumos-app records these interactions so that it can automatically replay them in a later step (§IV-C), and lets the user customize the tag with a concrete meaning (e.g., power on with wind force level 1).

B. Learning from IoT Apps

By design, Lumos-gateway should trigger a programmed action upon observing network messages that represent an IoT operation. This architecture requires Lumos-gateway to identify the HTTP(S) requests that an IoT app generates when the user clicks a particular UI component.
Unfortunately, identifying such requests is non-trivial without an understanding of the app logic due to other unrelated requests in the background. To illustrate this, we quantify the amount of traffic generated while performing specific actions on the IoT apps listed in Table I. We capture traffic from each app from the time the target app is started, perform UI interactions as quickly as possible, and stop capturing traffic immediately. We report the average of 10 runs. We perform nine actions using the apps: 1) lock an August door lock; 2) stream to a Chromecast; 3) turn on a HUE bulb; 4) power on an Insteon plug; 5) get status from Nest Protect; 6) power on a SmartThings plug; 7) power on a Wemo Insight; 8) play Wink chime; and 9) power on a Winix air cleaner. On average, each app generates 27.4 HTTP transactions. According to our manual traffic analysis, all apps in our dataset continuously perform synchronization as long as the app is in the foreground. We suspect this is due to the need for minimizing synchronization delays to enrich user experience.

[Fig. 5: Dynamic learning examples of the WINIX app. Case 1 (manual interaction) and Case 2 (automatic interaction) produce POST requests with the same semantic tag, which users can modify later. Two requests tagged “power on with level1” share "controlData": "2" but carry different "reqTime" values ("20181112150826790" and "20181112150842526"), so "reqTime" is marked as changeable; requests tagged “power on with level3” carry "controlData": "5".]

To isolate transactions that a UI component generates, Lumos uses static program analysis. It takes an Android binary (APK) and then pairs a UI component with the regex signatures of control/status messages that the UI component generates.
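To make the pairing concrete, the sketch below matches observed requests against per-component regex signatures so that UI-triggered traffic can be separated from background synchronization. The signature strings are modeled on the HUE example shown in Fig. 6, with `+` quantifiers and optional quoting assumed where the extracted figure text is ambiguous; the `SIGNATURES` table, its resource ID key, and the function name are hypothetical.

```python
import re
from typing import Optional

# Hypothetical signature-UI pairs: resource ID -> (URI regex, request body regex).
SIGNATURES = {
    "brightness_slider": (
        re.compile(r"http://(.*)/api/(.*)/lights/([0-9]+)/state"),
        re.compile(r'\{\s*"?on"?\s*:\s*(.*),\s*"?bri"?\s*:\s*([0-9]+)\s*\}'),
    ),
}


def match_ui_component(uri: str, body: str) -> Optional[str]:
    """Return the resource ID whose signatures match the request, else None."""
    for resource_id, (uri_re, body_re) in SIGNATURES.items():
        # Both the URI and the body must match the whole signature.
        if uri_re.fullmatch(uri) and body_re.fullmatch(body):
            return resource_id
    return None


hit = match_ui_component(
    "http://192.168.0.10/api/abc123/lights/3/state",
    '{"on": true, "bri": 128}',
)
miss = match_ui_component("http://192.168.0.10/api/abc123/config", "")
```

Requests that match no signature, such as periodic status polling, fall through to `None` and can be ignored when learning a condition or control action.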
In addition, it tracks dependencies between messages to dynamically learn fields that come from previous messages.

Building network signatures: To identify the exact control and status message that a UI component generates or displays after receiving it, we start from an existing tool, Extractocol [33], that conducts a static taint analysis to extract the network message signatures that an Android app generates or receives. It automatically identifies all app-defined methods that send network messages, extracts message signatures, and outputs them as regular expressions. Fig. 6 shows a regular expression example. Extractocol also provides a call graph of an app as well as its control-flow and interprocedural data-flow graphs. We leverage this information in the next phase of our analysis.

UI control identification: Given an app’s call graph and its network signatures, we associate each message signature with a UI component that generates a network message matching the signature. The goal is to precisely identify the network messages generated by the UI actions of a user when the user configures conditions and control actions. The Android Accessibility API allows monitoring which UI components a user interacts with by providing their resource IDs. If we can associate a resource ID with a network message signature, we can isolate the traffic that the UI component generates at Lumos-gateway, which observes the traffic.

Lumos takes the following two steps to accomplish the goal. First, it identifies all event listeners that eventually generate network messages. Since every interactable UI component has an event listener, identifying an event listener is equivalent to identifying a UI component. Lumos performs a backward call graph analysis for this; it starts from each app-defined method that generates a network message and walks up the call chain until it either finds an event listener such as OnClickListener() or OnSeekBarChangeListener(), or reaches the top of the call chain.
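The backward call-graph walk in the first step can be sketched as follows. This is a simplified illustration rather than the actual analysis: the call graph is a plain dictionary of caller-to-callee edges, listener entry points are recognized by a small, assumed set of callback method names, and all class and method names in the example are hypothetical.

```python
from collections import defaultdict, deque

# Assumption: listener callbacks are identified by their method names.
LISTENER_CALLBACKS = {"onClick", "onProgressChanged", "onStopTrackingTouch"}


def find_listeners(call_graph, network_methods):
    """Map each network-sending method to the listener callbacks that reach it.

    call_graph: dict mapping a caller to an iterable of its callees.
    network_methods: methods known to send network messages.
    """
    # Reverse the edges so the walk goes from callee to callers.
    callers = defaultdict(set)
    for caller, callees in call_graph.items():
        for callee in callees:
            callers[callee].add(caller)

    result = {}
    for start in network_methods:
        found, seen, queue = set(), {start}, deque([start])
        while queue:
            method = queue.popleft()
            if method.rsplit(".", 1)[-1] in LISTENER_CALLBACKS:
                found.add(method)   # reached an event listener: stop this path
                continue
            for parent in callers[method] - seen:
                seen.add(parent)
                queue.append(parent)
        # An empty set means the top of the call chain was reached
        # without meeting a listener (e.g., background synchronization).
        result[start] = found
    return result


graph = {
    "BrightnessSeekBarView.onStopTrackingTouch": ["LightController.setBrightness"],
    "LightController.setBrightness": ["HttpClient.put"],
    "SyncService.run": ["HttpClient.get"],
}
listeners = find_listeners(graph, ["HttpClient.put", "HttpClient.get"])
```

The `seen` set keeps the walk from looping on recursive call chains, so each method is visited at most once per starting point.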
Second, Lumos identifies the resource ID to which an event listener is attached. Android allows a developer to register an event listener of a UI component either dynamically (in the app code) or statically (in the app’s XML manifest). Statically-registered event listeners are not difficult to identify since identifying them simply requires parsing of

the XML manifest. However, dynamically-registered event listeners are not straightforward to identify, so Lumos performs further analysis to identify them. Lumos starts this analysis by looking up the call site of every event listener registration method (e.g., setOnClickListener()). From there, it finds all objects that invoke the event listener registration method. These objects are UI objects. Lumos then performs a backward taint analysis for each object until it finds a method that provides the resource ID for the object, such as findViewById(). This resource ID is a hexadecimal value, and we can find the corresponding string ID in another XML file (public.xml). Note that the analysis is done offline.

[Fig. 6: Examples of network signature, UI control, and string ID for the HUE app. UI control: class BrightnessSeekBarView, whose constructor calls this.findViewById(0x7F0D009C) and registers an OnSeekBarChangeListener, the seed of the backward taint analysis. String ID: brightness slider. URI signature: http://(.*)/api/(.*)/lights/([0-9]) /state. Request body signature: {on: (.*), bri: ([0-9] )}.]

Example: Fig. 6 shows how Lumos couples a UI component with the control message it generates, using the HUE app. Starting from the method that sends the control message, Lumos identifies that it is (eventually) called by OnSeekBarChangeListener. Therefore, Lumos looks for a call site that registers the event handler. From the site, it computes a backward slice to the object (this.a) to which the event handler was attached. It finally reaches findViewById(), which provides the resource ID that is labeled as “brightness slider” in public.xml.

C. Learning from Network Traffic

The UI-signature pairs we extract from an app allow us to distinguish the traffic triggered by the app’s UI from other traffic. However, statically-extracted message signatures do not provide run-time values, such as URIs, query strings, and headers. Yet, Lumos should be able to construct a network request for monitoring and controlling the status of a device, and this request must be filled in with actual values. To address this, Lumos integrates run-time packet learning with the static analysis. For this, Lumos-app replays all of the interactions recorded in the teaching phase (§IV-A), and Lumos uses the signatures from §IV-B to obtain message instances for them. However, some attribute values change between messages.
