Analytics Zoo: Distributed Tensorflow, Keras And BigDL In .

Analytics Zoo: Distributed TensorFlow, Keras and BigDL in production on Apache Spark
Jennie Wang, Big Data Technologies, Intel
Strata 2019

Agenda
- Motivation
- BigDL
- Analytics Zoo
- Real-world applications
- Conclusion and Q&A

Motivations
- Technology and Industry Trends
- Real World Scenarios

Trend #1: Data Scale Driving Deep Learning Process
"Machine Learning Yearning", Andrew Ng, 2016

Trend #2: Real-World ML/DL Systems Are Complex Big Data Analytics Pipelines
"Hidden Technical Debt in Machine Learning Systems", Sculley et al., Google, NIPS 2015 Paper

Trend #3: Hadoop Becoming the Center of Data Gravity
- Phillip Radley, BT Group, Strata Hadoop World 2016 San Jose
- Matthew Glickman, Goldman Sachs, Spark Summit East 2015

Unified Big Data Analytics Platform
[Figure: Apache Hadoop & Spark platform stack, with layers for data processing & analysis and resource management & co-ordination; components shown include Flume, Storm, MR, Giraph, ML Pipelines, SparkR, Streaming, MLlib, GraphX, SQL, Avro and HBase]

Chasm b/w Deep Learning and Big Data Communities
[Figure: the chasm between deep learning experts and average users (big data users, data scientists, analysts, etc.)]

Large-Scale Image Recognition at JD.com

Bridging the Chasm
Make deep learning more accessible to big data and data science communities
- Continue the use of familiar SW tools and HW infrastructure to build deep learning applications
- Analyze "big data" using deep learning on the same Hadoop/Spark cluster where the data are stored
- Add deep learning functionalities to large-scale big data programs and/or workflows
- Leverage existing Hadoop/Spark clusters to run deep learning applications
  - Shared, monitored and managed with other workloads (e.g., ETL, data warehouse, feature engineering, traditional ML, graph analytics, etc.) in a dynamic and elastic fashion

BigDL
Bringing Deep Learning To Big Data Platform
- Distributed deep learning framework for Apache Spark*
- Make deep learning more accessible to big data users and data scientists
  - Write deep learning applications as standard Spark programs
  - Run on existing Spark/Hadoop clusters (no changes needed)
- Feature parity with popular deep learning frameworks
  - E.g., Caffe, Torch, TensorFlow, etc.
- High performance (on CPU)
  - Powered by Intel MKL and multi-threaded programming
- Efficient scale-out
  - Leveraging Spark for distributed training & inference
[Figure: BigDL alongside DataFrame, ML Pipeline, SQL, SparkR, Streaming, MLlib and GraphX on Spark]
[...]//bigdl-project.github.io/

BigDL Run as Standard Spark Programs
- Standard Spark jobs: no changes to the Spark or Hadoop clusters needed
- Iterative: each iteration of the training runs as a Spark job
- Data parallel: each Spark task runs the same model on a subset of the data (batch)
[Figure: a BigDL program (DL app on the driver, built on the Spark program and BigDL library) launches standard Spark jobs; on each worker, a Spark executor (JVM) runs Spark tasks with the BigDL library and Intel MKL]
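The iterative, data-parallel scheme described on this slide can be sketched in plain Python without Spark. This is only an illustration under stated assumptions, not BigDL's implementation: the one-weight linear model y = w * x, the toy partitions, and the names `local_gradient` and `train` are all hypothetical.

```python
# Toy data-parallel SGD: every "task" computes a gradient for the same model
# on its own partition, and the results are averaged once per iteration.
# Hypothetical sketch; BigDL runs each iteration as a Spark job instead.

def local_gradient(w, partition):
    """Gradient of the mean squared error of y = w * x on one partition."""
    return sum(2.0 * (w * x - y) * x for x, y in partition) / len(partition)

def train(partitions, w=0.0, lr=0.05, iterations=200):
    for _ in range(iterations):                 # one iteration ~ one Spark job
        grads = [local_gradient(w, p) for p in partitions]  # one task per partition
        w -= lr * sum(grads) / len(grads)       # synchronous update of the shared model
    return w

# Data generated from y = 3x, split into two partitions.
partitions = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = train(partitions)
print(round(w, 3))
```

Because every task sees the same weight at the start of an iteration, the averaged update matches what a single machine would compute over the combined batch (for equally sized partitions).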

Distributed Training in BigDL
Parameter Server Architecture directly inside Spark (using Block Manager)
[Figure: the training set is split into partitions 1..n, one per worker; each worker computes a local gradient, and gradient and weight slices are exchanged among the workers in numbered steps 1-5]
Peer-2-Peer All-Reduce synchronization
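One way to picture the peer-to-peer synchronization named on this slide is a sliced exchange: each worker owns one slice of the parameters, averages that slice of everyone's gradients, updates it, and then all workers fetch the updated slices. The sketch below is a single-process illustration under that assumption; it does not use Spark's Block Manager, and `slices` and `sync_step` are hypothetical names.

```python
# Toy all-reduce style parameter synchronization in plain Python.
# Assumption: n workers each hold a full local gradient, and slice i of the
# parameters is owned by worker i. This is not BigDL's actual code.

def slices(vec, n):
    """Split vec into n contiguous slices of near-equal size."""
    size, extra = divmod(len(vec), n)
    out, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        out.append(vec[start:end])
        start = end
    return out

def sync_step(weights, local_grads, lr):
    n = len(local_grads)
    weight_slices = slices(weights, n)
    grad_slices = [slices(g, n) for g in local_grads]   # indexed [worker][slice]
    new_slices = []
    for owner in range(n):
        # The owner collects its slice of every worker's gradient and averages it.
        columns = zip(*(grad_slices[worker][owner] for worker in range(n)))
        avg = [sum(col) / n for col in columns]
        # The owner updates only the weight slice it is responsible for.
        new_slices.append([w - lr * g for w, g in zip(weight_slices[owner], avg)])
    # Every worker then fetches all updated slices to rebuild the full weights.
    return [w for s in new_slices for w in s]

weights = [1.0, 1.0, 1.0, 1.0, 1.0]
local_grads = [[1.0, 2.0, 3.0, 4.0, 5.0], [3.0, 2.0, 1.0, 0.0, -1.0]]
updated = sync_step(weights, local_grads, lr=0.5)
print(updated)
```

Splitting the work by slice means no single node has to receive every full gradient, which is the usual motivation for this layout.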

Training Scalability
Throughput of ImageNet Inception v1 training (w/ BigDL 0.3.0 and dual-socket Intel Broadwell 2.1 GHz); the throughput scales almost linearly up to 128 nodes (and continues to scale reasonably up to 256 nodes).
Source: Scalable Deep Learning with BigDL on the Urika-XC Software Suite ([...]ning-bigdl-urika-xc-software-suite/)

Analytics Zoo
A unified analytics + AI platform for distributed TensorFlow, Keras and BigDL on Apache Spark
[...]zoo

Analytics Zoo
Unified Analytics + AI Platform for Big Data
Distributed TensorFlow, Keras and BigDL on Spark
- Reference Use Cases: anomaly detection, sentiment analysis, fraud detection, image generation, chatbot, etc.
- Built-In Deep Learning Models: image classification, object detection, text classification, text matching, recommendations, sequence-to-sequence, anomaly detection, etc.
- Feature Engineering: feature transformations for image, text, 3D imaging, time series, speech, etc.
- High-Level Pipeline APIs:
  - Distributed TensorFlow and Keras on Spark
  - Native support for transfer learning, Spark DataFrame and ML Pipelines
  - Model serving API for model serving/inference pipelines
- Backends: Spark, TensorFlow, Keras, BigDL, OpenVINO, MKL-DNN, [...]
[...]oo/
https://analytics-zoo.github.io/

Analytics Zoo
[Figure: Analytics Zoo stack diagram. Recoverable labels: use cases (anomaly detection, sentiment analysis, fraud detection, [...] (VAE), web services); high-level pipeline [...]; Keras-like APIs; DataFrame and ML pipeline support; image, 3D image, text, speech; BigDL, OpenVINO, MKL-DNN]

Analytics Zoo
Build end-to-end deep learning applications for big data
- Distributed TensorFlow on Spark
- Keras-style APIs (with autograd & transfer learning support)
- nnframes: native DL support for Spark DataFrames and ML Pipelines
- Built-in feature engineering operations for data preprocessing
Productionize deep learning applications for big data at scale
- Model serving APIs (w/ OpenVINO support)
- Support Web Services, Spark, Storm, Flink, Kafka, etc.
Out-of-the-box solutions
- Built-in deep learning models and reference use cases

Distributed TensorFlow on Spark in Analytics Zoo
1. Data wrangling and analysis using PySpark

    import tensorflow as tf
    from zoo import init_nncontext
    from zoo.pipeline.api.net import TFDataset

    sc = init_nncontext()

    # Each record in train_rdd consists of a list of NumPy ndarrays
    train_rdd = sc.parallelize(file_list) \
        .map(lambda x: read_image_and_label(x)) \
        .map(lambda image_label: decode_to_ndarrays(image_label))

    # TFDataset represents a distributed set of elements,
    # in which each element contains one or more TensorFlow Tensor objects.
    dataset = TFDataset.from_rdd(train_rdd,
                                 names=["features", "labels"],
                                 shapes=[[28, 28, 1], [1]],
                                 types=[tf.float32, tf.int32],
                                 batch_size=BATCH_SIZE)

Distributed TensorFlow on Spark in Analytics Zoo
2. Deep learning model development using TensorFlow

    import tensorflow as tf

    slim = tf.contrib.slim

    # `lenet` is not imported on the slide; it is assumed to be a
    # TF-Slim style model definition module.
    images, labels = dataset.tensors
    labels = tf.squeeze(labels)
    with slim.arg_scope(lenet.lenet_arg_scope()):
        logits, end_points = lenet.lenet(images, num_classes=10, is_training=True)
    loss = tf.reduce_mean(
        tf.losses.sparse_softmax_cross_entropy(logits=logits, labels=labels))
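For readers unfamiliar with the loss used on this slide, the following plain-Python sketch shows what a sparse softmax cross-entropy computes: the negative log of the softmax probability assigned to the integer label, averaged over the batch. It is an illustration only; the function below is hypothetical and is not TensorFlow's implementation.

```python
import math

def sparse_softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between softmax(logits) and integer class labels."""
    losses = []
    for row, label in zip(logits, labels):
        m = max(row)                                  # stabilise the exponentials
        log_sum = m + math.log(sum(math.exp(v - m) for v in row))
        losses.append(log_sum - row[label])           # equals -log softmax(row)[label]
    return sum(losses) / len(losses)

# Uniform logits over 4 classes give a loss of log(4); a confident, correct
# prediction gives a loss close to zero.
uniform = sparse_softmax_cross_entropy([[0.0, 0.0, 0.0, 0.0]], [2])
confident = sparse_softmax_cross_entropy([[10.0, 0.0, 0.0, 0.0]], [0])
print(uniform, confident)
```

"Sparse" here refers to the labels being class indices rather than one-hot vectors, which is why the slide squeezes `labels` to a rank-1 tensor first.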

Distributed TensorFlow on Spark in Analytics Zoo
3. Distributed training on Spark and BigDL

    from zoo.pipeline.api.net import TFOptimizer
    from bigdl.optim.optimizer import MaxIteration, Adam, MaxEpoch, TrainSummary

    optimizer = TFOptimizer.from_loss(loss, Adam(1e-3))
    optimizer.set_train_summary(TrainSummary("/tmp/az_lenet", "lenet"))
    optimizer.optimize(end_trigger=MaxEpoch(5))

More [...]icszoo/blob/master/apps/tfnet/image classification [...]flow/distributed training/train [...]stributed training/train mnist keras.py

Keras, Autograd & Transfer Learning APIs
1. Use transfer learning APIs to
- Load an existing Caffe model
- Remove last few layers
- Freeze first few layers
- Append a few layers

    from zoo.pipeline.api.net import *
    from zoo.pipeline.api.keras.layers import *  # Input, Flatten

    full_model = Net.load_caffe(def_path, model_path)
    # Remove layers after pool5
    model = full_model.new_graph(outputs=["pool5"])
    # Freeze layers from input to res4f inclusive
    model.freeze_up_to(["res4f"])
    # Append a few layers
    image = Input(name="input", shape=(3, 224, 224))
    resnet = model.to_keras()(image)
    resnet50 = Flatten()(resnet)

Build Siamese Network Using Transfer Learning

Keras, Autograd & Transfer Learning APIs
2. Use Keras-style and autograd APIs to build the Siamese Network

    import zoo.pipeline.api.autograd as A
    from zoo.pipeline.api.keras.layers import *
    from zoo.pipeline.api.keras.models import *

    input = Input(shape=[2, 3, 226, 226])
    features = TimeDistributed(layer=resnet50)(input)
    f1 = features.index_select(1, 0)  # image1
    f2 = features.index_select(1, 1)  # image2
    diff = A.abs(f1 - f2)
    fc = Dense(1)(diff)
    output = Activation("sigmoid")(fc)
    model = Model(input, output)

Build Siamese Network Using Transfer Learning
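The head of the Siamese network above (absolute difference of the two feature vectors, a single dense unit, then a sigmoid) can be written out in plain Python to show what it computes. This is only a sketch: the 3-dimensional features and the hand-picked weights are hypothetical, whereas in the slide's model the features come from the shared ResNet-50 branch and the weights are learned.

```python
import math

def siamese_score(f1, f2, weights, bias):
    """sigmoid(w . |f1 - f2| + b), mirroring A.abs, Dense(1) and sigmoid."""
    diff = [abs(a - b) for a, b in zip(f1, f2)]
    logit = sum(w * d for w, d in zip(weights, diff)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical parameters: negative weights make large differences lower the score.
weights, bias = [-2.0, -2.0, -2.0], 1.0
same = siamese_score([0.5, 0.1, 0.9], [0.5, 0.1, 0.9], weights, bias)
different = siamese_score([0.5, 0.1, 0.9], [0.0, 0.9, 0.1], weights, bias)
print(same, different)
```

Because both images pass through the same feature extractor, the score depends only on how far apart their features are, not on which image comes first.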

nnframes
Native DL support in Spark DataFrames and ML Pipelines
1. Initialize NNContext and load images into DataFrames using NNImageReader

    from zoo.common.nncontext import *
    from zoo.pipeline.nnframes import *

    sc = init_nncontext()
    imageDF = NNImageReader.readImages(image_path, sc)

2. Process loaded data using DataFrame transformations

    from pyspark.sql.functions import col, udf

    getName = udf(lambda row: ...)
    df = imageDF.withColumn("name", getName(col("image")))

3. Process images using built-in feature engineering operations

    from zoo.feature.image import *

    transformer = ChainedPreprocessing(
        [RowToImageFeature(), ImageChannelNormalize(123.0, 117.0, 104.0),
         ImageMatToTensor(), ImageFeatureToTensor()])
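The idea of chaining preprocessing steps in step 3 can be illustrated in plain Python as function composition. The sketch below is not the Analytics Zoo API: `chain`, `channel_normalize` and `hwc_to_chw` are hypothetical stand-ins. It also assumes that the normalization subtracts a per-channel mean and that the tensor conversion reorders height x width x channel data to channel-first, which the slide does not confirm.

```python
from functools import reduce

def chain(*steps):
    """Compose preprocessing steps left to right, like a chained pipeline."""
    return lambda x: reduce(lambda value, step: step(value), steps, x)

def channel_normalize(means):
    """Subtract a per-channel mean from every pixel of an HWC image."""
    def apply(image):
        return [[[v - m for v, m in zip(pixel, means)] for pixel in row]
                for row in image]
    return apply

def hwc_to_chw(image):
    """Reorder height x width x channel data to channel x height x width."""
    channels = len(image[0][0])
    return [[[pixel[c] for pixel in row] for row in image]
            for c in range(channels)]

transform = chain(channel_normalize((123.0, 117.0, 104.0)), hwc_to_chw)
image = [[[123.0, 117.0, 104.0], [124.0, 119.0, 107.0]]]   # 1 x 2 x 3 image
result = transform(image)
print(result)
```

Expressing the pipeline as a single callable is what lets the same transformer be handed to an estimator and applied to every row of a DataFrame.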

nnframes
Native DL support in Spark DataFrames and ML Pipelines
4. Define model using Keras-style API

    from zoo.pipeline.api.keras.layers import *
    from zoo.pipeline.api.keras.models import *

    model = Sequential() \
        .add(Convolution2D(32, 3, 3, activation='relu', input_shape=(1, 28, 28))) \
        .add(MaxPooling2D(pool_size=(2, 2))) \
        .add(Flatten()) \
        .add(Dense(10, activation='softmax'))

5. Train model using Spark ML Pipelines

    from bigdl.nn.criterion import CrossEntropyCriterion

    estimator = NNEstimator(model, CrossEntropyCriterion(), transformer) [...]och(1) [...]
    nnModel = estimator.fit(df)

Working with Image
1. Read images into local or distributed ImageSet

    from zoo.common.nncontext import *
    from zoo.feature.image import *

    spark = init_nncontext()
    local_image_set = ImageSet.read(image_path)
    distributed_image_set = ImageSet.read(image_path, spark, 2)

2. Image augmentations using built-in ImageProcessing operations

    transformer = [...]Jitter(),
                  ImageExpand(max_expand_ratio=2.0),
                  ImageResize(300, 300, -1),
                  ImageHFlip()])
    new_local_image_set = transformer(local_image_set)
    new_distributed_image_set = transformer(distributed_image_set)

Image Augmentations Using Built-in Image Transformations (w/ OpenCV on Spark)

Working with Text
1. Read text into local or distributed TextSet

    from zoo.common.nncontext import *
    from zoo.feature.text import *

    spark = init_nncontext()
    local_text_set = TextSet.read(text_path)
    distributed_text_set = TextSet.read(text_path, spark, 2)

2. Build text transformation pipeline using built-in operations

    transformedTextSet = [...]uence(len).generateSample()

POJO Model Serving API

    import [...]tInferenceModel;

    public class TextClassification extends AbstractInferenceModel {
        public TextClassification(int concurrentNum) {
            super(concurrentNum);
        }
        ...
    }

    public class ServingExample {
        public static void main(String[] args) throws IOException {
            TextClassification model = new TextClassification();
            model.load(modelPath, weightPath);
            texts = ...
            List<JTensor> inputs = preprocess(texts);
            for (JTensor input : inputs) {
                List<Float> result = model.predict(input.getData(), input.getShape());
                ...
            }
        }
    }

OpenVINO Support for Model Serving

    import numpy as np
    from zoo.common.nncontext import init_nncontext
    from zoo.feature.image import ImageSet
    from zoo.pipeline.inference import InferenceModel

    sc = init_nncontext("OpenVINO Object Detection Inference Example")
    images = ImageSet.read(options.img_path, sc,
                           resize_height=600, resize_width=600).get_image().collect()
    # Add two leading dimensions to each image, then stack along the first axis
    input_data = np.concatenate(
        [image.reshape((1, 1) + image.shape) for image in images], axis=0)
    model = InferenceModel()
    model.load_tf(options.model_path, backend="openvino", model_type=options.model_type)
    predictions = model.predict(input_data)
    # Print the detection result of the first image.
    print(predictions[0])

Transparently support OpenVINO in model serving, which delivers a significant boost in inference speed

Built-in Deep Learning Models
- Object detection: e.g., SSD, Faster-RCNN, etc.
- Image classification: e.g., VGG, Inception, ResNet, MobileNet, etc.
- Text classification: text classifier (using CNN, LSTM, etc.)
- Recommendation: e.g., Neural Collaborative Filtering, Wide and Deep Learning, etc.
- Anomaly detection: unsupervised time series anomaly detection using LSTM
- Sequence-to-sequence

Object Detection API
1. Load pretrained model in Detection Model Zoo

    from zoo.common.nncontext import *
    from zoo.models.image.objectdetection import *

    spark = init_nncontext()
    model = ObjectDetector.load_model(model_path)

2. Off-the-shelf inference using the loaded model

    image_set = ImageSet.read(img_path, spark)
    output = model.predict_image_set(image_set)

3. Visualize the results using utility methods

    config = model.get_config()
    visualizer = Visualizer(config.label_map(), encoding="jpg")
    visualized = visualizer(output).get_image(to_chw=False).collect()

Off-the-shelf Inference Using Analytics Zoo Object Detection [...]

Sequence-to-Sequence API
[Figure: sequence to sequence model, made up of an encoder, a bridge and a decoder]

    encoder = RNNEncoder.initialize(rnn_type, nlayers, hidden_size, embedding)
    decoder = RNNDecoder.initialize(rnn_type, nlayers, hidden_size, embedding)
    seq2seq = Seq2seq(encoder, decoder)

Reference Use Cases
- Anomaly Detection: using an LSTM network to detect anomalies in time series data
- Fraud Detection: using a feed-forward neural network to detect fraud in credit card transaction data
- Recommendation: use the Analytics Zoo Recommendation API (i.e., Neural Collaborative Filtering, Wide and Deep Learning) for recommendations on data with explicit feedback
- Sentiment Analysis: sentiment analysis using neural network models (e.g. CNN, LSTM, GRU, Bi-LSTM)
- Variational Autoencoder (VAE): use VAE to generate faces and digits
- Web services: use Analytics Zoo model serving APIs for model inference in web [...]
[...]s-zoo/tree/master/apps

Real-World Applications
- Object detection and image feature extraction at JD.com
- Product defect detection using distributed TF on Spark in Midea
- NLP based customer service chatbot for Microsoft Azure
- Image similarity based house recommendation for MLSlistings
- LSTM-based time series anomaly detection for Baosight
- Fraud detection for payment transactions for UnionPay

Object Detection and Image Feature Extraction at JD.com

Applications
Large-scale image feature extraction
- Object detection (remove background, optional)
- Feature extraction
Application
- Similar image search
- Image deduplication
- Competitive price monitoring
- IP (image copyright) protection system
Source: "Bringing deep learning into big data analytics using BigDL", Xianyan Jia and Zhenhua Wang, Strata Data Conference Singapore 2017

Similar Image Search
[Figure: query images shown next to their search results]
Source: "Bringing deep learning into big data analytics using BigDL", Xianyan Jia and Zhenhua Wang, Strata Data Conference Singapore 2017
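Once each image has been reduced to a feature vector, similar image search amounts to ranking stored vectors by their similarity to the query vector. The sketch below is a minimal illustration in plain Python. The slides do not say which similarity measure JD.com uses, so cosine similarity is an assumption, and the image ids, feature values and the `search` helper are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query, index, top_k=2):
    """Return the top_k image ids ranked by cosine similarity to the query."""
    ranked = sorted(index, key=lambda item: cosine(query, index[item]), reverse=True)
    return ranked[:top_k]

# Hypothetical 3-dimensional features; real features would come from a deep model.
index = {
    "shoe_a": [0.9, 0.1, 0.0],
    "shoe_b": [0.8, 0.2, 0.1],
    "lamp": [0.0, 0.2, 0.9],
}
hits = search([1.0, 0.1, 0.0], index)
print(hits)
```

The same ranking also supports deduplication: pairs whose similarity exceeds a chosen threshold can be treated as copies of one image.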

Challenges of Productionizing Large-Scale Deep Learning Solutions
Productionizing large-scale deep learning solutions is challenging
- Very complex and error-prone in managing large-scale distributed systems
  - E.g., resource management and allocation, data partitioning, task balance, fault tolerance, model deployment, etc.
- Low end-to-end performance in GPU solutions
  - E.g., reading images out from HBase takes about half of the total time
- Very inefficient to develop the end-to-end processing pipeline
  - E.g., image pre-processing on HBase can be very complex

Production Deployment with Analytics Zoo for Spark and BigDL
- Reuse existing Hadoop/Spark clusters for deep learning with no changes (image search, IP protection, etc.)
- Efficiently scale out on Spark with superior performance (3.83x speed-up vs. GPU servers) as benchmarked by [...]t-jdcom

Product Defect Detection using Distributed TF on Spark in Midea
[...]-distributed-tensorflow-on-analytics[...]

Product Defect Detection using Distributed TF on Spark in Midea

NLP Based Customer Service Chatbot for Microsoft Azure
[...]t-1

Image Similarity Based House Recommendation for MLSlistings
MLSlistings built image-similarity based house recommendations using BigDL on Microsoft [...]ommendations

Image Similarity Based House Recommendation for MLSlistings
[Figure: processing pipeline. An RDD of house photos goes through image preprocessing. Three pre-trained Inception v1 models (fine-tuned as classifiers) produce tags for the images (is exterior, style, floors), e.g. "Is house exterior?" in {0, 1} and "House Style" in {0, 1, 2, 3}. A pre-trained VGG16 model extracts image features. Image tags and features are stored in table storage.]

LSTM-Based Time Series Anomaly Detection for Baosight
[...]ytics-zoofor-apache-spark-and-bigdl

Fraud Detection for Payment Transactions for UnionPay
[Figure: Spark pipeline over training and test data read from a Hive table into a Spark DataFrame: pre-processing, feature engineering (all features, selected features), training one model per sampled partition, model evaluation & fine tuning, post-processing, and fraud predictions; neural network model using BigDL]
https://mp.weixin.qq.com/s?__biz=MzI3NDAwNDUwNg==&mid=2648307335&idx=1&sn=8eb9f63eaf2e40e24a90601b9cc03d1f

Unified Analytics + AI Platform
Distributed TensorFlow, Keras and BigDL on Apache Spark
[...]zoo

Legal Disclaimers
- Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer.
- No computer system can be absolutely secure.
- Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance.
Intel, the Intel logo, Xeon, Xeon Phi, Lake Crest, etc. are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
© 2019 Intel Corporation