Recorded at SpringOne2GX 2014.
Speakers: Vineet Goel, Girish Lingappa, Rodrigo Meneses
Big Data Track
As Hadoop goes mainstream in enterprise big data deployments, IT organizations expect and demand better operational management of their production Hadoop clusters. Admins require more than cluster health monitoring; they need real-time workload analysis for performance tuning and troubleshooting. Real-time log analysis of jobs at the user or application level lets admins manage and tune workloads better, especially in multi-tenant Hadoop cluster services. Join us to learn how the Pivotal team used the Spring XD data ingestion and batch processing framework, GemFire XD, and other components to solve this challenge on a large 1000-node cluster (the Analytics Workbench). Using Spring XD to ingest YARN service and MapReduce application logs through a real-time data pipeline into HDFS, the team then used familiar SQL-based queries to analyze fine-grained cluster utilization.