Introduction To Apache Cassandra

Speaker: Christopher Batey

Time series data is everywhere: IoT, sensor data, financial transactions. The industry has moved to databases like Cassandra to handle the high velocity and high volume of data that is now commonplace. However, data is pointless unless you can process it in near real time or run batch analytics on it. That's where Spark combined with Cassandra comes in: what was once just your storage system can be transformed into your analytics system, and you'll be surprised how easy it is!

Apache Cassandra is an open source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients. Cassandra also places a high value on performance. In 2012, University of Toronto researchers studying NoSQL systems concluded that "In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments", although "this comes at the price of high write and read latencies".

Apache Spark is a fast and general engine for large-scale data processing.

Venue: Wilkins Gustave Tuck Lecture Theatre, UCL

----

Video by Meetupvideo (http://www.meetupvideo.com)

Tags: real-time, nosql, statistics, talks