心如止水

2015年11月29日星期日

SVN file marked as a binary type 问题处理

http://bigsec.net/one/tool/svn.html

Java7 ForkJoin框架

概述：
http://www.importnew.com/2279.html
example:
http://www.javacodegeeks.com/2013/02/java-7-forkjoin-framework-example.html

读取sequenceFile

http://hadooptutorial.info/reading-and-writing-sequencefile-example/
http://www.programcreek.com/java-api-examples/index.php?api=org.apache.hadoop.io.DataInputBuffer

跨域登录技术

淘宝跨域获取Cookie分析
http://www.oschina.net/question/4873_18517

说说JSON和JSONP，也许你会豁然开朗，含jQuery用例
http://www.cnblogs.com/dowinning/archive/2012/04/19/json-jsonp-jquery.html

jsonp实现跨域读写cookie
http://blog.csdn.net/pangudashu/article/details/19078535

跨域登录技术讨论
http://www.liaoqiqi.com/post/245

Java原子变量

比AtomicLong还高效的LongAdder 源码解析

http://ifeve.com/atomiclong-and-longadder/

2015年11月28日星期六

星巴克不使用两阶段提交

原文: http://www.eaipatterns.com/ramblings/18_starbucks.html

译文：http://blog.csdn.net/linzhiqi07/article/details/8483784?utm_source=jiancool

2015年11月15日星期日

Spark Streaming DirectAPI

为了实现HA，spark Streaming App需要满足三个条件：

WAL: write ahead log
checkpoint
Reliable Receiver

以上条件可以满足an-least-once语义，同时，WAL的性能消耗较大。对于已经做好数据持久化和安全性的上游系统而言，WAL略显多余(所有数据会被再持久化一次)，只需要记录数据的metadata即可.出现Failures时，重新读取数据即可。

Kafka的DirectAPI提供了一种方案：不使用Receiver，而是直接实现InputDStream，将Kafka中的数据读取为RDD，依靠Spark自身提供的RDD HA保证了输入端的exactly-once语义。同时，因为没有Receiver，所有的数据不会被WAL，性能也更好了。

备注：DirectKafkaInputDStream读取Kafka中的数据为RDD，按照Kafka的Topic和Partition进行Partition。所有的数据即读即取，也就没有了Active Batches，不需要额外的RDD数据备份。

KafkaDirectAPI实现要点：

DirectKafkaInputDStream-->{KafkaRDD(partitions) ...}-->Kafka Simple Consumer
依据RDD的容错机制，省去了WAL的性能消耗

DirectKafkaInputDStream重新设计的CheckPoint需要保存的数据，即Kafka读取的offset。

当然，对于其他类似的上游系统，也可以设计类似的DirectApi,前提条件是：能够提供和Kafka类似的offset机制，即消息有唯一的ID标识，且ID是顺序的，可以保证重复读取。