心如止水: November 2015

Sunday, November 29, 2015

Fixing the SVN "file marked as a binary type" problem

http://bigsec.net/one/tool/svn.html

Java 7 Fork/Join framework

Overview:
http://www.importnew.com/2279.html
Example:
http://www.javacodegeeks.com/2013/02/java-7-forkjoin-framework-example.html
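
As a quick reminder of what the framework looks like in use, here is a minimal sketch that is not taken from either linked article. It sums an array by splitting it recursively with RecursiveTask. The class name SumTask and the THRESHOLD value are assumptions chosen for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example class: sums a long[] by recursively splitting it in half.
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // assumed cutoff for sequential work
    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                     // schedule the left half asynchronously
        long rightSum = right.compute(); // compute the right half in this thread
        return left.join() + rightSum;   // wait for the left half and combine
    }

    static long parallelSum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data)); // 1 + 2 + ... + 10000 = 50005000
    }
}
```

Forking one half and computing the other in the current thread keeps the calling worker busy instead of leaving it idle while it waits.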

Reading a SequenceFile

http://hadooptutorial.info/reading-and-writing-sequencefile-example/
http://www.programcreek.com/java-api-examples/index.php?api=org.apache.hadoop.io.DataInputBuffer

Cross-domain login techniques

Analysis of how Taobao obtains cookies across domains
http://www.oschina.net/question/4873_18517

On JSON and JSONP: it may all suddenly make sense (with jQuery examples)
http://www.cnblogs.com/dowinning/archive/2012/04/19/json-jsonp-jquery.html

Reading and writing cookies across domains with JSONP
http://blog.csdn.net/pangudashu/article/details/19078535

A discussion of cross-domain login techniques
http://www.liaoqiqi.com/post/245

Java atomic variables

LongAdder, more efficient than AtomicLong: source code analysis

http://ifeve.com/atomiclong-and-longadder/
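
For orientation, here is a small sketch (not from the linked article) that uses both counters side by side. LongAdder was added in Java 8. It spreads updates over several internal cells and adds them up in sum(), which is why it tends to do better under heavy write contention. The class name CounterDemo and the thread counts are assumptions for illustration, and the sketch only checks correctness, not speed.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical demo class, not a benchmark.
class CounterDemo {

    // Increments both counters from several threads and returns {atomicTotal, adderTotal}.
    static long[] countWith(int threads, int incrementsPerThread) throws InterruptedException {
        AtomicLong atomic = new AtomicLong();
        LongAdder adder = new LongAdder();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    atomic.incrementAndGet(); // every thread updates the same value
                    adder.increment();        // updates are spread over internal cells
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        // sum() is read only after all writers have finished, so it is exact here.
        return new long[] { atomic.get(), adder.sum() };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] totals = countWith(4, 100_000);
        System.out.println("AtomicLong=" + totals[0] + " LongAdder=" + totals[1]);
    }
}
```

Note that sum() is not an atomic snapshot while updates are still in flight, so LongAdder suits statistics counters better than values that need a consistent read-modify-write.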

Saturday, November 28, 2015

Starbucks Does Not Use Two-Phase Commit

Original: http://www.eaipatterns.com/ramblings/18_starbucks.html

Chinese translation: http://blog.csdn.net/linzhiqi07/article/details/8483784?utm_source=jiancool

Sunday, November 15, 2015

Spark Streaming Direct API

To achieve HA, a Spark Streaming app needs to satisfy three conditions:

WAL: write-ahead log
Checkpoint
Reliable Receiver

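Together these conditions provide at-least-once semantics, but the WAL carries a significant performance cost. For an upstream system that already persists its data durably and safely, the WAL is largely redundant (all data gets persisted a second time). It is enough to record the metadata of the data and simply re-read the data when a failure occurs.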

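Kafka's Direct API offers one such approach: instead of using a Receiver, it implements InputDStream directly and reads the data in Kafka as RDDs, relying on the fault tolerance that Spark itself provides for RDDs to guarantee exactly-once semantics on the input side. And because there is no Receiver, no data goes through the WAL, so performance is better as well.

Note: DirectKafkaInputDStream reads the data in Kafka as RDDs, partitioned by Kafka topic and partition. Data is fetched only at the moment it is read, so there are no Active Batches and no extra backup copies of the RDD data are needed.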

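Key points of the Kafka Direct API implementation:

DirectKafkaInputDStream-->{KafkaRDD(partitions) ...}-->Kafka Simple Consumer
It relies on the RDD fault-tolerance mechanism, which avoids the performance cost of the WAL.

DirectKafkaInputDStream redefines what the checkpoint needs to save: the offsets read from Kafka.

Of course, a similar Direct API can be designed for other comparable upstream systems, provided they offer an offset mechanism like Kafka's: every message has a unique ID, the IDs are sequential, and the same messages can be read again reliably.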
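
The offset requirement can be illustrated with a toy model. This is only a sketch under stated assumptions: PartitionLog, OffsetRange, and nextRange are hypothetical names invented here, and none of them is a Spark or Kafka API. The point it shows is that once a batch is defined by an offset range, saving that range is enough to re-read exactly the same input after a failure.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these classes are Spark or Kafka APIs.
class OffsetReplayDemo {

    // Stand-in for one Kafka partition: an append-only log addressed by offset.
    static final class PartitionLog {
        private final List<String> messages = new ArrayList<>();

        void append(String message) {
            messages.add(message);
        }

        long endOffset() {
            return messages.size();
        }

        // Reading the same [from, until) range always returns the same messages.
        List<String> read(OffsetRange range) {
            return new ArrayList<>(messages.subList((int) range.from, (int) range.until));
        }
    }

    // The only thing a checkpoint has to keep per batch: metadata, not the data itself.
    static final class OffsetRange {
        final long from;  // inclusive
        final long until; // exclusive

        OffsetRange(long from, long until) {
            this.from = from;
            this.until = until;
        }
    }

    // Plans the next batch from the last committed offset, capped at maxPerBatch messages.
    static OffsetRange nextRange(long committed, PartitionLog log, int maxPerBatch) {
        return new OffsetRange(committed, Math.min(log.endOffset(), committed + maxPerBatch));
    }

    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        for (int i = 0; i < 5; i++) {
            log.append("msg-" + i);
        }

        OffsetRange checkpointed = nextRange(0, log, 3); // saved before processing
        List<String> firstAttempt = log.read(checkpointed);
        // Suppose the batch fails here. Recovery needs only the saved range:
        List<String> replay = log.read(checkpointed);

        System.out.println(firstAttempt.equals(replay)); // the replay sees identical input
    }
}
```

An upstream system without stable, ordered IDs could not offer this guarantee, because the same range might return different messages on the second read.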

Wednesday, November 11, 2015

Brief notes on OLAP

OLAP (On-line Analytical Processing) is the collection of analysis-oriented operations implemented on top of the multidimensional model of a data warehouse.

Fact tables and dimension tables

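A fact table records concrete events: it holds the specific elements of each event and what actually happened. A dimension table holds the descriptive information for those elements. For example, an event involves a time, a place, a person, and what happened. The fact table records the event as a whole but keeps only key identifiers for elements such as time, place, and person. If the protagonist of an event is "Michael", then what Michael "looks like" has to be looked up in the corresponding dimension table, which holds Michael's detailed description. Fact tables and dimension tables are the basis for several multidimensional models, including the star schema, the snowflake schema, and the constellation schema. A fact table usually covers a single subject, while a dimension table can be shared by several fact tables.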
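
The relationship between the two kinds of table can be sketched in a few lines. Everything here is hypothetical and made up for illustration: the class names PersonDim and EventFact, the keys, and the sample values. The fact row carries only keys and a measure, and the description of "Michael" is resolved through the dimension table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of one fact table and two dimension tables.
class StarSchemaDemo {

    // Dimension row: descriptive attributes of one person.
    static final class PersonDim {
        final String name;
        final String description;

        PersonDim(String name, String description) {
            this.name = name;
            this.description = description;
        }
    }

    // Fact row: one event. It stores only keys into the dimension tables plus a measure.
    static final class EventFact {
        final int personKey;
        final int placeKey;
        final String date;
        final double amount;

        EventFact(int personKey, int placeKey, String date, double amount) {
            this.personKey = personKey;
            this.placeKey = placeKey;
            this.date = date;
            this.amount = amount;
        }
    }

    // Resolves the keys of a fact row against the dimension tables (a lookup join).
    static String describe(EventFact fact, Map<Integer, PersonDim> persons, Map<Integer, String> places) {
        PersonDim person = persons.get(fact.personKey);
        return fact.date + ": " + person.name + " (" + person.description + ") spent "
                + fact.amount + " in " + places.get(fact.placeKey);
    }

    public static void main(String[] args) {
        Map<Integer, PersonDim> persons = new HashMap<>();
        persons.put(1, new PersonDim("Michael", "regular customer"));
        Map<Integer, String> places = new HashMap<>();
        places.put(10, "Hangzhou");

        EventFact fact = new EventFact(1, 10, "2015-11-11", 25.5);
        System.out.println(describe(fact, persons, places));
    }
}
```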

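Basic OLAP operations
OLAP operations are mainly queries, that is, database SELECT operations, but the queries can be complex: a query against a relational database can join multiple tables and use aggregate functions such as COUNT, SUM, and AVG. Building on the multidimensional model, OLAP defines a set of common analysis-oriented operation types that make these operations more intuitive.

OLAP's multidimensional analysis operations are drill-down, roll-up, slice, dice, and pivot. Each is explained below, again using the data cube above as the example: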

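Drill-down: moving between levels of a dimension, from a higher level down to the next one, or in other words breaking summary data into more detailed data. For example, drilling into the total sales for the second quarter of 2010 shows the sales for each of April, May, and June 2010, as in the figure above. Drilling into Zhejiang province likewise shows the sales for Hangzhou, Ningbo, Wenzhou, and its other cities.

Roll-up: the inverse of drill-down, aggregating fine-grained data up to a higher level. For example, summing the sales of Jiangsu, Shanghai, and Zhejiang gives the sales for the Jiangsu-Zhejiang-Shanghai region, as in the figure above.

Slice: selecting one specific value of a dimension for analysis, for example only the sales of electronics, or only the data for the second quarter of 2010.

Dice: selecting a range of a dimension, or a set of specific values, for analysis, for example the sales from the first quarter of 2010 through the second quarter of 2010, or the sales of electronics and daily goods.

Pivot: swapping the positions of dimensions, like transposing the rows and columns of a two-dimensional table. In the figure, the pivot swaps the product dimension and the region dimension.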
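
The five operations can be tried on a tiny in-memory cube. This is a rough sketch rather than how an OLAP engine works internally: the Sale class, the helper names sumBy, where, and crossTab, and all sample figures are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical mini cube: dimensions region, time, product; measure amount.
class OlapDemo {

    static final class Sale {
        final String province;
        final String quarter;
        final int month;
        final String category;
        final double amount;

        Sale(String province, String quarter, int month, String category, double amount) {
            this.province = province;
            this.quarter = quarter;
            this.month = month;
            this.category = category;
            this.amount = amount;
        }
    }

    // Made-up sample data.
    static final List<Sale> SALES = Arrays.asList(
            new Sale("Zhejiang", "2010Q2", 4, "Electronics", 100),
            new Sale("Zhejiang", "2010Q2", 5, "Electronics", 150),
            new Sale("Zhejiang", "2010Q2", 6, "Daily goods", 50),
            new Sale("Jiangsu", "2010Q2", 4, "Electronics", 200),
            new Sale("Shanghai", "2010Q1", 3, "Daily goods", 80),
            new Sale("Jiangsu", "2010Q3", 7, "Clothing", 70));

    // Aggregates the measure along one dimension level (used for roll-up and drill-down).
    static <K> Map<K, Double> sumBy(List<Sale> rows, Function<Sale, K> level) {
        Map<K, Double> totals = new TreeMap<>();
        for (Sale s : rows) {
            totals.merge(level.apply(s), s.amount, Double::sum);
        }
        return totals;
    }

    // Slice and dice are both filters: a slice fixes one value, a dice allows a range or set.
    static List<Sale> where(List<Sale> rows, Predicate<Sale> condition) {
        return rows.stream().filter(condition).collect(Collectors.toList());
    }

    // Two-dimensional view of the totals; swapping rowDim and colDim is the pivot.
    static Map<String, Map<String, Double>> crossTab(
            List<Sale> rows, Function<Sale, String> rowDim, Function<Sale, String> colDim) {
        Map<String, Map<String, Double>> table = new TreeMap<>();
        for (Sale s : rows) {
            table.computeIfAbsent(rowDim.apply(s), k -> new TreeMap<>())
                 .merge(colDim.apply(s), s.amount, Double::sum);
        }
        return table;
    }

    public static void main(String[] args) {
        List<String> quarters = Arrays.asList("2010Q1", "2010Q2");
        List<String> categories = Arrays.asList("Electronics", "Daily goods");

        List<Sale> q2 = where(SALES, s -> s.quarter.equals("2010Q2"));
        System.out.println("Roll-up to quarters: " + sumBy(SALES, s -> s.quarter));
        System.out.println("Drill-down into Q2:  " + sumBy(q2, s -> s.month));
        System.out.println("Slice (Electronics): "
                + where(SALES, s -> s.category.equals("Electronics")).size() + " rows");
        System.out.println("Dice (Q1-Q2, two categories): "
                + where(SALES, s -> quarters.contains(s.quarter)
                        && categories.contains(s.category)).size() + " rows");
        System.out.println("Product by region:   " + crossTab(SALES, s -> s.category, s -> s.province));
        System.out.println("Pivoted:             " + crossTab(SALES, s -> s.province, s -> s.category));
    }
}
```

Roll-up and drill-down share one helper because they differ only in the level of the dimension hierarchy that is grouped on.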

Learning and using ZooKeeper

Service discovery based on ZooKeeper.
http://www.jdon.com/artichect/zookeeper.html

Monday, November 9, 2015

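How do you serialize a non-serializable class?

This question comes up often in Spark development: how do you serialize a class that does not implement the Serializable interface?
A brief explanation follows, with the non-serializable class referred to as A.

Main steps:
0. Use an outer class Outer that holds A as a member variable marked transient (@transient in Scala), where Outer can reach A's state through getters or some other means. Alternatively, let Outer extend A.
1. Outer implements the Serializable interface and provides its own readObject and writeObject methods to customize serialization.
2. In writeObject, Outer serializes A's state. In readObject, it reads that state back and reconstructs A from it.
3. In the implementation, writeObject first calls defaultWriteObject to save all non-transient members and then saves the serializable state of A. readObject first calls defaultReadObject to restore all non-transient members and then reads A's state.

Example: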

// App.java: the non-serializable class (A in the steps above).
public class App {

    int quantity;
    int count;

    public int getQuantity() {
        return quantity;
    }

    public void setQuantity(int quantity) {
        this.quantity = quantity;
    }

    public App(int quantity, int count) {
        super();
        this.quantity = quantity;
        this.count = count;
    }

    public int getCount() {
        return count;
    }

    public void setCount(int count) {
        this.count = count;
    }

}

// UnSerializeItem.java: non-serializable base class that holds an App.
public class UnSerializeItem {

    private App nonSerializableProperty;

    public void setNonSerializableProperty(App nonSerializableProperty) {
        this.nonSerializableProperty = nonSerializableProperty;
    }

    public App getNonSerializableProperty() {
        return nonSerializableProperty;
    }
}

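// OuterItem.java: serializable subclass that saves and restores the App state by hand.
// Java serialization requires the non-serializable superclass (UnSerializeItem) to have
// an accessible no-arg constructor; the implicit default constructor satisfies that.
import java.io.IOException;
import java.io.Serializable;

public class OuterItem extends UnSerializeItem implements Serializable {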

    private static final long serialVersionUID = 1L;

    private int field;

    public int getField() {
        return field;
    }

    public void setField(int field) {
        this.field = field;
    }

    public OuterItem(int quantity, int count, int field) {
        setNonSerializableProperty(new App(quantity, count));
        this.field = field;
    }

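    // Called by ObjectOutputStream instead of the default mechanism.
    private void writeObject(java.io.ObjectOutputStream out)
            throws IOException {
        out.defaultWriteObject(); // writes the non-transient fields of OuterItem (field)
        // Then write the App state by hand: quantity first, count second.
        out.writeInt(super.getNonSerializableProperty().getQuantity());
        out.writeInt(super.getNonSerializableProperty().getCount());
    }

    // Called by ObjectInputStream; must read values in the order they were written.
    private void readObject(java.io.ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject(); // restores field
        // Arguments are evaluated left to right, so quantity is read before count.
        super.setNonSerializableProperty(new App(in.readInt(), in.readInt()));
    }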
}
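
The example above takes the inheritance route from step 0. For comparison, here is a sketch of the other route, where the non-serializable object is held in a transient field, together with a round trip through a byte array to check that the state survives. The names RoundTripDemo, Legacy, and Holder are hypothetical stand-ins invented for this sketch and are not the classes above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class RoundTripDemo {

    // Stand-in for the non-serializable class A.
    static class Legacy {
        final int quantity;
        final int count;

        Legacy(int quantity, int count) {
            this.quantity = quantity;
            this.count = count;
        }
    }

    // Serializable wrapper that holds A in a transient field.
    static class Holder implements Serializable {
        private static final long serialVersionUID = 1L;

        private final int field;
        private transient Legacy legacy; // skipped by default serialization

        Holder(int quantity, int count, int field) {
            this.legacy = new Legacy(quantity, count);
            this.field = field;
        }

        int getField() { return field; }
        Legacy getLegacy() { return legacy; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();      // writes field
            out.writeInt(legacy.quantity); // then A's state, by hand
            out.writeInt(legacy.count);
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();        // restores field
            int quantity = in.readInt();   // same order as in writeObject
            int count = in.readInt();
            legacy = new Legacy(quantity, count);
        }
    }

    // Serializes an object to bytes and reads it back.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Holder copy = roundTrip(new Holder(3, 7, 42));
        System.out.println(copy.getLegacy().quantity + " " + copy.getLegacy().count + " " + copy.getField());
    }
}
```

Without the custom writeObject and readObject pair, the transient field would simply come back as null after deserialization.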