LogManager Analysis


All logs on a broker are managed by the LogManager. The LogManager is responsible for log management, including log creation, log retrieval, and log cleanup; all read and write operations are delegated to the corresponding Log instance.

1 Core Fields

logDirs: the log directories

flushCheckMs: the interval at which flush checks run

retentionCheckMs: the interval at which log retention checks run

scheduler: the KafkaScheduler used to schedule the background threads

brokerState: the state of the broker

ioThreads: the number of I/O threads used when loading and recovering logs

logs: a Pool[TopicAndPartition, Log] that maintains the mapping between each TopicAndPartition and its Log

dirLocks: a collection of FileLocks used to lock every log directory at the file-system level. All log directories are locked when the LogManager object is initialized

recoveryPointCheckpoints: of type Map[File, OffsetCheckpoint], maintains the mapping between each log directory and the RecoveryPointCheckpoint file beneath it. When the LogManager object is initialized, a RecoveryPointCheckpoint file is created in each log directory. The map's values are OffsetCheckpoint objects, which wrap the RecoveryPointCheckpoint file of the corresponding log directory and provide read and write access to it. The RecoveryPointCheckpoint file records the recoveryPointCheckpoint of every Log in that directory

logCreationOrDeletionLock: a lock that must be held to synchronize Log creation and deletion


2 Important Methods

2.1 Scheduled tasks in LogManager

The startup method schedules three periodic tasks: log cleanup, log flushing, and recovery-point checkpointing.


def startup() {
  /* Schedule the cleanup task to delete old logs */
  if (scheduler != null) {
    info("Starting log cleanup with a period of %d ms.".format(retentionCheckMs))
    scheduler.schedule("kafka-log-retention",
                       cleanupLogs,
                       delay = InitialTaskDelayMs,
                       period = retentionCheckMs,
                       TimeUnit.MILLISECONDS)
    info("Starting log flusher with a default period of %d ms.".format(flushCheckMs))
    scheduler.schedule("kafka-log-flusher",
                       flushDirtyLogs,
                       delay = InitialTaskDelayMs,
                       period = flushCheckMs,
                       TimeUnit.MILLISECONDS)
    scheduler.schedule("kafka-recovery-point-checkpoint",
                       checkpointRecoveryPointOffsets,
                       delay = InitialTaskDelayMs,
                       period = flushCheckpointMs,
                       TimeUnit.MILLISECONDS)
  }
  if (cleanerConfig.enableCleaner)
    cleaner.startup()
}

 

# Source of the log cleanup task

def cleanupLogs() {
  debug("Beginning log cleanup...")
  var total = 0
  val startMs = time.milliseconds
  for (log <- allLogs; if !log.config.compact) {
    debug("Garbage collecting '" + log.name + "'")
    total += log.deleteOldSegments()
  }
  debug("Log cleanup completed. " + total + " files deleted in " +
        (time.milliseconds - startMs) / 1000 + " seconds")
}

 

def deleteOldSegments(): Int = {
  // only delete segments when the cleanup policy is delete; with compact, return directly
  if (!config.delete) return 0
  deleteRetenionMsBreachedSegments() + deleteRetentionSizeBreachedSegments()
}

// delete segments that have exceeded the maximum retention time, i.e. expired segments
private def deleteRetenionMsBreachedSegments(): Int = {
  if (config.retentionMs < 0) return 0
  val startMs = time.milliseconds
  deleteOldSegments(startMs - _.largestTimestamp > config.retentionMs)
}

// decide, based on the total size of the log, whether to delete the oldest segments
private def deleteRetentionSizeBreachedSegments(): Int = {
  if (config.retentionSize < 0 || size < config.retentionSize) return 0
  var diff = size - config.retentionSize
  def shouldDelete(segment: LogSegment) = {
    if (diff - segment.size >= 0) {
      diff -= segment.size
      true
    } else {
      false
    }
  }
  deleteOldSegments(shouldDelete)
}

 

# Source of the log flush task

private def flushDirtyLogs() = {
  debug("Checking for dirty logs to flush...")
  // iterate over the (topicAndPartition, log) pairs
  for ((topicAndPartition, log) <- logs) {
    try {
      // how long it has been since this log was last flushed
      val timeSinceLastFlush = time.milliseconds - log.lastFlushTime
      debug("Checking if flush is needed on " + topicAndPartition.topic + " flush interval  " + log.config.flushMs +
            " last flushed " + log.lastFlushTime + " time since last flush: " + timeSinceLastFlush)
      // if the flush interval has elapsed, flush the log via Log#flush
      if (timeSinceLastFlush >= log.config.flushMs)
        log.flush
    } catch {
      case e: Throwable =>
        error("Error flushing topic " + topicAndPartition.topic, e)
    }
  }
}

 

# Source of the task that periodically writes each Log's recoveryPoint to the RecoveryPointCheckpoint file

def checkpointRecoveryPointOffsets() {
  this.logDirs.foreach(checkpointLogsInDir)
}

private def checkpointLogsInDir(dir: File): Unit = {
  val recoveryPoints = this.logsByDir.get(dir.toString)
  if (recoveryPoints.isDefined) {
    this.recoveryPointCheckpoints(dir).write(recoveryPoints.get.mapValues(_.recoveryPoint))
  }
}

 

// First write the recoveryPoint of every Log in the log directory to a temporary file,
// then swap the temporary file with the existing RecoveryPointCheckpoint file
def write(offsets: Map[TopicAndPartition, Long]) {
  lock synchronized {
    // write to temp file and then swap with the existing file
    val fileOutputStream = new FileOutputStream(tempPath.toFile)
    val writer = new BufferedWriter(new OutputStreamWriter(fileOutputStream))
    try {
      writer.write(CurrentVersion.toString)
      writer.newLine()
      writer.write(offsets.size.toString)
      writer.newLine()
      offsets.foreach { case (topicPart, offset) =>
        writer.write(s"${topicPart.topic} ${topicPart.partition} $offset")
        writer.newLine()
      }
      writer.flush()
      fileOutputStream.getFD().sync()
    } catch {
      case e: FileNotFoundException =>
        if (FileSystems.getDefault.isReadOnly) {
          fatal("Halting writes to offset checkpoint file because the underlying file system is inaccessible : ", e)
          Runtime.getRuntime.halt(1)
        }
        throw e
    } finally {
      writer.close()
    }
    Utils.atomicMoveWithFallback(tempPath, path)
  }
}

 

2.2 Log compaction

The log cleanup policy mainly exists to prevent a large volume of logs from filling up the disk. The log-retention thresholds can be configured globally or per topic. Kafka provides two policies:

First, delete: a LogSegment is deleted once it has expired, or, when the total size of the log exceeds the configured retention size, the oldest LogSegments are deleted until the log fits within the limit again.

Second, compact: perform log compaction, which also effectively reduces the size of the log files and eases disk pressure.

In some scenarios the value associated with a key keeps changing, much like a record in a database that can be updated repeatedly. In Kafka this shows up as a message with key=value1 at one moment and a later message with key=value2 for the same key. If consumers only care about the latest value, log compaction becomes very useful, as illustrated below:
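To make that guarantee concrete, here is a small standalone Scala sketch (not Kafka code; names are made up for illustration) that keeps only the highest-offset record for each key, which is what a consumer of a compacted topic ultimately sees:

object CompactionSketch extends App {
  // (key, value) pairs in append order; the index plays the role of the offset
  val records = Vector("k1" -> "v1", "k2" -> "v1", "k1" -> "v2", "k1" -> "v3")

  // keep only the last occurrence of each key, then restore offset order
  val compacted = records.zipWithIndex
    .groupBy { case ((key, _), _) => key }
    .values.map(_.last)
    .toSeq.sortBy { case (_, offset) => offset }

  compacted.foreach { case ((key, value), offset) =>
    println(s"offset=$offset key=$key value=$value") // only k2@1=v1 and k1@3=v3 survive
  }
}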


# When log compaction is enabled, the active log segment does not take part in compaction; only read-only log segments are compacted. This prevents the active segment from becoming a hotspot that must serve reads and writes while also being compacted.

# Log compaction is carried out by multiple cleaner threads, so the number of cleaner threads (the log.cleaner.threads config) can be tuned to increase compaction parallelism and reduce the impact on the rest of the broker.

# Logs are usually large. To keep the cleaner threads from competing with other threads for CPU for long stretches, the readable LogSegments other than the active segment are not all processed in a single compaction pass; instead they are cleaned in batches.

 

# Every log is split by the cleaner checkpoint into two parts, clean and dirty:


clean: the part that has already been compacted; after compaction the offsets are sparse rather than contiguously increasing

dirty: the part that has not yet been compacted; its offsets are still contiguous and increasing

# Each cleaner thread uses a log's cleanableRatio to decide which log to compact first: the larger the proportion of dirty data in the log, the higher its priority.
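As a rough illustration of how that priority can be computed, the sketch below (assumed names and simplified bookkeeping, not Kafka's exact LogToClean code) derives a cleanable ratio as the fraction of the log's bytes sitting at or beyond the cleaner checkpoint:

case class SegmentStat(baseOffset: Long, sizeInBytes: Long)

def cleanableRatio(segments: Seq[SegmentStat], cleanerCheckpoint: Long): Double = {
  val totalBytes = segments.map(_.sizeInBytes).sum
  // dirty section: segments whose base offset is at or beyond the cleaner checkpoint
  val dirtyBytes = segments.filter(_.baseOffset >= cleanerCheckpoint).map(_.sizeInBytes).sum
  if (totalBytes == 0L) 0.0 else dirtyBytes.toDouble / totalBytes
}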

 

# After a cleaner thread selects a log to clean, it first builds a <key, last offset for that key> mapping over the messages in the dirty part. This mapping is maintained by a SkimpyOffsetMap. It then recopies the LogSegments, keeping only the messages recorded in the SkimpyOffsetMap and discarding all others.
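The two passes can be sketched as follows; this is a standalone simplification with made-up types (a message is reduced to an offset, key and value), not the real SkimpyOffsetMap, which stores key hashes rather than the keys themselves:

case class Msg(offset: Long, key: String, value: String)

// Pass 1: scan the dirty section and record the last offset seen for each key
def buildLastOffsetMap(dirty: Seq[Msg]): Map[String, Long] =
  dirty.foldLeft(Map.empty[String, Long]) { (m, msg) => m + (msg.key -> msg.offset) }

// Pass 2: recopy a segment, keeping only messages that are still the latest for their key
def retained(segment: Seq[Msg], lastOffset: Map[String, Long]): Seq[Msg] =
  segment.filter(msg => lastOffset.get(msg.key).forall(msg.offset >= _))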

 

# As compaction proceeds, the log and index files keep shrinking, so the cleaner threads also merge adjacent LogSegments to avoid ending up with many tiny log and index files.
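A simplified, standalone sketch of the size-based grouping idea (assumed names; the real groupSegmentsBySize, used later in clean(), also bounds the merged index size):

case class Seg(baseOffset: Long, sizeInBytes: Long)

def groupBySize(segments: List[Seg], maxSegmentBytes: Long): List[List[Seg]] =
  if (segments.isEmpty) Nil
  else {
    var group = List(segments.head)
    var bytes = segments.head.sizeInBytes
    var tail = segments.tail
    // keep packing consecutive segments while the merged size stays under the limit
    while (tail.nonEmpty && bytes + tail.head.sizeInBytes <= maxSegmentBytes) {
      group = group :+ tail.head
      bytes += tail.head.sizeInBytes
      tail = tail.tail
    }
    group :: groupBySize(tail, maxSegmentBytes)
  }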

 


LogCleaner holds a cleaners field that manages the CleanerThreads; most of the methods of this class delegate to LogCleanerManager.

LogCleanerManager is mainly responsible for managing the compaction state of each Log and for maintaining and updating the cleaner checkpoints:

offsetCheckpointFile: the file the cleaner checkpoint is written to

checkpoints: of type Map[File, OffsetCheckpoint], maintains the mapping between each data directory and its cleaner-offset-checkpoint file

inProgress: a HashMap[TopicAndPartition, LogCleaningState] recording the compaction state of every TopicAndPartition currently being cleaned

pausedCleaningCond: a condition variable used to pause cleaning of a partition

dirtiestLogCleanableRatio: the fraction of the log occupied by the dirty part, for the dirtiest log

lock: a lock protecting the checkpoints and inProgress collections

 

When compaction starts, a partition first enters the LogCleaningInProgress state. A compaction task can be paused, in which case the state becomes LogCleaningPaused until some thread resumes it; if the compaction task is aborted, the state becomes LogCleaningAborted.
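A minimal Scala sketch of these states, using the names from the text (the real LogCleanerManager additionally encodes the transitions via the inProgress map described above):

sealed trait LogCleaningState
case object LogCleaningInProgress extends LogCleaningState // compaction is running
case object LogCleaningAborted extends LogCleaningState    // compaction was interrupted
case object LogCleaningPaused extends LogCleaningState     // paused until some thread resumes it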


# The updateCheckpoints method:

Updates the cleaner-offset-checkpoint file.

def updateCheckpoints(dataDir: File, update: Option[(TopicAndPartition, Long)]) {
  inLock(lock) {
    // get the cleaner-offset-checkpoint file of the given directory
    val checkpoint = checkpoints(dataDir)
    // the update overwrites the value of an existing key
    val existing = checkpoint.read().filterKeys(logs.keys) ++ update
    // rewrite the cleaner-offset-checkpoint file
    checkpoint.write(existing)
  }
}

 

# grabFilthiestCompactedLog

Selects the log to clean and then adds it to the in-progress set.

// select the log to clean, then add it to the in-progress set
def grabFilthiestCompactedLog(time: Time): Option[LogToClean] = {
  inLock(lock) {
    val now = time.milliseconds
    // get the cleaner checkpoints of all logs
    val lastClean = allCleanerCheckpoints
    // keep only logs whose cleanup.policy is compact; delete-only logs are never compacted
    val dirtyLogs = logs.filter {
      case (_, log) => log.config.compact
    }.filterNot {
      // skip any logs already in-progress
      case (topicAndPartition, _) => inProgress.contains(topicAndPartition)
    }.map {
      case (topicAndPartition, log) => // create a LogToClean instance for each log
        // compute the first cleanable dirty offset and the first uncleanable dirty offset of the log
        val (firstDirtyOffset, firstUncleanableDirtyOffset) = LogCleanerManager.cleanableOffsets(log, topicAndPartition,
          lastClean, now)
        LogToClean(topicAndPartition, log, firstDirtyOffset, firstUncleanableDirtyOffset)
    }.filter(ltc => ltc.totalBytes > 0) // skip any empty logs
    // record the largest cleanableRatio among the dirty logs
    this.dirtiestLogCleanableRatio = if (dirtyLogs.nonEmpty) dirtyLogs.max.cleanableRatio else 0
    // filter out logs whose cleanableRatio is below the configured minCleanableRatio
    val cleanableLogs = dirtyLogs.filter(ltc => ltc.cleanableRatio > ltc.log.config.minCleanableRatio)
    if (cleanableLogs.isEmpty) {
      None
    } else {
      val filthiest = cleanableLogs.max // pick the log to compact
      // add or update the state of this partition
      inProgress.put(filthiest.topicPartition, LogCleaningInProgress)
      Some(filthiest)
    }
  }
}

 

The Cleaner class is the worker class that actually performs compaction; each CleanerThread creates a Cleaner to do the compaction work:

private[log] def clean(cleanable: LogToClean): Long = {
  stats.clear()
  info("Beginning cleaning of log %s.".format(cleanable.log.name))
  // the log to be cleaned
  val log = cleanable.log
  info("Building offset map for %s...".format(cleanable.log.name))
  // the upper bound offset for this round of compaction
  val upperBoundOffset = cleanable.firstUncleanableOffset
  // Build a key_hash -> offset map by traversing the LogSegments and filling the OffsetMap.
  // While the OffsetMap is being filled, the offsets of messages appended later overwrite
  // earlier offsets for the same key. Once the OffsetMap is full, the end position of this
  // round of compaction (endOffset) is determined.
  buildOffsetMap(log, cleanable.firstDirtyOffset, upperBoundOffset, offsetMap)
  // the offset right after the last offset recorded in the key_hash -> offset map
  val endOffset = offsetMap.latestOffset + 1
  stats.indexDone()

  // figure out the timestamp below which it is safe to remove delete tombstones
  // this position is defined to be a configurable time beneath the last modified time of the last clean segment
  // i.e. compute the point below which 'delete tombstones' can be safely removed from the LogSegments
  val deleteHorizonMs =
    log.logSegments(0, cleanable.firstDirtyOffset).lastOption match {
      case None => 0L
      case Some(seg) => seg.lastModified - log.config.deleteRetentionMs
    }
  // the upper bound on the last-modified timestamp of the segments to be cleaned
  val cleanableHorizionMs = log.logSegments(0, cleanable.firstUncleanableOffset).lastOption.map(_.lastModified).getOrElse(0L)

  // group the segments to be compacted and clean them group by group; grouping is done per LogSegment
  info("Cleaning log %s (cleaning prior to %s, discarding tombstones prior to %s)...".format(log.name, new Date(cleanableHorizionMs), new Date(deleteHorizonMs)))
  for (group <- groupSegmentsBySize(log.logSegments(0, endOffset), log.config.segmentSize, log.config.maxIndexSize))
    cleanSegments(log, group, offsetMap, deleteHorizonMs)

  // record buffer utilization
  stats.bufferUtilization = offsetMap.utilization

  stats.allDone()

  endOffset
}

 

buildOffsetMap: adds message keys and offsets to the OffsetMap, tracking with full whether the map has been filled up

private[log] def buildOffsetMap(log: Log, start: Long, end: Long, map: OffsetMap) {
  map.clear()
  // all LogSegments between start and end
  val dirty = log.logSegments(start, end).toBuffer
  info("Building offset map for log %s for %d segments in offset range [%d, %d).".format(log.name, dirty.size, start, end))

  // Add all the cleanable dirty segments. We must take at least map.slots * load_factor,
  // but we may be able to fit more (if there is lots of duplication in the dirty section of the log)
  var full = false // whether the OffsetMap has been filled up
  // iterate over the dirty segments while the OffsetMap is not yet full
  for (segment <- dirty if !full) {
    // check the compaction state recorded for this partition in LogCleanerManager
    checkDone(log.topicAndPartition)
    // process a single LogSegment: add message keys and offsets to the OffsetMap,
    // and return whether the map has been filled up
    full = buildOffsetMapForSegment(log.topicAndPartition, segment, map, start, log.config.maxMessageSize)
    if (full)
      debug("Offset map is full, %d segments fully mapped, segment with base offset %d is partially mapped".format(dirty.indexOf(segment), segment.baseOffset))
  }
  info("Offset map for log %s complete.".format(log.name))
}

 

private def buildOffsetMapForSegment(topicAndPartition: TopicAndPartition, segment: LogSegment, map: OffsetMap, start: Long, maxLogMessageSize: Int): Boolean = {
  var position = segment.index.lookup(start).position
  val maxDesiredMapSize = (map.slots * this.dupBufferLoadFactor).toInt
  while (position < segment.log.sizeInBytes) { // traverse the LogSegment
    checkDone(topicAndPartition) // check the compaction state
    readBuffer.clear()
    // read messages from the LogSegment
    val messages = new ByteBufferMessageSet(segment.log.readInto(readBuffer, position))
    throttler.maybeThrottle(messages.sizeInBytes)
    val startPosition = position
    for (entry <- messages) {
      val message = entry.message
      if (message.hasKey && entry.offset >= start) { // only messages with a key are processed
        if (map.size < maxDesiredMapSize)
          // put the key and offset into the OffsetMap
          map.put(message.key, entry.offset)
        else
          return true // the map is full
      }
      stats.indexMessagesRead(1)
    }
    position += messages.validBytes // advance position for the next read
    stats.indexBytesRead(messages.validBytes)

    // if we didn't read even one complete message, our read buffer may be too small;
    // position did not advance, so grow the read buffer and read again
    if (position == startPosition)
      growBuffers(maxLogMessageSize)
  }
  restoreBuffers()
  false
}

 

cleanSegments: compacts one group of segments

private[log] def cleanSegments(log: Log, segments: Seq[LogSegment], map: OffsetMap, deleteHorizonMs: Long) {
  // create a log file, an offset index file and a time index file with the '.cleaned' suffix
  val logFile = new File(segments.head.log.file.getPath + Log.CleanedFileSuffix)
  logFile.delete()
  val indexFile = new File(segments.head.index.file.getPath + Log.CleanedFileSuffix)
  val timeIndexFile = new File(segments.head.timeIndex.file.getPath + Log.CleanedFileSuffix)
  indexFile.delete()
  timeIndexFile.delete()
  // create the corresponding FileMessageSet, OffsetIndex, TimeIndex and LogSegment objects
  val messages = new FileMessageSet(logFile, fileAlreadyExists = false, initFileSize = log.initFileSize(), preallocate = log.config.preallocate)
  val index = new OffsetIndex(indexFile, segments.head.baseOffset, segments.head.index.maxIndexSize)
  val timeIndex = new TimeIndex(timeIndexFile, segments.head.baseOffset, segments.head.timeIndex.maxIndexSize)
  val cleaned = new LogSegment(messages, index, timeIndex, segments.head.baseOffset, segments.head.indexIntervalBytes, log.config.randomSegmentJitter, time)

  try {
    // iterate over every segment in the group
    for (old <- segments) {
      // decide whether the delete tombstones in this LogSegment can be safely removed
      val retainDeletes = old.lastModified > deleteHorizonMs
      info("Cleaning segment %s in log %s (largest timestamp %s) into %s, %s deletes."
          .format(old.baseOffset, log.name, new Date(old.largestTimestamp), cleaned.baseOffset, if (retainDeletes) "retaining" else "discarding"))
      // perform the compaction
      cleanInto(log.topicAndPartition, old, cleaned, map, retainDeletes, log.config.maxMessageSize)
    }

    // trim excess index entries
    index.trimToValidSize()

    // this LogSegment becomes an inactive segment; append the largest, i.e. last, time index entry
    // to the time index; this entry will be used when deciding whether the segment can be deleted
    cleaned.onBecomeInactiveSegment()

    // trim time index
    timeIndex.trimToValidSize()

    // flush to disk
    cleaned.flush()

    // update the last-modified time
    val modified = segments.last.lastModified
    cleaned.lastModified = modified

    // log.replaceSegments first renames the '.cleaned' suffix to '.swap' and adds the swapped-in
    // segment to the skip list; it then removes the old LogSegments of this group from segments,
    // and finally drops the '.swap' suffix from the file
    info("Swapping in cleaned segment %d for segment(s) %s in log %s.".format(cleaned.baseOffset, segments.map(_.baseOffset).mkString(","), log.name))
    log.replaceSegments(cleaned, segments)
  } catch {
    case e: LogCleaningAbortedException =>
      cleaned.delete()
      throw e
  }
}

 

private[log] def cleanInto(topicAndPartition: TopicAndPartition, source: LogSegment, dest: LogSegment,
    map: OffsetMap, retainDeletes: Boolean, maxLogMessageSize: Int) {
  var position = 0
  while (position < source.log.sizeInBytes) { // traverse the LogSegment to be compacted
    checkDone(topicAndPartition) // check the compaction state
    // read a chunk of messages and copy any that are to be retained to the write buffer to be written out
    readBuffer.clear()
    writeBuffer.clear()
    var maxTimestamp = Message.NoTimestamp
    var offsetOfMaxTimestamp = -1L
    // read messages
    val messages = new ByteBufferMessageSet(source.log.readInto(readBuffer, position))
    throttler.maybeThrottle(messages.sizeInBytes)

    var messagesRead = 0
    // check each message to decide whether it should be retained
    for (shallowMessageAndOffset <- messages.shallowIterator) {
      val shallowMessage = shallowMessageAndOffset.message
      val shallowOffset = shallowMessageAndOffset.offset
      val size = MessageSet.entrySize(shallowMessageAndOffset.message)
      /*
       * shouldRetainMessage: a message is discarded when
       * 1. the OffsetMap contains a message with the same key but a larger offset, or
       * 2. the message is a 'delete tombstone' and the tombstones in this LogSegment can be safely removed
       */
      stats.readMessage(size)
      if (shallowMessage.compressionCodec == NoCompressionCodec) {
        if (shouldRetainMessage(source, map, retainDeletes, shallowMessageAndOffset)) {
          // write the message to be retained into the write buffer
          ByteBufferMessageSet.writeMessage(writeBuffer, shallowMessage, shallowOffset)
          stats.recopyMessage(size)
          if (shallowMessage.timestamp > maxTimestamp) {
            maxTimestamp = shallowMessage.timestamp
            offsetOfMaxTimestamp = shallowOffset
          }
        }
        messagesRead += 1
      } else {
        /*
         * For compressed messages, iterate over the inner messages with the deep iterator;
         * the logic is similar to the uncompressed case.
         * If the entire outer message is retained, no recompression is needed and the whole
         * outer message is written to the write buffer as-is.
         * If only some of the inner messages of an outer message are retained, the retained
         * inner messages are recompressed before being written to the write buffer.
         */
        var writeOriginalMessageSet = true
        val retainedMessages = new mutable.ArrayBuffer[MessageAndOffset]
        val shallowMagic = shallowMessage.magic

        for (deepMessageAndOffset <- ByteBufferMessageSet.deepIterator(shallowMessageAndOffset)) {
          messagesRead += 1
          if (shouldRetainMessage(source, map, retainDeletes, deepMessageAndOffset)) {
            // Check for log corruption due to KAFKA-4298. If we find it, make sure that we overwrite
            // the corrupted entry with correct data.
            if (shallowMagic != deepMessageAndOffset.message.magic)
              writeOriginalMessageSet = false

            retainedMessages += deepMessageAndOffset
            // We need the max timestamp and message offset for time index
            if (deepMessageAndOffset.message.timestamp > maxTimestamp) {
              maxTimestamp = deepMessageAndOffset.message.timestamp
              offsetOfMaxTimestamp = deepMessageAndOffset.offset
            }
          } else {
            writeOriginalMessageSet = false
          }
        }
        // There are no messages compacted out and no message format conversion, write the original message set back
        if (writeOriginalMessageSet)
          ByteBufferMessageSet.writeMessage(writeBuffer, shallowMessage, shallowOffset)
        else
          compressMessages(writeBuffer, shallowMessage.compressionCodec, retainedMessages)
      }
    }

    position += messages.validBytes
    // if there are messages to retain, append them to the compacted LogSegment
    if (writeBuffer.position > 0) {
      writeBuffer.flip()
      val retained = new ByteBufferMessageSet(writeBuffer)
      dest.append(firstOffset = retained.head.offset, largestTimestamp = maxTimestamp,
        offsetOfLargestTimestamp = offsetOfMaxTimestamp, messages = retained)
      throttler.maybeThrottle(writeBuffer.limit)
    }

    // not even one complete message was read, meaning the read buffer is too small; grow it
    if (readBuffer.limit > 0 && messagesRead == 0)
      growBuffers(maxLogMessageSize)
  }
  restoreBuffers() // reset the read buffer and the write buffer
}

 

