Class CreateIndexJson

java.lang.Object
All Implemented Interfaces:
java.lang.Runnable, java.util.concurrent.Callable<java.lang.Object>

public class CreateIndexJson
extends CollecTorMain
Create an index file called index.json containing metadata of all files in the indexed/ directory and update the htdocs/ directory to contain all files to be served via the web server.

File metadata includes:

  • Path for downloading this file from the web server.
  • Size of the file in bytes.
  • Timestamp when the file was last modified.
  • Descriptor types as found in @type annotations of contained descriptors.
  • Earliest and latest publication timestamp of contained descriptors.
  • SHA-256 digest of the file.
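As an illustration of the simpler metadata fields above, the following minimal sketch reads size, last-modified time, and SHA-256 digest of a file using only the JDK. The class and method names are hypothetical and not part of CollecTor, and the hexadecimal digest encoding is an assumption; this page does not state how the digest is encoded in index.json.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;

// Hypothetical helper, not part of CollecTor.
class FileMetadataSketch {

  /** Size of the file in bytes. */
  public static long size(Path file) throws IOException {
    return Files.size(file);
  }

  /** Timestamp when the file was last modified. */
  public static Instant lastModified(Path file) throws IOException {
    return Files.getLastModifiedTime(file).toInstant();
  }

  /** SHA-256 digest of the file contents, hex-encoded (encoding is an assumption). */
  public static String sha256Hex(Path file)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest digest = MessageDigest.getInstance("SHA-256");
    // Stream the file through the digest so large files are not held in memory.
    try (InputStream in =
        new DigestInputStream(Files.newInputStream(file), digest)) {
      byte[] buffer = new byte[8192];
      while (in.read(buffer) != -1) {
        // Reading is enough; DigestInputStream updates the digest as a side effect.
      }
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : digest.digest()) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }

  public static void main(String[] args) throws Exception {
    Path file = Files.createTempFile("descriptor", ".txt");
    Files.write(file, "abc".getBytes(StandardCharsets.US_ASCII));
    System.out.println(size(file));          // prints 3
    System.out.println(lastModified(file));
    System.out.println(sha256Hex(file));
    Files.delete(file);
  }
}
```

Descriptor types and publication timestamps require parsing the contained descriptors and are deliberately left out of this sketch.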

This class maintains its own working directory htdocs/ with subdirectories like htdocs/archive/ or htdocs/recent/ and another subdirectory htdocs/index/. The first two subdirectories contain (hard) links created and deleted by this class; the third subdirectory contains the index.json file in uncompressed and compressed forms.

The main reason for having the htdocs/ directory is that indexing a large descriptor file can be time-consuming. New or updated files in indexed/ first need to be indexed before their metadata can be included in index.json. Another reason is that files removed from indexed/ shall still be available for download for a limited period of time after disappearing from index.json.

The reason for creating (hard) links in htdocs/, rather than copies, is that links do not consume additional disk space. Because hard links cannot span file systems, all directories must be located on the same file system. Storing symbolic links in htdocs/ would not work, because a symbolic link breaks or changes content as soon as the file in the original directory is deleted or replaced. Symbolic links in the original directories are allowed as long as their targets are on the same file system.
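The behavior described above can be tried out with the JDK's `Files.createLink`. The following is a minimal sketch, not this class's implementation: the class name, the `linkInto` helper, and the temporary directory layout are hypothetical. It shows that a hard link keeps the content reachable after the original path is deleted.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demonstration, not part of CollecTor.
class HardLinkSketch {

  /** Creates a hard link to the given file inside the given directory. */
  public static Path linkInto(Path htdocsDir, Path original) throws IOException {
    Files.createDirectories(htdocsDir);
    Path link = htdocsDir.resolve(original.getFileName());
    // Fails if both paths are not on the same file system
    // or if the file system does not support hard links.
    Files.createLink(link, original);
    return link;
  }

  public static void main(String[] args) throws IOException {
    Path base = Files.createTempDirectory("collector-sketch");
    Path indexed = Files.createDirectories(base.resolve("indexed"));
    Path original = indexed.resolve("descriptors.txt");
    Files.write(original, "content".getBytes(StandardCharsets.UTF_8));

    Path link = linkInto(base.resolve("htdocs"), original);

    // Deleting the original path does not remove the data;
    // the hard link still refers to the same file contents.
    Files.delete(original);
    System.out.println(
        new String(Files.readAllBytes(link), StandardCharsets.UTF_8));
  }
}
```

A symbolic link created in the same place would dangle after the `Files.delete(original)` call, which is the failure mode the paragraph above refers to.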

This class does not write, modify, or delete any files in the indexed/ directory. At the same time it does not expect any other classes to write, modify, or delete contents in the htdocs/ directory.

  • Field Summary

    Fields inherited from class org.torproject.metrics.collector.cron.CollecTorMain

    config, mapPathDescriptors, SOURCES

    Fields inherited from class org.torproject.metrics.collector.sync.SyncManager

    SYNCORIGINS
  • Constructor Summary

    Constructors 
    Constructor Description
    CreateIndexJson​(Configuration configuration)
    Initialize this class with the given configuration.
  • Method Summary

    Modifier and Type Method Description
    protected org.torproject.metrics.collector.indexer.IndexerTask createIndexerTask​(java.nio.file.Path fileToIndex)
    Create an indexer task for indexing the given file.
    java.lang.String module()
    Returns the module name for logging purposes.
    protected java.lang.String obtainBuildRevision()
    Obtain and return the build revision string that was generated during the build process with git rev-parse --short HEAD and written to collector.buildrevision.properties, or return null if the build revision string cannot be obtained.
    void startProcessing()
    Run the indexer by (1) adding new files from indexed/ to the index, (2) adding old files from htdocs/ for which only links exist to the index, (3) scheduling new tasks and updating links in htdocs/ to reflect what's contained in the in-memory index, and (4) writing new uncompressed and compressed index.json files to disk.
    protected void startProcessing​(java.time.Instant now)
    Helper method for startProcessing() that accepts the current execution time; used by tests.
    protected java.lang.String syncMarker()
    Returns property prefix/infix/postfix for Sync related properties.

    Methods inherited from class org.torproject.metrics.collector.cron.CollecTorMain

    call, checkAvailableSpace, readProcessedFiles, run, syncMapPathsDescriptors, writeProcessedFiles

    Methods inherited from class org.torproject.metrics.collector.sync.SyncManager

    merge

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • CreateIndexJson

      public CreateIndexJson​(Configuration configuration)
      Initialize this class with the given configuration.
      Parameters:
      configuration - Configuration values.
  • Method Details

    • module

      public java.lang.String module()
      Description copied from class: CollecTorMain
      Returns the module name for logging purposes.
      Specified by:
      module in class CollecTorMain
    • syncMarker

      protected java.lang.String syncMarker()
      Description copied from class: CollecTorMain
      Returns property prefix/infix/postfix for Sync related properties.
      Specified by:
      syncMarker in class CollecTorMain
    • startProcessing

      public void startProcessing()
      Run the indexer by (1) adding new files from indexed/ to the index, (2) adding old files from htdocs/ for which only links exist to the index, (3) scheduling new tasks and updating links in htdocs/ to reflect what's contained in the in-memory index, and (4) writing new uncompressed and compressed index.json files to disk.
      Specified by:
      startProcessing in class CollecTorMain
    • startProcessing

      protected void startProcessing​(java.time.Instant now)
      Helper method for startProcessing() that accepts the current execution time; used by tests.
      Parameters:
      now - Current execution time.
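
The pattern of passing the execution time as a parameter can be sketched as follows. This is a self-contained illustration, not the CollecTor code: the class name, the `RETENTION` constant and its two-hour value, and the `shouldDeleteLink` helper are all hypothetical. It shows why a test benefits from supplying a fixed `Instant` when a decision depends on elapsed time.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration of time injection, not part of CollecTor.
class ClockInjectionSketch {

  // Hypothetical retention period; the actual period is not stated on this page.
  static final Duration RETENTION = Duration.ofHours(2);

  private Instant lastRun;

  /** Production entry point: uses the real current time. */
  public void startProcessing() {
    startProcessing(Instant.now());
  }

  /** Test-friendly variant: the caller decides what "now" is. */
  protected void startProcessing(Instant now) {
    this.lastRun = now;
  }

  public Instant lastRun() {
    return lastRun;
  }

  /** True once the retention period has fully elapsed. */
  public static boolean shouldDeleteLink(Instant markedForDeletion, Instant now) {
    return !now.isBefore(markedForDeletion.plus(RETENTION));
  }

  public static void main(String[] args) {
    Instant marked = Instant.parse("2020-01-01T00:00:00Z");
    System.out.println(shouldDeleteLink(marked, marked.plusSeconds(3600))); // prints false
    System.out.println(shouldDeleteLink(marked, marked.plusSeconds(7200))); // prints true
  }
}
```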
    • obtainBuildRevision

      protected java.lang.String obtainBuildRevision()
      Obtain and return the build revision string that was generated during the build process with git rev-parse --short HEAD and written to collector.buildrevision.properties, or return null if the build revision string cannot be obtained.
      Returns:
      Build revision string.
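
A minimal sketch of reading such a properties file from the class path and returning null on any failure is shown below. The resource name is taken from the description above; the property key `collector.build.revision`, the class name, and the `readRevision` helper are assumptions made for illustration and may differ from the actual implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical illustration, not part of CollecTor.
class BuildRevisionSketch {

  static final String RESOURCE = "collector.buildrevision.properties";

  // Assumed property key; the real key is not given on this page.
  static final String KEY = "collector.build.revision";

  /** Parses the revision from a properties stream, or returns null. */
  public static String readRevision(InputStream in) {
    if (in == null) {
      return null; // Resource not found.
    }
    Properties properties = new Properties();
    try {
      properties.load(in);
    } catch (IOException e) {
      return null; // Unreadable or malformed resource.
    }
    String revision = properties.getProperty(KEY);
    return (revision == null || revision.isEmpty()) ? null : revision;
  }

  /** Looks up the resource on the class path and parses it. */
  public static String obtainBuildRevision() {
    try (InputStream in = BuildRevisionSketch.class.getClassLoader()
        .getResourceAsStream(RESOURCE)) {
      return readRevision(in);
    } catch (IOException e) {
      return null; // Closing the stream failed.
    }
  }

  public static void main(String[] args) {
    // Prints null unless the properties file is on the class path.
    System.out.println(obtainBuildRevision());
  }
}
```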
    • createIndexerTask

      protected org.torproject.metrics.collector.indexer.IndexerTask createIndexerTask​(java.nio.file.Path fileToIndex)
      Create an indexer task for indexing the given file.

      This is a separate method so that it can be overridden by tests that don't actually want to index files but instead provide their own index results.

      Parameters:
      fileToIndex - File to index.
      Returns:
      Indexer task.
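
The override technique mentioned for createIndexerTask can be illustrated with a self-contained sketch. All names here (`FactoryMethodSketch`, `Indexer`, `StubIndexer`, `createTask`, `index`) are hypothetical stand-ins, and the task is modeled as a plain `Callable<String>` rather than the real IndexerTask type, whose interface is not shown on this page.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of an overridable factory method.
class FactoryMethodSketch {

  static class Indexer {

    /** Factory method: subclasses may replace the task that gets scheduled. */
    protected Callable<String> createTask(Path fileToIndex) {
      return () -> "indexed:" + fileToIndex.getFileName();
    }

    /** Schedules whatever task the factory method returns and waits for it. */
    public String index(Path fileToIndex) throws Exception {
      ExecutorService executor = Executors.newSingleThreadExecutor();
      try {
        Future<String> result = executor.submit(createTask(fileToIndex));
        return result.get();
      } finally {
        executor.shutdown();
      }
    }
  }

  /** Test double that skips real indexing and supplies a canned result. */
  static class StubIndexer extends Indexer {
    @Override
    protected Callable<String> createTask(Path fileToIndex) {
      return () -> "stub-result";
    }
  }

  public static void main(String[] args) throws Exception {
    Path file = Paths.get("recent", "a.txt");
    System.out.println(new Indexer().index(file));     // prints indexed:a.txt
    System.out.println(new StubIndexer().index(file)); // prints stub-result
  }
}
```

The scheduling logic in `index` stays untouched in the test; only the expensive work behind the factory method is swapped out.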