    Data Processing

    by Palo Analytics Team / Monday, 02 October 2017 / Published in Technology

    Here at Palo, data is the most vital asset we have. Data goes hand in hand with the processing pipelines that are in place to serve our mission, which is to offer quality data services related to news and data analytics.

    This is the first article in our series of Technology articles. The aim of the series is to share the knowledge we have gained over years of dealing with data-related problems. More articles will follow, so stay tuned: as we progress, we will delve deeper into technical issues and move from outlining general problems to proposing technical solutions.

    A buzz term has appeared lately to outshine “Big Data”: “Fast Data”. Buzz terms are frequent in technology and most of them have a short life span. This does not seem to be the case for “Fast Data”, because it answers a real need: retrieving, processing, storing and serving huge amounts of data is not enough. All of this needs to be done fast, reliably and, in many cases, continuously.

    Let’s break a data processing pipeline down into its most important parts.

    First and foremost is the ingestion phase. The data must come into the system somehow, and this is the responsibility of the ingestion components. The sources from which data is retrieved can be numerous: social media APIs, web pages reached via crawling, integrations with third-party data stores and services, and so on. The job of an ingestion component is to bring the data in and possibly apply a pre-processing step before making it available to the next phase.
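    As a rough illustration of the ingestion phase described above, the following minimal sketch pulls raw items from several sources, applies a light pre-processing step and hands cleaned records to the next phase. All names here (`Record`, `preprocess`, `ingest`, `fake_social_api`, `fake_crawler`) are hypothetical and chosen for illustration; they do not describe Palo's actual components.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, Optional


@dataclass(frozen=True)
class Record:
    source: str  # where the item came from, e.g. "social" or "crawler"
    text: str    # cleaned payload handed to the processing phase


def preprocess(source: str, raw: str) -> Optional[Record]:
    """Light pre-processing: collapse whitespace and drop empty items."""
    text = " ".join(raw.split())
    return Record(source, text) if text else None


def ingest(sources: Dict[str, Callable[[], Iterable[str]]]) -> Iterator[Record]:
    """Pull raw items from every source and yield cleaned records."""
    for name, fetch in sources.items():
        for raw in fetch():
            record = preprocess(name, raw)
            if record is not None:
                yield record


# Hypothetical stand-ins for a social media API client and a web crawler.
def fake_social_api() -> Iterable[str]:
    return ["  Breaking:   markets rally ", ""]


def fake_crawler() -> Iterable[str]:
    return ["Local   news\nstory"]


records = list(ingest({"social": fake_social_api, "crawler": fake_crawler}))
for record in records:
    print(record.source, "->", record.text)
```

    Keeping each source behind a plain callable means a new source can be added without touching the pre-processing step or the downstream phase.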

    The breakthrough towards the “Fast Data” era, though, comes mainly from advances in the next phase of the pipeline: the processing phase. Although this article is not meant to talk about specific technologies, Apache Spark cannot go unmentioned, because it is one of the key elements that keep pushing towards a paradigm shift. Before Spark, Hadoop was dominant in the processing phase. One thing that Hadoop does not do well is speed, and this is due to its batch-oriented nature. Batch jobs usually take a long time to complete, which can delay the response of the whole system. Spark introduced a mini-batch (often called micro-batch) processing approach and helped alleviate this bottleneck by providing the means to create a stream-like processing experience.
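    The mini-batch idea can be sketched in plain Python, without any Spark dependency: instead of waiting for the whole data set, the stream is cut into small batches and a result is produced for each one as soon as it is complete. This is only an illustration of the concept, not Spark's API. It also assumes batches are cut by item count for simplicity, whereas Spark Streaming cuts them by time interval; `mini_batches` and `count_words` are hypothetical names.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def mini_batches(stream: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group a (possibly endless) stream into small batches of at most batch_size items."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def count_words(batch: List[str]) -> int:
    """Hypothetical per-batch job: total number of words in the batch."""
    return sum(len(text.split()) for text in batch)


incoming = [
    "fast data",
    "big data",
    "stream processing pipeline",
    "spark",
    "hadoop batch",
]

# One result per mini batch, available long before the stream ends.
results = [count_words(batch) for batch in mini_batches(incoming, batch_size=2)]
print(results)
```

    The smaller the batch, the sooner each partial result is available, at the cost of more per-batch overhead.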

    Last but not least come the actual data storage and, of course, the accompanying components, services and orchestration processes of the whole pipeline. While storage has its own caveats, orchestration and services are vital and deserve their own article. The problems to solve are many, and here are some keywords, in random order, to whet our appetite: temporal decoupling, scaling out, resiliency, messaging, actor model, reactive applications, backpressure… and the list goes on…


    About Palo Analytics Team

    We search, monitor and analyse sentiment for all news, posts, discussions and videos of the Web and Social Media, in real time. PaloPro is a simple, friendly and useful information and analysis tool.
