Activity DB Design Discussion

  • @apeatling

    Keymaster

    This thread has been split off from an earlier thread, since the topic of discussion changed from syncing content to database and component architecture.


    First of all, let’s clear a few things up:

    1. The activity table is not a cache table, so the preconceived notions of what “cache” means don’t apply when considering solutions. It may contain data that has been pre-formatted, but it still contains all the pointers back to the original content. It is quite possible to reconstruct all of the data from the information stored in the activity table. Remember the “format_activity” functions?

    The table is named “_cached” because this is a leftover from how the component used to work. I have not found a decent, reliable way to change the name of the table that will not break a percentage of installations. This leads me on to number 2.

    2. What people have talked about so far is the way the activity component used to work. It used to just be a table containing only pointers back to the original content. The stream would be rebuilt every four hours. The result would be stored in the “_cached” table, which is a small part of what you see today. This would then be used to render the stream.

    In practice this was just too expensive. Of course we can normalize the database until the cows come home, but that isn’t going to cut it on a low-end server. What I don’t want is for BuddyPress to be limited to high end servers with beautiful caching solutions. In reality, I want this thing to run on cheap hosts where someone can just throw it up there and it doesn’t take the server down.

    Now, none of that is an excuse for poor database design – and I’m not saying BuddyPress has poor database design – but I’ll be the first to admit it could be better. Rome wasn’t built in a day, and I have no doubt that the schema is going to improve significantly over the next several versions and your help as a developer is appreciated. My philosophy is release early, often and iterate. No one is going to build a perfect solution straight off the bat, let’s get it out there and get people using it.

    What we don’t want to do is start talking and discussing changes that fit the way things work right now. Here’s something to consider. I think the activity stream is going to be “the” central point to everything in BuddyPress. As an example, when the photos/album/gallery component is built, images will just be uploaded as an activity stream entry and appear on the activity stream. Any comments on the photo/photos will be activity stream comments. When you look at the permalink for that photo, it will just be an activity permalink with the comments showing.

    What I’m saying is *everything* is an activity, regardless of what the content actually is. Rather than thinking of the activity stream as just a place where everything is aggregated, the activity stream will be the place where everything actually happens/is stored. The activity stream has almost everything you need now: powerful filtering, threaded commenting, permalinks and the ability to delete (and soon edit).

    Separate interfaces could be built to perhaps display uploaded images in a different fashion, or private messages in a typical email style. However, in reality all of these are still activities.

    You could compare this to the posts table in WordPress which is the heart of the software. Everything on the database level is considered a post, but on all levels above the database the content is defined by the “post_type”. Rather than creating multiple new tables for each content type, why not just use the activity table but be able to denote different types of content using identifiers or types?
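To make the post_type comparison concrete, here is a minimal sketch in Python with SQLite. It is not BuddyPress code: the table name `activity`, its columns, and the type strings are all assumptions chosen only to show one table serving several content types, with comments stored as rows that point at their parent item.

```python
import sqlite3

# Hypothetical schema: one table for every kind of content, told apart by "type",
# in the same spirit as post_type in the WordPress posts table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE activity (
        id        INTEGER PRIMARY KEY,
        user_id   INTEGER NOT NULL,
        type      TEXT    NOT NULL,   -- e.g. 'status_update', 'photo', 'comment'
        parent_id INTEGER,            -- set when the row is a comment on another row
        content   TEXT    NOT NULL
    )
    """
)

rows = [
    (1, 10, "status_update", None, "Hello world"),
    (2, 11, "photo", None, "sunset.jpg"),
    (3, 10, "comment", 2, "Nice shot!"),
]
conn.executemany("INSERT INTO activity VALUES (?, ?, ?, ?, ?)", rows)


def items_of_type(activity_type):
    """Return the content of every activity row of the given type, oldest first."""
    cursor = conn.execute(
        "SELECT content FROM activity WHERE type = ? ORDER BY id", (activity_type,)
    )
    return [content for (content,) in cursor]


def comments_on(activity_id):
    """Return the comments attached to one activity item (its permalink view)."""
    cursor = conn.execute(
        "SELECT content FROM activity "
        "WHERE type = 'comment' AND parent_id = ? ORDER BY id",
        (activity_id,),
    )
    return [content for (content,) in cursor]
```

In this sketch a photo page is just `comments_on(2)` plus the photo row itself, which is the "activity permalink with the comments showing" idea from above.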

    These are all just ideas and nothing is anywhere near set in stone; however, this is all something to consider when talking about changes to the schema or debating how it works.

    Let’s start some positive discussions about suggestions for small iterative improvements. But — I don’t want to hear about how WP does it wrong. At the end of the day WP is wildly successful and runs on most servers, fulfilling most people’s needs without a problem. Let’s keep it constructive.

Viewing 22 replies - 1 through 22 (of 22 total)
  • @erich73

    Participant

    Many thanks for clearing this up!

    Just go ahead with it and move on!

    The other professional programmers who are complaining could still work on their ideas and suggestions and show you their finished, working examples once they have them running. Only then do you decide whether you are going to take it, change it or leave it.

    Thanks again for all your efforts and for moving the idea of BP ahead!

    @21cdb

    Participant

    “Rather than thinking of the activity stream as just a place where everything is aggregated, the activity stream will be the place where everything actually happens/is stored.”

    I really like this idea. It will encourage users to be more active. It would be great if developers could hook into the “What’s new John Doe?” box and add their own interface. For example, MrMaz’s “Links” component could add a Link button to the box where users could directly post a “Link”. I think Facebook is a good example of this.

    @mrmaz

    Participant

    @andy

    Not sure if you took my post on the other thread the wrong way, but it was meant to bring attention to the real problem, not to flame. Hopefully, I am not being labeled a troll here.

    I am happy that you recognize the real problem, though I still disagree with you on the data normalization issue. I think you were headed in the right direction with the format_activity() way of doing things. It would be prudent to take another look at this and see if we can put our heads together to solve any of the problems you were having that led you to think about abandoning this.

    I think it is a mistake to defend the WP way of handling things because it’s the way it has always been, or because it’s popular. BP may be a WP plugin, but it has its own set of complex problems that need to be solved, and more advanced solutions need to be designed. There are many ways to refactor code using wrapper classes, etc., and maintain backwards compatibility.

    My two cents on the activity stream is that it should be designed as a registry of component activities. It must have a clearly defined class interface that must be adhered to. Any time the activity stream is called/viewed, the registry calls template methods (which are defined by the class interface) for each record found. This is very similar to how the format_activity() idea works, except all of the components are currently hard coded into the activity component, which is not very extensible.
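A rough sketch of what such a registry could look like, written in Python rather than PHP and not taken from any BuddyPress code. The names `ActivitySource`, `ActivityRegistry`, `format_item` and `LinksSource` are hypothetical; the point is only that the registry stores pointers and delegates formatting to whichever component registered itself.

```python
from abc import ABC, abstractmethod


class ActivitySource(ABC):
    """Interface each component must implement (hypothetical, for illustration)."""

    @property
    @abstractmethod
    def slug(self):
        """Unique component identifier, e.g. 'links'."""

    @abstractmethod
    def format_item(self, item_id):
        """Return the display text for one recorded item."""


class ActivityRegistry:
    """Knows which components exist, but not what their content looks like."""

    def __init__(self):
        self._sources = {}
        self._records = []  # (component slug, item_id) pointers, oldest first

    def register(self, source):
        self._sources[source.slug] = source

    def record(self, slug, item_id):
        if slug not in self._sources:
            raise KeyError(f"component {slug!r} is not registered")
        self._records.append((slug, item_id))

    def render(self):
        # Newest first; formatting is delegated to the owning component.
        return [
            self._sources[slug].format_item(item_id)
            for slug, item_id in reversed(self._records)
        ]


class LinksSource(ActivitySource):
    """Example component; the link data here is made up."""

    slug = "links"

    def __init__(self, links):
        self._links = links

    def format_item(self, item_id):
        return f"New link: {self._links[item_id]}"


registry = ActivityRegistry()
registry.register(LinksSource({1: "https://example.org", 2: "https://example.com"}))
registry.record("links", 1)
registry.record("links", 2)
stream = registry.render()
```

Nothing in `ActivityRegistry` mentions links, groups or blogs, so a new component only has to subclass `ActivitySource` and call `register()`.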

    I don’t think there is a way around caching/denormalizing to solve the performance issue. The options are to keep the data normalized, and call it on demand, with the option to cache, OR to pre-format it the way you like, which is really just denormalized data but not in the traditional sense. In any case, the original data needs to be normalized, and any cached or denormalized data can’t be treated as a standalone “object.” It’s only there to speed up performance.

    @toregus

    Participant

    “Rather than thinking of the activity stream as just a place where everything is aggregated, the activity stream will be the place where everything actually happens/is stored.”

    Praise the lord! (sorry for the evangelical speak but I think this is a major positive change)

    “small iterative improvements”: Regarding the forums

    I’ve been debating whether to stop viewing the forums the way we have been. Established BP websites using them will be less interested in this. I’m still starting out and would like to change my website to 1.2-style and ahead.

    Currently (1.1) the forums are reached by going to groups and then to the forums. You have to click a lot to get to the posts.

    The new theme (1.2) is trying to place the activity stream as the place to _do_ it all. I’ve been using testBP.org like crazy because it’s like a mini Facebook/FriendFeed. So I like it.

    So what happens to the forums? Why will anyone even bother to go all the way down to groups/forums and post something in the forum (or even at the website.com/forum/-page) with 1.2? BuddyPress, out of the box, isn’t using some kind of TinyMCE or anything else that would allow you to enter things in a different way (most forums have some kind of quoting system). It’s just a box to write in. But it is still a categorised box, and replies in it are shown in a more egalitarian way, all at the same size (comments on posts in the stream aren’t).

    My guess is that forums will still be used in communities where they are already very active. But new BP installations will not use them that much unless they’re really big communities (5000+).

    This does change the way that we interact and talk. Nothing in the activity stream (1.2) will be important enough to be put at the top of the list. It’s just the stream with the latest new activity at the top. This is another change away from the old forum style. In a regular forum, the latest post with a new comment will trigger the topic to move to the top of the forum topic list (the old activity stream). This is where BP will be very different. There’s no retriggering of topics to the top if you aren’t using the forums the old way (which people will probably do less and less).

    Facebook is nice but you lose the use of forums since they aren’t presented in the stream. The stream rules… This way BP will be focused on social relationships that mainly are filled with activity updates consisting of bits and fragments instead of intense debate of old topics. Why? Because there’s no retriggering of topics to the top.

    Am I alone in prophesying the death of the forum? …and the need for some kind of solution to this merge of everything into the status update, which I also applaud…

    @wpmuguru

    Participant

    @Andy – A DB suggestion:

    Make a component registration API so that all components that are going to record in the SWA have to register. The registration should occur at plugin activation, and the component name strings are stored in a component table.

    Once that is in place, you can eliminate the varchar component fields and replace them with an int(11). This would reduce space in the table and improve lookup speed significantly when filtering.
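The suggestion above can be sketched with SQLite from Python. This is only an illustration under assumed names (`component`, `activity`, `component_id`, `register_component`); it does not reflect the actual BuddyPress schema, and SQLite's `INTEGER` stands in for MySQL's `int(11)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE component (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE activity (
        id           INTEGER PRIMARY KEY,
        component_id INTEGER NOT NULL REFERENCES component(id),
        content      TEXT NOT NULL
    );
    CREATE INDEX activity_component ON activity (component_id);
    """
)


def register_component(name):
    """Store the name once (e.g. at plugin activation) and return its integer id."""
    conn.execute("INSERT OR IGNORE INTO component (name) VALUES (?)", (name,))
    (component_id,) = conn.execute(
        "SELECT id FROM component WHERE name = ?", (name,)
    ).fetchone()
    return component_id


def record_activity(component_id, content):
    conn.execute(
        "INSERT INTO activity (component_id, content) VALUES (?, ?)",
        (component_id, content),
    )


def activity_for(component_id):
    """Filter on the integer column instead of comparing name strings."""
    cursor = conn.execute(
        "SELECT content FROM activity WHERE component_id = ? ORDER BY id",
        (component_id,),
    )
    return [content for (content,) in cursor]


groups_id = register_component("groups")
blogs_id = register_component("blogs")
record_activity(groups_id, "joined a group")
record_activity(blogs_id, "wrote a post")
record_activity(groups_id, "created a group")
```

Registering the same name twice returns the existing id, so a plugin could safely call `register_component` on every activation.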

    @mrmaz

    Participant

    @Ron

    YES! I am glad someone agrees with me that it should be a registry. It could start out as a very basic class and grow over time.

    The monkey wrench is the idea that Andy threw out about activity being the hub and components the spokes. Right now the activity stream is a service that components use, but with his idea this role is (sort of) reversed. So it will be tough to come up with a final solution until this part is ironed out. I am not agreeing or disagreeing with the idea, but it’s critical to decide on it first.

    With Andy’s idea, a registry pattern might still work well, but it will depend on exactly how much of the other components’ functionality is handled by the activity component. Will it be passed data, or will it be passed callback functions, or will it call template methods, etc.?

    Personally, I don’t think the activity stream should have any idea what it is displaying. It should only be passing item_ids to template methods or callbacks and expecting that data to be formatted in a very specific way (a combination of Registry and Template Method patterns).

    @apeatling

    Keymaster

    “Not sure if you took my post on the other thread the wrong way, but it was meant to bring attention to the real problem, not to flame. Hopefully, I am not being labeled a troll here.”

    No no, I wasn’t taking anything the wrong way. I simply didn’t want to start a thread that encouraged flaming. The previous posts were constructive.

    My reason for thinking the stream could become the hub was because it solves a lot of things that other components will want. Many components will want commenting, permalinks, edit/delete and the filtering the activity stream provides. It seems stupid to reproduce all these features for every component.

    @erich73

    Participant

    @Tore,

    if the page “testbp.org/forums” could be re-designed towards something like the following website, then I think the Group-Forums would get more action and discussion ongoing.

    http://vancouver.en.craigslist.org/forums/?forumID=81

    Just think about it: you can make a Group-Forum-post directly at the page “testbp.org/forums” without the need of many clicks to go deep into the specific Group-Forum.

    @jeffsayre

    Participant

    Okay, as I was thinking about and writing a reply to Andy’s OP, there have been many posts. So, I will post what I have to say first before reading the rest of this thread. I’ll post again if I have anything to say with regards to other people’s comments.


    First of all, I want to say thank you for starting this thread. This type of discussion is not something that can be done effectively on IRC. Secondly, it’s clear that the trajectory of the referring thread has struck a nerve in you.

    My comments were an attempt to start a healthy debate and not meant as a major criticism. Your work on the BuddyPress core is very commendable and much appreciated. So, let me reiterate, as I stated in my first post in that thread, I’m excited with the direction BuddyPress is headed. I believe that v1.2 will be a major, beneficial update to the platform.

    The vision that you’ve outlined here helps frame the discussion going forward. Up until now, we, or at least I, did not have a clear idea about your vision. The roadmap is of course only a listing of features. It is not a statement of your design philosophy or design goals. All that the majority of us could discern about your vision (and here I particularly mean developers) was from what we saw in each changeset and a few morsels gleaned from IRC here and there.

    With that said, I think this is the most crucial statement:

    “What I don’t want is for BuddyPress to be limited to high end servers with beautiful caching solutions. In reality, I want this thing to run on cheap hosts where someone can just throw it up there and it doesn’t take the server down.”

    This is a lofty but worthy goal! In my mind, I look at WPMU and BuddyPress as platforms best utilized on more robust setups. To that end, my ideas, my suggested approaches, have been geared toward a higher-end user. But now that this particular goal of yours has been stated, I have a very clear idea of what guides your BuddyPress design decisions.

    Your ideas about the activity stream are interesting.

    “Rather than creating multiple new tables for each content type, why not just use the activity table but be able to denote different types of content using identifiers or types?”

    This will require some thought if blog posts, forum posts, and other content types are to be successfully integrated into this vision. I guess for purposes of backend discussion, it might be best to refer to this envisioned table, or set of related tables, as content tables instead of activity tables.

    Activity of any type would be recorded in the appropriate content table and thought of as primary content and secondary content. Primary content is the object of the activity—a post (in a blog or forum), a picture (any piece of media), an accepted friendship, etcetera. Secondary content would be any response to the primary content—comments, retweets, etcetera.

    As far as proposed schema changes, that all depends on how this vision is charted out. So, perhaps it’s best to figure that out before debating, or even professionally arguing, about DB design. ;)

    @mrmaz

    Participant

    @Andy

    Cool man, I don’t want anyone to think I only just complain, hehe. I have never been good at sugar coating stuff, and my wife gives me hell all the time about it. Btw, I have a 2 y/o and a pregnant wife, so I reserve the right to be snarky, lol.

    Check out my registry for auto-embedding rich media services. I think it could be adapted to work for activity streams, although I guess it would have to be de-PHP5’d :(

    Registry & Service Template Methods

    https://plugins.trac.wordpress.org/browser/buddypress-links/tags/0.2-RC1/bp-links-embed-classes.php

    Services

    https://plugins.trac.wordpress.org/browser/buddypress-links/tags/0.2-RC1/bp-links-embed-services.php

    To solve the performance problem, each “service” would need to define a method that denormalizes itself for storage in the table that is queried (it gets hammered). This is very similar to format_activity(), except the abstract activity registry class would define strict rules for the format of data that is returned, then format the output itself from this “cached” denormalized data. We are back to square one with the need to refresh the denormalized data periodically, but I can think of a few ways to lessen the hit. One would be to have a timestamp (or other freshness indicator) column so only records that have changed are refreshed. It would be ideal to refresh the data with a pure SQL query, but then the components would have to provide the SQL to do it, and that would get very messy.
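The freshness-indicator idea can be sketched as follows, in Python and with in-memory dictionaries standing in for the database tables. Every name here (`CachedActivity`, `source_stamp`, `refresh_stale`, `bold`) is hypothetical; it is one possible reading of "only records that have changed are refreshed", not an existing implementation.

```python
class CachedActivity:
    """One denormalized row: pre-formatted text plus a freshness stamp."""

    def __init__(self, text, source_stamp):
        self.text = text
        # modified_at value of the source item when this copy was built
        self.source_stamp = source_stamp


def refresh_stale(cache, source_items, format_item):
    """Re-format only rows whose source changed after the cached copy was built.

    cache        -- dict of item_id -> CachedActivity
    source_items -- dict of item_id -> (raw_content, modified_at)
    format_item  -- callable supplied by the owning component
    Returns the list of item_ids that were rebuilt.
    """
    rebuilt = []
    for item_id, (raw, modified_at) in source_items.items():
        cached = cache.get(item_id)
        if cached is None or cached.source_stamp < modified_at:
            cache[item_id] = CachedActivity(format_item(raw), modified_at)
            rebuilt.append(item_id)
    return rebuilt


def bold(raw):
    """Stand-in for a component's own formatting method."""
    return f"<b>{raw}</b>"


cache = {1: CachedActivity("<b>old title</b>", source_stamp=100)}
source = {1: ("new title", 150), 2: ("second post", 120)}

first_pass = refresh_stale(cache, source, bold)   # item 1 is stale, item 2 is missing
second_pass = refresh_stale(cache, source, bold)  # nothing changed in between
```

The second pass does no formatting work at all, which is the "lessen the hit" part; the cost that remains is the scan over the source timestamps.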

    In the end you have an activity stream class interface that basically says… you want to record activity? Ok, extend me and define these methods. You have to return data in exactly this format. Add yourself to my registry, and if you followed directions, I will do all of the heavy lifting for you. Otherwise, get lost.

    LMK if any of this sounds good to you, or if we are way off of the same page.

    @djpaul

    Keymaster

    My visualisation of this discussion is to forget the concept of what A. Streams were, and instead envisage the new SWA as a kind of super-Wire, with threaded commenting, permalinks and support for more media types than just text, whilst retaining the ability to use it anywhere (you were able to use the Wire in custom components easily).

    Is that where the discussion is up to, and moving on to discuss how to map forum posts and blog comments into this?

    @mikepratt

    Participant

    While I enjoy and learn quite a bit by reading the DB design debate, I cannot lend the insight that MrMaz et al. do.

    To @Tore’s point: As much as everyone will love doing all their activity from the activity stream, keep several things in mind. As Groups grow and become enhanced by plugins, users will find themselves also hanging out in and revisiting those Groups due to their topical relevance. The My Groups and My Friends filters on the SWA will also be huge hits, but I think it’s a pretty safe bet that people will ask the question “What’s going on in Group X?”, and I think that highlights the need to make an action on the main activity stream (i.e. the SWA) remain a part of its originating Group. Indeed, such a stream may kill off the /forums aggregation, but certainly not Groups themselves. This also lends credence to the much-needed blog-post-like capability to be added to groups (purposely not saying Blogs added… merely richer content capabilities that forum postings don’t provide).

    Great thread going on here. Hats off to all

    @apeatling

    Keymaster

    “In the end you have an activity stream class interface that basically says… you want to record activity? Ok, extend me and define these methods. You have to return data in exactly this format. Add yourself to my registry, and if you followed directions, I will do all of the heavy lifting for you. Otherwise, get lost.”

    I like this idea a lot; it introduces a set structure, which is important. It has to be easy to grasp though, I don’t want to increase the barrier for entry.

    Once 1.2 is completed, let’s take a look at this in greater detail and come up with a solid activity stream API that we can move forward with.

    @johnjamesjacoby

    Keymaster

    @ron/@maz/@andy, I think that’s the way to go, and not unlike what already happens in terms of how activity gets in there.

    I do like the idea of private messages going in there too… Huh.

    Is it possible, Andy, that you just made one component to eliminate all the others? haha! I mean really, if components can register themselves, then there’s no need for an activitymeta table, since the very act of interacting with an activity stream is in itself an activity. Think of the traditional Facebook like/dislike setup. When I “dislike” a comment, that creates an activity that I disliked it, which is attached to the activity item I disliked.

    It’s rather genius in a way. Huh again…

    @mrmaz

    Participant

    @jjj

    I don’t think it will necessarily replace the other components, but what it will do is handle all of the generic functionality that is now handled by the components themselves. Since each component might handle it a little bit differently, it is difficult to get them all to play nice with each other.

    What is being proposed is that all of the complexities of the inner guts of BuddyPress start to move more towards the heart of the app, so new components and plugins don’t have to worry about implementing so many redundant functions. Recording activity and updating meta data are two obvious ones, but avatar handling is another. Even though there are core functions to handle most of this, you still have to create a bunch of wrapper functions in order to support filters and actions, etc. Just some basic polymorphism would eliminate thousands of lines of code. Imagine instead of copying and pasting 30 functions, you just extend the class and define your slug in the slug() method, and then whenever you call a parent class method, all of them automagically store your data the right way for you.
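As an illustration of the "extend the class and define your slug" idea, here is a small Python sketch. The `Component` base class, its in-memory stores, and the two subclasses are assumptions made up for this example; only the `slug()` method name comes from the post above.

```python
class Component:
    """Hypothetical base class: generic behaviour lives here, once."""

    # Shared in-memory stores standing in for database tables.
    activity_log = []
    meta_store = {}

    def slug(self):
        raise NotImplementedError("subclasses must define slug()")

    def record_activity(self, user_id, action):
        # Every component's activity is stored the same way, tagged with its slug.
        entry = {"component": self.slug(), "user_id": user_id, "action": action}
        Component.activity_log.append(entry)
        return entry

    def update_meta(self, item_id, key, value):
        # Meta keys are namespaced by slug so components cannot collide.
        Component.meta_store[(self.slug(), item_id, key)] = value

    def get_meta(self, item_id, key, default=None):
        return Component.meta_store.get((self.slug(), item_id, key), default)


class LinksComponent(Component):
    def slug(self):
        return "links"


class AlbumsComponent(Component):
    def slug(self):
        return "albums"


links = LinksComponent()
albums = AlbumsComponent()
links.record_activity(7, "posted a link")
albums.record_activity(7, "uploaded a photo")
links.update_meta(1, "votes", 3)
```

Each subclass is three lines long, yet both get activity recording and namespaced meta storage from the parent, which is the saving being described.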

    In the future if a new component comes along that has some cool functionality that all other components can benefit from, then this could be refactored “up” into the activity component, or whatever it ends up being called, so the new functionality could become instantly available to everything else.

    @johnjamesjacoby

    Keymaster

    I’d ultimately love to see things go in that direction, and have been sifting through code the past few days thinking of how to get more red in the trac than green.

    However, if this were the case, and activity had a component scope and a serialized array of return values, then you could replace private messaging with a threaded activity stream that only the users involved in that thread can view. You could send activity to multiple groups at a time, or only one, or all of your groups, or the entire site, or any other registered component’s serialized set of values that mean whatever they mean.
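One way to picture scoped activity is the Python sketch below. The scope names ("site", "groups", "users"), the `targets` field and both function names are assumptions for illustration; the post does not specify how scope would actually be stored.

```python
def visible_to(activity, user_id, group_members):
    """Decide whether one user may see one activity item.

    activity      -- dict with "scope" ("site", "groups" or "users") and "targets"
    group_members -- dict of group_id -> set of member user_ids
    """
    scope = activity["scope"]
    if scope == "site":
        return True
    if scope == "users":
        # Private-message style: only the people involved in the thread.
        return user_id in activity["targets"]
    if scope == "groups":
        return any(
            user_id in group_members.get(group_id, set())
            for group_id in activity["targets"]
        )
    return False  # unknown scopes stay hidden


def stream_for(user_id, activities, group_members):
    """Return the texts of all activity items this user is allowed to see."""
    return [a["text"] for a in activities if visible_to(a, user_id, group_members)]


group_members = {1: {10, 11}, 2: {12}}
activities = [
    {"scope": "site", "targets": [], "text": "site-wide update"},
    {"scope": "users", "targets": [10, 12], "text": "private thread"},
    {"scope": "groups", "targets": [1, 2], "text": "posted to two groups"},
]
```

Here user 10 sees all three items, user 11 sees everything except the private thread, and a user in no group sees only the site-wide update.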

    You can almost replace the notifications class altogether too, because if you take a count of the number of activities a user has directed at them and store it in usermeta, then when the user returns, if that number is higher, there’s your total notifications. Filter those new activities and now you can see how many of each. Actually, the code already exists within both WP and BP to do this.
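A minimal Python sketch of the count-based notification idea, using plain lists and dictionaries in place of the activity table and usermeta. The names `activities`, `usermeta`, `new_notifications` and `mark_seen` are hypothetical, and the sketch assumes activities are only ever appended, never deleted.

```python
from collections import Counter

# Hypothetical in-memory stand-ins for the activity table and usermeta.
activities = []   # each entry: {"to_user": int, "type": str}
usermeta = {}     # user_id -> number of activities already seen


def directed_at(user_id):
    return [a for a in activities if a["to_user"] == user_id]


def new_notifications(user_id):
    """Return (total, per-type counts) of activities that arrived since the last visit."""
    mine = directed_at(user_id)
    seen = usermeta.get(user_id, 0)
    fresh = mine[seen:]  # relies on the log being append-only
    return len(fresh), Counter(a["type"] for a in fresh)


def mark_seen(user_id):
    """Store the current count so the next visit starts from here."""
    usermeta[user_id] = len(directed_at(user_id))


activities.append({"to_user": 5, "type": "message"})
mark_seen(5)  # user 5 visits and has seen one item
activities.append({"to_user": 5, "type": "mention"})
activities.append({"to_user": 5, "type": "message"})
activities.append({"to_user": 9, "type": "message"})

total, by_type = new_notifications(5)
```

If activity items can be deleted, a stored count alone would drift, so a real design would probably need a last-seen id or timestamp instead; that is outside what this sketch covers.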

    In that regard, the activity stream does start to replace core communication components like forums, private messages, status updates, notifications, and the wire. At the same time, BuddyPress becomes less about separate components with their own APIs and classes and subsets of functions, and more about creating new methods for users to interact with each other in specific scopes of communication.

    You could take things like Twitter lists and make them work both ways, where you could send a message/comment/update/picture to only specific users instead of only viewing tweets from users. You could create an endless array of ways and names for collaborative tools to develop the platform from within the platform. It’s basically the blog post_types concept attached to people’s activity instead of re-categorizing blog posts.

    The way to further develop groups would need a whole new topic, but I’ve got some ideas for that too. :)

    @mrmaz, what you propose is fundamentally how I approached my development before finding WP: start with small basic classes and work my way up in functionality. There very easily could be a basic “bp_component” class that every new component just extends, setting the needed vars and assigning the needed functions accordingly.

    @erich73

    Participant

    sorry to interrupt this very interesting discussion (I anyway only understand 1% of what has been said, as I am not a programmer).

    Just a question:

    So if I start to develop my website based on BP version 1.2 and set up a few hundred Groups, you might then go ahead and implement the above-mentioned “new things” in a future BP version, e.g. BP 1.3.

    Will my website and the few hundred Groups I have set up (with BP 1.2) still work with the stuff mentioned above, or will I run into any issues?

    If you are going to implement the above-mentioned things, would it be better to wait for them, or is it safe to start with a website based on BP version 1.2?

    @mrmaz

    Participant

    @Erich73

    What is being discussed is all just talk right now. No matter what happens I am sure there will be backwards compatibility. I wouldn’t hold off on anything.

    @apeatling

    Keymaster

    We should really keep developer discussions such as this on trac, otherwise we will worry non-developers. None of the things we have talked about are certain, and there will always be some level of backwards compatibility.

    @jeffsayre

    Participant

    Andy-

    I agree. This discussion should be held in a developers area, away from the regular forum. Unfortunately, Trac and IRC are not great places to hold in-depth conversations, nor does either provide a mechanism with which to do so effectively.

    In the past, there was a discussion about creating a developers’ resource section on BP.org–a place that would not only contain a library of code snippets, but provide an arena for the type of discussion that we’re having in this thread.

    That discussion happened 8 months ago and lost steam. Read the entire thread as it starts out as a discussion about snippets and evolves from there:

    https://buddypress.org/forums/topic/buddypressorg-needs-a-common-place-to-share-code-snippets#post-11852

    @wpmuguru

    Participant

    I haven’t thought through how practical or difficult it would be to support the existing component structure. But what if there were a base component class that all components extended (the way widgets extend the base widget)?

    In the child class you would need a registration process providing the component identifiers & possibly the URI to catch, activity logging, settings maintenance handler, a display handler, etc.

    The end result would be that BuddyPress would become an activity based framework that the features plugged into.

    @johnjamesjacoby

    Keymaster

    Check trac ticket #1479 for continued discussion on having a base component.

  • The topic ‘Activity DB Design Discussion’ is closed to new replies.